Commit Graph

21 Commits

Author SHA1 Message Date
marfrit 0700dce881 repl: enforce budget per Norris step, not just post-loop (closes #51)
PHASE3.md §2 specifies sliding-window eviction "including mid-Norris-
session if the loop runs long". Implementation only called
enforce_budget() once, after the planning loop exited — so for a tight
max_turns with a multi-step Norris session the model saw the FULL
conversation throughout, defeating context budgeting and preventing
R-C3 (NORRIS suffix goal anchor surviving eviction) from being
exercised end-to-end.

Move status_evictions(ctx:enforce_budget()) inside the while loop so
it runs after every safety.norris_step return. Drop the now-redundant
post-loop call.
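
A minimal sketch of the per-step placement, for illustration only. The
ctx table, its max_turns field, and run_loop are hypothetical stand-ins,
not aish's real Context or Norris driver; only the position of
enforce_budget() inside the loop mirrors the change described above.

```lua
-- Hypothetical stand-in for the context object; not aish's real Context.
local function new_ctx(max_turns)
  local ctx = { turns = {}, max_turns = max_turns, peak = 0 }
  function ctx:append(turn)
    self.turns[#self.turns + 1] = turn
  end
  -- Sliding-window eviction: drop the oldest turns beyond max_turns.
  function ctx:enforce_budget()
    local evicted = 0
    while #self.turns > self.max_turns do
      table.remove(self.turns, 1)
      evicted = evicted + 1
    end
    return evicted
  end
  return ctx
end

-- Runs `steps` planning steps, enforcing the budget after each one.
local function run_loop(ctx, steps)
  local total_evicted = 0
  for i = 1, steps do
    ctx:append({ role = "assistant", content = "step " .. i })
    -- Per-step enforcement: the next step already sees the trimmed window.
    total_evicted = total_evicted + ctx:enforce_budget()
    -- Track the largest window a following step could have observed.
    if #ctx.turns > ctx.peak then ctx.peak = #ctx.turns end
  end
  return total_evicted
end

local ctx = new_ctx(4)
local evicted = run_loop(ctx, 10)
print(evicted, #ctx.turns, ctx.peak)
```

With a single post-loop call the window would instead grow to all 10
turns before being trimmed, which is the behavior this commit removes.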

Surfaced during TC #38 (Qwen3-30B-A3B, max_turns=4) where the
"oldest 4 turns evicted" status arrived AFTER NORRIS DONE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:05:34 +00:00
marfrit 0c93e31186 repl: warn on stale MCP auto_approve keys (closes #33)
Auto-approve policy keys that point at unconnected aliases, mistyped
tool names, or malformed forms were silently ignored — leaving the user
with surprise confirm prompts and no diagnostic.

validate_auto_approve() now walks config.mcp.auto_approve at startup
(after the MCP connect loop) and after each :mcp connect. For each key:

  - "alias__*"       — warn if alias has no live session
  - "alias__tool"    — warn if alias unknown OR tool not in registry
  - anything else    — warn as malformed (not in alias__tool form)

Non-fatal. The re-run on :mcp connect lets a key that referenced a
not-yet-connected alias become live without a restart.
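
The three cases above could be sketched roughly as follows. The shape
of `sessions` (alias mapped to a table with a `tools` set) and the
returned list of warning strings are assumptions made for this example,
not the data structures repl.lua actually uses.

```lua
-- Hypothetical shapes: `sessions` maps alias -> { tools = { [name] = true } }.
local function validate_auto_approve(keys, sessions)
  local warnings = {}
  local function warn(key, msg)
    warnings[#warnings + 1] = key .. ": " .. msg
  end
  for _, key in ipairs(keys) do
    -- Split at the leftmost "__" (the first capture is non-greedy).
    local alias, tool = key:match("^(.-)__(.+)$")
    if not alias or alias == "" then
      warn(key, "malformed (not in alias__tool form)")
    elseif not sessions[alias] then
      warn(key, "alias has no live session")
    elseif tool ~= "*" and not sessions[alias].tools[tool] then
      warn(key, "tool not in registry")
    end
  end
  return warnings
end

local sessions = { boltzmann = { tools = { list_dir = true } } }
local keys = {
  "boltzmann__*", "boltzmann__list_dir",   -- both fine
  "boltzmann__nope", "ghost__*", "plain",  -- one warning each
}
for _, w in ipairs(validate_auto_approve(keys, sessions)) do
  print("[aish] auto_approve " .. w)
end
```

Because the function only reads its inputs, re-running it after a later
:mcp connect is cheap and picks up aliases that have since become live.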

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:05:08 +00:00
marfrit 299dcce78f repl: validate MCP tool names against Bedrock regex (closes #32)
Anthropic-via-Bedrock enforces ^[a-zA-Z0-9_-]{1,128}$ on tool names.
We already moved the alias separator from "." to "__" (commit f26cbd9),
but a future MCP server could still register a tool whose name (or whose
combination with the alias) contains characters outside that class —
silently breaking calls to strict providers.

connect_mcp now warns at startup for:
  - aliases containing "__" (would misparse on tool dispatch)
  - emitted alias__name strings that violate the regex or exceed 128 chars

Behavior preserved: validation is informative-only. tools_schema() still
emits the offending tool; local llama.cpp users accept lenient names
and shouldn't be penalized for downstream strictness.
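
A sketch of how such a check might look in Lua. Lua patterns have no
{1,128} repetition quantifier, so the length bound is tested separately
from the character class. The helper names is_strict_tool_name and
check_tool are hypothetical; the commit does not show the code inside
connect_mcp.

```lua
-- True when `name` satisfies ^[a-zA-Z0-9_-]{1,128}$.
-- The length is checked by hand because Lua patterns lack {m,n}.
local function is_strict_tool_name(name)
  return #name >= 1 and #name <= 128
     and name:find("^[a-zA-Z0-9_%-]+$") ~= nil
end

-- Returns the emitted name plus a list of informative warnings.
-- The tool is still emitted either way, matching "informative-only".
local function check_tool(alias, tool)
  local warnings = {}
  if alias:find("__", 1, true) then
    warnings[#warnings + 1] = "alias contains '__': " .. alias
  end
  local emitted = alias .. "__" .. tool
  if not is_strict_tool_name(emitted) then
    warnings[#warnings + 1] = "name violates strict-provider regex: " .. emitted
  end
  return emitted, warnings
end

local name, warnings = check_tool("boltzmann", "list.dir")
print(name, #warnings)
```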

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:04:29 +00:00
marfrit 8e0e735e15 repl: fallback patterns — add 'Could not connect to server' (CURLE_COULDNT_CONNECT)
Surfaced by autonomous run of TC #48: pointing models.fast at
http://localhost:9999 (port closed, host resolves) emits
"transport: Could not connect to server" — CURLE_COULDNT_CONNECT
(7) which the Phase 5 fallback pattern set didn't include.

Added "Could not connect to server" to FALLBACK_PATTERNS in repl.lua.
Now fallback fires for the full set of common libcurl/HTTP transport
failure shapes:

  HTTP 5xx              server-side
  HTTP 404 model_not_found
  HTTP 408              request timeout
  Couldn't resolve host CURLE_COULDNT_RESOLVE_HOST
  Could not connect to server   CURLE_COULDNT_CONNECT  (← added)
  Connection refused
  Timeout was reached   CURLE_OPERATION_TIMEDOUT (variant A)
  Operation timed out   CURLE_OPERATION_TIMEDOUT (variant B)

Re-tested #48 end-to-end:
  fast pointed at dead port → fast fails → status fires →
  cloud (anthropic/claude-haiku-4.5 via openrouter) responds normally
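
For illustration, matching against the signature list could be sketched
like this. Storing the signatures as plain substrings, and handling the
HTTP status cases by parsing the code, are assumptions of the sketch;
the commit only states that the string was added to FALLBACK_PATTERNS
and that the "transport: " prefix is the error's leading tag.

```lua
-- Plain-substring signatures taken from the list above. Whether aish keeps
-- them as plain strings or Lua patterns is not stated; this is a guess.
local FALLBACK_PATTERNS = {
  "Couldn't resolve host",
  "Could not connect to server",
  "Connection refused",
  "Timeout was reached",
  "Operation timed out",
}

-- Returns a short reason string when `err` looks like a transport failure
-- worth retrying elsewhere, or nil otherwise.
local function fallback_reason(err)
  if type(err) ~= "string" then return nil end
  local body = err:match("^transport: (.*)$")
  if not body then return nil end
  -- HTTP statuses: any 5xx, 408, or 404 carrying model_not_found.
  local code = body:match("^HTTP (%d%d%d)")
  code = code and tonumber(code)
  if code then
    if code >= 500 and code <= 599 then return "HTTP " .. code end
    if code == 408 then return "HTTP 408" end
    if code == 404 and body:find("model_not_found", 1, true) then
      return "HTTP 404 model_not_found"
    end
    return nil
  end
  for _, sig in ipairs(FALLBACK_PATTERNS) do
    -- Plain find (4th argument true): no pattern magic in the signatures.
    if body:find(sig, 1, true) then return sig end
  end
  return nil
end

print(fallback_reason("transport: Could not connect to server"))
```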

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:49:13 +00:00
marfrit 40ea0b49b0 repl: routing + fallback + summarize_fn wiring (Phase 5 commit #3)
Phase 5 commit #3 per docs/PHASE5.md §3 / §11. Wires the Phase 5
machinery into the REPL.

make_summarize_fn():
  Returns a closure that maps (prior_summary, evicted_turns) onto a
  broker.chat call against cfg.context.summarizer_model (default
  "fast"). Three dispatch paths matching the R-B1 callback contract:
    evicted == nil      → compress signal
    prior present       → additive ("extend the prior summary ...")
    prior nil           → first-time ("summarize the following turns")
  All use a system prompt enforcing "exactly one short paragraph",
  max_tokens=300, timeout_ms=30000. Broker failure returns nil so
  Context falls back to silent eviction. Renderer status is logged
  on failure for visibility.
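
The three dispatch paths can be pictured with the sketch below. The
`chat` argument stands in for broker.chat with a simplified
(messages, opts) signature, and the prompt wording is invented for the
example; only the branching on evicted/prior and the max_tokens and
timeout_ms values come from the description above.

```lua
local SYSTEM = "Reply with exactly one short paragraph."

-- `chat` is a hypothetical stand-in for broker.chat: it takes
-- (messages, opts) and returns the reply text, or nil on failure.
local function make_summarize_fn(chat)
  return function(prior_summary, evicted_turns)
    local prompt
    if evicted_turns == nil then
      -- Compress signal: shrink the summary that already exists.
      prompt = "Compress this summary:\n" .. (prior_summary or "")
    else
      local lines = {}
      for _, t in ipairs(evicted_turns) do
        lines[#lines + 1] = t.role .. ": " .. t.content
      end
      local transcript = table.concat(lines, "\n")
      if prior_summary then
        -- Additive path: fold the newly evicted turns into the prior text.
        prompt = "Extend the prior summary:\n" .. prior_summary ..
                 "\n\nNew turns:\n" .. transcript
      else
        -- First-time path: nothing to extend yet.
        prompt = "Summarize the following turns:\n" .. transcript
      end
    end
    local msgs = {
      { role = "system", content = SYSTEM },
      { role = "user", content = prompt },
    }
    -- A nil return lets the caller fall back to silent eviction.
    return chat(msgs, { max_tokens = 300, timeout_ms = 30000 })
  end
end
```

Injecting `chat` keeps the closure testable without a live broker, which
is a choice of this sketch rather than something the commit specifies.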

Context construction:
  Build ctx_opts as a fresh table (copies config.context to avoid
  mutating it), adds summarize_fn ONLY when
  config.context.summarize_on_evict == true. Defaults stay OFF —
  Phase 4 regression coverage.

Fallback machinery:
  - FALLBACK_PATTERNS table with 7 transport-error signatures
    (HTTP 5xx, 408, 404-model_not_found, DNS, connection refused,
    "Timeout was reached", "Operation timed out")
  - fallback_reason(err) strips the "transport: " prefix and matches.
  - should_fallback(err) gates on cfg.routing.fallback.
  - call_broker(cfg, name, msgs, on_delta, opts) wraps
    broker.chat_stream:
      • tracks any_delta via wrapped on_delta callback
      • retries ONCE against cfg.routing.fallback_model (default
        "cloud") when err matches AND no deltas arrived (N3:
        mid-stream failures aren't retried — partial text would
        duplicate)
      • emits "[aish] local <name> failed (<reason>); retrying via
        <fb>" status before the retry call

ask_ai routing:
  - Routing decision taken ONCE on entry (R-C2). req_name/req_cfg
    locals carry the choice through every tool-sub-loop iteration.
  - active_name/active_cfg are NOT mutated — user's :model selection
    survives the request.
  - When config.routing.auto is true, classify_model(text, config) is
    invoked. Non-nil model + non-active → swap req_cfg + status line.
  - broker.chat_stream call replaced with call_broker (fallback wrap).
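
The retry-once rule of the call_broker wrapper might be sketched as
below. The signature here is simplified and hypothetical: the streaming
function and the fallback gate are passed in so the example runs
without a broker, whereas the real wrapper takes
(cfg, name, msgs, on_delta, opts).

```lua
-- `chat_stream(name, msgs, on_delta)` and `should_fallback(err)` are injected
-- stand-ins for broker.chat_stream and the cfg.routing.fallback gate.
local function call_broker(chat_stream, should_fallback, name, fb_name, msgs, on_delta)
  local any_delta = false
  -- Wrap the callback so we know whether anything reached the user.
  local function tracked(kind, payload)
    any_delta = true
    return on_delta(kind, payload)
  end
  local ok, err = chat_stream(name, msgs, tracked)
  if ok then return ok end
  -- Retry once, and only if nothing was streamed yet: retrying after
  -- partial output would duplicate text already shown.
  if not any_delta and should_fallback(err) then
    return chat_stream(fb_name, msgs, tracked)
  end
  return nil, err
end

-- Demo: the local model refuses the connection, the fallback answers.
local function demo_stream(name, _, cb)
  if name == "fast" then return nil, "transport: Connection refused" end
  cb("text", "hello from " .. name)
  return true
end
print(call_broker(demo_stream, function() return true end,
                  "fast", "cloud", {}, function(_, p) print(p) end))
```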

Meta cmds:
  :route on/off           — toggle cfg.routing.auto at runtime
  :route classes          — show class → model mapping
  :route check <text>     — report classify_model result with
                            "(routing currently disabled)" suffix when
                            auto is off (N1)
  :fallback on/off        — toggle cfg.routing.fallback at runtime

HELP updated with the four new commands.

Smoke-tested: aish boots, all four metas behave correctly, classify_model
returns reasoning class for "Explain how MMAP works on Linux" (the model
slot is nil because no classes are configured by default — N2 cost-safety).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:31:14 +00:00
marfrit f22d21d754 repl: :memory summarize — LLM candidate extraction (Phase 4 commit #4)
Phase 4 commit #4 per docs/PHASE4.md §6.

:memory summarize:
  1. Source-of-truth: session log file via history.load(session_path),
     NOT ctx:to_messages() (R-C2). Skips turns tagged meta="summarize"
     so prior summarize exchanges don't self-amplify across multiple
     calls within the same session.
  2. Pick summarizer model from cfg.memory.summarizer_model (default
     active model).
  3. Build a transcript string ("role: content" per turn, 800 chars max
     per turn) and feed it as a single user turn alongside a system
     instruction asking for "(fact|pref|context): <content>" lines.
  4. broker.chat with max_tokens=1024 + timeout_ms=90000 (the deep
     model can take a while; we don't want a 15s probe-cap here).
  5. Log the response as an assistant turn with meta="summarize" so the
     next :memory summarize call filters it out.
  6. Parse response lines tolerating markdown bullets and bold markup:
     ^%s*[-*]?%s*[*_]*(fact|pref|context)[*_]*:%s*(.+)$
  7. Per-candidate prompt: y / N / edit.
       y    → memory:add(kind, content)
       edit → readline prompt for replacement text
       any other → drop
  8. status: "summarize: added N / M candidates".
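
A sketch of the line parsing in step 6. Stock Lua patterns have no "|"
alternation, so this version captures any leading word and checks it
against a set of allowed kinds; whether aish does it this way is not
stated in the commit, and parse_candidate is a hypothetical name.

```lua
local KINDS = { fact = true, pref = true, context = true }

-- Returns kind, content for a candidate line, or nil when it does not match.
local function parse_candidate(line)
  -- optional bullet, optional bold/italic markers, a word, then ": content"
  local kind, content = line:match("^%s*[-*]?%s*[*_]*(%a+)[*_]*:%s*(.+)$")
  if kind and KINDS[kind] then
    return kind, content
  end
  return nil
end

for _, line in ipairs({
  "fact: the build host is arm64",
  "- **pref**: answers in plain text",
  "note: not a recognised kind",
}) do
  print(parse_candidate(line))
end
```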

Live-tested against hossenfelder/fast:
  Pipeline correct end-to-end. Model emitted one candidate; user
  confirmation prompt fired; item persisted; :memory list showed it.
  Candidate quality from the 1.5B model is poor — typical
  small-model behavior; deep/cloud models would do better but this
  isn't an aish bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 07:53:36 +00:00
marfrit 3b074afaee repl: memory handle + :remember + :memory meta (Phase 4 commit #3)
Phase 4 commit #3 per docs/PHASE4.md §12. End-to-end memory wiring.

Startup:
  - Opens memory handle at <history.dir>/memory.jsonl via
    history.open_memory(). Status-logs failure (e.g. flock held by
    another aish) and continues without memory.
  - inject_memory(): loads via history.load_memory(), truncates by
    cfg.memory.inject_max_chars (default 2000), populates
    ctx.memory_items. Status line announces N items injected.
  - shutdown_session() now also closes memory (releases flock).

Meta commands:
  :remember <text>       — shortcut for :memory add fact <text>;
                            auto-refreshes ctx.memory_items so the
                            next AI turn sees the new item without
                            restart
  :memory list           — show id / ts / kind / content (truncated
                            at 80 chars per line)
  :memory add <kind> <t> — fact|pref|context required; rejects other
                            kinds
  :memory forget <id>    — N1: checks active-set first, surfaces
                            "id N not active (already forgotten or
                            never existed)" without appending if
                            the id isn't live
  :memory clear          — [y/N] confirm prompt; tombstones every
                            active item
  :memory inject         — N4: reload memory.jsonl into
                            ctx.memory_items, replacing existing.
                            Useful after manual file edits.

Help block extended with the new commands.

End-to-end verified:
  Boot 1 → :remember×2 + :memory add → 3 items, :memory list shows all
           three with timestamps
  Boot 2 → memory: 3 items injected (startup status); :memory list
           same three; ctx.turns empty (history is sessions/, memory
           is separate)
  Boot 3 → :memory forget 2 succeeds; :memory forget 99 → "not
           active" status without writing a tombstone; :memory list
           shows 2 items; :memory clear → confirm prompt → "cleared 2
           items"; :memory list → "(no memory items)"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 05:11:48 +00:00
marfrit a404b2a152 repl: Norris driver + \C-n + :norris/:safety meta (Phase 3 commit #5)
Phase 3 commit #5 per docs/PHASE3.md §12. Wires safety.norris_step
(commit #4) into the REPL with the user-facing surface.

ffi/readline.lua extensions (A1 + R-C4):
  - rl_insert_text + rl_redisplay added to ffi.cdef block; M.insert_text
    and M.redisplay wrappers exposed.
  - M.bind: removed `:free()` on previous callback. Now keeps every
    bound callback pinned for process lifetime in `_pinned` list
    (alongside `_bound[seq]` for current lookup). Avoids the
    use-after-free window between unbind and rebind that R-C4 flagged.
    Memory cost is bounded — one closure per key sequence binding.

context.lua Norris suffix (R-C3 / §8):
  - to_messages() composes a dynamic NORRIS MODE block onto the
    system prompt when ctx.norris_active is set. The block carries
    ctx.norris_goal so eviction of the user's "[norris] goal:" turn
    doesn't lose the anchor. Returns to plain system prompt when
    Norris exits.

repl.lua Norris driver:
  - prompt() now shows  marker when ctx.norris_active per PHASE0.md §9.
  - \C-n bound to a real handler — inserts ":norris " at the cursor
    (replaces Phase 1 status placeholder).
  - run_norris(goal) function: sets norris_active + norris_goal,
    appends a "[norris] <goal>" user turn, renders the banner, then
    loops calling safety.norris_step with an injected helpers table
    until a terminal status returns. Renders the closing banner.
  - norris_halt(): the [N] proceed/skip/abort prompt called by
    safety.norris_step via helpers.halt. Empty input → abort (safe).
  - dispatch_tool(): factored from the Phase 2 ask_ai code so
    safety.norris_step can call it.
  - norris_exec(): factored exec path for autonomous mode (skips
    the interactive run_shell cd-status renderer).
  - :norris <goal>  meta — launches autonomous mode
  - :norris off     meta — drops Norris flag (rare; usually 'abort')
  - :safety patterns meta — lists active is_destructive rules
  - :safety check <cmd> meta — probes a hypothetical command

End-to-end mock-driven test:
  Submitted ":norris find files in /tmp" → banner → step 1 emits
  tool_call (auto_approved per policy) → dispatched → frame rendered
  → step 2 emits "GOAL: complete" → sub-loop exits → DONE banner.
  2 broker invocations, no stalls.

config.lua safety example block lands in commit #6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 23:42:14 +00:00
marfrit f26cbd9a3a phase2 amend: __ separator (Bedrock-safe) + post_sse error diagnostics
Phase 7 verify finding from TC #26 against :model cloud:
  HTTP 400 from openrouter→Amazon Bedrock:
  "tools.0.custom.name: String should match pattern
   '^[a-zA-Z0-9_-]{1,128}$'"

Anthropic via Bedrock validates tool names against that regex and
rejects dots. PHASE2 originally chose "." as the namespace separator
("boltzmann.list_dir"); OpenAI tolerated it, Bedrock does not.

Separator switched to "__" (two underscores) everywhere — internal
API matches on-wire shape, no transformation layer:

  - repl.lua:
    - tools_schema builds "alias__name"
    - dispatch_tool_call splits via "^(.-)__(.+)$" (non-greedy → leftmost __)
    - :mcp tool parser uses same split
    - :mcp tools formatter prints "alias__name"
    - HELP block shows <alias__name>
  - safety.lua confirm_tool_call: alias.* glob → alias__* glob
  - config.lua example block: keys rewritten
  - docs/PHASE2.md: amendment header added; §1, §2 row, §3 config.lua
    row, §5 wire-shape JSON examples, §6 auto_approve schema, §7
    meta-cmd table, §12 plan all updated. Original "." references
    preserved in commit history.

Constraint: aliases must not themselves contain "__" so the parse
stays unambiguous. Tool names from MCP servers may have underscores
freely.
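
The split pattern quoted above behaves as follows. split_tool_name is a
hypothetical wrapper name used only for this illustration; the pattern
itself is the one listed for dispatch_tool_call.

```lua
-- Splits "alias__name" at the leftmost "__". Returns nil when there is no
-- alias prefix, so callers can report a parse failure separately.
local function split_tool_name(full)
  -- "(.-)" is non-greedy, so the alias ends at the first "__" found.
  local alias, tool = full:match("^(.-)__(.+)$")
  if not alias or alias == "" then return nil end
  return alias, tool
end

print(split_tool_name("boltzmann__read_file"))
print(split_tool_name("srv__get__thing"))
print(split_tool_name("list_dir"))
```

The second call shows why aliases must not contain "__": everything
after the first separator is treated as the tool name, so a tool name
may contain "__" but an alias may not.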

Second fix bundled — uninformative broker error:
  Previously "broker error: transport: HTTP response code said error"
  Now      "broker error: transport: HTTP 400: {full body snippet}"

ffi/curl.lua M.post_sse changes:
  - FAILONERROR no longer set (was hiding the response body).
  - raw_body accumulator added alongside the SSE buffer; captures
    every byte regardless of SSE shape.
  - After perform, check status_code via curl_easy_getinfo. On >=400,
    return (nil, "HTTP <code>: <body[:400]>"). 2xx unchanged.
  - End-of-stream SSE flush only runs on 2xx (no false event on
    error bodies that aren't SSE-shaped).
  - Phase 1 callers reading just first return slot stay correct.

End-to-end verified:
  - :model cloud + tools=[boltzmann__read_file ...] +
    "Use boltzmann__read_file with path=/etc/hostname" →
    Claude emits tool_call with name="boltzmann__read_file",
    args='{"path": "/etc/hostname"}'. ok=true, transport clean.
  - Force-bad tool name "bad.name.with.dots" → err string carries
    the full bedrock 400 with the regex-pattern message visible.

TC #26 (sub-loop end-to-end) is now testable against cloud — the
error that blocked it is resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 20:04:57 +00:00
marfrit 3fa6279f5b repl: :mcp tool — disambiguate "no alias" vs "unknown alias" errors
Surfaced by Phase 7 verify test case #29: typing :mcp tool list_dir (no
dot) printed "unknown alias: nil" instead of a useful diagnostic. The
parse failure was being conflated with the alias-not-found case.

Now:
  :mcp tool list_dir          -> tool name missing alias prefix: list_dir
  :mcp tool unknown_alias.x   -> unknown alias: unknown_alias
  :mcp tool known_alias.bogus -> unknown tool: known_alias.bogus

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 18:55:01 +00:00
marfrit 7e9cfff04d repl: tool-call sub-loop + :mcp meta + system-prompt augmentation
Phase 2 commit #6 per docs/PHASE2.md §12. End-to-end wiring of the MCP
tool-call flow on top of broker/safety/context/renderer/mcp.

repl.lua additions:
  - mcp_sessions table populated from config.mcp.servers at startup.
    connect_mcp() helper does initialize + caches tools/list. Failures
    status-logged once; absent from mcp_sessions until manual reconnect
    (C4 — no auto-retry).
  - tools_schema() flattens connected sessions' tools into the OpenAI
    {type:"function", function:{name,description,parameters}} shape with
    "<alias>.<name>" namespacing.
  - flatten_content() concatenates content[type="text"] blocks; one-shot
    status warning when non-text blocks (image/resource) are dropped
    (§4 normative spec, v1 only handles text).
  - dispatch_tool_call(name, args_table) splits alias.tool, looks up
    session, calls. Returns (content_string, is_error). Errors of every
    flavor (missing alias, no session, rpc_error, transport_error)
    yield a synthesized "[aish] ..." string so callers always have a
    body for the role:"tool" turn — alternation preserved per C5/C7.
  - ask_ai rewritten as a sub-loop that re-issues the broker request
    until the model returns pure text or max_tool_depth (default 8) is
    hit. Each iteration: stream response → if tool_calls present,
    confirm-gate each → dispatch → append role:"tool" turn → continue.
    Argument-JSON parse failure produces a synthesized tool turn (C7).
    Decline at confirm produces "[aish] tool call declined by user"
    tool turn (alternation guarantee).
  - :mcp meta with sub-commands: list / tools / tool <a.n> / connect
    <url> [alias] / disconnect <alias>. HELP block extended.

context.lua: DEFAULT_SYSTEM_PROMPT grows by ~4 lines per PHASE2.md §8
(hybrid prompt: static frame about MCP + dynamic tools list in the
request body). Block is always present even when no MCP servers
configured — ~60 tokens for clarity that 'CMD:' remains the fallback.

CMD: extraction unchanged — runs on the FINAL pure-text response only
(not on intermediate iterations of the tool sub-loop). Substrate §3
invariant preserved.

End-to-end verified two ways:
  (1) Direct broker probe: aish's tools_schema fed through
      broker.chat_stream against hossenfelder → qwen-1.5b emits one
      tool_call payload with correct id + name="boltzmann.list_dir"
      + args='{"path":"/tmp"}'. Accumulator stitched the JSON-string
      across fragmented deltas.
  (2) Mocked-broker sub-loop test: ask_ai feeds 'list /tmp', mock
      emits text + tool_call, sub-loop dispatches against LIVE
      boltzmann lmcp (auto_approve via policy), 80+ files rendered
      inside the tool_call frame, broker re-invoked with the
      extended context, mock returns pure text, sub-loop terminates.
      Total broker invocations: 2.

Known: the loaded fast model (qwen-1.5b) tends to emit "CMD: ..."
suggestions even when an MCP tool is the better path; the small
model's system-prompt compliance is weak. Larger models and the
analyze-time direct probe confirm the tools_schema and tool_calls
flow is wire-correct — Phase 7 verify will exercise this against
qwen3-30b or cloud models when available.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 15:20:42 +00:00
marfrit efdc7281c7 broker: opts.tools passthrough + streaming tool_call accumulator
Phase 2 commit #5 per docs/PHASE2.md §12. Streaming broker grows
tool-call support without taking a dependency on mcp.lua (caller
supplies the tools array — B5 from review).

chat_stream signature widens to (cfg, msgs, on_delta, opts):
  opts.tools  - optional array, passed to the request body as the
                OpenAI-shape tools field. OMITTED entirely when nil or
                empty (#tools == 0) — some servers reject "tools": [].

on_delta callback shape widens to (kind, payload):
  kind = "text",      payload = string         (Phase 1 path; unchanged
                                                semantics, signature
                                                changes from (delta) to
                                                ("text", delta))
  kind = "tool_call", payload = {id, name, arguments}
                                                emitted ONCE per call on
                                                finish_reason "tool_calls"
                                                after the streaming
                                                accumulator pulls
                                                fragmented JSON-string
                                                arguments together.

Accumulator behavior:
  - Keyed by delta.tool_calls[i].index.
  - If index is absent on a delta (some llama.cpp builds omit it on
    single-call streams; C2 in review), default to 0 with a one-shot
    stderr debug status per stream.
  - id and name captured from the opening delta of each slot.
  - function.arguments concatenated across all deltas as the raw
    JSON-string; caller (repl.lua / future Phase 2 commit #6) does
    dkjson.decode.
  - On finish_reason "tool_calls" the accumulator emits all collected
    calls in index order and resets.
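
The accumulator rules above could be sketched as follows. The
feed/finish interface and new_accumulator are hypothetical names chosen
for the example; the field names (index, id, function.name,
function.arguments) follow the OpenAI-style streaming delta shape the
commit refers to. Note that `function` is a reserved word in Lua, so
the field is accessed as tc["function"].

```lua
local function new_accumulator()
  local slots = {}   -- index -> { id, name, parts }
  local acc = {}

  -- Feed one delta.tool_calls array from a streamed chunk.
  function acc.feed(tool_calls)
    for _, tc in ipairs(tool_calls) do
      local idx = tc.index or 0   -- default when the server omits index
      local fn = tc["function"]
      local slot = slots[idx]
      if not slot then
        -- id and name arrive on the opening delta of each slot.
        slot = { id = tc.id, name = fn and fn.name, parts = {} }
        slots[idx] = slot
      end
      if fn and fn.arguments then
        slot.parts[#slot.parts + 1] = fn.arguments
      end
    end
  end

  -- Call on finish_reason "tool_calls": returns calls in index order
  -- and resets the accumulator for the next response.
  function acc.finish()
    local indices = {}
    for idx in pairs(slots) do indices[#indices + 1] = idx end
    table.sort(indices)
    local calls = {}
    for _, idx in ipairs(indices) do
      local s = slots[idx]
      calls[#calls + 1] = {
        id = s.id, name = s.name,
        arguments = table.concat(s.parts),   -- raw JSON string, not decoded
      }
    end
    slots = {}
    return calls
  end

  return acc
end
```

Decoding the concatenated arguments string is left to the caller, as
the commit describes.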

M.chat external contract unchanged (C1): wrapper now uses the new
(kind, payload) shape internally but exposes the same text-string
return. No caller of M.chat passes opts.tools so tool_call kinds are
silently dropped.

repl.lua minimal companion edit: ask_ai's chat_stream callback updated
to the new shape. Text path unchanged; tool_call kinds are no-op
placeholders until commit #6 lands the sub-loop. Keeps Phase 1 streaming
functional between #5 and #6.

Smoke-tested against hossenfelder/8082 (post-#23 fix):
  - text-only: ok=true, kind="text" deltas received
  - with opts.tools: model emitted one tool_call,
    accumulator collected id + name=get_weather + args={"city":"Paris"}
    correctly across fragmented deltas
  - opts.tools={}: server accepted (field omitted as required)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 14:20:32 +00:00
marfrit 7d62eb5659 review followups: pcall shield, :resume guard, shell quoting, nits
CONCERNs from the Phase 1 review pass:

ffi/curl.lua:
  - SSE write_cb body is now pcall-wrapped. A Lua error in on_event (or
    in the parse loop itself) is captured into cb_error and surfaced
    after curl_easy_perform rather than propagating across the FFI
    callback boundary (which LuaJIT documents as process-fatal). The
    EOS flush path gets the same shield. Errors return
    (nil, "callback: <msg>") from post_sse.

history.lua:
  - sh_singlequote() escapes shell metacharacters; the mkdir -p and
    ls -1 shell-outs no longer double-quote (where $(...) and $VAR
    still expand) — single-quote with embedded-' escaping is the
    safe form.
  - M.load now returns (turns, meta) instead of (meta, turns). turns
    is ALWAYS a table on success, never nil-when-no-header; failure
    path is the unambiguous (nil, err). Callers can `if not turns
    then` without the previous ambiguity. repl.lua :resume updated
    to the new shape.
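
For reference, single-quote escaping of the kind sh_singlequote() is
described as doing is usually written as below. This is the common
POSIX-shell idiom and a sketch only; the actual body in history.lua is
not shown in the commit.

```lua
-- Wrap `s` in single quotes for a POSIX shell. Inside single quotes nothing
-- expands, so only the quote character itself needs handling: close the
-- quoted string, emit an escaped ', and reopen ('  becomes  '\'').
local function sh_singlequote(s)
  -- Parentheses keep only gsub's first result (it also returns a count).
  return "'" .. (s:gsub("'", "'\\''")) .. "'"
end

print(sh_singlequote("/tmp/$HOME dir"))
print(sh_singlequote("it's"))
```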

repl.lua :resume:
  - Refuse to resume into a non-empty ctx — silent overwrite was the
    Q15 default, but the review surfaced the no-undo / no-warning
    failure mode. User must :reset (or :save then re-launch) to
    express intent. The current session's on-disk log is unaffected
    either way.

NITs:
  - ffi/libc.lua READ_BUF: comment noting it's module-shared and
    Phase 1 has no reentrant readers; revisit when that changes.
  - PHASE1.md §7: \C-x\C-c reservation pinned to Phase 3 ("deferred
    from Phase 1 — no consumer here") rather than the previous
    dangling "(or here)".

Regression suite verifies:
  - history.load new signature on success + failure paths
  - shell-quoted history.dir with $ doesn't trip
  - aish scripted run: ctx with 2 turns refuses :resume anchor with
    a clear status; user must :reset first

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:05:23 +00:00
marfrit 1f1065157e review BLOCKER: PTY input forwarding + raw mode toggle
Phase 1 review caught a structural gap: executor.exec only drained the
PTY master fd, never forwarded user keystrokes — vim/less/htop/nano
would render and hang on input. PHASE1.md §5 specified bidirectional
multiplex but only the read leg landed. tcgetattr/tcsetattr were also
missing, so even with input forwarding the parent's line discipline
would buffer until newline (breaking single-key UIs).

ffi/libc:
  - struct termios opaque buffer + tcgetattr/tcsetattr + cfmakeraw
  - M.set_raw(fd) saves termios + applies cfmakeraw; returns saved or
    (nil, err) when fd isn't a tty (scripted / piped-stdin runs)
  - M.restore_termios(fd, saved)
  - struct pollfd + M.poll (POLLIN constant)

executor:
  - multiplex(sess): poll(stdin, master); reads master on any revents
    (POLLHUP fires when child closes its slave end, not POLLIN — the
    revents != 0 check catches both); forwards stdin keystrokes to
    master; loop exits when master read returns 0 (EOF / child gone)
  - stdin polling is only enabled when stdin_is_tty (set_raw succeeded);
    piped-stdin runs (tests / scripted) would otherwise drain queued
    aish commands into the child of the *current* cmd, swallowing them
  - raw mode is restored before returning so the user lands back at the
    aish prompt in canonical mode

renderer + repl:
  - exec_output(out, code) split into exec_begin() (top rule, before
    spawn) + exec_end(code) (closing rule with exit, after wait). PTY
    multiplex streams the body live to stdout in between; the renderer
    never re-prints the body.

PHASE1.md §3:
  - tcgetattr/tcsetattr changed from "optional" to "required for
    single-key UIs to work — done-criteria #2"; poll added to the libc
    row description.

Verified:
  - non-interactive smoke (echo / false / exit 7 / ls /nonexistent /
    printf multi-line) — all exit codes correct, output streamed live,
    a\nb\nc\n preserved byte-for-byte
  - scripted-stdin run reaches all expected lines (no stdin draining
    into a non-interactive child)
  - aish prompt + framed exec block + exit-code line all render in
    correct order

Live interactive verification (vim / less / htop in a real terminal)
still needs a user-test pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:00:53 +00:00
marfrit a75118b2ae readline: bind() via rl_bind_keyseq; repl reserves \C-n no-op
Phase 1 readline binding wiring per PHASE1.md §7.

ffi/readline:
  M.bind(seq, lua_fn) -> bool
    Wraps lua_fn as a C callback (signature `int (int, int)` per
    readline's rl_command_func_t) and registers it via
    rl_bind_keyseq(seq, cb). Returns true on success (rl returns 0).
    Trampolines are pinned in module-local state so they outlive the
    bind call — readline retains the function pointer for the process
    lifetime. Rebinding the same seq frees the previous trampoline.
    Bound handlers are pcall-wrapped so a Lua error doesn't crash
    readline's input loop.

repl:
  Binds \C-n to a no-op that emits
    "[aish] Norris mode not yet implemented (Phase 3)"
  Verifies the mechanism end-to-end; Phase 3 (Norris autonomous mode)
  replaces the body with the actual toggle.

Smoke covers bind / rebind-same-seq (exercises the :free path) /
bind-different-seq with no errors. Live keyboard verification waits
on user-test.

Phase 1's 8(+1) inner loop is now functionally through `implement`;
next inner phase is `verify` (review pass) followed by memory-update.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 19:26:58 +00:00
marfrit 9d586870e8 repl: session persistence wiring — auto-log, :save, :resume, :sessions
Phase 1 session log integration per PHASE1.md §6.

On every M.run(), open a session file at
  <config.history.dir>/sessions/<utc-iso8601>.jsonl
with a meta header (started, model, aish_version). If history.dir is
unset or unwritable, status-log the disable and continue without
persistence.

ask_ai logs the merged user turn (after pending exec output is folded
in) and the assistant turn (after streaming completes). run_shell does
NOT log [exec output] — that becomes part of the next user turn when
ctx.pending_exec_output is flushed.

New meta commands:
  :sessions       list session files; "*" marks the active one
  :save <name>    rename current session log to <name>.jsonl (auto-
                  appends .jsonl); reopens for continued append
  :resume <name>  load <name>.jsonl into ctx (replaces current turns
                  via ctx:reset + append loop). The current process's
                  own session log is unaffected — Phase 1 chooses
                  per-process logs over chained continuations.

:quit and EOF (Ctrl-D) both close the session file via shutdown_session
before exiting.

HELP text updated (no longer "Phase 0:" header since meta set has
grown). Q15 noted in PHASE1.md §10 (resume into non-empty context) is
resolved by the ctx:reset() in :resume — silent overwrite for Phase 1,
revisit if anyone cares.

End-to-end live verified: chat -> auto-log; :save renames; :sessions
lists; :resume + :history show the round-trip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 19:23:05 +00:00
marfrit a722f576ac repl + renderer: streaming assistant output (Phase 1)
repl.ask_ai now drives broker.chat_stream and pumps each delta into
renderer.assistant_delta(delta) as it arrives. renderer.assistant_flush
is called when the stream ends to add a trailing newline if missing.
The full reassembled response is then handed to executor.extract_cmd_lines
for the CMD: confirm-and-execute path (unchanged from Phase 0).

renderer.assistant() is kept for non-streaming callers (none in tree
right now, but cheap to keep around). assistant_delta/flush share no
state with assistant(); they use a module-local stream_buf that tracks
the in-progress streamed block.

Q12 deferred: incremental CMD: highlighting (cursor-positioning re-
render on flush) is not implemented in Phase 1 — deltas emit raw. The
§6 CMD: marker is still extractable on the reassembled string post-
stream, which is what executor cares about. Renderer's bold+cyan
treatment for CMD: lines stays available via M.assistant().

Broker error / SSE-framed api-error path still pops the user turn and
restores ctx.pending_exec_output. Order: assistant_flush always runs
(even on error) so the cursor lands on a fresh line before the broker-
error status renders.

Live verification: `Count one to ten` against hossenfelder fast streams
deltas through to stdout incrementally; CMD: extraction works on the
reassembled string; confirm gate intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 19:17:27 +00:00
marfrit 16490e6905 fix: buffer exec output for next user turn; alternation for strict templates
User-test surfaced the bug: with `deep` (mistral-nemo-12b) active,
running `list files` -> y on `CMD: ls` -> `Are there directory entries
beginning with "lor"?` returned a Jinja exception:

    api: ... Error: Jinja Exception: After the optional system message,
    conversation roles must alternate user/assistant/user/assistant/...

Cause: §6 specified "exec output injected into context uses role 'user'
with a prefix tag '[exec output]'." This works for permissive templates
(qwen2.5-coder-1.5b, the `fast` preset) but produces a back-to-back
user/user pair on strict templates that enforce the OpenAI alternation
contract — `[exec output]` user turn followed by the user's actual
follow-up question.

Fix:

context.lua:
  - new field `pending_exec_output` (initially nil)
  - new method `:append_exec_output(out)` buffers (concat on subsequent
    captures so multi-shell-then-ai still merges everything)
  - new method `:append_user(content)` flushes buffered exec output as
    a `[exec output]\n...\n\n` prefix and appends a user turn
  - `:reset()` also clears the buffer

repl.lua:
  - run_shell calls ctx:append_exec_output(out) instead of
    ctx:append({role="user", content="[exec output]\n"..out})
  - ask_ai calls ctx:append_user(text) instead of raw :append; saves
    prev_pending so a broker error can restore the buffer for retry

PHASE0.md §6:
  - amended the role-injection paragraph to describe the buffer-and-
    prepend policy; the §3 invariants list is untouched (this was a §6
    design detail, not a locked invariant)
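
The buffer-and-prepend policy can be sketched as follows. This is an
illustration under assumptions, not the contents of context.lua: the
Context.new constructor, the `turns` field, and the "\n" separator used
when merging several captures are hypothetical; the method names and the
`[exec output]\n...\n\n` prefix come from the description above.

```lua
local Context = {}
Context.__index = Context

-- Hypothetical constructor; field names other than
-- pending_exec_output are assumptions.
function Context.new()
  return setmetatable({ turns = {}, pending_exec_output = nil }, Context)
end

function Context:append(turn)
  self.turns[#self.turns + 1] = turn
end

-- Buffer shell output instead of appending a user turn right away.
-- Later captures are concatenated (separator is an assumption).
function Context:append_exec_output(out)
  if self.pending_exec_output then
    self.pending_exec_output = self.pending_exec_output .. "\n" .. out
  else
    self.pending_exec_output = out
  end
end

-- Flush any buffered output as a prefix of the next user turn, so the
-- turn list never holds two consecutive user turns.
function Context:append_user(content)
  local pending = self.pending_exec_output
  if pending then
    content = "[exec output]\n" .. pending .. "\n\n" .. content
    self.pending_exec_output = nil
  end
  self:append({ role = "user", content = content })
end

function Context:reset()
  self.turns = {}
  self.pending_exec_output = nil
end
```

With this shape the failing sequence (user, assistant, shell output,
user) yields three turns with strictly alternating roles.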

Verification:
  - context unit tests cover: alternation after the failing sequence,
    multi-shell merge, reset clears buffer, broker-error retry path
  - live reproduction against `deep` (mistral-nemo) of the exact
    user-reported sequence succeeds; model responds with a sensible
    `CMD: ls | grep '^lor'` instead of a Jinja exception

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 18:41:21 +00:00
marfrit abc993aa49 review followup: empty-input guards, ~/ symmetry, CMD: filter
Addresses three concerns + one nit from the Phase 0 review pass.

executor.lua:
  - M.exec guards empty / whitespace-only cmd up front, returns
    "(empty command)" / -1 instead of running the wrapper on nothing.
  - On sentinel-parse failure with empty output (typical of shell
    parse errors — the syntax error itself escapes to the popen
    parent's stderr because 2>&1 is inside the unparsable subshell),
    surface "(no output — possible shell parse error)" rather than
    a silent empty frame.
  - extract_cmd_lines now skips whitespace-only / empty bodies; a
    bare `CMD: ` line in assistant output no longer turns into an
    "execute ''? [y/N]" prompt.
  - "what" comments cleaned in maybe_chdir.
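
A sketch of the empty-body filter for CMD: lines, assuming the marker
must start the line and that the body is whatever follows `CMD:` with
surrounding whitespace trimmed. The real extract_cmd_lines may accept
other forms; only the skip-empty behaviour is taken from this commit.

```lua
-- Hypothetical reimplementation: collect non-empty CMD: bodies.
local function extract_cmd_lines(response)
  local cmds = {}
  -- Append "\n" so the last line is matched even without a terminator.
  for line in (response .. "\n"):gmatch("(.-)\r?\n") do
    -- Capture the body after the marker, trimmed on both sides.
    local body = line:match("^CMD:%s*(.-)%s*$")
    if body and body ~= "" then
      cmds[#cmds + 1] = body
    end
  end
  return cmds
end
```

A bare `CMD: ` line produces an empty capture and is dropped, so no
confirm prompt is raised for it.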

router.lua:
  - path_like now matches `~` and `~/foo` so `~/scripts/build.sh`
    classifies as shell (was: ai). Restores symmetry with executor's
    maybe_chdir, which already expands `~` on `cd`.
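
One way the `~` handling could look, as a sketch only. The checks for
`/`, `./` and `../` are assumptions about what path_like already
matched; the two `~` cases are the ones this commit describes.

```lua
-- Hypothetical path_like: true when the first word looks like a path.
local function path_like(line)
  local first = line:match("^%s*(%S+)")
  if not first then return false end
  if first == "~" then return true end               -- bare home dir
  if first:sub(1, 2) == "~/" then return true end    -- ~/foo
  if first:sub(1, 1) == "/" then return true end     -- absolute (assumed)
  if first:sub(1, 2) == "./" then return true end    -- relative (assumed)
  if first:sub(1, 3) == "../" then return true end   -- parent (assumed)
  return false
end
```

Only the first word is inspected, so a question that merely mentions
`~` later in the line still classifies as ai.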

repl.lua:
  - :exec and :ask trim their args and, when the result is empty, print
    a usage line via renderer.status rather than running an empty cmd or
    sending an empty turn to broker.

Regression: full prior smoke suite still passes — known_commands
shell paths, all maybe_chdir branches, CMD: extraction with non-empty
bodies, exec exit-code recovery, all router branches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 17:41:35 +00:00
marfrit e0e69f839b repl: readline loop, dispatch, all Phase 0 meta commands
Phase 0 implementation per PHASE0.md §5, §9.

Wires the lower-half modules into a single REPL:
  ffi/readline -> input + history
  router       -> classify(line) -> meta/shell/ai
  executor     -> run_shell with cd interception, frame output, capture
  broker       -> ask_ai, then extract+confirm CMD: lines from response
  context      -> turn list + eviction; status line on evict
  renderer     -> assistant text + exec frame + status

Prompt format `[aish:<model>]> ` per §9.

Meta commands all wired (§5.2): :quit/:q, :clear, :reset, :model <name>,
:models, :history, :exec <cmd>, :ask <text>, :help. Unknown meta names
report via renderer.status rather than crashing.

End-of-input (Ctrl-D on empty line) breaks the loop cleanly. Empty /
whitespace-only lines are skipped silently before dispatch — router
would otherwise classify them as ai with empty payload and pollute
context.
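
The skip-then-dispatch order can be sketched like this. The allowlist
contents, the `handlers` table, and the classification rules are
hypothetical simplifications of router.lua; the point shown is that
blank input is filtered before classify ever sees it.

```lua
-- Hypothetical allowlist; the real known_commands lives elsewhere.
local known_commands = { ls = true, cd = true, pwd = true, git = true }

local function is_blank(line)
  return line:match("^%s*$") ~= nil
end

-- Simplified classifier: meta / shell / ai.
local function classify(line)
  if line:sub(1, 1) == ":" then return "meta" end
  local first = line:match("^%s*(%S+)")
  if first and known_commands[first] then return "shell" end
  return "ai"
end

-- Skip blank lines silently, otherwise route to the matching handler.
local function dispatch(line, handlers)
  if is_blank(line) then return "skipped" end
  local kind = classify(line)
  handlers[kind](line)
  return kind
end
```

Without the is_blank check, an empty line would fall through to the
"ai" branch in this sketch, which mirrors the pollution described above.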

`CMD: ` extraction + confirm-and-execute is wired: when broker returns
an assistant turn, the response is scanned for §6 CMD: lines; each is
prompted via readline ("execute '...'? [y/N]") when config.shell
.confirm_cmd is true (default), else auto-executed.
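
A sketch of the confirm gate, with the prompt and exec functions passed
in as hypothetical callbacks instead of the real readline and executor
bindings. Treating any answer that starts with "y" as consent, and a nil
answer (end of input) as a refusal, are assumptions.

```lua
-- Run each extracted command, asking first when confirm_cmd is set.
-- opts.prompt and opts.exec are stand-ins for readline and executor.
local function run_cmd_lines(cmds, opts)
  local ran = {}
  for _, cmd in ipairs(cmds) do
    local ok = true
    if opts.confirm_cmd then
      local answer = opts.prompt(("execute '%s'? [y/N] "):format(cmd)) or ""
      -- Default is "no": only an explicit y/Y answer proceeds.
      ok = answer:lower():match("^%s*y") ~= nil
    end
    if ok then
      opts.exec(cmd)
      ran[#ran + 1] = cmd
    end
  end
  return ran
end
```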

On broker error, the user turn just appended is popped so the context
isn't polluted with a turn that has no assistant response.
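
The pop-on-error rule, sketched with a hypothetical `send` callback in
place of the broker and a plain `turns` array in place of the context
module. It assumes the broker signals failure by returning nil plus an
error message.

```lua
-- Append the user turn, call the broker, and undo the append on error.
local function ask_ai(ctx, send, text)
  ctx.turns[#ctx.turns + 1] = { role = "user", content = text }
  local reply, err = send(ctx.turns)
  if not reply then
    table.remove(ctx.turns) -- pop the user turn that got no response
    return nil, err
  end
  ctx.turns[#ctx.turns + 1] = { role = "assistant", content = reply }
  return reply
end
```

After a failed call the turn list is exactly what it was before, so the
same question can be retried without duplicating the user turn.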

Smoke covers :help, :models, shell exec via known_commands allowlist,
and Ctrl-D break. Live broker exchange deferred per issue #12.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 15:17:40 +00:00
claude-noether 4310207738 Phase 0: scaffold tree + manifest
- README, .gitignore, CLAUDE.md (project conventions)
- docs/PHASE0.md — full Phase 0 manifest (locked substrate)
- 10 root .lua modules + 4 ffi/ bindings, all stubs raising NotImplemented
  with module-scoped responsibilities matching the manifest
- config.lua wired to current dirac/hossenfelder endpoints (qwen-coder-7b
  snappy/32k + cloud via OpenRouter through hossenfelder)

File names match docs/PHASE0.md §4 exactly. Module bodies fill in across
later phases; the tree shape is locked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 23:16:07 +00:00