marfrit/aish - aish - marfrit's space

Author	SHA1	Message	Date
marfrit	76a8f97009	repl: cloud preplanner + local executor split for Norris (closes #89 ) Phase 10 C4 — the orchestration commit. Splits Norris autonomous mode into a one-shot cloud preplan + per-step local executor flow, with graceful fall-back to single-model Norris when preplan is disabled or fails. run_norris additions (in order): 1. R4 fix: clear ctx.norris_active/_goal/_tasks at the TOP so a prior crashed Norris can't leak stale state into the new launch. 2. Preplan block (gated on cfg.norris.preplanner): - Look up the preplanner preset in cfg.models; warn + skip if absent. - Build a system prompt asking for TASK: <imperative> lines (R1: %d via string.format — gsub("N", ...) would corrupt "No prose / commentary / numbering" to "16o prose"). - Scrub messages per the preplan model's redact policy; run broker.chat (non-streaming, per Q-PP2) with category "norris-preplan"; R7: respect pre_cfg.timeout_ms. - On success: rehydrate; record usage via _record_usage; extract_task_lines; cap to tasks_max; populate ctx.norris_tasks = { current = 1, list = parsed }. - On ANY failure (transport err / empty list / bogus preset): status log + leave ctx.norris_tasks nil → single-model fall-back. R3 design: NOT routed via call_broker; a fallback retry would silently swap planning models which is worse than a clean hard-fail. 3. Executor cfg resolution (independent of preplan per Q-PP1): cfg.norris.executor names a preset → executor_cfg = that cfg. Unset / missing preset → executor_cfg = active_cfg (existing :model-selection behavior). 4. Loop body: pass executor_cfg (not active_cfg) to safety.norris_step. After each "continue" result, advance ctx.norris_tasks.current. When current > #list, exit with synthesized status "tasks_complete" + reason "all N preplanned tasks executed". 5. Exit cleanup: clear ctx.norris_tasks alongside the existing norris_active/_goal clears so a re-launch starts fresh. renderer.norris_end gains "tasks_complete" as a non-error status (cyan, same as "done"). Distinct from "done" (executor said GOAL: complete) — executor exhausted the plan but didn't confirm goal, which is a clean exit, not an error. E2E verified (preplanner=fast, executor=fast on hossenfelder:8082): :norris print the date and the current uptime → preplanned 2 tasks via fast → ─ step 1/3 ─ Print the current date. → CMD: date → Sun May 17 ... → ─ step 2/3 ─ Print the current uptime. → CMD: uptime → ... up 1 day ... → NORRIS TASKS COMPLETE: all 2 preplanned tasks executed :cost detail correctly shows two rows for the same model: norris-preplan 1 calls, 95 / 12 tokens norris 1 calls, 364 / 9 tokens Fall-back verified: cfg.norris.preplanner = "doesnotexist" → "[aish] preplanner 'doesnotexist' is not in cfg.models; running single-model" → Norris runs as Phase 6. No-preplan path verified (no cfg.norris block): Norris runs exactly as Phase 6, no behavior change. Regression: 87/87 safety, 31/31 router_model, repl loads. Closes #89. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 08:21:25 +00:00
marfrit	c55077bc07	context + repl + config: route-aware context compression (closes #87 ) Small local models effectively use a fraction of their advertised context window. Per-request compression for routes that hit a local-compress-flagged model preset: keeps only the last N turns and tail-truncates oversized content. Cloud routes get the full context unchanged. Changes: - context.lua _compress_turns(turns, keep, max_chars): returns a new list (self.turns NEVER mutated) with the last `keep` turns preserved + content tail-truncated to `max_chars`. Defensive: drops tool turns at the slice head (orphaned without their assistant-with-tool_calls anchor — strict chat templates would reject them; same gotcha PHASE0 §6 warned about for user/user). - Context:to_messages(opts) — opts.compress = { keep_turns, max_turn_chars } swaps the turn iterable for the compressed view. Affects BOTH the use_tool_role=true path and the use_tool_role=false fallback (PHASE2.md Q18 strict-template workaround). Persistence + display via :history see the full uncompressed ctx.turns. - repl.lua ask_ai: when req_cfg (the routed model's cfg) has `local_compress = true`, build compress_opts from config.context.compress (defaults keep_turns=2, max_turn_chars=800). Pass through ctx:to_messages alongside the existing system_prompt_override (#86) — orthogonal opts that compose. - Norris unaffected: safety.norris_step builds its own messages array; the planner needs full history per PHASE3 design. - config.lua gains a header comment explaining the per-model opt-in + the context.compress defaults block + the documented tool-turn truncation trade-off. 13 unit cases verified: - no opts -> full turn list (no regression) - keep_turns=2 -> exactly last 2 emitted - long content tail-truncated to max_chars - self.turns unchanged after render - orphan tool-turn at slice head dropped (no chat-template violation) - tool turn included WITH its assistant anchor when keep_turns >= 3 E2E against live local broker: - models.fast.local_compress = true; keep_turns=1; max=200 - 4-turn session: each broker call sees ONLY the current turn (verified by short coherent CMD replies despite no cross-turn memory available to the model). FR-promised small-model friendliness in action; conversation continuity is the documented trade-off. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 07:50:07 +00:00
marfrit	74e4bffb37	broker + repl + safety: GBNF grammar-sampling passthrough (closes #88 ) llama.cpp constrains the sampler to ONLY emit tokens matching a GBNF grammar. For small models this kills format drift at the token level — `CMD: <cmd>` is enforced by the sampler rather than hoped for via prompt discipline. Probe finding (this commit's pre-implementation): cloud (Anthropic via Bedrock) silently IGNORES the `grammar` field — returns normally via standard sampling. Default passthrough is safe for all routes; no per-model opt-in/opt-out needed in v1. Changes: - broker.lua build_request: `if opts.grammar then req.grammar = opts.grammar end`. Misformed grammar surfaces at request time via the existing transport-error path. - repl.lua ask_ai: `grammar_override = config.routing.grammars [req_class]` (same gating shape as #86's system_prompts override). Passed via opts.grammar in the call_broker invocation. - safety.lua is_destructive threads cfg.safety.probe_grammar through opts.grammar so llm_probe constrains the YES/NO output. Skips the regex-match dance entirely when the model can't drift. Caller-provided opts.grammar takes precedence over cfg. - config.lua gains two commented examples: * routing.grammars per class * safety.probe_grammar for the destructive probe 6 unit cases verified (stubbed curl.post_sse / broker.chat): - default: no grammar in body - opts.grammar -> body contains grammar JSON-encoded - safety probe_grammar reaches llm_probe via opts - no probe_grammar configured -> opts.grammar nil - caller opts.grammar takes precedence over cfg.safety.probe_grammar E2E against live local broker: - `routing.grammars.default = "root ::= \\"ACK\\""` configured; prompted "tell me a long story about a fox" -> model output EXACTLY "ACK" (sampler forced; would normally produce paragraphs). Grammar passthrough end-to-end confirmed. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 07:00:36 +00:00
marfrit	047d629a66	context + repl + config: per-class system_prompt override (closes #86 ) Small local models follow precise structured instructions better than natural language. Per-routing-class system_prompt override gives them tighter instructions for THAT request while preserving ambient context. Changes: - Context:to_messages(opts) — opts.system_prompt_override REPLACES the base system_prompt for THIS render only (state unchanged). Dynamic blocks ([background], [project], [earlier summary], NORRIS suffix) still compose on top. opts is optional; nil-safe for old callers. - repl.lua ask_ai — captures req_class from router.classify_model (already returned by Phase 5; previously discarded after the status line). Looks up config.routing.system_prompts[req_class]; passes as opts.system_prompt_override to ctx:to_messages each iteration of the tool-sub-loop. - Gating: override fires only when routing.auto is on (no class -> no override). If system_prompts[class] absent for a class, fall through to the default system_prompt (no surprise). - Norris unaffected: safety.norris_step builds its own messages array; doesn't go through this path. - config.lua gains a commented-out example showing routing.system_ prompts with the code/default examples from the FR body. Smoke verified: - 12-case context.lua unit test: opts nil/absent/present, override replaces base, dynamic blocks still compose, state unchanged after call, Norris-mode coexistence (suffix still present; background still suppressed). - E2E against cloud broker with routing.system_prompts.code set: triple-backtick prompt -> code class -> override fires; model emits terse code-only output. Non-code prompt -> default class -> no override -> normal verbose-ish reply. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 05:41:15 +00:00
marfrit	5b6ee553db	repl: :config show meta + HELP (Phase 9 commit #3 ) User-facing diagnostic for the project-overlay layer. Reads config._sources (R3 cfg-embedded by main.lua's load_config_with_ overlay in commit #2) + the effective config; surfaces which file contributed each top-level key. :config show top-level keys + which source set each (nested tables collapsed to inner-key list) :config show full recursive dump with sensitive-key masking Masking heuristic (any key containing token/secret/auth/key, case-insensitive) -> "(set)" instead of the value. R6: applied RECURSIVELY in full mode so the actual leak vector (mcp.servers.<alias>.auth_token, models.<x>.auth_token) is caught. Defensive depth cap (5) prevents pathological recursion. When config._sources is absent (caller didn't go through load_config_with_overlay), status: "(unknown — main didn't pass _sources)" — meta still runs, just labels source as "?". N2 known cosmetic false-positive: `key_env` / `auth_env` config fields hold env-var NAMES (not secrets) but match the heuristic. Future polish exempts `*_env` patterns. Same for `token_budget` (contains "token") — also masked despite being a plain number. Acceptable; errs toward over-masking. HELP gains 1 :config line. E2E verified across 4 scenarios with AISH_TRUST_FILE + isolated HOME: A. No project overlay: 6 user keys; nested tables collapsed. `secrets` masked as (set) at top level. B. Project overlay accepted: source map cleanly partitioned (user has 4 keys; project has 2 — default_model + models); each top-level row tagged [user] or [project]. C. :config show full: nested dump; auth_token in models.cloud correctly masked as (set); SECRET_VAL never appears in output (grep count = 0). Regression: test_safety 87/87, test_router_model 31/31, repl loads. Commit #4 next: config.lua template comment + PHASE9.md status header -> Implement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:54:30 +00:00
marfrit	94b7d86926	repl: wire tokenize_fn + :cost detail estimate row (Phase 8 commit #4 ) Activates Phase 8 pillars 2+3+5 end-to-end and adds the R3-revised :cost detail trailing line. Changes: - When cfg.tokenize.use_endpoint is true, ctx_opts.tokenize_fn is set to `function(text) return broker.token_count(active_cfg, text) end` before Context.new fires. R4: the closure body references active_cfg DIRECTLY (upvalue) — Lua resolves upvalues at call time, so subsequent :model switches re-route to the new model's tokenizer automatically (verified by E2E: :model cloud after the fast call still produces clean estimate row). - :cost detail gains a trailing line per R3: estimated session ctx: <N> tokens; token_budget=<M> (X.Y% used) N comes from ctx:estimate_tokens() (current in-memory snapshot, NOT a comparison against the accumulator sum above which is cumulative across calls + evicted turns). Gives at-a-glance budget utilization. E2E verified against live broker: - fast model call -> 168 tokens estimated (real BPE via /tokenize) - :model cloud + cloud call -> 178 tokens estimated (closure follows :model switch correctly per R4) - 21% / 22.3% budget utilization shown - Accumulator sums and estimate are intentionally different (sums are cumulative, estimate is current snapshot) — R3- correctly displayed as separate lines Regression: test_safety 87/87, test_router_model 31/31, repl loads. With this commit landed, Phase 8 is functionally complete; commit #5 is config example + status bump. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:31:40 +00:00
marfrit	0d6ff93134	repl: :cost meta surface (Phase 7 commit #5 ) User-facing reporter of the per-session accumulator. Three shapes: :cost one-line summary (calls / tokens / cost) :cost detail per-model + per-category breakdown :cost reset zero the meter; clears warn flags All read-only against ctx.usage_totals; no broker calls. R6 — annotation uses the per-slot is_local sticky flag, NOT a fragile cost==0 heuristic. Summary line classifies: cloud only -> "cost=$X.XXXXXX" cloud + local mix -> "cost=$X.XXXXXX (cloud only; local: tokens but no cost field)" local only -> "cost=$X.XXXXXX (local only; no cost field)" R7 — :cost detail rows sort by (cost desc, model asc, category asc). Three-level key for deterministic output across equal-cost rows (table.sort is unstable; identical costs would otherwise reorder). R10 — all dollar values use $%.6f formatting. Sub-cent precision is critical: a Haiku call can cost $0.000028; $%.4f would round it to $0.0000 — indistinguishable from local $0. Column width widened to %-26s to fit fully-qualified cloud model names (e.g. "anthropic/claude-haiku-4.5" = 25 chars). E2E verified against live cloud + local broker: :cost (empty session) -> "0 calls, $0.000000" ...after mixed-mode session... :cost -> "5 calls, prompt=472 / completion=26 tokens, cost=$0.000377 (cloud only; local: tokens but no cost field)" :cost detail -> 4 rows: main cloud $0.000219, probe cloud $0.000128, delegate cloud $0.000030, main local $0.000000 (local). Sort by cost desc within model. :cost reset -> "cost meter reset"; subsequent :cost shows zeros. All 5 categories appeared in the same session: main (twice — cloud + local), delegate, probe (x2 from :safety check). Warn-threshold firing already verified in commit #3 + #4. HELP gains 3 :cost lines. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:02:24 +00:00
marfrit	b30212af0f	safety + repl: opts.category for Norris + probe (Phase 7 commit #4 ) Closes the last two broker call sites that flow through safety.lua. Together with commits #1-#3, all 7 broker call sites in aish now attribute usage to the cost accumulator under the right category. Changes: safety.lua: - llm_probe (the YES/NO destructive checker) — broker.chat call gains opts.category = "probe". Captures (text, usage) via (reply, second) and, when opts.on_usage is provided AND the call succeeded, routes second through opts.on_usage(model, category, payload). N4 signature chain: opts already flowed through llm_second_opinion -> M.is_destructive from #52's work; opts.on_usage rides along naturally with no further signature change. - M.norris_step (Norris main broker round-trip): * opts to broker.chat_stream gains category = "norris" * probe_opts (passed to is_destructive inside the loop) gains on_usage = helpers.on_usage so the LLM probe's cost lands under "probe" too * on_delta wrapper adds elseif kind == "usage" branch that calls helpers.on_usage(payload.model, payload.category, payload). Coexists cleanly with the existing text (rehydrator) and tool_call branches. repl.lua: - Norris helpers table gains on_usage = _record_usage. The R5 central chokepoint (commit #3) does the warn-threshold check AND ctx:add_usage atomically. - :safety check meta's probe_opts always carries on_usage now (independently of whether secrets_session is set). secrets-aware scrub_msgs/rehydrate added conditionally as before. E2E verified against live broker (safety.llm_model = "cloud"): - :safety check ls -la /tmp -> 2 cloud probe calls - "[aish] session cost $0.000128 has crossed warn_at_dollars=$0.000100" - probe category visible in accumulator (would appear in :cost detail once commit #5 ships the meta). Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:01:21 +00:00
marfrit	8adebd52cc	repl: _record_usage helper + opts.category at 5 sites (Phase 7 commit #3 ) Wires broker.lua's on_delta("usage", payload) and broker.chat's (text, usage) return to the ctx accumulator via a single chokepoint. Changes: - Forward decl `local _record_usage` near _bg_spawn — same pattern; the summarize-on-evict closure in make_summarize_fn (built at line 299) needs lexical access to _record_usage (assigned at line 695), so forward-declare and assign-without-`local`. - _record_usage(model, category, usage) — R5 central chokepoint: routes to ctx:add_usage, then checks the per-threshold warn state. R4: cost_warn_state has two independent flags (dollars and tokens) so first-to-fire doesn't suppress the other. R10: warn message uses $%.6f for sub-cent precision. - call_broker wrapper: wrapped on_delta now branches on kind == "usage" -> _record_usage(payload.model, payload.category, payload). R2: keys by payload.model (set inside broker.lua from model_cfg.model). When fallback fires, broker is called with fb_cfg, so payload.model IS the fallback's name automatically — wrapper doesn't track primary-vs-fallback itself. - 5 caller sites wired with opts.category: ask_ai call_broker -> category="main" summarize-on-evict -> category="summarize" DELEGATE: handler -> category="delegate" :memory summarize -> category="memory_summarize" :delegate meta -> category="delegate" - All 4 broker.chat call sites switched from local reply, err = broker.chat(...) to local reply, second = broker.chat(...) branching on reply nil-ness to interpret second (err on failure, usage on success). Captured usage routes through _record_usage. E2E verified against live cloud broker: - cloud prompt -> reply "Hi! 👋" - Warn fired: "session cost $0.000219 has crossed warn_at_dollars=$0.000010" - R10 sub-cent precision visible in both numbers. Norris + safety paths still untouched — commit #4 wires those. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:00:06 +00:00
marfrit	955bd82efb	safety + repl: wire secrets into safety.lua (closes #52 ) Closes the last #13 gap — Norris broker call + is_destructive LLM second-opinion probe were the two egress points NOT covered by the scrub-at-egress design in commit `d852aca`. Approach: option (b) per #52's fix sketch — callback-via-helpers/opts. safety.lua does NOT gain a require("secrets") dependency (acceptance criteria 3); integration is purely through the convention the rest of the helpers table already uses. safety.lua changes: - llm_probe gains an opts table. When opts.scrub_msgs is set, the {system, user(cmd)} message pair is scrubbed before broker.chat. When opts.rehydrate is set, the YES/NO reply is rehydrated before parsing (defensive — the verdict shouldn't carry placeholders but rehydration is a safe no-op if it doesn't). - llm_second_opinion threads opts through to llm_probe. - M.is_destructive(cmd, cfg, opts) — opts optional; nil-opts is backwards-compatible (no scrub, original behavior). - M.norris_step: * outbound broker.chat_stream message scrubbed via helpers.scrub_msgs(ctx:to_messages(), model_cfg) when provided. * on_delta wrapped with helpers.streaming_rehydrator():push / :flush so the user sees rehydrated text AND text_parts accumulates rehydrated chunks (parity with ask_ai in repl.lua). * both M.is_destructive call sites (tool_call probe + CMD: probe) now pass probe_opts = {scrub_msgs, rehydrate} when the helpers carry them. repl.lua changes: - Norris helpers table gains scrub_msgs / rehydrate / streaming_rehydrator closures, all nil-safe (return identity / nil when secrets_session is nil). - :safety check meta passes probe_opts to is_destructive when secrets_session is configured. Without secrets, behavior unchanged. Unit-test verified end-to-end: - Stubbed broker.chat captures the messages it receives. - Without opts: probe SEES `ghp_realsecretvalue_...` (control). - With opts: probe sees `$AISH_SECRET_NNN` (correct scrub). Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:40:30 +00:00
marfrit	11d0e599cd	repl + renderer: tree-sitter highlighter (Phase 6 commit #5 ) The largest Phase 6 commit — fence-aware stream filter in renderer.lua + external tree-sitter dispatch + :highlight meta in repl.lua. renderer.lua — fence-aware filter wrapping assistant_delta: M.set_highlight(enabled, detected, highlight_fn) Called by repl.lua at startup AND on every :highlight toggle. Stores state in module-locals (off by default). State machine inside _hl_push: outside: pass chunks through; HOLD trailing partial-fence chars (per R1 — local llama.cpp splits ```python as `'``'` then `'`python\n'`, so naive pass-through drops the leading "``" and never recovers). inside: buffer cumulatively until "\n```" appears; emit highlight_fn(body, lang) then the closing fence verbatim. Recursive call handles "rest" after the closing fence. N1: fences only open at start-of-stream OR after a newline (`^```` or `\n```` only). Inline backticks in prose ("use ``` to mark code") do not open a fence. R3 (PTY raw-mode toggle per highlight call): no change here — every executor.exec call already toggles raw-mode (existing behavior since Phase 1). The risk is theoretical; smoke-test interactively after install if multi-fence renders show flicker. assistant_flush handles end-of-stream gracefully: drains any held partial-fence tail OR an unterminated inside-fence buffer. repl.lua — _detect_treesitter + highlighted + :highlight meta: _detect_treesitter() one-shot popen probe of `tree-sitter --version`. Run once at startup; cached as highlight_detected. highlighted(body, lang_tag) R2-placed in repl.lua (has _shq + executor access). Translates the fence tag (`py`, `python`, `lua`, etc.) to a canonical lang via LANG_TAG, picks the canonical extension via LANG_EXTENSION, writes body to a tmpfile with that extension, runs `tree-sitter highlight <tmpfile>` via executor.exec, returns the output. On ANY failure (CLI absent, non-zero exit, empty output), returns `body` unchanged — silent pass-through. R4 RESOLVED VIA REAL INSTALL: probed `tree-sitter highlight --help` on noether; confirmed: - NO `--lang` flag exists (formulate-time assumption wrong) - takes a PATH; language inferred from file extension - alternative `--scope source.X` exists but also unreliable without configured grammars Resolution: write tmpfile with `os.tmpname() .. LANG_EXTENSION[lang]` and pass the path. Matches the documented upstream contract. B4-followup: even with the CLI installed, highlighting requires `~/.config/tree-sitter/config.json` parser-directories with cloned + built `tree-sitter-<lang>` grammars. Without parsers, every call exits non-zero and we silently pass through. The :highlight install hint surfaces all three install steps so the user knows what's actually needed. :highlight [on\|off\|status] meta: no arg -> flip on/off -> set explicit status -> report toggle + CLI detection state When toggled on AND CLI absent: emit a 4-line install hint (CLI install, init-config, grammar clone reminder). When toggled on AND CLI present: emit a 1-line note that parser-directories must be set up for actual highlighting. HELP gains :highlight entry. Tested: 10/10 unit cases on the renderer state machine, including: - plain prose passthrough - single-chunk fence - B2 split fence ("``" + "`python\n" + "x=42" + "\n```") - N1 SOL anchor (mid-line ``` does not open) - trailing \n properly emitted across chunks - SOL-only fence open - prose after closing fence preserved - two fences in one stream - highlight off = passthrough (callback never fires) E2E :highlight meta verified: :highlight status -> off / detected :highlight on -> toggles + emits parser-dir reminder :highlight status -> on / detected :highlight off -> off Regression: test_safety 87/87, test_router_model 31/31, repl loads. Pillars 1 + 2 + 3 of Phase 6 now all implemented. Commit #6 is config example block + status -> Implement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:27:04 +00:00
marfrit	0d63f01601	repl: expand_mentions tiered @<r1>..<r2> diff retry (Phase 6 commit #4 ) Per A6 (tiered resolution): @<token> tries file lookup first; if the file doesn't exist AND the token contains "..", retry as a git ref-range and substitute with a fenced `diff` block. Preserves the existing peel-on-trailing-punct logic (e.g., `@HEAD~1..HEAD,` peels the comma, resolves the ref, restores the comma after the closing fence). Resolution order for @<token>: 1. io.open(token, "rb") -- file lookup, with trailing-punct peel 2. if (1) fails and token contains "..": git --no-pager -c color.ui=never diff <r1>..<r2> on exit 0 + non-empty body: substitute as ```diff fenced block 3. else: leave literal `@token` + emit "[aish] @X: not found" status Examples: @README.md -> file (path branch) @../sibling.txt -> file (path branch; `..` only triggers retry when path lookup FAILS, so existing paths with `..` segments are unaffected) @HEAD~1..HEAD -> diff (path fails, ref succeeds) @origin/main..feature -> diff (path fails — no such literal file; ref succeeds; `/` in ref is fine because we don't use the path's `/`-absence as a discriminator) @nonsense..gibberish -> literal preserved (both fail) Required restructuring: - _shq and _git_clean_cmd lifted from M.run closure scope to module scope (above expand_mentions). Single source of truth for the B1 prefix shared with commit #3's :diff. The in-M.run duplicates are removed. - expand_mentions now references `executor` (already required at module scope on line 7) for the diff retry. Status messages updated: - File expansion: "@<path> expanded (N bytes, truncated)" (existing) - Diff expansion: "@<path> expanded (N bytes, diff)" (new) Tested with the 7 existing #7 cases + 7 new diff-retry cases (14/14): ref-range expansion shape, body contains `diff --git`, trailing prose preserved, @../path stays as file (not diff), neither-path- nor-ref preserves literal, trailing-comma peel composes with ref retry. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:20:25 +00:00
marfrit	4d5f93aaa5	repl: :diff meta + _git_clean_cmd helper (Phase 6 commit #3 ) User-driven git diff injection. The model sees the diff on the next ask_ai turn through the existing exec_output channel. Changes: - _git_clean_cmd(subcmd_and_args) helper near _scan_project_tree. B1: every git invocation that flows into context MUST use `--no-pager -c color.ui=never`. Forkpty makes git think stdout is a TTY, enabling both color and the pager's keypad/line-clear escapes — these would pollute the captured context block. The helper is the single chokepoint; commit #4's @<r1>..<r2> retry will reuse it. - :diff [<args>] meta: - Reads cwd at meta invocation (R6: differs from :tree's scan-time cwd capture; documented in §5). - Runs `_git_clean_cmd("diff " .. args)` via executor.exec. - Empty output -> "(no diff): <label>" status, no context append. - Non-zero exit -> "diff failed (exit N): <label>" status, no context append. git's stderr already streamed to the user via executor.exec's live multiplex, so the failure reason is visible. - Success -> appends "[diff <label>]\n<output>" via ctx:append_exec_output. Label is "(working tree)" for empty args, else verbatim args. - Status confirms injection size: "diff injected: <label> (N bytes)". - HELP gains :diff line with three example arg shapes; N3-resolved (no `staged` alias — the meta is thin pass-through to git's grammar). Smoke verified across four scenarios in an ephemeral test repo: - Working-tree dirty -> 110-byte diff injected, no ANSI escapes - --cached -> 118-byte staged diff injected, clean - garbage..nonexistent -> exit 128, status + skip - Clean working tree -> "(no diff)", status + skip Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:17:18 +00:00
marfrit	d1dce832da	repl: _scan_project_tree + :tree meta + auto_tree (Phase 6 commit #2 ) First user-visible Phase 6 verb. Builds on commit #1's compose_project plumbing — sets ctx.project from either the :tree meta or the cfg.project.auto_tree startup hook. Changes: - _scan_project_tree(dir, opts) helper near _run_hook: git -C <dir> ls-files --cached --others --exclude-standard when <dir> is inside a git repo (N4: no subshell); find <dir> -mindepth 1 -maxdepth <depth+1> -type f -not -path '/.' otherwise. Returns (body, info={file_count, truncated, in_git}). Sorted paths, truncated to max_chars (default 4096 per cfg). - :tree [<depth>\|refresh\|off] meta: no arg -> scan with config defaults; resets _project_opts <N> -> scan with depth=N; caches as _project_opts refresh -> re-scan with cached _project_opts (else defaults) off -> clear ctx.project AND ctx._project_opts (R5) Status line reports file count + truncation flag + which backend fired (git/find). - cfg.project.auto_tree startup hook before the main loop: if true, scan libc.getcwd() once and set ctx.project. Failures status-logged once; REPL continues. Default off (existing configs unchanged). - HELP updated with three :tree lines. Plan §12 deliberately defers the config.lua example block to commit #6 along with the status header bump (R9 single-owner). Smoke (aish repo cwd): - :tree no-arg -> "33 files (git ls-files)" - :tree refresh -> same - :tree off -> "project tree cleared" - :tree 1 -> rescans - cfg.project.auto_tree=true at startup -> auto-injected status visible Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:14:36 +00:00
marfrit	d852acadc2	repl: wire #13 secrets — scrub outbound, rehydrate stream + tool args Plumbs the secrets.lua module (commit `e4b818b`) into the conversation pipeline. Hook points: ask_ai — scrub_messages(ctx:to_messages(), mode) before call_broker; rehydrate streamed deltas via streaming_rehydrator so the user sees real values while text_parts accumulates rehydrated chunks (final_resp is plain — CMD: / DELEGATE: extractors see plain values) MCP dispatch — dispatch_tool_call rehydrates the args table before sess:call_tool so the trusted MCP server receives real values (the model emitted placeholders because it saw a scrubbed context) DELEGATE: & :delegate — scrub sub_msgs before broker.chat; rehydrate sub_text before appending to context, so future turns see real values restored Phase 5 summarize-on-evict — scrub sum_msgs before broker.chat; rehydrate the reply that becomes ctx.summary :memory summarize — same scrub + rehydrate pair Mode resolution per call: model_cfg.redact → config.secrets.default → "vault+autodetect" if vault loaded, else "off". ctx storage convention: PLAIN values throughout. The scrub happens at the egress (broker call) per the active redact mode; ctx.turns never holds placeholders for content the user typed or executor produced. The model's own emissions (assistant tool_call arguments) may carry placeholders because the model saw the scrubbed context — rehydrated at MCP dispatch and otherwise harmless on re-serialization (idempotent re-scrubbing). New meta: :secrets [status] vault entries, placeholders allocated this session, active broker mode. Never prints actual values (vault file is itself a secret per gotcha 7). :secrets check <text> dry-run scrub against the active broker's mode — shows the output transformation. Documented in config.lua with a commented-out block + per-broker redact field example. Deferred to a follow-up issue (clearly scoped): - safety.lua broker call sites (Norris main loop, is_destructive LLM second-opinion probe) — same wiring pattern, but they don't currently see secrets_session; needs threading through helpers. - @-mention file content is appended PLAIN to ctx and scrubbed at egress alongside the rest of the user turn (covered by the ask_ai scrub). - exec output streamed live to terminal is pre-scrub (user sees real values in their own shell — by design); the captured-for- context copy is scrubbed at egress alongside the rest. This is the "full scope" implementation chosen via AskUserQuestion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:38:23 +00:00
marfrit	cdf4e86679	repl: sub-broker delegation via DELEGATE: marker (closes #6 ) Cost and context-window control: a "heavy" preset's model can offload work to a cheaper preset without spending its own tokens on the result. Example: deep model is mid-conversation and asks fast to summarize a 20k-line build log; the summary comes back as exec-output for the next turn, deep stays small. Marker syntax: DELEGATE: <preset> "<prompt>" (Single or double quotes; one DELEGATE per line; lines without the quoted shape are dropped — let the user write about delegation in prose without accidental dispatch.) Dispatch flow (mirrors CMD: / CMD&: extraction): 1. ask_ai's stream completes 2. extract_delegate_lines walks the final response 3. For each {preset, prompt}: broker.chat(config.models[preset], ...) synchronously; result is appended via ctx:append_exec_output as "[delegate <preset>]: <result>" 4. The model sees the delegate result on its next turn Implementation choice — marker over tool: option 1 from the issue ("inline delegate marker") works with any model regardless of tool_calls support. Option 2 (aish_delegate as a tool dispatched in the existing Phase 2 sub-loop) is the better UX for capable models since it returns the result mid-turn — filed as follow-up if needed. Meta surface: :delegate <preset> <prompt> one-shot direct invocation (useful for testing without depending on the model emitting DELEGATE:, and as a manual "ask <preset> something" verb) Scope: - Plan mode: emits "PLAN: DELEGATE <preset> <prompt>" without dispatch - Norris: not extended; the planner's model anchor would conflict with mid-plan switching (R-C3-adjacent risk) - No self-delegation guard: each DELEGATE is a separate broker call, not recursive; a delegate result reaching the next turn could contain another DELEGATE but that's bounded by max_tool_depth-style iteration cap on the parent - No cost prompt: configuring a paid cloud preset already implies consent to spend on it - Unknown preset → error status + exec-output note "[delegate X failed: unknown preset]" Extractor unit-tested with 8 cases (single-quote, double-quote, multi- line prose, empty prompt, no-quotes, case-sensitive, wrong prefix). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:29:09 +00:00
marfrit	f94d16fc89	repl: background CMD&: with handle/poll (closes #8 ) Builds, long-running network calls, and file watches no longer block the turn. A new "CMD&: <cmd>" marker (analogue of CMD:) tells the REPL to spawn the command in the background, return immediately, and poll for completion between user inputs. Process model: shell-wrapped to avoid needing fork()/execv() FFI. nohup sh -c '(<cmd>) > <log> 2>&1; echo $? > <status>' </dev/null >/dev/null 2>&1 & echo $! The child is reparented to init; we hold only the PID and the path to the .status sidecar. Completion is detected by the .status file existing (the wrapper writes it as its last act). No waitpid needed — the child isn't ours after the popen subshell exits. Storage: <history.dir>/bg/<id>.log + <id>.status. The directory is created lazily at startup (mkdir -p). Requires history.dir to be configured; without it CMD&: emits an error status and the model sees an "[bg failed to start]" exec-output note. check_bg_done() runs at the top of each main-loop iteration alongside check_every_due(). When a job is detected as exited, the REPL: - emits a status line "[bg:<id> exited <code>, <bytes>, <secs>s wall] <cmd>" - appends the same string to ctx as exec output, so the model sees the completion on its next turn (natural follow-up: "ok the build finished; let me check the log") Meta surface: :bg-spawn <cmd> start a bg job directly (no AI needed; also useful for testing without depending on the model emitting CMD&:) :bg-list show running/done jobs (id, pid, state, runtime, cmd) :bg-output <id> dump the log file to stdout :bg-kill <id> SIGTERM (note: only delivers if the PID is still the actual command — long-lived shells may need pkill by name) Scope (deliberately limited for v1): - No callback-mode readline: bg completion detection is pre-prompt, not mid-readline. If a build finishes while the user is typing, notification comes when they hit Enter. - Permission policy DSL (#9) does NOT apply to CMD&: — the asynchronous gating model wasn't designed for the y/N flow. Filed as follow-up if needed. - Norris not extended: helpers.exec_cmd is still synchronous; the planner doesn't dispatch bg jobs. - Plan mode interaction: CMD&: in plan mode emits "PLAN: & <cmd>" and a "[plan] would bg-run: <cmd>" exec-output note, no spawn. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:25:55 +00:00
marfrit	67d80e1047	repl: :every recurring prompts via pre-prompt due-check (closes #11 ) In-session timer that re-injects a prompt every N seconds. "Watch this thing" workflows (`:every 5m "check journalctl -u nginx for errors"`) without spawning a separate aish process. Approach: minimum viable. check_every_due() runs at the top of each main-loop iteration — timers fire BETWEEN user inputs, not during readline waits or active broker calls. Mid-stream firing would require rewriting ffi/readline to callback mode (substantial scope). If the on-the-fly firing requirement matters in practice it can land as a follow-up issue against the readline FFI. Meta: :every <interval> <prompt> schedule (interval: 30s \| 5m \| 2h \| bare int) :every list show jobs (id, interval, time-until-next, model, prompt) :every cancel <id> remove Defaults: - Model: "fast" preset if defined in config.models, else active model (per the issue's "recurring prompts should default to fast preset"). - In-memory only — jobs don't persist across restarts. - Suppressed while ctx.norris_active (planner stays on goal anchor). - Quotes around the prompt are stripped if present. - Each tick fires the job once, re-schedules next_fire = now + interval (no catch-up if the interval elapsed multiple times during a long user input). Tested: 11 interval-parse cases (30s, 5m, 2h, bare int, malformed), load via require, end-to-end :every list / cancel surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:23:07 +00:00
marfrit	17e62c0326	safety: permission policy DSL — allow/confirm/deny rule lists (closes #9 ) The confirm_cmd boolean was too coarse: true interrupts every harmless ls; false ungates everything. Most workflows want trust for read-only ops while still gating writes/network/sudo. New config: permissions = { allow = { "^ls%s", "^cat%s", "^git status" }, confirm = { "^rm%s", "^git push", "^docker%s", "^sudo%s" }, deny = { "^ssh%s+root@", "^curl%s+http[^s]" }, } Verdict order: deny > confirm > allow. First match in the chosen category wins. Unmatched defaults to "confirm". Patterns are Lua patterns (not regex) per PHASE0.md §3 — no compiled extensions. Verdict behavior in the interactive CMD: loop: - allow → run without prompt - deny → status line, skip - confirm → [y/N] prompt (same UX as legacy confirm_cmd=true) Backward compat: - permissions unset + confirm_cmd=true → always confirm - permissions unset + confirm_cmd=false → always allow - permissions set → policy table is authoritative Scope deliberately limited to the interactive AI-suggested CMD: gate. Norris autonomous mode keeps its own safety.is_destructive machinery (combining the two would double-gate or replace the LLM probe — both non-obvious behavioral changes that belong in their own issues). User-typed shell-routed lines (`router.classify → "shell"`) and :exec also bypass the policy by design — those are direct user intent. New introspection: :perms list — show the configured rule lists :perms check <cmd> — report verdict + matching rule (debug) safety.classify_command is exported and unit-tested with 12 cases covering each category, priority order (deny > allow on overlap), and both fallback paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:20:56 +00:00
marfrit	518c01a9f5	repl: user-defined skills loader (closes #2 ) PHASE0.md §5.2 froze the meta-command set at compile time. Skills let the user package repeatable workflows (project queries, prompt templates, audit routines) without forking aish. Discovery: scan ~/.config/aish/skills/*.lua at startup (or whatever $AISH_SKILLS_DIR points at — used both by users with non-XDG layouts and by CI). Each module exports: return { name = "<meta-cmd-name>", -- must match [%w_-]+ description = "<one-line>", -- shown by :skills run = function(args, h) ... end, } Helpers passed to run(): h.ask(text) — same path as :ask (with @path expansion) h.status(s) — emit "[aish] s" h.exec(cmd) — run a shell command (subject to plan_mode, hooks) h.model() — current active model name h.ctx — raw Context object (advanced) h.config — the loaded config table Validation rejects modules that miss name/run, use whitespace in the name, or collide with an existing meta command (built-in or earlier skill). Each rejection emits a status line so the user sees why a skill didn't appear. New meta command :skills lists what's loaded (sorted, with description). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:17:00 +00:00
marfrit	fb15f7a690	repl: pre/post CMD hooks via config.hooks (closes #3 ) Optional shell scripts trigger around every CMD: execution. Use cases: audit logging, auto-format-after-edit, custom safety gates beyond the existing confirm_cmd boolean. Config shape: hooks = { pre_cmd = "/path/to/pre-script", post_cmd = "/path/to/post-script", } Contract per hook invocation: - The command line is piped to the hook on stdin. - Env vars: AISH_CMD (the command), AISH_TURN (#ctx.turns at the moment of dispatch), AISH_CWD (libc.getcwd() result). - Hook stdout is streamed live to the terminal via executor.exec (so the user sees its output regardless of exit status). Pre-hook: non-zero exit aborts the command and emits a status line including the exit code. last_exec_code is set to the hook's exit so the {last_status} prompt template variable reflects the abort. Post-hook: exit code is ignored (the spec says so); only the visible stdout matters. Runs after the command's exec_end frame. Tested with success, abort, and stdin-matches-env paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:16:11 +00:00
marfrit	ce1378edee	repl: fix {name} pattern to accept underscores (#10 follow-up) %w excludes underscore in Lua patterns, so {ctx_used}, {ctx_max}, {cwd_short}, {last_status} were left literal in the prompt. Use [%w_] to accept identifiers with underscores. Surfaced during higgs smoke test of the new template. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:14:57 +00:00
marfrit	d738f339cb	repl: configurable prompt template via config.shell.prompt (closes #10 ) At-a-glance situational awareness: see the active model, context fill, mode flags, and cwd in the prompt itself — prevents "wait, am I still in plan mode?" surprises. Example config: shell = { prompt = "[{model} {ctx_used}/{ctx_max}t T{turn} {mode}] {cwd_short} > ", } Variables (substituted via {name}): {model} active preset name {ctx_used} char/4 token heuristic (Phase 0 §8; accurate is Q1) {ctx_max} config.context.token_budget {turn} #ctx.turns {cwd} libc.getcwd() (chdir-aware; PWD env may drift) {cwd_short} cwd with $HOME -> ~ {last_status} last exec exit code, "" if none yet {mode} "norris" \| "plan" \| "normal" Default behavior unchanged when shell.prompt is unset — keeps the "[aish:<model>]>" form with norris ⚡ and plan markers. Side wiring: - ffi/libc.lua gains getcwd() (chdir() doesn't update PWD). - run_shell records exit code into last_exec_code for {last_status}. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:14:43 +00:00
marfrit	10d2501cff	repl: peel trailing punctuation from @path mentions (#7 follow-up) Natural-language prose like "look at @README.md, then..." or "@foo.lua." at sentence end previously failed to expand because the trailing comma/period was included in the path. Now: if the raw token doesn't resolve, peel trailing chars from [.,;:?!)] one at a time until the path resolves or no more peels are possible. On success, the peeled chars are emitted verbatim AFTER the closing fence so the original punctuation is preserved. Surfaced during higgs smoke test (TC: "say the first line of @README.md, then stop" — the trailing comma broke resolution). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:11:22 +00:00
marfrit	bb374c2ad2	repl: @path mention expansion in input lines (closes #7 ) Saves the user from manual copy/paste: typing "show me @repl.lua" or "compare @config.lua and @config.example.lua" auto-expands each mention to a fenced code block carrying the file contents, language-tagged by extension, and feeds the composed text to the broker. Wired on the "ai" branch of the input loop and inside :ask. Meta and shell branches pass through unchanged — "@foo" in shell context is a literal program argument; meta commands store text verbatim. Trigger rule: "@" must follow start-of-string or whitespace — avoids false positives on email addresses ("user@example.com") and shell short-options. Path extends to next whitespace. Other behavior: - Language tag derived from extension via a small lookup; unknown extensions yield an untagged fence. - Files over 32 KB are truncated head/tail (16K + 8K) with a marker. - Missing files leave the literal "@path" token in place and emit a "[aish] @path: not found" status — non-fatal, lets the user correct the path and re-type. - Each successful expansion emits "[aish] @path expanded (N bytes [, truncated])" so the user sees what was inlined. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:10:54 +00:00
marfrit	dccd9e90cc	repl: :plan toggle — CMD: lines become PLAN: notes (closes #5 ) Plan mode is a safer entry point than going straight to Norris: the user iterates with the model on what to do, sees each CMD: as a PLAN: line, and the would-have-run notes feed back into the next-turn context so the model can refine without side effects. Toggle with :plan (flip), :plan on, :plan off. Off by default. When plan_mode is true: - CMD: lines extracted from the assistant turn print as "PLAN: <cmd>" - The note "[plan] would run: <cmd>" is appended via the existing append_exec_output channel — same context flow as a real exec, so the model sees its proposed action on the next turn. - run_shell is NOT called; no executor, no cd intercept, no capture. The prompt shows "[aish:<model> plan]>" while active (mirrors the norris ⚡ marker convention). Orthogonal to Norris: plan_mode only gates the interactive CMD: extraction path. Norris has its own halt protocol; combining them is not supported (the planner would be confused by skipped actions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:09:08 +00:00
marfrit	0700dce881	repl: enforce budget per Norris step, not just post-loop (closes #51 ) PHASE3.md §2 specifies sliding-window eviction "including mid-Norris- session if the loop runs long". Implementation only called enforce_budget() once, after the planning loop exited — so for a tight max_turns with a multi-step Norris session the model saw the FULL conversation throughout, defeating context budgeting and preventing R-C3 (NORRIS suffix goal anchor surviving eviction) from being exercised end-to-end. Move status_evictions(ctx:enforce_budget()) inside the while loop so it runs after every safety.norris_step return. Drop the now-redundant post-loop call. Surfaced during TC #38 (Qwen3-30B-A3B, max_turns=4) where the "oldest 4 turns evicted" status arrived AFTER NORRIS DONE. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:05:34 +00:00
marfrit	0c93e31186	repl: warn on stale MCP auto_approve keys (closes #33 ) Auto-approve policy keys that point at unconnected aliases, mistyped tool names, or malformed forms were silently ignored — leaving the user with surprise confirm prompts and no diagnostic. validate_auto_approve() now walks config.mcp.auto_approve at startup (after the MCP connect loop) and after each :mcp connect. For each key: - "alias__*" — warn if alias has no live session - "alias__tool" — warn if alias unknown OR tool not in registry - anything else — warn as malformed (not in alias__tool form) Non-fatal. The re-run on :mcp connect lets a key that referenced a not-yet-connected alias become live without a restart. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:05:08 +00:00
marfrit	299dcce78f	repl: validate MCP tool names against Bedrock regex (closes #32 ) Anthropic-via-Bedrock enforces ^[a-zA-Z0-9_-]{1,128}$ on tool names. We already moved the alias separator from "." to "__" (commit `f26cbd9`), but a future MCP server could still register a tool whose name (or whose combination with the alias) contains characters outside that class — silently breaking calls to strict providers. connect_mcp now warns at startup for: - aliases containing "__" (would misparse on tool dispatch) - emitted alias__name strings that violate the regex or exceed 128 chars Behavior preserved: validation is informative-only. tools_schema() still emits the offending tool; local llama.cpp users accept lenient names and shouldn't be penalized for downstream strictness. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:04:29 +00:00
marfrit	8e0e735e15	repl: fallback patterns — add 'Could not connect to server' (CURLE_COULDNT_CONNECT) Surfaced by autonomous run of TC #48: pointing models.fast at http://localhost:9999 (port closed, host resolves) emits "transport: Could not connect to server" — CURLE_COULDNT_CONNECT (7) which the Phase 5 fallback pattern set didn't include. Added "Could not connect to server" to FALLBACK_PATTERNS in repl.lua. Now fallback fires for the full set of common libcurl/HTTP transport failure shapes: HTTP 5xx server-side HTTP 404 model_not_found HTTP 408 gateway request timeout Couldn't resolve host CURLE_COULDNT_RESOLVE_HOST Could not connect to server CURLE_COULDNT_CONNECT (← added) Connection refused Timeout was reached CURLE_OPERATION_TIMEDOUT (variant A) Operation timed out CURLE_OPERATION_TIMEDOUT (variant B) Re-tested #48 end-to-end: fast pointed at dead port → fast fails → status fires → cloud (anthropic/claude-haiku-4.5 via openrouter) responds normally Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:49:13 +00:00
marfrit	40ea0b49b0	repl: routing + fallback + summarize_fn wiring (Phase 5 commit #3 ) Phase 5 commit #3 per docs/PHASE5.md §3 / §11. Wires the Phase 5 machinery into the REPL. make_summarize_fn(): Returns a closure that maps (prior_summary, evicted_turns) onto a broker.chat call against cfg.context.summarizer_model (default "fast"). Three dispatch paths matching the R-B1 callback contract: evicted == nil → compress signal prior present → additive ("extend the prior summary ...") prior nil → first-time ("summarize the following turns") All use a system prompt enforcing "exactly one short paragraph", max_tokens=300, timeout_ms=30000. Broker failure returns nil so Context falls back to silent eviction. Renderer status is logged on failure for visibility. Context construction: Build ctx_opts as a fresh table (copies config.context to avoid mutating it), adds summarize_fn ONLY when config.context.summarize_on_evict == true. Defaults stay OFF — Phase 4 regression coverage. Fallback machinery: - FALLBACK_PATTERNS table with 7 transport-error signatures (HTTP 5xx, 408, 404-model_not_found, DNS, connection refused, "Timeout was reached", "Operation timed out") - fallback_reason(err) strips the "transport: " prefix and matches. - should_fallback(err) gates on cfg.routing.fallback. - call_broker(cfg, name, msgs, on_delta, opts) wraps broker.chat_stream: • tracks any_delta via wrapped on_delta callback • retries ONCE against cfg.routing.fallback_model (default "cloud") when err matches AND no deltas arrived (N3: mid-stream failures aren't retried — partial text would duplicate) • emits "[aish] local <name> failed (<reason>); retrying via <fb>" status before the retry call ask_ai routing: - Routing decision taken ONCE on entry (R-C2). req_name/req_cfg locals carry the choice through every tool-sub-loop iteration. - active_name/active_cfg are NOT mutated — user's :model selection survives the request. - When config.routing.auto is true, classify_model(text, config) is invoked. Non-nil model + non-active → swap req_cfg + status line. - broker.chat_stream call replaced with call_broker (fallback wrap). Meta cmds: :route on/off — toggle cfg.routing.auto at runtime :route classes — show class → model mapping :route check <text> — report classify_model result with "(routing currently disabled)" suffix when auto is off (N1) :fallback on/off — toggle cfg.routing.fallback at runtime HELP updated with the four new commands. Smoke-tested: aish boots, all four metas behave correctly, classify_model returns reasoning class for "Explain how MMAP works on Linux" (the model slot is nil because no classes are configured by default — N2 cost-safety). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:31:14 +00:00
marfrit	f22d21d754	repl: :memory summarize — LLM candidate extraction (Phase 4 commit #4 ) Phase 4 commit #4 per docs/PHASE4.md §6. :memory summarize: 1. Source-of-truth: session log file via history.load(session_path), NOT ctx:to_messages() (R-C2). Skips turns tagged meta="summarize" so prior summarize exchanges don't self-amplify across multiple calls within the same session. 2. Pick summarizer model from cfg.memory.summarizer_model (default active model). 3. Build a transcript string ("role: content" per turn, 800 chars max per turn) and feed it as a single user turn alongside a system instruction asking for "(fact\|pref\|context): <content>" lines. 4. broker.chat with max_tokens=1024 + timeout_ms=90000 (the deep model can take a while; we don't want a 15s probe-cap here). 5. Log the response as an assistant turn with meta="summarize" so the next :memory summarize call filters it out. 6. Parse response lines tolerating markdown bullets and bold markup: ^%s[-]?%s[_](fact\|pref\|context)[_]:%s(.+)$ 7. Per-candidate prompt: y / N / edit. y → memory:add(kind, content) edit → readline prompt for replacement text any other → drop 8. status: "summarize: added N / M candidates". Live-tested against hossenfelder/fast: Pipeline correct end-to-end. Model emitted one candidate; user confirmation prompt fired; item persisted; :memory list showed it. Candidate quality from the 1.5B model is poor — typical small-model behavior; deep/cloud models would do better but this isn't an aish bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 07:53:36 +00:00
marfrit	3b074afaee	repl: memory handle + :remember + :memory meta (Phase 4 commit #3 ) Phase 4 commit #3 per docs/PHASE4.md §12. End-to-end memory wiring. Startup: - Opens memory handle at <history.dir>/memory.jsonl via history.open_memory(). Status-logs failure (e.g. flock held by another aish) and continues without memory. - inject_memory(): loads via history.load_memory(), truncates by cfg.memory.inject_max_chars (default 2000), populates ctx.memory_items. Status line announces N items injected. - shutdown_session() now also closes memory (releases flock). Meta commands: :remember <text> — shortcut for :memory add fact <text>; auto-refreshes ctx.memory_items so the next AI turn sees the new item without restart :memory list — show id / ts / kind / content (truncated at 80 chars per line) :memory add <kind> <t> — fact\|pref\|context required; rejects other kinds :memory forget <id> — N1: checks active-set first, surfaces "id N not active (already forgotten or never existed)" without appending if the id isn't live :memory clear — [y/N] confirm prompt; tombstones every active item :memory inject — N4: reload memory.jsonl into ctx.memory_items, replacing existing. Useful after manual file edits. Help block extended with the new commands. End-to-end verified: Boot 1 → :remember×2 + :memory add → 3 items, :memory list shows all three with timestamps Boot 2 → memory: 3 items injected (startup status); :memory list same three; ctx.turns empty (history is sessions/, memory is separate) Boot 3 → :memory forget 2 succeeds; :memory forget 99 → "not active" status without writing a tombstone; :memory list shows 2 items; :memory clear → confirm prompt → "cleared 2 items"; :memory list → "(no memory items)" Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 05:11:48 +00:00
marfrit	a404b2a152	repl: Norris driver + \C-n + :norris/:safety meta (Phase 3 commit #5 ) Phase 3 commit #5 per docs/PHASE3.md §12. Wires safety.norris_step (commit #4) into the REPL with the user-facing surface. ffi/readline.lua extensions (A1 + R-C4): - rl_insert_text + rl_redisplay added to ffi.cdef block; M.insert_text and M.redisplay wrappers exposed. - M.bind: removed `:free()` on previous callback. Now keeps every bound callback pinned for process lifetime in `_pinned` list (alongside `_bound[seq]` for current lookup). Avoids the use-after-free window between unbind and rebind that R-C4 flagged. Memory cost is bounded — one closure per key sequence binding. context.lua Norris suffix (R-C3 / §8): - to_messages() composes a dynamic NORRIS MODE block onto the system prompt when ctx.norris_active is set. The block carries ctx.norris_goal so eviction of the user's "[norris] goal:" turn doesn't lose the anchor. Returns to plain system prompt when Norris exits. repl.lua Norris driver: - prompt() now shows ⚡ marker when ctx.norris_active per PHASE0.md §9. - \C-n bound to a real handler — inserts ":norris " at the cursor (replaces Phase 1 status placeholder). - run_norris(goal) function: sets norris_active + norris_goal, appends a "[norris] <goal>" user turn, renders the banner, then loops calling safety.norris_step with an injected helpers table until a terminal status returns. Renders the closing banner. - norris_halt(): the [N] proceed/skip/abort prompt called by safety.norris_step via helpers.halt. Empty input → abort (safe). - dispatch_tool(): factored from the Phase 2 ask_ai code so safety.norris_step can call it. - norris_exec(): factored exec path for autonomous mode (skips the interactive run_shell cd-status renderer). - :norris <goal> meta — launches autonomous mode - :norris off meta — drops Norris flag (rare; usually 'abort') - :safety patterns meta — lists active is_destructive rules - :safety check <cmd> meta — probes a hypothetical command End-to-end mock-driven test: Submitted ":norris find files in /tmp" → banner → step 1 emits tool_call (auto_approved per policy) → dispatched → frame rendered → step 2 emits "GOAL: complete" → sub-loop exits → DONE banner. 2 broker invocations, no stalls. config.lua safety example block lands in commit #6. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 23:42:14 +00:00
marfrit	f26cbd9a3a	phase2 amend: __ separator (Bedrock-safe) + post_sse error diagnostics Phase 7 verify finding from TC #26 against :model cloud: HTTP 400 from openrouter→Amazon Bedrock: "tools.0.custom.name: String should match pattern '^[a-zA-Z0-9_-]{1,128}$'" Anthropic via Bedrock validates tool names against that regex and rejects dots. PHASE2 originally chose "." as the namespace separator ("boltzmann.list_dir"); OpenAI tolerated it, Bedrock does not. Separator switched to "__" (two underscores) everywhere — internal API matches on-wire shape, no transformation layer: - repl.lua: - tools_schema builds "alias__name" - dispatch_tool_call splits via "^(.-)__(.+)$" (non-greedy → leftmost __) - :mcp tool parser uses same split - :mcp tools formatter prints "alias__name" - HELP block shows <alias__name> - safety.lua confirm_tool_call: alias.* glob → alias__* glob - config.lua example block: keys rewritten - docs/PHASE2.md: amendment header added; §1, §2 row, §3 config.lua row, §5 wire-shape JSON examples, §6 auto_approve schema, §7 meta-cmd table, §12 plan all updated. Original "." references preserved in commit history. Constraint: aliases must not themselves contain "__" so the parse stays unambiguous. Tool names from MCP servers may have underscores freely. Second fix bundled — uninformative broker error: Previously "broker error: transport: HTTP response code said error" Now "broker error: transport: HTTP 400: {full body snippet}" ffi/curl.lua M.post_sse changes: - FAILONERROR no longer set (was hiding the response body). - raw_body accumulator added alongside the SSE buffer; captures every byte regardless of SSE shape. - After perform, check status_code via curl_easy_getinfo. On >=400, return (nil, "HTTP <code>: <body[:400]>"). 2xx unchanged. - End-of-stream SSE flush only runs on 2xx (no false event on error bodies that aren't SSE-shaped). - Phase 1 callers reading just first return slot stay correct. End-to-end verified: - :model cloud + tools=[boltzmann__read_file ...] + "Use boltzmann__read_file with path=/etc/hostname" → Claude emits tool_call with name="boltzmann__read_file", args='{"path": "/etc/hostname"}'. ok=true, transport clean. - Force-bad tool name "bad.name.with.dots" → err string carries the full bedrock 400 with the regex-pattern message visible. TC #26 (sub-loop end-to-end) is now testable against cloud — the error that blocked it is resolved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 20:04:57 +00:00
marfrit	3fa6279f5b	repl: :mcp tool — disambiguate "no alias" vs "unknown alias" errors Surfaced by Phase 7 verify test case #29: typing :mcp tool list_dir (no dot) printed "unknown alias: nil" instead of a useful diagnostic. The parse failure was being conflated with the alias-not-found case. Now: :mcp tool list_dir -> tool name missing alias prefix: list_dir :mcp tool unknown_alias.x -> unknown alias: unknown_alias :mcp tool known_alias.bogus -> unknown tool: known_alias.bogus Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 18:55:01 +00:00
marfrit	7e9cfff04d	repl: tool-call sub-loop + :mcp meta + system-prompt augmentation Phase 2 commit #6 per docs/PHASE2.md §12. End-to-end wiring of the MCP tool-call flow on top of broker/safety/context/renderer/mcp. repl.lua additions: - mcp_sessions table populated from config.mcp.servers at startup. connect_mcp() helper does initialize + caches tools/list. Failures status-logged once; absent from mcp_sessions until manual reconnect (C4 — no auto-retry). - tools_schema() flattens connected sessions' tools into the OpenAI {type:"function", function:{name,description,parameters}} shape with "<alias>.<name>" namespacing. - flatten_content() concatenates content[type="text"] blocks; one-shot status warning when non-text blocks (image/resource) are dropped (§4 normative spec, v1 only handles text). - dispatch_tool_call(name, args_table) splits alias.tool, looks up session, calls. Returns (content_string, is_error). Errors of every flavor (missing alias, no session, rpc_error, transport_error) yield a synthesized "[aish] ..." string so callers always have a body for the role:"tool" turn — alternation preserved per C5/C7. - ask_ai rewritten as a sub-loop that re-issues the broker request until the model returns pure text or max_tool_depth (default 8) is hit. Each iteration: stream response → if tool_calls present, confirm-gate each → dispatch → append role:"tool" turn → continue. Argument-JSON parse failure produces a synthesized tool turn (C7). Decline at confirm produces "[aish] tool call declined by user" tool turn (alternation guarantee). - :mcp meta with sub-commands: list / tools / tool <a.n> / connect <url> [alias] / disconnect <alias>. HELP block extended. context.lua: DEFAULT_SYSTEM_PROMPT grows by ~4 lines per PHASE2.md §8 (hybrid prompt: static frame about MCP + dynamic tools list in the request body). Block is always present even when no MCP servers configured — ~60 tokens for clarity that 'CMD:' remains the fallback. CMD: extraction unchanged — runs on the FINAL pure-text response only (not on intermediate iterations of the tool sub-loop). Substrate §3 invariant preserved. End-to-end verified two ways: (1) Direct broker probe: aish's tools_schema fed through broker.chat_stream against hossenfelder → qwen-1.5b emits one tool_call payload with correct id + name="boltzmann.list_dir" + args='{"path":"/tmp"}'. Accumulator stitched the JSON-string across fragmented deltas. (2) Mocked-broker sub-loop test: ask_ai feeds 'list /tmp', mock emits text + tool_call, sub-loop dispatches against LIVE boltzmann lmcp (auto_approve via policy), 80+ files rendered inside the tool_call frame, broker re-invoked with the extended context, mock returns pure text, sub-loop terminates. Total broker invocations: 2. Known: the loaded fast model (qwen-1.5b) tends to emit "CMD: ..." suggestions even when an MCP tool is the better path; the small model's system-prompt compliance is weak. Larger models and the analyze-time direct probe confirm the tools_schema and tool_calls flow is wire-correct — Phase 7 verify will exercise this against qwen3-30b or cloud models when available. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 15:20:42 +00:00
marfrit	efdc7281c7	broker: opts.tools passthrough + streaming tool_call accumulator Phase 2 commit #5 per docs/PHASE2.md §12. Streaming broker grows tool-call support without taking a dependency on mcp.lua (caller supplies the tools array — B5 from review). chat_stream signature widens to (cfg, msgs, on_delta, opts): opts.tools - optional array, passed to the request body as the OpenAI-shape tools field. OMITTED entirely when nil or empty (#tools == 0) — some servers reject "tools": []. on_delta callback shape widens to (kind, payload): kind = "text", payload = string (Phase 1 path; unchanged semantics, signature changes from (delta) to ("text", delta)) kind = "tool_call", payload = {id, name, arguments} emitted ONCE per call on finish_reason "tool_calls" after the streaming accumulator pulls fragmented JSON-string arguments together. Accumulator behavior: - Keyed by delta.tool_calls[i].index. - If index is absent on a delta (some llama.cpp builds omit it on single-call streams; C2 in review), default to 0 with a one-shot stderr debug status per stream. - id and name captured from the opening delta of each slot. - function.arguments concatenated across all deltas as the raw JSON-string; caller (repl.lua / future Phase 2 commit #6) does dkjson.decode. - On finish_reason "tool_calls" the accumulator emits all collected calls in index order and resets. M.chat external contract unchanged (C1): wrapper now uses the new (kind, payload) shape internally but exposes the same text-string return. No caller of M.chat passes opts.tools so tool_call kinds are silently dropped. repl.lua minimal companion edit: ask_ai's chat_stream callback updated to the new shape. Text path unchanged; tool_call kinds are no-op placeholders until commit #6 lands the sub-loop. Keeps Phase 1 streaming functional between #5 and #6. Smoke-tested against hossenfelder/8082 (post-#23 fix): - text-only: ok=true, kind="text" deltas received - with opts.tools: model emitted one tool_call, accumulator collected id + name=get_weather + args={"city":"Paris"} correctly across fragmented deltas - opts.tools={}: server accepted (field omitted as required) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 14:20:32 +00:00
marfrit	7d62eb5659	review followups: pcall shield, :resume guard, shell quoting, nits CONCERNs from the Phase 1 review pass: ffi/curl.lua: - SSE write_cb body is now pcall-wrapped. A Lua error in on_event (or in the parse loop itself) is captured into cb_error and surfaced after curl_easy_perform rather than propagating across the FFI callback boundary (which LuaJIT documents as process-fatal). The EOS flush path gets the same shield. Errors return (nil, "callback: <msg>") from post_sse. history.lua: - sh_singlequote() escapes shell metacharacters; the mkdir -p and ls -1 shell-outs no longer double-quote (where $(...) and $VAR still expand) — single-quote with embedded-' escaping is the safe form. - M.load now returns (turns, meta) instead of (meta, turns). turns is ALWAYS a table on success, never nil-when-no-header; failure path is the unambiguous (nil, err). Callers can `if not turns then` without the previous ambiguity. repl.lua :resume updated to the new shape. repl.lua :resume: - Refuse to resume into a non-empty ctx — silent overwrite was the Q15 default, but the review surfaced the no-undo / no-warning failure mode. User must :reset (or :save then re-launch) to express intent. The current session's on-disk log is unaffected either way. NITs: - ffi/libc.lua READ_BUF: comment noting it's module-shared and Phase 1 has no reentrant readers; revisit when that changes. - PHASE1.md §7: \C-x\C-c reservation pinned to Phase 3 ("deferred from Phase 1 — no consumer here") rather than the previous dangling "(or here)". Regression suite verifies: - history.load new signature on success + failure paths - shell-quoted history.dir with $ doesn't trip - aish scripted run: ctx with 2 turns refuses :resume anchor with a clear status; user must :reset first Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 20:05:23 +00:00
marfrit	1f1065157e	review BLOCKER: PTY input forwarding + raw mode toggle Phase 1 review caught a structural gap: executor.exec only drained the PTY master fd, never forwarded user keystrokes — vim/less/htop/nano would render and hang on input. PHASE1.md §5 specified bidirectional multiplex but only the read leg landed. tcgetattr/tcsetattr were also missing, so even with input forwarding the parent's line discipline would buffer until newline (breaking single-key UIs). ffi/libc: - struct termios opaque buffer + tcgetattr/tcsetattr + cfmakeraw - M.set_raw(fd) saves termios + applies cfmakeraw; returns saved or (nil, err) when fd isn't a tty (scripted / piped-stdin runs) - M.restore_termios(fd, saved) - struct pollfd + M.poll (POLLIN constant) executor: - multiplex(sess): poll(stdin, master); reads master on any revents (POLLHUP fires when child closes its slave end, not POLLIN — the revents != 0 check catches both); forwards stdin keystrokes to master; loop exits when master read returns 0 (EOF / child gone) - stdin polling is only enabled when stdin_is_tty (set_raw succeeded); piped-stdin runs (tests / scripted) would otherwise drain queued aish commands into the child of the current cmd, swallowing them - raw mode is restored before returning so the user lands back at the aish prompt in canonical mode renderer + repl: - exec_output(out, code) split into exec_begin() (top rule, before spawn) + exec_end(code) (closing rule with exit, after wait). PTY multiplex streams the body live to stdout in between; the renderer never re-prints the body. PHASE1.md §3: - tcgetattr/tcsetattr changed from "optional" to "required for single-key UIs to work — done-criteria #2"; poll added to the libc row description. Verified: - non-interactive smoke (echo / false / exit 7 / ls /nonexistent / printf multi-line) — all exit codes correct, output streamed live, a\nb\nc\n preserved byte-for-byte - scripted-stdin run reaches all expected lines (no stdin draining into a non-interactive child) - aish prompt + framed exec block + exit-code line all render in correct order Live interactive verification (vim / less / htop in a real terminal) still needs a user-test pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 20:00:53 +00:00
marfrit	a75118b2ae	readline: bind() via rl_bind_keyseq; repl reserves \C-n no-op Phase 1 readline binding wiring per PHASE1.md §7. ffi/readline: M.bind(seq, lua_fn) -> bool Wraps lua_fn as a C callback (signature `int (int, int)` per readline's rl_command_func_t) and registers it via rl_bind_keyseq(seq, cb). Returns true on success (rl returns 0). Trampolines are pinned in module-local state so they outlive the bind call — readline retains the function pointer for the process lifetime. Rebinding the same seq frees the previous trampoline. Bound handlers are pcall-wrapped so a Lua error doesn't crash readline's input loop. repl: Binds \C-n to a no-op that emits "[aish] Norris mode not yet implemented (Phase 3)" Verifies the mechanism end-to-end; Phase 3 (Norris autonomous mode) replaces the body with the actual toggle. Smoke covers bind / rebind-same-seq (exercises the :free path) / bind-different-seq with no errors. Live keyboard verification waits on user-test. Phase 1's 8(+1) inner loop is now functionally through `implement`; next inner phase is `verify` (review pass) followed by memory-update. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:26:58 +00:00
marfrit	9d586870e8	repl: session persistence wiring — auto-log, :save, :resume, :sessions Phase 1 session log integration per PHASE1.md §6. On every M.run(), open a session file at <config.history.dir>/sessions/<utc-iso8601>.jsonl with a meta header (started, model, aish_version). If history.dir is unset or unwritable, status-log the disable and continue without persistence. ask_ai logs the merged user turn (after pending exec output is folded in) and the assistant turn (after streaming completes). run_shell does NOT log [exec output] — that becomes part of the next user turn when ctx.pending_exec_output is flushed. New meta commands: :sessions list session files; "*" marks the active one :save <name> rename current session log to <name>.jsonl (auto- appends .jsonl); reopens for continued append :resume <name> load <name>.jsonl into ctx (replaces current turns via ctx:reset + append loop). The current process's own session log is unaffected — Phase 1 chooses per-process logs over chained continuations. :quit and EOF (Ctrl-D) both close the session file via shutdown_session before exiting. HELP text updated (no longer "Phase 0:" header since meta set has grown). Q15 noted in PHASE1.md §10 (resume into non-empty context) is resolved by the ctx:reset() in :resume — silent overwrite for Phase 1, revisit if anyone cares. End-to-end live verified: chat -> auto-log; :save renames; :sessions listings; :resume + :history shows the round-trip. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:23:05 +00:00
marfrit	a722f576ac	repl + renderer: streaming assistant output (Phase 1) repl.ask_ai now drives broker.chat_stream and pumps each delta into renderer.assistant_delta(delta) as it arrives. renderer.assistant_flush is called when the stream ends to add a trailing newline if missing. The full reassembled response is then handed to executor.extract_cmd_lines for the CMD: confirm-and-execute path (unchanged from Phase 0). renderer.assistant() is kept for non-streaming callers (none in tree right now, but cheap to keep around). assistant_delta/flush share no state with assistant(); they use a module-local stream_buf that tracks the in-progress streamed block. Q12 deferred: incremental CMD: highlighting (cursor-positioning re- render on flush) is not implemented in Phase 1 — deltas emit raw. The §6 CMD: marker is still extractable on the reassembled string post- stream, which is what executor cares about. Renderer's bold+cyan treatment for CMD: lines stays available via M.assistant(). Broker error / SSE-framed api-error path still pops the user turn and restores ctx.pending_exec_output. Order: assistant_flush always runs (even on error) so the cursor lands on a fresh line before the broker- error status renders. Live verification: `Count one to ten` against hossenfelder fast streams deltas through to stdout incrementally; CMD: extraction works on the reassembled string; confirm gate intact. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:17:27 +00:00
marfrit	16490e6905	fix: buffer exec output for next user turn; alternation for strict templates User-test surfaced the bug: with `deep` (mistral-nemo-12b) active, running `list files` -> y on `CMD: ls` -> `Are there directory entries beginning with "lor"?` returned a Jinja exception: api: ... Error: Jinja Exception: After the optional system message, conversation roles must alternate user/assistant/user/assistant/... Cause: §6 specified "exec output injected into context uses role 'user' with a prefix tag '[exec output]'." This works for permissive templates (qwen2.5-coder-1.5b, the `fast` preset) but produces a back-to-back user/user pair on strict templates that enforce the OpenAI alternation contract — `[exec output]` user turn followed by the user's actual follow-up question. Fix: context.lua: - new field `pending_exec_output` (initially nil) - new method `:append_exec_output(out)` buffers (concat on subsequent captures so multi-shell-then-ai still merges everything) - new method `:append_user(content)` flushes buffered exec output as a `[exec output]\n...\n\n` prefix and appends a user turn - `:reset()` also clears the buffer repl.lua: - run_shell calls ctx:append_exec_output(out) instead of ctx:append({role="user", content="[exec output]\n"..out}) - ask_ai calls ctx:append_user(text) instead of raw :append; saves prev_pending so a broker error can restore the buffer for retry PHASE0.md §6: - amended the role-injection paragraph to describe the buffer-and- prepend policy; the §3 invariants list is untouched (this was a §6 design detail, not a locked invariant) Verification: - context unit tests cover: alternation after the failing sequence, multi-shell merge, reset clears buffer, broker-error retry path - live reproduction against `deep` (mistral-nemo) of the exact user-reported sequence succeeds; model responds with a sensible `CMD: ls \| grep '^lor'` instead of a Jinja exception Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 18:41:21 +00:00
marfrit	abc993aa49	review followup: empty-input guards, ~/ symmetry, CMD: filter Addresses three concerns + one nit from the Phase 0 review pass. executor.lua: - M.exec guards empty / whitespace-only cmd up front, returns "(empty command)" / -1 instead of running the wrapper on nothing. - On sentinel-parse failure with empty output (typical of shell parse errors — the syntax error itself escapes to the popen parent's stderr because 2>&1 is inside the unparsable subshell), surface "(no output — possible shell parse error)" rather than a silent empty frame. - extract_cmd_lines now skips whitespace-only / empty bodies; a bare `CMD: ` line in assistant output no longer turns into an "execute ''? [y/N]" prompt. - "what" comments cleaned in maybe_chdir. router.lua: - path_like now matches `~` and `~/foo` so `~/scripts/build.sh` classifies as shell (was: ai). Restores symmetry with executor's maybe_chdir, which already expands `~` on `cd`. repl.lua: - :exec and :ask trim args and renderer.status a usage line on empty rather than running an empty cmd / sending an empty turn to broker. Regression: full prior smoke suite still passes — known_commands shell paths, all maybe_chdir branches, CMD: extraction with non-empty bodies, exec exit-code recovery, all router branches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 17:41:35 +00:00
marfrit	e0e69f839b	repl: readline loop, dispatch, all Phase 0 meta commands Phase 0 implementation per PHASE0.md §5, §9. Wires the lower-half modules into a single REPL: ffi/readline -> input + history router -> classify(line) -> meta/shell/ai executor -> run_shell with cd interception, frame output, capture broker -> ask_ai, then extract+confirm CMD: lines from response context -> turn list + eviction; status line on evict renderer -> assistant text + exec frame + status Prompt format `[aish:<model>]> ` per §9. Meta commands all wired (§5.2): :quit/:q, :clear, :reset, :model <name>, :models, :history, :exec <cmd>, :ask <text>, :help. Unknown meta names report via renderer.status rather than crashing. End-of-input (Ctrl-D on empty line) breaks the loop cleanly. Empty / whitespace-only lines are skipped silently before dispatch — router would otherwise classify them as ai with empty payload and pollute context. `CMD: ` extraction + confirm-and-execute is wired: when broker returns an assistant turn, the response is scanned for §6 CMD: lines; each is prompted via readline ("execute '...'? [y/N]") when config.shell .confirm_cmd is true (default), else auto-executed. On broker error, the user turn just appended is popped so the context isn't polluted with a turn that has no assistant response. Smoke covers :help, :models, shell exec via known_commands allowlist, and Ctrl-D break. Live broker exchange deferred per issue #12. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 15:17:40 +00:00
claude-noether	4310207738	Phase 0: scaffold tree + manifest - README, .gitignore, CLAUDE.md (project conventions) - docs/PHASE0.md — full Phase 0 manifest (locked substrate) - 10 root .lua modules + 4 ffi/ bindings, all stubs raising NotImplemented with module-scoped responsibilities matching the manifest - config.lua wired to current dirac/hossenfelder endpoints (qwen-coder-7b snappy/32k + cloud via OpenRouter through hossenfelder) File names match docs/PHASE0.md §4 exactly. Module bodies fill in across later phases; the tree shape is locked. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 23:16:07 +00:00

47 Commits