marfrit/aish - aish - marfrit's space

Author	SHA1	Message	Date
marfrit	7ef2a6ed5c	broker: token_count + endpoint capability cache (Phase 8 commit #1 ) Foundation for Phase 8 — accurate tokenization via <endpoint>/tokenize where supported, char/4 fallback otherwise. Changes: - `M.token_count(model_cfg, text)`: Empty text -> 0. No endpoint -> char/4 immediately. Capability cache says false -> char/4. Otherwise -> POST `<endpoint>/tokenize` with `{content, model}`, 2s timeout. On 200 + parseable `{tokens=[...]}`: cache true, return #tokens. Anything else (non-200 / parse-fail / transport err / timeout): cache false, char/4. - `_tokenize_capable` cache keyed by ENDPOINT ONLY per R6 — B1 confirmed /tokenize ignores the model field, so same-endpoint presets share one cache entry. If a future broker honors the model field, revisit. - `M.tokenize_supported(model_cfg)`: returns nil/true/false for the cached state (introspection for tests + future :tokenize meta). - `M._reset_tokenize_cache()`: test hook so the session-local cache doesn't leak between test runs sharing a LuaJIT VM. Live verified against hossenfelder + a deliberately-broken endpoint: - "hello world" -> 2 tokens (matches manual curl probe) - 901-char text -> 201 real tokens vs 225 char/4 (24-token gap; real is LOWER here, opposite direction from the README probe where it was higher — confirms heuristic is inaccurate in both directions) - Pre-probe: tokenize_supported() returns nil - Post-probe: tokenize_supported() returns true (local) / false (broken) - Broken endpoint second call: still char/4, no re-probe - Empty / nil text edge cases handled Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:29:17 +00:00
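The probe-then-cache flow described in commit `7ef2a6ed5c` can be sketched roughly as follows. This is a minimal illustration, not the code in broker.lua: the `http_post` parameter is a hypothetical injected transport (returning a status and an already-decoded table), the 2s timeout is left to that transport, and the floor rounding for char/4 is an assumption chosen to match the "901-char text -> 225" figure in the message.

```lua
-- Hypothetical sketch of token_count + endpoint-keyed capability cache.
local M = {}
local _tokenize_capable = {}   -- endpoint -> true / false (nil = not probed yet)

-- char/4 heuristic; floor rounding is an assumption (901 chars -> 225).
local function char4(text)
  return math.floor(#text / 4)
end

-- http_post(url, body) is an injected stand-in for the real transport.
-- Assumed to return (status, decoded_doc) or (nil, err).
function M.token_count(model_cfg, text, http_post)
  if text == nil or text == "" then return 0 end
  local endpoint = model_cfg and model_cfg.endpoint
  if not endpoint then return char4(text) end
  -- A cached "unsupported" verdict skips the probe entirely.
  if _tokenize_capable[endpoint] == false then return char4(text) end

  local status, doc = http_post(endpoint .. "/tokenize",
                                { content = text, model = model_cfg.model })
  if status == 200 and type(doc) == "table" and type(doc.tokens) == "table" then
    _tokenize_capable[endpoint] = true
    return #doc.tokens           -- token IDs are discarded; only the count matters
  end
  -- Non-200, parse failure, transport error, timeout: cache false, fall back.
  _tokenize_capable[endpoint] = false
  return char4(text)
end

-- Returns nil (not probed), true, or false for the cached state.
function M.tokenize_supported(model_cfg)
  return _tokenize_capable[model_cfg.endpoint]
end

-- Test hook: drop the session-local cache.
function M._reset_tokenize_cache()
  _tokenize_capable = {}
end
```

Keying the cache by endpoint alone means two presets that share an endpoint also share one probe result, which matches the R6 decision recorded in the message.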
marfrit	467e573d24	docs/PHASE8: review fold-in — 2 BLOCKERs + 4 CONCERNs + 4 NITs Sonnet-reviewed per reviews-use-sonnet memory directive. BLOCKERs (RESOLVED in-place): R1. §5 estimate_tokens pseudocode missing per-turn cache pattern. Prose described it; code block called tokenize_fn unconditionally. Implementer following code verbatim would hit the O(N round-trips per call) perf gap the prose flagged. Code block now shows explicit `if t._tokens then ... else t._tokens = ... end`. R2. enforce_budget loop can spin forever when system_prompt alone exceeds token_budget (e.g. 5KB project block + budget=4096 + zero turns -> turns can't shrink further but OR-condition stays true). Fix: AND `#self.turns > 0` guard on the loop. §13 commit 3 row shows the explicit Lua-syntax condition. CONCERNs (FOLDED): R3. :cost detail per-slot ~est=N annotation was semantically undefined — accumulator sum (cumulative across calls + evicted turns) vs current-snapshot estimate are incommensurable. §6 reworked: ONE trailing summary line "[estimated session ctx: N tokens; token_budget=M (X% used)]" instead of per-slot annotations. §13 commit 4 aligned. R4. tokenize_fn closure MUST reference active_cfg as upvalue (NOT capture by value). Subtle and easy to miss — §13 commit 4 now spells out the correct vs wrong patterns explicitly. R5. 2s tokenize timeout can spuriously cache-as-unsupported when llama.cpp is busy with a concurrent completion (single-threaded inference; /tokenize queues behind). Documented in §9; v1 ships 2s, revisit during verify if it bites. R6. Per-endpoint cache key conflated two same-endpoint/different-model presets (B1: /tokenize ignores the model field). Cache key simplified to endpoint-only. One probe per endpoint per session; if a future broker honors the model field, revisit. NITs (APPLIED): N1. §13 commit 3 `OR`/`AND` -> Lua-syntax `or`/`and`. N2. §10 Q-T5 Resolution-target cell filled in (was blank after B1). N3. §6 / §8 / §13 commit 4 now describe a CONSISTENT approach (trailing summary line; per-slot annotation dropped). N4. Status header tree-hash updated to current (`aa64ad3` -> stays fresh through review fold-in; commit 5 will refresh again at "Implement" status). PHASE8.md now 622 lines (was 454 after plan). +168/-61. Ready for implementation phase 6 of the inner loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:28:27 +00:00
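Blockers R1 and R2 from commit `467e573d24` are easier to see side by side in code. The sketch below is an assumption-laden illustration rather than the real context.lua: the `Context` shape, the `max_turns = 40` default, oldest-first eviction, and the omission of the summarize-on-evict callback are all simplifications made for this example. It shows the per-turn `_tokens` cache (R1) and the `#self.turns > 0` guard that stops the loop when the system prompt alone exceeds the budget (R2).

```lua
-- Hypothetical sketch: per-turn token cache + guarded eviction loop.
local Context = {}
Context.__index = Context

function Context.new(opts)
  opts = opts or {}
  return setmetatable({
    turns = {},
    system_prompt = opts.system_prompt or "",
    max_turns = opts.max_turns or 40,          -- assumed default
    token_budget = opts.token_budget or 4096,
    tokenize_fn = opts.tokenize_fn,            -- nil -> char/4 path
  }, Context)
end

local function count(self, text)
  if self.tokenize_fn then return self.tokenize_fn(text) end
  return math.floor(#text / 4)
end

function Context:estimate_tokens()
  local total = count(self, self.system_prompt)
  for _, t in ipairs(self.turns) do
    -- R1: tokenize each turn once; turn content is immutable after append.
    if t._tokens then
      total = total + t._tokens
    else
      t._tokens = count(self, t.content)
      total = total + t._tokens
    end
  end
  return total
end

function Context:enforce_budget()
  -- R2: without the `#self.turns > 0` guard, an oversized system prompt
  -- keeps the `or` condition true with nothing left to evict.
  while #self.turns > 0
    and (#self.turns > self.max_turns
         or self:estimate_tokens() > self.token_budget) do
    table.remove(self.turns, 1)   -- evict oldest turn (assumed policy)
  end
end
```

In this sketch the system prompt is re-counted on every estimate; only the turns are cached, since those are the part that grows.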
marfrit	aa64ad3eec	docs/PHASE8: plan — §13 commit roadmap (5 commits) Status: Analyze -> Plan. All open Qs resolved (Q-T5 via baseline B1). 5-commit roadmap, bottom-up: 1. broker.lua — M.token_count helper + per-endpoint capability cache. <endpoint>/tokenize probe with 2s timeout; cache true/false per (endpoint, model) for the session. char/4 fallback on any non-200 / parse-fail / transport err. M.tokenize_supported introspection helper. 2. context.lua — Context.new accepts opts.tokenize_fn; estimate_tokens widens to use it when set, with per-turn `_tokens` cache. char/4 path unchanged when tokenize_fn nil. 3. context.lua — enforce_budget consults token_budget too (pillar 5 from A1). Loop condition: turns > max_turns OR estimate_tokens > token_budget. Existing summarize-on-evict callback unchanged. 4. repl.lua — wire tokenize_fn when cfg.tokenize.use_endpoint=true. Closure captures active_cfg upval (A5 — follows :model switches naturally). :cost detail extension: trailing line showing estimated session ctx tokens for comparison with the per-slot prompt_tokens sums in the accumulator. 5. config.lua commented `tokenize = { use_endpoint = true }` example + PHASE8.md status -> Implement. Per-commit risk index covers: probe latency cap (2s, one-shot), per-turn cache correctness (immutable post-append), enforce_budget performance (O(N) per call after cache fill), and the intentional behavior change of token_budget actually being enforced (sessions fitting under char/4 may evict earlier under accurate counts — documented in §9). Two items open at plan, resolve at implement: - exact :cost detail layout for estimated session ctx row - whether to add a :tokenize debug meta (defer unless useful in verify) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:24:41 +00:00
marfrit	79bd40db79	docs/PHASE8-baseline: live /tokenize probes Four findings, all align with formulate/analyze: B1. /tokenize IGNORES the `model` request field — returns the tokenization of whichever model is currently loaded on the proxy backend, NOT the requested model. Acceptable: a real BPE count is still much better than char/4, and the gap between Qwen/Llama tokenizers is small. Cloud (OpenRouter) 404s regardless, so cloud falls back to char/4 via the capability cache. B2. Latency 23-34ms per call, FLAT across input sizes 50-5000 chars. Network round-trip dominates. Per-turn _tokens cache amortizes to O(1); worst case 40 cached turns × ~30ms = 1.2s one-time cost on first enforce_budget call. Acceptable. B3. Response shape confirmed: `{"tokens":[N1,N2,...]}` (token IDs; we use #response.tokens for count, discard the IDs). JSON not SSE; ffi.curl.M.post is the right call. B4. Cloud /tokenize 404s as expected. Capability cache marks it unsupported on first probe; char/4 fallback silent thereafter. No design change. Q-T5 RESOLVED per B1. All open questions now resolved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:22:05 +00:00
marfrit	1a136d81b7	docs/PHASE8: analyze — adds pillar 5 (enforce_budget honors token_budget) Status: Formulate -> Analyze. 12 findings (A1-A12); 5/6 open Qs resolved in-place (Q-T5 deferred to baseline). MAJOR FINDING: A1. enforce_budget ONLY checks max_turns, NOT token_budget — even with accurate tokenization, eviction decisions are unaffected. The new estimate_tokens() would just feed the prompt template display. Pillar 5 added: enforce_budget evicts when EITHER max_turns OR token_budget is exceeded. This is the real motivation for accurate tokenization. Other findings: A2. ffi.curl.M.post signature confirmed (body, status) / (nil, err). A3. Single caller of estimate_tokens today; enforce_budget becomes the second (more frequent) caller — per-turn _tokens cache becomes important. A4. Q-T1: cache lives on turn dict; dies with turns on :reset. A5. Q-T2: closure captures active_cfg upval; follows :model switch naturally. A6. Q-T3: opt-out skips the probe entirely (no wiring). A7. Q-T6: tools-schema tokens deferred to follow-up (fixed per session; under-count bounded). A8. _tokens cache invalidation: only :reset; turn content is immutable after append. A9. Probe latency ~50ms/call locally; per-turn cache amortizes to O(1) after first count. A10. estimate_tokens called OUTSIDE streaming callback; no race. A11. role:"tool" turns tokenize identically; per-turn cache works. A12. include_usage (Phase 7) and tokenize (Phase 8) are orthogonal — different endpoints, different code paths. §1 expanded to 5 pillars (pillar 5 = enforce_budget extension). §3 context.lua row updated to reference the enforce_budget change + per-turn _tokens cache. §9 risk row added: accurate counts mean the default token_budget=4096 is finally ENFORCED — sessions that spilled silently under char/4 may now evict earlier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:21:24 +00:00
marfrit	00869ba412	docs/PHASE8: formulate — accurate tokenization (resolves Q1) Phase 8 formulate manifest + PHASE0 §11 amendment to add the Phase 8 row (substrate amendment per CLAUDE.md §3 lands same commit). Four pillars: 1. Per-endpoint /tokenize probe (cached). One round-trip on first call per (endpoint, model); capability cached for session. hossenfelder + llama.cpp expose <endpoint>/tokenize (NOT /v1/tokenize — per real probe; the path is endpoint-local, not under the OpenAI /v1 prefix). Cloud (OpenRouter) 404s — silent char/4 fallback. 2. broker.token_count(model_cfg, text) — thin wrapper; tries probe, falls back to char/4 on miss. Always returns non-negative int; never errors. Tight 2s timeout; failures cache as not-supported. 3. Context:estimate_tokens widened. Accepts optional tokenize_fn at Context.new; uses it when present, char/4 otherwise. repl.lua wires `tokenize_fn = function(text) return broker.token_count(active_cfg, text) end` when cfg.tokenize.use_endpoint = true. Per-turn _tokens cache to amortize across estimate calls. 4. :cost detail est-vs-actual annotation. When the heuristic disagrees with the actual prompt_tokens from broker usage by >10%, show `~est=N`. Silent otherwise. Display-only; no behavior change. Resolves Q1 (PHASE0 §13, originally Phase 3) — replace char/4 heuristic on Context:estimate_tokens. Originally targeted at Phase 3 but deferred forward each iteration; now lands. Baseline already observed during formulate: - /v1/tokenize -> 404 on hossenfelder; /tokenize -> works - Body shape: {content: "..."} returns {tokens: [N1, N2, ...]} - Accuracy gap: char/4 UNDERESTIMATES by ~10% on real code/prose (508 vs 558 on a 2KB README sample). Material for context-budget eviction decisions. Doc covers scope + done-when, tech decisions table, module changes, per-pillar deep dives, UX surface, out of scope, 6 risk rows, 6 open questions (Q-T4/T5 baseline-bound, others analyze-bound). Scope confirmed via AskUserQuestion: tokenization (chosen over cross-session cost persistence and hard rate-limit enforcement). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:19:53 +00:00
marfrit	1f34b6dce8	config + docs/PHASE7: example block + status -> Implement (Phase 7 commit #6 ) R9-resolved single-owner of the status bump (commit #5 didn't touch PHASE7.md). N5: PHASE0 §11 amendment landed in commit `3bad07b` (formulate); not re-applied here. config.lua: - Commented-out `cost = { warn_at_dollars, warn_at_tokens }` block with parity to the Phase 1-6 example blocks. - Notes warn flags are independent (R4) and per-turn usage flows to session/*.jsonl for after-the-fact analysis. docs/PHASE7.md: - Status header bumped: "Plan + review fold-in" -> "Implement" - Lists the 6 implement commits inline for traceability: `7364963` broker: usage capture + opts widening `7b4a9be` context: accumulator helpers `8adebd5` repl: _record_usage + opts.category at 5 sites `b30212a` safety + repl: opts.category for Norris + probe `0d6ff93` repl: :cost meta surface this config example + status bump Phase 7 implementation is complete. Next inner-loop step is verify (7) — user-driven smoke tests, then memory-update (8). Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:02:55 +00:00
marfrit	0d6ff93134	repl: :cost meta surface (Phase 7 commit #5 ) User-facing reporter of the per-session accumulator. Three shapes: :cost one-line summary (calls / tokens / cost) :cost detail per-model + per-category breakdown :cost reset zero the meter; clears warn flags All read-only against ctx.usage_totals; no broker calls. R6 — annotation uses the per-slot is_local sticky flag, NOT a fragile cost==0 heuristic. Summary line classifies: cloud only -> "cost=$X.XXXXXX" cloud + local mix -> "cost=$X.XXXXXX (cloud only; local: tokens but no cost field)" local only -> "cost=$X.XXXXXX (local only; no cost field)" R7 — :cost detail rows sort by (cost desc, model asc, category asc). Three-level key for deterministic output across equal-cost rows (table.sort is unstable; identical costs would otherwise reorder). R10 — all dollar values use $%.6f formatting. Sub-cent precision is critical: a Haiku call can cost $0.000028; $%.4f would round it to $0.0000 — indistinguishable from local $0. Column width widened to %-26s to fit fully-qualified cloud model names (e.g. "anthropic/claude-haiku-4.5" = 25 chars). E2E verified against live cloud + local broker: :cost (empty session) -> "0 calls, $0.000000" ...after mixed-mode session... :cost -> "5 calls, prompt=472 / completion=26 tokens, cost=$0.000377 (cloud only; local: tokens but no cost field)" :cost detail -> 4 rows: main cloud $0.000219, probe cloud $0.000128, delegate cloud $0.000030, main local $0.000000 (local). Sort by cost desc within model. :cost reset -> "cost meter reset"; subsequent :cost shows zeros. All 5 categories appeared in the same session: main (twice — cloud + local), delegate, probe (x2 from :safety check). Warn-threshold firing already verified in commit #3 + #4. HELP gains 3 :cost lines. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:02:24 +00:00
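The R7 sort key and the R10 formatting choice from commit `0d6ff93134` can be illustrated with a short sketch. The row shape (`model`, `category`, `cost`) and the `%-18s` category width are assumptions for this example; the `%-26s` model width and `$%.6f` precision come from the message.

```lua
-- Hypothetical sketch: deterministic three-level sort for :cost detail rows.
local function sort_rows(rows)
  table.sort(rows, function(a, b)
    -- table.sort is not stable, so ties must be broken explicitly.
    if a.cost ~= b.cost then return a.cost > b.cost end       -- cost desc
    if a.model ~= b.model then return a.model < b.model end   -- model asc
    return a.category < b.category                            -- category asc
  end)
  return rows
end

-- Six decimals keep sub-cent cloud costs visible next to local $0.
local function format_row(r)
  return string.format("%-26s %-18s $%.6f", r.model, r.category, r.cost)
end

local rows = sort_rows({
  { model = "local-model", category = "main",  cost = 0 },
  { model = "cloud-model", category = "probe", cost = 0.000128 },
  { model = "cloud-model", category = "main",  cost = 0.000219 },
})
for _, r in ipairs(rows) do print(format_row(r)) end
```

With four decimals a $0.000028 call would print as `$0.0000`, which is why the message widens the format instead of relying on a cost-equals-zero check to spot local rows.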
marfrit	b30212af0f	safety + repl: opts.category for Norris + probe (Phase 7 commit #4 ) Closes the last two broker call sites that flow through safety.lua. Together with commits #1-#3, all 7 broker call sites in aish now attribute usage to the cost accumulator under the right category. Changes: safety.lua: - llm_probe (the YES/NO destructive checker) — broker.chat call gains opts.category = "probe". Captures (text, usage) via (reply, second) and, when opts.on_usage is provided AND the call succeeded, routes second through opts.on_usage(model, category, payload). N4 signature chain: opts already flowed through llm_second_opinion -> M.is_destructive from #52's work; opts.on_usage rides along naturally with no further signature change. - M.norris_step (Norris main broker round-trip): * opts to broker.chat_stream gains category = "norris" * probe_opts (passed to is_destructive inside the loop) gains on_usage = helpers.on_usage so the LLM probe's cost lands under "probe" too * on_delta wrapper adds elseif kind == "usage" branch that calls helpers.on_usage(payload.model, payload.category, payload). Coexists cleanly with the existing text (rehydrator) and tool_call branches. repl.lua: - Norris helpers table gains on_usage = _record_usage. The R5 central chokepoint (commit #3) does the warn-threshold check AND ctx:add_usage atomically. - :safety check meta's probe_opts always carries on_usage now (independently of whether secrets_session is set). secrets-aware scrub_msgs/rehydrate added conditionally as before. E2E verified against live broker (safety.llm_model = "cloud"): - :safety check ls -la /tmp -> 2 cloud probe calls - "[aish] session cost $0.000128 has crossed warn_at_dollars=$0.000100" - probe category visible in accumulator (would appear in :cost detail once commit #5 ships the meta). Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:01:21 +00:00
marfrit	8adebd52cc	repl: _record_usage helper + opts.category at 5 sites (Phase 7 commit #3 ) Wires broker.lua's on_delta("usage", payload) and broker.chat's (text, usage) return to the ctx accumulator via a single chokepoint. Changes: - Forward decl `local _record_usage` near _bg_spawn — same pattern; the summarize-on-evict closure in make_summarize_fn (built at line 299) needs lexical access to _record_usage (assigned at line 695), so forward-declare and assign-without-`local`. - _record_usage(model, category, usage) — R5 central chokepoint: routes to ctx:add_usage, then checks the per-threshold warn state. R4: cost_warn_state has two independent flags (dollars and tokens) so first-to-fire doesn't suppress the other. R10: warn message uses $%.6f for sub-cent precision. - call_broker wrapper: wrapped on_delta now branches on kind == "usage" -> _record_usage(payload.model, payload.category, payload). R2: keys by payload.model (set inside broker.lua from model_cfg.model). When fallback fires, broker is called with fb_cfg, so payload.model IS the fallback's name automatically — wrapper doesn't track primary-vs-fallback itself. - 5 caller sites wired with opts.category: ask_ai call_broker -> category="main" summarize-on-evict -> category="summarize" DELEGATE: handler -> category="delegate" :memory summarize -> category="memory_summarize" :delegate meta -> category="delegate" - All 4 broker.chat call sites switched from local reply, err = broker.chat(...) to local reply, second = broker.chat(...) branching on reply nil-ness to interpret second (err on failure, usage on success). Captured usage routes through _record_usage. E2E verified against live cloud broker: - cloud prompt -> reply "Hi! 👋" - Warn fired: "session cost $0.000219 has crossed warn_at_dollars=$0.000010" - R10 sub-cent precision visible in both numbers. Norris + safety paths still untouched — commit #4 wires those. Regression: test_safety 87/87, test_router_model 31/31, repl loads. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:00:06 +00:00
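The R5 chokepoint with R4's two independent warn flags, as described in commit `8adebd52cc`, might look roughly like the sketch below. The factory name `make_record_usage`, the `emit` callback, the `>=` comparison, and the wording of the token warning are assumptions; only the dollar warning text and the `$%.6f` format are taken from the messages in this log.

```lua
-- Hypothetical sketch: single chokepoint that records usage, then checks
-- each warn threshold independently so one firing never hides the other.
local function make_record_usage(ctx, cost_cfg, emit)
  return function(model, category, usage)
    ctx:add_usage(model, category, usage)

    local state = ctx.cost_warn_state
    local dollars = ctx:total_cost()
    if cost_cfg.warn_at_dollars and not state.dollars
       and dollars >= cost_cfg.warn_at_dollars then
      state.dollars = true   -- one-shot until an explicit :cost reset
      emit(string.format(
        "[aish] session cost $%.6f has crossed warn_at_dollars=$%.6f",
        dollars, cost_cfg.warn_at_dollars))
    end

    local prompt, completion = ctx:total_tokens()
    local tokens = prompt + completion
    if cost_cfg.warn_at_tokens and not state.tokens
       and tokens >= cost_cfg.warn_at_tokens then
      state.tokens = true
      -- Message wording below is made up for this sketch.
      emit(string.format(
        "[aish] session tokens %d has crossed warn_at_tokens=%d",
        tokens, cost_cfg.warn_at_tokens))
    end
  end
end
```

Keeping the threshold check here, rather than inside the accumulator, is what lets context.lua stay unaware of how warnings are rendered.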
marfrit	7b4a9becc2	context: cost/usage accumulator (Phase 7 commit #2 ) Adds the per-conversation accumulator that broker.lua's on_delta("usage", ...) payload feeds into. No callers yet — commit #3 wires the broker callback to ctx:add_usage in repl.lua, commit #4 in safety.lua. Changes: - Context.new: new fields `usage_totals = {}` and `cost_warn_state = { dollars = false, tokens = false }`. R4: two independent flags so warn_at_dollars firing doesn't suppress warn_at_tokens (or vice versa). - Context:add_usage(model_name, category, usage): Increments usage_totals[model_name][category] slot. R6: when usage.cost is nil (local llama.cpp per B3), sets a sticky `is_local = true` flag on the slot AND does NOT add to cost (preserves the local-vs-cloud-zero distinction for :cost detail annotation). When usage.cost is a number (cloud), accumulates. - Context:total_cost() / total_tokens() — pure-Lua summation across all slots; total_tokens returns (prompt, completion). - Context:reset_usage() — explicit :cost reset path; zeros usage_totals AND clears both flags atomically. - Context:reset() — R8 parity: does NOT clear usage_totals OR cost_warn_state. Matches the Phase 4 memory_items / Phase 6 project rule ("ambient context survives a user-driven conversation reset"). Smoke verified (20/20 unit cases): - Empty zeros; cloud cost accumulation; local nil-cost preserves is_local=true sticky; calls counter; cost summation across multiple cloud calls; is_local sticky after a later nil-cost call on a cloud slot; separate slots per (model, category); :reset preserves; :reset_usage zeros both totals and flags. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:57:56 +00:00
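The accumulator rules in commit `7b4a9becc2` (sticky `is_local`, nil cost never added, `:reset_usage` clearing totals and flags together) can be sketched as below. Slot field names such as `calls`, `prompt_tokens`, and `completion_tokens` are assumptions for illustration; the message confirms only that a calls counter, token totals, a cost sum, and the `is_local` flag exist.

```lua
-- Hypothetical sketch of the per-(model, category) usage accumulator.
local Context = {}
Context.__index = Context

function Context.new()
  return setmetatable({
    usage_totals = {},
    cost_warn_state = { dollars = false, tokens = false },
  }, Context)
end

function Context:add_usage(model_name, category, usage)
  local by_model = self.usage_totals[model_name] or {}
  self.usage_totals[model_name] = by_model
  local slot = by_model[category]
  if not slot then
    slot = { calls = 0, prompt_tokens = 0, completion_tokens = 0, cost = 0 }
    by_model[category] = slot
  end
  slot.calls = slot.calls + 1
  slot.prompt_tokens = slot.prompt_tokens + (usage.prompt_tokens or 0)
  slot.completion_tokens = slot.completion_tokens + (usage.completion_tokens or 0)
  if usage.cost == nil then
    slot.is_local = true              -- sticky: never cleared by later calls
  else
    slot.cost = slot.cost + usage.cost
  end
end

function Context:total_cost()
  local sum = 0
  for _, by_model in pairs(self.usage_totals) do
    for _, slot in pairs(by_model) do sum = sum + slot.cost end
  end
  return sum
end

-- Explicit :cost reset path: totals and both warn flags go together.
function Context:reset_usage()
  self.usage_totals = {}
  self.cost_warn_state = { dollars = false, tokens = false }
end
```

Tracking `is_local` separately from `cost` is what preserves the difference between "cloud call that cost $0" and "local call with no cost field".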
marfrit	7364963b00	broker: usage capture + opts widening (Phase 7 commit #1 ) Foundation for Phase 7. broker.chat_stream now emits a third on_delta kind ("usage") after the stream completes successfully; broker.chat returns (text, usage). Backward-compatible — existing callers that ignore the new kind / second value continue working via Lua's drop-extra-returns semantics. Changes: - build_request widens (A3 + R3) — `(model_cfg, msgs, stream, opts)`. opts.tools / opts.max_tokens / opts.include_usage / opts.category all live inside opts now. Both internal call sites updated. - opts.include_usage defaults to true for streaming requests; sets `stream_options: { include_usage: true }` in the request body. B1: required for local llama.cpp to emit usage; cloud honors as a no-op (emits anyway). - on_event captures `doc.usage` into a closure-local `final_usage`. N1: the check is INDEPENDENT of the choice/delta branches — local emits usage on choices=[] chunks (choice nil) while cloud emits with non-empty choices + finish_reason. Both shapes funnel here. - After curl.post_sse returns successfully (NOT on transport/api errors), if final_usage is set, emit on_delta("usage", {prompt_tokens, completion_tokens, total_tokens, cost, model, category}). cost is nil for local (R6 preserves the nil vs 0 distinction the accumulator needs). model is model_cfg.model — caller-stable per B4 + R2 so call_broker's fallback retry attributes usage to the fallback's model name without wrapper-side tracking. - M.chat (R1 — BLOCKER fix): on_delta now also captures kind=="usage" alongside "text"; M.chat returns (text, usage). Without this fix 4 of 5 non-streaming categories (summarize / delegate / memory_summarize / probe) would silently report zero usage. Smoke verified against live hossenfelder:8082: - CLOUD chat -> (text, usage); cost=2.9e-05, model=anthropic/... 
- LOCAL chat -> (text, usage); cost=NIL (correct per R6), model=qwen-coder-7b-snappy-8k - CLOUD stream -> on_delta("usage", {...}) with category="test" echoed; model name caller-stable. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:57:14 +00:00
marfrit	d4c20f09df	docs/PHASE7: review fold-in — 3 BLOCKERs + 6 CONCERNs + 5 NITs Sonnet-reviewed (per the reviews-use-sonnet feedback memory). BLOCKERs (RESOLVED in-place): R1. M.chat would silently return (text, nil) for ALL non-streaming callers — 4 of 5 categories (summarize/delegate/memory_summarize/probe) flow through broker.chat, NOT chat_stream. §4 now shows the explicit M.chat update that captures kind=="usage" alongside "text" and returns (text, usage). R2. call_broker fallback retry would credit usage to the wrong model name. Fix: broker emits payload.model = model_cfg.model (which IS the fallback's name when called with fb_cfg — chat_stream's upvar). Wrapper keys by payload.model, NOT outer model_name. §4 + §13 commit 3 reflect. R3. build_request has TWO internal callers inside broker.lua itself, not just the public surface. Plan §13 commit 1 risk row now spells this out explicitly so the implementer doesn't read "every caller already passes opts" as "external-only". CONCERNs (FOLDED): R4. Single cost_warn_fired flag covers two thresholds — first-to-fire suppresses the other. Split into ctx.cost_warn_state = { dollars = false, tokens = false }; :cost reset clears both. §7 + §13. R5. Warn-check centralization — single _record_usage helper in repl.lua wraps ctx:add_usage AND does threshold check. safety.lua routes via helpers.on_usage / opts.on_usage callbacks. context.lua stays decoupled from renderer. R6. Preserve nil-vs-0 cost distinction. Accumulator slot gains `is_local = true` (sticky) when ANY recorded usage had cost==nil. `:cost detail` annotation comes from is_local flag, not a fragile cost==0 heuristic. R7. :cost detail sort needs 3-level deterministic key: (cost desc, model asc, category asc) — table.sort is unstable. R8. call_broker fallback passes opts.include_usage unchanged. Documented as known assumption (B1 confirms both backends accept; future-broken fallback can pass include_usage=false). R9. :resume does NOT restore historical usage_totals. Per-turn usage IS in session JSONL for scripting; cross-session aggregation is Q-C2 deferred. Documented in §8. R10. $%.4f loses sub-cent precision (cloud cost 0.000028 -> $0.0000). Widened to $%.6f in §6 + §7 warn message format. NITs (APPLIED): N1. §4 pseudocode comment notes `if doc.usage` branch is independent of choice branch (handles both B2 emission shapes). N2. §2 stale "B7" reference corrected to B3. N3. §13 commit 3 row gains explicit dependency note on commit 1's R1. N4. §13 commit 4 spells out llm_probe -> llm_second_opinion -> M.is_destructive signature chain widening. N5. §3 + §13 commit 6 — PHASE0 §11 amendment already in tree (`3bad07b`); commit 6 must NOT re-apply. PHASE7.md now 803 lines (was 528 after plan). +275/-57. Ready for implementation phase pending user gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:55:59 +00:00
marfrit	0f14dc1727	docs/PHASE7: plan — §13 commit roadmap Status: Analyze -> Plan. Q-C4 was the last open question pending baseline; now resolved per B1 (stream_options accepted by both backends; required for local). §13 Implementation Plan added — 6 commits, bottom-up: 1. broker.lua: usage extraction from final SSE chunk; build_request signature widening to (model_cfg, msgs, stream, opts); on_delta("usage", payload); chat returns (text, usage); opts.category passthrough. 2. context.lua: usage_totals + cost_warn_fired fields; add_usage / total_cost / total_tokens helpers; :reset preserves both. 3. repl.lua: wire opts.category at 5 non-Norris call sites (main, delegate x2, summarize, memory_summarize); on_delta("usage") branch routes to ctx:add_usage. 4. safety.lua: wire opts.category for Norris main broker + is_destructive LLM probe; helpers.on_usage callback convention (no new module dep — matches #52's scrub_msgs pattern). 5. repl.lua: :cost meta surface + warn-threshold check + HELP. 6. config.lua: commented cost example block + PHASE7.md status bump to Implement. Per-commit risk index covers signature-change blast radius, missed call-site lint, and warn-flag one-shot semantics. Lua's multi-return semantics keep broker.chat backwards-compat automatic. Two items left open at plan, resolve at implement: - is_destructive opts.on_usage vs cfg.helpers threading - per-turn verbose mode (deferred; v1 = :cost on demand only) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:50:39 +00:00
marfrit	2244a3f1ee	docs/PHASE7-baseline: live broker probes for usage shape Real probes against hossenfelder.fritz.box:8082 against both backends. Five findings, all align with the formulate/analyze design — no structural changes. B1. `stream_options.include_usage = true` is safely accepted by both backends. REQUIRED for local llama.cpp to emit usage; no-op for cloud (which emits anyway). Default-true is correct. B2. Two emission patterns observed: - Cloud (Bedrock): usage rides the FINAL delta chunk with non-empty `choices` carrying finish_reason. - Local: usage rides a SEPARATE chunk with `choices: []` preceding `[DONE]`. Both shapes are handled by the same `if doc.usage then ...` check; the existing on_event choices-branch short-circuits safely when choices is empty. B3. `cost` field is dollar-denominated (number) and cloud-only. Local returns `timings` instead (perf, not cost). Accumulator captures `usage.cost` as-is; nil treated as 0. :cost detail annotates local lines so $0 isn't misread. B4. `doc.model` in the usage event reflects the upstream-API-version (e.g., Bedrock rewrites `anthropic/claude-haiku-4.5` to `anthropic/claude-4.5-haiku-20251001`). Accumulator keys by caller-intended `model_cfg.model`, NOT `doc.model`, for stable cross-call comparison. B5. Usage event is always the LAST data event before `[DONE]`. Emission of `on_delta("usage", ...)` happens after curl.post_sse returns — one call per stream, after all text + tool_calls. Q-C4 RESOLVED: hossenfelder forwards `stream_options.include_usage` to all backends correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:49:53 +00:00
marfrit	f0bccdec48	docs/PHASE7: analyze — probe broker surface + resolve Qs in-place Status: Formulate -> Analyze (tree at `3bad07b` probed). 11 findings (A1-A11), 5/6 open Qs resolved (Q-C4 deferred to baseline): A1. broker.chat_stream surface clean — usage capture via closure-local + on_delta("usage") emission after curl.post_sse returns. A2. 7 caller sites for opts.category threading (probe / norris / summarize / main / delegate x2 / memory_summarize). A3. build_request signature widens to (model_cfg, msgs, stream, opts) to absorb tools / max_tokens / include_usage / stream_options without further positional growth. A4. Q-C3 RESOLVED: free-form categories (caller decides); matches Phase 6 helpers/skills convention. A5. Q-C5 RESOLVED: warn fires on the call that crossed (no NEXT-call delay). A6. Q-C6 RESOLVED: :reset does NOT clear cost_warn_fired; only :cost reset clears. A7. Norris call-graph rewires (commit `955bd82`) — secrets streaming rehydrator wraps only "text" kind; new "usage" kind passes through unchanged. No new entanglement. A8. ctx.usage_totals survives :reset (R8 parity with memory_items, project). A9. Session JSONL inherits the new field automatically (dkjson opaque encoding). A10. Q-C1 PARTIAL: defensive silent skip when provider omits usage. Real probe required for local model — baseline action. A11. Q-C4 deferred to baseline (real broker probe). §2 build_request row updated to mention the A3 refactor. §11 Open Qs table now shows all 6 with resolutions; only Q-C4 remains as a baseline-time probe. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:49:03 +00:00
marfrit	3bad07b2da	docs/PHASE7: formulate — cost / usage observability Phase 7 formulate manifest + PHASE0 §11 amendment to add the Phase 7 row (substrate amendment per CLAUDE.md §3, lands in the same commit). Four pillars: 1. Usage capture in broker.chat_stream — extract `usage` from the final SSE chunk (OpenAI streaming spec with `stream_options: {include_usage: true}`). Surface via new on_delta("usage", payload) kind. broker.chat returns (text, usage) — backward- compat: existing callers ignore the second value. 2. Per-session accumulator on ctx — ctx.usage_totals[model][category] tables (categories: main / delegate / summarize / memory_summarize / probe / norris, tagged at the call site via opts.category). :reset preserves usage_totals (R8 parity with memory_items / project). Session JSONL gains an optional `usage` field on assistant turns for after-the-fact analysis. 3. :cost meta surface — :cost (summary), :cost detail (per-model + per-category breakdown), :cost reset (zero the meter). Pure-Lua read of ctx.usage_totals; no broker calls. 4. Optional warn thresholds — cfg.cost.warn_at_dollars / warn_at_tokens emit a one-shot status when crossed. Default off; useful with cloud presets configured. Doc covers scope + done-when criteria, tech decisions table, module changes, per-pillar deep dive with code sketches, UX surface, out of scope, risks, 6 open questions to resolve in analyze. Open at formulate: Q-C1 — provider-without-usage handling (local llama.cpp probably) Q-C2 — cross-session persistence (defer to phase 8) Q-C3 — categories closed-set vs free-form Q-C4 — does hossenfelder forward stream_options to all backends? Q-C5 — warn fires on the call that crosses, or the next one? Q-C6 — :reset clears cost_warn_fired too, or only :cost reset? Scope confirmed via AskUserQuestion: cost/usage observability (chosen over project-local config overlay and session search/tag). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:47:58 +00:00
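A minimal sketch of the per-session accumulator (pillar 2) and the one-shot warn threshold (pillar 4) described above. Only `usage_totals`, `cost_warn_fired` and `warn_at_tokens` come from the commit message; the helper names, the slot layout, and the `prompt_tokens` / `completion_tokens` fields (the OpenAI-style usage shape) are assumptions for illustration, not the aish implementation.

```lua
-- Hypothetical helpers; not taken from the aish source.
local function new_ctx()
  return { usage_totals = {}, cost_warn_fired = false }
end

-- Add one call's usage to ctx.usage_totals[model][category].
-- A missing usage table (provider omitted it) is skipped silently.
local function record_usage(ctx, model, category, usage)
  if type(usage) ~= "table" then return false end
  local by_model = ctx.usage_totals[model]
  if not by_model then
    by_model = {}
    ctx.usage_totals[model] = by_model
  end
  local slot = by_model[category]
  if not slot then
    slot = { prompt_tokens = 0, completion_tokens = 0, calls = 0 }
    by_model[category] = slot
  end
  slot.prompt_tokens = slot.prompt_tokens + (usage.prompt_tokens or 0)
  slot.completion_tokens = slot.completion_tokens + (usage.completion_tokens or 0)
  slot.calls = slot.calls + 1
  return true
end

-- Sum prompt + completion tokens over every model and category.
local function total_tokens(ctx)
  local sum = 0
  for _, by_model in pairs(ctx.usage_totals) do
    for _, slot in pairs(by_model) do
      sum = sum + slot.prompt_tokens + slot.completion_tokens
    end
  end
  return sum
end

-- One-shot threshold: returns a status string on the call that crosses
-- the limit, and nil on every other call until the flag is cleared.
local function check_warn(ctx, warn_at_tokens)
  if not warn_at_tokens or ctx.cost_warn_fired then return nil end
  local total = total_tokens(ctx)
  if total >= warn_at_tokens then
    ctx.cost_warn_fired = true
    return string.format("[aish] usage %d tokens crossed warn_at_tokens=%d",
                         total, warn_at_tokens)
  end
  return nil
end
```

Calling `check_warn` right after `record_usage` gives the "fires on the call that crossed" behaviour; a `:cost reset` equivalent would clear both `usage_totals` and `cost_warn_fired`.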
marfrit	955bd82efb	safety + repl: wire secrets into safety.lua (closes #52 ) Closes the last #13 gap — Norris broker call + is_destructive LLM second-opinion probe were the two egress points NOT covered by the scrub-at-egress design in commit `d852aca`. Approach: option (b) per #52's fix sketch — callback-via-helpers/opts. safety.lua does NOT gain a require("secrets") dependency (acceptance criteria 3); integration is purely through the convention the rest of the helpers table already uses. safety.lua changes: - llm_probe gains an opts table. When opts.scrub_msgs is set, the {system, user(cmd)} message pair is scrubbed before broker.chat. When opts.rehydrate is set, the YES/NO reply is rehydrated before parsing (defensive — the verdict shouldn't carry placeholders but rehydration is a safe no-op if it doesn't). - llm_second_opinion threads opts through to llm_probe. - M.is_destructive(cmd, cfg, opts) — opts optional; nil-opts is backwards-compatible (no scrub, original behavior). - M.norris_step: * outbound broker.chat_stream message scrubbed via helpers.scrub_msgs(ctx:to_messages(), model_cfg) when provided. * on_delta wrapped with helpers.streaming_rehydrator():push / :flush so the user sees rehydrated text AND text_parts accumulates rehydrated chunks (parity with ask_ai in repl.lua). * both M.is_destructive call sites (tool_call probe + CMD: probe) now pass probe_opts = {scrub_msgs, rehydrate} when the helpers carry them. repl.lua changes: - Norris helpers table gains scrub_msgs / rehydrate / streaming_rehydrator closures, all nil-safe (return identity / nil when secrets_session is nil). - :safety check meta passes probe_opts to is_destructive when secrets_session is configured. Without secrets, behavior unchanged. Unit-test verified end-to-end: - Stubbed broker.chat captures the messages it receives. - Without opts: probe SEES `ghp_realsecretvalue_...` (control). - With opts: probe sees `$AISH_SECRET_NNN` (correct scrub). 
Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:40:30 +00:00
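The callback-via-opts pattern above can be sketched as follows, assuming a `chat(msgs)` function injected in place of `broker.chat` and a made-up system prompt. The point is that the probe never requires a secrets module: it only calls `opts.scrub_msgs` and `opts.rehydrate` when the caller supplies them, so nil opts keeps the original behaviour.

```lua
-- Hypothetical probe: `chat` stands in for broker.chat, and the system
-- prompt wording is invented for this example.
local function llm_probe(chat, cmd, opts)
  opts = opts or {}
  local msgs = {
    { role = "system", content = "Reply YES if the command is destructive, otherwise NO." },
    { role = "user", content = cmd },
  }
  -- Scrub at egress only when the caller provided a scrubber.
  if opts.scrub_msgs then msgs = opts.scrub_msgs(msgs) end
  local reply = chat(msgs) or ""
  -- Rehydration is a harmless no-op when the verdict has no placeholders.
  if opts.rehydrate then reply = opts.rehydrate(reply) end
  return reply:upper():match("^%s*YES") ~= nil
end
```

A test can stub `chat` to capture the messages it receives, which mirrors the control/scrubbed comparison described in the commit.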
marfrit	ac58b19da2	config + docs/PHASE6: example block + status -> Implement (Phase 6 commit #6 ) R9-resolved single-owner of the status bump (commit #5 didn't touch PHASE6.md per the review fold-in). config.lua: - Commented-out `project = { auto_tree, tree_depth, tree_max_chars }` block with the same shape as the Phase 1-5 example blocks. - Note that :diff / :tree / :highlight all work without config; the `project` block ONLY controls the startup auto-inject. - Note about :highlight v1 having no config flag (runtime-only), cross-references the in-REPL install hint. docs/PHASE6.md: - Status header bumped: "Plan + review fold-in" -> "Implement" - Lists the 6 implement commits in the header for traceability: `c4fc7fd` context: compose_project plumbing `d1dce83` _scan_project_tree + :tree + auto_tree hook `4d5f93a` :diff + _git_clean_cmd (B1 helper) `0d63f01` expand_mentions @<r1>..<r2> tiered resolution `11d0e59` tree-sitter highlighter (renderer fence filter + highlighted dispatch + :highlight meta) this config example + status bump Phase 6 implementation is complete. Next inner-loop step is verify (7) — user-driven smoke tests against the live broker on each pillar plus filing of issues for any defects, then memory-update (8). Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:27:58 +00:00
marfrit	11d0e599cd	repl + renderer: tree-sitter highlighter (Phase 6 commit #5 ) The largest Phase 6 commit — fence-aware stream filter in renderer.lua + external tree-sitter dispatch + :highlight meta in repl.lua. renderer.lua — fence-aware filter wrapping assistant_delta: M.set_highlight(enabled, detected, highlight_fn) Called by repl.lua at startup AND on every :highlight toggle. Stores state in module-locals (off by default). State machine inside _hl_push: outside: pass chunks through; HOLD trailing partial-fence chars (per R1 — local llama.cpp splits ```python as `'``'` then `'`python\n'`, so naive pass-through drops the leading "``" and never recovers). inside: buffer cumulatively until "\n```" appears; emit highlight_fn(body, lang) then the closing fence verbatim. Recursive call handles "rest" after the closing fence. N1: fences only open at start-of-stream OR after a newline (`^```` or `\n```` only). Inline backticks in prose ("use ``` to mark code") do not open a fence. R3 (PTY raw-mode toggle per highlight call): no change here — every executor.exec call already toggles raw-mode (existing behavior since Phase 1). The risk is theoretical; smoke-test interactively after install if multi-fence renders show flicker. assistant_flush handles end-of-stream gracefully: drains any held partial-fence tail OR an unterminated inside-fence buffer. repl.lua — _detect_treesitter + highlighted + :highlight meta: _detect_treesitter() one-shot popen probe of `tree-sitter --version`. Run once at startup; cached as highlight_detected. highlighted(body, lang_tag) R2-placed in repl.lua (has _shq + executor access). Translates the fence tag (`py`, `python`, `lua`, etc.) to a canonical lang via LANG_TAG, picks the canonical extension via LANG_EXTENSION, writes body to a tmpfile with that extension, runs `tree-sitter highlight <tmpfile>` via executor.exec, returns the output. On ANY failure (CLI absent, non-zero exit, empty output), returns `body` unchanged — silent pass-through. 
R4 RESOLVED VIA REAL INSTALL: probed `tree-sitter highlight --help` on noether; confirmed: - NO `--lang` flag exists (formulate-time assumption wrong) - takes a PATH; language inferred from file extension - alternative `--scope source.X` exists but also unreliable without configured grammars Resolution: write tmpfile with `os.tmpname() .. LANG_EXTENSION[lang]` and pass the path. Matches the documented upstream contract. B4-followup: even with the CLI installed, highlighting requires `~/.config/tree-sitter/config.json` parser-directories with cloned + built `tree-sitter-<lang>` grammars. Without parsers, every call exits non-zero and we silently pass through. The :highlight install hint surfaces all three install steps so the user knows what's actually needed. :highlight [on|off|status] meta: no arg -> flip; on/off -> set explicit; status -> report toggle + CLI detection state. When toggled on AND CLI absent: emit a 4-line install hint (CLI install, init-config, grammar clone reminder). When toggled on AND CLI present: emit a 1-line note that parser-directories must be set up for actual highlighting. HELP gains :highlight entry. Tested: 10/10 unit cases on the renderer state machine, including: - plain prose passthrough - single-chunk fence - B2 split fence ("``" + "`python\n" + "x=42" + "\n```") - N1 SOL anchor (mid-line ``` does not open) - trailing \n properly emitted across chunks - SOL-only fence open - prose after closing fence preserved - two fences in one stream - highlight off = passthrough (callback never fires) E2E :highlight meta verified: :highlight status -> off / detected; :highlight on -> toggles + emits parser-dir reminder; :highlight status -> on / detected; :highlight off -> off. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Pillars 1 + 2 + 3 of Phase 6 now all implemented. Commit #6 is config example block + status -> Implement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:27:04 +00:00
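The fence-aware state machine described in this commit can be approximated with the sketch below. It is a simplified stand-in, not the renderer.lua code: `new_fence_filter`, `emit` and the internal field names are hypothetical. It does follow the stated rules: outside a fence it holds back a trailing partial fence, fences open only at start-of-stream or after a newline, and inside a fence it buffers until a closing fence line appears.

```lua
-- Hypothetical sketch. emit(text) receives output; highlight_fn(body, lang)
-- returns the styled body for one complete code block.
local function new_fence_filter(emit, highlight_fn)
  local st = { inside = false, pending = "", sol = true, lang = "" }

  local function out(s)
    if s ~= "" then
      emit(s)
      st.sol = (s:sub(-1) == "\n") -- track start-of-line for the N1 anchor
    end
  end

  -- Process st.pending once; true means another pass may make progress.
  local function step()
    local p = st.pending
    if st.inside then
      -- Prefix "\n" so a fence that closes an empty body is found too.
      local s = ("\n" .. p):find("\n```", 1, true)
      if not s then return false end
      local body = p:sub(1, s - 1)
      st.pending = p:sub(s + 3)
      st.inside = false
      out(highlight_fn(body, st.lang) or body)
      out("```") -- closing fence passes through verbatim
      return true
    end
    local open_at
    if st.sol and p:sub(1, 3) == "```" then
      open_at = 1
    else
      local s = p:find("\n```", 1, true)
      if s then open_at = s + 1 end
    end
    if open_at then
      local nl = p:find("\n", open_at, true)
      out(p:sub(1, open_at - 1)) -- prose before the fence
      if not nl then
        st.pending = p:sub(open_at) -- language tag still incomplete: hold
        return false
      end
      st.lang = p:sub(open_at + 3, nl - 1)
      out(p:sub(open_at, nl)) -- opening fence line verbatim
      st.pending = p:sub(nl + 1)
      st.inside = true
      return true
    end
    -- No opener yet: hold a tail that could still become one.
    local hold = 0
    local tail = p:match("\n`?`?$")
    if tail then
      hold = #tail
    elseif st.sol and (p == "`" or p == "``") then
      hold = #p
    end
    out(p:sub(1, #p - hold))
    st.pending = p:sub(#p - hold + 1)
    return false
  end

  local filter = {}
  function filter.push(chunk)
    st.pending = st.pending .. chunk
    while step() do end
  end
  function filter.flush()
    -- Drain a held tail or an unterminated fence body unhighlighted.
    local rest = st.pending
    st.pending, st.inside = "", false
    out(rest)
  end
  return filter
end
```

Feeding the B2 split-fence chunks (two backticks, then the third backtick plus `python\n`, then `x=42`, then the closing fence) calls `highlight_fn` once with body `x=42\n` and lang `python`; mid-line backticks in prose pass through without triggering it.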
marfrit	0d63f01601	repl: expand_mentions tiered @<r1>..<r2> diff retry (Phase 6 commit #4 ) Per A6 (tiered resolution): @<token> tries file lookup first; if the file doesn't exist AND the token contains "..", retry as a git ref-range and substitute with a fenced `diff` block. Preserves the existing peel-on-trailing-punct logic (e.g., `@HEAD~1..HEAD,` peels the comma, resolves the ref, restores the comma after the closing fence). Resolution order for @<token>: 1. io.open(token, "rb") -- file lookup, with trailing-punct peel 2. if (1) fails and token contains "..": git --no-pager -c color.ui=never diff <r1>..<r2> on exit 0 + non-empty body: substitute as ```diff fenced block 3. else: leave literal `@token` + emit "[aish] @X: not found" status Examples: @README.md -> file (path branch) @../sibling.txt -> file (path branch; `..` only triggers retry when path lookup FAILS, so existing paths with `..` segments are unaffected) @HEAD~1..HEAD -> diff (path fails, ref succeeds) @origin/main..feature -> diff (path fails — no such literal file; ref succeeds; `/` in ref is fine because we don't use the path's `/`-absence as a discriminator) @nonsense..gibberish -> literal preserved (both fail) Required restructuring: - _shq and _git_clean_cmd lifted from M.run closure scope to module scope (above expand_mentions). Single source of truth for the B1 prefix shared with commit #3's :diff. The in-M.run duplicates are removed. - expand_mentions now references `executor` (already required at module scope on line 7) for the diff retry. Status messages updated: - File expansion: "@<path> expanded (N bytes, truncated)" (existing) - Diff expansion: "@<path> expanded (N bytes, diff)" (new) Tested with the 7 existing #7 cases + 7 new diff-retry cases (14/14): ref-range expansion shape, body contains `diff --git`, trailing prose preserved, @../path stays as file (not diff), neither-path- nor-ref preserves literal, trailing-comma peel composes with ref retry. 
Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:20:25 +00:00
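The tiered resolution order above can be sketched like this. The file reader and the diff runner are injected so the logic can be exercised without a repository; `resolve_mention`, the punctuation set, and the `(text, kind)` return shape are assumptions for illustration rather than the expand_mentions code.

```lua
-- Hypothetical default reader: returns file contents or nil.
local function read_file(path)
  local f = io.open(path, "rb")
  if not f then return nil end
  local data = f:read("*a")
  f:close()
  return data
end

-- token: the text after "@". reader(path) -> string|nil.
-- git_diff(range) -> out, exit_code (e.g. a wrapper around executor.exec).
-- Returns the replacement text and which branch produced it.
local function resolve_mention(token, reader, git_diff)
  -- 1. File lookup, first verbatim, then with trailing punctuation peeled.
  local body = reader(token)
  if body then return body, "file" end
  local core, punct = token:match("^(.-)([%.,;:!%?]*)$")
  if punct ~= "" then
    body = reader(core)
    if body then return body .. punct, "file" end
  end
  -- 2. Ref-range retry only when the path failed AND the token has "..".
  if core:find("..", 1, true) then
    local out, code = git_diff(core)
    if code == 0 and out ~= "" then
      if out:sub(-1) ~= "\n" then out = out .. "\n" end
      return "```diff\n" .. out .. "```" .. punct, "diff"
    end
  end
  -- 3. Neither worked: leave the literal mention in place.
  return "@" .. token, "literal"
end
```

Because the ref-range branch runs only after the path lookup fails, an existing `../sibling.txt` never reaches `git_diff`, which matches the examples listed in the commit.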
marfrit	4d5f93aaa5	repl: :diff meta + _git_clean_cmd helper (Phase 6 commit #3 ) User-driven git diff injection. The model sees the diff on the next ask_ai turn through the existing exec_output channel. Changes: - _git_clean_cmd(subcmd_and_args) helper near _scan_project_tree. B1: every git invocation that flows into context MUST use `--no-pager -c color.ui=never`. Forkpty makes git think stdout is a TTY, enabling both color and the pager's keypad/line-clear escapes — these would pollute the captured context block. The helper is the single chokepoint; commit #4's @<r1>..<r2> retry will reuse it. - :diff [<args>] meta: - Reads cwd at meta invocation (R6: differs from :tree's scan-time cwd capture; documented in §5). - Runs `_git_clean_cmd("diff " .. args)` via executor.exec. - Empty output -> "(no diff): <label>" status, no context append. - Non-zero exit -> "diff failed (exit N): <label>" status, no context append. git's stderr already streamed to the user via executor.exec's live multiplex, so the failure reason is visible. - Success -> appends "[diff <label>]\n<output>" via ctx:append_exec_output. Label is "(working tree)" for empty args, else verbatim args. - Status confirms injection size: "diff injected: <label> (N bytes)". - HELP gains :diff line with three example arg shapes; N3-resolved (no `staged` alias — the meta is thin pass-through to git's grammar). Smoke verified across four scenarios in an ephemeral test repo: - Working-tree dirty -> 110-byte diff injected, no ANSI escapes - --cached -> 118-byte staged diff injected, clean - garbage..nonexistent -> exit 128, status + skip - Clean working tree -> "(no diff)", status + skip Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:17:18 +00:00
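As a rough illustration of the single-chokepoint helper and the three `:diff` outcomes listed above, the sketch below separates command construction from outcome handling. `diff_outcome` and its return shape are hypothetical; the flag prefix and the status strings follow the commit message.

```lua
-- Every git invocation whose output flows into context goes through here.
local function git_clean_cmd(subcmd_and_args)
  return "git --no-pager -c color.ui=never " .. subcmd_and_args
end

-- Hypothetical outcome classifier: given the args the user typed plus the
-- captured output and exit code, return (status_line, context_block|nil).
local function diff_outcome(args, out, code)
  local label = (args == "") and "(working tree)" or args
  if code ~= 0 then
    return string.format("diff failed (exit %d): %s", code, label), nil
  end
  if out == "" then
    return "(no diff): " .. label, nil
  end
  return string.format("diff injected: %s (%d bytes)", label, #out),
         "[diff " .. label .. "]\n" .. out
end
```

A caller would run `git_clean_cmd("diff " .. args)` through its executor and append the second return value to context only when it is non-nil.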
marfrit	d1dce832da	repl: _scan_project_tree + :tree meta + auto_tree (Phase 6 commit #2) First user-visible Phase 6 verb. Builds on commit #1's compose_project plumbing: sets ctx.project from either the :tree meta or the cfg.project.auto_tree startup hook. Changes: - _scan_project_tree(dir, opts) helper near _run_hook: git -C <dir> ls-files --cached --others --exclude-standard when <dir> is inside a git repo (N4: no subshell); find <dir> -mindepth 1 -maxdepth <depth+1> -type f -not -path '*/.*' otherwise. Returns (body, info={file_count, truncated, in_git}). Sorted paths, truncated to max_chars (default 4096 per cfg). - :tree [<depth>|refresh|off] meta: no arg -> scan with config defaults, resets _project_opts; <N> -> scan with depth=N, caches as _project_opts; refresh -> re-scan with cached _project_opts (else defaults); off -> clear ctx.project AND ctx._project_opts (R5). Status line reports file count + truncation flag + which backend fired (git/find). - cfg.project.auto_tree startup hook before the main loop: if true, scan libc.getcwd() once and set ctx.project. Failures status-logged once; REPL continues. Default off (existing configs unchanged). - HELP updated with three :tree lines. Plan §12 deliberately defers the config.lua example block to commit #6 along with the status header bump (R9 single-owner). Smoke (aish repo cwd): - :tree no-arg -> "33 files (git ls-files)" - :tree refresh -> same - :tree off -> "project tree cleared" - :tree 1 -> rescans - cfg.project.auto_tree=true at startup -> auto-injected status visible Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:14:36 +00:00
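The "sorted paths, truncated to max_chars" step can be sketched as below. The commit does not say where the cut happens, so this version assumes truncation on whole-path boundaries; `build_tree_body` is a hypothetical name, and only `file_count`, `truncated` and the 4096 default come from the message.

```lua
-- Hypothetical: turn a list of paths into the [project] body.
-- Assumption: paths are dropped whole once the cap would be exceeded.
local function build_tree_body(paths, max_chars)
  max_chars = max_chars or 4096
  local sorted = {}
  for i, p in ipairs(paths) do sorted[i] = p end -- copy; leave input intact
  table.sort(sorted)
  local kept, used, truncated = {}, 0, false
  for _, p in ipairs(sorted) do
    local cost = #p + 1 -- the path plus its newline separator
    if used + cost > max_chars then
      truncated = true
      break
    end
    kept[#kept + 1] = p
    used = used + cost
  end
  return table.concat(kept, "\n"),
         { file_count = #sorted, truncated = truncated }
end
```

The `info` table here omits `in_git`, since that flag depends on which backend (git ls-files or find) produced the path list.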
marfrit	c4fc7fde01	context: [project] block plumbing (Phase 6 commit #1 ) Foundation for Phase 6 — adds the field + composer + composition order with no callers yet. Nothing sets ctx.project; the meta hookup and startup auto-inject land in commit #2. Changes: - Context.new gains `project` (string, nil) and `_project_opts` (cached scan opts for `:tree refresh`; R7). - compose_project(text) helper mirrors compose_background / compose_summary. Returns "" for nil/empty; otherwise emits "\n\n[project]\n" + text. - to_messages inserts compose_project BETWEEN compose_background and compose_summary so the model reads memory facts -> project tree -> earlier conversation -> NORRIS suffix. - Same Norris-suppression guard as the other two dynamic blocks (R-C1 / R-C4 parity; planner stays on goal anchor). - Context:reset preserves ctx.project (R8 — matches the Phase 4 memory_items rule; startup-injected facts survive a user-driven context reset). Smoke verified (14/14 inline cases): - project nil -> no [project] block in sys_content - project set -> block present with contents - ordering: [background] < [project] < [earlier conversation summary] - norris_active suppresses all three; NORRIS suffix still appears - :reset clears turns/pending_exec_output/summary; preserves memory_items AND project Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:08:54 +00:00
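A small sketch of the composer and the composition order described above. `compose_project` follows the stated contract (empty string for nil/empty, otherwise `"\n\n[project]\n"` plus the text); `compose_block`, `system_content`, and the `base_prompt`, `background` and `norris_suffix` fields are placeholders invented for the example.

```lua
-- Generic block composer; returns "" when there is nothing to show.
local function compose_block(tag, text)
  if text == nil or text == "" then return "" end
  return "\n\n[" .. tag .. "]\n" .. text
end

local function compose_project(text)
  return compose_block("project", text)
end

-- Hypothetical system-prompt assembly: memory facts, then project tree,
-- then the earlier-conversation summary. Under Norris all three dynamic
-- blocks are suppressed and only the suffix is appended.
local function system_content(ctx)
  local parts = { ctx.base_prompt or "" }
  if ctx.norris_active then
    parts[#parts + 1] = ctx.norris_suffix or ""
  else
    parts[#parts + 1] = compose_block("background", ctx.background)
    parts[#parts + 1] = compose_project(ctx.project)
    parts[#parts + 1] = compose_block("earlier conversation summary", ctx.summary)
  end
  return table.concat(parts)
end
```

Keeping the nil/empty check inside the composer means callers can append every block unconditionally and still get no `[project]` header when `ctx.project` is unset.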
marfrit	261b230be8	docs/PHASE6: review fold-in — 2 BLOCKERs resolved, 7 CONCERNs, 6 NITs Independent agent review of PHASE6 (manifest + baseline + plan at `4407029`). Status header: Plan -> Plan + review fold-in. BLOCKERs (RESOLVED in-place): R1. §4 fence detector's `outside`-state dropped the leading `'``'` chunk of a split fence — contradicted B2's local-model split-fence requirement (4-char median chunk size). Algorithm rewritten: outside-state now holds a tail (up to 10 chars) when the chunk's suffix could be a fence prefix; flushes on next push. Same accumulator pattern as the secrets streaming rehydrator. R2. `highlighted()` file placement was ambiguous (§3 vs §12). Lives in repl.lua (where _shq and executor are accessible); renderer.lua exposes set_highlight(enabled, detected, highlight_fn) and calls back. Keeps renderer.lua free of the executor require. CONCERNs (FOLDED): R3. PTY raw-mode toggle on every code-block render — smoke-test for cursor flicker / SIGWINCH races before locking in. Risk row 5. R4. tree-sitter highlight --lang X grammar is UNVERIFIED — upstream CLI canonically takes a path with extension. Implement-time check required; fallback path documented (extension-based tmpfile + path arg). Added to risk row 5 + open-at-plan. R5. :tree off semantics clarified — one-shot clear of ctx.project + ctx._project_opts; no "disabled" flag. R6. cwd-coupling difference between :diff (call-time) and :tree (scan-time) now documented in §5. R7. :tree refresh opts caching specified — caches ctx._project_opts; `:tree refresh` reuses last explicit opts. R8. :reset preserves ctx.project (parity with memory_items per Phase 4). §12 commit 1 smoke updated. R9. Status-bump duplication between §12 commits 5e and 6 resolved — commit 6 owns the bump. NITs (APPLIED): N1. §4 algorithm pseudocode now includes SOL/post-newline anchor (mid-line backticks in prose don't open a fence). N2. 
_detect_treesitter() gained a comment explaining the popen pattern doesn't gate on exit code (B3). N3. :diff staged shorthand dropped — meta is a thin pass-through to git's own grammar. N4. _scan_project_tree switched from `cd && git ...` to `git -C <dir> ...` — no subshell, more idiomatic. N5. Open-at-plan dir-arg bullet dropped (already decided in §6); replaced with R3 + R4 implement-time verification items. N6. §11 wording on #52 left as-is (cosmetic only). PHASE6.md now 896 lines (was 701 after plan). +264/-69. Ready for implementation phase 6 of the inner loop pending user gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:06:19 +00:00
marfrit	4407029296	docs/PHASE6: plan - fold B1/B3/B4 + add §12 commit roadmap Status header: Analyze -> Plan. Baseline findings folded into the design sections: §1 (highlighter pillar) gains B4: tree-sitter absent on every probed host; :highlight on emits install-hint when missing. §4 (highlighter sketch) revised per B3: io.popen():close() doesn't expose exit codes in LuaJIT. Route via executor.exec("cat tmp | tree-sitter ...") which uses pty.spawn+waitpid and returns code reliably. Tmpfile design retained (avoids ARGMAX + shell-escape). §5 (:diff impl + @<r1>..<r2> retry) revised per B1: every git invocation must use `--no-pager -c color.ui=never` to suppress the color/keypad/line-clear escapes forkpty triggers. Factored recommendation: helper `_git_clean_cmd(subcmd)` shared by :diff and the @-mention diff retry. New §12 Implementation Plan, 6 commits, bottom-up: 1. context.lua: ctx.project + compose_project + composition order 2. repl.lua: _scan_project_tree helper + :tree meta 3. repl.lua: :diff meta + _git_clean_cmd helper (B1) 4. repl.lua: expand_mentions tiered resolution (@<r1>..<r2> per A6) 5. renderer.lua + repl.lua: tree-sitter detect + fence filter + :highlight meta (B3-revised tmpfile dispatch) 6. config.lua project example + status -> Implement Per-commit risk index + smoke criteria. Highlighter (commit 5) is the largest experimental surface, placed last so the rest of Phase 6 ships even if highlighter slips. Order is independent enough that swapping 3<->4 or 5<->6 doesn't break anything; bottom-up keeps each commit individually green. Things deliberately not split: _shq reuse, lang map duplication for v1, streaming-rehydration order (rehydrate -> highlight -> emit inherits naturally from existing chunk pipeline). Two items open at plan time, resolve at implement: _scan_project_tree dir-arg vs hardcoded getcwd; :highlight status probing tree-sitter --print-langs. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:01:40 +00:00
marfrit	9f50206ca6	docs/PHASE6-baseline: substrate probes ahead of implementation Six findings from probing the world before tree-sitter / diff / project tree implementation lands: B1. `git` subcommands through executor.exec emit ANSI color + DEC keypad/line-clear escapes by default (forkpty enables interactive mode). `:diff` impl MUST use `git --no-pager --color=never <args>`. Same flags apply to any future git verbs. B2. SSE chunk size envelope: local llama.cpp delivers tiny chunks (median 4 chars, max 13) AND splits code fences across boundaries (`'``'` then `'`'`). Cloud (Anthropic via OpenRouter) delivers big chunks (median 26 chars), fences intact. The §4 fence-aware filter accumulator design covers both; confirmed necessary by local-model behavior. B3. LuaJIT io.popen():close() does NOT return exit codes (Lua 5.1 contract, not 5.2+). Breaks the A4 highlighter resolution. Revised: route via `executor.exec("cat tmp | tree-sitter ...")` which uses pty.spawn + waitpid and returns (out, code) reliably. B4. tree-sitter CLI absent on both probed hosts (noether, higgs). Highlighter is opt-in by design; absent-CLI path should emit a clear install hint, not silently no-op. B5. Project-tree envelope: aish 32 files / 449 chars; similar local repos 15-25 files; scan time ~1-5ms. The 4096-char default cap accommodates ~290 typical paths. Large repos handled via tree_depth or cap tuning per existing §9 risk row. B6. os.tmpname returns POSIX /tmp/lua_XXXXXX paths; acceptable for the B3-revised tmpfile-roundtrip pattern. No structural changes to formulate/analyze. B1, B3, B4 will fold into PHASE6.md §4 / §5 / §1 during plan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:58:56 +00:00
marfrit	ad52fe4538	docs/PHASE6: analyze — substrate probes + Q resolutions in-place Analyze pass against tree at `f596743`. All 6 formulate-time questions resolved without structural changes; pillar shapes intact. A1. renderer.lua surface clean — assistant_delta/flush accumulate via stream_buf; fence-aware filter slots in between chunk receipt and emit without touching anything else. A2. executor.exec via pty.spawn already handles git diff / find; cwd-aware (inherits from libc.chdir). No new IO model. A3. context composition order locked: base + [background] + [earlier summary] + NORRIS. [project] inserts between [background] and [earlier summary]; Norris-suppression guard inherited. A4. Q-H1 RESOLVED: tmpfile roundtrip for tree-sitter popen3 (io.popen("w") + redirect stdout to tmp file; io.open reads back). Avoids ARGMAX + shell-escape complexity. Cost ~one syscall per code block. A5. Q-D1 RESOLVED: no confirm gate on :diff. git diff is read-only; matches :history / :sessions / :safety check. A6. Q-D2 RESOLVED: tiered @<token> resolution — file lookup first, then ref-range retry when path fails AND token contains "..". @origin/main..feature works naturally; @../sibling.txt unaffected. A7. Q-H2 RESOLVED: highlighter is assistant-output only in v1. @-mention echo via readline is a different code path; deferred to v2 (added to §8 out-of-scope). A8. Q-T1 RESOLVED: project tree captured at scan time, not auto- refreshed on cd. v1 verb is :tree refresh; cd-intercept auto- refresh deferred to v2. A9. Q-T2 RESOLVED: .gitignore via `git ls-files --exclude-standard` in repos; find fallback outside. Custom globs deferred to v2. A10. expand_mentions punct-peel doesn't strip "/", so HEAD~1..HEAD, peels comma cleanly and the diff retry catches the cleaned token. A11. Auto-injection ordering: memory load → tree scan → first ask_ai. Composition reads memory facts before file tree. A12. [project] Norris-suppressed (parity with R-C1/R-C4). 
§3 module-changes table: context.lua row updated (project string + compose_project + ordering note + Norris suppression). §4 highlighter code sample replaced with the tmpfile-roundtrip resolved form. §5 @-mention section rewritten as tiered-resolution with worked examples. §8 out-of-scope gained three v2-polish items (echo highlight, cd- intercept auto-refresh, custom globs) so they're tracked. §10 Open Questions table now shows all 6 Qs with their resolutions inline. §9 Risks row for @-mention collision updated to point at A6. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:53:58 +00:00
marfrit	f596743834	docs/PHASE6: formulate — tree-sitter highlight + diff + project tree Phase 6 formulate manifest. Three pillars per PHASE0 §11 row 6: 1. Tree-sitter syntax highlighting hooks External `tree-sitter` CLI when present, no-op otherwise. Honors PHASE0 §3 (no compiled extensions). Toggleable at runtime; off by default so existing UX is unchanged. 2. Diff-aware code injection :diff [args] meta + @<ref1>..<ref2> @-mention extension. Shells out to `git diff`; output flows through the existing exec-output context channel. 3. Project-level file-tree context :tree meta + optional cfg.project.auto_tree startup inject. git ls-files in a repo, find fallback otherwise. Composed into the system prompt as a new [project] block between [background] and [earlier summary]. Suppressed under Norris (R-C1 / R-C4 parity). Module changes: renderer.lua (fence-aware highlight filter), context.lua (compose_project), repl.lua (3 new metas, 3 new helpers, expand_mentions extension). No new module files in v1. Doc covers: scope + done-when criteria, tech decisions table, module changes table, per-pillar deep dive with example code, UX surface summary, out-of-scope list, risks, and 6 open questions to resolve in analyze (Q-H1/Q-H2 highlighter, Q-D1/Q-D2 diff, Q-T1/Q-T2 tree). Scope confirmed via AskUserQuestion: all three subsurfaces in scope; tree-sitter approach is external CLI w/ no-op fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:47:00 +00:00
marfrit	d852acadc2	repl: wire #13 secrets — scrub outbound, rehydrate stream + tool args Plumbs the secrets.lua module (commit `e4b818b`) into the conversation pipeline. Hook points: ask_ai — scrub_messages(ctx:to_messages(), mode) before call_broker; rehydrate streamed deltas via streaming_rehydrator so the user sees real values while text_parts accumulates rehydrated chunks (final_resp is plain — CMD: / DELEGATE: extractors see plain values) MCP dispatch — dispatch_tool_call rehydrates the args table before sess:call_tool so the trusted MCP server receives real values (the model emitted placeholders because it saw a scrubbed context) DELEGATE: & :delegate — scrub sub_msgs before broker.chat; rehydrate sub_text before appending to context, so future turns see real values restored Phase 5 summarize-on-evict — scrub sum_msgs before broker.chat; rehydrate the reply that becomes ctx.summary :memory summarize — same scrub + rehydrate pair Mode resolution per call: model_cfg.redact → config.secrets.default → "vault+autodetect" if vault loaded, else "off". ctx storage convention: PLAIN values throughout. The scrub happens at the egress (broker call) per the active redact mode; ctx.turns never holds placeholders for content the user typed or executor produced. The model's own emissions (assistant tool_call arguments) may carry placeholders because the model saw the scrubbed context — rehydrated at MCP dispatch and otherwise harmless on re-serialization (idempotent re-scrubbing). New meta: :secrets [status] vault entries, placeholders allocated this session, active broker mode. Never prints actual values (vault file is itself a secret per gotcha 7). :secrets check <text> dry-run scrub against the active broker's mode — shows the output transformation. Documented in config.lua with a commented-out block + per-broker redact field example. 
Deferred to a follow-up issue (clearly scoped): - safety.lua broker call sites (Norris main loop, is_destructive LLM second-opinion probe) — same wiring pattern, but they don't currently see secrets_session; needs threading through helpers. - @-mention file content is appended PLAIN to ctx and scrubbed at egress alongside the rest of the user turn (covered by the ask_ai scrub). - exec output streamed live to terminal is pre-scrub (user sees real values in their own shell — by design); the captured-for- context copy is scrubbed at egress alongside the rest. This is the "full scope" implementation chosen via AskUserQuestion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:38:23 +00:00
marfrit	e4b818b0e9	secrets: vault loader + scrub/rehydrate + autodetect (#13 commit 1) Standalone module — no wiring yet. Lands the substrate for issue #13: secrets.load(path) — vault file loader; refuses non-0600 secrets.make_session(vault) — per-conversation scrub/rehydrate state session:scrub(text, mode) — substitute literals (+ autodetect) session:rehydrate(text) — restore placeholders secrets.streaming_rehydrator — chunk-boundary-tolerant streaming wrapper Mode semantics (chosen per call by the caller): "off" — identity, no mapping "vault" — vault literals only, placeholders, rehydratable "vault+autodetect" — + heuristic regexes, placeholders, rehydratable "stealth" — + heuristic regexes, opaque decoys, one-way Placeholders are stable across the session: the same literal always maps to the same $AISH_SECRET_NNN slot, so re-scrubbing the same context is idempotent and the model sees a consistent vocabulary. AUTODETECT_PATTERNS (ordered; longer prefixes first): sk-or-v<N>-... OpenRouter ghp_/gho_/ghs_ GitHub PATs AKIA<16> AWS access keys eyJ...x.y.z JWTs sk-... OpenAI (generic; matched after openrouter) -----BEGIN ... PRIVATE KEY----- SSH/GPG key headers Streaming rehydrator: tolerates a placeholder split across SSE chunks ($AISH_SE then CRET_001). It holds back the trailing partial-match in a buffer, emits the rest, and resolves on the next push or flush. Verified with 20 unit cases (vault sub, stable mapping, autodetect across all label kinds, stealth decoys, mode=off, streaming with mid-placeholder splits, non-placeholder $-prose pass-through). Vault file mode enforcement: 0600 only — matches ssh's behavior for ~/.ssh/id_rsa. Loud failure (status + skip) if mode is wider. Next commit (issue #13 follow-up): wire into broker / tool dispatch / display, add per-broker `redact` policy, :secrets meta, config example block. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:36:39 +00:00
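The stable-placeholder mapping and the chunk-boundary-tolerant rehydrator can be sketched as follows. This is an illustration under stated assumptions, not the secrets.lua module: the vault is taken to be a plain array of literal strings, only the "off" and vault-literal modes are modelled (no autodetect or stealth), and all internal names are hypothetical.

```lua
local PREFIX = "$AISH_SECRET_"

-- Replace every plain (non-pattern) occurrence of needle in text.
local function replace_plain(text, needle, repl)
  local out, pos = {}, 1
  while true do
    local s, e = text:find(needle, pos, true)
    if not s then break end
    out[#out + 1] = text:sub(pos, s - 1)
    out[#out + 1] = repl
    pos = e + 1
  end
  out[#out + 1] = text:sub(pos)
  return table.concat(out)
end

local function make_session(vault)
  local sess = { vault = vault, by_literal = {}, by_placeholder = {}, count = 0 }
  -- The same literal always maps to the same slot within a session.
  function sess:placeholder_for(lit)
    local ph = self.by_literal[lit]
    if not ph then
      self.count = self.count + 1
      ph = string.format(PREFIX .. "%03d", self.count)
      self.by_literal[lit] = ph
      self.by_placeholder[ph] = lit
    end
    return ph
  end
  function sess:scrub(text, mode)
    if mode == "off" then return text end
    for _, lit in ipairs(self.vault) do
      if lit ~= "" and text:find(lit, 1, true) then
        text = replace_plain(text, lit, self:placeholder_for(lit))
      end
    end
    return text
  end
  function sess:rehydrate(text)
    -- Unknown placeholders are left untouched (callback returns nil).
    return (text:gsub("%$AISH_SECRET_%d+", function(ph)
      return self.by_placeholder[ph]
    end))
  end
  return sess
end

-- Index where a possibly incomplete placeholder starts at the end of buf.
local function held_tail_start(buf)
  local last = buf:match(".*()%$")
  if not last then return nil end
  local tail = buf:sub(last)
  if #tail <= #PREFIX then
    if PREFIX:sub(1, #tail) == tail then return last end
  elseif tail:match("^%$AISH_SECRET_%d*$") then
    return last -- more digits may still arrive
  end
  return nil
end

local function streaming_rehydrator(session)
  local r = { buf = "" }
  function r:push(chunk)
    local buf = self.buf .. chunk
    local cut = held_tail_start(buf)
    local ready = cut and buf:sub(1, cut - 1) or buf
    self.buf = cut and buf:sub(cut) or ""
    return session:rehydrate(ready)
  end
  function r:flush()
    local rest = self.buf
    self.buf = ""
    return session:rehydrate(rest)
  end
  return r
end
```

Because placeholders are stable, scrubbing already-scrubbed text is a no-op, and ordinary `$`-prose such as a price is released immediately since it cannot be a prefix of a placeholder.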
marfrit	cdf4e86679	repl: sub-broker delegation via DELEGATE: marker (closes #6 ) Cost and context-window control: a "heavy" preset's model can offload work to a cheaper preset without spending its own tokens on the result. Example: deep model is mid-conversation and asks fast to summarize a 20k-line build log; the summary comes back as exec-output for the next turn, deep stays small. Marker syntax: DELEGATE: <preset> "<prompt>" (Single or double quotes; one DELEGATE per line; lines without the quoted shape are dropped — let the user write about delegation in prose without accidental dispatch.) Dispatch flow (mirrors CMD: / CMD&: extraction): 1. ask_ai's stream completes 2. extract_delegate_lines walks the final response 3. For each {preset, prompt}: broker.chat(config.models[preset], ...) synchronously; result is appended via ctx:append_exec_output as "[delegate <preset>]: <result>" 4. The model sees the delegate result on its next turn Implementation choice — marker over tool: option 1 from the issue ("inline delegate marker") works with any model regardless of tool_calls support. Option 2 (aish_delegate as a tool dispatched in the existing Phase 2 sub-loop) is the better UX for capable models since it returns the result mid-turn — filed as follow-up if needed. 
Meta surface: :delegate <preset> <prompt> one-shot direct invocation (useful for testing without depending on the model emitting DELEGATE:, and as a manual "ask <preset> something" verb) Scope: - Plan mode: emits "PLAN: DELEGATE <preset> <prompt>" without dispatch - Norris: not extended; the planner's model anchor would conflict with mid-plan switching (R-C3-adjacent risk) - No self-delegation guard: each DELEGATE is a separate broker call, not recursive; a delegate result reaching the next turn could contain another DELEGATE but that's bounded by max_tool_depth-style iteration cap on the parent - No cost prompt: configuring a paid cloud preset already implies consent to spend on it - Unknown preset → error status + exec-output note "[delegate X failed: unknown preset]" Extractor unit-tested with 8 cases (single-quote, double-quote, multi- line prose, empty prompt, no-quotes, case-sensitive, wrong prefix). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:29:09 +00:00
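The marker syntax above lends itself to a small line-based extractor. The sketch below is one plausible reading of the rules (single or double quotes, one marker per line, lines without the quoted shape dropped); requiring the marker at the very start of the line and dropping empty prompts are assumptions, and the function is not the aish `extract_delegate_lines` itself.

```lua
-- Hypothetical extractor: returns an array of { preset = ..., prompt = ... }.
local function extract_delegate_lines(text)
  local found = {}
  for line in (text .. "\n"):gmatch("(.-)\r?\n") do
    -- Case-sensitive marker, a preset name, then an opening quote.
    local preset, quote, rest =
      line:match("^DELEGATE:%s+(%S+)%s+([\"'])(.*)$")
    if preset then
      -- The prompt runs to the last matching quote on the line.
      local prompt = rest:match("^(.*)" .. quote .. "%s*$")
      if prompt and prompt ~= "" then
        found[#found + 1] = { preset = preset, prompt = prompt }
      end
    end
  end
  return found
end
```

Prose that merely mentions delegation, a lowercase `delegate:`, or a marker without quotes yields no entry, so nothing is dispatched by accident.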
marfrit	f94d16fc89	repl: background CMD&: with handle/poll (closes #8 ) Builds, long-running network calls, and file watches no longer block the turn. A new "CMD&: <cmd>" marker (analogue of CMD:) tells the REPL to spawn the command in the background, return immediately, and poll for completion between user inputs. Process model: shell-wrapped to avoid needing fork()/execv() FFI. nohup sh -c '(<cmd>) > <log> 2>&1; echo $? > <status>' </dev/null >/dev/null 2>&1 & echo $! The child is reparented to init; we hold only the PID and the path to the .status sidecar. Completion is detected by the .status file existing (the wrapper writes it as its last act). No waitpid needed — the child isn't ours after the popen subshell exits. Storage: <history.dir>/bg/<id>.log + <id>.status. The directory is created lazily at startup (mkdir -p). Requires history.dir to be configured; without it CMD&: emits an error status and the model sees an "[bg failed to start]" exec-output note. check_bg_done() runs at the top of each main-loop iteration alongside check_every_due(). When a job is detected as exited, the REPL: - emits a status line "[bg:<id> exited <code>, <bytes>, <secs>s wall] <cmd>" - appends the same string to ctx as exec output, so the model sees the completion on its next turn (natural follow-up: "ok the build finished; let me check the log") Meta surface: :bg-spawn <cmd> start a bg job directly (no AI needed; also useful for testing without depending on the model emitting CMD&:) :bg-list show running/done jobs (id, pid, state, runtime, cmd) :bg-output <id> dump the log file to stdout :bg-kill <id> SIGTERM (note: only delivers if the PID is still the actual command — long-lived shells may need pkill by name) Scope (deliberately limited for v1): - No callback-mode readline: bg completion detection is pre-prompt, not mid-readline. If a build finishes while the user is typing, notification comes when they hit Enter. 
- Permission policy DSL (#9) does NOT apply to CMD&: — the asynchronous gating model wasn't designed for the y/N flow. Filed as follow-up if needed. - Norris not extended: helpers.exec_cmd is still synchronous; the planner doesn't dispatch bg jobs. - Plan mode interaction: CMD&: in plan mode emits "PLAN: & <cmd>" and a "[plan] would bg-run: <cmd>" exec-output note, no spawn. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:25:55 +00:00
marfrit	67d80e1047	repl: :every recurring prompts via pre-prompt due-check (closes #11) In-session timer that re-injects a prompt every N seconds. "Watch this thing" workflows (`:every 5m "check journalctl -u nginx for errors"`) without spawning a separate aish process. Approach: minimum viable. check_every_due() runs at the top of each main-loop iteration — timers fire BETWEEN user inputs, not during readline waits or active broker calls. Mid-stream firing would require rewriting ffi/readline to callback mode (substantial scope). If the on-the-fly firing requirement matters in practice it can land as a follow-up issue against the readline FFI. Meta: :every <interval> <prompt> schedule (interval: 30s | 5m | 2h | bare int) :every list show jobs (id, interval, time-until-next, model, prompt) :every cancel <id> remove Defaults: - Model: "fast" preset if defined in config.models, else active model (per the issue's "recurring prompts should default to fast preset"). - In-memory only — jobs don't persist across restarts. - Suppressed while ctx.norris_active (planner stays on goal anchor). - Quotes around the prompt are stripped if present. - Each tick fires the job once, re-schedules next_fire = now + interval (no catch-up if the interval elapsed multiple times during a long user input). Tested: 11 interval-parse cases (30s, 5m, 2h, bare int, malformed), load via require, end-to-end :every list / cancel surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:23:07 +00:00
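The interval grammar and the no-catch-up rescheduling described above might look roughly like the following in Lua. The names `parse_interval` and `fire_if_due` are hypothetical, and the sketch assumes that a bare integer means seconds and that a zero interval is rejected; neither detail is stated in the commit.

```lua
-- Hypothetical sketch of :every interval parsing and due-checking.
local UNIT_SECONDS = { s = 1, m = 60, h = 3600 }

-- Accepts "30s", "5m", "2h" or a bare integer (assumed to be seconds).
local function parse_interval(text)
  local n, unit = tostring(text):match("^(%d+)([smh]?)$")
  if not n then return nil end
  local secs = tonumber(n) * (UNIT_SECONDS[unit] or 1)
  if secs <= 0 then return nil end -- assumption: zero is treated as malformed
  return secs
end

-- Fires at most once per check and reschedules from "now", so a long
-- pause at the prompt does not trigger a burst of catch-up ticks.
local function fire_if_due(job, now)
  if now < job.next_fire then return false end
  job.next_fire = now + job.interval
  return true
end

local job = { interval = parse_interval("5m"), next_fire = 1000 }
print(fire_if_due(job, 999))   -- false: not due yet
print(fire_if_due(job, 2000))  -- true: due, even though several intervals elapsed
print(job.next_fire)           -- 2300
```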
marfrit	17e62c0326	safety: permission policy DSL — allow/confirm/deny rule lists (closes #9 ) The confirm_cmd boolean was too coarse: true interrupts every harmless ls; false ungates everything. Most workflows want trust for read-only ops while still gating writes/network/sudo. New config: permissions = { allow = { "^ls%s", "^cat%s", "^git status" }, confirm = { "^rm%s", "^git push", "^docker%s", "^sudo%s" }, deny = { "^ssh%s+root@", "^curl%s+http[^s]" }, } Verdict order: deny > confirm > allow. First match in the chosen category wins. Unmatched defaults to "confirm". Patterns are Lua patterns (not regex) per PHASE0.md §3 — no compiled extensions. Verdict behavior in the interactive CMD: loop: - allow → run without prompt - deny → status line, skip - confirm → [y/N] prompt (same UX as legacy confirm_cmd=true) Backward compat: - permissions unset + confirm_cmd=true → always confirm - permissions unset + confirm_cmd=false → always allow - permissions set → policy table is authoritative Scope deliberately limited to the interactive AI-suggested CMD: gate. Norris autonomous mode keeps its own safety.is_destructive machinery (combining the two would double-gate or replace the LLM probe — both non-obvious behavioral changes that belong in their own issues). User-typed shell-routed lines (`router.classify → "shell"`) and :exec also bypass the policy by design — those are direct user intent. New introspection: :perms list — show the configured rule lists :perms check <cmd> — report verdict + matching rule (debug) safety.classify_command is exported and unit-tested with 12 cases covering each category, priority order (deny > allow on overlap), and both fallback paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:20:56 +00:00
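As an illustration of the verdict order above, here is a minimal Lua sketch. It reuses the example `permissions` table from the commit message, but the function body is an assumption about how such a classifier could work, not the actual `safety.classify_command`.

```lua
-- Hypothetical sketch of deny > confirm > allow classification.
local permissions = {
  allow   = { "^ls%s", "^cat%s", "^git status" },
  confirm = { "^rm%s", "^git push", "^docker%s", "^sudo%s" },
  deny    = { "^ssh%s+root@", "^curl%s+http[^s]" },
}

local ORDER = { "deny", "confirm", "allow" }

-- Returns the verdict and the rule that produced it (nil when no rule matched).
local function classify_command(cmd, policy)
  for _, verdict in ipairs(ORDER) do
    for _, pattern in ipairs(policy[verdict] or {}) do
      if cmd:find(pattern) then return verdict, pattern end
    end
  end
  return "confirm", nil -- unmatched commands fall back to a prompt
end

print(classify_command("ls -la", permissions))          -- allow   ^ls%s
print(classify_command("curl http://example.com", permissions))
print(classify_command("curl https://example.com", permissions))
```

Because these are Lua patterns, `^ls%s` requires whitespace after `ls`, so a bare `ls` matches no rule and falls through to `confirm`. Plain-HTTP `curl` is denied, while HTTPS `curl` is unmatched and therefore confirmed.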
marfrit	518c01a9f5	repl: user-defined skills loader (closes #2 ) PHASE0.md §5.2 froze the meta-command set at compile time. Skills let the user package repeatable workflows (project queries, prompt templates, audit routines) without forking aish. Discovery: scan ~/.config/aish/skills/*.lua at startup (or whatever $AISH_SKILLS_DIR points at — used both by users with non-XDG layouts and by CI). Each module exports: return { name = "<meta-cmd-name>", -- must match [%w_-]+ description = "<one-line>", -- shown by :skills run = function(args, h) ... end, } Helpers passed to run(): h.ask(text) — same path as :ask (with @path expansion) h.status(s) — emit "[aish] s" h.exec(cmd) — run a shell command (subject to plan_mode, hooks) h.model() — current active model name h.ctx — raw Context object (advanced) h.config — the loaded config table Validation rejects modules that miss name/run, use whitespace in the name, or collide with an existing meta command (built-in or earlier skill). Each rejection emits a status line so the user sees why a skill didn't appear. New meta command :skills lists what's loaded (sorted, with description). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:17:00 +00:00
marfrit	fb15f7a690	repl: pre/post CMD hooks via config.hooks (closes #3 ) Optional shell scripts trigger around every CMD: execution. Use cases: audit logging, auto-format-after-edit, custom safety gates beyond the existing confirm_cmd boolean. Config shape: hooks = { pre_cmd = "/path/to/pre-script", post_cmd = "/path/to/post-script", } Contract per hook invocation: - The command line is piped to the hook on stdin. - Env vars: AISH_CMD (the command), AISH_TURN (#ctx.turns at the moment of dispatch), AISH_CWD (libc.getcwd() result). - Hook stdout is streamed live to the terminal via executor.exec (so the user sees its output regardless of exit status). Pre-hook: non-zero exit aborts the command and emits a status line including the exit code. last_exec_code is set to the hook's exit so the {last_status} prompt template variable reflects the abort. Post-hook: exit code is ignored (the spec says so); only the visible stdout matters. Runs after the command's exec_end frame. Tested with success, abort, and stdin-matches-env paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:16:11 +00:00
marfrit	ce1378edee	repl: fix {name} pattern to accept underscores (#10 follow-up) %w excludes underscore in Lua patterns, so {ctx_used}, {ctx_max}, {cwd_short}, {last_status} were left literal in the prompt. Use [%w_] to accept identifiers with underscores. Surfaced during higgs smoke test of the new template. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:14:57 +00:00
marfrit	d738f339cb	repl: configurable prompt template via config.shell.prompt (closes #10) At-a-glance situational awareness: see the active model, context fill, mode flags, and cwd in the prompt itself — prevents "wait, am I still in plan mode?" surprises. Example config: shell = { prompt = "[{model} {ctx_used}/{ctx_max}t T{turn} {mode}] {cwd_short} > ", } Variables (substituted via {name}): {model} active preset name {ctx_used} char/4 token heuristic (Phase 0 §8; accurate is Q1) {ctx_max} config.context.token_budget {turn} #ctx.turns {cwd} libc.getcwd() (chdir-aware; PWD env may drift) {cwd_short} cwd with $HOME -> ~ {last_status} last exec exit code, "" if none yet {mode} "norris" | "plan" | "normal" Default behavior unchanged when shell.prompt is unset — keeps the "[aish:<model>]>" form with norris ⚡ and plan markers. Side wiring: - ffi/libc.lua gains getcwd() (chdir() doesn't update PWD). - run_shell records exit code into last_exec_code for {last_status}. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:14:43 +00:00
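The `{name}` substitution, together with the underscore fix from the follow-up commit, can be sketched in a few lines of Lua. `render_prompt` is a hypothetical name, and leaving unknown placeholders literal is an assumption rather than documented behavior.

```lua
-- Hypothetical sketch of {name} template substitution.
local function render_prompt(template, vars)
  -- [%w_] is needed because %w alone excludes the underscore.
  return (template:gsub("{([%w_]+)}", function(name)
    local value = vars[name]
    if value == nil then return nil end -- nil keeps the placeholder as-is
    return tostring(value)
  end))
end

local prompt = render_prompt(
  "[{model} {ctx_used}/{ctx_max}t T{turn} {mode}] {cwd_short} > ",
  { model = "fast", ctx_used = 120, ctx_max = 4096,
    turn = 3, mode = "plan", cwd_short = "~/src" })
print(prompt) -- [fast 120/4096t T3 plan] ~/src >

-- The pre-fix pattern cannot match names that contain an underscore:
print(("{ctx_used}"):match("{(%w+)}"))     -- nil
print(("{ctx_used}"):match("{([%w_]+)}"))  -- ctx_used
```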
marfrit	10d2501cff	repl: peel trailing punctuation from @path mentions (#7 follow-up) Natural-language prose like "look at @README.md, then..." or "@foo.lua." at sentence end previously failed to expand because the trailing comma/period was included in the path. Now: if the raw token doesn't resolve, peel trailing chars from [.,;:?!)] one at a time until the path resolves or no more peels are possible. On success, the peeled chars are emitted verbatim AFTER the closing fence so the original punctuation is preserved. Surfaced during higgs smoke test (TC: "say the first line of @README.md, then stop" — the trailing comma broke resolution). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:11:22 +00:00
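The peel loop described above could be sketched as follows. `resolve_mention` is a hypothetical name, and the `exists` predicate is injected so the example runs without touching the filesystem; the real code presumably checks the path on disk.

```lua
-- Hypothetical sketch of trailing-punctuation peeling for @path tokens.
-- Returns the resolved path plus the peeled characters (to be re-emitted
-- after the closing fence), or nil when nothing resolves.
local function resolve_mention(token, exists)
  local path, peeled = token, ""
  while not exists(path) do
    local head, last = path:match("^(.+)([.,;:?!)])$")
    if not head then return nil end -- no peelable character left
    path, peeled = head, last .. peeled
  end
  return path, peeled
end

local files = { ["README.md"] = true, ["foo.lua"] = true }
local function exists(p) return files[p] == true end

print(resolve_mention("README.md,", exists))   -- README.md   ,
print(resolve_mention("foo.lua.)", exists))    -- foo.lua     .)
print(resolve_mention("missing.txt.", exists)) -- nil
```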
marfrit	bb374c2ad2	repl: @path mention expansion in input lines (closes #7 ) Saves the user from manual copy/paste: typing "show me @repl.lua" or "compare @config.lua and @config.example.lua" auto-expands each mention to a fenced code block carrying the file contents, language-tagged by extension, and feeds the composed text to the broker. Wired on the "ai" branch of the input loop and inside :ask. Meta and shell branches pass through unchanged — "@foo" in shell context is a literal program argument; meta commands store text verbatim. Trigger rule: "@" must follow start-of-string or whitespace — avoids false positives on email addresses ("user@example.com") and shell short-options. Path extends to next whitespace. Other behavior: - Language tag derived from extension via a small lookup; unknown extensions yield an untagged fence. - Files over 32 KB are truncated head/tail (16K + 8K) with a marker. - Missing files leave the literal "@path" token in place and emit a "[aish] @path: not found" status — non-fatal, lets the user correct the path and re-type. - Each successful expansion emits "[aish] @path expanded (N bytes [, truncated])" so the user sees what was inlined. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:10:54 +00:00
marfrit	dccd9e90cc	repl: :plan toggle — CMD: lines become PLAN: notes (closes #5 ) Plan mode is a safer entry point than going straight to Norris: the user iterates with the model on what to do, sees each CMD: as a PLAN: line, and the would-have-run notes feed back into the next-turn context so the model can refine without side effects. Toggle with :plan (flip), :plan on, :plan off. Off by default. When plan_mode is true: - CMD: lines extracted from the assistant turn print as "PLAN: <cmd>" - The note "[plan] would run: <cmd>" is appended via the existing append_exec_output channel — same context flow as a real exec, so the model sees its proposed action on the next turn. - run_shell is NOT called; no executor, no cd intercept, no capture. The prompt shows "[aish:<model> plan]>" while active (mirrors the norris ⚡ marker convention). Orthogonal to Norris: plan_mode only gates the interactive CMD: extraction path. Norris has its own halt protocol; combining them is not supported (the planner would be confused by skipped actions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:09:08 +00:00
marfrit	81c3b1b44a	main: non-interactive `-p`/`--prompt` one-shot mode (closes #4) Adds `aish -p "<text>"` for Unix-pipeline composability: tail app.log | aish -p "any anomalies?" aish -p "summarize: $(curl -sS https://...)" The flag bypasses repl.lua entirely. On invocation: 1. Stdin: when not a TTY, read to EOF and prepend to the prompt as a fenced block. ffi.libc.isatty(0) gates the read so interactive `aish -p "..."` (no pipe) doesn't hang. 2. Resolve config.models[config.default_model]. 3. Stream broker.chat_stream replies to stdout; finalize with newline. 4. Exit 0 on success, 1 on broker error, 2 on arg / config error. Behavior NOT in -p mode (kept simple per the issue's "no repl.lua involvement"): - No MCP, no tool loop, no Norris, no routing, no memory injection. - "CMD:" lines in the reply are printed verbatim, NOT executed — callers can grep / pipe them as they wish. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:06:27 +00:00
marfrit	0700dce881	repl: enforce budget per Norris step, not just post-loop (closes #51) PHASE3.md §2 specifies sliding-window eviction "including mid-Norris-session if the loop runs long". Implementation only called enforce_budget() once, after the planning loop exited — so for a tight max_turns with a multi-step Norris session the model saw the FULL conversation throughout, defeating context budgeting and preventing R-C3 (NORRIS suffix goal anchor surviving eviction) from being exercised end-to-end. Move status_evictions(ctx:enforce_budget()) inside the while loop so it runs after every safety.norris_step return. Drop the now-redundant post-loop call. Surfaced during TC #38 (Qwen3-30B-A3B, max_turns=4) where the "oldest 4 turns evicted" status arrived AFTER NORRIS DONE. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:05:34 +00:00
marfrit	0c93e31186	repl: warn on stale MCP auto_approve keys (closes #33) Auto-approve policy keys that point at unconnected aliases, mistyped tool names, or malformed forms were silently ignored — leaving the user with surprise confirm prompts and no diagnostic. validate_auto_approve() now walks config.mcp.auto_approve at startup (after the MCP connect loop) and after each :mcp connect. For each key: - "alias__*" — warn if alias has no live session - "alias__tool" — warn if alias unknown OR tool not in registry - anything else — warn as malformed (not in alias__tool form) Non-fatal. The re-run on :mcp connect lets a key that referenced a not-yet-connected alias become live without a restart. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:05:08 +00:00
marfrit	299dcce78f	repl: validate MCP tool names against Bedrock regex (closes #32) Anthropic-via-Bedrock enforces ^[a-zA-Z0-9_-]{1,128}$ on tool names. We already moved the alias separator from "." to "__" (commit `f26cbd9`), but a future MCP server could still register a tool whose name (or whose combination with the alias) contains characters outside that class — silently breaking calls to strict providers. connect_mcp now warns at startup for: - aliases containing "__" (would misparse on tool dispatch) - emitted alias__name strings that violate the regex or exceed 128 chars Behavior preserved: validation is informative-only. tools_schema() still emits the offending tool; local llama.cpp users accept lenient names and shouldn't be penalized for downstream strictness. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:04:29 +00:00
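Lua patterns have no `{1,128}` quantifier, so a check equivalent to the Bedrock regex needs a character-class match plus a separate length test. The sketch below shows one way to do that; `tool_name_problems` and the warning strings are hypothetical and do not reproduce what `connect_mcp` actually prints.

```lua
-- Hypothetical sketch of the informative-only tool-name validation.
local MAX_TOOL_NAME = 128

local function tool_name_problems(alias, tool)
  local problems = {}
  -- Plain find (4th arg true): "__" is the alias/tool separator.
  if alias:find("__", 1, true) then
    problems[#problems + 1] = "alias contains '__'"
  end
  local full = alias .. "__" .. tool
  -- Explicit ranges instead of %w, which is locale-dependent.
  if not full:match("^[A-Za-z0-9_%-]+$") then
    problems[#problems + 1] = "name has characters outside [a-zA-Z0-9_-]"
  end
  if #full > MAX_TOOL_NAME then
    problems[#problems + 1] = "name exceeds 128 characters"
  end
  return problems
end

for _, case in ipairs({ { "fs", "read_file" }, { "fs", "read.file" } }) do
  local problems = tool_name_problems(case[1], case[2])
  print(case[1] .. "__" .. case[2], #problems == 0 and "ok" or problems[1])
end
```

Returning a list rather than raising an error mirrors the commit's stance that validation only warns and never drops the tool.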
marfrit	8e0e735e15	repl: fallback patterns — add 'Could not connect to server' (CURLE_COULDNT_CONNECT) Surfaced by autonomous run of TC #48: pointing models.fast at http://localhost:9999 (port closed, host resolves) emits "transport: Could not connect to server" — CURLE_COULDNT_CONNECT (7) which the Phase 5 fallback pattern set didn't include. Added "Could not connect to server" to FALLBACK_PATTERNS in repl.lua. Now fallback fires for the full set of common libcurl/HTTP transport failure shapes: HTTP 5xx server-side HTTP 404 model_not_found HTTP 408 gateway request timeout Couldn't resolve host CURLE_COULDNT_RESOLVE_HOST Could not connect to server CURLE_COULDNT_CONNECT (← added) Connection refused Timeout was reached CURLE_OPERATION_TIMEDOUT (variant A) Operation timed out CURLE_OPERATION_TIMEDOUT (variant B) Re-tested #48 end-to-end: fast pointed at dead port → fast fails → status fires → cloud (anthropic/claude-haiku-4.5 via openrouter) responds normally Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:49:13 +00:00
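A pattern table of this kind could be matched as in the Lua sketch below. The libcurl message strings are taken from the list above; the exact shape of the HTTP status errors (for example `HTTP 503 ...`) and the return value of `fallback_reason` are assumptions, since the commit does not show them.

```lua
-- Hypothetical sketch of transport-error matching for fallback.
local FALLBACK_PATTERNS = {
  "HTTP 5%d%d",                 -- assumed rendering of server-side errors
  "HTTP 408",                   -- assumed rendering
  "HTTP 404.*model_not_found",  -- assumed rendering
  "Couldn't resolve host",
  "Could not connect to server",
  "Connection refused",
  "Timeout was reached",
  "Operation timed out",
}

-- Strips the "transport: " prefix and returns the message when it matches
-- a known failure shape, nil otherwise (return shape is an assumption).
local function fallback_reason(err)
  if type(err) ~= "string" then return nil end
  local msg = err:gsub("^transport: ", "")
  for _, pattern in ipairs(FALLBACK_PATTERNS) do
    if msg:find(pattern) then return msg end
  end
  return nil
end

print(fallback_reason("transport: Could not connect to server"))
print(fallback_reason("HTTP 401 unauthorized")) -- nil: not a fallback case
```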
marfrit	d72689f709	config: deep model → deepseek-coder-v2-lite (temporary) qwen3-30b-a3b-instruct isn't loaded on hossenfelder right now (per /v1/models). deepseek-coder-v2-lite IS loaded — 16B MoE with ~2.4B active params; fast enough that the 30-min timeout from the qwen3-30b config was wildly over-budget. Switched to deepseek-coder-v2-lite for the time being. Restore qwen3-30b when the slot is back up. Live-probed: YES/NO destructive probe via the deep model preset returns "YES." in ~4.8s — well within the new 5-min timeout, and fast enough that the Phase 3 LLM second-opinion path is now functional again without falling back to "fail-safe YES" on every ambiguous command. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:42:23 +00:00
marfrit	a9b39cd435	config: Phase 5 routing + summarize-on-evict example (commit #5 ) Phase 5 commit #5 (final) per docs/PHASE5.md §11. Documentation-only; commented-out example showing: - routing.auto (per-request auto-routing toggle) - routing.classes (class → model mapping; reasoning = nil by default per R-N2 cost-safety) - routing.fallback (single-hop retry to cloud on transport fail) - routing.fallback_model (default "cloud" if uncommented) - context.summarize_on_evict + summarizer_model + max_summary_chars (shown INSIDE the context = {...} block above) All defaults OFF — Phase 5 is opt-in across the board. Existing configs without `routing` or `context.summarize_on_evict` behave identically to Phase 4. Phase 5 implementation complete: #1 `3e57824` router.classify_model + 31-case corpus #2 `03497b5` context summarize_fn callback + summary block in to_messages #3 `40ea0b4` repl routing + fallback + summarize_fn wiring + :route/:fallback #4 - (bundled into #3 since meta cmds are trivial additions) #5 (this) config example block Phase 5 verify-partial: - router.classify_model: 31/31 case corpus passes - context summarize-on-evict: mock callback fires correctly (additive + compress paths), summary suppressed under Norris, :reset clears it - repl meta cmds: :route on/off/classes/check + :fallback on/off all work; :route check reports class + "routing currently disabled" suffix when auto is off (N1) Verify-pending: end-to-end with real broker (route a code question, see it land on deep; kill local backend, see fallback fire to cloud). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:32:20 +00:00
marfrit	40ea0b49b0	repl: routing + fallback + summarize_fn wiring (Phase 5 commit #3 ) Phase 5 commit #3 per docs/PHASE5.md §3 / §11. Wires the Phase 5 machinery into the REPL. make_summarize_fn(): Returns a closure that maps (prior_summary, evicted_turns) onto a broker.chat call against cfg.context.summarizer_model (default "fast"). Three dispatch paths matching the R-B1 callback contract: evicted == nil → compress signal prior present → additive ("extend the prior summary ...") prior nil → first-time ("summarize the following turns") All use a system prompt enforcing "exactly one short paragraph", max_tokens=300, timeout_ms=30000. Broker failure returns nil so Context falls back to silent eviction. Renderer status is logged on failure for visibility. Context construction: Build ctx_opts as a fresh table (copies config.context to avoid mutating it), adds summarize_fn ONLY when config.context.summarize_on_evict == true. Defaults stay OFF — Phase 4 regression coverage. Fallback machinery: - FALLBACK_PATTERNS table with 7 transport-error signatures (HTTP 5xx, 408, 404-model_not_found, DNS, connection refused, "Timeout was reached", "Operation timed out") - fallback_reason(err) strips the "transport: " prefix and matches. - should_fallback(err) gates on cfg.routing.fallback. - call_broker(cfg, name, msgs, on_delta, opts) wraps broker.chat_stream: • tracks any_delta via wrapped on_delta callback • retries ONCE against cfg.routing.fallback_model (default "cloud") when err matches AND no deltas arrived (N3: mid-stream failures aren't retried — partial text would duplicate) • emits "[aish] local <name> failed (<reason>); retrying via <fb>" status before the retry call ask_ai routing: - Routing decision taken ONCE on entry (R-C2). req_name/req_cfg locals carry the choice through every tool-sub-loop iteration. - active_name/active_cfg are NOT mutated — user's :model selection survives the request. - When config.routing.auto is true, classify_model(text, config) is invoked. 
Non-nil model + non-active → swap req_cfg + status line. - broker.chat_stream call replaced with call_broker (fallback wrap). Meta cmds: :route on/off — toggle cfg.routing.auto at runtime :route classes — show class → model mapping :route check <text> — report classify_model result with "(routing currently disabled)" suffix when auto is off (N1) :fallback on/off — toggle cfg.routing.fallback at runtime HELP updated with the four new commands. Smoke-tested: aish boots, all four metas behave correctly, classify_model returns reasoning class for "Explain how MMAP works on Linux" (the model slot is nil because no classes are configured by default — N2 cost-safety). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:31:14 +00:00


123 Commits