marfrit/aish - aish - marfrit's space

Author	SHA1	Message	Date
marfrit	a3c1813465	context: proactive periodic summarization (closes #101 ) Closes #101 (FR-A from the 2026-05-17 German strategy analysis, small-model improvement strategy 5: "History-Zusammenfassung via local"). Phase 5 summarize-on-evict only fires at budget pressure — exactly when the local model is already suffering. Small models benefit from tight context from turn 1, not "after eviction". This commit adds CADENCE-triggered summarization that fires every N appends regardless of budget, folding turns older than `summarize_keep_recent` into ctx.summary via the existing Phase 5 summarize_fn closure. context.lua additions: - New ctx fields: summarize_every_n_turns, summarize_keep_recent (default 4), _turns_since_summarize (counter). - Context:append bumps the counter on every store. - Context:enforce_cadence — the new entry point. Returns the number of turns folded (0 on no-op). Guards: * disabled (cfg unset OR summarize_fn unset) -> 0 * not yet due (_turns_since_summarize < N) -> 0 * Norris-active (Phase 5 R-C4 parity — planner stays on goal) -> 0 * #turns <= keep_recent (nothing to fold) -> 0 * summarize_fn returns nil/empty -> 0 (defer to enforce_budget later) Orphan-tool guard: when the fold slice would end on an assistant-with-tool_calls, peel back the right edge until the next live turn isn't role=tool. Strict chat templates reject tool-without-assistant-anchor (#87 already encountered this). - If ctx.summary grows past max_summary_chars after the fold, compress in a second pass (same shape as enforce_budget's Phase 5 logic). repl.lua wiring: - ctx_opts continues to copy all config.context keys; the new summarize_every_n_turns / summarize_keep_recent fields flow through automatically. - make_summarize_fn is now wired when EITHER summarize_on_evict OR summarize_every_n_turns is set (same closure, different trigger — Phase 5's #51 #issue eviction path uses it on budget; #101 uses it on cadence). 
- New status_cadence_fold helper: "[aish] proactively summarized N older turns". - ask_ai's existing enforce_budget call site now first fires enforce_cadence, then enforce_budget. Cadence comes first so the token estimate enforce_budget sees is the tighter post-fold one — no spurious eviction of turns we just summarized. - Norris path NOT wired: enforce_cadence is a no-op there via the norris_active guard (consistent with Phase 5 R-C4). 18 inline unit cases for enforce_cadence: - cfg disabled / no summarize_fn / below cadence -> 0 - cadence met -> exact fold count (N - keep) - summary contains folded contents; first/last live turn IDs match - cadence counter resets; second fold fires after another N appends - Norris-active -> suppressed - orphan-tool guard: peels back when last folded = asst+tool_calls - summary compression triggers when over max_summary_chars E2E verified on hossenfelder:8082, summarize_every_n_turns=4 / summarize_keep_recent=2: 5 user turns -> 2 cadence fires: [aish] proactively summarized 2 older turns [aish] proactively summarized 4 older turns :cost detail shows main=5 calls, summarize=2 calls (matches fires). Estimated ctx token count: 180 (vs ~1000 unsummarized). Flag-off path: no status, identical to pre-#101 behavior. Regression: 87/87 safety, 31/31 router_model, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 09:20:56 +00:00
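The guard list for `Context:enforce_cadence` can be sketched as a standalone function. This is a minimal sketch, not the code in context.lua: it takes a plain `ctx` table, it assumes `summarize_fn(turns, prior_summary)` returns the new summary string (the real closure's signature is not shown in this log), and it leaves out the second-pass compression against `max_summary_chars`.

```lua
-- Sketch only: the field names come from the commit message,
-- the summarize_fn signature is an assumption.
local function enforce_cadence(ctx)
  local every = ctx.summarize_every_n_turns
  local keep = ctx.summarize_keep_recent or 4

  if not every or not ctx.summarize_fn then return 0 end       -- disabled
  if (ctx._turns_since_summarize or 0) < every then return 0 end -- not yet due
  if ctx.norris_active then return 0 end                        -- planner stays on goal
  if #ctx.turns <= keep then return 0 end                       -- nothing to fold

  -- Fold turns 1..last; keep the most recent `keep` turns live.
  local last = #ctx.turns - keep

  -- Orphan-tool guard: if the first live turn would be role=tool, its
  -- assistant anchor is about to be folded away, so peel the edge back.
  while last >= 1 and ctx.turns[last + 1].role == "tool" do
    last = last - 1
  end
  if last < 1 then return 0 end

  local slice = {}
  for i = 1, last do slice[i] = ctx.turns[i] end

  local summary = ctx.summarize_fn(slice, ctx.summary)
  if not summary or summary == "" then return 0 end -- defer to enforce_budget

  local live = {}
  for i = last + 1, #ctx.turns do live[#live + 1] = ctx.turns[i] end

  ctx.summary = summary
  ctx.turns = live
  ctx._turns_since_summarize = 0
  return last
end
```

With `summarize_every_n_turns = 4` and `summarize_keep_recent = 2`, four stored turns fold two, which is the "N - keep" count named in the unit cases.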
marfrit	c9009399d6	config: example block for cfg.memory.auto_summarize_on_quit (#102) Documents the new opt-in keys (auto_summarize_on_quit, min_turns_for_summary, summary_model) inline with the existing Phase 4 memory block. Notes that the older summarizer_model key is still honored for back-compat. Config-only commit; no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 09:18:15 +00:00
marfrit	299719f4de	repl: auto-summarize on :q into memory.jsonl (closes #102 ) Closes #102 (FR-B from the 2026-05-17 German strategy analysis, small-model improvement strategy 5: "History-Zusammenfassung via local"). Today the `:memory summarize` distill flow is a manual meta — users have to remember to run it before quitting. This commit wires the same flow into shutdown_session under an opt-in cfg flag, so the local fast model can absorb each non-trivial session into the persistent memory.jsonl without user burden. Next-session startup's [background] block picks the new entries up automatically (Phase 4). Implementation: - Extract the `:memory summarize` body into _do_memory_summarize(opts). opts.auto = true: skip the per-candidate readline keep?[y/N/edit] loop and auto-add every parsed candidate (trust the model + the explicit opt-in via cfg.memory.auto_summarize_on_quit). opts.min_turns is the silent-no-op cutoff. Status messages suppressed for fast-path no-ops so :q stays quiet on trivial sessions. - :memory summarize meta now one line: _do_memory_summarize({ auto=false }). - shutdown_session checks cfg.memory.auto_summarize_on_quit; if set, pcall(_do_memory_summarize, { auto=true, min_turns=N }). pcall so a broker failure NEVER blocks :q (memory is best-effort). New config keys (all opt-in; default behavior unchanged): memory = { enabled = true, auto_summarize_on_quit = true, min_turns_for_summary = 5, -- skip trivial sessions summary_model = "fast", -- cfg.memory.summarizer_model is -- still honored for back-compat } E2E verified on hossenfelder:8082 with qwen-coder-7b as summary_model: 3 user turns ("remember venus...", "remember mars...", "remember pluto..."): :q -> "[aish] summarizing session for memory via fast ..." 
-> "[aish] auto-summarize: added 3 memory items" -> memory.jsonl gained 3 fact: entries (correctly extracted) Below threshold (1 user turn, min=10): :q -> silent, no broker call, no memory.jsonl change Flag off (default behavior, 4 turns): :q -> silent, identical to pre-#102 behavior Regression: 87/87 safety, 31/31 router_model, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 09:18:02 +00:00
marfrit	cb37fa861a	phase10: config.lua example for cfg.norris.{preplanner,executor} Documents the new Phase 10 / #89 surface for users: when, why, and how to set cfg.norris.preplanner + .executor + .tasks_max. Notes the graceful single-model fall-back when preplanner is unset OR fails, and the design choice that preplan does NOT route via call_broker (retry would silently swap planning models). C5: config-only commit; no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 08:22:22 +00:00
marfrit	76a8f97009	repl: cloud preplanner + local executor split for Norris (closes #89 ) Phase 10 C4 — the orchestration commit. Splits Norris autonomous mode into a one-shot cloud preplan + per-step local executor flow, with graceful fall-back to single-model Norris when preplan is disabled or fails. run_norris additions (in order): 1. R4 fix: clear ctx.norris_active/_goal/_tasks at the TOP so a prior crashed Norris can't leak stale state into the new launch. 2. Preplan block (gated on cfg.norris.preplanner): - Look up the preplanner preset in cfg.models; warn + skip if absent. - Build a system prompt asking for TASK: <imperative> lines (R1: %d via string.format — gsub("N", ...) would corrupt "No prose / commentary / numbering" to "16o prose"). - Scrub messages per the preplan model's redact policy; run broker.chat (non-streaming, per Q-PP2) with category "norris-preplan"; R7: respect pre_cfg.timeout_ms. - On success: rehydrate; record usage via _record_usage; extract_task_lines; cap to tasks_max; populate ctx.norris_tasks = { current = 1, list = parsed }. - On ANY failure (transport err / empty list / bogus preset): status log + leave ctx.norris_tasks nil → single-model fall-back. R3 design: NOT routed via call_broker; a fallback retry would silently swap planning models which is worse than a clean hard-fail. 3. Executor cfg resolution (independent of preplan per Q-PP1): cfg.norris.executor names a preset → executor_cfg = that cfg. Unset / missing preset → executor_cfg = active_cfg (existing :model-selection behavior). 4. Loop body: pass executor_cfg (not active_cfg) to safety.norris_step. After each "continue" result, advance ctx.norris_tasks.current. When current > #list, exit with synthesized status "tasks_complete" + reason "all N preplanned tasks executed". 5. Exit cleanup: clear ctx.norris_tasks alongside the existing norris_active/_goal clears so a re-launch starts fresh. 
renderer.norris_end gains "tasks_complete" as a non-error status (cyan, same as "done"). Distinct from "done" (executor said GOAL: complete) — executor exhausted the plan but didn't confirm goal, which is a clean exit, not an error. E2E verified (preplanner=fast, executor=fast on hossenfelder:8082): :norris print the date and the current uptime → preplanned 2 tasks via fast → ─ step 1/3 ─ Print the current date. → CMD: date → Sun May 17 ... → ─ step 2/3 ─ Print the current uptime. → CMD: uptime → ... up 1 day ... → NORRIS TASKS COMPLETE: all 2 preplanned tasks executed :cost detail correctly shows two rows for the same model: norris-preplan 1 calls, 95 / 12 tokens norris 1 calls, 364 / 9 tokens Fall-back verified: cfg.norris.preplanner = "doesnotexist" → "[aish] preplanner 'doesnotexist' is not in cfg.models; running single-model" → Norris runs as Phase 6. No-preplan path verified (no cfg.norris block): Norris runs exactly as Phase 6, no behavior change. Regression: 87/87 safety, 31/31 router_model, repl loads. Closes #89. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 08:21:25 +00:00
marfrit	fa2cfc66ed	safety: pass current task descr to render_step (Phase 10 C3) helpers.render_step has supported a (step_n, max_n, descr) signature since Phase 3 (renderer.lua:246) but safety.norris_step has only ever called it with two args. Phase 10 lights up the descr slot: when ctx.norris_tasks is populated (cloud preplanner ran at :norris launch), the current task text becomes the per-step description so the user sees `─ step k/N ─ <task>` in real time. ctx.norris_tasks is nil when: - preplan disabled (cfg.norris.preplanner unset) - preplan failed (transport / parse / empty) - preplan emitted TASKs but already exhausted In all those cases descr falls through to nil → renderer prints just the step bar (Phase 3 behavior, no regression). Regression: 87/87 safety, 31/31 router_model, repl loads. No e2e visible change yet — ctx.norris_tasks is always nil until C4 wires the preplan call. R5 fix: this commit touches safety.lua ONLY (no repl.lua change as the prior plan implied). Executor cfg resolution + preplan wiring lands in C4 (next commit). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 08:18:40 +00:00
marfrit	477d8a76cc	context: norris_tasks anchor + task-hint composition + reset clear Phase 10 C2. Three additive changes; no regression. - compose_norris_task_hint(self) — module-scope helper. Returns "" when norris_tasks is nil OR list empty OR current pointer past end. Otherwise returns "\n\nCurrent step k/N:\n <task text>". - Context:to_messages appends the hint AFTER the NORRIS suffix, inside the existing `if self.norris_active and self.norris_goal` branch. NORRIS_SUFFIX_TEMPLATE is UNCHANGED (R2 fix); the hint is a separate concatenation. Goal anchor stays the primary per-step instruction; task hint sharpens current focus. - Context:reset() now clears self.norris_tasks (R6 fix). :reset is unreachable mid-Norris (planner runs without readline prompt), but if a Norris session crashed leaving stale state, :reset recovers cleanly. One line; defensive. 15 unit cases verified: - nil/empty/exhausted norris_tasks -> no hint block - current=1/3 -> "Current step 1/3" + task text in output - NORRIS suffix precedes hint (ordering preserved) - hint suppressed when norris_active=false even if tasks set - self.turns + self.norris_tasks table identity unmutated - Context:reset clears norris_tasks AND turns Regression: 87/87 safety, 31/31 router_model, repl loads. C2 isn't called from anywhere yet (ctx.norris_tasks is always nil until C4 wires the preplan call). No behavior change in the live tree until then. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 08:18:24 +00:00
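The three empty-string cases and the hint format described for `compose_norris_task_hint` can be sketched as below. A minimal sketch, assuming `norris_tasks` has the `{ current = <index>, list = { ... } }` shape given in the C4 commit; the exact indentation before the task text is taken from this log's flattened rendering and may differ in context.lua.

```lua
-- Sketch only: returns "" unless there is a live current task.
local function compose_norris_task_hint(ctx)
  local tasks = ctx.norris_tasks
  if not tasks or not tasks.list or #tasks.list == 0 then
    return "" -- preplan disabled, failed, or produced nothing
  end
  local k, n = tasks.current or 1, #tasks.list
  if k > n then
    return "" -- pointer already past the end of the plan
  end
  return string.format("\n\nCurrent step %d/%d:\n %s", k, n, tasks.list[k])
end
```

The caller would concatenate this after the NORRIS suffix, so the suffix template itself never changes.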
marfrit	e4780483ad	executor: extract_task_lines for Phase 10 preplan parsing Pure function parallel to extract_cmd_lines, but more permissive to accommodate cloud-model output variation: tolerates leading whitespace (cloud often indents), tolerates extra whitespace after the colon, strips trailing whitespace. Strict on the literal "TASK:" prefix. Returns an array of trimmed strings; empty TASKs and non-TASK lines dropped silently. Callers cap the list size per cfg.norris.tasks_max. 10 inline unit cases verified: empty/nil, single TASK, mixed CMD+TASK (only TASKs returned), leading whitespace tolerated, empty-body TASKs dropped, trailing whitespace stripped, extra-spaces-after-colon AND no-space-after-colon both tolerated, prose interleaving (3 TASKs extracted from a realistic cloud response with intro+outro prose), TASK content with embedded quotes/punctuation preserved. Nothing in the tree calls this yet (Phase 10 C1 is the foundation commit; C4 lights it up). No regression. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 08:17:53 +00:00
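The parsing rules listed for `extract_task_lines` fit in one Lua pattern. A minimal sketch under the rules stated in the commit message (strict literal `TASK:` prefix, tolerant of whitespace around it, empty bodies dropped); the real function in executor.lua may be structured differently.

```lua
-- Sketch only: collect the bodies of "TASK:" lines from model output.
local function extract_task_lines(text)
  local tasks = {}
  if type(text) ~= "string" then return tasks end
  for line in text:gmatch("[^\r\n]+") do
    -- Leading whitespace allowed; literal "TASK:" required; any amount of
    -- whitespace after the colon; trailing whitespace stripped.
    local body = line:match("^%s*TASK:%s*(.-)%s*$")
    if body and body ~= "" then
      tasks[#tasks + 1] = body
    end
  end
  return tasks
end
```

Lines such as `CMD: ls`, intro prose, or a bare `TASK:` produce no entry, so callers only need to cap the result at `cfg.norris.tasks_max`.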
marfrit	cbef05ff40	phase10: fold in Sonnet review — 2 blockers + 4 important + 2 nits All 8 actionable findings accepted; R9-R11 were confirmations. Blockers: - R1: sys:gsub("N", ...) would corrupt "No prose / commentary / numbering" → "16o prose" etc. Switch to %d + string.format. - R2: §5 had a 2-slot NORRIS_SUFFIX_TEMPLATE redesign that contradicted §11's "don't change the template; append helper output after". §5 now shows the helper-append approach. Important: - R3: preplan bypasses call_broker (no fallback retry) — keep that by design; retry would silently swap planning models. Documented in §10 Risks so it doesn't get "fixed" later. - R4: no pcall around run_norris → ctx.norris_active/_goal/_tasks can leak across launches if a Norris step crashes. Fix: clear all three at the TOP of run_norris before preplan. Cheaper than full pcall wrap; handles the stale-tasks vector. - R5: clarified C3 commit scope — safety.lua ONLY in C3; the executor cfg resolution + preplan wiring lands in C4. - R6: Context:reset() now also clears self.norris_tasks (defensive; :reset is unreachable mid-Norris but one line is cheap). Nits: - R7: timeout_ms = pre_cfg.timeout_ms or 60000 (respect the configured per-model timeout). - R8: "Status:" → "Terminal output:" in §1 acceptance criterion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 08:17:30 +00:00
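The R1 blocker is easy to reproduce in isolation. In the sketch below the prompt wording is hypothetical except for the "No prose / commentary / numbering" fragment quoted in the review; it shows why substituting a literal `N` with `gsub` corrupts unrelated text while a `%d` slot with `string.format` does not.

```lua
local limit = 16

-- Placeholder "N" is also the first letter of "No": gsub rewrites both.
local bad_template = "Emit at most N TASK: lines. No prose / commentary / numbering."
local corrupted = (bad_template:gsub("N", tostring(limit)))

-- A %d slot only matches where the template author put it.
local good_template = "Emit at most %d TASK: lines. No prose / commentary / numbering."
local prompt = string.format(good_template, limit)

print(corrupted)
print(prompt)
```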
marfrit	cb2f948e76	phase10: analyze + plan — answer Q-PP1..6, 5-commit roadmap Analysis resolves 6 OQs from the formulate: - executor cfg independent of preplanner cfg (Q-PP1) - preplan non-streaming for v1 (Q-PP2) - re-launch fires preplan again, naturally (Q-PP3) - executor sees goal + current task (Q-PP4) - :norris introspection out-of-scope v1 (Q-PP5) - 1-task degenerate case runs as normal (Q-PP6) Code-reading findings: safety.norris_step signature unchanged (executor cfg flows in as model_cfg param); NORRIS_SUFFIX_TEMPLATE stays stable (task hint appends after); renderer.norris_step already accepts descr (just unused by safety.norris_step today). Plan: 5 commits — executor / context / safety / repl / config-and- memory. Each commit verifiable in isolation; the orchestration lights up at C4 (repl preplan wiring); C5 documents. Sonnet review next (per ~/.claude/projects/.../memory rule). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 08:12:43 +00:00
marfrit	a7cbe22d1d	phase10: formulate manifest — cloud preplanner / local executor split Resolves direction for #89. Splits Norris into two roles: - Preplanner (cloud) fires ONCE at :norris launch; emits TASK: list. - Executor (local) handles each TASK; existing HALT protocol intact. ctx.norris_tasks anchor survives eviction (mirrors ctx.norris_goal). Cost category 'norris-preplan' separates the cloud preplan call from per-step executor cost in :cost detail. Graceful fall-back when cfg.norris.preplanner is unset OR preplan call fails — Norris runs as today (single-model). No regression for existing users. PHASE0 §11 amended to add Phase 10 row. Manifest declares 6 Open Questions for analyze step; 12 design decisions table; module-touch table; 4-pillar plan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 08:11:33 +00:00
marfrit	c55077bc07	context + repl + config: route-aware context compression (closes #87 ) Small local models effectively use a fraction of their advertised context window. Per-request compression for routes that hit a local-compress-flagged model preset: keeps only the last N turns and tail-truncates oversized content. Cloud routes get the full context unchanged. Changes: - context.lua _compress_turns(turns, keep, max_chars): returns a new list (self.turns NEVER mutated) with the last `keep` turns preserved + content tail-truncated to `max_chars`. Defensive: drops tool turns at the slice head (orphaned without their assistant-with-tool_calls anchor — strict chat templates would reject them; same gotcha PHASE0 §6 warned about for user/user). - Context:to_messages(opts) — opts.compress = { keep_turns, max_turn_chars } swaps the turn iterable for the compressed view. Affects BOTH the use_tool_role=true path and the use_tool_role=false fallback (PHASE2.md Q18 strict-template workaround). Persistence + display via :history see the full uncompressed ctx.turns. - repl.lua ask_ai: when req_cfg (the routed model's cfg) has `local_compress = true`, build compress_opts from config.context.compress (defaults keep_turns=2, max_turn_chars=800). Pass through ctx:to_messages alongside the existing system_prompt_override (#86) — orthogonal opts that compose. - Norris unaffected: safety.norris_step builds its own messages array; the planner needs full history per PHASE3 design. - config.lua gains a header comment explaining the per-model opt-in + the context.compress defaults block + the documented tool-turn truncation trade-off. 
13 unit cases verified: - no opts -> full turn list (no regression) - keep_turns=2 -> exactly last 2 emitted - long content tail-truncated to max_chars - self.turns unchanged after render - orphan tool-turn at slice head dropped (no chat-template violation) - tool turn included WITH its assistant anchor when keep_turns >= 3 E2E against live local broker: - models.fast.local_compress = true; keep_turns=1; max=200 - 4-turn session: each broker call sees ONLY the current turn (verified by short coherent CMD replies despite no cross-turn memory available to the model). FR-promised small-model friendliness in action; conversation continuity is the documented trade-off. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 07:50:07 +00:00
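The non-mutating compression view can be sketched as follows. A minimal sketch, not the `_compress_turns` in context.lua: it assumes "tail-truncated" means the text beyond `max_chars` is cut off (the log does not say which end is kept), and it copies turns shallowly so the caller's list is never touched.

```lua
-- Sketch only: build a compressed copy of the last `keep` turns.
local function compress_turns(turns, keep, max_chars)
  local first = math.max(1, #turns - keep + 1)

  -- A tool turn at the head of the slice has lost its assistant anchor;
  -- strict chat templates reject it, so skip forward past any such turns.
  while first <= #turns and turns[first].role == "tool" do
    first = first + 1
  end

  local out = {}
  for i = first, #turns do
    local copy = {}
    for k, v in pairs(turns[i]) do copy[k] = v end
    if type(copy.content) == "string" and #copy.content > max_chars then
      copy.content = copy.content:sub(1, max_chars) -- assumption: keep the head
    end
    out[#out + 1] = copy
  end
  return out
end
```

Because the result is a fresh list, persistence and `:history` can keep reading the full `ctx.turns` while only the request body sees the compressed view.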
marfrit	74e4bffb37	broker + repl + safety: GBNF grammar-sampling passthrough (closes #88 ) llama.cpp constrains the sampler to ONLY emit tokens matching a GBNF grammar. For small models this kills format drift at the token level — `CMD: <cmd>` is enforced by the sampler rather than hoped for via prompt discipline. Probe finding (this commit's pre-implementation): cloud (Anthropic via Bedrock) silently IGNORES the `grammar` field — returns normally via standard sampling. Default passthrough is safe for all routes; no per-model opt-in/opt-out needed in v1. Changes: - broker.lua build_request: `if opts.grammar then req.grammar = opts.grammar end`. Misformed grammar surfaces at request time via the existing transport-error path. - repl.lua ask_ai: `grammar_override = config.routing.grammars [req_class]` (same gating shape as #86's system_prompts override). Passed via opts.grammar in the call_broker invocation. - safety.lua is_destructive threads cfg.safety.probe_grammar through opts.grammar so llm_probe constrains the YES/NO output. Skips the regex-match dance entirely when the model can't drift. Caller-provided opts.grammar takes precedence over cfg. - config.lua gains two commented examples: * routing.grammars per class * safety.probe_grammar for the destructive probe 6 unit cases verified (stubbed curl.post_sse / broker.chat): - default: no grammar in body - opts.grammar -> body contains grammar JSON-encoded - safety probe_grammar reaches llm_probe via opts - no probe_grammar configured -> opts.grammar nil - caller opts.grammar takes precedence over cfg.safety.probe_grammar E2E against live local broker: - `routing.grammars.default = "root ::= \\"ACK\\""` configured; prompted "tell me a long story about a fox" -> model output EXACTLY "ACK" (sampler forced; would normally produce paragraphs). Grammar passthrough end-to-end confirmed. Regression: test_safety 87/87, test_router_model 31/31, repl loads. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 07:00:36 +00:00
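The passthrough and the precedence rule for the destructive probe can be sketched as two small functions. A minimal sketch: only the `if opts.grammar then req.grammar = opts.grammar end` line is quoted from the commit; the rest of the request shape, and the helper name `resolve_probe_grammar`, are hypothetical.

```lua
-- Sketch only: the request table here is a stand-in for broker.lua's body.
local function build_request(messages, opts)
  opts = opts or {}
  local req = { messages = messages }
  if opts.grammar then req.grammar = opts.grammar end -- omitted when unset
  return req
end

-- Hypothetical helper: caller-provided grammar wins over cfg.safety.probe_grammar.
local function resolve_probe_grammar(cfg, opts)
  if opts and opts.grammar then return opts.grammar end
  return cfg and cfg.safety and cfg.safety.probe_grammar or nil
end

-- A GBNF grammar that only admits the two probe answers.
local yes_no = 'root ::= "YES" | "NO"'
local req = build_request({ { role = "user", content = "rm -rf /tmp/x" } },
  { grammar = resolve_probe_grammar({ safety = { probe_grammar = yes_no } }, nil) })
print(req.grammar)
```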
marfrit	047d629a66	context + repl + config: per-class system_prompt override (closes #86 ) Small local models follow precise structured instructions better than natural language. Per-routing-class system_prompt override gives them tighter instructions for THAT request while preserving ambient context. Changes: - Context:to_messages(opts) — opts.system_prompt_override REPLACES the base system_prompt for THIS render only (state unchanged). Dynamic blocks ([background], [project], [earlier summary], NORRIS suffix) still compose on top. opts is optional; nil-safe for old callers. - repl.lua ask_ai — captures req_class from router.classify_model (already returned by Phase 5; previously discarded after the status line). Looks up config.routing.system_prompts[req_class]; passes as opts.system_prompt_override to ctx:to_messages each iteration of the tool-sub-loop. - Gating: override fires only when routing.auto is on (no class -> no override). If system_prompts[class] absent for a class, fall through to the default system_prompt (no surprise). - Norris unaffected: safety.norris_step builds its own messages array; doesn't go through this path. - config.lua gains a commented-out example showing routing.system_ prompts with the code/default examples from the FR body. Smoke verified: - 12-case context.lua unit test: opts nil/absent/present, override replaces base, dynamic blocks still compose, state unchanged after call, Norris-mode coexistence (suffix still present; background still suppressed). - E2E against cloud broker with routing.system_prompts.code set: triple-backtick prompt -> code class -> override fires; model emits terse code-only output. Non-code prompt -> default class -> no override -> normal verbose-ish reply. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 05:41:15 +00:00
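The gating described for the override lookup can be sketched as one function. A minimal sketch, assuming the config shape `routing = { auto = ..., system_prompts = { <class> = "..." } }` named in the commit; `resolve_system_prompt_override` is a hypothetical name, and a nil result stands for "fall through to the default system_prompt".

```lua
-- Sketch only: decide whether a per-class override applies to this request.
local function resolve_system_prompt_override(config, req_class)
  local routing = config and config.routing
  if not routing or not routing.auto then return nil end -- no routing, no class
  if not req_class then return nil end
  local prompts = routing.system_prompts
  if not prompts then return nil end
  return prompts[req_class] -- nil when this class has no entry
end
```

The returned value would be handed to `ctx:to_messages` as `opts.system_prompt_override`, which replaces the base prompt for that render only.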
marfrit	df59ee2f2c	config + docs/PHASE9: template comment + status -> Implement (Phase 9 commit #4) config.lua header gains a Phase 9 paragraph documenting the project-overlay feature + the R7 shallow-merge warning ("if your .aish.lua sets a top-level block, it REPLACES the user's entire block; list every entry OR omit the block"). Inspect at runtime via `:config show`. docs/PHASE9.md status header bumped: "Plan + review fold-in" -> "Implement". Lists the 4 implement commits inline: `e525063` history: trust file helpers; `34b465d` main: project-overlay loader; `5b6ee55` repl: :config show meta + HELP; this commit: config template comment + status bump. Phase 9 implementation complete. Next inner-loop step: verify (file TCs, run autonomous, close) + memory-update. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:54:53 +00:00
marfrit	5b6ee553db	repl: :config show meta + HELP (Phase 9 commit #3 ) User-facing diagnostic for the project-overlay layer. Reads config._sources (R3 cfg-embedded by main.lua's load_config_with_ overlay in commit #2) + the effective config; surfaces which file contributed each top-level key. :config show top-level keys + which source set each (nested tables collapsed to inner-key list) :config show full recursive dump with sensitive-key masking Masking heuristic (any key containing token/secret/auth/key, case-insensitive) -> "(set)" instead of the value. R6: applied RECURSIVELY in full mode so the actual leak vector (mcp.servers.<alias>.auth_token, models.<x>.auth_token) is caught. Defensive depth cap (5) prevents pathological recursion. When config._sources is absent (caller didn't go through load_config_with_overlay), status: "(unknown — main didn't pass _sources)" — meta still runs, just labels source as "?". N2 known cosmetic false-positive: `key_env` / `auth_env` config fields hold env-var NAMES (not secrets) but match the heuristic. Future polish exempts `*_env` patterns. Same for `token_budget` (contains "token") — also masked despite being a plain number. Acceptable; errs toward over-masking. HELP gains 1 :config line. E2E verified across 4 scenarios with AISH_TRUST_FILE + isolated HOME: A. No project overlay: 6 user keys; nested tables collapsed. `secrets` masked as (set) at top level. B. Project overlay accepted: source map cleanly partitioned (user has 4 keys; project has 2 — default_model + models); each top-level row tagged [user] or [project]. C. :config show full: nested dump; auth_token in models.cloud correctly masked as (set); SECRET_VAL never appears in output (grep count = 0). Regression: test_safety 87/87, test_router_model 31/31, repl loads. Commit #4 next: config.lua template comment + PHASE9.md status header -> Implement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:54:30 +00:00
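The recursive masking heuristic can be sketched as below. A minimal sketch under the rules in the commit message (substring match on token/secret/auth/key, case-insensitive, `"(set)"` as the replacement, depth cap 5); what the real code prints once the cap is hit is not stated, so the `"(max depth)"` marker is an assumption.

```lua
local SENSITIVE = { "token", "secret", "auth", "key" }

local function is_sensitive(key)
  local k = tostring(key):lower()
  for _, needle in ipairs(SENSITIVE) do
    if k:find(needle, 1, true) then return true end -- plain substring match
  end
  return false
end

-- Sketch only: returns a masked copy; the input table is left untouched.
local function mask_config(value, depth)
  depth = depth or 0
  if type(value) ~= "table" then return value end
  if depth >= 5 then return "(max depth)" end -- assumption: cap marker
  local out = {}
  for k, v in pairs(value) do
    if is_sensitive(k) then
      out[k] = "(set)"
    else
      out[k] = mask_config(v, depth + 1)
    end
  end
  return out
end
```

The same substring rule explains the documented false positives: `token_budget`, `key_env` and `auth_env` all match and are masked even though they hold no secret.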
marfrit	34b465d6dc	main: project-overlay loader (Phase 9 commit #2) Wires the project-overlay step around the existing load_config. Activates only when a trusted .aish.lua is found in/above cwd. Changes: - _find_project_config() walks libc.getcwd() up to $HOME, returning first .aish.lua found. R1 fix folded: proper-prefix check (`dir == home OR dir starts with home .. "/"`) avoids the false positive where /home/user2 matches HOME=/home/user via byte prefix. - _trust_file_path() resolves via $AISH_TRUST_FILE env override, else ~/.aish/trusted-projects. Plan-time decision per N3. - _check_and_maybe_prompt(project_path, history): calls history._sha256_file ONCE; routes through history.is_trusted; on miss prompts via rl.readline; on accept persists via history.add_trusted. A8 mitigation: if rl.readline fails to load, decline silently (no io.read fallback that would consume stdin). - load_config_with_overlay(opts): * Calls existing load_config; seeds sources={k="user", ...} * Walks for .aish.lua; if found: - In opts.prompt mode (-p, R2): skip the prompt entirely; only PRE-TRUSTED overlays load. Avoids io consuming the piped stdin that -p will read for context. - Else: interactive trust check + prompt. * On accept + successful dofile: shallow-merge top-level keys ONTO user config; update sources[k]="project" for overlapping. * R3: embeds sources on cfg._sources for repl.lua's :config show meta to read. No global. * Returns (cfg, user_path, project_path | nil). - main() now calls load_config_with_overlay; on project layer active, emits the "[aish] project config: <path> (overlaid on <user>)" status line per A4 (AFTER the user-config status). E2E verified across 4 scenarios with AISH_TRUST_FILE + isolated HOME: 1. Decline -> overlay skipped; user config active. 2. Accept -> overlay loaded; project_model active; status line "[aish] project config: ... (overlaid on ...)" visible. 3. Re-startup -> NO prompt (cached via sha); overlay loaded transparently.
R4 single-sha-call confirmed. 4. -p mode with untrusted overlay -> skipped silently; piped stdin preserved for run_one_shot. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Commit #3 lands :config show + HELP next; commit #4 the config template comment + status -> Implement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:48:22 +00:00
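The walk-up with the R1 proper-prefix check can be sketched without touching the filesystem. A minimal sketch: `is_within_home` follows the quoted rule (`dir == home` or `dir` starts with `home .. "/"`); `find_project_config` is a hypothetical stand-in for `_find_project_config` that takes the cwd and a `file_exists` predicate as arguments, and it assumes paths carry no trailing slash.

```lua
-- True only for $HOME itself or a real descendant of it.
local function is_within_home(dir, home)
  if dir == home then return true end
  local prefix = home .. "/"
  return dir:sub(1, #prefix) == prefix
end

-- Sketch only: walk from cwd up to $HOME, returning the first .aish.lua found.
local function find_project_config(cwd, home, file_exists)
  local dir = cwd
  while dir and dir ~= "" and is_within_home(dir, home) do
    local candidate = dir .. "/.aish.lua"
    if file_exists(candidate) then return candidate end
    if dir == home then break end
    dir = dir:match("^(.*)/[^/]+$") -- parent directory
  end
  return nil
end
```

With `HOME=/home/user`, a cwd of `/home/user2/proj` fails the prefix check on the first iteration, so no overlay is searched for outside the user's own tree.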
marfrit	e525063df3	history: trust file helpers for Phase 9 (commit #1 ) Foundation for the project-overlay trust mechanism. No callers yet — commit #2 wires main.lua to use these. Three new functions: history._sha256_file(path) -> hex digest or nil Shells `sha256sum`; parses first whitespace-separated field; validates 64-hex-char length. nil on any failure (path missing, binary missing, file unreadable). Caller treats nil as "skip the trust path" — never crashes. history.is_trusted(trust_path, project_path, sha256) -> bool Reads trust_path as JSONL; returns true iff an entry exists matching BOTH project_path AND sha256. Missing / corrupt / unreadable trust file -> false (re-prompt). Per-line JSON decode means partial-write corruption affects at most one line. history.add_trusted(trust_path, project_path, sha256) -> bool mkdir -p parent; append JSONL line {path, sha256, ts (ISO)}; chmod 600 the trust file (best-effort; ignore failure). Single writer per call; append-only. 11 unit cases verified: - sha256 known value matches manual `sha256sum` - nil / missing-file -> nil (no crash) - is_trusted on missing trust file -> false - add_trusted + is_trusted roundtrip works - Different sha -> not trusted (content-binding) - Different path -> not trusted - Multi-entry trust file: each entry independently checked - chmod 600 verified via stat Regression: test_safety 87/87, test_router_model 31/31. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:45:07 +00:00
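The digest parsing described for `history._sha256_file` can be sketched as below. A minimal sketch, not the code in history.lua: `parse_sha256sum_output` is a hypothetical helper name, the shell quoting is one common approach, and `io.popen` is assumed to be available on the platform.

```lua
-- Take the first whitespace-separated field and accept it only if it is
-- exactly 64 hex characters; anything else yields nil.
local function parse_sha256sum_output(out)
  if type(out) ~= "string" then return nil end
  local digest = out:match("^(%S+)")
  if not digest or #digest ~= 64 or digest:find("[^0-9a-fA-F]") then
    return nil
  end
  return digest
end

-- Sketch only: shell out to sha256sum; nil on any failure, never an error.
local function sha256_file(path)
  if type(path) ~= "string" then return nil end
  -- Single-quote the path for the shell, escaping embedded single quotes.
  local quoted = "'" .. (path:gsub("'", "'\\''")) .. "'"
  local pipe = io.popen("sha256sum -- " .. quoted .. " 2>/dev/null")
  if not pipe then return nil end
  local out = pipe:read("*l") -- nil when sha256sum printed nothing
  pipe:close()
  return parse_sha256sum_output(out)
end
```

Returning nil for every failure mode lets the caller treat a missing binary, a missing file, and unexpected output the same way: skip the trust path.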
marfrit	e796142a23	docs/PHASE9: review fold-in — 0 BLOCKERs + 7 CONCERNs + 5 NITs Sonnet review of PHASE9 (formulate + analyze + baseline + plan at `31e5de5`). No BLOCKERs (manifest design sound); seven real CONCERNs including a path-prefix bug + a piped-stdin interaction that would have surfaced at implement time. CONCERNs (FOLDED): R1. HOME-prefix walk-up false positive — dir:sub(1, #home) ~= home matches /home/user2 when HOME=/home/user. Real bug. Fix: `dir ~= home and dir:sub(1, #home + 1) ~= home .. "/"`. R2. A8's io.read("*l") fallback for trust prompt would consume the first line of piped stdin in aish -p mode. Fix: SKIP trust prompt in one-shot mode (load only pre-trusted overlays). If rl.readline misbehaves interactively, emit status + skip overlay (no fallback to stdin in either mode). R3. Sources-map delivery decided: cfg-embedded as config._sources. Globals across module boundaries explicitly avoided. Backward- compat: if absent, :config show reports "(sources unknown)". R4. _prompt_trust signature fixed — takes pre-computed sha; single sha256 call per startup per project file. R5. _check_trusted no longer reimplements trust-file read logic; routes through history.is_trusted / history.add_trusted with AISH_TRUST_FILE env override (single resolution site). R6. :config show `full` mode masking now spec'd: same heuristic applied RECURSIVELY to nested values (mcp.servers.X.auth_token is the actual leak vector). R7. Shallow-merge UX trap reframed — was "documented as predictable"; now an explicit conspicuous warning in done-when + UX surface + config.lua template that "if your .aish.lua sets a top-level block, it REPLACES the user's entire block". Deep-merge with explicit-replace-syntax v2 polish. NITs (APPLIED): N1. (no doc change — review-prompt clarification only) N2. key_env / auth_env over-masking documented as known cosmetic false-positive (env-var names, not secrets). N3. 
Sources-map decision added to open-at-plan-time before falling-into-commit-2 surprise. N4. Trust-file first-write atomicity edge case documented (manual delete to recover); temp-file+rename = v2. N5. Stale "stat" mention in §3 module table removed (A2: io.open is sufficient; no new FFI). Code sketches in §4 + §5 + §6 + §13 commits 2+3 all updated to reflect the fixes. Manifest is internally consistent + matches the history.lua API to be added in commit 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:44:20 +00:00
marfrit	31e5de5ad5	docs/PHASE9: analyze + baseline + plan (single bundled commit) Bundled the three doc steps since the surface is small (4-commit impl, no major redesigns from formulate). Analyze findings (12, A1-A12): A1-A2 — main.lua surface clean; no new FFI needed A3 — Q-P2 RESOLVED via baseline: sha256sum (GNU coreutils) A4 — Q-P1: trust prompt AFTER user-config status line A5 — Q-P3: don't log walk-up by default; :config show on demand A6 — Q-P5: :cfg show top-level by default; `full` for deep A7 — Q-P6: project may set secrets.vault (covered by trust prompt) A8 — Q-P4 DEFERRED: rl.readline early-startup smoke at impl time A9 — walk-up perf <1ms even pessimistic A10 — trust-file race: JSONL append-only handles concurrent writes A11 — sandboxed dofile out of scope (trust prompt IS the gate) A12 — bootstrap order is correct: user→project→secrets_session Baseline: B1 — sha256sum + openssl agree byte-for-byte on noether; sha256sum chosen (universal + simpler parse). §10 Open Qs table now shows resolutions inline (5/6 done; Q-P4 deferred to implement-time smoke). §13 Implementation Plan added — 4 commits: 1. history.lua: trust file helpers (read/add/is_trusted + _sha256_file) 2. main.lua: walk-up + load_config_with_overlay + trust prompt 3. repl.lua: :config show meta + startup status line 4. config.lua header note + status -> Implement Per-commit risk index covers sha256sum-missing case, JSONL partial write, A8 rl.readline early-startup, symlink-loop walk-up, :config show token leakage via conservative masking heuristic. Open at plan-time (resolve at impl): - A8 rl.readline behavior; fall back to io.read if broken - $AISH_TRUST_FILE env override for CI isolation Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:38:10 +00:00
marfrit	4f5c3aeba9	docs/PHASE9: formulate — project-local config overlay (.aish.lua) Phase 9 formulate manifest + PHASE0 §11 amendment (adds Phase 9 row) + PHASE0 §10 amendment (config resolution order now references Phase 9's overlay step). Substrate-touch lands same commit per CLAUDE.md §3. Four pillars: 1. .aish.lua walk-up from cwd; stops at $HOME or filesystem root. First found file becomes the project layer. Absence = no-op. 2. Shallow merge over user config: project top-level keys REPLACE user keys. Predictable; deep merge surprises with array/table semantics. Users compose full blocks explicitly. 3. Trust prompt + sha256-pinned persistence in ~/.aish/trusted- projects (JSONL, mode 0600). First encounter prompts; subsequent startups load only if recorded sha matches. Content change -> re-prompt. Matches direnv-allow security posture. 4. :config show meta — lists each source path with the top-level keys it contributed + sanitized effective config dump (token-bearing fields masked). Key design decisions documented: - Trust mechanism is explicit (not default-trust-all-cwds) — .aish.lua runs arbitrary Lua via dofile; hostile cloned-repo case is a real concern. - $HOME boundary on walk-up — don't search /tmp or /. Repos outside $HOME get no project layer. - Reload on cd: NO. Config resolved at startup only. - sha256 via shelled `sha256sum` (POSIX-portable; avoid vendoring a Lua impl). §9 risk table covers: hostile repo (trust prompt), corrupted trust file (best-effort skip), updated repo (sha mismatch re-prompts), dofile errors (pcall-protected), walk-up safety ($HOME boundary). 6 open questions for analyze: Q-P1 — trust prompt before/after startup status Q-P2 — sha256sum vs openssl dgst (baseline) Q-P3 — log walk-up path? Q-P4 — rl.readline safe at startup? 
Q-P5 — :config show full vs top-level Q-P6 — project-set secrets.vault security Scope confirmed via AskUserQuestion: project-local overlay (chosen over cost preflight enforcement and cross-session cost persistence, both deferred as Phase 10 candidates per §11). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:36:35 +00:00
marfrit	08dba69fce	config + docs/PHASE8: example block + status -> Implement (Phase 8 commit #5)

config.lua:
- Commented-out `tokenize = { use_endpoint = true }` block with parity to the Phase 1-7 example blocks.
- Documents the two consequences: (1) per-turn network cost (~30ms first time, cached after) and (2) token_budget is now actually enforced: sessions that fit under char/4 may evict earlier under accurate counts.
- Notes cloud /tokenize 404 fallback path.

docs/PHASE8.md:
- Status header bumped: "Plan + review fold-in" -> "Implement"
- Lists the 5 implement commits inline for traceability:
  `7ef2a6e` broker: token_count + endpoint cache
  `8502517` context: tokenize_fn + _tokens cache
  `db26d0c` context: enforce_budget honors token_budget (R2 guard)
  `94b7d86` repl: wire tokenize_fn + :cost detail estimate row
  this config example + status bump

Phase 8 implementation is complete. Resolves Q1 (PHASE0 §13, originally Phase 3, deferred forward). Next inner-loop step: verify (7): file test cases, run autonomous, close. Then memory-update (8).

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:32:16 +00:00
marfrit	94b7d86926	repl: wire tokenize_fn + :cost detail estimate row (Phase 8 commit #4 ) Activates Phase 8 pillars 2+3+5 end-to-end and adds the R3-revised :cost detail trailing line. Changes: - When cfg.tokenize.use_endpoint is true, ctx_opts.tokenize_fn is set to `function(text) return broker.token_count(active_cfg, text) end` before Context.new fires. R4: the closure body references active_cfg DIRECTLY (upvalue) — Lua resolves upvalues at call time, so subsequent :model switches re-route to the new model's tokenizer automatically (verified by E2E: :model cloud after the fast call still produces clean estimate row). - :cost detail gains a trailing line per R3: estimated session ctx: <N> tokens; token_budget=<M> (X.Y% used) N comes from ctx:estimate_tokens() (current in-memory snapshot, NOT a comparison against the accumulator sum above which is cumulative across calls + evicted turns). Gives at-a-glance budget utilization. E2E verified against live broker: - fast model call -> 168 tokens estimated (real BPE via /tokenize) - :model cloud + cloud call -> 178 tokens estimated (closure follows :model switch correctly per R4) - 21% / 22.3% budget utilization shown - Accumulator sums and estimate are intentionally different (sums are cumulative, estimate is current snapshot) — R3- correctly displayed as separate lines Regression: test_safety 87/87, test_router_model 31/31, repl loads. With this commit landed, Phase 8 is functionally complete; commit #5 is config example + status bump. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:31:40 +00:00
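The R4 point about upvalues can be illustrated with a small stand-alone sketch. It assumes that a `:model` switch rebinds the `active_cfg` variable to a new table (the commit does not say whether it rebinds or mutates); `token_count` here is a stub standing in for `broker.token_count`, and `make_frozen` is a hypothetical name for the wrong pattern.

```lua
-- Stub standing in for broker.token_count; it echoes the model name so routing is visible.
local function token_count(cfg, text)
  return cfg.model .. ":" .. #text
end

local active_cfg = { model = "fast" }

-- Pattern described in R4: active_cfg is an upvalue, looked up each time the closure runs.
local tokenize_fn = function(text)
  return token_count(active_cfg, text)
end

-- Wrong pattern: the config is passed by value into a factory, so the closure
-- keeps whatever table was current when it was built.
local function make_frozen(cfg)
  return function(text) return token_count(cfg, text) end
end
local frozen_fn = make_frozen(active_cfg)

-- Assumption: a :model switch rebinds the variable to a different config table.
active_cfg = { model = "cloud" }

print(tokenize_fn("hello")) -- cloud:5
print(frozen_fn("hello"))   -- fast:5
```

If the switch mutated the existing table in place instead, both closures would follow it; the upvalue form is the one that works in either case.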
marfrit	db26d0ccb7	context: enforce_budget honors token_budget + R2 guard (Phase 8 commit #3 ) Pillar 5 (analyze finding A1) — the real value-add of Phase 8. Until now, ctx.token_budget = 4096 was set but never enforced; enforce_budget only looked at max_turns. With commit #2's accurate tokenization wired in (via commit #4), eviction now finally fires when the actual context fills the budget. Loop condition change: before: while #self.turns > self.max_turns do after: while (#self.turns > self.max_turns or self:estimate_tokens() > self.token_budget) and #self.turns > 0 do R2 guard: the `and #self.turns > 0` clause is essential. When system_prompt alone exceeds token_budget (e.g. a 5000-token [project] block with token_budget=4096), the OR-condition stays true even when turns are empty — table.remove on a 0-length list would no-op forever while evicted++ spins. Sonnet review caught this; without the guard, real users could hit an infinite loop just by setting a small token_budget + opening a large project tree. Per-pair eviction logic (summarize callback + pair-pop) inside the loop is unchanged. The estimate_tokens call is potentially expensive under tokenize_fn — commit #2's per-turn cache amortizes to O(N) per iteration after first fill; for max_turns=40 + budget=4096 sessions the worst case is microseconds per call. Unit-verified across 5 cases (with and without tokenize_fn): 1. max_turns eviction unchanged (no behavior regression). 2. char/4 path: tight budget evicts to 0 when sys > budget, exits via R2 guard. 3. char/4 path: practical budget evicts to a stable count. 4. tokenize_fn stub: evicts to exactly the (budget - sys)/per-turn count. 5. R2 critical: zero turns + oversize sys -> immediate exit, evicted=0, no spin. Behavior change for existing users: a session that fit under token_budget=4096 by char/4 (~16K chars) may now evict earlier because accurate counts are HIGHER for most natural-text inputs (per baseline B2). 
Users on cloud presets with very large context windows (Claude 200K) should raise token_budget to match — see §9 risk row in PHASE8.md. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:30:37 +00:00
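The loop condition and the R2 guard from this commit can be sketched as follows. This is a simplified stand-in, not the project's `context.lua`: eviction removes one turn at a time instead of the per-pair logic with the summarize callback, and `estimate_tokens` uses only a char/4 heuristic (rounding up is an assumption).

```lua
local Context = {}
Context.__index = Context

function Context.new(opts)
  return setmetatable({
    turns = {},
    system_prompt = opts.system_prompt or "",
    max_turns = opts.max_turns,
    token_budget = opts.token_budget,
  }, Context)
end

-- char/4 heuristic only; the real method may also use tokenize_fn.
function Context:estimate_tokens()
  local chars = #self.system_prompt
  for _, t in ipairs(self.turns) do chars = chars + #t.content end
  return math.ceil(chars / 4)
end

function Context:enforce_budget()
  local evicted = 0
  -- The trailing `#self.turns > 0` is the R2 guard: without it an oversize
  -- system prompt keeps the OR-condition true after the last turn is gone.
  while (#self.turns > self.max_turns
         or self:estimate_tokens() > self.token_budget)
        and #self.turns > 0 do
    table.remove(self.turns, 1) -- simplified: evict the oldest turn
    evicted = evicted + 1
  end
  return evicted
end

-- A 20000-char system prompt is ~5000 tokens by char/4, above the 4096 budget.
local ctx = Context.new({
  system_prompt = string.rep("x", 20000), max_turns = 40, token_budget = 4096,
})
print(ctx:enforce_budget())             -- 0 (zero turns: immediate exit, no spin)
ctx.turns[1] = { role = "user", content = "hi" }
ctx.turns[2] = { role = "assistant", content = "hello" }
print(ctx:enforce_budget(), #ctx.turns) -- 2	0
```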
marfrit	8502517021	context: tokenize_fn + per-turn _tokens cache (Phase 8 commit #2 ) Foundation for accurate Context:estimate_tokens. When the optional tokenize_fn is wired (Phase 8 commit #4 wires it from repl.lua), estimate_tokens uses it with per-turn caching for O(1) amortized cost. char/4 path unchanged when tokenize_fn nil. Changes: - Context.new accepts opts.tokenize_fn -> stored as self.tokenize_fn. - Context:estimate_tokens: if tokenize_fn nil -> existing char/4 (no behavior change). if tokenize_fn set -> - tokenize self.system_prompt every call (dynamic per compose_background/project/summary; can't cache). - for each turn: if t._tokens nil -> compute + cache; else use cached. Turn content immutable after append (we never mutate stored turns) so cache never goes stale. - :reset wipes self.turns which takes the _tokens cache with them; new turns start with t._tokens == nil and lazy-set on first count. 8/8 unit cases verified: - char/4 path unchanged when no tokenize_fn - tokenize_fn called 1+ N times on first estimate (sys + N turns) - subsequent estimates fire only 1 tokenize call (sys; turns cached) - new turn fires +1 tokenize call on next estimate - :reset + fresh turn fires fresh tokenize call (cache died with turn) No callers wire tokenize_fn yet — Phase 8 commit #4 lands the repl.lua wiring (after commit #3 adds the enforce_budget extension that's the real beneficiary of accuracy). Regression: test_safety 87/87, test_router_model 31/31. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:29:56 +00:00
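A rough sketch of the per-turn `_tokens` cache described in this commit, written as a free function over a plain table rather than the real `Context` method. `stub_tokenize` is a hypothetical counting stub (one token per character) so the number of tokenizer calls is observable; the char/4 rounding is an assumption.

```lua
local calls = 0
-- Hypothetical stub: counts invocations and returns one "token" per character.
local function stub_tokenize(text)
  calls = calls + 1
  return #text
end

local function estimate_tokens(ctx)
  if not ctx.tokenize_fn then
    -- char/4 path, used when no tokenize_fn is wired
    local chars = #ctx.system_prompt
    for _, t in ipairs(ctx.turns) do chars = chars + #t.content end
    return math.ceil(chars / 4)
  end
  -- The system prompt is dynamic, so it is tokenized on every call.
  local total = ctx.tokenize_fn(ctx.system_prompt)
  for _, t in ipairs(ctx.turns) do
    -- Turn content is immutable after append, so the cached count stays valid.
    if t._tokens == nil then t._tokens = ctx.tokenize_fn(t.content) end
    total = total + t._tokens
  end
  return total
end

local ctx = {
  system_prompt = "sys",
  tokenize_fn = stub_tokenize,
  turns = { { content = "aaaa" }, { content = "bb" } },
}
estimate_tokens(ctx); print(calls) -- 3 (system prompt + 2 turns)
estimate_tokens(ctx); print(calls) -- 4 (system prompt only; turns cached)
table.insert(ctx.turns, { content = "c" })
estimate_tokens(ctx); print(calls) -- 6 (system prompt + the new turn)
```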
marfrit	7ef2a6ed5c	broker: token_count + endpoint capability cache (Phase 8 commit #1 ) Foundation for Phase 8 — accurate tokenization via <endpoint>/tokenize where supported, char/4 fallback otherwise. Changes: - `M.token_count(model_cfg, text)`: Empty text -> 0. No endpoint -> char/4 immediately. Capability cache says false -> char/4. Otherwise -> POST `<endpoint>/tokenize` with `{content, model}`, 2s timeout. On 200 + parseable `{tokens=[...]}`: cache true, return #tokens. Anything else (non-200 / parse-fail / transport err / timeout): cache false, char/4. - `_tokenize_capable` cache keyed by ENDPOINT ONLY per R6 — B1 confirmed /tokenize ignores the model field, so same-endpoint presets share one cache entry. If a future broker honors the model field, revisit. - `M.tokenize_supported(model_cfg)`: returns nil/true/false for the cached state (introspection for tests + future :tokenize meta). - `M._reset_tokenize_cache()`: test hook so the session-local cache doesn't leak between test runs sharing a LuaJIT VM. Live verified against hossenfelder + a deliberately-broken endpoint: - "hello world" -> 2 tokens (matches manual curl probe) - 901-char text -> 201 real tokens vs 225 char/4 (24-token gap; real is LOWER here, opposite direction from the README probe where it was higher — confirms heuristic is inaccurate in both directions) - Pre-probe: tokenize_supported() returns nil - Post-probe: tokenize_supported() returns true (local) / false (broken) - Broken endpoint second call: still char/4, no re-probe - Empty / nil text edge cases handled Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:29:17 +00:00
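The decision flow of `M.token_count` with its endpoint-keyed capability cache might look roughly like this. It is a sketch under stated assumptions: the real function takes `(model_cfg, text)` and calls the project's curl binding, whereas here a hypothetical `post_fn(url, body)` returning `(status, decoded_doc)` is injected as a third argument so the example runs without a network; the 2s timeout is omitted, and rounding char/4 up is an assumption.

```lua
local M = {}
local _tokenize_capable = {} -- endpoint -> true/false; nil means "not probed yet"

local function char4(text) return math.ceil(#text / 4) end

-- post_fn is a HYPOTHETICAL injected transport: post_fn(url, body) -> status, doc
function M.token_count(model_cfg, text, post_fn)
  if text == nil or text == "" then return 0 end
  local ep = model_cfg.endpoint
  if not ep or _tokenize_capable[ep] == false then return char4(text) end
  local ok, status, doc = pcall(post_fn, ep .. "/tokenize",
                                { content = text, model = model_cfg.model })
  if ok and status == 200 and type(doc) == "table" and type(doc.tokens) == "table" then
    _tokenize_capable[ep] = true
    return #doc.tokens
  end
  _tokenize_capable[ep] = false -- any failure: remember it and stop probing
  return char4(text)
end

function M.tokenize_supported(model_cfg) return _tokenize_capable[model_cfg.endpoint] end
function M._reset_tokenize_cache() _tokenize_capable = {} end

-- Usage with two fake transports (endpoints are placeholders).
local probes = 0
local function ok_post() probes = probes + 1; return 200, { tokens = { 101, 202 } } end
local function broken_post() probes = probes + 1; return 404, nil end

local good = { endpoint = "http://local.example", model = "m" }
local bad = { endpoint = "http://broken.example", model = "m" }
print(M.tokenize_supported(good))                     -- nil
print(M.token_count(good, "hello world", ok_post))    -- 2
print(M.tokenize_supported(good))                     -- true
print(M.token_count(bad, "hello world", broken_post)) -- 3
print(M.token_count(bad, "hello world", broken_post)) -- 3
print(probes)                                         -- 2 (broken endpoint not re-probed)
```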
marfrit	467e573d24	docs/PHASE8: review fold-in — 2 BLOCKERs + 4 CONCERNs + 4 NITs Sonnet-reviewed per reviews-use-sonnet memory directive. BLOCKERs (RESOLVED in-place): R1. §5 estimate_tokens pseudocode missing per-turn cache pattern. Prose described it; code block called tokenize_fn unconditionally. Implementer following code verbatim would hit the O(N round- trips per call) perf gap the prose flagged. Code block now shows explicit `if t._tokens then ... else t._tokens = ... end`. R2. enforce_budget loop can spin forever when system_prompt alone exceeds token_budget (e.g. 5KB project block + budget=4096 + zero turns -> turns can't shrink further but OR-condition stays true). Fix: AND `#self.turns > 0` guard on the loop. §13 commit 3 row shows the explicit Lua-syntax condition. CONCERNs (FOLDED): R3. :cost detail per-slot ~est=N annotation was semantically undefined — accumulator sum (cumulative across calls + evicted turns) vs current-snapshot estimate are incommensurable. §6 reworked: ONE trailing summary line "[estimated session ctx: N tokens; token_budget=M (X% used)]" instead of per-slot annotations. §13 commit 4 aligned. R4. tokenize_fn closure MUST reference active_cfg as upvalue (NOT capture by value). Subtle but easy to miss — §13 commit 4 now spells out the correct vs wrong patterns explicitly. R5. 2s tokenize timeout can spuriously cache-as-unsupported when llama.cpp is busy with a concurrent completion (single-threaded inference; /tokenize queues behind). Documented in §9; v1 ships 2s, revisit during verify if it bites. R6. Per-endpoint cache key conflated two same-endpoint/different- model presets (B1: /tokenize ignores the model field). Cache key simplified to endpoint-only. One probe per endpoint per session; if a future broker honors the model field, revisit. NITs (APPLIED): N1. §13 commit 3 `OR`/`AND` -> Lua-syntax `or`/`and`. N2. §10 Q-T5 Resolution-target cell filled in (was blank after B1). N3. 
§6 / §8 / §13 commit 4 now describe a CONSISTENT approach (trailing summary line; per-slot annotation dropped). N4. Status header tree-hash updated to current (`aa64ad3` -> stays fresh through review fold-in; commit 5 will refresh again at "Implement" status). PHASE8.md now 622 lines (was 454 after plan). +168/-61. Ready for implementation phase 6 of the inner loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:28:27 +00:00
marfrit	aa64ad3eec	docs/PHASE8: plan — §13 commit roadmap (5 commits) Status: Analyze -> Plan. All open Qs resolved (Q-T5 via baseline B1). 5-commit roadmap, bottom-up: 1. broker.lua — M.token_count helper + per-endpoint capability cache. <endpoint>/tokenize probe with 2s timeout; cache true/false per (endpoint, model) for the session. char/4 fallback on any non-200 / parse-fail / transport err. M.tokenize_supported introspection helper. 2. context.lua — Context.new accepts opts.tokenize_fn; estimate_ tokens widens to use it when set, with per-turn `_tokens` cache. char/4 path unchanged when tokenize_fn nil. 3. context.lua — enforce_budget consults token_budget too (pillar 5 from A1). Loop condition: turns>max_turns OR estimate_tokens >token_budget. Existing summarize-on-evict callback unchanged. 4. repl.lua — wire tokenize_fn when cfg.tokenize.use_endpoint=true. Closure captures active_cfg upval (A5 — follows :model switches naturally). :cost detail extension: trailing line showing estimated session ctx tokens for comparison with the per-slot prompt_tokens sums in the accumulator. 5. config.lua commented `tokenize = { use_endpoint = true }` example + PHASE8.md status -> Implement. Per-commit risk index covers: probe latency cap (2s, one-shot), per-turn cache correctness (immutable post-append), enforce_budget performance (O(N) per call after cache fill), and the intentional behavior change of token_budget actually being enforced (sessions fitting under char/4 may evict earlier under accurate counts — documented in §9). Two items open at plan, resolve at implement: - exact :cost detail layout for estimated session ctx row - whether to add a :tokenize debug meta (defer unless useful in verify) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:24:41 +00:00
marfrit	79bd40db79	docs/PHASE8-baseline: live /tokenize probes

Four findings, all align with formulate/analyze:

B1. /tokenize IGNORES the `model` request field: it returns the tokenization of whichever model is currently loaded on the proxy backend, NOT the requested model. Acceptable: a real BPE count is still much better than char/4, and the gap between Qwen/Llama tokenizers is small. Cloud (OpenRouter) 404s regardless, so cloud falls back to char/4 via the capability cache.
B2. Latency 23-34ms per call, FLAT across input sizes 50-5000 chars. Network round-trip dominates. Per-turn _tokens cache amortizes to O(1); worst case 40 cached turns × ~30ms = 1.2s one-time cost on first enforce_budget call. Acceptable.
B3. Response shape confirmed: `{"tokens":[N1,N2,...]}` (token IDs; we use #response.tokens for count, discard the IDs). JSON not SSE; ffi.curl.M.post is the right call.
B4. Cloud /tokenize 404s as expected. Capability cache marks it unsupported on first probe; char/4 fallback silent thereafter. No design change.

Q-T5 RESOLVED per B1. All open questions now resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:22:05 +00:00
marfrit	1a136d81b7	docs/PHASE8: analyze — adds pillar 5 (enforce_budget honors token_budget) Status: Formulate -> Analyze. 12 findings (A1-A12); 5/6 open Qs resolved in-place (Q-T5 deferred to baseline). MAJOR FINDING: A1. enforce_budget ONLY checks max_turns, NOT token_budget — even with accurate tokenization, eviction decisions are unaffected. The new estimate_tokens() would just feed the prompt template display. Pillar 5 added: enforce_budget evicts when EITHER max_turns OR token_budget is exceeded. This is the real motivation for accurate tokenization. Other findings: A2. ffi.curl.M.post signature confirmed (body, status) / (nil, err). A3. Single caller of estimate_tokens today; enforce_budget becomes the second (more frequent) caller — per-turn _tokens cache becomes important. A4. Q-T1: cache lives on turn dict; dies with turns on :reset. A5. Q-T2: closure captures active_cfg upval; follows :model switch naturally. A6. Q-T3: opt-out skips the probe entirely (no wiring). A7. Q-T6: tools-schema tokens deferred to follow-up (fixed per session; under-count bounded). A8. _tokens cache invalidation: only :reset; turn content is immutable after append. A9. Probe latency ~50ms/call locally; per-turn cache amortizes to O(1) after first count. A10. estimate_tokens called OUTSIDE streaming callback; no race. A11. role:"tool" turns tokenize identically; per-turn cache works. A12. include_usage (Phase 7) and tokenize (Phase 8) are orthogonal — different endpoints, different code paths. §1 expanded to 5 pillars (pillar 5 = enforce_budget extension). §3 context.lua row updated to reference the enforce_budget change + per-turn _tokens cache. §9 risk row added: accurate counts mean the default token_budget=4096 is finally ENFORCED — sessions that spilled silently under char/4 may now evict earlier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:21:24 +00:00
marfrit	00869ba412	docs/PHASE8: formulate — accurate tokenization (resolves Q1) Phase 8 formulate manifest + PHASE0 §11 amendment to add the Phase 8 row (substrate amendment per CLAUDE.md §3 lands same commit). Four pillars: 1. Per-endpoint /tokenize probe (cached). One round-trip on first call per (endpoint, model); capability cached for session. hossenfelder + llama.cpp expose <endpoint>/tokenize (NOT /v1/ tokenize — per real probe; the path is endpoint-local, not under the OpenAI /v1 prefix). Cloud (OpenRouter) 404s — silent char/4 fallback. 2. broker.token_count(model_cfg, text) — thin wrapper; tries probe, falls back to char/4 on miss. Always returns non-negative int; never errors. 2s tight timeout; failures cache as not-supported. 3. Context:estimate_tokens widened. Accepts optional tokenize_fn at Context.new; uses it when present, char/4 otherwise. repl.lua wires `tokenize_fn = function(text) return broker.token_count( active_cfg, text) end` when cfg.tokenize.use_endpoint = true. Per-turn _tokens cache to amortize across estimate calls. 4. :cost detail est-vs-actual annotation. When the heuristic disagrees with the actual prompt_tokens from broker usage by >10%, show `~est=N`. Silent otherwise. Display-only; no behavior change. Resolves Q1 (PHASE0 §13, originally Phase 3) — replace char/4 heuristic on Context:estimate_tokens. Originally targeted at Phase 3 but deferred forward each iteration; now lands. Baseline already observed during formulate: - /v1/tokenize -> 404 on hossenfelder; /tokenize -> works - Body shape: {content: "..."} returns {tokens: [N1, N2, ...]} - Accuracy gap: char/4 UNDERESTIMATES by ~10% on real code/prose (508 vs 558 on a 2KB README sample). Material for context- budget eviction decisions. Doc covers scope + done-when, tech decisions table, module changes, per-pillar deep dives, UX surface, out of scope, 6 risk rows, 6 open questions (Q-T4/T5 baseline-bound, others analyze-bound). 
Scope confirmed via AskUserQuestion: tokenization (chosen over cross-session cost persistence and hard rate-limit enforcement). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:19:53 +00:00
marfrit	1f34b6dce8	config + docs/PHASE7: example block + status -> Implement (Phase 7 commit #6 ) R9-resolved single-owner of the status bump (commit #5 didn't touch PHASE7.md). N5: PHASE0 §11 amendment landed in commit `3bad07b` (formulate); not re-applied here. config.lua: - Commented-out `cost = { warn_at_dollars, warn_at_tokens }` block with parity to the Phase 1-6 example blocks. - Notes warn flags are independent (R4) and per-turn usage flows to session/*.jsonl for after-the-fact analysis. docs/PHASE7.md: - Status header bumped: "Plan + review fold-in" -> "Implement" - Lists the 6 implement commits inline for traceability: `7364963` broker: usage capture + opts widening `7b4a9be` context: accumulator helpers `8adebd5` repl: _record_usage + opts.category at 5 sites `b30212a` safety + repl: opts.category for Norris + probe `0d6ff93` repl: :cost meta surface this config example + status bump Phase 7 implementation is complete. Next inner-loop step is verify (7) — user-driven smoke tests, then memory-update (8). Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:02:55 +00:00
marfrit	0d6ff93134	repl: :cost meta surface (Phase 7 commit #5 ) User-facing reporter of the per-session accumulator. Three shapes: :cost one-line summary (calls / tokens / cost) :cost detail per-model + per-category breakdown :cost reset zero the meter; clears warn flags All read-only against ctx.usage_totals; no broker calls. R6 — annotation uses the per-slot is_local sticky flag, NOT a fragile cost==0 heuristic. Summary line classifies: cloud only -> "cost=$X.XXXXXX" cloud + local mix -> "cost=$X.XXXXXX (cloud only; local: tokens but no cost field)" local only -> "cost=$X.XXXXXX (local only; no cost field)" R7 — :cost detail rows sort by (cost desc, model asc, category asc). Three-level key for deterministic output across equal-cost rows (table.sort is unstable; identical costs would otherwise reorder). R10 — all dollar values use $%.6f formatting. Sub-cent precision is critical: a Haiku call can cost $0.000028; $%.4f would round it to $0.0000 — indistinguishable from local $0. Column width widened to %-26s to fit fully-qualified cloud model names (e.g. "anthropic/claude-haiku-4.5" = 25 chars). E2E verified against live cloud + local broker: :cost (empty session) -> "0 calls, $0.000000" ...after mixed-mode session... :cost -> "5 calls, prompt=472 / completion=26 tokens, cost=$0.000377 (cloud only; local: tokens but no cost field)" :cost detail -> 4 rows: main cloud $0.000219, probe cloud $0.000128, delegate cloud $0.000030, main local $0.000000 (local). Sort by cost desc within model. :cost reset -> "cost meter reset"; subsequent :cost shows zeros. All 5 categories appeared in the same session: main (twice — cloud + local), delegate, probe (x2 from :safety check). Warn-threshold firing already verified in commit #3 + #4. HELP gains 3 :cost lines. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:02:24 +00:00
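The R7 sort key and the R10 formatting choice can be shown together in a short sketch. The row fields (`model`, `category`, `cost`) and the model names are illustrative assumptions; the comparator order and the `$%.6f` / `%-26s` formats follow the commit message.

```lua
local rows = {
  { model = "local-model", category = "main",     cost = 0 },
  { model = "cloud-model", category = "probe",    cost = 0.000128 },
  { model = "cloud-model", category = "main",     cost = 0.000219 },
  { model = "cloud-model", category = "delegate", cost = 0.000128 },
}

-- Three-level key: cost desc, then model asc, then category asc.
-- table.sort is not stable, so ties must be broken explicitly.
table.sort(rows, function(a, b)
  if a.cost ~= b.cost then return a.cost > b.cost end
  if a.model ~= b.model then return a.model < b.model end
  return a.category < b.category
end)

for _, r in ipairs(rows) do
  print(string.format("%-26s %-10s $%.6f", r.model, r.category, r.cost))
end

-- Why six decimals: four would round a sub-cent call down to zero.
print(string.format("$%.4f vs $%.6f", 0.000028, 0.000028)) -- $0.0000 vs $0.000028
```

With the tie-breakers in place, the two equal-cost cloud rows always come out as `delegate` before `probe`, regardless of input order.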
marfrit	b30212af0f	safety + repl: opts.category for Norris + probe (Phase 7 commit #4 ) Closes the last two broker call sites that flow through safety.lua. Together with commits #1-#3, all 7 broker call sites in aish now attribute usage to the cost accumulator under the right category. Changes: safety.lua: - llm_probe (the YES/NO destructive checker) — broker.chat call gains opts.category = "probe". Captures (text, usage) via (reply, second) and, when opts.on_usage is provided AND the call succeeded, routes second through opts.on_usage(model, category, payload). N4 signature chain: opts already flowed through llm_second_opinion -> M.is_destructive from #52's work; opts.on_usage rides along naturally with no further signature change. - M.norris_step (Norris main broker round-trip): * opts to broker.chat_stream gains category = "norris" * probe_opts (passed to is_destructive inside the loop) gains on_usage = helpers.on_usage so the LLM probe's cost lands under "probe" too * on_delta wrapper adds elseif kind == "usage" branch that calls helpers.on_usage(payload.model, payload.category, payload). Coexists cleanly with the existing text (rehydrator) and tool_call branches. repl.lua: - Norris helpers table gains on_usage = _record_usage. The R5 central chokepoint (commit #3) does the warn-threshold check AND ctx:add_usage atomically. - :safety check meta's probe_opts always carries on_usage now (independently of whether secrets_session is set). secrets-aware scrub_msgs/rehydrate added conditionally as before. E2E verified against live broker (safety.llm_model = "cloud"): - :safety check ls -la /tmp -> 2 cloud probe calls - "[aish] session cost $0.000128 has crossed warn_at_dollars=$0.000100" - probe category visible in accumulator (would appear in :cost detail once commit #5 ships the meta). Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:01:21 +00:00
marfrit	8adebd52cc	repl: _record_usage helper + opts.category at 5 sites (Phase 7 commit #3 ) Wires broker.lua's on_delta("usage", payload) and broker.chat's (text, usage) return to the ctx accumulator via a single chokepoint. Changes: - Forward decl `local _record_usage` near _bg_spawn — same pattern; the summarize-on-evict closure in make_summarize_fn (built at line 299) needs lexical access to _record_usage (assigned at line 695), so forward-declare and assign-without-`local`. - _record_usage(model, category, usage) — R5 central chokepoint: routes to ctx:add_usage, then checks the per-threshold warn state. R4: cost_warn_state has two independent flags (dollars and tokens) so first-to-fire doesn't suppress the other. R10: warn message uses $%.6f for sub-cent precision. - call_broker wrapper: wrapped on_delta now branches on kind == "usage" -> _record_usage(payload.model, payload.category, payload). R2: keys by payload.model (set inside broker.lua from model_cfg.model). When fallback fires, broker is called with fb_cfg, so payload.model IS the fallback's name automatically — wrapper doesn't track primary-vs-fallback itself. - 5 caller sites wired with opts.category: ask_ai call_broker -> category="main" summarize-on-evict -> category="summarize" DELEGATE: handler -> category="delegate" :memory summarize -> category="memory_summarize" :delegate meta -> category="delegate" - All 4 broker.chat call sites switched from local reply, err = broker.chat(...) to local reply, second = broker.chat(...) branching on reply nil-ness to interpret second (err on failure, usage on success). Captured usage routes through _record_usage. E2E verified against live cloud broker: - cloud prompt -> reply "Hi! 👋" - Warn fired: "session cost $0.000219 has crossed warn_at_dollars=$0.000010" - R10 sub-cent precision visible in both numbers. Norris + safety paths still untouched — commit #4 wires those. Regression: test_safety 87/87, test_router_model 31/31, repl loads. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:00:06 +00:00
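The R4/R5 behaviour of `_record_usage` (one chokepoint, two independent warn flags) can be approximated like this. It is a sketch, not the project's function: the accumulator is reduced to two running totals in place of `ctx:add_usage`, warnings are collected in a table instead of being printed, `>=` as the threshold comparison is an assumption, and the token warning text is hypothetical (only the dollar message format appears in the log).

```lua
local warnings = {}
local cfg = { cost = { warn_at_dollars = 0.0001, warn_at_tokens = 100 } }
local ctx = {
  cost = 0, tokens = 0, -- simplified stand-in for usage_totals
  cost_warn_state = { dollars = false, tokens = false },
}

local function _record_usage(model, category, usage)
  -- Stand-in for ctx:add_usage(model, category, usage).
  ctx.cost = ctx.cost + (usage.cost or 0)
  ctx.tokens = ctx.tokens + (usage.prompt_tokens or 0) + (usage.completion_tokens or 0)

  local st, c = ctx.cost_warn_state, cfg.cost
  -- Each threshold has its own flag, so one firing never suppresses the other.
  if c.warn_at_dollars and not st.dollars and ctx.cost >= c.warn_at_dollars then
    st.dollars = true
    warnings[#warnings + 1] = string.format(
      "[aish] session cost $%.6f has crossed warn_at_dollars=$%.6f",
      ctx.cost, c.warn_at_dollars)
  end
  if c.warn_at_tokens and not st.tokens and ctx.tokens >= c.warn_at_tokens then
    st.tokens = true
    -- Hypothetical wording; the log only shows the dollar message.
    warnings[#warnings + 1] = string.format(
      "[aish] session tokens %d have crossed warn_at_tokens=%d",
      ctx.tokens, c.warn_at_tokens)
  end
end

_record_usage("cloud", "probe", { cost = 0.000128, prompt_tokens = 40, completion_tokens = 10 })
_record_usage("cloud", "main",  { cost = 0.0001,   prompt_tokens = 50, completion_tokens = 10 })
for _, w in ipairs(warnings) do print(w) end
```

The first call crosses only the dollar threshold; the second crosses the token threshold and does not repeat the dollar warning.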
marfrit	7b4a9becc2	context: cost/usage accumulator (Phase 7 commit #2 ) Adds the per-conversation accumulator that broker.lua's on_delta("usage", ...) payload feeds into. No callers yet — commit #3 wires the broker callback to ctx:add_usage in repl.lua, commit #4 in safety.lua. Changes: - Context.new: new fields `usage_totals = {}` and `cost_warn_state = { dollars = false, tokens = false }`. R4: two independent flags so warn_at_dollars firing doesn't suppress warn_at_tokens (or vice versa). - Context:add_usage(model_name, category, usage): Increments usage_totals[model_name][category] slot. R6: when usage.cost is nil (local llama.cpp per B3), sets a sticky `is_local = true` flag on the slot AND does NOT add to cost (preserves the local-vs-cloud-zero distinction for :cost detail annotation). When usage.cost is a number (cloud), accumulates. - Context:total_cost() / total_tokens() — pure-Lua summation across all slots; total_tokens returns (prompt, completion). - Context:reset_usage() — explicit :cost reset path; zeros usage_totals AND clears both flags atomically. - Context:reset() — R8 parity: does NOT clear usage_totals OR cost_warn_state. Matches the Phase 4 memory_items / Phase 6 project rule ("ambient context survives a user-driven conversation reset"). Smoke verified (20/20 unit cases): - Empty zeros; cloud cost accumulation; local nil-cost preserves is_local=true sticky; calls counter; cost summation across multiple cloud calls; is_local sticky after a later nil-cost call on a cloud slot; separate slots per (model, category); :reset preserves; :reset_usage zeros both totals and flags. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:57:56 +00:00
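A condensed sketch of the accumulator semantics listed in this commit, in particular the sticky `is_local` flag from R6 and the R8 rule that `:reset` leaves usage alone. The slot field names (`calls`, `prompt_tokens`, `completion_tokens`, `cost`) are assumptions; the log only names `usage_totals`, `cost_warn_state`, and `is_local`.

```lua
local Context = {}
Context.__index = Context

function Context.new()
  return setmetatable({
    turns = {},
    usage_totals = {},
    cost_warn_state = { dollars = false, tokens = false },
  }, Context)
end

function Context:add_usage(model_name, category, usage)
  local by_model = self.usage_totals[model_name] or {}
  self.usage_totals[model_name] = by_model
  -- Slot field names are assumed for illustration.
  local slot = by_model[category] or
    { calls = 0, prompt_tokens = 0, completion_tokens = 0, cost = 0, is_local = false }
  by_model[category] = slot
  slot.calls = slot.calls + 1
  slot.prompt_tokens = slot.prompt_tokens + (usage.prompt_tokens or 0)
  slot.completion_tokens = slot.completion_tokens + (usage.completion_tokens or 0)
  if usage.cost == nil then
    slot.is_local = true -- sticky: nil cost means "no cost field", not "free"
  else
    slot.cost = slot.cost + usage.cost
  end
end

function Context:total_cost()
  local sum = 0
  for _, by_model in pairs(self.usage_totals) do
    for _, slot in pairs(by_model) do sum = sum + slot.cost end
  end
  return sum
end

-- :cost reset path: zero the totals and clear both warn flags together.
function Context:reset_usage()
  self.usage_totals = {}
  self.cost_warn_state = { dollars = false, tokens = false }
end

-- Conversation reset (R8): turns go, usage and warn state stay.
function Context:reset()
  self.turns = {}
end

local ctx = Context.new()
ctx:add_usage("cloud-model", "main", { prompt_tokens = 10, completion_tokens = 2, cost = 0.5 })
ctx:add_usage("cloud-model", "main", { prompt_tokens = 5, completion_tokens = 1 }) -- nil cost
local slot = ctx.usage_totals["cloud-model"]["main"]
print(slot.calls, slot.cost, slot.is_local) -- 2	0.5	true
```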
marfrit	7364963b00	broker: usage capture + opts widening (Phase 7 commit #1 ) Foundation for Phase 7. broker.chat_stream now emits a third on_delta kind ("usage") after the stream completes successfully; broker.chat returns (text, usage). Backward-compatible — existing callers that ignore the new kind / second value continue working via Lua's drop-extra-returns semantics. Changes: - build_request widens (A3 + R3) — `(model_cfg, msgs, stream, opts)`. opts.tools / opts.max_tokens / opts.include_usage / opts.category all live inside opts now. Both internal call sites updated. - opts.include_usage defaults to true for streaming requests; sets `stream_options: { include_usage: true }` in the request body. B1: required for local llama.cpp to emit usage; cloud honors as a no-op (emits anyway). - on_event captures `doc.usage` into a closure-local `final_usage`. N1: the check is INDEPENDENT of the choice/delta branches — local emits usage on choices=[] chunks (choice nil) while cloud emits with non-empty choices + finish_reason. Both shapes funnel here. - After curl.post_sse returns successfully (NOT on transport/api errors), if final_usage is set, emit on_delta("usage", {prompt_tokens, completion_tokens, total_tokens, cost, model, category}). cost is nil for local (R6 preserves the nil vs 0 distinction the accumulator needs). model is model_cfg.model — caller-stable per B4 + R2 so call_broker's fallback retry attributes usage to the fallback's model name without wrapper-side tracking. - M.chat (R1 — BLOCKER fix): on_delta now also captures kind=="usage" alongside "text"; M.chat returns (text, usage). Without this fix 4 of 5 non-streaming categories (summarize / delegate / memory_summarize / probe) would silently report zero usage. Smoke verified against live hossenfelder:8082: - CLOUD chat -> (text, usage); cost=2.9e-05, model=anthropic/... 
- LOCAL chat -> (text, usage); cost=NIL (correct per R6), model=qwen-coder-7b-snappy-8k - CLOUD stream -> on_delta("usage", {...}) with category="test" echoed; model name caller-stable. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:57:14 +00:00
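The N1 point in the commit above (the `doc.usage` check is independent of the choices branch) can be illustrated with a small closure. This is a sketch under assumptions: `make_usage_capture` and `finish` are hypothetical names, `doc` is assumed to be an already-decoded SSE chunk, and the real broker.lua wiring around curl.post_sse is not reproduced.

```lua
-- Hypothetical helper: returns an event handler plus a finisher that
-- emits the "usage" delta once, after the stream has ended successfully.
local function make_usage_capture(on_delta)
  local final_usage -- closure-local; set by whichever chunk carries usage

  local function on_event(doc)
    -- Checked first and unconditionally: local backends send usage on a
    -- chunk with choices = {}, cloud sends it alongside a final choice.
    if doc.usage then final_usage = doc.usage end

    local choice = doc.choices and doc.choices[1]
    if choice and choice.delta and choice.delta.content then
      on_delta("text", choice.delta.content)
    end
  end

  local function finish(model_cfg, category)
    if not final_usage then return end -- provider omitted usage: skip silently
    on_delta("usage", {
      prompt_tokens = final_usage.prompt_tokens,
      completion_tokens = final_usage.completion_tokens,
      total_tokens = final_usage.total_tokens,
      cost = final_usage.cost,   -- stays nil for local backends
      model = model_cfg.model,   -- caller-intended name, not doc.model
      category = category,
    })
  end

  return on_event, finish
end
```

A caller would run `on_event` for every decoded chunk and call `finish` only when the transport returned without error, which matches the "NOT on transport/api errors" condition in the commit message.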
marfrit	d4c20f09df	docs/PHASE7: review fold-in — 3 BLOCKERs + 6 CONCERNs + 5 NITs Sonnet-reviewed (per the reviews-use-sonnet feedback memory). BLOCKERs (RESOLVED in-place): R1. M.chat would silently return (text, nil) for ALL non-streaming callers — 4 of 5 categories (summarize/delegate/memory_summarize/probe) flow through broker.chat, NOT chat_stream. §4 now shows the explicit M.chat update that captures kind=="usage" alongside "text" and returns (text, usage). R2. call_broker fallback retry would credit usage to the wrong model name. Fix: broker emits payload.model = model_cfg.model (which IS the fallback's name when called with fb_cfg — chat_stream's upvar). Wrapper keys by payload.model, NOT outer model_name. §4 + §13 commit 3 reflect. R3. build_request has TWO internal callers inside broker.lua itself, not just the public surface. Plan §13 commit 1 risk row now spells this out explicitly so the implementer doesn't read "every caller already passes opts" as "external-only". CONCERNs (FOLDED): R4. Single cost_warn_fired flag covers two thresholds — first-to-fire suppresses the other. Split into ctx.cost_warn_state = { dollars = false, tokens = false }; :cost reset clears both. §7 + §13. R5. Warn-check centralization — single _record_usage helper in repl.lua wraps ctx:add_usage AND does threshold check. safety.lua routes via helpers.on_usage / opts.on_usage callbacks. context.lua stays decoupled from renderer. R6. Preserve nil-vs-0 cost distinction. Accumulator slot gains `is_local = true` (sticky) when ANY recorded usage had cost==nil. `:cost detail` annotation comes from is_local flag, not a fragile cost==0 heuristic. R7. :cost detail sort needs 3-level deterministic key: (cost desc, model asc, category asc) — table.sort is unstable. R8. call_broker fallback passes opts.include_usage unchanged. Documented as known assumption (B1 confirms both backends accept; future-broken fallback can pass include_usage=false). R9. 
:resume does NOT restore historical usage_totals. Per-turn usage IS in session JSONL for scripting; cross-session aggregation is Q-C2 deferred. Documented in §8. R10. $%.4f loses sub-cent precision (cloud cost 0.000028 -> $0.0000). Widened to $%.6f in §6 + §7 warn message format. NITs (APPLIED): N1. §4 pseudocode comment notes `if doc.usage` branch is independent of choice branch (handles both B2 emission shapes). N2. §2 stale "B7" reference corrected to B3. N3. §13 commit 3 row gains explicit dependency note on commit 1's R1. N4. §13 commit 4 spells out llm_probe -> llm_second_opinion -> M.is_destructive signature chain widening. N5. §3 + §13 commit 6 — PHASE0 §11 amendment already in tree (`3bad07b`); commit 6 must NOT re-apply. PHASE7.md now 803 lines (was 528 after plan). +275/-57. Ready for implementation phase pending user gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:55:59 +00:00
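R7 and R10 in the review above are both small enough to show directly. The following is an illustrative sketch, assuming each `:cost detail` row is a table with `cost`, `model`, and `category` fields; the helper names `sort_cost_rows` and `fmt_cost` are hypothetical.

```lua
-- R7: table.sort is not stable, so ties on cost must be broken explicitly
-- to get the same row order on every run.
local function sort_cost_rows(rows)
  table.sort(rows, function(a, b)
    if a.cost ~= b.cost then return a.cost > b.cost end     -- cost descending
    if a.model ~= b.model then return a.model < b.model end -- model ascending
    return a.category < b.category                          -- category ascending
  end)
  return rows
end

-- R10: six decimals keep sub-cent cloud costs visible.
local function fmt_cost(dollars)
  return string.format("$%.6f", dollars)
end
```

With four decimals a cost of 0.000028 prints as `$0.0000`, whereas `fmt_cost(0.000028)` yields `$0.000028`.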
marfrit	0f14dc1727	docs/PHASE7: plan — §13 commit roadmap Status: Analyze -> Plan. Q-C4 was the last open question pending baseline; now resolved per B1 (stream_options accepted by both backends; required for local). §13 Implementation Plan added — 6 commits, bottom-up: 1. broker.lua: usage extraction from final SSE chunk; build_request signature widening to (model_cfg, msgs, stream, opts); on_delta("usage", payload); chat returns (text, usage); opts.category passthrough. 2. context.lua: usage_totals + cost_warn_fired fields; add_usage / total_cost / total_tokens helpers; :reset preserves both. 3. repl.lua: wire opts.category at 5 non-Norris call sites (main, delegate x2, summarize, memory_summarize); on_delta("usage") branch routes to ctx:add_usage. 4. safety.lua: wire opts.category for Norris main broker + is_destructive LLM probe; helpers.on_usage callback convention (no new module dep — matches #52's scrub_msgs pattern). 5. repl.lua: :cost meta surface + warn-threshold check + HELP. 6. config.lua: commented cost example block + PHASE7.md status bump to Implement. Per-commit risk index covers signature-change blast radius, missed call-site lint, and warn-flag one-shot semantics. Lua's multi-return semantics keep broker.chat backwards-compat automatic. Two items left open at plan, resolve at implement: - is_destructive opts.on_usage vs cfg.helpers threading - per-turn verbose mode (deferred; v1 = :cost on demand only) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:50:39 +00:00
marfrit	2244a3f1ee	docs/PHASE7-baseline: live broker probes for usage shape Real probes against hossenfelder.fritz.box:8082 against both backends. Five findings, all align with the formulate/analyze design — no structural changes. B1. `stream_options.include_usage = true` is safely accepted by both backends. REQUIRED for local llama.cpp to emit usage; no-op for cloud (which emits anyway). Default-true is correct. B2. Two emission patterns observed: - Cloud (Bedrock): usage rides the FINAL delta chunk with non-empty `choices` carrying finish_reason. - Local: usage rides a SEPARATE chunk with `choices: []` preceding `[DONE]`. Both shapes are handled by the same `if doc.usage then ...` check; the existing on_event choices-branch short-circuits safely when choices is empty. B3. `cost` field is dollar-denominated (number) and cloud-only. Local returns `timings` instead (perf, not cost). Accumulator captures `usage.cost` as-is; nil treated as 0. :cost detail annotates local lines so $0 isn't misread. B4. `doc.model` in the usage event reflects the upstream-API-version (e.g., Bedrock rewrites `anthropic/claude-haiku-4.5` to `anthropic/claude-4.5-haiku-20251001`). Accumulator keys by caller-intended `model_cfg.model`, NOT `doc.model`, for stable cross-call comparison. B5. Usage event is always the LAST data event before `[DONE]`. Emission of `on_delta("usage", ...)` happens after curl.post_sse returns — one call per stream, after all text + tool_calls. Q-C4 RESOLVED: hossenfelder forwards `stream_options.include_usage` to all backends correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:49:53 +00:00
marfrit	f0bccdec48	docs/PHASE7: analyze — probe broker surface + resolve Qs in-place Status: Formulate -> Analyze (tree at `3bad07b` probed). 11 findings (A1-A11), 5/6 open Qs resolved (Q-C4 deferred to baseline): A1. broker.chat_stream surface clean — usage capture via closure-local + on_delta("usage") emission after curl.post_sse returns. A2. 7 caller sites for opts.category threading (probe / norris / summarize / main / delegate x2 / memory_summarize). A3. build_request signature widens to (model_cfg, msgs, stream, opts) to absorb tools / max_tokens / include_usage / stream_options without further positional growth. A4. Q-C3 RESOLVED: free-form categories (caller decides); matches Phase 6 helpers/skills convention. A5. Q-C5 RESOLVED: warn fires on the call that crossed (no NEXT-call delay). A6. Q-C6 RESOLVED: :reset does NOT clear cost_warn_fired; only :cost reset clears. A7. Norris call-graph rewires (commit `955bd82`) — secrets streaming rehydrator wraps only "text" kind; new "usage" kind passes through unchanged. No new entanglement. A8. ctx.usage_totals survives :reset (R8 parity with memory_items, project). A9. Session JSONL inherits the new field automatically (dkjson opaque encoding). A10. Q-C1 PARTIAL: defensive silent skip when provider omits usage. Real probe required for local model — baseline action. A11. Q-C4 deferred to baseline (real broker probe). §2 build_request row updated to mention the A3 refactor. §11 Open Qs table now shows all 6 with resolutions; only Q-C4 remains as a baseline-time probe. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:49:03 +00:00
marfrit	3bad07b2da	docs/PHASE7: formulate — cost / usage observability Phase 7 formulate manifest + PHASE0 §11 amendment to add the Phase 7 row (substrate amendment per CLAUDE.md §3, lands in the same commit). Four pillars: 1. Usage capture in broker.chat_stream — extract `usage` from the final SSE chunk (OpenAI streaming spec with `stream_options: {include_usage: true}`). Surface via new on_delta("usage", payload) kind. broker.chat returns (text, usage) — backward-compat: existing callers ignore the second value. 2. Per-session accumulator on ctx — ctx.usage_totals[model][category] tables (categories: main / delegate / summarize / memory_summarize / probe / norris, tagged at the call site via opts.category). :reset preserves usage_totals (R8 parity with memory_items / project). Session JSONL gains an optional `usage` field on assistant turns for after-the-fact analysis. 3. :cost meta surface — :cost (summary), :cost detail (per-model + per-category breakdown), :cost reset (zero the meter). Pure-Lua read of ctx.usage_totals; no broker calls. 4. Optional warn thresholds — cfg.cost.warn_at_dollars / warn_at_tokens emit a one-shot status when crossed. Default off; useful with cloud presets configured. Doc covers scope + done-when criteria, tech decisions table, module changes, per-pillar deep dive with code sketches, UX surface, out of scope, risks, 6 open questions to resolve in analyze. Open at formulate: Q-C1 — provider-without-usage handling (local llama.cpp probably) Q-C2 — cross-session persistence (defer to phase 8) Q-C3 — categories closed-set vs free-form Q-C4 — does hossenfelder forward stream_options to all backends? Q-C5 — warn fires on the call that crosses, or the next one? Q-C6 — :reset clears cost_warn_fired too, or only :cost reset? Scope confirmed via AskUserQuestion: cost/usage observability (chosen over project-local config overlay and session search/tag). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:47:58 +00:00
marfrit	955bd82efb	safety + repl: wire secrets into safety.lua (closes #52 ) Closes the last #13 gap — Norris broker call + is_destructive LLM second-opinion probe were the two egress points NOT covered by the scrub-at-egress design in commit `d852aca`. Approach: option (b) per #52's fix sketch — callback-via-helpers/opts. safety.lua does NOT gain a require("secrets") dependency (acceptance criteria 3); integration is purely through the convention the rest of the helpers table already uses. safety.lua changes: - llm_probe gains an opts table. When opts.scrub_msgs is set, the {system, user(cmd)} message pair is scrubbed before broker.chat. When opts.rehydrate is set, the YES/NO reply is rehydrated before parsing (defensive — the verdict shouldn't carry placeholders but rehydration is a safe no-op if it doesn't). - llm_second_opinion threads opts through to llm_probe. - M.is_destructive(cmd, cfg, opts) — opts optional; nil-opts is backwards-compatible (no scrub, original behavior). - M.norris_step: * outbound broker.chat_stream message scrubbed via helpers.scrub_msgs(ctx:to_messages(), model_cfg) when provided. * on_delta wrapped with helpers.streaming_rehydrator():push / :flush so the user sees rehydrated text AND text_parts accumulates rehydrated chunks (parity with ask_ai in repl.lua). * both M.is_destructive call sites (tool_call probe + CMD: probe) now pass probe_opts = {scrub_msgs, rehydrate} when the helpers carry them. repl.lua changes: - Norris helpers table gains scrub_msgs / rehydrate / streaming_rehydrator closures, all nil-safe (return identity / nil when secrets_session is nil). - :safety check meta passes probe_opts to is_destructive when secrets_session is configured. Without secrets, behavior unchanged. Unit-test verified end-to-end: - Stubbed broker.chat captures the messages it receives. - Without opts: probe SEES `ghp_realsecretvalue_...` (control). - With opts: probe sees `$AISH_SECRET_NNN` (correct scrub). 
Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:40:30 +00:00
marfrit	ac58b19da2	config + docs/PHASE6: example block + status -> Implement (Phase 6 commit #6 ) R9-resolved single-owner of the status bump (commit #5 didn't touch PHASE6.md per the review fold-in). config.lua: - Commented-out `project = { auto_tree, tree_depth, tree_max_chars }` block with the same shape as the Phase 1-5 example blocks. - Note that :diff / :tree / :highlight all work without config; the `project` block ONLY controls the startup auto-inject. - Note about :highlight v1 having no config flag (runtime-only), cross-references the in-REPL install hint. docs/PHASE6.md: - Status header bumped: "Plan + review fold-in" -> "Implement" - Lists the 6 implement commits in the header for traceability: `c4fc7fd` context: compose_project plumbing `d1dce83` _scan_project_tree + :tree + auto_tree hook `4d5f93a` :diff + _git_clean_cmd (B1 helper) `0d63f01` expand_mentions @<r1>..<r2> tiered resolution `11d0e59` tree-sitter highlighter (renderer fence filter + highlighted dispatch + :highlight meta) this config example + status bump Phase 6 implementation is complete. Next inner-loop step is verify (7) — user-driven smoke tests against the live broker on each pillar plus filing of issues for any defects, then memory-update (8). Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:27:58 +00:00
marfrit	11d0e599cd	repl + renderer: tree-sitter highlighter (Phase 6 commit #5 ) The largest Phase 6 commit — fence-aware stream filter in renderer.lua + external tree-sitter dispatch + :highlight meta in repl.lua. renderer.lua — fence-aware filter wrapping assistant_delta: M.set_highlight(enabled, detected, highlight_fn) Called by repl.lua at startup AND on every :highlight toggle. Stores state in module-locals (off by default). State machine inside _hl_push: outside: pass chunks through; HOLD trailing partial-fence chars (per R1 — local llama.cpp splits ```python as `'``'` then `'`python\n'`, so naive pass-through drops the leading "``" and never recovers). inside: buffer cumulatively until "\n```" appears; emit highlight_fn(body, lang) then the closing fence verbatim. Recursive call handles "rest" after the closing fence. N1: fences only open at start-of-stream OR after a newline (`^```` or `\n```` only). Inline backticks in prose ("use ``` to mark code") do not open a fence. R3 (PTY raw-mode toggle per highlight call): no change here — every executor.exec call already toggles raw-mode (existing behavior since Phase 1). The risk is theoretical; smoke-test interactively after install if multi-fence renders show flicker. assistant_flush handles end-of-stream gracefully: drains any held partial-fence tail OR an unterminated inside-fence buffer. repl.lua — _detect_treesitter + highlighted + :highlight meta: _detect_treesitter() one-shot popen probe of `tree-sitter --version`. Run once at startup; cached as highlight_detected. highlighted(body, lang_tag) R2-placed in repl.lua (has _shq + executor access). Translates the fence tag (`py`, `python`, `lua`, etc.) to a canonical lang via LANG_TAG, picks the canonical extension via LANG_EXTENSION, writes body to a tmpfile with that extension, runs `tree-sitter highlight <tmpfile>` via executor.exec, returns the output. On ANY failure (CLI absent, non-zero exit, empty output), returns `body` unchanged — silent pass-through. 
R4 RESOLVED VIA REAL INSTALL: probed `tree-sitter highlight --help` on noether; confirmed: - NO `--lang` flag exists (formulate-time assumption wrong) - takes a PATH; language inferred from file extension - alternative `--scope source.X` exists but also unreliable without configured grammars Resolution: write tmpfile with `os.tmpname() .. LANG_EXTENSION[lang]` and pass the path. Matches the documented upstream contract. B4-followup: even with the CLI installed, highlighting requires `~/.config/tree-sitter/config.json` parser-directories with cloned + built `tree-sitter-<lang>` grammars. Without parsers, every call exits non-zero and we silently pass through. The :highlight install hint surfaces all three install steps so the user knows what's actually needed. :highlight [on\|off\|status] meta: no arg -> flip on/off -> set explicit status -> report toggle + CLI detection state When toggled on AND CLI absent: emit a 4-line install hint (CLI install, init-config, grammar clone reminder). When toggled on AND CLI present: emit a 1-line note that parser-directories must be set up for actual highlighting. HELP gains :highlight entry. Tested: 10/10 unit cases on the renderer state machine, including: - plain prose passthrough - single-chunk fence - B2 split fence ("``" + "`python\n" + "x=42" + "\n```") - N1 SOL anchor (mid-line ``` does not open) - trailing \n properly emitted across chunks - SOL-only fence open - prose after closing fence preserved - two fences in one stream - highlight off = passthrough (callback never fires) E2E :highlight meta verified: :highlight status -> off / detected :highlight on -> toggles + emits parser-dir reminder :highlight status -> on / detected :highlight off -> off Regression: test_safety 87/87, test_router_model 31/31, repl loads. Pillars 1 + 2 + 3 of Phase 6 now all implemented. Commit #6 is config example block + status -> Implement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:27:04 +00:00
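The "HOLD trailing partial-fence chars" step of the outside state can be sketched in isolation. This is a deliberately simplified illustration, not the renderer.lua state machine: `hold_partial_fence` is a hypothetical name, the start-of-line anchor (N1) is ignored, and the caller is assumed to have already checked the joined text for a complete fence before calling it.

```lua
local FENCE = "```"

-- Split a chunk into the part that is safe to emit now and a held tail.
-- The tail is the longest chunk suffix that is a proper prefix of FENCE,
-- i.e. one or two trailing backticks that may become a fence on the next push.
local function hold_partial_fence(chunk)
  for n = math.min(#FENCE - 1, #chunk), 1, -1 do
    if chunk:sub(-n) == FENCE:sub(1, n) then
      return chunk:sub(1, #chunk - n), chunk:sub(-n)
    end
  end
  return chunk, ""
end
```

On the next push the caller would prepend the held tail to the incoming chunk, so the split `"``"` followed by `` "`python\n" `` reassembles into a recognisable opening fence instead of leaking two stray backticks.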
marfrit	0d63f01601	repl: expand_mentions tiered @<r1>..<r2> diff retry (Phase 6 commit #4) Per A6 (tiered resolution): @<token> tries file lookup first; if the file doesn't exist AND the token contains "..", retry as a git ref-range and substitute with a fenced `diff` block. Preserves the existing peel-on-trailing-punct logic (e.g., `@HEAD~1..HEAD,` peels the comma, resolves the ref, restores the comma after the closing fence). Resolution order for @<token>: 1. io.open(token, "rb") -- file lookup, with trailing-punct peel 2. if (1) fails and token contains "..": git --no-pager -c color.ui=never diff <r1>..<r2> on exit 0 + non-empty body: substitute as ```diff fenced block 3. else: leave literal `@token` + emit "[aish] @X: not found" status Examples: @README.md -> file (path branch) @../sibling.txt -> file (path branch; `..` only triggers retry when path lookup FAILS, so existing paths with `..` segments are unaffected) @HEAD~1..HEAD -> diff (path fails, ref succeeds) @origin/main..feature -> diff (path fails — no such literal file; ref succeeds; `/` in ref is fine because we don't use the path's `/`-absence as a discriminator) @nonsense..gibberish -> literal preserved (both fail) Required restructuring: - _shq and _git_clean_cmd lifted from M.run closure scope to module scope (above expand_mentions). Single source of truth for the B1 prefix shared with commit #3's :diff. The in-M.run duplicates are removed. - expand_mentions now references `executor` (already required at module scope on line 7) for the diff retry. Status messages updated: - File expansion: "@<path> expanded (N bytes, truncated)" (existing) - Diff expansion: "@<path> expanded (N bytes, diff)" (new) Tested with the 7 existing #7 cases + 7 new diff-retry cases (14/14): ref-range expansion shape, body contains `diff --git`, trailing prose preserved, @../path stays as file (not diff), neither-path-nor-ref preserves literal, trailing-comma peel composes with ref retry. 
Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:20:25 +00:00
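The three-tier resolution order for `@<token>` can be sketched with the file and git access injected as functions, which keeps the example runnable without a repository. The names `resolve_mention`, `read_file`, and `run_diff` are hypothetical, and the punctuation set peeled here is an assumption; the real expand_mentions may peel differently.

```lua
-- read_file(path) -> string|nil ; run_diff(range) -> string|nil
-- Returns the tier that matched plus the replacement text.
local function resolve_mention(token, read_file, run_diff)
  -- Peel trailing punctuation so "@HEAD~1..HEAD," resolves the ref
  -- and the comma is restored after the substitution.
  local core, punct = token:match("^(.-)([%.,;:!%?]*)$")

  -- Tier 1: file lookup always wins, so "../sibling.txt" stays a file.
  local body = read_file(core)
  if body then return "file", body .. punct end

  -- Tier 2: only when the file lookup failed AND the token contains "..".
  if core:find("..", 1, true) then
    local diff = run_diff(core)
    if diff and diff ~= "" then
      return "diff", "```diff\n" .. diff .. "\n```" .. punct
    end
  end

  -- Tier 3: leave the mention literal; the caller emits the not-found status.
  return "literal", "@" .. token
end
```

Because the `..` check runs only after the file lookup fails, existing relative paths are never reinterpreted as ref ranges.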
marfrit	4d5f93aaa5	repl: :diff meta + _git_clean_cmd helper (Phase 6 commit #3 ) User-driven git diff injection. The model sees the diff on the next ask_ai turn through the existing exec_output channel. Changes: - _git_clean_cmd(subcmd_and_args) helper near _scan_project_tree. B1: every git invocation that flows into context MUST use `--no-pager -c color.ui=never`. Forkpty makes git think stdout is a TTY, enabling both color and the pager's keypad/line-clear escapes — these would pollute the captured context block. The helper is the single chokepoint; commit #4's @<r1>..<r2> retry will reuse it. - :diff [<args>] meta: - Reads cwd at meta invocation (R6: differs from :tree's scan-time cwd capture; documented in §5). - Runs `_git_clean_cmd("diff " .. args)` via executor.exec. - Empty output -> "(no diff): <label>" status, no context append. - Non-zero exit -> "diff failed (exit N): <label>" status, no context append. git's stderr already streamed to the user via executor.exec's live multiplex, so the failure reason is visible. - Success -> appends "[diff <label>]\n<output>" via ctx:append_exec_output. Label is "(working tree)" for empty args, else verbatim args. - Status confirms injection size: "diff injected: <label> (N bytes)". - HELP gains :diff line with three example arg shapes; N3-resolved (no `staged` alias — the meta is thin pass-through to git's grammar). Smoke verified across four scenarios in an ephemeral test repo: - Working-tree dirty -> 110-byte diff injected, no ANSI escapes - --cached -> 118-byte staged diff injected, clean - garbage..nonexistent -> exit 128, status + skip - Clean working tree -> "(no diff)", status + skip Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:17:18 +00:00
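The B1 chokepoint described above amounts to prefixing every git invocation with the same two options. A minimal sketch follows, assuming the helper simply concatenates strings; `git_clean_cmd` and `diff_label` mirror the behaviour in the commit message but are not copied from repl.lua, and argument quoting is left to the caller.

```lua
-- Every git command whose output flows into context goes through here,
-- so the pager and color escapes are disabled in exactly one place.
local function git_clean_cmd(subcmd_and_args)
  return "git --no-pager -c color.ui=never " .. subcmd_and_args
end

-- Label used in status lines: "(working tree)" for empty args, else verbatim.
local function diff_label(args)
  if args == nil or args == "" then return "(working tree)" end
  return args
end
```

For example, `git_clean_cmd("diff --cached")` produces `git --no-pager -c color.ui=never diff --cached`.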
marfrit	d1dce832da	repl: _scan_project_tree + :tree meta + auto_tree (Phase 6 commit #2 ) First user-visible Phase 6 verb. Builds on commit #1's compose_project plumbing — sets ctx.project from either the :tree meta or the cfg.project.auto_tree startup hook. Changes: - _scan_project_tree(dir, opts) helper near _run_hook: git -C <dir> ls-files --cached --others --exclude-standard when <dir> is inside a git repo (N4: no subshell); find <dir> -mindepth 1 -maxdepth <depth+1> -type f -not -path '/.' otherwise. Returns (body, info={file_count, truncated, in_git}). Sorted paths, truncated to max_chars (default 4096 per cfg). - :tree [<depth>\|refresh\|off] meta: no arg -> scan with config defaults; resets _project_opts <N> -> scan with depth=N; caches as _project_opts refresh -> re-scan with cached _project_opts (else defaults) off -> clear ctx.project AND ctx._project_opts (R5) Status line reports file count + truncation flag + which backend fired (git/find). - cfg.project.auto_tree startup hook before the main loop: if true, scan libc.getcwd() once and set ctx.project. Failures status-logged once; REPL continues. Default off (existing configs unchanged). - HELP updated with three :tree lines. Plan §12 deliberately defers the config.lua example block to commit #6 along with the status header bump (R9 single-owner). Smoke (aish repo cwd): - :tree no-arg -> "33 files (git ls-files)" - :tree refresh -> same - :tree off -> "project tree cleared" - :tree 1 -> rescans - cfg.project.auto_tree=true at startup -> auto-injected status visible Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:14:36 +00:00
marfrit	c4fc7fde01	context: [project] block plumbing (Phase 6 commit #1 ) Foundation for Phase 6 — adds the field + composer + composition order with no callers yet. Nothing sets ctx.project; the meta hookup and startup auto-inject land in commit #2. Changes: - Context.new gains `project` (string, nil) and `_project_opts` (cached scan opts for `:tree refresh`; R7). - compose_project(text) helper mirrors compose_background / compose_summary. Returns "" for nil/empty; otherwise emits "\n\n[project]\n" + text. - to_messages inserts compose_project BETWEEN compose_background and compose_summary so the model reads memory facts -> project tree -> earlier conversation -> NORRIS suffix. - Same Norris-suppression guard as the other two dynamic blocks (R-C1 / R-C4 parity; planner stays on goal anchor). - Context:reset preserves ctx.project (R8 — matches the Phase 4 memory_items rule; startup-injected facts survive a user-driven context reset). Smoke verified (14/14 inline cases): - project nil -> no [project] block in sys_content - project set -> block present with contents - ordering: [background] < [project] < [earlier conversation summary] - norris_active suppresses all three; NORRIS suffix still appears - :reset clears turns/pending_exec_output/summary; preserves memory_items AND project Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:08:54 +00:00
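The composer and its ordering can be sketched as follows. `compose_project` follows the contract stated in the commit message; everything else is an assumption for illustration: the generic `compose_block` helper, the `background` field name, and `compose_system`, which also omits the NORRIS suffix that the real to_messages still appends.

```lua
local function compose_project(text)
  if text == nil or text == "" then return "" end
  return "\n\n[project]\n" .. text
end

-- Hypothetical generic composer for the two neighbouring blocks.
local function compose_block(tag, text)
  if text == nil or text == "" then return "" end
  return "\n\n[" .. tag .. "]\n" .. text
end

-- Order: memory facts -> project tree -> earlier conversation.
-- All three dynamic blocks are suppressed while Norris is active.
local function compose_system(base, ctx)
  if ctx.norris_active then return base end
  return base
    .. compose_block("background", ctx.background)
    .. compose_project(ctx.project)
    .. compose_block("earlier conversation summary", ctx.summary)
end
```

Returning an empty string for nil or empty input lets the three composers be concatenated unconditionally without leaving stray headers in the system prompt.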
marfrit	261b230be8	docs/PHASE6: review fold-in — 2 BLOCKERs resolved, 7 CONCERNs, 6 NITs Independent agent review of PHASE6 (manifest + baseline + plan at `4407029`). Status header: Plan -> Plan + review fold-in. BLOCKERs (RESOLVED in-place): R1. §4 fence detector's `outside`-state dropped the leading `'``'` chunk of a split fence — contradicted B2's local-model split-fence requirement (4-char median chunk size). Algorithm rewritten: outside-state now holds a tail (up to 10 chars) when the chunk's suffix could be a fence prefix; flushes on next push. Same accumulator pattern as the secrets streaming rehydrator. R2. `highlighted()` file placement was ambiguous (§3 vs §12). Lives in repl.lua (where _shq and executor are accessible); renderer.lua exposes set_highlight(enabled, detected, highlight_fn) and calls back. Keeps renderer.lua free of the executor require. CONCERNs (FOLDED): R3. PTY raw-mode toggle on every code-block render — smoke-test for cursor flicker / SIGWINCH races before locking in. Risk row 5. R4. tree-sitter highlight --lang X grammar is UNVERIFIED — upstream CLI canonically takes a path with extension. Implement-time check required; fallback path documented (extension-based tmpfile + path arg). Added to risk row 5 + open-at-plan. R5. :tree off semantics clarified — one-shot clear of ctx.project + ctx._project_opts; no "disabled" flag. R6. cwd-coupling difference between :diff (call-time) and :tree (scan-time) now documented in §5. R7. :tree refresh opts caching specified — caches ctx._project_opts; `:tree refresh` reuses last explicit opts. R8. :reset preserves ctx.project (parity with memory_items per Phase 4). §12 commit 1 smoke updated. R9. Status-bump duplication between §12 commits 5e and 6 resolved — commit 6 owns the bump. NITs (APPLIED): N1. §4 algorithm pseudocode now includes SOL/post-newline anchor (mid-line backticks in prose don't open a fence). N2. 
_detect_treesitter() gained a comment explaining the popen pattern doesn't gate on exit code (B3). N3. :diff staged shorthand dropped — meta is a thin pass-through to git's own grammar. N4. _scan_project_tree switched from `cd && git ...` to `git -C <dir> ...` — no subshell, more idiomatic. N5. Open-at-plan dir-arg bullet dropped (already decided in §6); replaced with R3 + R4 implement-time verification items. N6. §11 wording on #52 left as-is (cosmetic only). PHASE6.md now 896 lines (was 701 after plan). +264/-69. Ready for implementation phase 6 of the inner loop pending user gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:06:19 +00:00


148 Commits