marfrit/aish - aish - marfrit's space

Author	SHA1	Message	Date
marfrit	c55077bc07	context + repl + config: route-aware context compression (closes #87)

Small local models effectively use only a fraction of their advertised context window. Routes that hit a model preset flagged `local_compress` now get per-request compression: only the last N turns are kept and oversized content is tail-truncated. Cloud routes get the full context unchanged.

Changes:
- context.lua _compress_turns(turns, keep, max_chars): returns a new list (self.turns is NEVER mutated) with the last `keep` turns preserved and content tail-truncated to `max_chars`. Defensive: drops tool turns at the head of the slice. Without their assistant-with-tool_calls anchor they are orphaned, and strict chat templates would reject them (the same gotcha PHASE0 §6 warned about for user/user).
- Context:to_messages(opts): opts.compress = { keep_turns, max_turn_chars } swaps the turn iterable for the compressed view. This affects BOTH the use_tool_role=true path and the use_tool_role=false fallback (the PHASE2.md Q18 strict-template workaround). Persistence and display via :history still see the full, uncompressed ctx.turns.
- repl.lua ask_ai: when req_cfg (the routed model's cfg) has `local_compress = true`, build compress_opts from config.context.compress (defaults keep_turns=2, max_turn_chars=800). Pass it to ctx:to_messages alongside the existing system_prompt_override (#86); the two opts are orthogonal and compose.
- Norris is unaffected: safety.norris_step builds its own messages array, and the planner needs the full history per the PHASE3 design.
- config.lua gains a header comment explaining the per-model opt-in, the context.compress defaults block, and the documented tool-turn truncation trade-off.

13 unit cases verified:
- no opts -> full turn list (no regression)
- keep_turns=2 -> exactly the last 2 emitted
- long content tail-truncated to max_chars
- self.turns unchanged after render
- orphan tool turn at slice head dropped (no chat-template violation)
- tool turn included WITH its assistant anchor when keep_turns >= 3

E2E against live local broker:
- models.fast.local_compress = true; keep_turns=1; max=200
- 4-turn session: each broker call sees ONLY the current turn (verified by short, coherent CMD replies despite no cross-turn memory being available to the model).

The small-model friendliness promised in the FR works in practice; losing conversation continuity is the documented trade-off.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 07:50:07 +00:00
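Illustrative sketch (not the aish source) of the `_compress_turns` contract described in c55077bc07. `compress_turns` is a hypothetical name. Turns are assumed to be plain tables with `role` and `content`. "Tail-truncate" is read here as keeping the first `max_chars` characters, which may not match the real helper.

```lua
-- Illustrative sketch only; not the aish source.
-- Assumes turns shaped like { role = "user"|"assistant"|"tool", content = "..." }.
-- compress_turns is a hypothetical stand-in for context.lua's _compress_turns.
local function compress_turns(turns, keep, max_chars)
  local out = {}
  local first = math.max(1, #turns - keep + 1)
  -- Drop tool turns at the head of the slice: their assistant-with-tool_calls
  -- anchor was cut off, and strict chat templates reject orphaned tool turns.
  while first <= #turns and turns[first].role == "tool" do
    first = first + 1
  end
  for i = first, #turns do
    -- Shallow-copy each turn so the caller's list and turn tables are never mutated.
    local copy = {}
    for k, v in pairs(turns[i]) do copy[k] = v end
    if type(copy.content) == "string" and #copy.content > max_chars then
      -- Assumption: "tail-truncate" = cut the tail, keeping the first max_chars.
      copy.content = copy.content:sub(1, max_chars)
    end
    out[#out + 1] = copy
  end
  return out
end
```

With keep=2 and a tool turn at the head of the slice, only the final user turn survives. With keep=3 the tool turn stays because its assistant anchor is included.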
marfrit	74e4bffb37	broker + repl + safety: GBNF grammar-sampling passthrough (closes #88)

llama.cpp can constrain the sampler to emit ONLY tokens that match a GBNF grammar. For small models this kills format drift at the token level: `CMD: <cmd>` is enforced by the sampler rather than hoped for through prompt discipline.

Probe finding (made before implementing this commit): cloud (Anthropic via Bedrock) silently IGNORES the `grammar` field and responds normally with standard sampling. Default passthrough is therefore safe for all routes, and v1 needs no per-model opt-in/opt-out.

Changes:
- broker.lua build_request: `if opts.grammar then req.grammar = opts.grammar end`. A malformed grammar surfaces at request time via the existing transport-error path.
- repl.lua ask_ai: `grammar_override = config.routing.grammars[req_class]` (same gating shape as #86's system_prompts override), passed as opts.grammar in the call_broker invocation.
- safety.lua is_destructive threads cfg.safety.probe_grammar through opts.grammar so llm_probe constrains the YES/NO output. This skips the regex-match dance entirely when the model can't drift. A caller-provided opts.grammar takes precedence over cfg.
- config.lua gains two commented examples:
  * routing.grammars per class
  * safety.probe_grammar for the destructive probe

6 unit cases verified (stubbed curl.post_sse / broker.chat):
- default: no grammar in body
- opts.grammar -> body contains the grammar, JSON-encoded
- safety probe_grammar reaches llm_probe via opts
- no probe_grammar configured -> opts.grammar nil
- caller opts.grammar takes precedence over cfg.safety.probe_grammar

E2E against live local broker:
- `routing.grammars.default = "root ::= \"ACK\""` configured; prompted "tell me a long story about a fox" -> model output EXACTLY "ACK" (sampler forced; would normally produce paragraphs). Grammar passthrough confirmed end to end.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 07:00:36 +00:00
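Illustrative sketch of the grammar passthrough and its precedence rule from 74e4bffb37. `build_request` mirrors the one-line change quoted above. `probe_opts` and the table shapes are hypothetical, and JSON encoding is left to the caller. The example grammar uses standard GBNF alternation.

```lua
-- Illustrative sketch only; build_request/probe_opts are hypothetical names.
-- Request bodies are assumed to be plain Lua tables JSON-encoded later.
local function build_request(model, messages, opts)
  opts = opts or {}
  local req = { model = model, messages = messages }
  -- llama.cpp honours `grammar`; per the probe, cloud silently ignores it,
  -- so passing it through unconditionally is safe.
  if opts.grammar then req.grammar = opts.grammar end
  return req
end

-- Caller-provided opts.grammar wins over cfg.safety.probe_grammar.
local function probe_opts(cfg, opts)
  opts = opts or {}
  local safety = cfg.safety or {}
  return { grammar = opts.grammar or safety.probe_grammar }
end

-- Example GBNF that restricts the destructive-command probe to YES or NO.
local YES_NO = 'root ::= "YES" | "NO"'
```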
marfrit	047d629a66	context + repl + config: per-class system_prompt override (closes #86)

Small local models follow precise, structured instructions better than natural language. A per-routing-class system_prompt override gives them tighter instructions for THAT request while preserving the ambient context.

Changes:
- Context:to_messages(opts): opts.system_prompt_override REPLACES the base system_prompt for THIS render only (state unchanged). Dynamic blocks ([background], [project], [earlier summary], NORRIS suffix) still compose on top. opts is optional and nil-safe for old callers.
- repl.lua ask_ai: captures req_class from router.classify_model (returned since Phase 5, but previously discarded after the status line). Looks up config.routing.system_prompts[req_class] and passes it as opts.system_prompt_override to ctx:to_messages on each iteration of the tool sub-loop.
- Gating: the override fires only when routing.auto is on (no class -> no override). If system_prompts has no entry for the class, fall through to the default system_prompt (no surprise).
- Norris unaffected: safety.norris_step builds its own messages array and doesn't go through this path.
- config.lua gains a commented-out example showing routing.system_prompts with the code/default examples from the FR body.

Smoke verified:
- 12-case context.lua unit test: opts nil/absent/present, override replaces base, dynamic blocks still compose, state unchanged after call, Norris-mode coexistence (suffix still present; background still suppressed).
- E2E against cloud broker with routing.system_prompts.code set: triple-backtick prompt -> code class -> override fires; model emits terse, code-only output. Non-code prompt -> default class -> no override -> normal, somewhat verbose reply.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 05:41:15 +00:00
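Illustrative sketch of the override semantics in 047d629a66. The override swaps only the base prompt, dynamic blocks still compose after it, and the lookup is gated on routing.auto. `render_system`, `pick_override`, and the `dynamic_blocks` field are hypothetical simplifications of Context:to_messages.

```lua
-- Illustrative sketch only; names and the dynamic_blocks field are hypothetical.
local function render_system(ctx, opts)
  opts = opts or {}
  -- The override replaces only the base prompt, and only for this render;
  -- ctx.system_prompt itself is never written.
  local parts = { opts.system_prompt_override or ctx.system_prompt }
  for _, block in ipairs(ctx.dynamic_blocks or {}) do
    parts[#parts + 1] = block -- [background], [project], ... still compose
  end
  return table.concat(parts, "\n\n")
end

-- Fires only when auto-routing produced a class that has a configured entry.
local function pick_override(config, req_class)
  local routing = config.routing or {}
  if not routing.auto or not req_class then return nil end
  return (routing.system_prompts or {})[req_class]
end
```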
marfrit	df59ee2f2c	config + docs/PHASE9: template comment + status -> Implement (Phase 9 commit #4)

The config.lua header gains a Phase 9 paragraph documenting the project-overlay feature plus the R7 shallow-merge warning ("if your .aish.lua sets a top-level block, it REPLACES the user's entire block; list every entry OR omit the block"). Inspect at runtime via `:config show`.

docs/PHASE9.md status header bumped: "Plan + review fold-in" -> "Implement". It lists the 4 implement commits inline:
- `e525063` history: trust file helpers
- `34b465d` main: project-overlay loader
- `5b6ee55` repl: :config show meta + HELP
- this: config template comment + status bump

Phase 9 implementation is complete. Next inner-loop step: verify (file TCs, run autonomous, close), then memory-update.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:54:53 +00:00
marfrit	5b6ee553db	repl: :config show meta + HELP (Phase 9 commit #3)

User-facing diagnostic for the project-overlay layer. Reads config._sources (embedded in cfg per R3 by main.lua's load_config_with_overlay in commit #2) plus the effective config, and surfaces which file contributed each top-level key.

:config show        top-level keys + which source set each (nested tables collapsed to an inner-key list)
:config show full   recursive dump with sensitive-key masking

Masking heuristic: any key containing token/secret/auth/key (case-insensitive) shows "(set)" instead of its value. R6: applied RECURSIVELY in full mode so the actual leak vector (mcp.servers.<alias>.auth_token, models.<x>.auth_token) is caught. A defensive depth cap (5) prevents pathological recursion.

When config._sources is absent (the caller didn't go through load_config_with_overlay), the status reads "(unknown — main didn't pass _sources)"; the meta still runs and just labels the source as "?".

N2, known cosmetic false positive: the `key_env` / `auth_env` config fields hold env-var NAMES (not secrets) but match the heuristic. Future polish will exempt `*_env` patterns. The same applies to `token_budget` (contains "token"), which is masked despite being a plain number. Acceptable; the heuristic errs toward over-masking.

HELP gains 1 :config line.

E2E verified across 4 scenarios with AISH_TRUST_FILE + isolated HOME:
A. No project overlay: 6 user keys; nested tables collapsed. `secrets` masked as (set) at top level.
B. Project overlay accepted: source map cleanly partitioned (user has 4 keys; project has 2, default_model + models); each top-level row tagged [user] or [project].
C. :config show full: nested dump; auth_token in models.cloud correctly masked as (set); SECRET_VAL never appears in output (grep count = 0).

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Commit #4 next: config.lua template comment + PHASE9.md status header -> Implement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:54:30 +00:00
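Illustrative sketch of the recursive masking heuristic from 5b6ee553db, using plain (non-pattern) substring matching. `mask_config` is a hypothetical name. The placeholder returned at the depth cap is an assumption; the commit only says the cap exists.

```lua
-- Illustrative sketch only; mask_config and the depth placeholder are hypothetical.
local SENSITIVE = { "token", "secret", "auth", "key" }

local function is_sensitive(key)
  local k = tostring(key):lower()
  for _, word in ipairs(SENSITIVE) do
    -- 4th argument `true` = plain find, so words are not treated as patterns.
    if k:find(word, 1, true) then return true end
  end
  return false
end

local function mask_config(tbl, depth)
  depth = depth or 0
  if depth >= 5 then return "(...)" end -- defensive cap against deep/cyclic tables
  local out = {}
  for k, v in pairs(tbl) do
    if is_sensitive(k) then
      out[k] = "(set)"
    elseif type(v) == "table" then
      out[k] = mask_config(v, depth + 1) -- R6: recurse into nested blocks
    else
      out[k] = v
    end
  end
  return out
end
```

The last test assertion shows the N2 over-masking: `token_budget` is masked even though it is a plain number.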
marfrit	34b465d6dc	main: project-overlay loader (Phase 9 commit #2)

Wires the project-overlay step around the existing load_config. Activates only when a trusted .aish.lua is found in or above cwd.

Changes:
- _find_project_config() walks up from libc.getcwd() to $HOME and returns the first .aish.lua found. R1 fix folded in: a proper-prefix check (`dir == home OR dir starts with home .. "/"`) avoids the false positive where /home/user2 matches HOME=/home/user by byte prefix.
- _trust_file_path() resolves via the $AISH_TRUST_FILE env override, else ~/.aish/trusted-projects (plan-time decision per N3).
- _check_and_maybe_prompt(project_path, history): calls history._sha256_file ONCE and routes through history.is_trusted. On a miss it prompts via rl.readline; on accept it persists via history.add_trusted. A8 mitigation: if rl.readline fails to load, decline silently (no io.read fallback that would consume stdin).
- load_config_with_overlay(opts):
  * Calls the existing load_config; seeds sources = {k="user", ...}.
  * Walks up for .aish.lua; if found:
    - In opts.prompt mode (-p, R2): skip the prompt entirely; only PRE-TRUSTED overlays load. This avoids io consuming the piped stdin that -p will read for context.
    - Otherwise: interactive trust check + prompt.
  * On accept + successful dofile: shallow-merges top-level keys ONTO the user config and updates sources[k]="project" for overlapping keys.
  * R3: embeds sources as cfg._sources for repl.lua's :config show meta to read. No global.
  * Returns (cfg, user_path, project_path | nil).
- main() now calls load_config_with_overlay; when the project layer is active, it emits the "[aish] project config: <path> (overlaid on <user>)" status line per A4 (AFTER the user-config status).

E2E verified across 4 scenarios with AISH_TRUST_FILE + isolated HOME:
1. Decline -> overlay skipped; user config active.
2. Accept -> overlay loaded; project_model active; status line "[aish] project config: ... (overlaid on ...)" visible.
3. Re-startup -> NO prompt (cached via sha); overlay loaded transparently. R4 single-sha-call confirmed.
4. -p mode with untrusted overlay -> skipped silently; piped stdin preserved for run_one_shot.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Commit #3 lands :config show + HELP next; commit #4 the config template comment + status -> Implement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:48:22 +00:00
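Illustrative sketch of the shallow overlay merge and `_sources` map from 34b465d6dc. `overlay` is a hypothetical helper. It also shows the R7 trap: a project `models` block replaces the user's whole `models` block.

```lua
-- Illustrative sketch only; overlay is a hypothetical helper, not main.lua's code.
local function overlay(user_cfg, project_cfg)
  local cfg, sources = {}, {}
  for k, v in pairs(user_cfg) do
    cfg[k], sources[k] = v, "user"
  end
  if project_cfg then
    for k, v in pairs(project_cfg) do
      -- Shallow: a project top-level key REPLACES the user's whole block.
      cfg[k], sources[k] = v, "project"
    end
  end
  cfg._sources = sources -- read later by :config show; no global needed
  return cfg
end
```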
marfrit	e525063df3	history: trust file helpers for Phase 9 (commit #1)

Foundation for the project-overlay trust mechanism. No callers yet; commit #2 wires main.lua to use these.

Three new functions:

history._sha256_file(path) -> hex digest or nil
  Shells out to `sha256sum`, parses the first whitespace-separated field, and validates a 64-hex-char length. Returns nil on any failure (path missing, binary missing, file unreadable). The caller treats nil as "skip the trust path"; it never crashes.

history.is_trusted(trust_path, project_path, sha256) -> bool
  Reads trust_path as JSONL; returns true iff an entry exists matching BOTH project_path AND sha256. Missing / corrupt / unreadable trust file -> false (re-prompt). Per-line JSON decoding means partial-write corruption affects at most one line.

history.add_trusted(trust_path, project_path, sha256) -> bool
  mkdir -p the parent; append a JSONL line {path, sha256, ts (ISO)}; chmod 600 the trust file (best-effort; failure ignored). Single writer per call; append-only.

11 unit cases verified:
- sha256 known value matches manual `sha256sum`
- nil / missing file -> nil (no crash)
- is_trusted on missing trust file -> false
- add_trusted + is_trusted roundtrip works
- different sha -> not trusted (content binding)
- different path -> not trusted
- multi-entry trust file: each entry independently checked
- chmod 600 verified via stat

Regression: test_safety 87/87, test_router_model 31/31.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:45:07 +00:00
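Illustrative sketch of `history._sha256_file`'s parse-and-validate step from e525063df3. `parse_sha256sum` and `sha256_file` are hypothetical names. Output is assumed to be in GNU coreutils' `<hex>  <path>` format, and the path is single-quoted for the shell.

```lua
-- Illustrative sketch only; names are hypothetical.
local function parse_sha256sum(output)
  if type(output) ~= "string" then return nil end
  local first = output:match("^(%S+)") -- first whitespace-separated field
  if first and #first == 64 and first:match("^%x+$") then
    return first
  end
  return nil -- anything unexpected: caller skips the trust path
end

local function sha256_file(path)
  if type(path) ~= "string" then return nil end
  -- POSIX single-quote escaping: ' becomes '\''
  local quoted = "'" .. path:gsub("'", [['\'']]) .. "'"
  local p = io.popen("sha256sum " .. quoted .. " 2>/dev/null")
  if not p then return nil end
  local out = p:read("*a")
  p:close()
  return parse_sha256sum(out) -- empty output (missing file/binary) -> nil
end
```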
marfrit	e796142a23	docs/PHASE9: review fold-in (0 BLOCKERs + 7 CONCERNs + 5 NITs)

Sonnet review of PHASE9 (formulate + analyze + baseline + plan at `31e5de5`). No BLOCKERs (manifest design sound). Seven real CONCERNs, including a path-prefix bug and a piped-stdin interaction that would have surfaced at implement time.

CONCERNs (FOLDED):
R1. HOME-prefix walk-up false positive: the check `dir:sub(1, #home) ~= home` treats /home/user2 as inside HOME=/home/user. Real bug. Fix: `dir ~= home and dir:sub(1, #home + 1) ~= home .. "/"`.
R2. A8's io.read("*l") fallback for the trust prompt would consume the first line of piped stdin in aish -p mode. Fix: SKIP the trust prompt in one-shot mode (load only pre-trusted overlays). If rl.readline misbehaves interactively, emit a status line and skip the overlay (no fallback to stdin in either mode).
R3. Sources-map delivery decided: embedded in cfg as config._sources. Globals across module boundaries are explicitly avoided. Backward-compat: if absent, :config show reports "(sources unknown)".
R4. _prompt_trust signature fixed: it takes a pre-computed sha, so there is a single sha256 call per startup per project file.
R5. _check_trusted no longer reimplements the trust-file read logic; it routes through history.is_trusted / history.add_trusted with the AISH_TRUST_FILE env override (single resolution site).
R6. :config show `full` mode masking is now specified: the same heuristic is applied RECURSIVELY to nested values (mcp.servers.X.auth_token is the actual leak vector).
R7. Shallow-merge UX trap reframed: previously "documented as predictable"; now an explicit, conspicuous warning in done-when + UX surface + the config.lua template that "if your .aish.lua sets a top-level block, it REPLACES the user's entire block". Deep merge with explicit replace syntax is v2 polish.

NITs (APPLIED):
N1. (no doc change; review-prompt clarification only)
N2. key_env / auth_env over-masking documented as a known cosmetic false positive (env-var names, not secrets).
N3. Sources-map decision added to the open-at-plan-time list, so it doesn't surface as a surprise during commit 2.
N4. Trust-file first-write atomicity edge case documented (manual delete to recover); temp-file+rename = v2.
N5. Stale "stat" mention in the §3 module table removed (A2: io.open is sufficient; no new FFI).

Code sketches in §4 + §5 + §6 + §13 commits 2+3 are all updated to reflect the fixes. The manifest is internally consistent and matches the history.lua API to be added in commit 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:44:20 +00:00
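Illustrative sketch of the R1 bug and fix. Both helpers are hypothetical. `under_home` is the logical negation of the fold-in's `dir ~= home and dir:sub(1, #home + 1) ~= home .. "/"` condition.

```lua
-- Illustrative sketch only; helper names are hypothetical.
local function under_home_buggy(dir, home)
  -- Byte prefix only: "/home/user2" starts with "/home/user", so it passes.
  return dir:sub(1, #home) == home
end

local function under_home(dir, home)
  -- Proper prefix: dir is HOME itself or lies beneath "HOME/".
  return dir == home or dir:sub(1, #home + 1) == home .. "/"
end
```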
marfrit	31e5de5ad5	docs/PHASE9: analyze + baseline + plan (single bundled commit)

Bundled the three doc steps because the surface is small (4-commit impl, no major redesigns since formulate).

Analyze findings (12, A1-A12):
A1-A2: main.lua surface clean; no new FFI needed
A3 (Q-P2): RESOLVED via baseline: sha256sum (GNU coreutils)
A4 (Q-P1): trust prompt AFTER user-config status line
A5 (Q-P3): don't log walk-up by default; :config show on demand
A6 (Q-P5): :cfg show top-level by default; `full` for deep
A7 (Q-P6): project may set secrets.vault (covered by trust prompt)
A8 (Q-P4): DEFERRED: rl.readline early-startup smoke at impl time
A9: walk-up perf <1ms even pessimistic
A10: trust-file race: JSONL append-only handles concurrent writes
A11: sandboxed dofile out of scope (trust prompt IS the gate)
A12: bootstrap order is correct: user→project→secrets_session

Baseline:
B1: sha256sum + openssl agree byte-for-byte on noether; sha256sum chosen (universal + simpler parse).

The §10 Open Qs table now shows resolutions inline (5/6 done; Q-P4 deferred to implement-time smoke).

§13 Implementation Plan added, 4 commits:
1. history.lua: trust file helpers (read/add/is_trusted + _sha256_file)
2. main.lua: walk-up + load_config_with_overlay + trust prompt
3. repl.lua: :config show meta + startup status line
4. config.lua header note + status -> Implement

The per-commit risk index covers the sha256sum-missing case, JSONL partial writes, A8 rl.readline early-startup, symlink-loop walk-up, and :config show token leakage (mitigated by a conservative masking heuristic).

Open at plan time (resolve at impl):
- A8 rl.readline behavior; fall back to io.read if broken
- $AISH_TRUST_FILE env override for CI isolation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:38:10 +00:00
marfrit	4f5c3aeba9	docs/PHASE9: formulate, project-local config overlay (.aish.lua)

Phase 9 formulate manifest + PHASE0 §11 amendment (adds Phase 9 row) + PHASE0 §10 amendment (config resolution order now references Phase 9's overlay step). The substrate touch lands in the same commit per CLAUDE.md §3.

Four pillars:
1. .aish.lua walk-up from cwd; stops at $HOME or the filesystem root. The first file found becomes the project layer. Absence = no-op.
2. Shallow merge over user config: project top-level keys REPLACE user keys. Predictable; deep merge surprises with array/table semantics. Users compose full blocks explicitly.
3. Trust prompt + sha256-pinned persistence in ~/.aish/trusted-projects (JSONL, mode 0600). The first encounter prompts; subsequent startups load only if the recorded sha matches. Content change -> re-prompt. Matches the direnv-allow security posture.
4. :config show meta: lists each source path with the top-level keys it contributed, plus a sanitized effective-config dump (token-bearing fields masked).

Key design decisions documented:
- The trust mechanism is explicit (not default-trust-all-cwds): .aish.lua runs arbitrary Lua via dofile, so the hostile cloned-repo case is a real concern.
- $HOME boundary on walk-up: don't search /tmp or /. Repos outside $HOME get no project layer.
- Reload on cd: NO. Config is resolved at startup only.
- sha256 via a shelled-out `sha256sum` (GNU coreutils, widely available; avoids vendoring a Lua implementation).

The §9 risk table covers: hostile repo (trust prompt), corrupted trust file (best-effort skip), updated repo (sha mismatch re-prompts), dofile errors (pcall-protected), walk-up safety ($HOME boundary).

6 open questions for analyze:
Q-P1: trust prompt before/after startup status
Q-P2: sha256sum vs openssl dgst (baseline)
Q-P3: log walk-up path?
Q-P4: rl.readline safe at startup?
Q-P5: :config show full vs top-level
Q-P6: project-set secrets.vault security

Scope confirmed via AskUserQuestion: project-local overlay (chosen over cost preflight enforcement and cross-session cost persistence, both deferred as Phase 10 candidates per §11).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:36:35 +00:00
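Illustrative sketch of pillar 1's walk-up. `find_project_config` and `parent` are hypothetical. Paths are assumed absolute and normalized with no trailing slash, HOME is assumed not to be `/`, and `exists` is injected so the sketch runs without touching the filesystem. The manifest does not say whether the real walk also checks `$HOME/.aish.lua` itself; this sketch does.

```lua
-- Illustrative sketch only; names are hypothetical.
local function parent(dir)
  local p = dir:match("^(.*)/[^/]+$")
  if p == nil or p == "" then return "/" end
  return p
end

local function find_project_config(cwd, home, exists)
  -- Only search inside $HOME (proper-prefix check); elsewhere: no project layer.
  if not (cwd == home or cwd:sub(1, #home + 1) == home .. "/") then
    return nil
  end
  local dir = cwd
  while true do
    local candidate = dir .. "/.aish.lua"
    if exists(candidate) then return candidate end -- first hit wins
    if dir == home then return nil end              -- stop at the $HOME boundary
    dir = parent(dir)
  end
end
```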
marfrit	08dba69fce	config + docs/PHASE8: example block + status -> Implement (Phase 8 commit #5)

config.lua:
- Commented-out `tokenize = { use_endpoint = true }` block, in parity with the Phase 1-7 example blocks.
- Documents the two consequences: (1) per-turn network cost (~30ms the first time, cached after) and (2) token_budget is now actually enforced, so sessions that fit under char/4 may evict earlier under accurate counts.
- Notes the cloud /tokenize 404 fallback path.

docs/PHASE8.md:
- Status header bumped: "Plan + review fold-in" -> "Implement"
- Lists the 5 implement commits inline for traceability:
  `7ef2a6e` broker: token_count + endpoint cache
  `8502517` context: tokenize_fn + _tokens cache
  `db26d0c` context: enforce_budget honors token_budget (R2 guard)
  `94b7d86` repl: wire tokenize_fn + :cost detail estimate row
  this: config example + status bump

Phase 8 implementation is complete. Resolves Q1 (PHASE0 §13, originally Phase 3, deferred forward).

Next inner-loop step: verify (7): file test cases, run autonomous, close. Then memory-update (8).

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:32:16 +00:00
marfrit	94b7d86926	repl: wire tokenize_fn + :cost detail estimate row (Phase 8 commit #4)

Activates Phase 8 pillars 2+3+5 end to end and adds the R3-revised :cost detail trailing line.

Changes:
- When cfg.tokenize.use_endpoint is true, ctx_opts.tokenize_fn is set to `function(text) return broker.token_count(active_cfg, text) end` before Context.new fires. R4: the closure body references active_cfg DIRECTLY as an upvalue. The closure shares the variable itself and reads its current value at call time, so subsequent :model switches re-route to the new model's tokenizer automatically (verified by E2E: :model cloud after the fast call still produces a clean estimate row).
- :cost detail gains a trailing line per R3:
    estimated session ctx: <N> tokens; token_budget=<M> (X.Y% used)
  N comes from ctx:estimate_tokens() (the current in-memory snapshot). It is NOT compared against the accumulator sum above, which is cumulative across calls and evicted turns. The line gives at-a-glance budget utilization.

E2E verified against live broker:
- fast model call -> 168 tokens estimated (real BPE via /tokenize)
- :model cloud + cloud call -> 178 tokens estimated (closure follows the :model switch correctly per R4)
- 21% / 22.3% budget utilization shown
- Accumulator sums and the estimate are intentionally different (sums are cumulative, the estimate is a current snapshot) and are correctly displayed as separate lines per R3.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

With this commit landed, Phase 8 is functionally complete; commit #5 is the config example + status bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:31:40 +00:00
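Illustrative sketch of R4. A closure that names `active_cfg` reads the shared local at call time. Copying the table into another local first freezes the model. `token_count` here is a hypothetical stub for broker.token_count, and `budget_line` only reproduces the trailing-line format quoted above.

```lua
-- Illustrative sketch only; token_count is a stub, not broker.token_count.
local function token_count(model_cfg, text)
  return model_cfg.name .. ":" .. #text
end

local active_cfg = { name = "fast" }

-- Correct (R4): the body names the upvalue, so it sees later reassignments.
local tokenize_fn = function(text) return token_count(active_cfg, text) end

-- Wrong: copying into another local captures today's table by value.
local frozen = active_cfg
local stale_fn = function(text) return token_count(frozen, text) end

active_cfg = { name = "cloud" } -- what a :model switch that rebinds the variable does

local function budget_line(est, budget)
  return string.format("estimated session ctx: %d tokens; token_budget=%d (%.1f%% used)",
    est, budget, est / budget * 100)
end
```

The distinction only matters if `:model` rebinds `active_cfg`. Mutating the same table in place would be visible through both closures.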
marfrit	db26d0ccb7	context: enforce_budget honors token_budget + R2 guard (Phase 8 commit #3)

Pillar 5 (analyze finding A1) is the real value-add of Phase 8. Until now, ctx.token_budget = 4096 was set but never enforced; enforce_budget only looked at max_turns. With commit #2's accurate tokenization wired in (via commit #4), eviction now finally fires when the actual context fills the budget.

Loop condition change:
  before: while #self.turns > self.max_turns do
  after:  while (#self.turns > self.max_turns or self:estimate_tokens() > self.token_budget) and #self.turns > 0 do

R2 guard: the `and #self.turns > 0` clause is essential. When system_prompt alone exceeds token_budget (e.g. a 5000-token [project] block with token_budget=4096), the OR-condition stays true even when turns are empty. table.remove on a 0-length list would no-op forever while evicted++ spins. Sonnet review caught this; without the guard, real users could hit an infinite loop just by setting a small token_budget and opening a large project tree.

The per-pair eviction logic (summarize callback + pair-pop) inside the loop is unchanged. The estimate_tokens call is potentially expensive under tokenize_fn, but commit #2's per-turn cache amortizes it to O(N) per iteration after the first fill; for max_turns=40 + budget=4096 sessions the worst case is microseconds per call.

Unit-verified across 5 cases (with and without tokenize_fn):
1. max_turns eviction unchanged (no behavior regression).
2. char/4 path: tight budget evicts to 0 when sys > budget, exits via R2 guard.
3. char/4 path: practical budget evicts to a stable count.
4. tokenize_fn stub: evicts to exactly the (budget - sys)/per-turn count.
5. R2 critical: zero turns + oversize sys -> immediate exit, evicted=0, no spin.

Behavior change for existing users: a session that fit under token_budget=4096 by char/4 (~16K chars) may now evict earlier, because accurate counts are HIGHER for most natural-text inputs (per baseline B2). Users on cloud presets with very large context windows (Claude 200K) should raise token_budget to match; see the §9 risk row in PHASE8.md.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:30:37 +00:00
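Illustrative sketch of the R2-guarded loop from db26d0ccb7. It removes one turn at a time rather than popping pairs with a summarize callback, and it assumes char/4 rounds down. `estimate_tokens` and `enforce_budget` here are simplified stand-ins.

```lua
-- Illustrative sketch only; simplified stand-ins for the Context methods.
local function estimate_tokens(ctx)
  local n = math.floor(#ctx.system_prompt / 4) -- char/4 heuristic (rounding assumed)
  for _, t in ipairs(ctx.turns) do n = n + math.floor(#t.content / 4) end
  return n
end

local function enforce_budget(ctx)
  local evicted = 0
  while (#ctx.turns > ctx.max_turns or estimate_tokens(ctx) > ctx.token_budget)
      and #ctx.turns > 0 do -- R2: without this, an oversize system prompt spins forever
    table.remove(ctx.turns, 1) -- oldest first
    evicted = evicted + 1
  end
  return evicted
end
```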
marfrit	8502517021	context: tokenize_fn + per-turn _tokens cache (Phase 8 commit #2)

Foundation for an accurate Context:estimate_tokens. When the optional tokenize_fn is wired (Phase 8 commit #4 wires it from repl.lua), estimate_tokens uses it with per-turn caching for O(1) amortized cost. The char/4 path is unchanged when tokenize_fn is nil.

Changes:
- Context.new accepts opts.tokenize_fn -> stored as self.tokenize_fn.
- Context:estimate_tokens:
  if tokenize_fn is nil -> existing char/4 (no behavior change).
  if tokenize_fn is set ->
    - tokenize self.system_prompt on every call (dynamic per compose_background/project/summary; can't be cached).
    - for each turn: if t._tokens is nil -> compute + cache; else use the cached value. Turn content is immutable after append (stored turns are never mutated), so the cache never goes stale.
- :reset wipes self.turns, which takes the _tokens cache with it; new turns start with t._tokens == nil and are lazily set on first count.

8/8 unit cases verified:
- char/4 path unchanged when no tokenize_fn
- tokenize_fn called 1 + N times on first estimate (sys + N turns)
- subsequent estimates fire only 1 tokenize call (sys; turns cached)
- new turn fires +1 tokenize call on next estimate
- :reset + fresh turn fires a fresh tokenize call (cache died with the turn)

No callers wire tokenize_fn yet. Phase 8 commit #4 lands the repl.lua wiring (after commit #3 adds the enforce_budget extension, which is the real beneficiary of the accuracy).

Regression: test_safety 87/87, test_router_model 31/31.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:29:56 +00:00
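Illustrative sketch of the per-turn `_tokens` cache from 8502517021. The system prompt is tokenized on every call, and each turn at most once. Field names follow the commit. The free-function form and the rounding of the char/4 path are assumptions.

```lua
-- Illustrative sketch only; a free function instead of Context:estimate_tokens.
local function estimate_tokens(ctx)
  local fn = ctx.tokenize_fn
  if not fn then
    local chars = #ctx.system_prompt
    for _, t in ipairs(ctx.turns) do chars = chars + #t.content end
    return math.floor(chars / 4) -- char/4 path (rounding assumed)
  end
  local n = fn(ctx.system_prompt) -- dynamic: never cached
  for _, t in ipairs(ctx.turns) do
    -- Turns are immutable after append, so a cached count never goes stale.
    if t._tokens == nil then t._tokens = fn(t.content) end
    n = n + t._tokens
  end
  return n
end
```

The test counts calls to a word-count stub. The first estimate costs 1 + N calls, a repeat costs 1, and a new turn adds exactly one more.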
marfrit	7ef2a6ed5c	broker: token_count + endpoint capability cache (Phase 8 commit #1)

Foundation for Phase 8: accurate tokenization via <endpoint>/tokenize where supported, char/4 fallback otherwise.

Changes:
- `M.token_count(model_cfg, text)`:
  Empty text -> 0.
  No endpoint -> char/4 immediately.
  Capability cache says false -> char/4.
  Otherwise -> POST `<endpoint>/tokenize` with `{content, model}`, 2s timeout.
    On 200 + parseable `{tokens=[...]}`: cache true, return #tokens.
    Anything else (non-200 / parse failure / transport error / timeout): cache false, char/4.
- `_tokenize_capable` cache keyed by ENDPOINT ONLY per R6: B1 confirmed /tokenize ignores the model field, so same-endpoint presets share one cache entry. If a future broker honors the model field, revisit.
- `M.tokenize_supported(model_cfg)`: returns nil/true/false for the cached state (introspection for tests + a future :tokenize meta).
- `M._reset_tokenize_cache()`: test hook so the session-local cache doesn't leak between test runs sharing a LuaJIT VM.

Live verified against hossenfelder + a deliberately broken endpoint:
- "hello world" -> 2 tokens (matches manual curl probe)
- 901-char text -> 201 real tokens vs 225 char/4 (a 24-token gap; real is LOWER here, the opposite direction from the README probe, where it was higher; this confirms the heuristic is inaccurate in both directions)
- Pre-probe: tokenize_supported() returns nil
- Post-probe: tokenize_supported() returns true (local) / false (broken)
- Broken endpoint, second call: still char/4, no re-probe
- Empty / nil text edge cases handled

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:29:17 +00:00
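Illustrative sketch of `M.token_count`'s decision ladder from 7ef2a6ed5c, with the cache keyed by endpoint only. The injected `post(url, body, timeout_s)` is hypothetical; it returns `(decoded_table, status)` or `(nil, err)`. The real code uses ffi.curl.M.post plus JSON decoding, and `model_cfg.model` is an assumed field name. Rounding char/4 down matches the 901 -> 225 figure above.

```lua
-- Illustrative sketch only; `post` and model_cfg.model are assumptions.
local tokenize_capable = {} -- endpoint -> true/false for this session

local function char4(text) return math.floor(#text / 4) end

local function token_count(model_cfg, text, post)
  if text == nil or text == "" then return 0 end
  local ep = model_cfg.endpoint
  if not ep or tokenize_capable[ep] == false then return char4(text) end
  local body, status = post(ep .. "/tokenize", { content = text, model = model_cfg.model }, 2)
  if status == 200 and type(body) == "table" and type(body.tokens) == "table" then
    tokenize_capable[ep] = true
    return #body.tokens -- count only; token IDs are discarded
  end
  tokenize_capable[ep] = false -- any failure: never re-probe this endpoint
  return char4(text)
end
```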
marfrit	467e573d24	docs/PHASE8: review fold-in (2 BLOCKERs + 4 CONCERNs + 4 NITs)

Sonnet-reviewed per the reviews-use-sonnet memory directive.

BLOCKERs (RESOLVED in place):
R1. The §5 estimate_tokens pseudocode was missing the per-turn cache pattern. The prose described it, but the code block called tokenize_fn unconditionally. An implementer following the code verbatim would hit the perf gap the prose flagged (O(N) round-trips per call). The code block now shows an explicit `if t._tokens then ... else t._tokens = ... end`.
R2. The enforce_budget loop can spin forever when system_prompt alone exceeds token_budget (e.g. 5KB project block + budget=4096 + zero turns -> turns can't shrink further, but the OR-condition stays true). Fix: an AND `#self.turns > 0` guard on the loop. The §13 commit 3 row shows the explicit Lua-syntax condition.

CONCERNs (FOLDED):
R3. The :cost detail per-slot ~est=N annotation was semantically undefined: the accumulator sum (cumulative across calls + evicted turns) and the current-snapshot estimate are incommensurable. §6 reworked: ONE trailing summary line "[estimated session ctx: N tokens; token_budget=M (X% used)]" instead of per-slot annotations. §13 commit 4 aligned.
R4. The tokenize_fn closure MUST reference active_cfg as an upvalue (NOT capture it by value). Subtle and easy to miss; §13 commit 4 now spells out the correct vs wrong patterns explicitly.
R5. The 2s tokenize timeout can spuriously cache an endpoint as unsupported when llama.cpp is busy with a concurrent completion (single-threaded inference; /tokenize queues behind it). Documented in §9; v1 ships 2s, revisit during verify if it bites.
R6. The per-endpoint cache key conflated two same-endpoint/different-model presets (B1: /tokenize ignores the model field). Cache key simplified to endpoint-only. One probe per endpoint per session; if a future broker honors the model field, revisit.

NITs (APPLIED):
N1. §13 commit 3 `OR`/`AND` -> Lua-syntax `or`/`and`.
N2. §10 Q-T5 Resolution-target cell filled in (was blank after B1).
N3. §6 / §8 / §13 commit 4 now describe a CONSISTENT approach (trailing summary line; per-slot annotation dropped).
N4. Status header tree-hash updated to current (`aa64ad3` -> stays fresh through review fold-in; commit 5 will refresh it again at "Implement" status).

PHASE8.md is now 622 lines (was 454 after plan); +168/-61. Ready for implementation (phase 6 of the inner loop).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:28:27 +00:00
marfrit	aa64ad3eec	docs/PHASE8: plan, §13 commit roadmap (5 commits)

Status: Analyze -> Plan. All open Qs resolved (Q-T5 via baseline B1).

5-commit roadmap, bottom-up:
1. broker.lua: M.token_count helper + per-endpoint capability cache. <endpoint>/tokenize probe with 2s timeout; cache true/false per (endpoint, model) for the session. char/4 fallback on any non-200 / parse failure / transport error. M.tokenize_supported introspection helper.
2. context.lua: Context.new accepts opts.tokenize_fn; estimate_tokens widens to use it when set, with a per-turn `_tokens` cache. The char/4 path is unchanged when tokenize_fn is nil.
3. context.lua: enforce_budget consults token_budget too (pillar 5 from A1). Loop condition: turns > max_turns OR estimate_tokens > token_budget. The existing summarize-on-evict callback is unchanged.
4. repl.lua: wire tokenize_fn when cfg.tokenize.use_endpoint=true. The closure captures active_cfg as an upvalue (A5: follows :model switches naturally). :cost detail extension: a trailing line showing the estimated session ctx tokens for comparison with the per-slot prompt_tokens sums in the accumulator.
5. config.lua commented `tokenize = { use_endpoint = true }` example + PHASE8.md status -> Implement.

The per-commit risk index covers: probe latency cap (2s, one-shot), per-turn cache correctness (immutable post-append), enforce_budget performance (O(N) per call after cache fill), and the intentional behavior change of token_budget actually being enforced (sessions fitting under char/4 may evict earlier under accurate counts; documented in §9).

Two items open at plan, to resolve at implement:
- exact :cost detail layout for the estimated session ctx row
- whether to add a :tokenize debug meta (defer unless useful in verify)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:24:41 +00:00
marfrit	79bd40db79	docs/PHASE8-baseline: live /tokenize probes Four findings, all align with formulate/analyze: B1. /tokenize IGNORES the `model` request field — returns the tokenization of whichever model is currently loaded on the proxy backend, NOT the requested model. Acceptable: a real BPE count is still much better than char/4, and the gap between Qwen/Llama tokenizers is small. Cloud (OpenRouter) 404s regardless, so cloud falls back to char/4 via the capability cache. B2. Latency 23-34ms per call, FLAT across input sizes 50-5000 chars. Network round-trip dominates. Per-turn _tokens cache amortizes to O(1); worst case 40 cached turns × ~30ms = 1.2s one-time cost on first enforce_budget call. Acceptable. B3. Response shape confirmed: `{"tokens":[N1,N2,...]}` (token IDs; we use #response.tokens for count, discard the IDs). JSON not SSE; ffi.curl.M.post is the right call. B4. Cloud /tokenize 404s as expected. Capability cache marks it unsupported on first probe; char/4 fallback silent thereafter. No design change. Q-T5 RESOLVED per B1. All open questions now resolved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:22:05 +00:00
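The B1/B3/B4 findings suggest a small, failure-tolerant counting helper. The sketch below assumes a hypothetical `post_json` transport (injected so the logic can be tested without a network) and takes the count as `#doc.tokens`, per the confirmed `{"tokens":[...]}` shape. It is not the real broker.lua code.

```lua
-- Illustrative sketch only: a broker.token_count-style helper with a
-- per-(endpoint, model) capability cache and char/4 fallback.
-- `post_json` is a hypothetical HTTP helper: (url, payload, timeout_s) ->
-- (decoded_table, status) on transport success, or (nil, err) on failure.
local M = {}
local capability = {} -- "endpoint|model" -> true | false, session-lived

local function char4(text)
  return math.floor(#text / 4)
end

local function cache_key(model_cfg)
  return model_cfg.endpoint .. "|" .. (model_cfg.model or "")
end

function M.token_count(model_cfg, text, post_json)
  local key = cache_key(model_cfg)
  if capability[key] == false then
    return char4(text) -- known unsupported: skip the probe entirely
  end
  -- Path is endpoint-local (<endpoint>/tokenize), not under /v1.
  local doc, status = post_json(model_cfg.endpoint .. "/tokenize",
                                { content = text }, 2)
  if doc and status == 200 and type(doc.tokens) == "table" then
    capability[key] = true
    return #doc.tokens -- only the count matters; token IDs are discarded
  end
  capability[key] = false -- any failure caches as not-supported
  return char4(text)
end

function M.tokenize_supported(model_cfg)
  return capability[cache_key(model_cfg)] -- nil until the first probe
end
```

Because a failed probe caches `false`, a cloud endpoint that 404s costs one round-trip per session; every later call goes straight to char/4.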
marfrit	1a136d81b7	docs/PHASE8: analyze — adds pillar 5 (enforce_budget honors token_budget) Status: Formulate -> Analyze. 12 findings (A1-A12); 5/6 open Qs resolved in-place (Q-T5 deferred to baseline). MAJOR FINDING: A1. enforce_budget ONLY checks max_turns, NOT token_budget — even with accurate tokenization, eviction decisions are unaffected. The new estimate_tokens() would just feed the prompt template display. Pillar 5 added: enforce_budget evicts when EITHER max_turns OR token_budget is exceeded. This is the real motivation for accurate tokenization. Other findings: A2. ffi.curl.M.post signature confirmed (body, status) / (nil, err). A3. Single caller of estimate_tokens today; enforce_budget becomes the second (more frequent) caller — per-turn _tokens cache becomes important. A4. Q-T1: cache lives on turn dict; dies with turns on :reset. A5. Q-T2: closure captures active_cfg upval; follows :model switch naturally. A6. Q-T3: opt-out skips the probe entirely (no wiring). A7. Q-T6: tools-schema tokens deferred to follow-up (fixed per session; under-count bounded). A8. _tokens cache invalidation: only :reset; turn content is immutable after append. A9. Probe latency ~50ms/call locally; per-turn cache amortizes to O(1) after first count. A10. estimate_tokens called OUTSIDE streaming callback; no race. A11. role:"tool" turns tokenize identically; per-turn cache works. A12. include_usage (Phase 7) and tokenize (Phase 8) are orthogonal — different endpoints, different code paths. §1 expanded to 5 pillars (pillar 5 = enforce_budget extension). §3 context.lua row updated to reference the enforce_budget change + per-turn _tokens cache. §9 risk row added: accurate counts mean the default token_budget=4096 is finally ENFORCED — sessions that spilled silently under char/4 may now evict earlier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:21:24 +00:00
marfrit	00869ba412	docs/PHASE8: formulate - accurate tokenization (resolves Q1) Phase 8 formulate manifest + PHASE0 §11 amendment to add the Phase 8 row (substrate amendment per CLAUDE.md §3 lands in the same commit). Four pillars: 1. Per-endpoint /tokenize probe (cached). One round-trip on the first call per (endpoint, model); capability cached for the session. hossenfelder + llama.cpp expose <endpoint>/tokenize (NOT /v1/tokenize, per a real probe; the path is endpoint-local, not under the OpenAI /v1 prefix). Cloud (OpenRouter) 404s, giving a silent char/4 fallback. 2. broker.token_count(model_cfg, text): thin wrapper; tries the probe, falls back to char/4 on a miss. Always returns a non-negative int; never errors. Tight 2s timeout; failures cache as not-supported. 3. Context:estimate_tokens widened. Accepts an optional tokenize_fn at Context.new; uses it when present, char/4 otherwise. repl.lua wires `tokenize_fn = function(text) return broker.token_count(active_cfg, text) end` when cfg.tokenize.use_endpoint = true. Per-turn _tokens cache to amortize across estimate calls. 4. :cost detail est-vs-actual annotation. When the heuristic disagrees with the actual prompt_tokens from broker usage by >10%, show `~est=N`. Silent otherwise. Display-only; no behavior change. Resolves Q1 (PHASE0 §13, originally Phase 3): replaces the char/4 heuristic in Context:estimate_tokens. Originally targeted at Phase 3 but deferred forward each iteration; now lands. Baseline already observed during formulate: - /v1/tokenize -> 404 on hossenfelder; /tokenize -> works - Body shape: {content: "..."} returns {tokens: [N1, N2, ...]} - Accuracy gap: char/4 UNDERESTIMATES by ~10% on real code/prose (508 vs 558 tokens on a 2KB README sample). Material for context-budget eviction decisions. Doc covers scope + done-when, tech decisions table, module changes, per-pillar deep dives, UX surface, out of scope, 6 risk rows, 6 open questions (Q-T4/T5 baseline-bound, others analyze-bound).
Scope confirmed via AskUserQuestion: tokenization (chosen over cross-session cost persistence and hard rate-limit enforcement). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:19:53 +00:00
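Pillar 3's wiring relies on Lua closure semantics: the closure reads the `active_cfg` upvalue when it is called, not when it is created. A minimal sketch, with a stand-in `token_count` and a hypothetical `switch_model` playing the role of `:model`:

```lua
-- Illustrative sketch only: A5's upvalue capture. `token_count` stands in
-- for broker.token_count and records which config it was handed.
local last_model = nil
local function token_count(cfg, text)
  last_model = cfg.model
  return math.floor(#text / 4)
end

local active_cfg = { model = "local-7b" }
local cfg = { tokenize = { use_endpoint = true } }

-- The closure reads active_cfg when CALLED, not when created.
local tokenize_fn = nil
if cfg.tokenize and cfg.tokenize.use_endpoint then
  tokenize_fn = function(text) return token_count(active_cfg, text) end
end

-- Stand-in for :model; it rebinds the upvalue the closure reads.
local function switch_model(new_cfg)
  active_cfg = new_cfg
end
```

No re-registration with the Context is needed after a model switch, which is why A5 calls the behavior "natural".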
marfrit	1f34b6dce8	config + docs/PHASE7: example block + status -> Implement (Phase 7 commit #6) R9-resolved single owner of the status bump (commit #5 didn't touch PHASE7.md). N5: the PHASE0 §11 amendment landed in commit `3bad07b` (formulate); not re-applied here. config.lua: - Commented-out `cost = { warn_at_dollars, warn_at_tokens }` block, in parity with the Phase 1-6 example blocks. - Notes that the warn flags are independent (R4) and that per-turn usage flows to session/*.jsonl for after-the-fact analysis. docs/PHASE7.md: - Status header bumped: "Plan + review fold-in" -> "Implement" - Lists the 6 implement commits inline for traceability: `7364963` broker: usage capture + opts widening `7b4a9be` context: accumulator helpers `8adebd5` repl: _record_usage + opts.category at 5 sites `b30212a` safety + repl: opts.category for Norris + probe `0d6ff93` repl: :cost meta surface; this commit: config example + status bump. Phase 7 implementation is complete. The next inner-loop step is verify (7): user-driven smoke tests, then memory-update (8). Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:02:55 +00:00
marfrit	0d6ff93134	repl: :cost meta surface (Phase 7 commit #5) User-facing reporter of the per-session accumulator. Three shapes: :cost one-line summary (calls / tokens / cost); :cost detail per-model + per-category breakdown; :cost reset zeroes the meter and clears the warn flags. All three operate on ctx.usage_totals only (read-only except reset); no broker calls. R6: the annotation uses the per-slot is_local sticky flag, NOT a fragile cost==0 heuristic. The summary line classifies: cloud only -> "cost=$X.XXXXXX"; cloud + local mix -> "cost=$X.XXXXXX (cloud only; local: tokens but no cost field)"; local only -> "cost=$X.XXXXXX (local only; no cost field)". R7: :cost detail rows sort by (cost desc, model asc, category asc). The three-level key gives deterministic output across equal-cost rows (table.sort is unstable; identical costs would otherwise reorder). R10: all dollar values use $%.6f formatting. Sub-cent precision is critical: a Haiku call can cost $0.000028, which $%.4f would round to $0.0000, indistinguishable from local $0. Column width widened to %-26s to fit fully-qualified cloud model names (e.g. "anthropic/claude-haiku-4.5" = 26 chars). E2E verified against live cloud + local broker: :cost (empty session) -> "0 calls, $0.000000" ...after a mixed-mode session... :cost -> "5 calls, prompt=472 / completion=26 tokens, cost=$0.000377 (cloud only; local: tokens but no cost field)" :cost detail -> 4 rows: main cloud $0.000219, probe cloud $0.000128, delegate cloud $0.000030, main local $0.000000 (local), sorted by cost desc. :cost reset -> "cost meter reset"; a subsequent :cost shows zeros. All 5 calls are accounted for across 3 categories in the same session: main (twice: cloud + local), delegate, probe (x2 from :safety check). Warn-threshold firing already verified in commits #3 + #4. HELP gains 3 :cost lines. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:02:24 +00:00
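The R7 sort key and R10 formatting can be sketched as below. The slot fields (`cost`, `is_local`) mirror the accumulator from commit #2, while `detail_rows`, `format_row` and the category column width are hypothetical names and choices made for illustration.

```lua
-- Illustrative sketch only: flatten usage_totals[model][category] into rows
-- and sort them with a total, deterministic three-level key.
local function detail_rows(usage_totals)
  local rows = {}
  for model, by_cat in pairs(usage_totals) do
    for category, slot in pairs(by_cat) do
      rows[#rows + 1] = { model = model, category = category,
                          cost = slot.cost or 0, is_local = slot.is_local }
    end
  end
  table.sort(rows, function(a, b)
    if a.cost ~= b.cost then return a.cost > b.cost end     -- cost desc
    if a.model ~= b.model then return a.model < b.model end -- model asc
    return a.category < b.category                          -- category asc
  end)
  return rows
end

-- $%.6f keeps sub-cent cloud costs visible; %-26s fits a 26-char model name.
local function format_row(r)
  return string.format("%-26s %-18s $%.6f%s", r.model, r.category, r.cost,
                       r.is_local and " (local)" or "")
end
```

The category tie-breaker is what makes the order total: two rows can share model and cost only if their categories differ, so the unstable table.sort never sees two "equal" rows.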
marfrit	b30212af0f	safety + repl: opts.category for Norris + probe (Phase 7 commit #4) Closes the last two broker call sites that flow through safety.lua. Together with commits #1-#3, all 7 broker call sites in aish now attribute usage to the cost accumulator under the right category. Changes: safety.lua: - llm_probe (the YES/NO destructive checker): the broker.chat call gains opts.category = "probe". It captures (text, usage) via (reply, second) and, when opts.on_usage is provided AND the call succeeded, routes second through opts.on_usage(model, category, payload). N4 signature chain: opts already flowed through llm_second_opinion -> M.is_destructive from #52's work; opts.on_usage rides along with no further signature change. - M.norris_step (Norris main broker round-trip): * opts to broker.chat_stream gains category = "norris" * probe_opts (passed to is_destructive inside the loop) gains on_usage = helpers.on_usage, so the LLM probe's cost lands under "probe" too * the on_delta wrapper adds an elseif kind == "usage" branch that calls helpers.on_usage(payload.model, payload.category, payload). It coexists cleanly with the existing text (rehydrator) and tool_call branches. repl.lua: - The Norris helpers table gains on_usage = _record_usage. The R5 central chokepoint (commit #3) does the warn-threshold check AND ctx:add_usage atomically. - The :safety check meta's probe_opts now always carries on_usage (independently of whether secrets_session is set). Secrets-aware scrub_msgs/rehydrate are added conditionally as before. E2E verified against live broker (safety.llm_model = "cloud"): - :safety check ls -la /tmp -> 2 cloud probe calls - "[aish] session cost $0.000128 has crossed warn_at_dollars=$0.000100" - probe category visible in the accumulator (it will appear in :cost detail once commit #5 ships the meta). Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:01:21 +00:00
marfrit	8adebd52cc	repl: _record_usage helper + opts.category at 5 sites (Phase 7 commit #3) Wires broker.lua's on_delta("usage", payload) and broker.chat's (text, usage) return to the ctx accumulator via a single chokepoint. Changes: - Forward decl `local _record_usage` near _bg_spawn, same pattern: the summarize-on-evict closure in make_summarize_fn (built at line 299) needs lexical access to _record_usage (assigned at line 695), so forward-declare it and assign without `local`. - _record_usage(model, category, usage): R5 central chokepoint; routes to ctx:add_usage, then checks the per-threshold warn state. R4: cost_warn_state has two independent flags (dollars and tokens) so first-to-fire doesn't suppress the other. R10: the warn message uses $%.6f for sub-cent precision. - call_broker wrapper: the wrapped on_delta now branches on kind == "usage" -> _record_usage(payload.model, payload.category, payload). R2: keys by payload.model (set inside broker.lua from model_cfg.model). When the fallback fires, broker is called with fb_cfg, so payload.model IS the fallback's name automatically; the wrapper doesn't track primary-vs-fallback itself. - 5 caller sites wired with opts.category: ask_ai call_broker -> category="main"; summarize-on-evict -> category="summarize"; DELEGATE: handler -> category="delegate"; :memory summarize -> category="memory_summarize"; :delegate meta -> category="delegate" - All 4 broker.chat call sites switched from `local reply, err = broker.chat(...)` to `local reply, second = broker.chat(...)`, branching on reply nil-ness to interpret second (err on failure, usage on success). Captured usage routes through _record_usage. E2E verified against live cloud broker: - cloud prompt -> reply "Hi! 👋" - Warn fired: "session cost $0.000219 has crossed warn_at_dollars=$0.000010" - R10 sub-cent precision visible in both numbers. Norris + safety paths still untouched; commit #4 wires those. Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:00:06 +00:00
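A sketch of the R5 chokepoint with R4's two independent one-shot flags, assuming a ctx that exposes `add_usage`, `total_cost`, `total_tokens` and `cost_warn_state` as these commits describe. `make_record_usage`, the injected `warn` emitter and the token-warning wording are hypothetical; the dollar message copies the format seen in the E2E log.

```lua
-- Illustrative sketch only: a single function that records usage AND checks
-- thresholds, so no call site can forget the warn check.
local function make_record_usage(ctx, cost_cfg, warn)
  return function(model, category, usage)
    ctx:add_usage(model, category, usage)
    local state = ctx.cost_warn_state
    local dollars = ctx:total_cost()
    local prompt, completion = ctx:total_tokens()
    local tokens = prompt + completion

    -- Two independent one-shot flags: firing one never suppresses the other.
    if cost_cfg.warn_at_dollars and not state.dollars
        and dollars >= cost_cfg.warn_at_dollars then
      state.dollars = true
      warn(string.format("[aish] session cost $%.6f has crossed warn_at_dollars=$%.6f",
                         dollars, cost_cfg.warn_at_dollars))
    end
    if cost_cfg.warn_at_tokens and not state.tokens
        and tokens >= cost_cfg.warn_at_tokens then
      state.tokens = true
      warn(string.format("[aish] session tokens %d has crossed warn_at_tokens=%d",
                         tokens, cost_cfg.warn_at_tokens))
    end
  end
end

-- Minimal in-memory stand-in for the ctx accumulator.
local function stub_ctx()
  local ctx = { cost = 0, prompt = 0, completion = 0,
                cost_warn_state = { dollars = false, tokens = false } }
  function ctx:add_usage(_, _, u)
    self.cost = self.cost + (u.cost or 0)
    self.prompt = self.prompt + (u.prompt_tokens or 0)
    self.completion = self.completion + (u.completion_tokens or 0)
  end
  function ctx:total_cost() return self.cost end
  function ctx:total_tokens() return self.prompt, self.completion end
  return ctx
end
```

The warn fires on the call that crosses the threshold (A5), because the check runs immediately after that call's usage is added.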
marfrit	7b4a9becc2	context: cost/usage accumulator (Phase 7 commit #2) Adds the per-conversation accumulator that broker.lua's on_delta("usage", ...) payload feeds into. No callers yet: commit #3 wires the broker callback to ctx:add_usage in repl.lua, commit #4 in safety.lua. Changes: - Context.new: new fields `usage_totals = {}` and `cost_warn_state = { dollars = false, tokens = false }`. R4: two independent flags so warn_at_dollars firing doesn't suppress warn_at_tokens (or vice versa). - Context:add_usage(model_name, category, usage): increments the usage_totals[model_name][category] slot. R6: when usage.cost is nil (local llama.cpp per B3), sets a sticky `is_local = true` flag on the slot AND does NOT add to cost (preserving the local-vs-cloud-zero distinction for the :cost detail annotation). When usage.cost is a number (cloud), accumulates it. - Context:total_cost() / total_tokens(): pure-Lua summation across all slots; total_tokens returns (prompt, completion). - Context:reset_usage(): explicit :cost reset path; zeros usage_totals AND clears both flags atomically. - Context:reset(): R8 parity; does NOT clear usage_totals OR cost_warn_state. Matches the Phase 4 memory_items / Phase 6 project rule ("ambient context survives a user-driven conversation reset"). Smoke verified (20/20 unit cases): - empty zeros; cloud cost accumulation; local nil-cost preserves is_local=true sticky; calls counter; cost summation across multiple cloud calls; is_local sticky after a later nil-cost call on a cloud slot; separate slots per (model, category); :reset preserves; :reset_usage zeros both totals and flags. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:57:56 +00:00
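A minimal sketch of the accumulator contract in this commit, assuming the slot fields named below (`calls`, `prompt_tokens`, `completion_tokens`, `cost`, `is_local`); the real slot layout in context.lua may differ.

```lua
-- Illustrative sketch only: usage_totals[model][category] slots with a
-- sticky is_local flag (R6) and reset semantics matching R8 parity.
local Context = {}
Context.__index = Context

function Context.new()
  return setmetatable({
    turns = {},
    usage_totals = {},
    cost_warn_state = { dollars = false, tokens = false },
  }, Context)
end

function Context:add_usage(model_name, category, usage)
  local by_cat = self.usage_totals[model_name]
  if not by_cat then by_cat = {}; self.usage_totals[model_name] = by_cat end
  local slot = by_cat[category]
  if not slot then
    slot = { calls = 0, prompt_tokens = 0, completion_tokens = 0,
             cost = 0, is_local = false }
    by_cat[category] = slot
  end
  slot.calls = slot.calls + 1
  slot.prompt_tokens = slot.prompt_tokens + (usage.prompt_tokens or 0)
  slot.completion_tokens = slot.completion_tokens + (usage.completion_tokens or 0)
  if usage.cost == nil then
    slot.is_local = true -- sticky: a later cloud call never clears it
  else
    slot.cost = slot.cost + usage.cost
  end
end

function Context:total_cost()
  local sum = 0
  for _, by_cat in pairs(self.usage_totals) do
    for _, slot in pairs(by_cat) do sum = sum + slot.cost end
  end
  return sum
end

function Context:total_tokens()
  local prompt, completion = 0, 0
  for _, by_cat in pairs(self.usage_totals) do
    for _, slot in pairs(by_cat) do
      prompt = prompt + slot.prompt_tokens
      completion = completion + slot.completion_tokens
    end
  end
  return prompt, completion
end

-- :cost reset clears totals and both flags together.
function Context:reset_usage()
  self.usage_totals = {}
  self.cost_warn_state.dollars = false
  self.cost_warn_state.tokens = false
end

-- :reset clears the conversation only; ambient usage state survives.
function Context:reset()
  self.turns = {}
end
```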
marfrit	7364963b00	broker: usage capture + opts widening (Phase 7 commit #1) Foundation for Phase 7. broker.chat_stream now emits a third on_delta kind ("usage") after the stream completes successfully; broker.chat returns (text, usage). Backward-compatible: existing callers that ignore the new kind / second value continue working via Lua's drop-extra-returns semantics. Changes: - build_request widens (A3 + R3) to `(model_cfg, msgs, stream, opts)`. opts.tools / opts.max_tokens / opts.include_usage / opts.category all live inside opts now. Both internal call sites updated. - opts.include_usage defaults to true for streaming requests; it sets `stream_options: { include_usage: true }` in the request body. B1: required for local llama.cpp to emit usage; cloud honors it as a no-op (emits anyway). - on_event captures `doc.usage` into a closure-local `final_usage`. N1: the check is INDEPENDENT of the choice/delta branches; local emits usage on choices=[] chunks (choice nil) while cloud emits it with non-empty choices + finish_reason. Both shapes funnel here. - After curl.post_sse returns successfully (NOT on transport/API errors), if final_usage is set, emit on_delta("usage", {prompt_tokens, completion_tokens, total_tokens, cost, model, category}). cost is nil for local (R6 preserves the nil-vs-0 distinction the accumulator needs). model is model_cfg.model, caller-stable per B4 + R2, so call_broker's fallback retry attributes usage to the fallback's model name without wrapper-side tracking. - M.chat (R1, BLOCKER fix): on_delta now also captures kind=="usage" alongside "text"; M.chat returns (text, usage). Without this fix 4 of 5 non-streaming categories (summarize / delegate / memory_summarize / probe) would silently report zero usage. Smoke verified against live hossenfelder:8082: - CLOUD chat -> (text, usage); cost=2.9e-05, model=anthropic/...
- LOCAL chat -> (text, usage); cost=NIL (correct per R6), model=qwen-coder-7b-snappy-8k - CLOUD stream -> on_delta("usage", {...}) with category="test" echoed; model name caller-stable. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:57:14 +00:00
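The N1 point, capturing usage independently of the choice branch, is easiest to see in code. This sketch assumes each SSE `data:` line has already been decoded into a Lua table `doc`; `make_stream_handler` and its `on_done` hook are hypothetical stand-ins for the closure-local `final_usage` logic around curl.post_sse.

```lua
-- Illustrative sketch only: per-chunk handler plus an end-of-stream hook.
local function make_stream_handler(model_cfg, opts, on_delta)
  local final_usage = nil

  local function on_event(doc)
    local choice = doc.choices and doc.choices[1] -- nil when choices = {}
    if choice and choice.delta and choice.delta.content then
      on_delta("text", choice.delta.content)
    end
    -- Independent of the choice branch: local llama.cpp sends usage on a
    -- chunk with empty choices, cloud sends it alongside finish_reason.
    if doc.usage then final_usage = doc.usage end
  end

  -- Call only after the stream returned successfully (no transport error).
  local function on_done()
    if final_usage then
      on_delta("usage", {
        prompt_tokens = final_usage.prompt_tokens,
        completion_tokens = final_usage.completion_tokens,
        total_tokens = final_usage.total_tokens,
        cost = final_usage.cost, -- nil for local; kept distinct from 0
        model = model_cfg.model, -- caller-intended name, not doc.model
        category = opts.category,
      })
    end
  end

  return on_event, on_done
end
```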
marfrit	d4c20f09df	docs/PHASE7: review fold-in - 3 BLOCKERs + 6 CONCERNs + 5 NITs Sonnet-reviewed (per the reviews-use-sonnet feedback memory). BLOCKERs (RESOLVED in-place): R1. M.chat would silently return (text, nil) for ALL non-streaming callers; 4 of 5 categories (summarize/delegate/memory_summarize/probe) flow through broker.chat, NOT chat_stream. §4 now shows the explicit M.chat update that captures kind=="usage" alongside "text" and returns (text, usage). R2. The call_broker fallback retry would credit usage to the wrong model name. Fix: broker emits payload.model = model_cfg.model (which IS the fallback's name when called with fb_cfg, chat_stream's upvalue). The wrapper keys by payload.model, NOT the outer model_name. §4 + §13 commit 3 reflect this. R3. build_request has TWO internal callers inside broker.lua itself, not just the public surface. The §13 commit 1 risk row now spells this out explicitly so the implementer doesn't read "every caller already passes opts" as "external-only". CONCERNs (FOLDED): R4. A single cost_warn_fired flag covers two thresholds, so the first to fire suppresses the other. Split into ctx.cost_warn_state = { dollars = false, tokens = false }; :cost reset clears both. §7 + §13. R5. Warn-check centralization: a single _record_usage helper in repl.lua wraps ctx:add_usage AND does the threshold check. safety.lua routes via helpers.on_usage / opts.on_usage callbacks. context.lua stays decoupled from the renderer. R6. Preserve the nil-vs-0 cost distinction. An accumulator slot gains `is_local = true` (sticky) when ANY recorded usage had cost==nil. The `:cost detail` annotation comes from the is_local flag, not a fragile cost==0 heuristic. R7. The :cost detail sort needs a 3-level deterministic key, (cost desc, model asc, category asc), because table.sort is unstable. R8. The call_broker fallback passes opts.include_usage unchanged. Documented as a known assumption (B1 confirms both backends accept it; a future broken fallback can pass include_usage=false). R9. :resume does NOT restore historical usage_totals. Per-turn usage IS in the session JSONL for scripting; cross-session aggregation is Q-C2, deferred. Documented in §8. R10. $%.4f loses sub-cent precision (cloud cost 0.000028 -> $0.0000). Widened to $%.6f in §6 + the §7 warn message format.
NITs (APPLIED): N1. The §4 pseudocode comment notes that the `if doc.usage` branch is independent of the choice branch (handles both B2 emission shapes). N2. §2 stale "B7" reference corrected to B3. N3. §13 commit 3 row gains an explicit dependency note on commit 1's R1. N4. §13 commit 4 spells out the llm_probe -> llm_second_opinion -> M.is_destructive signature chain widening. N5. §3 + §13 commit 6: the PHASE0 §11 amendment is already in tree (`3bad07b`); commit 6 must NOT re-apply it. PHASE7.md now 803 lines (was 528 after plan). +275/-57. Ready for the implementation phase pending user gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:55:59 +00:00
marfrit	0f14dc1727	docs/PHASE7: plan - §13 commit roadmap Status: Analyze -> Plan. Q-C4 was the last open question pending baseline; now resolved per B1 (stream_options accepted by both backends; required for local). §13 Implementation Plan added: 6 commits, bottom-up: 1. broker.lua: usage extraction from the final SSE chunk; build_request signature widening to (model_cfg, msgs, stream, opts); on_delta("usage", payload); chat returns (text, usage); opts.category passthrough. 2. context.lua: usage_totals + cost_warn_fired fields; add_usage / total_cost / total_tokens helpers; :reset preserves both. 3. repl.lua: wire opts.category at 5 non-Norris call sites (main, delegate x2, summarize, memory_summarize); on_delta("usage") branch routes to ctx:add_usage. 4. safety.lua: wire opts.category for the Norris main broker call + the is_destructive LLM probe; helpers.on_usage callback convention (no new module dep; matches #52's scrub_msgs pattern). 5. repl.lua: :cost meta surface + warn-threshold check + HELP. 6. config.lua: commented cost example block + PHASE7.md status bump to Implement. Per-commit risk index covers signature-change blast radius, missed call-site lint, and warn-flag one-shot semantics. Lua's multi-return semantics keep broker.chat backward-compatible automatically. Two items left open at plan, to be resolved at implement: - is_destructive opts.on_usage vs cfg.helpers threading - per-turn verbose mode (deferred; v1 = :cost on demand only) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:50:39 +00:00
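The widened `build_request(model_cfg, msgs, stream, opts)` signature might look like the sketch below. `model`, `messages`, `stream`, `tools`, `max_tokens` and `stream_options.include_usage` are standard OpenAI-compatible request fields; how aish actually assembles the body is an assumption here.

```lua
-- Illustrative sketch only: every optional knob lives in `opts`, so new
-- options never grow the positional signature again.
local function build_request(model_cfg, msgs, stream, opts)
  opts = opts or {}
  local body = { model = model_cfg.model, messages = msgs, stream = stream }
  if opts.tools then body.tools = opts.tools end
  if opts.max_tokens then body.max_tokens = opts.max_tokens end
  if stream then
    local include = opts.include_usage
    if include == nil then include = true end -- default-on for streaming
    if include then body.stream_options = { include_usage = true } end
  end
  -- opts.category is bookkeeping for the accumulator; it never goes on the wire.
  return body
end
```

Lua's nil-for-missing-arguments rule means existing three-argument calls keep working unchanged, which is the same backward-compat mechanism the plan relies on for broker.chat's extra return value.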
marfrit	2244a3f1ee	docs/PHASE7-baseline: live broker probes for usage shape Real probes of both backends through hossenfelder.fritz.box:8082. Five findings, all aligned with the formulate/analyze design; no structural changes. B1. `stream_options.include_usage = true` is safely accepted by both backends. REQUIRED for local llama.cpp to emit usage; a no-op for cloud (which emits anyway). Default-true is correct. B2. Two emission patterns observed: - Cloud (Bedrock): usage rides the FINAL delta chunk with non-empty `choices` carrying finish_reason. - Local: usage rides a SEPARATE chunk with `choices: []` preceding `[DONE]`. Both shapes are handled by the same `if doc.usage then ...` check; the existing on_event choices branch short-circuits safely when choices is empty. B3. The `cost` field is dollar-denominated (number) and cloud-only. Local returns `timings` instead (perf, not cost). The accumulator captures `usage.cost` as-is; nil treated as 0. :cost detail annotates local lines so $0 isn't misread. B4. `doc.model` in the usage event reflects the upstream API version (e.g., Bedrock rewrites `anthropic/claude-haiku-4.5` to `anthropic/claude-4.5-haiku-20251001`). The accumulator keys by the caller-intended `model_cfg.model`, NOT `doc.model`, for stable cross-call comparison. B5. The usage event is always the LAST data event before `[DONE]`. `on_delta("usage", ...)` is emitted after curl.post_sse returns: one call per stream, after all text + tool_calls. Q-C4 RESOLVED: hossenfelder forwards `stream_options.include_usage` to all backends correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:49:53 +00:00
marfrit	f0bccdec48	docs/PHASE7: analyze — probe broker surface + resolve Qs in-place Status: Formulate -> Analyze (tree at `3bad07b` probed). 11 findings (A1-A11), 5/6 open Qs resolved (Q-C4 deferred to baseline): A1. broker.chat_stream surface clean — usage capture via closure-local + on_delta("usage") emission after curl.post_sse returns. A2. 7 caller sites for opts.category threading (probe / norris / summarize / main / delegate x2 / memory_summarize). A3. build_request signature widens to (model_cfg, msgs, stream, opts) to absorb tools / max_tokens / include_usage / stream_options without further positional growth. A4. Q-C3 RESOLVED: free-form categories (caller decides); matches Phase 6 helpers/skills convention. A5. Q-C5 RESOLVED: warn fires on the call that crossed (no NEXT-call delay). A6. Q-C6 RESOLVED: :reset does NOT clear cost_warn_fired; only :cost reset clears. A7. Norris call-graph rewires (commit `955bd82`) — secrets streaming rehydrator wraps only "text" kind; new "usage" kind passes through unchanged. No new entanglement. A8. ctx.usage_totals survives :reset (R8 parity with memory_items, project). A9. Session JSONL inherits the new field automatically (dkjson opaque encoding). A10. Q-C1 PARTIAL: defensive silent skip when provider omits usage. Real probe required for local model — baseline action. A11. Q-C4 deferred to baseline (real broker probe). §2 build_request row updated to mention the A3 refactor. §11 Open Qs table now shows all 6 with resolutions; only Q-C4 remains as a baseline-time probe. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:49:03 +00:00
marfrit	3bad07b2da	docs/PHASE7: formulate - cost / usage observability Phase 7 formulate manifest + PHASE0 §11 amendment to add the Phase 7 row (substrate amendment per CLAUDE.md §3, lands in the same commit). Four pillars: 1. Usage capture in broker.chat_stream: extract `usage` from the final SSE chunk (OpenAI streaming spec with `stream_options: {include_usage: true}`). Surface via a new on_delta("usage", payload) kind. broker.chat returns (text, usage), which is backward-compatible: existing callers ignore the second value. 2. Per-session accumulator on ctx: ctx.usage_totals[model][category] tables (categories: main / delegate / summarize / memory_summarize / probe / norris, tagged at the call site via opts.category). :reset preserves usage_totals (R8 parity with memory_items / project). Session JSONL gains an optional `usage` field on assistant turns for after-the-fact analysis. 3. :cost meta surface: :cost (summary), :cost detail (per-model + per-category breakdown), :cost reset (zero the meter). Pure-Lua read of ctx.usage_totals; no broker calls. 4. Optional warn thresholds: cfg.cost.warn_at_dollars / warn_at_tokens emit a one-shot status when crossed. Default off; useful with cloud presets configured. Doc covers scope + done-when criteria, tech decisions table, module changes, per-pillar deep dive with code sketches, UX surface, out of scope, risks, and 6 open questions to resolve in analyze. Open at formulate: Q-C1: provider-without-usage handling (local llama.cpp probably); Q-C2: cross-session persistence (defer to phase 8); Q-C3: categories closed-set vs free-form; Q-C4: does hossenfelder forward stream_options to all backends?; Q-C5: warn fires on the call that crosses, or the next one?; Q-C6: :reset clears cost_warn_fired too, or only :cost reset? Scope confirmed via AskUserQuestion: cost/usage observability (chosen over project-local config overlay and session search/tag).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:47:58 +00:00
marfrit	955bd82efb	safety + repl: wire secrets into safety.lua (closes #52) Closes the last #13 gap: the Norris broker call and the is_destructive LLM second-opinion probe were the two egress points NOT covered by the scrub-at-egress design in commit `d852aca`. Approach: option (b) per #52's fix sketch, callback-via-helpers/opts. safety.lua does NOT gain a require("secrets") dependency (acceptance criterion 3); integration is purely through the convention the rest of the helpers table already uses. safety.lua changes: - llm_probe gains an opts table. When opts.scrub_msgs is set, the {system, user(cmd)} message pair is scrubbed before broker.chat. When opts.rehydrate is set, the YES/NO reply is rehydrated before parsing (defensive: the verdict shouldn't carry placeholders, but rehydration is a safe no-op if it doesn't). - llm_second_opinion threads opts through to llm_probe. - M.is_destructive(cmd, cfg, opts): opts is optional; nil opts is backward-compatible (no scrub, original behavior). - M.norris_step: * the outbound broker.chat_stream messages are scrubbed via helpers.scrub_msgs(ctx:to_messages(), model_cfg) when provided. * on_delta is wrapped with helpers.streaming_rehydrator():push / :flush so the user sees rehydrated text AND text_parts accumulates rehydrated chunks (parity with ask_ai in repl.lua). * both M.is_destructive call sites (tool_call probe + CMD: probe) now pass probe_opts = {scrub_msgs, rehydrate} when the helpers carry them. repl.lua changes: - The Norris helpers table gains scrub_msgs / rehydrate / streaming_rehydrator closures, all nil-safe (returning identity / nil when secrets_session is nil). - The :safety check meta passes probe_opts to is_destructive when secrets_session is configured. Without secrets, behavior is unchanged. Unit-test verified end-to-end: - A stubbed broker.chat captures the messages it receives. - Without opts: the probe SEES `ghp_realsecretvalue_...` (control). - With opts: the probe sees `$AISH_SECRET_NNN` (correct scrub).
Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:40:30 +00:00
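Option (b) amounts to dependency injection: the safety code calls optional callbacks and never requires a secrets module. A minimal sketch, assuming an injected `chat` function in place of broker.chat and a hypothetical system prompt; the real llm_probe wording and reply parsing may differ.

```lua
-- Illustrative sketch only: a YES/NO probe whose scrubbing and rehydration
-- are optional callbacks supplied by the caller.
local function llm_probe(cmd, chat, opts)
  opts = opts or {}
  local msgs = {
    -- Hypothetical prompt text, for illustration only.
    { role = "system", content = "Answer YES if the command is destructive, else NO." },
    { role = "user", content = cmd },
  }
  if opts.scrub_msgs then msgs = opts.scrub_msgs(msgs) end -- egress scrub
  local reply = chat(msgs)
  if reply == nil then return nil end                      -- transport failure
  if opts.rehydrate then reply = opts.rehydrate(reply) end -- defensive no-op
  return reply:match("^%s*YES") ~= nil
end
```

With `opts` nil the probe behaves exactly as before, which is the backward-compatibility property the commit lists for `M.is_destructive`.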
marfrit	ac58b19da2	config + docs/PHASE6: example block + status -> Implement (Phase 6 commit #6) R9-resolved single owner of the status bump (commit #5 didn't touch PHASE6.md per the review fold-in). config.lua: - Commented-out `project = { auto_tree, tree_depth, tree_max_chars }` block with the same shape as the Phase 1-5 example blocks. - Note that :diff / :tree / :highlight all work without config; the `project` block ONLY controls the startup auto-inject. - Note that :highlight v1 has no config flag (runtime-only), with a cross-reference to the in-REPL install hint. docs/PHASE6.md: - Status header bumped: "Plan + review fold-in" -> "Implement" - Lists the 6 implement commits in the header for traceability: `c4fc7fd` context: compose_project plumbing `d1dce83` _scan_project_tree + :tree + auto_tree hook `4d5f93a` :diff + _git_clean_cmd (B1 helper) `0d63f01` expand_mentions @<r1>..<r2> tiered resolution `11d0e59` tree-sitter highlighter (renderer fence filter + highlighted dispatch + :highlight meta); this commit: config example + status bump. Phase 6 implementation is complete. The next inner-loop step is verify (7): user-driven smoke tests against the live broker on each pillar, plus filing issues for any defects, then memory-update (8). Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:27:58 +00:00
marfrit	11d0e599cd	repl + renderer: tree-sitter highlighter (Phase 6 commit #5) The largest Phase 6 commit: a fence-aware stream filter in renderer.lua + external tree-sitter dispatch + :highlight meta in repl.lua. renderer.lua, fence-aware filter wrapping assistant_delta: M.set_highlight(enabled, detected, highlight_fn) Called by repl.lua at startup AND on every :highlight toggle. Stores state in module-locals (off by default). State machine inside _hl_push: outside: pass chunks through, but HOLD trailing partial-fence chars (per R1: local llama.cpp splits ```python into the chunks "``" and "`python\n", so naive pass-through drops the leading "``" and never recovers). inside: buffer cumulatively until "\n```" appears; emit highlight_fn(body, lang), then the closing fence verbatim. A recursive call handles the "rest" after the closing fence. N1: fences only open at start-of-stream OR after a newline (``` at ^ or right after \n only). Inline backticks in prose ("use ``` to mark code") do not open a fence. R3 (PTY raw-mode toggle per highlight call): no change here; every executor.exec call already toggles raw mode (existing behavior since Phase 1). The risk is theoretical; smoke-test interactively after install if multi-fence renders show flicker. assistant_flush handles end-of-stream gracefully: it drains any held partial-fence tail OR an unterminated inside-fence buffer. repl.lua, _detect_treesitter + highlighted + :highlight meta: _detect_treesitter() one-shot popen probe of `tree-sitter --version`. Run once at startup; cached as highlight_detected. highlighted(body, lang_tag) R2-placed in repl.lua (has _shq + executor access). Translates the fence tag (`py`, `python`, `lua`, etc.) to a canonical lang via LANG_TAG, picks the canonical extension via LANG_EXTENSION, writes body to a tmpfile with that extension, runs `tree-sitter highlight <tmpfile>` via executor.exec, and returns the output. On ANY failure (CLI absent, non-zero exit, empty output), returns `body` unchanged: silent pass-through.
R4 RESOLVED VIA REAL INSTALL: probed `tree-sitter highlight --help` on noether; confirmed: - NO `--lang` flag exists (formulate-time assumption wrong) - takes a PATH; language inferred from the file extension - the alternative `--scope source.X` exists but is also unreliable without configured grammars. Resolution: write a tmpfile with `os.tmpname() .. LANG_EXTENSION[lang]` and pass the path. Matches the documented upstream contract. B4-followup: even with the CLI installed, highlighting requires `~/.config/tree-sitter/config.json` parser-directories with cloned + built `tree-sitter-<lang>` grammars. Without parsers, every call exits non-zero and we silently pass through. The :highlight install hint surfaces all three install steps so the user knows what's actually needed. :highlight [on|off|status] meta: no arg -> flip; on/off -> set explicitly; status -> report toggle + CLI detection state. When toggled on AND the CLI is absent: emit a 4-line install hint (CLI install, init-config, grammar clone reminder). When toggled on AND the CLI is present: emit a 1-line note that parser-directories must be set up for actual highlighting. HELP gains a :highlight entry. Tested: 10/10 unit cases on the renderer state machine, including: - plain prose passthrough - single-chunk fence - B2 split fence ("``" + "`python\n" + "x=42" + "\n```") - N1 SOL anchor (mid-line ``` does not open) - trailing \n properly emitted across chunks - SOL-only fence open - prose after closing fence preserved - two fences in one stream - highlight off = passthrough (callback never fires) E2E :highlight meta verified: :highlight status -> off / detected; :highlight on -> toggles + emits parser-dir reminder; :highlight status -> on / detected; :highlight off -> off. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Pillars 1 + 2 + 3 of Phase 6 are now all implemented. Commit #6 is the config example block + status -> Implement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:27:04 +00:00
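The outside/inside state machine described above can be sketched in plain Lua. This is an illustration under stated assumptions, not renderer.lua: `new_fence_filter`, the `emit` sink and the verbatim emission of the opener line are hypothetical, and highlighting is delegated to whatever `highlight_fn` the caller supplies.

```lua
-- Illustrative sketch only (not aish's renderer.lua): a fence-aware filter
-- for streamed markdown. emit(text) writes output; highlight_fn(code, lang)
-- returns highlighted text. Emitting the opener line verbatim is an
-- assumption of this sketch.
local function new_fence_filter(highlight_fn, emit)
  local state, at_sol = "outside", true -- at_sol: at start of a line
  local hold, body, lang = "", "", nil
  local push -- forward declaration; inside() re-enters it for trailing text

  local function inside(chunk)
    body = body .. chunk
    local s = "\n" .. body               -- lets a fence on the first line match
    local i = s:find("\n```", 1, true)
    if not i then return end             -- keep buffering until the fence closes
    local code = s:sub(2, i)             -- code up to and including its last \n
    local rest = s:sub(i + 4)            -- whatever follows the closing ```
    if code ~= "" then emit(highlight_fn(code, lang)) end
    emit("```")                          -- closing fence verbatim
    state, body, lang, at_sol = "outside", "", nil, false
    if rest ~= "" then push(rest) end
  end

  local function outside(chunk)
    local buf = hold .. chunk
    hold = ""
    while buf ~= "" do
      if at_sol and buf:sub(1, 3) == "```" then
        local nl = buf:find("\n", 1, true)
        if not nl then hold = buf; return end -- wait for the full opener line
        lang = buf:sub(4, nl - 1):match("^%s*(%S*)")
        emit(buf:sub(1, nl))
        state = "inside"
        return inside(buf:sub(nl + 1))
      elseif at_sol and #buf < 3 and ("```"):sub(1, #buf) == buf then
        hold = buf                             -- "`" or "``": undecided yet
        return
      end
      local nl = buf:find("\n", 1, true)
      if not nl then emit(buf); at_sol = false; return end
      emit(buf:sub(1, nl))
      at_sol = true
      buf = buf:sub(nl + 1)
    end
  end

  push = function(chunk)
    if state == "inside" then inside(chunk) else outside(chunk) end
  end

  local function flush()
    -- Unterminated fence: raw body; otherwise any held partial fence.
    local tail = (state == "inside") and body or hold
    if tail ~= "" then emit(tail) end
    state, at_sol, hold, body, lang = "outside", true, "", "", nil
  end

  return push, flush
end
```

Holding a one- or two-backtick tail at the start of a line is what makes the split-fence case (chunks "``" then "`python\n") work: the decision waits until enough characters arrive to tell a fence from inline code.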
marfrit	0d63f01601	repl: expand_mentions tiered @<r1>..<r2> diff retry (Phase 6 commit #4) Per A6 (tiered resolution): @<token> tries file lookup first; if the file doesn't exist AND the token contains "..", retry as a git ref-range and substitute with a fenced `diff` block. Preserves the existing peel-on-trailing-punct logic (e.g., `@HEAD~1..HEAD,` peels the comma, resolves the ref, restores the comma after the closing fence). Resolution order for @<token>: 1. io.open(token, "rb") -- file lookup, with trailing-punct peel 2. if (1) fails and token contains "..": git --no-pager -c color.ui=never diff <r1>..<r2>; on exit 0 + non-empty body, substitute as a ```diff fenced block 3. else: leave literal `@token` + emit "[aish] @X: not found" status. Examples: @README.md -> file (path branch); @../sibling.txt -> file (path branch; `..` only triggers the retry when path lookup FAILS, so existing paths with `..` segments are unaffected); @HEAD~1..HEAD -> diff (path fails, ref succeeds); @origin/main..feature -> diff (path fails — no such literal file; ref succeeds; `/` in ref is fine because we don't use the path's `/`-absence as a discriminator); @nonsense..gibberish -> literal preserved (both fail). Required restructuring: - _shq and _git_clean_cmd lifted from M.run closure scope to module scope (above expand_mentions). Single source of truth for the B1 prefix shared with commit #3's :diff. The in-M.run duplicates are removed. - expand_mentions now references `executor` (already required at module scope on line 7) for the diff retry. Status messages updated: - File expansion: "@<path> expanded (N bytes, truncated)" (existing) - Diff expansion: "@<path> expanded (N bytes, diff)" (new) Tested with the 7 existing #7 cases + 7 new diff-retry cases (14/14): ref-range expansion shape, body contains `diff --git`, trailing prose preserved, @../path stays as file (not diff), neither-path-nor-ref preserves literal, trailing-comma peel composes with ref retry. 
Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:20:25 +00:00
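The tiered @-mention lookup can be sketched as follows. `resolve_mention` is a hypothetical name, and it takes injected `read_file` and `git_diff` callables. In aish these would be `io.open` and an `executor.exec` call of `git --no-pager -c color.ui=never diff <r1>..<r2>`. The trailing-punctuation set in `PEEL` is an assumption; the commits only state that `/` is never peeled.

```python
PEEL = ",.;:!?)"   # assumed trailing-punctuation set; "/" is never peeled (A10)


def resolve_mention(token, read_file, git_diff):
    """Return (kind, body, trailing) with kind "file", "diff" or None.

    read_file(path) -> str or None; git_diff(range) -> (exit_code, output).
    """
    trailing = ""
    while token and token[-1] in PEEL:     # "HEAD~1..HEAD," -> token + ","
        token, trailing = token[:-1], token[-1] + trailing
    body = read_file(token)                # 1. path lookup always wins
    if body is not None:
        return "file", body, trailing
    if ".." in token:                      # 2. only then retry as a ref range
        code, out = git_diff(token)
        if code == 0 and out:
            return "diff", out, trailing
    return None, None, trailing            # 3. caller keeps the literal @token
```

Because the path branch runs first, an existing `../sibling.txt` never reaches the git retry.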
marfrit	4d5f93aaa5	repl: :diff meta + _git_clean_cmd helper (Phase 6 commit #3 ) User-driven git diff injection. The model sees the diff on the next ask_ai turn through the existing exec_output channel. Changes: - _git_clean_cmd(subcmd_and_args) helper near _scan_project_tree. B1: every git invocation that flows into context MUST use `--no-pager -c color.ui=never`. Forkpty makes git think stdout is a TTY, enabling both color and the pager's keypad/line-clear escapes — these would pollute the captured context block. The helper is the single chokepoint; commit #4's @<r1>..<r2> retry will reuse it. - :diff [<args>] meta: - Reads cwd at meta invocation (R6: differs from :tree's scan-time cwd capture; documented in §5). - Runs `_git_clean_cmd("diff " .. args)` via executor.exec. - Empty output -> "(no diff): <label>" status, no context append. - Non-zero exit -> "diff failed (exit N): <label>" status, no context append. git's stderr already streamed to the user via executor.exec's live multiplex, so the failure reason is visible. - Success -> appends "[diff <label>]\n<output>" via ctx:append_exec_output. Label is "(working tree)" for empty args, else verbatim args. - Status confirms injection size: "diff injected: <label> (N bytes)". - HELP gains :diff line with three example arg shapes; N3-resolved (no `staged` alias — the meta is thin pass-through to git's grammar). Smoke verified across four scenarios in an ephemeral test repo: - Working-tree dirty -> 110-byte diff injected, no ANSI escapes - --cached -> 118-byte staged diff injected, clean - garbage..nonexistent -> exit 128, status + skip - Clean working tree -> "(no diff)", status + skip Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:17:18 +00:00
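Here is a minimal Python sketch of the `_git_clean_cmd` chokepoint and the `:diff` status logic. `diff_meta` is a hypothetical name. `run` stands in for `executor.exec` and is assumed to return `(output, exit_code)`. The sketch checks the exit code before checking for empty output, so that a bad ref is never reported as "(no diff)". That ordering is a choice made here, not something the commit specifies.

```python
def git_clean_cmd(subcmd_and_args):
    # Under forkpty git sees a TTY and enables colour and the pager; both
    # would leak escape sequences into the captured context block.
    return "git --no-pager -c color.ui=never " + subcmd_and_args


def diff_meta(args, run):
    """Return (status_line, context_block_or_None) for ':diff [<args>]'."""
    args = args.strip()
    label = args if args else "(working tree)"
    out, code = run(git_clean_cmd(("diff " + args).strip()))
    if code != 0:
        return f"diff failed (exit {code}): {label}", None
    if not out:
        return f"(no diff): {label}", None
    return (f"diff injected: {label} ({len(out.encode())} bytes)",
            f"[diff {label}]\n{out}")
```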
marfrit	d1dce832da	repl: _scan_project_tree + :tree meta + auto_tree (Phase 6 commit #2) First user-visible Phase 6 verb. Builds on commit #1's compose_project plumbing — sets ctx.project from either the :tree meta or the cfg.project.auto_tree startup hook. Changes: - _scan_project_tree(dir, opts) helper near _run_hook: git -C <dir> ls-files --cached --others --exclude-standard when <dir> is inside a git repo (N4: no subshell); find <dir> -mindepth 1 -maxdepth <depth+1> -type f -not -path '/.' otherwise. Returns (body, info={file_count, truncated, in_git}). Sorted paths, truncated to max_chars (default 4096 per cfg). - :tree [<depth>|refresh|off] meta: no arg -> scan with config defaults and reset _project_opts; <N> -> scan with depth=N and cache as _project_opts; refresh -> re-scan with cached _project_opts (else defaults); off -> clear ctx.project AND ctx._project_opts (R5). Status line reports file count + truncation flag + which backend fired (git/find). - cfg.project.auto_tree startup hook before the main loop: if true, scan libc.getcwd() once and set ctx.project. Failures status-logged once; REPL continues. Default off (existing configs unchanged). - HELP updated with three :tree lines. Plan §12 deliberately defers the config.lua example block to commit #6 along with the status header bump (R9 single-owner). Smoke (aish repo cwd): - :tree no-arg -> "33 files (git ls-files)" - :tree refresh -> same - :tree off -> "project tree cleared" - :tree 1 -> rescans - cfg.project.auto_tree=true at startup -> auto-injected status visible Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:14:36 +00:00
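The truncation rule and the `:tree` argument handling can be sketched like this. `format_tree` and `tree_action` are hypothetical names. Cutting at whole-path boundaries and the `depth` key name are assumptions; the commit only states that paths are sorted and capped at `max_chars`.

```python
def format_tree(paths, max_chars=4096, in_git=True):
    """Sort paths and keep whole lines until the next one would exceed max_chars."""
    kept, used, truncated = [], 0, False
    for p in sorted(paths):
        cost = len(p) + 1                  # path plus its newline
        if used + cost > max_chars:
            truncated = True
            break
        kept.append(p)
        used += cost
    info = {"file_count": len(paths), "truncated": truncated, "in_git": in_git}
    return "\n".join(kept), info


def tree_action(arg, cached_opts, defaults):
    """Map a :tree argument to (action, scan_opts, new_cached_opts)."""
    arg = arg.strip()
    if arg == "":
        return "scan", dict(defaults), None           # defaults; resets the cache
    if arg == "refresh":
        return "scan", dict(cached_opts or defaults), cached_opts
    if arg == "off":
        return "clear", None, None                    # clears project and cache (R5)
    if arg.isdigit():
        opts = dict(defaults, depth=int(arg))
        return "scan", opts, opts                     # explicit depth is cached
    return "usage", None, cached_opts
```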
marfrit	c4fc7fde01	context: [project] block plumbing (Phase 6 commit #1 ) Foundation for Phase 6 — adds the field + composer + composition order with no callers yet. Nothing sets ctx.project; the meta hookup and startup auto-inject land in commit #2. Changes: - Context.new gains `project` (string, nil) and `_project_opts` (cached scan opts for `:tree refresh`; R7). - compose_project(text) helper mirrors compose_background / compose_summary. Returns "" for nil/empty; otherwise emits "\n\n[project]\n" + text. - to_messages inserts compose_project BETWEEN compose_background and compose_summary so the model reads memory facts -> project tree -> earlier conversation -> NORRIS suffix. - Same Norris-suppression guard as the other two dynamic blocks (R-C1 / R-C4 parity; planner stays on goal anchor). - Context:reset preserves ctx.project (R8 — matches the Phase 4 memory_items rule; startup-injected facts survive a user-driven context reset). Smoke verified (14/14 inline cases): - project nil -> no [project] block in sys_content - project set -> block present with contents - ordering: [background] < [project] < [earlier conversation summary] - norris_active suppresses all three; NORRIS suffix still appears - :reset clears turns/pending_exec_output/summary; preserves memory_items AND project Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:08:54 +00:00
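The composition order and Norris suppression can be sketched as below. The block tags come from the smoke-test list in the commit. `norris_suffix` is a placeholder, because the real suffix text is not given. `system_content` and `reset` are hypothetical names operating on plain values and a dict.

```python
def compose_block(tag, text):
    # Mirrors compose_project: "" for nil/empty, else "\n\n[tag]\n" + text.
    return "" if not text else f"\n\n[{tag}]\n{text}"


def system_content(base, background=None, project=None, summary=None,
                   norris_active=False, norris_suffix="\n\n[NORRIS]"):
    if norris_active:
        # All three dynamic blocks are suppressed; only the suffix is added.
        return base + norris_suffix
    return (base
            + compose_block("background", background)
            + compose_block("project", project)
            + compose_block("earlier conversation summary", summary))


def reset(ctx):
    # :reset clears conversation state but keeps memory_items and project (R8).
    ctx.update(turns=[], pending_exec_output=None, summary=None)
    return ctx
```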
marfrit	261b230be8	docs/PHASE6: review fold-in — 2 BLOCKERs resolved, 7 CONCERNs, 6 NITs Independent agent review of PHASE6 (manifest + baseline + plan at `4407029`). Status header: Plan -> Plan + review fold-in. BLOCKERs (RESOLVED in-place): R1. §4 fence detector's `outside`-state dropped the leading `'``'` chunk of a split fence — contradicted B2's local-model split-fence requirement (4-char median chunk size). Algorithm rewritten: outside-state now holds a tail (up to 10 chars) when the chunk's suffix could be a fence prefix; flushes on next push. Same accumulator pattern as the secrets streaming rehydrator. R2. `highlighted()` file placement was ambiguous (§3 vs §12). Lives in repl.lua (where _shq and executor are accessible); renderer.lua exposes set_highlight(enabled, detected, highlight_fn) and calls back. Keeps renderer.lua free of the executor require. CONCERNs (FOLDED): R3. PTY raw-mode toggle on every code-block render — smoke-test for cursor flicker / SIGWINCH races before locking in. Risk row 5. R4. tree-sitter highlight --lang X grammar is UNVERIFIED — upstream CLI canonically takes a path with extension. Implement-time check required; fallback path documented (extension-based tmpfile + path arg). Added to risk row 5 + open-at-plan. R5. :tree off semantics clarified — one-shot clear of ctx.project + ctx._project_opts; no "disabled" flag. R6. cwd-coupling difference between :diff (call-time) and :tree (scan-time) now documented in §5. R7. :tree refresh opts caching specified — caches ctx._project_opts; `:tree refresh` reuses last explicit opts. R8. :reset preserves ctx.project (parity with memory_items per Phase 4). §12 commit 1 smoke updated. R9. Status-bump duplication between §12 commits 5e and 6 resolved — commit 6 owns the bump. NITs (APPLIED): N1. §4 algorithm pseudocode now includes SOL/post-newline anchor (mid-line backticks in prose don't open a fence). N2. 
_detect_treesitter() gained a comment explaining the popen pattern doesn't gate on exit code (B3). N3. :diff staged shorthand dropped — meta is a thin pass-through to git's own grammar. N4. _scan_project_tree switched from `cd && git ...` to `git -C <dir> ...` — no subshell, more idiomatic. N5. Open-at-plan dir-arg bullet dropped (already decided in §6); replaced with R3 + R4 implement-time verification items. N6. §11 wording on #52 left as-is (cosmetic only). PHASE6.md now 896 lines (was 701 after plan). +264/-69. Ready for implementation phase 6 of the inner loop pending user gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:06:19 +00:00
marfrit	4407029296	docs/PHASE6: plan — fold B1/B3/B4 + add §12 commit roadmap Status header: Analyze -> Plan. Baseline findings folded into the design sections: §1 (highlighter pillar) gains B4: tree-sitter absent on every probed host; :highlight on emits install-hint when missing. §4 (highlighter sketch) revised per B3: io.popen():close() doesn't expose exit codes in LuaJIT. Route via executor.exec("cat tmp | tree-sitter ...") which uses pty.spawn+waitpid and returns code reliably. Tmpfile design retained (avoids ARGMAX + shell-escape). §5 (:diff impl + @<r1>..<r2> retry) revised per B1: every git invocation must use `--no-pager -c color.ui=never` to suppress the color/keypad/line-clear escapes forkpty triggers. Factored recommendation: helper `_git_clean_cmd(subcmd)` shared by :diff and the @-mention diff retry. New §12 Implementation Plan — 6 commits, bottom-up: 1. context.lua: ctx.project + compose_project + composition order 2. repl.lua: _scan_project_tree helper + :tree meta 3. repl.lua: :diff meta + _git_clean_cmd helper (B1) 4. repl.lua: expand_mentions tiered resolution (@<r1>..<r2> per A6) 5. renderer.lua + repl.lua: tree-sitter detect + fence filter + :highlight meta (B3-revised tmpfile dispatch) 6. config.lua project example + status -> Implement. Per-commit risk index + smoke criteria. Highlighter (commit 5) is the largest experimental surface — placed last so the rest of Phase 6 ships even if the highlighter slips. Order is independent enough that swapping 3<->4 or 5<->6 doesn't break anything; bottom-up keeps each commit individually green. Things deliberately not split: _shq reuse, lang map duplication for v1, streaming-rehydration order (rehydrate -> highlight -> emit inherits naturally from existing chunk pipeline). Two items open at plan time, resolve at implement: _scan_project_tree dir-arg vs hardcoded getcwd; :highlight status probing tree-sitter --print-langs. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:01:40 +00:00
marfrit	9f50206ca6	docs/PHASE6-baseline: substrate probes ahead of implementation Six findings from probing the world before tree-sitter / diff / project tree implementation lands: B1. `git` subcommands through executor.exec emit ANSI color + DEC keypad/line-clear escapes by default (forkpty enables interactive mode). `:diff` impl MUST use `git --no-pager --color=never <args>`. Same flags apply to any future git verbs. B2. SSE chunk size envelope: local llama.cpp delivers tiny chunks (median 4 chars, max 13) AND splits code fences across boundaries (`'``'` then `'`'`). Cloud (Anthropic via OpenRouter) delivers big chunks (median 26 chars), fences intact. The §4 fence-aware filter accumulator design covers both — confirmed necessary by local-model behavior. B3. LuaJIT io.popen():close() does NOT return exit codes — Lua 5.1 contract, not 5.2+. Breaks the A4 highlighter resolution. Revised: route via `executor.exec("cat tmp | tree-sitter ...")` which uses pty.spawn + waitpid and returns (out, code) reliably. B4. tree-sitter CLI absent on both probed hosts (noether, higgs). Highlighter is opt-in by design; absent-CLI path should emit a clear install hint, not silently no-op. B5. Project-tree envelope: aish 32 files / 449 chars; similar local repos 15-25 files; scan time ~1-5ms. The 4096-char default cap accommodates ~290 typical paths. Large repos handled via tree_depth or cap tuning per existing §9 risk row. B6. os.tmpname returns POSIX /tmp/lua_XXXXXX paths; acceptable for the B3-revised tmpfile-roundtrip pattern. No structural changes to formulate/analyze. B1, B3, B4 will fold into PHASE6.md §4 / §5 / §1 during plan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:58:56 +00:00
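The ~290 figure in B5 follows from the measured average path length. Here is a quick check, assuming the 449 characters include one newline per path:

```python
measured_files = 32     # B5: aish repo
measured_chars = 449    # assumed to include the newline after each path
cap = 4096              # default max_chars

avg_chars_per_path = measured_chars / measured_files    # 14.03125
paths_at_cap = int(cap // avg_chars_per_path)           # 291
print(f"{avg_chars_per_path:.2f} chars/path -> ~{paths_at_cap} paths fit in {cap} chars")
```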
marfrit	ad52fe4538	docs/PHASE6: analyze — substrate probes + Q resolutions in-place Analyze pass against tree at `f596743`. All 6 formulate-time questions resolved without structural changes; pillar shapes intact. A1. renderer.lua surface clean — assistant_delta/flush accumulate via stream_buf; fence-aware filter slots in between chunk receipt and emit without touching anything else. A2. executor.exec via pty.spawn already handles git diff / find; cwd-aware (inherits from libc.chdir). No new IO model. A3. context composition order locked: base + [background] + [earlier summary] + NORRIS. [project] inserts between [background] and [earlier summary]; Norris-suppression guard inherited. A4. Q-H1 RESOLVED: tmpfile roundtrip for tree-sitter popen3 (io.popen("w") + redirect stdout to tmp file; io.open reads back). Avoids ARGMAX + shell-escape complexity. Cost ~one syscall per code block. A5. Q-D1 RESOLVED: no confirm gate on :diff. git diff is read-only; matches :history / :sessions / :safety check. A6. Q-D2 RESOLVED: tiered @<token> resolution — file lookup first, then ref-range retry when path fails AND token contains "..". @origin/main..feature works naturally; @../sibling.txt unaffected. A7. Q-H2 RESOLVED: highlighter is assistant-output only in v1. @-mention echo via readline is a different code path; deferred to v2 (added to §8 out-of-scope). A8. Q-T1 RESOLVED: project tree captured at scan time, not auto-refreshed on cd. v1 verb is :tree refresh; cd-intercept auto-refresh deferred to v2. A9. Q-T2 RESOLVED: .gitignore via `git ls-files --exclude-standard` in repos; find fallback outside. Custom globs deferred to v2. A10. expand_mentions punct-peel doesn't strip "/", so `HEAD~1..HEAD,` peels the comma cleanly and the diff retry catches the cleaned token. A11. Auto-injection ordering: memory load → tree scan → first ask_ai. Composition reads memory facts before file tree. A12. [project] Norris-suppressed (parity with R-C1/R-C4). 
§3 module-changes table: context.lua row updated (project string + compose_project + ordering note + Norris suppression). §4 highlighter code sample replaced with the tmpfile-roundtrip resolved form. §5 @-mention section rewritten as tiered resolution with worked examples. §8 out-of-scope gained three v2-polish items (echo highlight, cd-intercept auto-refresh, custom globs) so they're tracked. §10 Open Questions table now shows all 6 Qs with their resolutions inline. §9 Risks row for @-mention collision updated to point at A6. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:53:58 +00:00
marfrit	f596743834	docs/PHASE6: formulate — tree-sitter highlight + diff + project tree Phase 6 formulate manifest. Three pillars per PHASE0 §11 row 6: 1. Tree-sitter syntax highlighting hooks External `tree-sitter` CLI when present, no-op otherwise. Honors PHASE0 §3 (no compiled extensions). Toggleable at runtime; off by default so existing UX is unchanged. 2. Diff-aware code injection :diff [args] meta + @<ref1>..<ref2> @-mention extension. Shells out to `git diff`; output flows through the existing exec-output context channel. 3. Project-level file-tree context :tree meta + optional cfg.project.auto_tree startup inject. git ls-files in a repo, find fallback otherwise. Composed into the system prompt as a new [project] block between [background] and [earlier summary]. Suppressed under Norris (R-C1 / R-C4 parity). Module changes: renderer.lua (fence-aware highlight filter), context.lua (compose_project), repl.lua (3 new metas, 3 new helpers, expand_mentions extension). No new module files in v1. Doc covers: scope + done-when criteria, tech decisions table, module changes table, per-pillar deep dive with example code, UX surface summary, out-of-scope list, risks, and 6 open questions to resolve in analyze (Q-H1/Q-H2 highlighter, Q-D1/Q-D2 diff, Q-T1/Q-T2 tree). Scope confirmed via AskUserQuestion: all three subsurfaces in scope; tree-sitter approach is external CLI w/ no-op fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:47:00 +00:00
marfrit	d852acadc2	repl: wire #13 secrets — scrub outbound, rehydrate stream + tool args Plumbs the secrets.lua module (commit `e4b818b`) into the conversation pipeline. Hook points: ask_ai — scrub_messages(ctx:to_messages(), mode) before call_broker; rehydrate streamed deltas via streaming_rehydrator so the user sees real values while text_parts accumulates rehydrated chunks (final_resp is plain — CMD: / DELEGATE: extractors see plain values) MCP dispatch — dispatch_tool_call rehydrates the args table before sess:call_tool so the trusted MCP server receives real values (the model emitted placeholders because it saw a scrubbed context) DELEGATE: & :delegate — scrub sub_msgs before broker.chat; rehydrate sub_text before appending to context, so future turns see real values restored Phase 5 summarize-on-evict — scrub sum_msgs before broker.chat; rehydrate the reply that becomes ctx.summary :memory summarize — same scrub + rehydrate pair Mode resolution per call: model_cfg.redact → config.secrets.default → "vault+autodetect" if vault loaded, else "off". ctx storage convention: PLAIN values throughout. The scrub happens at the egress (broker call) per the active redact mode; ctx.turns never holds placeholders for content the user typed or executor produced. The model's own emissions (assistant tool_call arguments) may carry placeholders because the model saw the scrubbed context — rehydrated at MCP dispatch and otherwise harmless on re-serialization (idempotent re-scrubbing). New meta: :secrets [status] vault entries, placeholders allocated this session, active broker mode. Never prints actual values (vault file is itself a secret per gotcha 7). :secrets check <text> dry-run scrub against the active broker's mode — shows the output transformation. Documented in config.lua with a commented-out block + per-broker redact field example. 
Deferred to a follow-up issue (clearly scoped): - safety.lua broker call sites (Norris main loop, is_destructive LLM second-opinion probe) — same wiring pattern, but they don't currently see secrets_session; it needs threading through the helpers. - @-mention file content is appended PLAIN to ctx and scrubbed at egress alongside the rest of the user turn (covered by the ask_ai scrub). - exec output streamed live to terminal is pre-scrub (user sees real values in their own shell — by design); the captured-for-context copy is scrubbed at egress alongside the rest. This is the "full scope" implementation chosen via AskUserQuestion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:38:23 +00:00
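The per-call mode resolution and the plain-in-ctx, scrub-at-egress convention can be sketched as below. `resolve_redact_mode` and `egress` are hypothetical names. `scrub` is any callable with the `(text, mode)` shape of `session:scrub`.

```python
def resolve_redact_mode(model_cfg, config, vault_loaded):
    """model_cfg.redact -> config.secrets.default -> vault-dependent fallback."""
    mode = (model_cfg or {}).get("redact")
    if mode is None:
        mode = (config.get("secrets") or {}).get("default")
    if mode is None:
        mode = "vault+autodetect" if vault_loaded else "off"
    return mode


def egress(messages, scrub, mode):
    # ctx keeps plain values; only the copy sent to the broker is scrubbed.
    return [dict(m, content=scrub(m["content"], mode)) for m in messages]
```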
marfrit	e4b818b0e9	secrets: vault loader + scrub/rehydrate + autodetect (#13 commit 1) Standalone module — no wiring yet. Lands the substrate for issue #13: secrets.load(path) — vault file loader; refuses non-0600 secrets.make_session(vault) — per-conversation scrub/rehydrate state session:scrub(text, mode) — substitute literals (+ autodetect) session:rehydrate(text) — restore placeholders secrets.streaming_rehydrator — chunk-boundary-tolerant streaming wrapper Mode semantics (chosen per call by the caller): "off" — identity, no mapping "vault" — vault literals only, placeholders, rehydratable "vault+autodetect" — + heuristic regexes, placeholders, rehydratable "stealth" — + heuristic regexes, opaque decoys, one-way Placeholders are stable across the session: the same literal always maps to the same $AISH_SECRET_NNN slot, so re-scrubbing the same context is idempotent and the model sees a consistent vocabulary. AUTODETECT_PATTERNS (ordered; longer prefixes first): sk-or-v<N>-... OpenRouter ghp_/gho_/ghs_ GitHub PATs AKIA<16> AWS access keys eyJ...x.y.z JWTs sk-... OpenAI (generic; matched after openrouter) -----BEGIN ... PRIVATE KEY----- SSH/GPG key headers Streaming rehydrator: tolerates a placeholder split across SSE chunks ($AISH_SE then CRET_001). It holds back the trailing partial-match in a buffer, emits the rest, and resolves on the next push or flush. Verified with 20 unit cases (vault sub, stable mapping, autodetect across all label kinds, stealth decoys, mode=off, streaming with mid-placeholder splits, non-placeholder $-prose pass-through). Vault file mode enforcement: 0600 only — matches ssh's behavior for ~/.ssh/id_rsa. Loud failure (status + skip) if mode is wider. Next commit (issue #13 follow-up): wire into broker / tool dispatch / display, add per-broker `redact` policy, :secrets meta, config example block. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:36:39 +00:00
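The sketch below shows stable placeholders and the chunk-tolerant rehydrator in compact Python. It is not the secrets.lua code, and it differs in three ways:

- The regexes in `AUTODETECT` are guessed from the prefix list in the commit and cover only part of it.
- "stealth" decoys are omitted.
- The three-digit slot width is taken from the `$AISH_SECRET_001` example.

```python
import re

PLACEHOLDER = re.compile(r"\$AISH_SECRET_\d{3}")
# Illustrative subset of the autodetect list (guessed regexes; longer prefixes first).
AUTODETECT = [
    re.compile(r"sk-or-v\d+-[A-Za-z0-9]+"),
    re.compile(r"gh[pos]_[A-Za-z0-9]{20,}"),
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"sk-[A-Za-z0-9]{20,}"),
]


class SecretsSession:
    def __init__(self, vault_literals):
        self.literals = sorted(vault_literals, key=len, reverse=True)
        self.slot = {}   # literal -> placeholder (stable for the whole session)
        self.real = {}   # placeholder -> literal

    def _placeholder(self, literal):
        if literal not in self.slot:
            ph = f"$AISH_SECRET_{len(self.slot) + 1:03d}"
            self.slot[literal], self.real[ph] = ph, literal
        return self.slot[literal]

    def scrub(self, text, mode):
        if mode == "off":
            return text
        for lit in self.literals:
            if lit in text:
                text = text.replace(lit, self._placeholder(lit))
        if mode == "vault+autodetect":
            for pat in AUTODETECT:
                text = pat.sub(lambda m: self._placeholder(m.group(0)), text)
        return text

    def rehydrate(self, text):
        # Unknown placeholders and ordinary "$" prose pass through unchanged.
        return PLACEHOLDER.sub(lambda m: self.real.get(m.group(0), m.group(0)), text)


class StreamingRehydrator:
    PREFIX = "$AISH_SECRET_"

    def __init__(self, session):
        self.session, self.buf = session, ""

    def _partial_tail(self, text):
        # Length of a trailing fragment that could still grow into a placeholder.
        start = text.rfind("$")
        if start == -1:
            return 0
        tail, n = text[start:], len(self.PREFIX)
        if len(tail) >= n + 3 or not self.PREFIX.startswith(tail[:n]):
            return 0
        if tail[n:] and not tail[n:].isdigit():
            return 0
        return len(tail)

    def push(self, chunk):
        text = self.buf + chunk
        cut = len(text) - self._partial_tail(text)
        self.buf = text[cut:]
        return self.session.rehydrate(text[:cut])

    def flush(self):
        out, self.buf = self.session.rehydrate(self.buf), ""
        return out
```

Scrubbing the same literal twice yields the same slot, so re-scrubbing an already-scrubbed context is idempotent.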
marfrit	cdf4e86679	repl: sub-broker delegation via DELEGATE: marker (closes #6 ) Cost and context-window control: a "heavy" preset's model can offload work to a cheaper preset without spending its own tokens on the result. Example: deep model is mid-conversation and asks fast to summarize a 20k-line build log; the summary comes back as exec-output for the next turn, deep stays small. Marker syntax: DELEGATE: <preset> "<prompt>" (Single or double quotes; one DELEGATE per line; lines without the quoted shape are dropped — let the user write about delegation in prose without accidental dispatch.) Dispatch flow (mirrors CMD: / CMD&: extraction): 1. ask_ai's stream completes 2. extract_delegate_lines walks the final response 3. For each {preset, prompt}: broker.chat(config.models[preset], ...) synchronously; result is appended via ctx:append_exec_output as "[delegate <preset>]: <result>" 4. The model sees the delegate result on its next turn Implementation choice — marker over tool: option 1 from the issue ("inline delegate marker") works with any model regardless of tool_calls support. Option 2 (aish_delegate as a tool dispatched in the existing Phase 2 sub-loop) is the better UX for capable models since it returns the result mid-turn — filed as follow-up if needed. 
Meta surface: :delegate <preset> <prompt> one-shot direct invocation (useful for testing without depending on the model emitting DELEGATE:, and as a manual "ask <preset> something" verb) Scope: - Plan mode: emits "PLAN: DELEGATE <preset> <prompt>" without dispatch - Norris: not extended; the planner's model anchor would conflict with mid-plan switching (R-C3-adjacent risk) - No self-delegation guard: each DELEGATE is a separate broker call, not recursive; a delegate result reaching the next turn could contain another DELEGATE but that's bounded by a max_tool_depth-style iteration cap on the parent - No cost prompt: configuring a paid cloud preset already implies consent to spend on it - Unknown preset → error status + exec-output note "[delegate X failed: unknown preset]" Extractor unit-tested with 8 cases (single-quote, double-quote, multi-line prose, empty prompt, no-quotes, case-sensitive, wrong prefix). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:29:09 +00:00
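The extractor rule can be sketched with a single regex: the prompt must be quoted, the prefix is case-sensitive, and there is one marker per line. `extract_delegate_lines` mirrors the commit's function name, but this body is a hypothetical Python version. Two details are assumptions. `DELEGATE:` must sit at the very start of the line, and an empty prompt is rejected, because the commit lists an "empty prompt" case without stating its expected result.

```python
import re

# DELEGATE: <preset> "<prompt>"  (single or double quotes, case-sensitive prefix)
DELEGATE_RE = re.compile(r"""^DELEGATE:\s+([\w-]+)\s+(["'])(.+)\2\s*$""")


def extract_delegate_lines(text):
    found = []
    for line in text.splitlines():
        m = DELEGATE_RE.match(line)
        if m:   # lines without the quoted shape are prose and are ignored
            found.append({"preset": m.group(1), "prompt": m.group(3)})
    return found
```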
marfrit	f94d16fc89	repl: background CMD&: with handle/poll (closes #8 ) Builds, long-running network calls, and file watches no longer block the turn. A new "CMD&: <cmd>" marker (analogue of CMD:) tells the REPL to spawn the command in the background, return immediately, and poll for completion between user inputs. Process model: shell-wrapped to avoid needing fork()/execv() FFI. nohup sh -c '(<cmd>) > <log> 2>&1; echo $? > <status>' </dev/null >/dev/null 2>&1 & echo $! The child is reparented to init; we hold only the PID and the path to the .status sidecar. Completion is detected by the .status file existing (the wrapper writes it as its last act). No waitpid needed — the child isn't ours after the popen subshell exits. Storage: <history.dir>/bg/<id>.log + <id>.status. The directory is created lazily at startup (mkdir -p). Requires history.dir to be configured; without it CMD&: emits an error status and the model sees an "[bg failed to start]" exec-output note. check_bg_done() runs at the top of each main-loop iteration alongside check_every_due(). When a job is detected as exited, the REPL: - emits a status line "[bg:<id> exited <code>, <bytes>, <secs>s wall] <cmd>" - appends the same string to ctx as exec output, so the model sees the completion on its next turn (natural follow-up: "ok the build finished; let me check the log") Meta surface: :bg-spawn <cmd> start a bg job directly (no AI needed; also useful for testing without depending on the model emitting CMD&:) :bg-list show running/done jobs (id, pid, state, runtime, cmd) :bg-output <id> dump the log file to stdout :bg-kill <id> SIGTERM (note: only delivers if the PID is still the actual command — long-lived shells may need pkill by name) Scope (deliberately limited for v1): - No callback-mode readline: bg completion detection is pre-prompt, not mid-readline. If a build finishes while the user is typing, notification comes when they hit Enter. 
- Permission policy DSL (#9) does NOT apply to CMD&: — the asynchronous gating model wasn't designed for the y/N flow. Filed as follow-up if needed. - Norris not extended: helpers.exec_cmd is still synchronous; the planner doesn't dispatch bg jobs. - Plan mode interaction: CMD&: in plan mode emits "PLAN: & <cmd>" and a "[plan] would bg-run: <cmd>" exec-output note, no spawn. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:25:55 +00:00
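The wrapper and the sidecar check can be sketched in Python. Quoting with `shlex.quote` is an addition here: the commit shows a plain single-quoted `sh -c` string, which would break on a command that itself contains a single quote. The sketch also treats an empty `.status` file as "still running". That is an assumption about the window between the redirect creating the file and `echo` writing to it.

```python
import shlex


def bg_wrapper(cmd, log_path, status_path):
    """Build the detached wrapper line; its stdout is the child's PID."""
    inner = (f"({cmd}) > {shlex.quote(log_path)} 2>&1; "
             f"echo $? > {shlex.quote(status_path)}")
    # shlex.quote keeps a command containing single quotes intact inside sh -c.
    return f"nohup sh -c {shlex.quote(inner)} </dev/null >/dev/null 2>&1 & echo $!"


def check_bg_done(status_path):
    """Exit code once the .status sidecar is written, else None (still running)."""
    try:
        with open(status_path) as fh:
            text = fh.read().strip()
    except FileNotFoundError:
        return None
    # The shell creates the file before echo writes to it, so empty means "not yet".
    return int(text) if text else None
```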
marfrit	67d80e1047	repl: :every recurring prompts via pre-prompt due-check (closes #11) In-session timer that re-injects a prompt every N seconds. "Watch this thing" workflows (`:every 5m "check journalctl -u nginx for errors"`) without spawning a separate aish process. Approach: minimum viable. check_every_due() runs at the top of each main-loop iteration — timers fire BETWEEN user inputs, not during readline waits or active broker calls. Mid-stream firing would require rewriting ffi/readline to callback mode (substantial scope). If the on-the-fly firing requirement matters in practice, it can land as a follow-up issue against the readline FFI. Meta: :every <interval> <prompt> schedules a job (interval: 30s | 5m | 2h | bare int); :every list shows jobs (id, interval, time-until-next, model, prompt); :every cancel <id> removes one. Defaults: - Model: "fast" preset if defined in config.models, else active model (per the issue's "recurring prompts should default to fast preset"). - In-memory only — jobs don't persist across restarts. - Suppressed while ctx.norris_active (planner stays on goal anchor). - Quotes around the prompt are stripped if present. - Each tick fires the job once, re-schedules next_fire = now + interval (no catch-up if the interval elapsed multiple times during a long user input). Tested: 11 interval-parse cases (30s, 5m, 2h, bare int, malformed), load via require, end-to-end :every list / cancel surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:23:07 +00:00
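Interval parsing and the no-catch-up reschedule can be sketched as below. The sketch makes two assumptions: a zero interval is rejected, and jobs have a hypothetical dict shape with `interval` and `next_fire` keys. `check_every_due` borrows the commit's function name for readability only.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_interval(text):
    """'30s' | '5m' | '2h' | bare int (seconds) -> seconds, or None if malformed."""
    m = re.fullmatch(r"(\d+)([smh]?)", text.strip())
    if not m or int(m.group(1)) == 0:      # rejecting 0 is an assumption
        return None
    return int(m.group(1)) * UNIT_SECONDS.get(m.group(2), 1)


def strip_quotes(prompt):
    prompt = prompt.strip()
    if len(prompt) >= 2 and prompt[0] == prompt[-1] and prompt[0] in "\"'":
        return prompt[1:-1]
    return prompt


def check_every_due(jobs, now, fire, norris_active=False):
    """Fire each due job once; reschedule from now, so missed ticks are not replayed."""
    if norris_active:
        return
    for job in jobs:
        if now >= job["next_fire"]:
            fire(job)
            job["next_fire"] = now + job["interval"]
```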
marfrit	17e62c0326	safety: permission policy DSL — allow/confirm/deny rule lists (closes #9 ) The confirm_cmd boolean was too coarse: true interrupts every harmless ls; false ungates everything. Most workflows want trust for read-only ops while still gating writes/network/sudo. New config: permissions = { allow = { "^ls%s", "^cat%s", "^git status" }, confirm = { "^rm%s", "^git push", "^docker%s", "^sudo%s" }, deny = { "^ssh%s+root@", "^curl%s+http[^s]" }, } Verdict order: deny > confirm > allow. First match in the chosen category wins. Unmatched defaults to "confirm". Patterns are Lua patterns (not regex) per PHASE0.md §3 — no compiled extensions. Verdict behavior in the interactive CMD: loop: - allow → run without prompt - deny → status line, skip - confirm → [y/N] prompt (same UX as legacy confirm_cmd=true) Backward compat: - permissions unset + confirm_cmd=true → always confirm - permissions unset + confirm_cmd=false → always allow - permissions set → policy table is authoritative Scope deliberately limited to the interactive AI-suggested CMD: gate. Norris autonomous mode keeps its own safety.is_destructive machinery (combining the two would double-gate or replace the LLM probe — both non-obvious behavioral changes that belong in their own issues). User-typed shell-routed lines (`router.classify → "shell"`) and :exec also bypass the policy by design — those are direct user intent. New introspection: :perms list — show the configured rule lists :perms check <cmd> — report verdict + matching rule (debug) safety.classify_command is exported and unit-tested with 12 cases covering each category, priority order (deny > allow on overlap), and both fallback paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:20:56 +00:00
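The verdict order maps onto a short loop. aish uses Lua patterns, but the rules below are hand-translated to Python regular expressions (`%s` becomes `\s`). This illustrates the ordering and fallbacks only; it is not a drop-in evaluator for aish configs. `classify_command` borrows the exported name, but the body is a hypothetical sketch.

```python
import re


def classify_command(cmd, permissions=None, confirm_cmd=True):
    """Return (verdict, matching_rule). Order: deny > confirm > allow."""
    if permissions is None:                 # legacy boolean fallback
        return ("confirm" if confirm_cmd else "allow"), None
    for verdict in ("deny", "confirm", "allow"):
        for rule in permissions.get(verdict, []):
            if re.search(rule, cmd):        # first match in the category wins
                return verdict, rule
    return "confirm", None                  # unmatched commands are confirmed
```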
marfrit	518c01a9f5	repl: user-defined skills loader (closes #2 ) PHASE0.md §5.2 froze the meta-command set at compile time. Skills let the user package repeatable workflows (project queries, prompt templates, audit routines) without forking aish. Discovery: scan ~/.config/aish/skills/*.lua at startup (or whatever $AISH_SKILLS_DIR points at — used both by users with non-XDG layouts and by CI). Each module exports: return { name = "<meta-cmd-name>", -- must match [%w_-]+ description = "<one-line>", -- shown by :skills run = function(args, h) ... end, } Helpers passed to run(): h.ask(text) — same path as :ask (with @path expansion) h.status(s) — emit "[aish] s" h.exec(cmd) — run a shell command (subject to plan_mode, hooks) h.model() — current active model name h.ctx — raw Context object (advanced) h.config — the loaded config table Validation rejects modules that miss name/run, use whitespace in the name, or collide with an existing meta command (built-in or earlier skill). Each rejection emits a status line so the user sees why a skill didn't appear. New meta command :skills lists what's loaded (sorted, with description). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:17:00 +00:00
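Skill validation can be sketched as below. The sketch assumes skill files have already been loaded into tables (Python dicts here) and are processed in path order, which is what decides the "earlier skill" in a collision. The name check is the Python equivalent of the Lua pattern `[%w_-]+` applied to the whole name. `validate_skill` and `load_skills` are hypothetical names.

```python
import re

NAME_RE = re.compile(r"[A-Za-z0-9_-]+")   # whole-name equivalent of [%w_-]+


def validate_skill(module, taken):
    """Return (ok, reason). `taken` holds built-in metas plus earlier skills."""
    if not isinstance(module, dict):
        return False, "module did not return a table"
    name, run = module.get("name"), module.get("run")
    if not isinstance(name, str) or not callable(run):
        return False, "missing name or run"
    if not NAME_RE.fullmatch(name):
        return False, f"invalid name {name!r}"
    if name in taken:
        return False, f"name {name!r} collides with an existing command"
    return True, None


def load_skills(modules, builtins, status):
    """modules: {path: table}. Emits one status line per rejected skill."""
    loaded, taken = {}, set(builtins)
    for path, module in sorted(modules.items()):
        ok, reason = validate_skill(module, taken)
        if not ok:
            status(f"skill {path} rejected: {reason}")
            continue
        loaded[module["name"]] = module
        taken.add(module["name"])
    return loaded
```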


137 Commits