Commit Graph

137 Commits

Author SHA1 Message Date
marfrit c55077bc07 context + repl + config: route-aware context compression (closes #87)
Small local models effectively use a fraction of their advertised
context window. Per-request compression for routes that hit a
local-compress-flagged model preset: keeps only the last N turns
and tail-truncates oversized content. Cloud routes get the full
context unchanged.

Changes:

- context.lua _compress_turns(turns, keep, max_chars): returns a
  new list (self.turns NEVER mutated) with the last `keep` turns
  preserved + content tail-truncated to `max_chars`. Defensive:
  drops tool turns at the slice head (orphaned without their
  assistant-with-tool_calls anchor — strict chat templates would
  reject them; same gotcha PHASE0 §6 warned about for user/user).

- Context:to_messages(opts) — opts.compress = { keep_turns,
  max_turn_chars } swaps the turn iterable for the compressed
  view. Affects BOTH the use_tool_role=true path and the
  use_tool_role=false fallback (PHASE2.md Q18 strict-template
  workaround). Persistence + display via :history see the full
  uncompressed ctx.turns.

- repl.lua ask_ai: when req_cfg (the routed model's cfg) has
  `local_compress = true`, build compress_opts from
  config.context.compress (defaults keep_turns=2, max_turn_chars=800).
  Pass through ctx:to_messages alongside the existing
  system_prompt_override (#86) — orthogonal opts that compose.

- Norris unaffected: safety.norris_step builds its own messages
  array; the planner needs full history per PHASE3 design.

- config.lua gains a header comment explaining the per-model opt-in
  + the context.compress defaults block + the documented tool-turn
  truncation trade-off.

13 unit cases verified:
  - no opts -> full turn list (no regression)
  - keep_turns=2 -> exactly last 2 emitted
  - long content tail-truncated to max_chars
  - self.turns unchanged after render
  - orphan tool-turn at slice head dropped (no chat-template violation)
  - tool turn included WITH its assistant anchor when keep_turns >= 3

E2E against live local broker:
  - models.fast.local_compress = true; keep_turns=1; max=200
  - 4-turn session: each broker call sees ONLY the current turn
    (verified by short coherent CMD replies despite no cross-turn
    memory available to the model). FR-promised small-model
    friendliness in action; conversation continuity is the
    documented trade-off.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 07:50:07 +00:00
marfrit 74e4bffb37 broker + repl + safety: GBNF grammar-sampling passthrough (closes #88)
llama.cpp constrains the sampler to ONLY emit tokens matching a
GBNF grammar. For small models this kills format drift at the
token level — `CMD: <cmd>` is enforced by the sampler rather than
hoped for via prompt discipline.

Probe finding (this commit's pre-implementation): cloud (Anthropic
via Bedrock) silently IGNORES the `grammar` field — returns normally
via standard sampling. Default passthrough is safe for all routes;
no per-model opt-in/opt-out needed in v1.

Changes:

- broker.lua build_request: `if opts.grammar then req.grammar =
  opts.grammar end`. Misformed grammar surfaces at request time
  via the existing transport-error path.

- repl.lua ask_ai: `grammar_override = config.routing.grammars
  [req_class]` (same gating shape as #86's system_prompts override).
  Passed via opts.grammar in the call_broker invocation.

- safety.lua is_destructive threads cfg.safety.probe_grammar through
  opts.grammar so llm_probe constrains the YES/NO output. Skips
  the regex-match dance entirely when the model can't drift.
  Caller-provided opts.grammar takes precedence over cfg.

- config.lua gains two commented examples:
  * routing.grammars per class
  * safety.probe_grammar for the destructive probe

6 unit cases verified (stubbed curl.post_sse / broker.chat):
  - default: no grammar in body
  - opts.grammar -> body contains grammar JSON-encoded
  - safety probe_grammar reaches llm_probe via opts
  - no probe_grammar configured -> opts.grammar nil
  - caller opts.grammar takes precedence over cfg.safety.probe_grammar

E2E against live local broker:
  - `routing.grammars.default = "root ::= \\"ACK\\""` configured;
    prompted "tell me a long story about a fox" -> model output
    EXACTLY "ACK" (sampler forced; would normally produce paragraphs).
    Grammar passthrough end-to-end confirmed.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 07:00:36 +00:00
marfrit 047d629a66 context + repl + config: per-class system_prompt override (closes #86)
Small local models follow precise structured instructions better than
natural language. Per-routing-class system_prompt override gives them
tighter instructions for THAT request while preserving ambient context.

Changes:

- Context:to_messages(opts) — opts.system_prompt_override REPLACES
  the base system_prompt for THIS render only (state unchanged).
  Dynamic blocks ([background], [project], [earlier summary], NORRIS
  suffix) still compose on top. opts is optional; nil-safe for old
  callers.

- repl.lua ask_ai — captures req_class from router.classify_model
  (already returned by Phase 5; previously discarded after the
  status line). Looks up config.routing.system_prompts[req_class];
  passes as opts.system_prompt_override to ctx:to_messages each
  iteration of the tool-sub-loop.

- Gating: override fires only when routing.auto is on (no class ->
  no override). If system_prompts[class] absent for a class, fall
  through to the default system_prompt (no surprise).

- Norris unaffected: safety.norris_step builds its own messages
  array; doesn't go through this path.

- config.lua gains a commented-out example showing routing.system_
  prompts with the code/default examples from the FR body.

Smoke verified:
  - 12-case context.lua unit test: opts nil/absent/present, override
    replaces base, dynamic blocks still compose, state unchanged
    after call, Norris-mode coexistence (suffix still present;
    background still suppressed).
  - E2E against cloud broker with routing.system_prompts.code set:
    triple-backtick prompt -> code class -> override fires; model
    emits terse code-only output. Non-code prompt -> default class
    -> no override -> normal verbose-ish reply.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 05:41:15 +00:00
marfrit df59ee2f2c config + docs/PHASE9: template comment + status -> Implement (Phase 9 commit #4)
config.lua header gains a Phase 9 paragraph documenting the
project-overlay feature + the R7 shallow-merge warning ("if your
.aish.lua sets a top-level block, it REPLACES the user's entire
block — list every entry OR omit the block"). Inspect at runtime
via `:config show`.

docs/PHASE9.md status header bumped: "Plan + review fold-in" ->
"Implement". Lists the 4 implement commits inline:
  e525063  history: trust file helpers
  34b465d  main: project-overlay loader
  5b6ee55  repl: :config show meta + HELP
  this     config template comment + status bump

Phase 9 implementation complete. Next inner-loop step: verify
(file TCs, run autonomous, close) + memory-update.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:54:53 +00:00
marfrit 5b6ee553db repl: :config show meta + HELP (Phase 9 commit #3)
User-facing diagnostic for the project-overlay layer. Reads
config._sources (R3 cfg-embedded by main.lua's load_config_with_
overlay in commit #2) + the effective config; surfaces which file
contributed each top-level key.

:config show           top-level keys + which source set each
                        (nested tables collapsed to inner-key list)
:config show full      recursive dump with sensitive-key masking

Masking heuristic (any key containing token/secret/auth/key,
case-insensitive) -> "(set)" instead of the value. R6: applied
RECURSIVELY in full mode so the actual leak vector
(mcp.servers.<alias>.auth_token, models.<x>.auth_token) is caught.

Defensive depth cap (5) prevents pathological recursion.

When config._sources is absent (caller didn't go through
load_config_with_overlay), status: "(unknown — main didn't pass
_sources)" — meta still runs, just labels source as "?".

N2 known cosmetic false-positive: `key_env` / `auth_env` config
fields hold env-var NAMES (not secrets) but match the heuristic.
Future polish exempts `*_env` patterns. Same for `token_budget`
(contains "token") — also masked despite being a plain number.
Acceptable; errs toward over-masking.

HELP gains 1 :config line.

E2E verified across 4 scenarios with AISH_TRUST_FILE + isolated HOME:
  A. No project overlay: 6 user keys; nested tables collapsed.
     `secrets` masked as (set) at top level.
  B. Project overlay accepted: source map cleanly partitioned
     (user has 4 keys; project has 2 — default_model + models);
     each top-level row tagged [user] or [project].
  C. :config show full: nested dump; auth_token in models.cloud
     correctly masked as (set); SECRET_VAL never appears in
     output (grep count = 0).

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Commit #4 next: config.lua template comment + PHASE9.md status
header -> Implement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:54:30 +00:00
marfrit 34b465d6dc main: project-overlay loader (Phase 9 commit #2)
Wires the project-overlay step around the existing load_config.
Activates only when a trusted .aish.lua is found in/above cwd.

Changes:

- _find_project_config() walks libc.getcwd() up to $HOME, returning
  first .aish.lua found. R1 fix folded: proper-prefix check (`dir ==
  home OR dir starts with home .. "/"`) avoids the false positive
  where /home/user2 matches HOME=/home/user via byte prefix.

- _trust_file_path() resolves via $AISH_TRUST_FILE env override,
  else ~/.aish/trusted-projects. Plan-time decision per N3.

- _check_and_maybe_prompt(project_path, history) — calls
  history._sha256_file ONCE; routes through history.is_trusted; on
  miss prompts via rl.readline; on accept persists via
  history.add_trusted. A8 mitigation: if rl.readline fails to load,
  decline silently (no io.read fallback that would consume stdin).

- load_config_with_overlay(opts):
    * Calls existing load_config; seeds sources={k="user", ...}
    * Walks for .aish.lua; if found:
      - In opts.prompt mode (-p, R2): skip the prompt entirely;
        only PRE-TRUSTED overlays load. Avoids io consuming the
        piped stdin that -p will read for context.
      - Else: interactive trust check + prompt.
    * On accept + successful dofile: shallow-merge top-level keys
      ONTO user config; update sources[k]="project" for overlapping.
    * R3: embeds sources on cfg._sources for repl.lua's :config
      show meta to read. No global.
    * Returns (cfg, user_path, project_path | nil).

- main() now calls load_config_with_overlay; on project layer
  active, emits the "[aish] project config: <path> (overlaid on
  <user>)" status line per A4 (AFTER the user-config status).

E2E verified across 4 scenarios with AISH_TRUST_FILE + isolated HOME:
  1. Decline -> overlay skipped; user config active.
  2. Accept -> overlay loaded; project_model active; status line
     "[aish] project config: ... (overlaid on ...)" visible.
  3. Re-startup -> NO prompt (cached via sha); overlay loaded
     transparently. R4 single-sha-call confirmed.
  4. -p mode with untrusted overlay -> skipped silently; piped
     stdin preserved for run_one_shot.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Commit #3 lands :config show + HELP next; commit #4 the config
template comment + status -> Implement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:48:22 +00:00
marfrit e525063df3 history: trust file helpers for Phase 9 (commit #1)
Foundation for the project-overlay trust mechanism. No callers yet —
commit #2 wires main.lua to use these.

Three new functions:

  history._sha256_file(path) -> hex digest or nil
    Shells `sha256sum`; parses first whitespace-separated field;
    validates 64-hex-char length. nil on any failure (path missing,
    binary missing, file unreadable). Caller treats nil as "skip
    the trust path" — never crashes.

  history.is_trusted(trust_path, project_path, sha256) -> bool
    Reads trust_path as JSONL; returns true iff an entry exists
    matching BOTH project_path AND sha256. Missing / corrupt /
    unreadable trust file -> false (re-prompt). Per-line JSON
    decode means partial-write corruption affects at most one line.

  history.add_trusted(trust_path, project_path, sha256) -> bool
    mkdir -p parent; append JSONL line {path, sha256, ts (ISO)};
    chmod 600 the trust file (best-effort; ignore failure). Single
    writer per call; append-only.

11 unit cases verified:
  - sha256 known value matches manual `sha256sum`
  - nil / missing-file -> nil (no crash)
  - is_trusted on missing trust file -> false
  - add_trusted + is_trusted roundtrip works
  - Different sha -> not trusted (content-binding)
  - Different path -> not trusted
  - Multi-entry trust file: each entry independently checked
  - chmod 600 verified via stat

Regression: test_safety 87/87, test_router_model 31/31.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:45:07 +00:00
marfrit e796142a23 docs/PHASE9: review fold-in — 0 BLOCKERs + 7 CONCERNs + 5 NITs
Sonnet review of PHASE9 (formulate + analyze + baseline + plan at
31e5de5). No BLOCKERs (manifest design sound); seven real CONCERNs
including a path-prefix bug + a piped-stdin interaction that would
have surfaced at implement time.

CONCERNs (FOLDED):

R1. HOME-prefix walk-up false positive — dir:sub(1, #home) ~= home
    matches /home/user2 when HOME=/home/user. Real bug. Fix:
    `dir ~= home and dir:sub(1, #home + 1) ~= home .. "/"`.

R2. A8's io.read("*l") fallback for trust prompt would consume the
    first line of piped stdin in aish -p mode. Fix: SKIP trust
    prompt in one-shot mode (load only pre-trusted overlays).
    If rl.readline misbehaves interactively, emit status + skip
    overlay (no fallback to stdin in either mode).

R3. Sources-map delivery decided: cfg-embedded as config._sources.
    Globals across module boundaries explicitly avoided. Backward-
    compat: if absent, :config show reports "(sources unknown)".

R4. _prompt_trust signature fixed — takes pre-computed sha; single
    sha256 call per startup per project file.

R5. _check_trusted no longer reimplements trust-file read logic;
    routes through history.is_trusted / history.add_trusted with
    AISH_TRUST_FILE env override (single resolution site).

R6. :config show `full` mode masking now spec'd: same heuristic
    applied RECURSIVELY to nested values (mcp.servers.X.auth_token
    is the actual leak vector).

R7. Shallow-merge UX trap reframed — was "documented as predictable";
    now an explicit conspicuous warning in done-when + UX surface +
    config.lua template that "if your .aish.lua sets a top-level
    block, it REPLACES the user's entire block". Deep-merge with
    explicit-replace-syntax v2 polish.

NITs (APPLIED):

N1. (no doc change — review-prompt clarification only)
N2. key_env / auth_env over-masking documented as known cosmetic
    false-positive (env-var names, not secrets).
N3. Sources-map decision added to open-at-plan-time before
    falling-into-commit-2 surprise.
N4. Trust-file first-write atomicity edge case documented (manual
    delete to recover); temp-file+rename = v2.
N5. Stale "stat" mention in §3 module table removed (A2: io.open
    is sufficient; no new FFI).

Code sketches in §4 + §5 + §6 + §13 commits 2+3 all updated to
reflect the fixes. Manifest is internally consistent + matches the
history.lua API to be added in commit 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:44:20 +00:00
marfrit 31e5de5ad5 docs/PHASE9: analyze + baseline + plan (single bundled commit)
Bundled the three doc steps since the surface is small (4-commit
impl, no major redesigns from formulate).

Analyze findings (12, A1-A12):
  A1-A2 — main.lua surface clean; no new FFI needed
  A3   — Q-P2 RESOLVED via baseline: sha256sum (GNU coreutils)
  A4   — Q-P1: trust prompt AFTER user-config status line
  A5   — Q-P3: don't log walk-up by default; :config show on demand
  A6   — Q-P5: :cfg show top-level by default; `full` for deep
  A7   — Q-P6: project may set secrets.vault (covered by trust prompt)
  A8   — Q-P4 DEFERRED: rl.readline early-startup smoke at impl time
  A9   — walk-up perf <1ms even pessimistic
  A10  — trust-file race: JSONL append-only handles concurrent writes
  A11  — sandboxed dofile out of scope (trust prompt IS the gate)
  A12  — bootstrap order is correct: user→project→secrets_session

Baseline:
  B1 — sha256sum + openssl agree byte-for-byte on noether;
       sha256sum chosen (universal + simpler parse).

§10 Open Qs table now shows resolutions inline (5/6 done; Q-P4
deferred to implement-time smoke).

§13 Implementation Plan added — 4 commits:
  1. history.lua: trust file helpers (read/add/is_trusted + _sha256_file)
  2. main.lua: walk-up + load_config_with_overlay + trust prompt
  3. repl.lua: :config show meta + startup status line
  4. config.lua header note + status -> Implement

Per-commit risk index covers sha256sum-missing case, JSONL partial
write, A8 rl.readline early-startup, symlink-loop walk-up,
:config show token leakage via conservative masking heuristic.

Open at plan-time (resolve at impl):
  - A8 rl.readline behavior; fall back to io.read if broken
  - $AISH_TRUST_FILE env override for CI isolation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:38:10 +00:00
marfrit 4f5c3aeba9 docs/PHASE9: formulate — project-local config overlay (.aish.lua)
Phase 9 formulate manifest + PHASE0 §11 amendment (adds Phase 9 row)
+ PHASE0 §10 amendment (config resolution order now references Phase
9's overlay step). Substrate-touch lands same commit per CLAUDE.md §3.

Four pillars:

  1. .aish.lua walk-up from cwd; stops at $HOME or filesystem root.
     First found file becomes the project layer. Absence = no-op.

  2. Shallow merge over user config: project top-level keys REPLACE
     user keys. Predictable; deep merge surprises with array/table
     semantics. Users compose full blocks explicitly.

  3. Trust prompt + sha256-pinned persistence in ~/.aish/trusted-
     projects (JSONL, mode 0600). First encounter prompts; subsequent
     startups load only if recorded sha matches. Content change ->
     re-prompt. Matches direnv-allow security posture.

  4. :config show meta — lists each source path with the top-level
     keys it contributed + sanitized effective config dump
     (token-bearing fields masked).

Key design decisions documented:

  - Trust mechanism is explicit (not default-trust-all-cwds) —
    .aish.lua runs arbitrary Lua via dofile; hostile cloned-repo
    case is a real concern.
  - $HOME boundary on walk-up — don't search /tmp or /. Repos
    outside $HOME get no project layer.
  - Reload on cd: NO. Config resolved at startup only.
  - sha256 via shelled `sha256sum` (POSIX-portable; avoid
    vendoring a Lua impl).

§9 risk table covers: hostile repo (trust prompt), corrupted trust
file (best-effort skip), updated repo (sha mismatch re-prompts),
dofile errors (pcall-protected), walk-up safety ($HOME boundary).

6 open questions for analyze:
  Q-P1 — trust prompt before/after startup status
  Q-P2 — sha256sum vs openssl dgst (baseline)
  Q-P3 — log walk-up path?
  Q-P4 — rl.readline safe at startup?
  Q-P5 — :config show full vs top-level
  Q-P6 — project-set secrets.vault security

Scope confirmed via AskUserQuestion: project-local overlay (chosen
over cost preflight enforcement and cross-session cost persistence,
both deferred as Phase 10 candidates per §11).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:36:35 +00:00
marfrit 08dba69fce config + docs/PHASE8: example block + status -> Implement (Phase 8 commit #5)
config.lua:
  - Commented-out `tokenize = { use_endpoint = true }` block with
    parity to the Phase 1-7 example blocks.
  - Documents the two consequences: (1) per-turn network cost
    (~30ms first time, cached after) and (2) token_budget is now
    actually enforced — sessions that fit under char/4 may evict
    earlier under accurate counts.
  - Notes cloud /tokenize 404 fallback path.

docs/PHASE8.md:
  - Status header bumped: "Plan + review fold-in" -> "Implement"
  - Lists the 5 implement commits inline for traceability:
      7ef2a6e  broker: token_count + endpoint cache
      8502517  context: tokenize_fn + _tokens cache
      db26d0c  context: enforce_budget honors token_budget (R2 guard)
      94b7d86  repl: wire tokenize_fn + :cost detail estimate row
      this     config example + status bump

Phase 8 implementation is complete. Resolves Q1 (PHASE0 §13,
originally Phase 3, deferred forward). Next inner-loop step: verify
(7) — file test cases, run autonomous, close. Then memory-update (8).

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:32:16 +00:00
marfrit 94b7d86926 repl: wire tokenize_fn + :cost detail estimate row (Phase 8 commit #4)
Activates Phase 8 pillars 2+3+5 end-to-end and adds the R3-revised
:cost detail trailing line.

Changes:

- When cfg.tokenize.use_endpoint is true, ctx_opts.tokenize_fn is
  set to `function(text) return broker.token_count(active_cfg, text) end`
  before Context.new fires. R4: the closure body references
  active_cfg DIRECTLY (upvalue) — Lua resolves upvalues at call
  time, so subsequent :model switches re-route to the new model's
  tokenizer automatically (verified by E2E: :model cloud after the
  fast call still produces clean estimate row).

- :cost detail gains a trailing line per R3:
    estimated session ctx: <N> tokens; token_budget=<M> (X.Y% used)
  N comes from ctx:estimate_tokens() (current in-memory snapshot,
  NOT a comparison against the accumulator sum above which is
  cumulative across calls + evicted turns). Gives at-a-glance
  budget utilization.

E2E verified against live broker:
  - fast model call -> 168 tokens estimated (real BPE via /tokenize)
  - :model cloud + cloud call -> 178 tokens estimated (closure
    follows :model switch correctly per R4)
  - 21% / 22.3% budget utilization shown
  - Accumulator sums and estimate are intentionally different
    (sums are cumulative, estimate is current snapshot) — R3-
    correctly displayed as separate lines

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

With this commit landed, Phase 8 is functionally complete; commit
#5 is config example + status bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:31:40 +00:00
marfrit db26d0ccb7 context: enforce_budget honors token_budget + R2 guard (Phase 8 commit #3)
Pillar 5 (analyze finding A1) — the real value-add of Phase 8.
Until now, ctx.token_budget = 4096 was set but never enforced;
enforce_budget only looked at max_turns. With commit #2's accurate
tokenization wired in (via commit #4), eviction now finally fires
when the actual context fills the budget.

Loop condition change:

  before:
    while #self.turns > self.max_turns do

  after:
    while (#self.turns > self.max_turns
           or self:estimate_tokens() > self.token_budget)
          and #self.turns > 0 do

R2 guard: the `and #self.turns > 0` clause is essential. When
system_prompt alone exceeds token_budget (e.g. a 5000-token [project]
block with token_budget=4096), the OR-condition stays true even
when turns are empty — table.remove on a 0-length list would
no-op forever while evicted++ spins. Sonnet review caught this;
without the guard, real users could hit an infinite loop just
by setting a small token_budget + opening a large project tree.

Per-pair eviction logic (summarize callback + pair-pop) inside the
loop is unchanged. The estimate_tokens call is potentially expensive
under tokenize_fn — commit #2's per-turn cache amortizes to O(N)
per iteration after first fill; for max_turns=40 + budget=4096
sessions the worst case is microseconds per call.

Unit-verified across 5 cases (with and without tokenize_fn):
  1. max_turns eviction unchanged (no behavior regression).
  2. char/4 path: tight budget evicts to 0 when sys > budget,
     exits via R2 guard.
  3. char/4 path: practical budget evicts to a stable count.
  4. tokenize_fn stub: evicts to exactly the (budget - sys)/per-turn
     count.
  5. R2 critical: zero turns + oversize sys -> immediate exit,
     evicted=0, no spin.

Behavior change for existing users: a session that fit under
token_budget=4096 by char/4 (~16K chars) may now evict earlier
because accurate counts are HIGHER for most natural-text inputs
(per baseline B2). Users on cloud presets with very large context
windows (Claude 200K) should raise token_budget to match — see §9
risk row in PHASE8.md.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:30:37 +00:00
marfrit 8502517021 context: tokenize_fn + per-turn _tokens cache (Phase 8 commit #2)
Foundation for accurate Context:estimate_tokens. When the optional
tokenize_fn is wired (Phase 8 commit #4 wires it from repl.lua),
estimate_tokens uses it with per-turn caching for O(1) amortized
cost. char/4 path unchanged when tokenize_fn nil.

Changes:

- Context.new accepts opts.tokenize_fn -> stored as self.tokenize_fn.

- Context:estimate_tokens:
    if tokenize_fn nil -> existing char/4 (no behavior change).
    if tokenize_fn set ->
      - tokenize self.system_prompt every call (dynamic per
        compose_background/project/summary; can't cache).
      - for each turn: if t._tokens nil -> compute + cache; else
        use cached. Turn content immutable after append (we never
        mutate stored turns) so cache never goes stale.

- :reset wipes self.turns which takes the _tokens cache with them;
  new turns start with t._tokens == nil and lazy-set on first count.

8/8 unit cases verified:
  - char/4 path unchanged when no tokenize_fn
  - tokenize_fn called 1+ N times on first estimate (sys + N turns)
  - subsequent estimates fire only 1 tokenize call (sys; turns cached)
  - new turn fires +1 tokenize call on next estimate
  - :reset + fresh turn fires fresh tokenize call (cache died with turn)

No callers wire tokenize_fn yet — Phase 8 commit #4 lands the
repl.lua wiring (after commit #3 adds the enforce_budget extension
that's the real beneficiary of accuracy).

Regression: test_safety 87/87, test_router_model 31/31.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:29:56 +00:00
marfrit 7ef2a6ed5c broker: token_count + endpoint capability cache (Phase 8 commit #1)
Foundation for Phase 8 — accurate tokenization via <endpoint>/tokenize
where supported, char/4 fallback otherwise.

Changes:

- `M.token_count(model_cfg, text)`:
    Empty text -> 0.
    No endpoint -> char/4 immediately.
    Capability cache says false -> char/4.
    Otherwise -> POST `<endpoint>/tokenize` with `{content, model}`,
    2s timeout. On 200 + parseable `{tokens=[...]}`: cache true,
    return #tokens. Anything else (non-200 / parse-fail / transport
    err / timeout): cache false, char/4.

- `_tokenize_capable` cache keyed by ENDPOINT ONLY per R6 — B1
  confirmed /tokenize ignores the model field, so same-endpoint
  presets share one cache entry. If a future broker honors the
  model field, revisit.

- `M.tokenize_supported(model_cfg)`: returns nil/true/false for
  the cached state (introspection for tests + future :tokenize meta).

- `M._reset_tokenize_cache()`: test hook so the session-local cache
  doesn't leak between test runs sharing a LuaJIT VM.

Live verified against hossenfelder + a deliberately-broken endpoint:
  - "hello world" -> 2 tokens (matches manual curl probe)
  - 901-char text -> 201 real tokens vs 225 char/4 (24-token gap;
    real is LOWER here, opposite direction from the README probe
    where it was higher — confirms heuristic is inaccurate in both
    directions)
  - Pre-probe: tokenize_supported() returns nil
  - Post-probe: tokenize_supported() returns true (local) / false (broken)
  - Broken endpoint second call: still char/4, no re-probe
  - Empty / nil text edge cases handled

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:29:17 +00:00
marfrit 467e573d24 docs/PHASE8: review fold-in — 2 BLOCKERs + 4 CONCERNs + 4 NITs
Sonnet-reviewed per reviews-use-sonnet memory directive.

BLOCKERs (RESOLVED in-place):

R1. §5 estimate_tokens pseudocode missing per-turn cache pattern.
    Prose described it; code block called tokenize_fn unconditionally.
    Implementer following code verbatim would hit the O(N round-
    trips per call) perf gap the prose flagged. Code block now
    shows explicit `if t._tokens then ... else t._tokens = ... end`.

R2. enforce_budget loop can spin forever when system_prompt alone
    exceeds token_budget (e.g. 5KB project block + budget=4096 +
    zero turns -> turns can't shrink further but OR-condition stays
    true). Fix: AND `#self.turns > 0` guard on the loop. §13 commit
    3 row shows the explicit Lua-syntax condition.

CONCERNs (FOLDED):

R3. :cost detail per-slot ~est=N annotation was semantically
    undefined — accumulator sum (cumulative across calls + evicted
    turns) vs current-snapshot estimate are incommensurable. §6
    reworked: ONE trailing summary line "[estimated session ctx:
    N tokens; token_budget=M (X% used)]" instead of per-slot
    annotations. §13 commit 4 aligned.

R4. tokenize_fn closure MUST reference active_cfg as upvalue (NOT
    capture by value). Subtle but easy to miss — §13 commit 4 now
    spells out the correct vs wrong patterns explicitly.

R5. 2s tokenize timeout can spuriously cache-as-unsupported when
    llama.cpp is busy with a concurrent completion (single-threaded
    inference; /tokenize queues behind). Documented in §9; v1
    ships 2s, revisit during verify if it bites.

R6. Per-endpoint cache key conflated two same-endpoint/different-
    model presets (B1: /tokenize ignores the model field). Cache
    key simplified to endpoint-only. One probe per endpoint per
    session; if a future broker honors the model field, revisit.

NITs (APPLIED):

N1. §13 commit 3 `OR`/`AND` -> Lua-syntax `or`/`and`.
N2. §10 Q-T5 Resolution-target cell filled in (was blank after B1).
N3. §6 / §8 / §13 commit 4 now describe a CONSISTENT approach
    (trailing summary line; per-slot annotation dropped).
N4. Status header tree-hash updated to current (aa64ad3 -> stays
    fresh through review fold-in; commit 5 will refresh again
    at "Implement" status).

PHASE8.md now 622 lines (was 454 after plan). +168/-61. Ready for
implementation phase 6 of the inner loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:28:27 +00:00
marfrit aa64ad3eec docs/PHASE8: plan — §13 commit roadmap (5 commits)
Status: Analyze -> Plan. All open Qs resolved (Q-T5 via baseline B1).

5-commit roadmap, bottom-up:

  1. broker.lua — M.token_count helper + per-endpoint capability
     cache. <endpoint>/tokenize probe with 2s timeout; cache true/false
     per (endpoint, model) for the session. char/4 fallback on any
     non-200 / parse-fail / transport err. M.tokenize_supported
     introspection helper.

  2. context.lua — Context.new accepts opts.tokenize_fn; estimate_
     tokens widens to use it when set, with per-turn `_tokens` cache.
     char/4 path unchanged when tokenize_fn nil.

  3. context.lua — enforce_budget consults token_budget too (pillar
     5 from A1). Loop condition: turns>max_turns OR estimate_tokens
     >token_budget. Existing summarize-on-evict callback unchanged.

  4. repl.lua — wire tokenize_fn when cfg.tokenize.use_endpoint=true.
     Closure captures active_cfg upval (A5 — follows :model switches
     naturally). :cost detail extension: trailing line showing
     estimated session ctx tokens for comparison with the per-slot
     prompt_tokens sums in the accumulator.

  5. config.lua commented `tokenize = { use_endpoint = true }`
     example + PHASE8.md status -> Implement.

Per-commit risk index covers: probe latency cap (2s, one-shot),
per-turn cache correctness (immutable post-append), enforce_budget
performance (O(N) per call after cache fill), and the intentional
behavior change of token_budget actually being enforced (sessions
fitting under char/4 may evict earlier under accurate counts —
documented in §9).

Two items open at plan, resolve at implement:
  - exact :cost detail layout for estimated session ctx row
  - whether to add a :tokenize debug meta (defer unless useful in verify)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:24:41 +00:00
marfrit 79bd40db79 docs/PHASE8-baseline: live /tokenize probes
Four findings, all align with formulate/analyze:

B1. /tokenize IGNORES the `model` request field — returns the
    tokenization of whichever model is currently loaded on the
    proxy backend, NOT the requested model. Acceptable: a real BPE
    count is still much better than char/4, and the gap between
    Qwen/Llama tokenizers is small. Cloud (OpenRouter) 404s
    regardless, so cloud falls back to char/4 via the capability
    cache.

B2. Latency 23-34ms per call, FLAT across input sizes 50-5000 chars.
    Network round-trip dominates. Per-turn _tokens cache amortizes
    to O(1); worst case 40 cached turns × ~30ms = 1.2s one-time
    cost on first enforce_budget call. Acceptable.

B3. Response shape confirmed: `{"tokens":[N1,N2,...]}` (token IDs;
    we use #response.tokens for count, discard the IDs). JSON not
    SSE; ffi.curl.M.post is the right call.

B4. Cloud /tokenize 404s as expected. Capability cache marks it
    unsupported on first probe; char/4 fallback silent thereafter.
    No design change.

Q-T5 RESOLVED per B1. All open questions now resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:22:05 +00:00
marfrit 1a136d81b7 docs/PHASE8: analyze — adds pillar 5 (enforce_budget honors token_budget)
Status: Formulate -> Analyze. 12 findings (A1-A12); 5/6 open Qs
resolved in-place (Q-T5 deferred to baseline).

MAJOR FINDING:

A1. enforce_budget ONLY checks max_turns, NOT token_budget — even
    with accurate tokenization, eviction decisions are unaffected.
    The new estimate_tokens() would just feed the prompt template
    display. Pillar 5 added: enforce_budget evicts when EITHER
    max_turns OR token_budget is exceeded. This is the real
    motivation for accurate tokenization.

Other findings:

A2.  ffi.curl.M.post signature confirmed (body, status) / (nil, err).
A3.  Single caller of estimate_tokens today; enforce_budget becomes
     the second (more frequent) caller — per-turn _tokens cache
     becomes important.
A4.  Q-T1: cache lives on turn dict; dies with turns on :reset.
A5.  Q-T2: closure captures active_cfg upval; follows :model switch
     naturally.
A6.  Q-T3: opt-out skips the probe entirely (no wiring).
A7.  Q-T6: tools-schema tokens deferred to follow-up (fixed per
     session; under-count bounded).
A8.  _tokens cache invalidation: only :reset; turn content is
     immutable after append.
A9.  Probe latency ~50ms/call locally; per-turn cache amortizes to
     O(1) after first count.
A10. estimate_tokens called OUTSIDE streaming callback; no race.
A11. role:"tool" turns tokenize identically; per-turn cache works.
A12. include_usage (Phase 7) and tokenize (Phase 8) are orthogonal —
     different endpoints, different code paths.

§1 expanded to 5 pillars (pillar 5 = enforce_budget extension).
§3 context.lua row updated to reference the enforce_budget change
+ per-turn _tokens cache. §9 risk row added: accurate counts mean
the default token_budget=4096 is finally ENFORCED — sessions that
spilled silently under char/4 may now evict earlier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:21:24 +00:00
marfrit 00869ba412 docs/PHASE8: formulate — accurate tokenization (resolves Q1)
Phase 8 formulate manifest + PHASE0 §11 amendment to add the Phase 8
row (substrate amendment per CLAUDE.md §3 lands same commit).

Four pillars:

  1. Per-endpoint /tokenize probe (cached). One round-trip on first
     call per (endpoint, model); capability cached for session.
     hossenfelder + llama.cpp expose <endpoint>/tokenize (NOT /v1/
     tokenize — per real probe; the path is endpoint-local, not
     under the OpenAI /v1 prefix). Cloud (OpenRouter) 404s — silent
     char/4 fallback.

  2. broker.token_count(model_cfg, text) — thin wrapper; tries probe,
     falls back to char/4 on miss. Always returns non-negative int;
     never errors. 2s tight timeout; failures cache as not-supported.

  3. Context:estimate_tokens widened. Accepts optional tokenize_fn at
     Context.new; uses it when present, char/4 otherwise. repl.lua
     wires `tokenize_fn = function(text) return broker.token_count(
     active_cfg, text) end` when cfg.tokenize.use_endpoint = true.
     Per-turn _tokens cache to amortize across estimate calls.

  4. :cost detail est-vs-actual annotation. When the heuristic
     disagrees with the actual prompt_tokens from broker usage by
     >10%, show `~est=N`. Silent otherwise. Display-only; no
     behavior change.

Resolves Q1 (PHASE0 §13, originally Phase 3) — replace char/4
heuristic on Context:estimate_tokens. Originally targeted at Phase 3
but deferred forward each iteration; now lands.

Baseline already observed during formulate:
  - /v1/tokenize -> 404 on hossenfelder; /tokenize -> works
  - Body shape: {content: "..."} returns {tokens: [N1, N2, ...]}
  - Accuracy gap: char/4 UNDERESTIMATES by ~10% on real code/prose
    (508 vs 558 on a 2KB README sample). Material for context-
    budget eviction decisions.

Doc covers scope + done-when, tech decisions table, module changes,
per-pillar deep dives, UX surface, out of scope, 6 risk rows, 6
open questions (Q-T4/T5 baseline-bound, others analyze-bound).

Scope confirmed via AskUserQuestion: tokenization (chosen over
cross-session cost persistence and hard rate-limit enforcement).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:19:53 +00:00
marfrit 1f34b6dce8 config + docs/PHASE7: example block + status -> Implement (Phase 7 commit #6)
R9-resolved single-owner of the status bump (commit #5 didn't touch
PHASE7.md). N5: PHASE0 §11 amendment landed in commit 3bad07b
(formulate); not re-applied here.

config.lua:
  - Commented-out `cost = { warn_at_dollars, warn_at_tokens }` block
    with parity to the Phase 1-6 example blocks.
  - Notes warn flags are independent (R4) and per-turn usage flows
    to session/*.jsonl for after-the-fact analysis.

docs/PHASE7.md:
  - Status header bumped: "Plan + review fold-in" -> "Implement"
  - Lists the 6 implement commits inline for traceability:
      7364963  broker: usage capture + opts widening
      7b4a9be  context: accumulator helpers
      8adebd5  repl: _record_usage + opts.category at 5 sites
      b30212a  safety + repl: opts.category for Norris + probe
      0d6ff93  repl: :cost meta surface
      this     config example + status bump

Phase 7 implementation is complete. Next inner-loop step is verify
(7) — user-driven smoke tests, then memory-update (8).

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:02:55 +00:00
marfrit 0d6ff93134 repl: :cost meta surface (Phase 7 commit #5)
User-facing reporter of the per-session accumulator. Three shapes:

  :cost            one-line summary (calls / tokens / cost)
  :cost detail     per-model + per-category breakdown
  :cost reset      zero the meter; clears warn flags

All read-only against ctx.usage_totals; no broker calls.

R6 — annotation uses the per-slot is_local sticky flag, NOT a fragile
cost==0 heuristic. Summary line classifies:

  cloud only -> "cost=$X.XXXXXX"
  cloud + local mix -> "cost=$X.XXXXXX (cloud only; local: tokens
                       but no cost field)"
  local only -> "cost=$X.XXXXXX (local only; no cost field)"

R7 — :cost detail rows sort by (cost desc, model asc, category asc).
Three-level key for deterministic output across equal-cost rows
(table.sort is unstable; identical costs would otherwise reorder).

R10 — all dollar values use $%.6f formatting. Sub-cent precision is
critical: a Haiku call can cost $0.000028; $%.4f would round it to
$0.0000 — indistinguishable from local $0.

Column width widened to %-26s to fit fully-qualified cloud model
names (e.g. "anthropic/claude-haiku-4.5" = 25 chars).

E2E verified against live cloud + local broker:

  :cost (empty session)          -> "0 calls, $0.000000"
  ...after mixed-mode session...
  :cost                          -> "5 calls, prompt=472 / completion=26
                                     tokens, cost=$0.000377 (cloud only;
                                     local: tokens but no cost field)"
  :cost detail                   -> 4 rows: main cloud $0.000219, probe
                                     cloud $0.000128, delegate cloud
                                     $0.000030, main local $0.000000
                                     (local). Sort by cost desc within
                                     model.
  :cost reset                    -> "cost meter reset"; subsequent
                                     :cost shows zeros.

All 5 categories appeared in the same session: main (twice — cloud
+ local), delegate, probe (x2 from :safety check). Warn-threshold
firing already verified in commit #3 + #4.

HELP gains 3 :cost lines.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:02:24 +00:00
marfrit b30212af0f safety + repl: opts.category for Norris + probe (Phase 7 commit #4)
Closes the last two broker call sites that flow through safety.lua.
Together with commits #1-#3, all 7 broker call sites in aish now
attribute usage to the cost accumulator under the right category.

Changes:

  safety.lua:

  - llm_probe (the YES/NO destructive checker) — broker.chat call
    gains opts.category = "probe". Captures (text, usage) via
    (reply, second) and, when opts.on_usage is provided AND the
    call succeeded, routes second through opts.on_usage(model,
    category, payload). N4 signature chain: opts already flowed
    through llm_second_opinion -> M.is_destructive from #52's
    work; opts.on_usage rides along naturally with no further
    signature change.

  - M.norris_step (Norris main broker round-trip):
      * opts to broker.chat_stream gains category = "norris"
      * probe_opts (passed to is_destructive inside the loop)
        gains on_usage = helpers.on_usage so the LLM probe's
        cost lands under "probe" too
      * on_delta wrapper adds elseif kind == "usage" branch that
        calls helpers.on_usage(payload.model, payload.category,
        payload). Coexists cleanly with the existing text (rehydrator)
        and tool_call branches.

  repl.lua:

  - Norris helpers table gains on_usage = _record_usage. The R5
    central chokepoint (commit #3) does the warn-threshold check
    AND ctx:add_usage atomically.

  - :safety check meta's probe_opts always carries on_usage now
    (independently of whether secrets_session is set). secrets-aware
    scrub_msgs/rehydrate added conditionally as before.

E2E verified against live broker (safety.llm_model = "cloud"):
  - :safety check ls -la /tmp -> 2 cloud probe calls
  - "[aish] session cost $0.000128 has crossed warn_at_dollars=$0.000100"
  - probe category visible in accumulator (would appear in :cost detail
    once commit #5 ships the meta).

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:01:21 +00:00
marfrit 8adebd52cc repl: _record_usage helper + opts.category at 5 sites (Phase 7 commit #3)
Wires broker.lua's on_delta("usage", payload) and broker.chat's
(text, usage) return to the ctx accumulator via a single chokepoint.

Changes:

  - Forward decl `local _record_usage` near _bg_spawn — same pattern;
    the summarize-on-evict closure in make_summarize_fn (built at
    line 299) needs lexical access to _record_usage (assigned at
    line 695), so forward-declare and assign-without-`local`.

  - _record_usage(model, category, usage) — R5 central chokepoint:
    routes to ctx:add_usage, then checks the per-threshold warn
    state. R4: cost_warn_state has two independent flags (dollars
    and tokens) so first-to-fire doesn't suppress the other. R10:
    warn message uses $%.6f for sub-cent precision.

  - call_broker wrapper: wrapped on_delta now branches on
    kind == "usage" -> _record_usage(payload.model, payload.category,
    payload). R2: keys by payload.model (set inside broker.lua from
    model_cfg.model). When fallback fires, broker is called with
    fb_cfg, so payload.model IS the fallback's name automatically —
    wrapper doesn't track primary-vs-fallback itself.

  - 5 caller sites wired with opts.category:
      ask_ai call_broker             -> category="main"
      summarize-on-evict             -> category="summarize"
      DELEGATE: handler              -> category="delegate"
      :memory summarize              -> category="memory_summarize"
      :delegate meta                 -> category="delegate"

  - All 4 broker.chat call sites switched from
      local reply, err = broker.chat(...)
    to
      local reply, second = broker.chat(...)
    branching on reply nil-ness to interpret second (err on failure,
    usage on success). Captured usage routes through _record_usage.

E2E verified against live cloud broker:
  - cloud prompt -> reply "Hi! 👋"
  - Warn fired: "session cost $0.000219 has crossed warn_at_dollars=$0.000010"
  - R10 sub-cent precision visible in both numbers.

Norris + safety paths still untouched — commit #4 wires those.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:00:06 +00:00
marfrit 7b4a9becc2 context: cost/usage accumulator (Phase 7 commit #2)
Adds the per-conversation accumulator that broker.lua's
on_delta("usage", ...) payload feeds into. No callers yet —
commit #3 wires the broker callback to ctx:add_usage in repl.lua,
commit #4 in safety.lua.

Changes:

- Context.new: new fields `usage_totals = {}` and
  `cost_warn_state = { dollars = false, tokens = false }`. R4:
  two independent flags so warn_at_dollars firing doesn't
  suppress warn_at_tokens (or vice versa).

- Context:add_usage(model_name, category, usage):
  Increments usage_totals[model_name][category] slot. R6: when
  usage.cost is nil (local llama.cpp per B3), sets a sticky
  `is_local = true` flag on the slot AND does NOT add to cost
  (preserves the local-vs-cloud-zero distinction for :cost detail
  annotation). When usage.cost is a number (cloud), accumulates.

- Context:total_cost() / total_tokens() — pure-Lua summation
  across all slots; total_tokens returns (prompt, completion).

- Context:reset_usage() — explicit :cost reset path; zeros
  usage_totals AND clears both flags atomically.

- Context:reset() — R8 parity: does NOT clear usage_totals OR
  cost_warn_state. Matches the Phase 4 memory_items / Phase 6
  project rule ("ambient context survives a user-driven
  conversation reset").

Smoke verified (20/20 unit cases):
  - Empty zeros; cloud cost accumulation; local nil-cost preserves
    is_local=true sticky; calls counter; cost summation across
    multiple cloud calls; is_local sticky after a later nil-cost
    call on a cloud slot; separate slots per (model, category);
    :reset preserves; :reset_usage zeros both totals and flags.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:57:56 +00:00
marfrit 7364963b00 broker: usage capture + opts widening (Phase 7 commit #1)
Foundation for Phase 7. broker.chat_stream now emits a third
on_delta kind ("usage") after the stream completes successfully;
broker.chat returns (text, usage). Backward-compatible — existing
callers that ignore the new kind / second value continue working
via Lua's drop-extra-returns semantics.

Changes:

- build_request widens (A3 + R3) — `(model_cfg, msgs, stream, opts)`.
  opts.tools / opts.max_tokens / opts.include_usage / opts.category
  all live inside opts now. Both internal call sites updated.

- opts.include_usage defaults to true for streaming requests; sets
  `stream_options: { include_usage: true }` in the request body.
  B1: required for local llama.cpp to emit usage; cloud honors as
  a no-op (emits anyway).

- on_event captures `doc.usage` into a closure-local `final_usage`.
  N1: the check is INDEPENDENT of the choice/delta branches — local
  emits usage on choices=[] chunks (choice nil) while cloud emits
  with non-empty choices + finish_reason. Both shapes funnel here.

- After curl.post_sse returns successfully (NOT on transport/api
  errors), if final_usage is set, emit on_delta("usage", {prompt_tokens,
  completion_tokens, total_tokens, cost, model, category}). cost is
  nil for local (R6 preserves the nil vs 0 distinction the
  accumulator needs). model is model_cfg.model — caller-stable per
  B4 + R2 so call_broker's fallback retry attributes usage to the
  fallback's model name without wrapper-side tracking.

- M.chat (R1 — BLOCKER fix): on_delta now also captures kind=="usage"
  alongside "text"; M.chat returns (text, usage). Without this fix
  4 of 5 non-streaming categories (summarize / delegate /
  memory_summarize / probe) would silently report zero usage.

Smoke verified against live hossenfelder:8082:
  - CLOUD chat   -> (text, usage); cost=2.9e-05, model=anthropic/...
  - LOCAL chat   -> (text, usage); cost=NIL (correct per R6),
                    model=qwen-coder-7b-snappy-8k
  - CLOUD stream -> on_delta("usage", {...}) with category="test"
                    echoed; model name caller-stable.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:57:14 +00:00
marfrit d4c20f09df docs/PHASE7: review fold-in — 3 BLOCKERs + 6 CONCERNs + 5 NITs
Sonnet-reviewed (per the reviews-use-sonnet feedback memory).

BLOCKERs (RESOLVED in-place):

R1. M.chat would silently return (text, nil) for ALL non-streaming
    callers — 4 of 5 categories (summarize/delegate/memory_summarize/
    probe) flow through broker.chat, NOT chat_stream. §4 now shows
    the explicit M.chat update that captures kind=="usage" alongside
    "text" and returns (text, usage).

R2. call_broker fallback retry would credit usage to the wrong model
    name. Fix: broker emits payload.model = model_cfg.model (which IS
    the fallback's name when called with fb_cfg — chat_stream's
    upvar). Wrapper keys by payload.model, NOT outer model_name. §4
    + §13 commit 3 reflect.

R3. build_request has TWO internal callers inside broker.lua itself,
    not just the public surface. Plan §13 commit 1 risk row now
    spells this out explicitly so the implementer doesn't read "every
    caller already passes opts" as "external-only".

CONCERNs (FOLDED):

R4. Single cost_warn_fired flag covers two thresholds — first-to-fire
    suppresses the other. Split into ctx.cost_warn_state = { dollars
    = false, tokens = false }; :cost reset clears both. §7 + §13.

R5. Warn-check centralization — single _record_usage helper in
    repl.lua wraps ctx:add_usage AND does threshold check. safety.lua
    routes via helpers.on_usage / opts.on_usage callbacks. context.lua
    stays decoupled from renderer.

R6. Preserve nil-vs-0 cost distinction. Accumulator slot gains
    `is_local = true` (sticky) when ANY recorded usage had cost==nil.
    `:cost detail` annotation comes from is_local flag, not a
    fragile cost==0 heuristic.

R7. :cost detail sort needs 3-level deterministic key:
    (cost desc, model asc, category asc) — table.sort is unstable.

R8. call_broker fallback passes opts.include_usage unchanged.
    Documented as known assumption (B1 confirms both backends
    accept; future-broken fallback can pass include_usage=false).

R9. :resume does NOT restore historical usage_totals. Per-turn usage
    IS in session JSONL for scripting; cross-session aggregation is
    Q-C2 deferred. Documented in §8.

R10. $%.4f loses sub-cent precision (cloud cost 0.000028 -> $0.0000).
     Widened to $%.6f in §6 + §7 warn message format.

NITs (APPLIED):

N1. §4 pseudocode comment notes `if doc.usage` branch is independent
    of choice branch (handles both B2 emission shapes).
N2. §2 stale "B7" reference corrected to B3.
N3. §13 commit 3 row gains explicit dependency note on commit 1's R1.
N4. §13 commit 4 spells out llm_probe -> llm_second_opinion ->
    M.is_destructive signature chain widening.
N5. §3 + §13 commit 6 — PHASE0 §11 amendment already in tree
    (3bad07b); commit 6 must NOT re-apply.

PHASE7.md now 803 lines (was 528 after plan). +275/-57. Ready for
implementation phase pending user gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:55:59 +00:00
marfrit 0f14dc1727 docs/PHASE7: plan — §13 commit roadmap
Status: Analyze -> Plan.

Q-C4 was the last open question pending baseline; now resolved per
B1 (stream_options accepted by both backends; required for local).

§13 Implementation Plan added — 6 commits, bottom-up:

  1. broker.lua: usage extraction from final SSE chunk; build_request
     signature widening to (model_cfg, msgs, stream, opts); on_delta
     ("usage", payload); chat returns (text, usage); opts.category
     passthrough.

  2. context.lua: usage_totals + cost_warn_fired fields; add_usage /
     total_cost / total_tokens helpers; :reset preserves both.

  3. repl.lua: wire opts.category at 5 non-Norris call sites (main,
     delegate x2, summarize, memory_summarize); on_delta("usage")
     branch routes to ctx:add_usage.

  4. safety.lua: wire opts.category for Norris main broker + is_
     destructive LLM probe; helpers.on_usage callback convention
     (no new module dep — matches #52's scrub_msgs pattern).

  5. repl.lua: :cost meta surface + warn-threshold check + HELP.

  6. config.lua: commented cost example block + PHASE7.md status
     bump to Implement.

Per-commit risk index covers signature-change blast radius, missed
call-site lint, and warn-flag one-shot semantics. Lua's multi-
return semantics keep broker.chat backwards-compat automatic.

Two items left open at plan, resolve at implement:
  - is_destructive opts.on_usage vs cfg.helpers threading
  - per-turn verbose mode (deferred; v1 = :cost on demand only)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:50:39 +00:00
marfrit 2244a3f1ee docs/PHASE7-baseline: live broker probes for usage shape
Real probes against hossenfelder.fritz.box:8082 against both backends.
Five findings, all align with the formulate/analyze design — no
structural changes.

B1. `stream_options.include_usage = true` is safely accepted by
    both backends. REQUIRED for local llama.cpp to emit usage;
    no-op for cloud (which emits anyway). Default-true is correct.

B2. Two emission patterns observed:
    - Cloud (Bedrock): usage rides the FINAL delta chunk with
      non-empty `choices` carrying finish_reason.
    - Local: usage rides a SEPARATE chunk with `choices: []`
      preceding `[DONE]`.
    Both shapes are handled by the same `if doc.usage then ...`
    check; the existing on_event choices-branch short-circuits
    safely when choices is empty.

B3. `cost` field is dollar-denominated (number) and cloud-only.
    Local returns `timings` instead (perf, not cost). Accumulator
    captures `usage.cost` as-is; nil treated as 0. :cost detail
    annotates local lines so $0 isn't misread.

B4. `doc.model` in the usage event reflects the upstream-API-version
    (e.g., Bedrock rewrites `anthropic/claude-haiku-4.5` to
    `anthropic/claude-4.5-haiku-20251001`). Accumulator keys by
    caller-intended `model_cfg.model`, NOT `doc.model`, for stable
    cross-call comparison.

B5. Usage event is always the LAST data event before `[DONE]`.
    Emission of `on_delta("usage", ...)` happens after curl.post_sse
    returns — one call per stream, after all text + tool_calls.

Q-C4 RESOLVED: hossenfelder forwards `stream_options.include_usage`
to all backends correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:49:53 +00:00
marfrit f0bccdec48 docs/PHASE7: analyze — probe broker surface + resolve Qs in-place
Status: Formulate -> Analyze (tree at 3bad07b probed).

11 findings (A1-A11), 5/6 open Qs resolved (Q-C4 deferred to baseline):

A1.  broker.chat_stream surface clean — usage capture via closure-local
     + on_delta("usage") emission after curl.post_sse returns.
A2.  7 caller sites for opts.category threading (probe / norris /
     summarize / main / delegate x2 / memory_summarize).
A3.  build_request signature widens to (model_cfg, msgs, stream, opts)
     to absorb tools / max_tokens / include_usage / stream_options
     without further positional growth.
A4.  Q-C3 RESOLVED: free-form categories (caller decides); matches
     Phase 6 helpers/skills convention.
A5.  Q-C5 RESOLVED: warn fires on the call that crossed (no NEXT-call
     delay).
A6.  Q-C6 RESOLVED: :reset does NOT clear cost_warn_fired; only
     :cost reset clears.
A7.  Norris call-graph rewires (commit 955bd82) — secrets streaming
     rehydrator wraps only "text" kind; new "usage" kind passes
     through unchanged. No new entanglement.
A8.  ctx.usage_totals survives :reset (R8 parity with memory_items,
     project).
A9.  Session JSONL inherits the new field automatically (dkjson
     opaque encoding).
A10. Q-C1 PARTIAL: defensive silent skip when provider omits usage.
     Real probe required for local model — baseline action.
A11. Q-C4 deferred to baseline (real broker probe).

§2 build_request row updated to mention the A3 refactor.
§11 Open Qs table now shows all 6 with resolutions; only Q-C4
remains as a baseline-time probe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:49:03 +00:00
marfrit 3bad07b2da docs/PHASE7: formulate — cost / usage observability
Phase 7 formulate manifest + PHASE0 §11 amendment to add the Phase 7
row (substrate amendment per CLAUDE.md §3, lands in the same commit).

Four pillars:

  1. Usage capture in broker.chat_stream — extract `usage` from the
     final SSE chunk (OpenAI streaming spec with `stream_options:
     {include_usage: true}`). Surface via new on_delta("usage",
     payload) kind. broker.chat returns (text, usage) — backward-
     compat: existing callers ignore the second value.

  2. Per-session accumulator on ctx — ctx.usage_totals[model][category]
     tables (categories: main / delegate / summarize / memory_summarize
     / probe / norris, tagged at the call site via opts.category).
     :reset preserves usage_totals (R8 parity with memory_items /
     project). Session JSONL gains an optional `usage` field on
     assistant turns for after-the-fact analysis.

  3. :cost meta surface — :cost (summary), :cost detail (per-model +
     per-category breakdown), :cost reset (zero the meter). Pure-Lua
     read of ctx.usage_totals; no broker calls.

  4. Optional warn thresholds — cfg.cost.warn_at_dollars /
     warn_at_tokens emit a one-shot status when crossed. Default off;
     useful with cloud presets configured.

Doc covers scope + done-when criteria, tech decisions table, module
changes, per-pillar deep dive with code sketches, UX surface, out of
scope, risks, 6 open questions to resolve in analyze.

Open at formulate:
  Q-C1 — provider-without-usage handling (local llama.cpp probably)
  Q-C2 — cross-session persistence (defer to phase 8)
  Q-C3 — categories closed-set vs free-form
  Q-C4 — does hossenfelder forward stream_options to all backends?
  Q-C5 — warn fires on the call that crosses, or the next one?
  Q-C6 — :reset clears cost_warn_fired too, or only :cost reset?

Scope confirmed via AskUserQuestion: cost/usage observability
(chosen over project-local config overlay and session search/tag).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:47:58 +00:00
marfrit 955bd82efb safety + repl: wire secrets into safety.lua (closes #52)
Closes the last #13 gap — Norris broker call + is_destructive LLM
second-opinion probe were the two egress points NOT covered by the
scrub-at-egress design in commit d852aca.

Approach: option (b) per #52's fix sketch — callback-via-helpers/opts.
safety.lua does NOT gain a require("secrets") dependency (acceptance
criteria 3); integration is purely through the convention the rest
of the helpers table already uses.

safety.lua changes:

  - llm_probe gains an opts table. When opts.scrub_msgs is set, the
    {system, user(cmd)} message pair is scrubbed before broker.chat.
    When opts.rehydrate is set, the YES/NO reply is rehydrated before
    parsing (defensive — the verdict shouldn't carry placeholders but
    rehydration is a safe no-op if it doesn't).

  - llm_second_opinion threads opts through to llm_probe.

  - M.is_destructive(cmd, cfg, opts) — opts optional; nil-opts is
    backwards-compatible (no scrub, original behavior).

  - M.norris_step:
      * outbound broker.chat_stream message scrubbed via
        helpers.scrub_msgs(ctx:to_messages(), model_cfg) when provided.
      * on_delta wrapped with helpers.streaming_rehydrator():push /
        :flush so the user sees rehydrated text AND text_parts
        accumulates rehydrated chunks (parity with ask_ai in repl.lua).
      * both M.is_destructive call sites (tool_call probe + CMD: probe)
        now pass probe_opts = {scrub_msgs, rehydrate} when the
        helpers carry them.

repl.lua changes:

  - Norris helpers table gains scrub_msgs / rehydrate /
    streaming_rehydrator closures, all nil-safe (return identity /
    nil when secrets_session is nil).

  - :safety check meta passes probe_opts to is_destructive when
    secrets_session is configured. Without secrets, behavior unchanged.

Unit-test verified end-to-end:
  - Stubbed broker.chat captures the messages it receives.
  - Without opts: probe SEES `ghp_realsecretvalue_...` (control).
  - With opts: probe sees `$AISH_SECRET_NNN` (correct scrub).

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:40:30 +00:00
marfrit ac58b19da2 config + docs/PHASE6: example block + status -> Implement (Phase 6 commit #6)
R9-resolved single-owner of the status bump (commit #5 didn't touch
PHASE6.md per the review fold-in).

config.lua:
  - Commented-out `project = { auto_tree, tree_depth, tree_max_chars }`
    block with the same shape as the Phase 1-5 example blocks.
  - Note that :diff / :tree / :highlight all work without config; the
    `project` block ONLY controls the startup auto-inject.
  - Note about :highlight v1 having no config flag (runtime-only),
    cross-references the in-REPL install hint.

docs/PHASE6.md:
  - Status header bumped: "Plan + review fold-in" -> "Implement"
  - Lists the 6 implement commits in the header for traceability:
      c4fc7fd  context: compose_project plumbing
      d1dce83  _scan_project_tree + :tree + auto_tree hook
      4d5f93a  :diff + _git_clean_cmd (B1 helper)
      0d63f01  expand_mentions @<r1>..<r2> tiered resolution
      11d0e59  tree-sitter highlighter (renderer fence filter +
               highlighted dispatch + :highlight meta)
      this    config example + status bump

Phase 6 implementation is complete. Next inner-loop step is verify
(7) — user-driven smoke tests against the live broker on each pillar
plus filing of issues for any defects, then memory-update (8).

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:27:58 +00:00
marfrit 11d0e599cd repl + renderer: tree-sitter highlighter (Phase 6 commit #5)
The largest Phase 6 commit — fence-aware stream filter in renderer.lua
+ external tree-sitter dispatch + :highlight meta in repl.lua.

renderer.lua — fence-aware filter wrapping assistant_delta:

  M.set_highlight(enabled, detected, highlight_fn)
      Called by repl.lua at startup AND on every :highlight toggle.
      Stores state in module-locals (off by default).

  State machine inside _hl_push:
    outside: pass chunks through; HOLD trailing partial-fence chars
             (per R1 — local llama.cpp splits ```python as `'``'`
             then `'`python\n'`, so naive pass-through drops the
             leading "``" and never recovers).
    inside:  buffer cumulatively until "\n```" appears; emit
             highlight_fn(body, lang) then the closing fence verbatim.
             Recursive call handles "rest" after the closing fence.

  N1: fences only open at start-of-stream OR after a newline
      (`^```` or `\n```` only). Inline backticks in prose
      ("use ``` to mark code") do not open a fence.

  R3 (PTY raw-mode toggle per highlight call): no change here — every
      executor.exec call already toggles raw-mode (existing behavior
      since Phase 1). The risk is theoretical; smoke-test interactively
      after install if multi-fence renders show flicker.

  assistant_flush handles end-of-stream gracefully: drains any held
  partial-fence tail OR an unterminated inside-fence buffer.

repl.lua — _detect_treesitter + highlighted + :highlight meta:

  _detect_treesitter()  one-shot popen probe of `tree-sitter --version`.
                        Run once at startup; cached as
                        highlight_detected.

  highlighted(body, lang_tag)   R2-placed in repl.lua (has _shq +
                                executor access). Translates the fence
                                tag (`py`, `python`, `lua`, etc.) to
                                a canonical lang via LANG_TAG, picks
                                the canonical extension via LANG_EXTENSION,
                                writes body to a tmpfile with that
                                extension, runs `tree-sitter highlight
                                <tmpfile>` via executor.exec, returns
                                the output. On ANY failure (CLI absent,
                                non-zero exit, empty output), returns
                                `body` unchanged — silent pass-through.

  R4 RESOLVED VIA REAL INSTALL: probed `tree-sitter highlight --help`
      on noether; confirmed:
        - NO `--lang` flag exists (formulate-time assumption wrong)
        - takes a PATH; language inferred from file extension
        - alternative `--scope source.X` exists but also unreliable
          without configured grammars
      Resolution: write tmpfile with `os.tmpname() .. LANG_EXTENSION[lang]`
      and pass the path. Matches the documented upstream contract.

  B4-followup: even with the CLI installed, highlighting requires
      `~/.config/tree-sitter/config.json` parser-directories with
      cloned + built `tree-sitter-<lang>` grammars. Without parsers,
      every call exits non-zero and we silently pass through. The
      :highlight install hint surfaces all three install steps so the
      user knows what's actually needed.

  :highlight [on|off|status] meta:
      no arg     -> flip
      on/off     -> set explicit
      status     -> report toggle + CLI detection state
      When toggled on AND CLI absent: emit a 4-line install hint
        (CLI install, init-config, grammar clone reminder).
      When toggled on AND CLI present: emit a 1-line note that
        parser-directories must be set up for actual highlighting.

HELP gains :highlight entry.

Tested:
  10/10 unit cases on the renderer state machine, including:
    - plain prose passthrough
    - single-chunk fence
    - B2 split fence ("``" + "`python\n" + "x=42" + "\n```")
    - N1 SOL anchor (mid-line ``` does not open)
    - trailing \n properly emitted across chunks
    - SOL-only fence open
    - prose after closing fence preserved
    - two fences in one stream
    - highlight off = passthrough (callback never fires)

  E2E :highlight meta verified:
    :highlight status -> off / detected
    :highlight on     -> toggles + emits parser-dir reminder
    :highlight status -> on / detected
    :highlight off    -> off

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Pillars 1 + 2 + 3 of Phase 6 now all implemented. Commit #6 is config
example block + status -> Implement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:27:04 +00:00
marfrit 0d63f01601 repl: expand_mentions tiered @<r1>..<r2> diff retry (Phase 6 commit #4)
Per A6 (tiered resolution): @<token> tries file lookup first; if the
file doesn't exist AND the token contains "..", retry as a git
ref-range and substitute with a fenced `diff` block. Preserves the
existing peel-on-trailing-punct logic (e.g., `@HEAD~1..HEAD,` peels
the comma, resolves the ref, restores the comma after the closing
fence).

Resolution order for @<token>:
  1. io.open(token, "rb")    -- file lookup, with trailing-punct peel
  2. if (1) fails and token contains "..":
        git --no-pager -c color.ui=never diff <r1>..<r2>
     on exit 0 + non-empty body: substitute as ```diff fenced block
  3. else: leave literal `@token` + emit "[aish] @X: not found" status

Examples:
  @README.md            -> file (path branch)
  @../sibling.txt       -> file (path branch; `..` only triggers retry
                                 when path lookup FAILS, so existing
                                 paths with `..` segments are unaffected)
  @HEAD~1..HEAD         -> diff (path fails, ref succeeds)
  @origin/main..feature -> diff (path fails — no such literal file;
                                 ref succeeds; `/` in ref is fine because
                                 we don't use the path's `/`-absence as
                                 a discriminator)
  @nonsense..gibberish  -> literal preserved (both fail)

Required restructuring:
  - _shq and _git_clean_cmd lifted from M.run closure scope to module
    scope (above expand_mentions). Single source of truth for the
    B1 prefix shared with commit #3's :diff. The in-M.run duplicates
    are removed.
  - expand_mentions now references `executor` (already required at
    module scope on line 7) for the diff retry.

Status messages updated:
  - File expansion: "@<path> expanded (N bytes, truncated)"  (existing)
  - Diff expansion: "@<path> expanded (N bytes, diff)"        (new)

Tested with the 7 existing #7 cases + 7 new diff-retry cases (14/14):
  ref-range expansion shape, body contains `diff --git`, trailing
  prose preserved, @../path stays as file (not diff), neither-path-
  nor-ref preserves literal, trailing-comma peel composes with ref
  retry.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:20:25 +00:00
marfrit 4d5f93aaa5 repl: :diff meta + _git_clean_cmd helper (Phase 6 commit #3)
User-driven git diff injection. The model sees the diff on the next
ask_ai turn through the existing exec_output channel.

Changes:
  - _git_clean_cmd(subcmd_and_args) helper near _scan_project_tree.
    B1: every git invocation that flows into context MUST use
    `--no-pager -c color.ui=never`. Forkpty makes git think stdout
    is a TTY, enabling both color and the pager's keypad/line-clear
    escapes — these would pollute the captured context block. The
    helper is the single chokepoint; commit #4's @<r1>..<r2> retry
    will reuse it.

  - :diff [<args>] meta:
      - Reads cwd at meta invocation (R6: differs from :tree's
        scan-time cwd capture; documented in §5).
      - Runs `_git_clean_cmd("diff " .. args)` via executor.exec.
      - Empty output -> "(no diff): <label>" status, no context append.
      - Non-zero exit -> "diff failed (exit N): <label>" status,
        no context append. git's stderr already streamed to the
        user via executor.exec's live multiplex, so the failure
        reason is visible.
      - Success -> appends "[diff <label>]\n<output>" via
        ctx:append_exec_output. Label is "(working tree)" for empty
        args, else verbatim args.
      - Status confirms injection size: "diff injected: <label> (N bytes)".

  - HELP gains :diff line with three example arg shapes; N3-resolved
    (no `staged` alias — the meta is thin pass-through to git's grammar).

Smoke verified across four scenarios in an ephemeral test repo:
  - Working-tree dirty -> 110-byte diff injected, no ANSI escapes
  - --cached -> 118-byte staged diff injected, clean
  - garbage..nonexistent -> exit 128, status + skip
  - Clean working tree -> "(no diff)", status + skip

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:17:18 +00:00
marfrit d1dce832da repl: _scan_project_tree + :tree meta + auto_tree (Phase 6 commit #2)
First user-visible Phase 6 verb. Builds on commit #1's compose_project
plumbing — sets ctx.project from either the :tree meta or the
cfg.project.auto_tree startup hook.

Changes:
  - _scan_project_tree(dir, opts) helper near _run_hook:
      git -C <dir> ls-files --cached --others --exclude-standard
      when <dir> is inside a git repo (N4: no subshell);
      find <dir> -mindepth 1 -maxdepth <depth+1> -type f
      -not -path '*/.*' otherwise.
    Returns (body, info={file_count, truncated, in_git}). Sorted
    paths, truncated to max_chars (default 4096 per cfg).

  - :tree [<depth>|refresh|off] meta:
      no arg     -> scan with config defaults; resets _project_opts
      <N>        -> scan with depth=N; caches as _project_opts
      refresh    -> re-scan with cached _project_opts (else defaults)
      off        -> clear ctx.project AND ctx._project_opts (R5)
    Status line reports file count + truncation flag + which backend
    fired (git/find).

  - cfg.project.auto_tree startup hook before the main loop:
      if true, scan libc.getcwd() once and set ctx.project.
      Failures status-logged once; REPL continues. Default off
      (existing configs unchanged).

  - HELP updated with three :tree lines.

Plan §12 deliberately defers the config.lua example block to commit #6
along with the status header bump (R9 single-owner).

Smoke (aish repo cwd):
  - :tree no-arg            -> "33 files (git ls-files)"
  - :tree refresh           -> same
  - :tree off               -> "project tree cleared"
  - :tree 1                 -> rescans
  - cfg.project.auto_tree=true at startup -> auto-injected status visible

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:14:36 +00:00
marfrit c4fc7fde01 context: [project] block plumbing (Phase 6 commit #1)
Foundation for Phase 6 — adds the field + composer + composition
order with no callers yet. Nothing sets ctx.project; the meta hookup
and startup auto-inject land in commit #2.

Changes:
  - Context.new gains `project` (string, nil) and `_project_opts`
    (cached scan opts for `:tree refresh`; R7).
  - compose_project(text) helper mirrors compose_background /
    compose_summary. Returns "" for nil/empty; otherwise emits
    "\n\n[project]\n" + text.
  - to_messages inserts compose_project BETWEEN compose_background
    and compose_summary so the model reads memory facts -> project
    tree -> earlier conversation -> NORRIS suffix.
  - Same Norris-suppression guard as the other two dynamic blocks
    (R-C1 / R-C4 parity; planner stays on goal anchor).
  - Context:reset preserves ctx.project (R8 — matches the Phase 4
    memory_items rule; startup-injected facts survive a user-driven
    context reset).

Smoke verified (14/14 inline cases):
  - project nil -> no [project] block in sys_content
  - project set -> block present with contents
  - ordering: [background] < [project] < [earlier conversation summary]
  - norris_active suppresses all three; NORRIS suffix still appears
  - :reset clears turns/pending_exec_output/summary; preserves
    memory_items AND project

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:08:54 +00:00
marfrit 261b230be8 docs/PHASE6: review fold-in — 2 BLOCKERs resolved, 7 CONCERNs, 6 NITs
Independent agent review of PHASE6 (manifest + baseline + plan at
4407029). Status header: Plan -> Plan + review fold-in.

BLOCKERs (RESOLVED in-place):

R1. §4 fence detector's `outside`-state dropped the leading `'``'`
    chunk of a split fence — contradicted B2's local-model
    split-fence requirement (4-char median chunk size). Algorithm
    rewritten: outside-state now holds a tail (up to 10 chars) when
    the chunk's suffix could be a fence prefix; flushes on next push.
    Same accumulator pattern as the secrets streaming rehydrator.

R2. `highlighted()` file placement was ambiguous (§3 vs §12). Lives
    in repl.lua (where _shq and executor are accessible);
    renderer.lua exposes set_highlight(enabled, detected, highlight_fn)
    and calls back. Keeps renderer.lua free of the executor require.

CONCERNs (FOLDED):

R3. PTY raw-mode toggle on every code-block render — smoke-test for
    cursor flicker / SIGWINCH races before locking in. Risk row 5.
R4. tree-sitter highlight --lang X grammar is UNVERIFIED — upstream
    CLI canonically takes a path with extension. Implement-time
    check required; fallback path documented (extension-based
    tmpfile + path arg). Added to risk row 5 + open-at-plan.
R5. :tree off semantics clarified — one-shot clear of ctx.project
    + ctx._project_opts; no "disabled" flag.
R6. cwd-coupling difference between :diff (call-time) and :tree
    (scan-time) now documented in §5.
R7. :tree refresh opts caching specified — caches ctx._project_opts;
    `:tree refresh` reuses last explicit opts.
R8. :reset preserves ctx.project (parity with memory_items per
    Phase 4). §12 commit 1 smoke updated.
R9. Status-bump duplication between §12 commits 5e and 6 resolved
    — commit 6 owns the bump.

NITs (APPLIED):

N1. §4 algorithm pseudocode now includes SOL/post-newline anchor
    (mid-line backticks in prose don't open a fence).
N2. _detect_treesitter() gained a comment explaining the popen
    pattern doesn't gate on exit code (B3).
N3. :diff staged shorthand dropped — meta is a thin pass-through
    to git's own grammar.
N4. _scan_project_tree switched from `cd && git ...` to
    `git -C <dir> ...` — no subshell, more idiomatic.
N5. Open-at-plan dir-arg bullet dropped (already decided in §6);
    replaced with R3 + R4 implement-time verification items.
N6. §11 wording on #52 left as-is (cosmetic only).

PHASE6.md now 896 lines (was 701 after plan). +264/-69. Ready for
implementation phase 6 of the inner loop pending user gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:06:19 +00:00
marfrit 4407029296 docs/PHASE6: plan — fold B1/B3/B4 + add §12 commit roadmap
Status header: Analyze -> Plan.

Baseline findings folded into the design sections:

  §1 (highlighter pillar) gains B4: tree-sitter absent on every
  probed host; :highlight on emits install-hint when missing.

  §4 (highlighter sketch) revised per B3: io.popen():close() doesn't
  expose exit codes in LuaJIT. Route via executor.exec("cat tmp |
  tree-sitter ...") which uses pty.spawn+waitpid and returns code
  reliably. Tmpfile design retained (avoids ARGMAX + shell-escape).

  §5 (:diff impl + @<r1>..<r2> retry) revised per B1: every git
  invocation must use `--no-pager -c color.ui=never` to suppress
  the color/keypad/line-clear escapes forkpty triggers. Factored
  recommendation: helper `_git_clean_cmd(subcmd)` shared by :diff
  and the @-mention diff retry.

New §12 Implementation Plan — 6 commits, bottom-up:

  1. context.lua: ctx.project + compose_project + composition order
  2. repl.lua: _scan_project_tree helper + :tree meta
  3. repl.lua: :diff meta + _git_clean_cmd helper (B1)
  4. repl.lua: expand_mentions tiered resolution (@<r1>..<r2> per A6)
  5. renderer.lua + repl.lua: tree-sitter detect + fence filter +
     :highlight meta (B3-revised tmpfile dispatch)
  6. config.lua project example + status -> Implement

Per-commit risk index + smoke criteria. Highlighter (commit 5) is
the largest experimental surface — placed last so the rest of Phase 6
ships even if highlighter slips. Order is independent enough that
swapping 3<->4 or 5<->6 doesn't break anything; bottom-up keeps each
commit individually green.

Things deliberately not split: _shq reuse, lang map duplication for
v1, streaming-rehydration order (rehydrate -> highlight -> emit
inherits naturally from existing chunk pipeline).

Two items open at plan time, resolve at implement: _scan_project_tree
dir-arg vs hardcoded getcwd; :highlight status probing
tree-sitter --print-langs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:01:40 +00:00
marfrit 9f50206ca6 docs/PHASE6-baseline: substrate probes ahead of implementation
Six findings from probing the world before tree-sitter / diff / project
tree implementation lands:

B1. `git` subcommands through executor.exec emit ANSI color + DEC
    keypad/line-clear escapes by default (forkpty enables interactive
    mode). `:diff` impl MUST use `git --no-pager --color=never <args>`.
    Same flags apply to any future git verbs.

B2. SSE chunk size envelope: local llama.cpp delivers tiny chunks
    (median 4 chars, max 13) AND splits code fences across boundaries
    (`'``'` then `'`'`). Cloud (Anthropic via OpenRouter) delivers
    big chunks (median 26 chars), fences intact. The §4 fence-aware
    filter accumulator design covers both — confirmed necessary by
    local-model behavior.

B3. **LuaJIT io.popen():close() does NOT return exit codes** — Lua
    5.1 contract, not 5.2+. Breaks the A4 highlighter resolution.
    Revised: route via `executor.exec("cat tmp | tree-sitter ...")`
    which uses pty.spawn + waitpid and returns (out, code) reliably.

B4. tree-sitter CLI absent on both probed hosts (noether, higgs).
    Highlighter is opt-in by design; absent-CLI path should emit a
    clear install hint, not silently no-op.

B5. Project-tree envelope: aish 32 files / 449 chars; similar local
    repos 15-25 files; scan time ~1-5ms. The 4096-char default cap
    accommodates ~290 typical paths. Large repos handled via
    tree_depth or cap tuning per existing §9 risk row.

B6. os.tmpname returns POSIX /tmp/lua_XXXXXX paths; acceptable for
    the B3-revised tmpfile-roundtrip pattern.

No structural changes to formulate/analyze. B1, B3, B4 will fold into
PHASE6.md §4 / §5 / §1 during plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:58:56 +00:00
marfrit ad52fe4538 docs/PHASE6: analyze — substrate probes + Q resolutions in-place
Analyze pass against tree at f596743. All 6 formulate-time questions
resolved without structural changes; pillar shapes intact.

A1. renderer.lua surface clean — assistant_delta/flush accumulate via
    stream_buf; fence-aware filter slots in between chunk receipt and
    emit without touching anything else.

A2. executor.exec via pty.spawn already handles git diff / find;
    cwd-aware (inherits from libc.chdir). No new IO model.

A3. context composition order locked: base + [background] + [earlier
    summary] + NORRIS. [project] inserts between [background] and
    [earlier summary]; Norris-suppression guard inherited.

A4. Q-H1 RESOLVED: tmpfile roundtrip for tree-sitter popen3
    (io.popen("w") + redirect stdout to tmp file; io.open reads back).
    Avoids ARGMAX + shell-escape complexity. Cost ~one syscall per
    code block.

A5. Q-D1 RESOLVED: no confirm gate on :diff. git diff is read-only;
    matches :history / :sessions / :safety check.

A6. Q-D2 RESOLVED: tiered @<token> resolution — file lookup first,
    then ref-range retry when path fails AND token contains "..".
    @origin/main..feature works naturally; @../sibling.txt unaffected.

A7. Q-H2 RESOLVED: highlighter is assistant-output only in v1.
    @-mention echo via readline is a different code path; deferred
    to v2 (added to §8 out-of-scope).

A8. Q-T1 RESOLVED: project tree captured at scan time, not auto-
    refreshed on cd. v1 verb is :tree refresh; cd-intercept auto-
    refresh deferred to v2.

A9. Q-T2 RESOLVED: .gitignore via `git ls-files --exclude-standard`
    in repos; find fallback outside. Custom globs deferred to v2.

A10. expand_mentions punct-peel doesn't strip "/", so HEAD~1..HEAD,
     peels comma cleanly and the diff retry catches the cleaned token.

A11. Auto-injection ordering: memory load → tree scan → first ask_ai.
     Composition reads memory facts before file tree.

A12. [project] Norris-suppressed (parity with R-C1/R-C4).

§3 module-changes table: context.lua row updated (project string +
compose_project + ordering note + Norris suppression). §4 highlighter
code sample replaced with the tmpfile-roundtrip resolved form. §5
@-mention section rewritten as tiered-resolution with worked examples.
§8 out-of-scope gained three v2-polish items (echo highlight, cd-
intercept auto-refresh, custom globs) so they're tracked. §10 Open
Questions table now shows all 6 Qs with their resolutions inline.
§9 Risks row for @-mention collision updated to point at A6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:53:58 +00:00
marfrit f596743834 docs/PHASE6: formulate — tree-sitter highlight + diff + project tree
Phase 6 formulate manifest. Three pillars per PHASE0 §11 row 6:

  1. Tree-sitter syntax highlighting hooks
     External `tree-sitter` CLI when present, no-op otherwise.
     Honors PHASE0 §3 (no compiled extensions). Toggleable
     at runtime; off by default so existing UX is unchanged.

  2. Diff-aware code injection
     :diff [args] meta + @<ref1>..<ref2> @-mention extension.
     Shells out to `git diff`; output flows through the existing
     exec-output context channel.

  3. Project-level file-tree context
     :tree meta + optional cfg.project.auto_tree startup inject.
     git ls-files in a repo, find fallback otherwise. Composed
     into the system prompt as a new [project] block between
     [background] and [earlier summary]. Suppressed under Norris
     (R-C1 / R-C4 parity).

Module changes: renderer.lua (fence-aware highlight filter), context.lua
(compose_project), repl.lua (3 new metas, 3 new helpers, expand_mentions
extension). No new module files in v1.

Doc covers: scope + done-when criteria, tech decisions table, module
changes table, per-pillar deep dive with example code, UX surface
summary, out-of-scope list, risks, and 6 open questions to resolve
in analyze (Q-H1/Q-H2 highlighter, Q-D1/Q-D2 diff, Q-T1/Q-T2 tree).

Scope confirmed via AskUserQuestion: all three subsurfaces in scope;
tree-sitter approach is external CLI w/ no-op fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:47:00 +00:00
marfrit d852acadc2 repl: wire #13 secrets — scrub outbound, rehydrate stream + tool args
Plumbs the secrets.lua module (commit e4b818b) into the conversation
pipeline. Hook points:

  ask_ai          — scrub_messages(ctx:to_messages(), mode) before
                    call_broker; rehydrate streamed deltas via
                    streaming_rehydrator so the user sees real values
                    while text_parts accumulates rehydrated chunks
                    (final_resp is plain — CMD: / DELEGATE: extractors
                    see plain values)

  MCP dispatch    — dispatch_tool_call rehydrates the args table before
                    sess:call_tool so the trusted MCP server receives
                    real values (the model emitted placeholders because
                    it saw a scrubbed context)

  DELEGATE: & :delegate
                  — scrub sub_msgs before broker.chat; rehydrate sub_text
                    before appending to context, so future turns see
                    real values restored

  Phase 5 summarize-on-evict
                  — scrub sum_msgs before broker.chat; rehydrate the
                    reply that becomes ctx.summary

  :memory summarize
                  — same scrub + rehydrate pair

Mode resolution per call: model_cfg.redact → config.secrets.default →
"vault+autodetect" if vault loaded, else "off".

ctx storage convention: PLAIN values throughout. The scrub happens at
the egress (broker call) per the active redact mode; ctx.turns never
holds placeholders for content the user typed or executor produced.
The model's own emissions (assistant tool_call arguments) may carry
placeholders because the model saw the scrubbed context — rehydrated
at MCP dispatch and otherwise harmless on re-serialization (idempotent
re-scrubbing).

New meta:
  :secrets [status]         vault entries, placeholders allocated this
                            session, active broker mode. Never prints
                            actual values (vault file is itself a
                            secret per gotcha 7).
  :secrets check <text>     dry-run scrub against the active broker's
                            mode — shows the output transformation.

Documented in config.lua with a commented-out block + per-broker
redact field example.

Deferred to a follow-up issue (clearly scoped):
  - safety.lua broker call sites (Norris main loop, is_destructive
    LLM second-opinion probe) — same wiring pattern, but they don't
    currently see secrets_session; needs threading through helpers.
  - @-mention file content is appended PLAIN to ctx and scrubbed at
    egress alongside the rest of the user turn (covered by the
    ask_ai scrub).
  - exec output streamed live to terminal is pre-scrub (user sees
    real values in their own shell — by design); the captured-for-
    context copy is scrubbed at egress alongside the rest.

This is the "full scope" implementation chosen via AskUserQuestion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:38:23 +00:00
marfrit e4b818b0e9 secrets: vault loader + scrub/rehydrate + autodetect (#13 commit 1)
Standalone module — no wiring yet. Lands the substrate for issue #13:

  secrets.load(path)            — vault file loader; refuses non-0600
  secrets.make_session(vault)   — per-conversation scrub/rehydrate state
  session:scrub(text, mode)     — substitute literals (+ autodetect)
  session:rehydrate(text)       — restore placeholders
  secrets.streaming_rehydrator  — chunk-boundary-tolerant streaming wrapper

Mode semantics (chosen per call by the caller):
  "off"               — identity, no mapping
  "vault"             — vault literals only, placeholders, rehydratable
  "vault+autodetect"  — + heuristic regexes, placeholders, rehydratable
  "stealth"           — + heuristic regexes, opaque decoys, one-way

Placeholders are stable across the session: the same literal always
maps to the same $AISH_SECRET_NNN slot, so re-scrubbing the same
context is idempotent and the model sees a consistent vocabulary.

AUTODETECT_PATTERNS (ordered; longer prefixes first):
  sk-or-v<N>-...  OpenRouter
  ghp_/gho_/ghs_  GitHub PATs
  AKIA<16>        AWS access keys
  eyJ...x.y.z     JWTs
  sk-...          OpenAI (generic; matched after openrouter)
  -----BEGIN ... PRIVATE KEY-----  SSH/GPG key headers

Streaming rehydrator: tolerates a placeholder split across SSE chunks
($AISH_SE then CRET_001). It holds back the trailing partial-match
in a buffer, emits the rest, and resolves on the next push or flush.
Verified with 20 unit cases (vault sub, stable mapping, autodetect
across all label kinds, stealth decoys, mode=off, streaming with
mid-placeholder splits, non-placeholder $-prose pass-through).

Vault file mode enforcement: 0600 only — matches ssh's behavior for
~/.ssh/id_rsa. Loud failure (status + skip) if mode is wider.

Next commit (issue #13 follow-up): wire into broker / tool dispatch
/ display, add per-broker `redact` policy, :secrets meta, config
example block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:36:39 +00:00
marfrit cdf4e86679 repl: sub-broker delegation via DELEGATE: marker (closes #6)
Cost and context-window control: a "heavy" preset's model can offload
work to a cheaper preset without spending its own tokens on the result.
Example: deep model is mid-conversation and asks fast to summarize a
20k-line build log; the summary comes back as exec-output for the
next turn, deep stays small.

Marker syntax: DELEGATE: <preset> "<prompt>"

(Single or double quotes; one DELEGATE per line; lines without the
quoted shape are dropped — let the user write about delegation in
prose without accidental dispatch.)

Dispatch flow (mirrors CMD: / CMD&: extraction):
  1. ask_ai's stream completes
  2. extract_delegate_lines walks the final response
  3. For each {preset, prompt}: broker.chat(config.models[preset], ...)
     synchronously; result is appended via ctx:append_exec_output as
     "[delegate <preset>]: <result>"
  4. The model sees the delegate result on its next turn

Implementation choice — marker over tool: option 1 from the issue
("inline delegate marker") works with any model regardless of
tool_calls support. Option 2 (aish_delegate as a tool dispatched in
the existing Phase 2 sub-loop) is the better UX for capable models
since it returns the result mid-turn — filed as follow-up if needed.

Meta surface:
  :delegate <preset> <prompt>   one-shot direct invocation (useful for
                                testing without depending on the model
                                emitting DELEGATE:, and as a manual
                                "ask <preset> something" verb)

Scope:
  - Plan mode: emits "PLAN: DELEGATE <preset> <prompt>" without dispatch
  - Norris: not extended; the planner's model anchor would conflict with
    mid-plan switching (R-C3-adjacent risk)
  - No self-delegation guard: each DELEGATE is a separate broker call,
    not recursive; a delegate result reaching the next turn could
    contain another DELEGATE but that's bounded by max_tool_depth-style
    iteration cap on the parent
  - No cost prompt: configuring a paid cloud preset already implies
    consent to spend on it
  - Unknown preset → error status + exec-output note "[delegate X failed:
    unknown preset]"

Extractor unit-tested with 8 cases (single-quote, double-quote, multi-
line prose, empty prompt, no-quotes, case-sensitive, wrong prefix).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:29:09 +00:00
marfrit f94d16fc89 repl: background CMD&: with handle/poll (closes #8)
Builds, long-running network calls, and file watches no longer block
the turn. A new "CMD&: <cmd>" marker (analogue of CMD:) tells the REPL
to spawn the command in the background, return immediately, and poll
for completion between user inputs.

Process model: shell-wrapped to avoid needing fork()/execv() FFI.

  nohup sh -c '(<cmd>) > <log> 2>&1; echo $? > <status>' </dev/null
       >/dev/null 2>&1 & echo $!

The child is reparented to init; we hold only the PID and the path to
the .status sidecar. Completion is detected by the .status file
existing (the wrapper writes it as its last act). No waitpid needed —
the child isn't ours after the popen subshell exits.

Storage: <history.dir>/bg/<id>.log + <id>.status. The directory is
created lazily at startup (mkdir -p). Requires history.dir to be
configured; without it CMD&: emits an error status and the model
sees an "[bg failed to start]" exec-output note.

check_bg_done() runs at the top of each main-loop iteration alongside
check_every_due(). When a job is detected as exited, the REPL:
  - emits a status line "[bg:<id> exited <code>, <bytes>, <secs>s wall] <cmd>"
  - appends the same string to ctx as exec output, so the model sees
    the completion on its next turn (natural follow-up: "ok the build
    finished; let me check the log")

Meta surface:
  :bg-spawn <cmd>       start a bg job directly (no AI needed; also
                        useful for testing without depending on the
                        model emitting CMD&:)
  :bg-list              show running/done jobs (id, pid, state, runtime, cmd)
  :bg-output <id>       dump the log file to stdout
  :bg-kill <id>         SIGTERM (note: only delivers if the PID is
                        still the actual command — long-lived shells
                        may need pkill by name)

Scope (deliberately limited for v1):
  - No callback-mode readline: bg completion detection is pre-prompt,
    not mid-readline. If a build finishes while the user is typing,
    notification comes when they hit Enter.
  - Permission policy DSL (#9) does NOT apply to CMD&: — the
    asynchronous gating model wasn't designed for the y/N flow.
    Filed as follow-up if needed.
  - Norris not extended: helpers.exec_cmd is still synchronous; the
    planner doesn't dispatch bg jobs.
  - Plan mode interaction: CMD&: in plan mode emits "PLAN: & <cmd>"
    and a "[plan] would bg-run: <cmd>" exec-output note, no spawn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:25:55 +00:00
marfrit 67d80e1047 repl: :every recurring prompts via pre-prompt due-check (closes #11)
In-session timer that re-injects a prompt every N seconds. "Watch this
thing" workflows (`:every 5m "check journalctl -u nginx for errors"`)
without spawning a separate aish process.

Approach: minimum viable. check_every_due() runs at the top of each
main-loop iteration — timers fire BETWEEN user inputs, not during
readline waits or active broker calls. Mid-stream firing would require
rewriting ffi/readline to callback mode (substantial scope). If the
on-the-fly firing requirement matters in practice it can land as a
follow-up issue against the readline FFI.

Meta:
  :every <interval> <prompt>   schedule (interval: 30s | 5m | 2h | bare int)
  :every list                  show jobs (id, interval, time-until-next, model, prompt)
  :every cancel <id>           remove

Defaults:
  - Model: "fast" preset if defined in config.models, else active model
    (per the issue's "recurring prompts should default to fast preset").
  - In-memory only — jobs don't persist across restarts.
  - Suppressed while ctx.norris_active (planner stays on goal anchor).
  - Quotes around the prompt are stripped if present.
  - Each tick fires the job once, re-schedules next_fire = now + interval
    (no catch-up if the interval elapsed multiple times during a long
    user input).

Tested: 11 interval-parse cases (30s, 5m, 2h, bare int, malformed),
load via require, end-to-end :every list / cancel surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:23:07 +00:00
marfrit 17e62c0326 safety: permission policy DSL — allow/confirm/deny rule lists (closes #9)
The confirm_cmd boolean was too coarse: true interrupts every harmless
ls; false ungates everything. Most workflows want trust for read-only
ops while still gating writes/network/sudo.

New config:

    permissions = {
        allow   = { "^ls%s", "^cat%s", "^git status" },
        confirm = { "^rm%s", "^git push", "^docker%s", "^sudo%s" },
        deny    = { "^ssh%s+root@", "^curl%s+http[^s]" },
    }

Verdict order: deny > confirm > allow. First match in the chosen
category wins. Unmatched defaults to "confirm". Patterns are Lua
patterns (not regex) per PHASE0.md §3 — no compiled extensions.

Verdict behavior in the interactive CMD: loop:
  - allow   → run without prompt
  - deny    → status line, skip
  - confirm → [y/N] prompt (same UX as legacy confirm_cmd=true)

Backward compat:
  - permissions unset + confirm_cmd=true  → always confirm
  - permissions unset + confirm_cmd=false → always allow
  - permissions set                        → policy table is authoritative

Scope deliberately limited to the interactive AI-suggested CMD: gate.
Norris autonomous mode keeps its own safety.is_destructive machinery
(combining the two would double-gate or replace the LLM probe — both
non-obvious behavioral changes that belong in their own issues).
User-typed shell-routed lines (`router.classify → "shell"`) and
:exec also bypass the policy by design — those are direct user intent.

New introspection:
  :perms list           — show the configured rule lists
  :perms check <cmd>    — report verdict + matching rule (debug)

safety.classify_command is exported and unit-tested with 12 cases
covering each category, priority order (deny > allow on overlap),
and both fallback paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:20:56 +00:00
marfrit 518c01a9f5 repl: user-defined skills loader (closes #2)
PHASE0.md §5.2 froze the meta-command set at compile time. Skills let
the user package repeatable workflows (project queries, prompt
templates, audit routines) without forking aish.

Discovery: scan ~/.config/aish/skills/*.lua at startup (or whatever
$AISH_SKILLS_DIR points at — used both by users with non-XDG layouts
and by CI). Each module exports:

    return {
        name        = "<meta-cmd-name>",     -- must match [%w_-]+
        description = "<one-line>",          -- shown by :skills
        run         = function(args, h) ... end,
    }

Helpers passed to run():
    h.ask(text)   — same path as :ask (with @path expansion)
    h.status(s)   — emit "[aish] s"
    h.exec(cmd)   — run a shell command (subject to plan_mode, hooks)
    h.model()     — current active model name
    h.ctx         — raw Context object (advanced)
    h.config      — the loaded config table

Validation rejects modules that miss name/run, use whitespace in the
name, or collide with an existing meta command (built-in or earlier
skill). Each rejection emits a status line so the user sees why a
skill didn't appear.

New meta command :skills lists what's loaded (sorted, with description).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:17:00 +00:00