e4780483ad8d9fe7cb41f676f689f1cee496e59e
141 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
e4780483ad |
executor: extract_task_lines for Phase 10 preplan parsing
Pure function parallel to extract_cmd_lines, but more permissive to accommodate cloud-model output variation: tolerates leading whitespace (cloud often indents), tolerates extra whitespace after the colon, strips trailing whitespace. Strict on the literal "TASK:" prefix.
Returns an array of trimmed strings; empty TASKs and non-TASK lines are dropped silently. Callers cap the list size per cfg.norris.tasks_max.
10 inline unit cases verified:
- empty/nil
- single TASK
- mixed CMD+TASK (only TASKs returned)
- leading whitespace tolerated
- empty-body TASKs dropped
- trailing whitespace stripped
- extra-spaces-after-colon AND no-space-after-colon both tolerated
- prose interleaving (3 TASKs extracted from a realistic cloud response with intro+outro prose)
- TASK content with embedded quotes/punctuation preserved
Nothing in the tree calls this yet (Phase 10 C1 is the foundation commit; C4 lights it up). No regression.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
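The parsing rules described in the commit above can be sketched roughly as follows. This is a minimal illustration written from the commit message alone, not the project's actual code: the body of `extract_task_lines` and the line-splitting approach are assumptions.

```lua
-- Hypothetical sketch of the TASK: line parser described above.
-- Only the behaviour (strict "TASK:" prefix, tolerant whitespace,
-- empty bodies dropped) is taken from the commit message.
local function extract_task_lines(text)
  local tasks = {}
  if type(text) ~= "string" then return tasks end
  -- Append "\n" so the last line is seen even without a trailing newline.
  for line in (text .. "\n"):gmatch("(.-)\r?\n") do
    -- Leading whitespace is tolerated; the literal prefix is not negotiable.
    local body = line:match("^%s*TASK:(.*)$")
    if body then
      body = body:match("^%s*(.-)%s*$")  -- trim both ends
      if body ~= "" then tasks[#tasks + 1] = body end
    end
  end
  return tasks
end

local reply = 'Here is the plan:\n  TASK:  build it  \nCMD: ls\nTASK:\nTASK:run "tests"\nDone.'
local tasks = extract_task_lines(reply)
```

A caller would then truncate `tasks` to the configured maximum (the commit names `cfg.norris.tasks_max`) before handing it to the executor.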
|
|
cbef05ff40 |
phase10: fold in Sonnet review — 2 blockers + 4 important + 2 nits
All 8 actionable findings accepted; R9-R11 were confirmations.
Blockers:
- R1: sys:gsub("N", ...) would corrupt "No prose / commentary /
numbering" → "16o prose" etc. Switch to %d + string.format.
- R2: §5 had a 2-slot NORRIS_SUFFIX_TEMPLATE redesign that
contradicted §11's "don't change the template; append helper
output after". §5 now shows the helper-append approach.
Important:
- R3: preplan bypasses call_broker (no fallback retry) — keep that
by design; retry would silently swap planning models. Documented
in §10 Risks so it doesn't get "fixed" later.
- R4: no pcall around run_norris → ctx.norris_active/_goal/_tasks
can leak across launches if a Norris step crashes. Fix: clear all
three at the TOP of run_norris before preplan. Cheaper than full
pcall wrap; handles the stale-tasks vector.
- R5: clarified C3 commit scope — safety.lua ONLY in C3; the
executor cfg resolution + preplan wiring lands in C4.
- R6: Context:reset() now also clears self.norris_tasks (defensive;
:reset is unreachable mid-Norris but one line is cheap).
Nits:
- R7: timeout_ms = pre_cfg.timeout_ms or 60000 (respect the
configured per-model timeout).
- R8: "Status:" → "Terminal output:" in §1 acceptance criterion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
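Blocker R1 above is easy to reproduce in isolation. The snippet below is an illustrative sketch: the prompt wording and the value 16 are made up for the example, and only the "No prose / commentary / numbering" fragment comes from the commit message.

```lua
-- Hypothetical prompt text used to illustrate R1.
local TASKS_MAX = 16

-- Naive placeholder substitution: every capital "N" is replaced,
-- including the one that starts the word "No".
local naive_template = "Emit at most N tasks. No prose / commentary / numbering."
local corrupted = naive_template:gsub("N", tostring(TASKS_MAX))

-- The fix named in the commit: a %d placeholder plus string.format.
local safe_template = "Emit at most %d tasks. No prose / commentary / numbering."
local rendered = string.format(safe_template, TASKS_MAX)
```

With `string.format` only the explicit `%d` slot is filled, so ordinary words in the template are left alone.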
|
|
cb2f948e76 |
phase10: analyze + plan — answer Q-PP1..6, 5-commit roadmap
Analysis resolves 6 OQs from the formulate:
- executor cfg independent of preplanner cfg (Q-PP1)
- preplan non-streaming for v1 (Q-PP2)
- re-launch fires preplan again, naturally (Q-PP3)
- executor sees goal + current task (Q-PP4)
- :norris introspection out-of-scope v1 (Q-PP5)
- 1-task degenerate case runs as normal (Q-PP6)
Code-reading findings: safety.norris_step signature unchanged (executor cfg flows in as model_cfg param); NORRIS_SUFFIX_TEMPLATE stays stable (task hint appends after); renderer.norris_step already accepts descr (just unused by safety.norris_step today).
Plan: 5 commits (executor / context / safety / repl / config-and-memory). Each commit verifiable in isolation; the orchestration lights up at C4 (repl preplan wiring); C5 documents.
Sonnet review next (per ~/.claude/projects/.../memory rule).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
a7cbe22d1d |
phase10: formulate manifest — cloud preplanner / local executor split
Resolves direction for #89. Splits Norris into two roles:
- Preplanner (cloud) fires ONCE at :norris launch; emits TASK: list.
- Executor (local) handles each TASK; existing HALT protocol intact.
ctx.norris_tasks anchor survives eviction (mirrors ctx.norris_goal).
Cost category 'norris-preplan' separates the cloud preplan call from per-step executor cost in :cost detail.
Graceful fall-back when cfg.norris.preplanner is unset OR preplan call fails: Norris runs as today (single-model). No regression for existing users.
PHASE0 §11 amended to add Phase 10 row.
Manifest declares 6 Open Questions for analyze step; 12 design decisions table; module-touch table; 4-pillar plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
c55077bc07 |
context + repl + config: route-aware context compression (closes #87)
Small local models effectively use a fraction of their advertised
context window. Per-request compression for routes that hit a
local-compress-flagged model preset: keeps only the last N turns
and tail-truncates oversized content. Cloud routes get the full
context unchanged.
Changes:
- context.lua _compress_turns(turns, keep, max_chars): returns a
new list (self.turns NEVER mutated) with the last `keep` turns
preserved + content tail-truncated to `max_chars`. Defensive:
drops tool turns at the slice head (orphaned without their
assistant-with-tool_calls anchor — strict chat templates would
reject them; same gotcha PHASE0 §6 warned about for user/user).
- Context:to_messages(opts) — opts.compress = { keep_turns,
max_turn_chars } swaps the turn iterable for the compressed
view. Affects BOTH the use_tool_role=true path and the
use_tool_role=false fallback (PHASE2.md Q18 strict-template
workaround). Persistence + display via :history see the full
uncompressed ctx.turns.
- repl.lua ask_ai: when req_cfg (the routed model's cfg) has
`local_compress = true`, build compress_opts from
config.context.compress (defaults keep_turns=2, max_turn_chars=800).
Pass through ctx:to_messages alongside the existing
system_prompt_override (#86) — orthogonal opts that compose.
- Norris unaffected: safety.norris_step builds its own messages
array; the planner needs full history per PHASE3 design.
- config.lua gains a header comment explaining the per-model opt-in
+ the context.compress defaults block + the documented tool-turn
truncation trade-off.
13 unit cases verified:
- no opts -> full turn list (no regression)
- keep_turns=2 -> exactly last 2 emitted
- long content tail-truncated to max_chars
- self.turns unchanged after render
- orphan tool-turn at slice head dropped (no chat-template violation)
- tool turn included WITH its assistant anchor when keep_turns >= 3
E2E against live local broker:
- models.fast.local_compress = true; keep_turns=1; max=200
- 4-turn session: each broker call sees ONLY the current turn
(verified by short coherent CMD replies despite no cross-turn
memory available to the model). FR-promised small-model
friendliness in action; conversation continuity is the
documented trade-off.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
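The compression step in the commit above could look something like the sketch below. It is an assumption-laden illustration, not the real `_compress_turns`: the turn shape (`role`, `content`) is guessed, and "tail-truncated" is read here as "keep the first `max_chars` bytes and cut the rest", which the commit message does not spell out.

```lua
-- Hypothetical sketch of route-aware turn compression.
-- Returns a NEW list; the input list and its turn tables are not mutated.
local function compress_turns(turns, keep, max_chars)
  local out = {}
  local first = math.max(1, #turns - keep + 1)
  -- A tool turn at the head of the slice has lost its assistant anchor,
  -- so skip forward until the slice starts with a non-tool turn.
  while first <= #turns and turns[first].role == "tool" do
    first = first + 1
  end
  for i = first, #turns do
    local copy = {}
    for k, v in pairs(turns[i]) do copy[k] = v end  -- shallow copy
    if type(copy.content) == "string" and #copy.content > max_chars then
      -- Byte-based cut; assumed reading of "tail-truncated".
      copy.content = copy.content:sub(1, max_chars)
    end
    out[#out + 1] = copy
  end
  return out
end

local turns = {
  { role = "user", content = "list files" },
  { role = "assistant", content = "", tool_calls = { "ls" } },
  { role = "tool", content = string.rep("x", 50) },
  { role = "assistant", content = "done" },
}
local last_two = compress_turns(turns, 2, 10)    -- orphan tool turn dropped
local last_three = compress_turns(turns, 3, 10)  -- tool turn kept with its anchor
```

Note that `string.sub` counts bytes, so a cut can land inside a multi-byte UTF-8 sequence; a production version might want to back up to a character boundary.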
|
|
74e4bffb37 |
broker + repl + safety: GBNF grammar-sampling passthrough (closes #88)
llama.cpp constrains the sampler to ONLY emit tokens matching a GBNF grammar. For small models this kills format drift at the token level: `CMD: <cmd>` is enforced by the sampler rather than hoped for via prompt discipline.
Probe finding (this commit's pre-implementation): cloud (Anthropic via Bedrock) silently IGNORES the `grammar` field and returns normally via standard sampling. Default passthrough is safe for all routes; no per-model opt-in/opt-out needed in v1.
Changes:
- broker.lua build_request: `if opts.grammar then req.grammar = opts.grammar end`. Malformed grammar surfaces at request time via the existing transport-error path.
- repl.lua ask_ai: `grammar_override = config.routing.grammars[req_class]` (same gating shape as #86's system_prompts override). Passed via opts.grammar in the call_broker invocation.
- safety.lua is_destructive threads cfg.safety.probe_grammar through opts.grammar so llm_probe constrains the YES/NO output. Skips the regex-match dance entirely when the model can't drift. Caller-provided opts.grammar takes precedence over cfg.
- config.lua gains two commented examples:
* routing.grammars per class
* safety.probe_grammar for the destructive probe
6 unit cases verified (stubbed curl.post_sse / broker.chat):
- default: no grammar in body
- opts.grammar -> body contains grammar JSON-encoded
- safety probe_grammar reaches llm_probe via opts
- no probe_grammar configured -> opts.grammar nil
- caller opts.grammar takes precedence over cfg.safety.probe_grammar
E2E against live local broker:
- `routing.grammars.default = "root ::= \\"ACK\\""` configured; prompted "tell me a long story about a fox" -> model output EXACTLY "ACK" (sampler forced; would normally produce paragraphs). Grammar passthrough end-to-end confirmed.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
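As a rough sketch of the passthrough and precedence rules in the commit above: the one-line `if opts.grammar ...` guard is quoted from the message, while the request shape, the helper name `resolve_probe_grammar`, and the YES/NO grammar string are assumptions made for illustration.

```lua
-- Hypothetical request builder: the grammar is attached only when given,
-- so routes without a grammar send exactly the body they sent before.
local function build_request(model, messages, opts)
  opts = opts or {}
  local req = { model = model, messages = messages }
  if opts.grammar then req.grammar = opts.grammar end
  return req
end

-- Hypothetical helper: a caller-provided grammar wins over the
-- configured cfg.safety.probe_grammar.
local function resolve_probe_grammar(opts, cfg)
  if opts and opts.grammar then return opts.grammar end
  return cfg and cfg.safety and cfg.safety.probe_grammar or nil
end

-- An assumed GBNF grammar for a YES/NO probe.
local YES_NO = 'root ::= "YES" | "NO"'

local plain = build_request("fast", {}, nil)
local constrained = build_request("fast", {}, { grammar = YES_NO })
local from_cfg = resolve_probe_grammar(nil, { safety = { probe_grammar = YES_NO } })
local from_caller = resolve_probe_grammar({ grammar = 'root ::= "ACK"' },
                                          { safety = { probe_grammar = YES_NO } })
```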
|
|
047d629a66 |
context + repl + config: per-class system_prompt override (closes #86)
Small local models follow precise structured instructions better than
natural language. Per-routing-class system_prompt override gives them
tighter instructions for THAT request while preserving ambient context.
Changes:
- Context:to_messages(opts) — opts.system_prompt_override REPLACES
the base system_prompt for THIS render only (state unchanged).
Dynamic blocks ([background], [project], [earlier summary], NORRIS
suffix) still compose on top. opts is optional; nil-safe for old
callers.
- repl.lua ask_ai — captures req_class from router.classify_model
(already returned by Phase 5; previously discarded after the
status line). Looks up config.routing.system_prompts[req_class];
passes as opts.system_prompt_override to ctx:to_messages each
iteration of the tool-sub-loop.
- Gating: override fires only when routing.auto is on (no class ->
no override). If system_prompts[class] absent for a class, fall
through to the default system_prompt (no surprise).
- Norris unaffected: safety.norris_step builds its own messages
array; doesn't go through this path.
- config.lua gains a commented-out example showing
routing.system_prompts with the code/default examples from the FR body.
Smoke verified:
- 12-case context.lua unit test: opts nil/absent/present, override
replaces base, dynamic blocks still compose, state unchanged
after call, Norris-mode coexistence (suffix still present;
background still suppressed).
- E2E against cloud broker with routing.system_prompts.code set:
triple-backtick prompt -> code class -> override fires; model
emits terse code-only output. Non-code prompt -> default class
-> no override -> normal verbose-ish reply.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
df59ee2f2c |
config + docs/PHASE9: template comment + status -> Implement (Phase 9 commit #4)
config.lua header gains a Phase 9 paragraph documenting the
project-overlay feature + the R7 shallow-merge warning ("if your
.aish.lua sets a top-level block, it REPLACES the user's entire
block — list every entry OR omit the block"). Inspect at runtime
via `:config show`.
docs/PHASE9.md status header bumped: "Plan + review fold-in" ->
"Implement". Lists the 4 implement commits inline:
|
||
|
|
5b6ee553db |
repl: :config show meta + HELP (Phase 9 commit #3)
User-facing diagnostic for the project-overlay layer. Reads config._sources (R3 cfg-embedded by main.lua's load_config_with_overlay in commit #2) + the effective config; surfaces which file contributed each top-level key.
:config show: top-level keys + which source set each (nested tables collapsed to inner-key list)
:config show full: recursive dump with sensitive-key masking
Masking heuristic (any key containing token/secret/auth/key, case-insensitive) -> "(set)" instead of the value. R6: applied RECURSIVELY in full mode so the actual leak vector (mcp.servers.<alias>.auth_token, models.<x>.auth_token) is caught. Defensive depth cap (5) prevents pathological recursion.
When config._sources is absent (caller didn't go through load_config_with_overlay), status: "(unknown — main didn't pass _sources)"; meta still runs, just labels source as "?".
N2 known cosmetic false-positive: `key_env` / `auth_env` config fields hold env-var NAMES (not secrets) but match the heuristic. Future polish exempts `*_env` patterns. Same for `token_budget` (contains "token"): also masked despite being a plain number. Acceptable; errs toward over-masking.
HELP gains 1 :config line.
E2E verified across 4 scenarios with AISH_TRUST_FILE + isolated HOME:
A. No project overlay: 6 user keys; nested tables collapsed. `secrets` masked as (set) at top level.
B. Project overlay accepted: source map cleanly partitioned (user has 4 keys; project has 2: default_model + models); each top-level row tagged [user] or [project].
C. :config show full: nested dump; auth_token in models.cloud correctly masked as (set); SECRET_VAL never appears in output (grep count = 0).
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Commit #4 next: config.lua template comment + PHASE9.md status header -> Implement.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
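The masking heuristic in the commit above can be sketched as a small recursive walk. The needle list, the "(set)" placeholder, and the depth cap of 5 come from the commit message; the function names and the "(max depth)" marker returned at the cap are assumptions.

```lua
-- Hypothetical sketch of recursive sensitive-key masking.
local SENSITIVE = { "token", "secret", "auth", "key" }

local function is_sensitive(name)
  local lower = tostring(name):lower()
  for _, needle in ipairs(SENSITIVE) do
    if lower:find(needle, 1, true) then return true end  -- plain substring match
  end
  return false
end

local function mask(value, depth)
  depth = depth or 0
  if type(value) ~= "table" then return value end
  if depth >= 5 then return "(max depth)" end  -- assumed marker at the cap
  local out = {}
  for k, v in pairs(value) do
    if is_sensitive(k) then
      out[k] = "(set)"
    else
      out[k] = mask(v, depth + 1)
    end
  end
  return out
end

local masked = mask({
  default_model = "fast",
  token_budget = 4096,  -- plain number, but the key contains "token"
  models = { cloud = { endpoint = "https://example.invalid", auth_token = "SECRET_VAL" } },
})
```

The `token_budget` row shows the over-masking trade-off the commit accepts: a substring heuristic cannot tell a secret from a number whose key happens to contain "token".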
|
|
34b465d6dc |
main: project-overlay loader (Phase 9 commit #2)
Wires the project-overlay step around the existing load_config.
Activates only when a trusted .aish.lua is found in/above cwd.
Changes:
- _find_project_config() walks libc.getcwd() up to $HOME, returning
first .aish.lua found. R1 fix folded: proper-prefix check (`dir ==
home OR dir starts with home .. "/"`) avoids the false positive
where /home/user2 matches HOME=/home/user via byte prefix.
- _trust_file_path() resolves via $AISH_TRUST_FILE env override,
else ~/.aish/trusted-projects. Plan-time decision per N3.
- _check_and_maybe_prompt(project_path, history) — calls
history._sha256_file ONCE; routes through history.is_trusted; on
miss prompts via rl.readline; on accept persists via
history.add_trusted. A8 mitigation: if rl.readline fails to load,
decline silently (no io.read fallback that would consume stdin).
- load_config_with_overlay(opts):
* Calls existing load_config; seeds sources={k="user", ...}
* Walks for .aish.lua; if found:
- In opts.prompt mode (-p, R2): skip the prompt entirely;
only PRE-TRUSTED overlays load. Avoids io consuming the
piped stdin that -p will read for context.
- Else: interactive trust check + prompt.
* On accept + successful dofile: shallow-merge top-level keys
ONTO user config; update sources[k]="project" for overlapping.
* R3: embeds sources on cfg._sources for repl.lua's :config
show meta to read. No global.
* Returns (cfg, user_path, project_path | nil).
- main() now calls load_config_with_overlay; on project layer
active, emits the "[aish] project config: <path> (overlaid on
<user>)" status line per A4 (AFTER the user-config status).
E2E verified across 4 scenarios with AISH_TRUST_FILE + isolated HOME:
1. Decline -> overlay skipped; user config active.
2. Accept -> overlay loaded; project_model active; status line
"[aish] project config: ... (overlaid on ...)" visible.
3. Re-startup -> NO prompt (cached via sha); overlay loaded
transparently. R4 single-sha-call confirmed.
4. -p mode with untrusted overlay -> skipped silently; piped
stdin preserved for run_one_shot.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Commit #3 lands :config show + HELP next; commit #4 the config
template comment + status -> Implement.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
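Two details from the loader commit above lend themselves to a short sketch: the proper-prefix check from R1 and the shallow top-level merge with source tracking. Function names and table shapes here are illustrative assumptions, not the code in main.lua.

```lua
-- Hypothetical proper-prefix check: a directory is inside home only if it
-- IS home or starts with home followed by a path separator.
local function is_within_home(dir, home)
  if dir == home then return true end
  local prefix = home .. "/"
  return dir:sub(1, #prefix) == prefix
end

-- Hypothetical shallow merge: project top-level keys REPLACE user keys
-- wholesale, and each key records which layer set it.
local function overlay(user_cfg, project_cfg)
  local cfg, sources = {}, {}
  for k, v in pairs(user_cfg) do
    cfg[k] = v
    sources[k] = "user"
  end
  for k, v in pairs(project_cfg) do
    cfg[k] = v
    sources[k] = "project"
  end
  cfg._sources = sources
  return cfg
end

local cfg = overlay(
  { default_model = "fast", models = { fast = {}, cloud = {} } },
  { models = { project_model = {} } }
)
```

Because the merge is shallow, the user's `models.fast` and `models.cloud` entries are gone once the project sets `models`, which is the behaviour the R7 warning in the config header describes.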
|
|
e525063df3 |
history: trust file helpers for Phase 9 (commit #1)
Foundation for the project-overlay trust mechanism. No callers yet; commit #2 wires main.lua to use these.
Three new functions:
history._sha256_file(path) -> hex digest or nil
Shells `sha256sum`; parses first whitespace-separated field; validates 64-hex-char length. nil on any failure (path missing, binary missing, file unreadable). Caller treats nil as "skip the trust path" and never crashes.
history.is_trusted(trust_path, project_path, sha256) -> bool
Reads trust_path as JSONL; returns true iff an entry exists matching BOTH project_path AND sha256. Missing / corrupt / unreadable trust file -> false (re-prompt). Per-line JSON decode means partial-write corruption affects at most one line.
history.add_trusted(trust_path, project_path, sha256) -> bool
mkdir -p parent; append JSONL line {path, sha256, ts (ISO)}; chmod 600 the trust file (best-effort; ignore failure). Single writer per call; append-only.
11 unit cases verified:
- sha256 known value matches manual `sha256sum`
- nil / missing-file -> nil (no crash)
- is_trusted on missing trust file -> false
- add_trusted + is_trusted roundtrip works
- Different sha -> not trusted (content-binding)
- Different path -> not trusted
- Multi-entry trust file: each entry independently checked
- chmod 600 verified via stat
Regression: test_safety 87/87, test_router_model 31/31.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
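A minimal sketch of the digest helper described above, assuming a POSIX shell and a `sha256sum` binary on PATH. The quoting scheme and the name `sha256_file` are illustrative; the real `history._sha256_file` may differ.

```lua
-- Hypothetical sketch: shell out to sha256sum and validate the result.
-- Returns a 64-char lowercase hex digest, or nil on any failure.
local function sha256_file(path)
  if type(path) ~= "string" or path == "" then return nil end
  -- Single-quote the path for the shell, escaping embedded single quotes.
  local quoted = "'" .. path:gsub("'", "'\\''") .. "'"
  local pipe = io.popen("sha256sum " .. quoted .. " 2>/dev/null")
  if not pipe then return nil end
  local out = pipe:read("*a")
  pipe:close()
  -- First whitespace-separated field is the digest.
  local digest = out and out:match("^(%S+)")
  if digest and #digest == 64 and digest:match("^%x+$") then
    return digest:lower()
  end
  return nil
end
```

Returning nil for every failure mode keeps the caller simple: a missing binary, a missing file, and unparseable output all mean "skip the trust path".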
|
|
e796142a23 |
docs/PHASE9: review fold-in — 0 BLOCKERs + 7 CONCERNs + 5 NITs
Sonnet review of PHASE9 (formulate + analyze + baseline + plan at
|
||
|
|
31e5de5ad5 |
docs/PHASE9: analyze + baseline + plan (single bundled commit)
Bundled the three doc steps since the surface is small (4-commit
impl, no major redesigns from formulate).
Analyze findings (12, A1-A12):
A1-A2 — main.lua surface clean; no new FFI needed
A3 — Q-P2 RESOLVED via baseline: sha256sum (GNU coreutils)
A4 — Q-P1: trust prompt AFTER user-config status line
A5 — Q-P3: don't log walk-up by default; :config show on demand
A6 — Q-P5: :cfg show top-level by default; `full` for deep
A7 — Q-P6: project may set secrets.vault (covered by trust prompt)
A8 — Q-P4 DEFERRED: rl.readline early-startup smoke at impl time
A9 — walk-up perf <1ms even pessimistic
A10 — trust-file race: JSONL append-only handles concurrent writes
A11 — sandboxed dofile out of scope (trust prompt IS the gate)
A12 — bootstrap order is correct: user→project→secrets_session
Baseline:
B1 — sha256sum + openssl agree byte-for-byte on noether;
sha256sum chosen (universal + simpler parse).
§10 Open Qs table now shows resolutions inline (5/6 done; Q-P4
deferred to implement-time smoke).
§13 Implementation Plan added — 4 commits:
1. history.lua: trust file helpers (read/add/is_trusted + _sha256_file)
2. main.lua: walk-up + load_config_with_overlay + trust prompt
3. repl.lua: :config show meta + startup status line
4. config.lua header note + status -> Implement
Per-commit risk index covers sha256sum-missing case, JSONL partial
write, A8 rl.readline early-startup, symlink-loop walk-up,
:config show token leakage via conservative masking heuristic.
Open at plan-time (resolve at impl):
- A8 rl.readline behavior; fall back to io.read if broken
- $AISH_TRUST_FILE env override for CI isolation
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
4f5c3aeba9 |
docs/PHASE9: formulate — project-local config overlay (.aish.lua)
Phase 9 formulate manifest + PHASE0 §11 amendment (adds Phase 9 row)
+ PHASE0 §10 amendment (config resolution order now references Phase
9's overlay step). Substrate-touch lands same commit per CLAUDE.md §3.
Four pillars:
1. .aish.lua walk-up from cwd; stops at $HOME or filesystem root.
First found file becomes the project layer. Absence = no-op.
2. Shallow merge over user config: project top-level keys REPLACE
user keys. Predictable; deep merge surprises with array/table
semantics. Users compose full blocks explicitly.
3. Trust prompt + sha256-pinned persistence in
~/.aish/trusted-projects (JSONL, mode 0600). First encounter prompts; subsequent
startups load only if recorded sha matches. Content change ->
re-prompt. Matches direnv-allow security posture.
4. :config show meta — lists each source path with the top-level
keys it contributed + sanitized effective config dump
(token-bearing fields masked).
Key design decisions documented:
- Trust mechanism is explicit (not default-trust-all-cwds) —
.aish.lua runs arbitrary Lua via dofile; hostile cloned-repo
case is a real concern.
- $HOME boundary on walk-up — don't search /tmp or /. Repos
outside $HOME get no project layer.
- Reload on cd: NO. Config resolved at startup only.
- sha256 via shelled `sha256sum` (POSIX-portable; avoid
vendoring a Lua impl).
§9 risk table covers: hostile repo (trust prompt), corrupted trust
file (best-effort skip), updated repo (sha mismatch re-prompts),
dofile errors (pcall-protected), walk-up safety ($HOME boundary).
6 open questions for analyze:
Q-P1 — trust prompt before/after startup status
Q-P2 — sha256sum vs openssl dgst (baseline)
Q-P3 — log walk-up path?
Q-P4 — rl.readline safe at startup?
Q-P5 — :config show full vs top-level
Q-P6 — project-set secrets.vault security
Scope confirmed via AskUserQuestion: project-local overlay (chosen
over cost preflight enforcement and cross-session cost persistence,
both deferred as Phase 10 candidates per §11).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
08dba69fce |
config + docs/PHASE8: example block + status -> Implement (Phase 8 commit #5)
config.lua:
- Commented-out `tokenize = { use_endpoint = true }` block with
parity to the Phase 1-7 example blocks.
- Documents the two consequences: (1) per-turn network cost
(~30ms first time, cached after) and (2) token_budget is now
actually enforced — sessions that fit under char/4 may evict
earlier under accurate counts.
- Notes cloud /tokenize 404 fallback path.
docs/PHASE8.md:
- Status header bumped: "Plan + review fold-in" -> "Implement"
- Lists the 5 implement commits inline for traceability:
|
||
|
|
94b7d86926 |
repl: wire tokenize_fn + :cost detail estimate row (Phase 8 commit #4)
Activates Phase 8 pillars 2+3+5 end-to-end and adds the R3-revised
:cost detail trailing line.
Changes:
- When cfg.tokenize.use_endpoint is true, ctx_opts.tokenize_fn is
set to `function(text) return broker.token_count(active_cfg, text) end`
before Context.new fires. R4: the closure body references
active_cfg DIRECTLY (upvalue) — Lua resolves upvalues at call
time, so subsequent :model switches re-route to the new model's
tokenizer automatically (verified by E2E: :model cloud after the
fast call still produces clean estimate row).
- :cost detail gains a trailing line per R3:
estimated session ctx: <N> tokens; token_budget=<M> (X.Y% used)
N comes from ctx:estimate_tokens() (current in-memory snapshot,
NOT a comparison against the accumulator sum above which is
cumulative across calls + evicted turns). Gives at-a-glance
budget utilization.
E2E verified against live broker:
- fast model call -> 168 tokens estimated (real BPE via /tokenize)
- :model cloud + cloud call -> 178 tokens estimated (closure
follows :model switch correctly per R4)
- 21% / 22.3% budget utilization shown
- Accumulator sums and estimate are intentionally different
(sums are cumulative, estimate is current snapshot) — R3-
correctly displayed as separate lines
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
With this commit landed, Phase 8 is functionally complete; commit
#5 is config example + status bump.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
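The R4 point about upvalues in the commit above can be shown with a toy example. `token_count` below is a stand-in stub, not `broker.token_count`, and the config tables are made up.

```lua
-- Stand-in for the real tokenizer call; reports which cfg it was given.
local function token_count(cfg, text)
  return cfg.name .. ":" .. #text
end

local active_cfg = { name = "fast" }

-- Correct pattern: the closure refers to the VARIABLE active_cfg, so it
-- sees whatever table that variable holds when the closure is called.
local follows = function(text) return token_count(active_cfg, text) end

-- Wrong pattern: copying the current value into another local freezes it.
local snapshot = active_cfg
local stale = function(text) return token_count(snapshot, text) end

-- Simulate a :model switch by rebinding the variable.
active_cfg = { name = "cloud" }

local after_switch = follows("hi")
local frozen = stale("hi")
```

The difference only appears when the variable is rebound to a new table; mutating fields of the same table would be visible through both closures.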
|
|
db26d0ccb7 |
context: enforce_budget honors token_budget + R2 guard (Phase 8 commit #3)
Pillar 5 (analyze finding A1): the real value-add of Phase 8. Until now, ctx.token_budget = 4096 was set but never enforced; enforce_budget only looked at max_turns. With commit #2's accurate tokenization wired in (via commit #4), eviction now finally fires when the actual context fills the budget.
Loop condition change:
before: while #self.turns > self.max_turns do
after: while (#self.turns > self.max_turns or self:estimate_tokens() > self.token_budget) and #self.turns > 0 do
R2 guard: the `and #self.turns > 0` clause is essential. When system_prompt alone exceeds token_budget (e.g. a 5000-token [project] block with token_budget=4096), the OR-condition stays true even when turns are empty; table.remove on a 0-length list would no-op forever while evicted++ spins. Sonnet review caught this; without the guard, real users could hit an infinite loop just by setting a small token_budget + opening a large project tree.
Per-pair eviction logic (summarize callback + pair-pop) inside the loop is unchanged. The estimate_tokens call is potentially expensive under tokenize_fn; commit #2's per-turn cache amortizes to O(N) per iteration after first fill; for max_turns=40 + budget=4096 sessions the worst case is microseconds per call.
Unit-verified across 5 cases (with and without tokenize_fn):
1. max_turns eviction unchanged (no behavior regression).
2. char/4 path: tight budget evicts to 0 when sys > budget, exits via R2 guard.
3. char/4 path: practical budget evicts to a stable count.
4. tokenize_fn stub: evicts to exactly the (budget - sys)/per-turn count.
5. R2 critical: zero turns + oversize sys -> immediate exit, evicted=0, no spin.
Behavior change for existing users: a session that fit under token_budget=4096 by char/4 (~16K chars) may now evict earlier because accurate counts are HIGHER for most natural-text inputs (per baseline B2). Users on cloud presets with very large context windows (Claude 200K) should raise token_budget to match; see §9 risk row in PHASE8.md.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
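The loop condition and the R2 guard from the commit above can be exercised with a stripped-down model. This sketch simplifies deliberately: it evicts one turn at a time rather than per pair, omits the summarize callback, and uses a floor(chars/4) estimate; the plain-table `ctx` shape is an assumption.

```lua
-- Hypothetical char/4 estimate over the system prompt plus all turns.
local function estimate_tokens(ctx)
  local total = math.floor(#ctx.system_prompt / 4)
  for _, t in ipairs(ctx.turns) do
    total = total + math.floor(#t.content / 4)
  end
  return total
end

-- Simplified eviction loop using the condition quoted in the commit.
local function enforce_budget(ctx)
  local evicted = 0
  while (#ctx.turns > ctx.max_turns
         or estimate_tokens(ctx) > ctx.token_budget)
        and #ctx.turns > 0 do  -- R2 guard: nothing left to evict means stop
    table.remove(ctx.turns, 1)
    evicted = evicted + 1
  end
  return evicted
end

-- Oversize system prompt with zero turns: must exit at once.
local oversize = { system_prompt = string.rep("s", 400), turns = {},
                   max_turns = 40, token_budget = 50 }
local evicted_none = enforce_budget(oversize)

-- Five 10-token turns against a 30-token budget: two oldest turns go.
local busy = { system_prompt = "", turns = {}, max_turns = 40, token_budget = 30 }
for i = 1, 5 do busy.turns[i] = { content = string.rep("x", 40) } end
local evicted_two = enforce_budget(busy)
```

Without the `#ctx.turns > 0` clause the first case would never terminate, since the budget condition stays true no matter how many times the empty list is popped.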
|
|
8502517021 |
context: tokenize_fn + per-turn _tokens cache (Phase 8 commit #2)
Foundation for accurate Context:estimate_tokens. When the optional tokenize_fn is wired (Phase 8 commit #4 wires it from repl.lua), estimate_tokens uses it with per-turn caching for O(1) amortized cost. char/4 path unchanged when tokenize_fn nil.
Changes:
- Context.new accepts opts.tokenize_fn -> stored as self.tokenize_fn.
- Context:estimate_tokens:
if tokenize_fn nil -> existing char/4 (no behavior change).
if tokenize_fn set ->
* tokenize self.system_prompt every call (dynamic per compose_background/project/summary; can't cache).
* for each turn: if t._tokens nil -> compute + cache; else use cached. Turn content immutable after append (we never mutate stored turns) so cache never goes stale.
- :reset wipes self.turns which takes the _tokens cache with them; new turns start with t._tokens == nil and lazy-set on first count.
8/8 unit cases verified:
- char/4 path unchanged when no tokenize_fn
- tokenize_fn called 1+ N times on first estimate (sys + N turns)
- subsequent estimates fire only 1 tokenize call (sys; turns cached)
- new turn fires +1 tokenize call on next estimate
- :reset + fresh turn fires fresh tokenize call (cache died with turn)
No callers wire tokenize_fn yet; Phase 8 commit #4 lands the repl.lua wiring (after commit #3 adds the enforce_budget extension that's the real beneficiary of accuracy).
Regression: test_safety 87/87, test_router_model 31/31.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
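The per-turn cache described in the commit above might be sketched as follows. The field name `_tokens` and the "system prompt is never cached" rule come from the commit message; the free-function form, the `ctx` table shape, and the counting stub are assumptions.

```lua
-- Hypothetical sketch of estimate_tokens with a per-turn cache.
local function estimate_tokens_cached(ctx)
  local tokenize = ctx.tokenize_fn
  if not tokenize then
    -- Fallback path: floor(chars / 4), no caching needed.
    local total = math.floor(#ctx.system_prompt / 4)
    for _, turn in ipairs(ctx.turns) do
      total = total + math.floor(#turn.content / 4)
    end
    return total
  end
  -- The system prompt is recomposed dynamically, so count it every call.
  local total = tokenize(ctx.system_prompt)
  for _, turn in ipairs(ctx.turns) do
    if turn._tokens == nil then
      turn._tokens = tokenize(turn.content)  -- computed once per turn
    end
    total = total + turn._tokens
  end
  return total
end

-- Counting stub standing in for a real tokenizer.
local calls = 0
local ctx = {
  system_prompt = "sys",
  turns = { { content = "a" }, { content = "bb" }, { content = "ccc" } },
  tokenize_fn = function(text) calls = calls + 1; return #text end,
}
local first = estimate_tokens_cached(ctx)
local calls_after_first = calls
local second = estimate_tokens_cached(ctx)
local calls_after_second = calls
```

The cache is safe only because stored turn content is treated as immutable after append; anything that edits a turn in place would also need to clear its `_tokens` field.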
|
|
7ef2a6ed5c |
broker: token_count + endpoint capability cache (Phase 8 commit #1)
Foundation for Phase 8 — accurate tokenization via <endpoint>/tokenize
where supported, char/4 fallback otherwise.
Changes:
- `M.token_count(model_cfg, text)`:
Empty text -> 0.
No endpoint -> char/4 immediately.
Capability cache says false -> char/4.
Otherwise -> POST `<endpoint>/tokenize` with `{content, model}`,
2s timeout. On 200 + parseable `{tokens=[...]}`: cache true,
return #tokens. Anything else (non-200 / parse-fail / transport
err / timeout): cache false, char/4.
- `_tokenize_capable` cache keyed by ENDPOINT ONLY per R6 — B1
confirmed /tokenize ignores the model field, so same-endpoint
presets share one cache entry. If a future broker honors the
model field, revisit.
- `M.tokenize_supported(model_cfg)`: returns nil/true/false for
the cached state (introspection for tests + future :tokenize meta).
- `M._reset_tokenize_cache()`: test hook so the session-local cache
doesn't leak between test runs sharing a LuaJIT VM.
Live verified against hossenfelder + a deliberately-broken endpoint:
- "hello world" -> 2 tokens (matches manual curl probe)
- 901-char text -> 201 real tokens vs 225 char/4 (24-token gap;
real is LOWER here, opposite direction from the README probe
where it was higher — confirms heuristic is inaccurate in both
directions)
- Pre-probe: tokenize_supported() returns nil
- Post-probe: tokenize_supported() returns true (local) / false (broken)
- Broken endpoint second call: still char/4, no re-probe
- Empty / nil text edge cases handled
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
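The decision ladder in the `M.token_count` commit above can be sketched with the HTTP call injected as a parameter, so the example runs without a network. The `http_post(url, payload, timeout_ms)` signature returning `status, decoded_body` is a hypothetical stand-in for whatever transport the project uses.

```lua
-- Hypothetical sketch of token_count with an endpoint-keyed capability cache.
local tokenize_capable = {}  -- endpoint -> true / false (nil = not probed yet)

local function char4(text) return math.floor(#text / 4) end

local function token_count(model_cfg, text, http_post)
  if text == nil or text == "" then return 0 end
  local endpoint = model_cfg and model_cfg.endpoint
  if not endpoint then return char4(text) end
  if tokenize_capable[endpoint] == false then return char4(text) end
  -- pcall turns transport errors into an ordinary failure.
  local ok, status, body = pcall(http_post, endpoint .. "/tokenize",
    { content = text, model = model_cfg.model }, 2000)
  if ok and status == 200 and type(body) == "table"
     and type(body.tokens) == "table" then
    tokenize_capable[endpoint] = true
    return #body.tokens
  end
  tokenize_capable[endpoint] = false  -- remember the failure; no re-probe
  return char4(text)
end

-- Stubs standing in for a working and a broken endpoint.
local broken_calls = 0
local function good_post() return 200, { tokens = { 101, 202 } } end
local function broken_post() broken_calls = broken_calls + 1; return 404, nil end

local good = { endpoint = "http://good.invalid", model = "fast" }
local broken = { endpoint = "http://broken.invalid", model = "cloud" }
local real = token_count(good, "hello world", good_post)
local fallback1 = token_count(broken, "hello world", broken_post)
local fallback2 = token_count(broken, "hello world", broken_post)
```

Keying the cache by endpoint alone mirrors R6: presets that share an endpoint share one probe result.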
|
|
467e573d24 |
docs/PHASE8: review fold-in — 2 BLOCKERs + 4 CONCERNs + 4 NITs
Sonnet-reviewed per reviews-use-sonnet memory directive.
BLOCKERs (RESOLVED in-place):
R1. §5 estimate_tokens pseudocode missing per-turn cache pattern.
Prose described it; code block called tokenize_fn unconditionally.
Implementer following code verbatim would hit the O(N round-
trips per call) perf gap the prose flagged. Code block now
shows explicit `if t._tokens then ... else t._tokens = ... end`.
R2. enforce_budget loop can spin forever when system_prompt alone
exceeds token_budget (e.g. 5KB project block + budget=4096 +
zero turns -> turns can't shrink further but OR-condition stays
true). Fix: AND `#self.turns > 0` guard on the loop. §13 commit
3 row shows the explicit Lua-syntax condition.
CONCERNs (FOLDED):
R3. :cost detail per-slot ~est=N annotation was semantically
undefined — accumulator sum (cumulative across calls + evicted
turns) vs current-snapshot estimate are incommensurable. §6
reworked: ONE trailing summary line "[estimated session ctx:
N tokens; token_budget=M (X% used)]" instead of per-slot
annotations. §13 commit 4 aligned.
R4. tokenize_fn closure MUST reference active_cfg as upvalue (NOT
capture by value). Subtle but easy to miss — §13 commit 4 now
spells out the correct vs wrong patterns explicitly.
R5. 2s tokenize timeout can spuriously cache-as-unsupported when
llama.cpp is busy with a concurrent completion (single-threaded
inference; /tokenize queues behind). Documented in §9; v1
ships 2s, revisit during verify if it bites.
R6. Per-endpoint cache key conflated two same-endpoint/different-
model presets (B1: /tokenize ignores the model field). Cache
key simplified to endpoint-only. One probe per endpoint per
session; if a future broker honors the model field, revisit.
NITs (APPLIED):
N1. §13 commit 3 `OR`/`AND` -> Lua-syntax `or`/`and`.
N2. §10 Q-T5 Resolution-target cell filled in (was blank after B1).
N3. §6 / §8 / §13 commit 4 now describe a CONSISTENT approach
(trailing summary line; per-slot annotation dropped).
N4. Status header tree-hash updated to current (
|
||
|
|
aa64ad3eec |
docs/PHASE8: plan — §13 commit roadmap (5 commits)
Status: Analyze -> Plan. All open Qs resolved (Q-T5 via baseline B1).
5-commit roadmap, bottom-up:
1. broker.lua — M.token_count helper + per-endpoint capability
cache. <endpoint>/tokenize probe with 2s timeout; cache true/false
per (endpoint, model) for the session. char/4 fallback on any
non-200 / parse-fail / transport err. M.tokenize_supported
introspection helper.
2. context.lua: Context.new accepts opts.tokenize_fn;
estimate_tokens widens to use it when set, with per-turn `_tokens` cache.
char/4 path unchanged when tokenize_fn nil.
3. context.lua — enforce_budget consults token_budget too (pillar
5 from A1). Loop condition: turns>max_turns OR estimate_tokens
>token_budget. Existing summarize-on-evict callback unchanged.
4. repl.lua — wire tokenize_fn when cfg.tokenize.use_endpoint=true.
Closure captures active_cfg upval (A5 — follows :model switches
naturally). :cost detail extension: trailing line showing
estimated session ctx tokens for comparison with the per-slot
prompt_tokens sums in the accumulator.
5. config.lua commented `tokenize = { use_endpoint = true }`
example + PHASE8.md status -> Implement.
Per-commit risk index covers: probe latency cap (2s, one-shot),
per-turn cache correctness (immutable post-append), enforce_budget
performance (O(N) per call after cache fill), and the intentional
behavior change of token_budget actually being enforced (sessions
fitting under char/4 may evict earlier under accurate counts —
documented in §9).
Two items open at plan, resolve at implement:
- exact :cost detail layout for estimated session ctx row
- whether to add a :tokenize debug meta (defer unless useful in verify)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
79bd40db79 |
docs/PHASE8-baseline: live /tokenize probes
Four findings, all align with formulate/analyze:
B1. /tokenize IGNORES the `model` request field — returns the
tokenization of whichever model is currently loaded on the
proxy backend, NOT the requested model. Acceptable: a real BPE
count is still much better than char/4, and the gap between
Qwen/Llama tokenizers is small. Cloud (OpenRouter) 404s
regardless, so cloud falls back to char/4 via the capability
cache.
B2. Latency 23-34ms per call, FLAT across input sizes 50-5000 chars.
Network round-trip dominates. Per-turn _tokens cache amortizes
to O(1); worst case 40 cached turns × ~30ms = 1.2s one-time
cost on first enforce_budget call. Acceptable.
B3. Response shape confirmed: `{"tokens":[N1,N2,...]}` (token IDs;
we use #response.tokens for count, discard the IDs). JSON not
SSE; ffi.curl.M.post is the right call.
B4. Cloud /tokenize 404s as expected. Capability cache marks it
unsupported on first probe; char/4 fallback silent thereafter.
No design change.
Q-T5 RESOLVED per B1. All open questions now resolved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
1a136d81b7 |
docs/PHASE8: analyze — adds pillar 5 (enforce_budget honors token_budget)
Status: Formulate -> Analyze. 12 findings (A1-A12); 5/6 open Qs
resolved in-place (Q-T5 deferred to baseline).
MAJOR FINDING:
A1. enforce_budget ONLY checks max_turns, NOT token_budget — even
with accurate tokenization, eviction decisions are unaffected.
The new estimate_tokens() would just feed the prompt template
display. Pillar 5 added: enforce_budget evicts when EITHER
max_turns OR token_budget is exceeded. This is the real
motivation for accurate tokenization.
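A minimal sketch of the pillar 5 loop condition follows. It assumes eviction drops the oldest turn first and takes the counter as an injected `count_fn`; both are illustration choices, and the summarize-on-evict hook is reduced to a plain callback.

```lua
-- Sketch only: oldest-first eviction and the count_fn parameter are assumed.
local function total_tokens(turns, count_fn)
  local sum = 0
  for _, t in ipairs(turns) do
    sum = sum + count_fn(t.content)
  end
  return sum
end

local function enforce_budget(turns, max_turns, token_budget, count_fn, on_evict)
  -- Evict while EITHER limit is exceeded.
  while #turns > 0
    and (#turns > max_turns or total_tokens(turns, count_fn) > token_budget) do
    local evicted = table.remove(turns, 1)
    if on_evict then on_evict(evicted) end
  end
end
```

With three 10-token turns, `max_turns = 10` and `token_budget = 25`, the turn limit alone would never fire; the token limit evicts one turn.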
Other findings:
A2. ffi.curl.M.post signature confirmed (body, status) / (nil, err).
A3. Single caller of estimate_tokens today; enforce_budget becomes
the second (more frequent) caller — per-turn _tokens cache
becomes important.
A4. Q-T1: cache lives on turn dict; dies with turns on :reset.
A5. Q-T2: closure captures active_cfg upval; follows :model switch
naturally.
A6. Q-T3: opt-out skips the probe entirely (no wiring).
A7. Q-T6: tools-schema tokens deferred to follow-up (fixed per
session; under-count bounded).
A8. _tokens cache invalidation: only :reset; turn content is
immutable after append.
A9. Probe latency ~50ms/call locally; per-turn cache amortizes to
O(1) after first count.
A10. estimate_tokens called OUTSIDE streaming callback; no race.
A11. role:"tool" turns tokenize identically; per-turn cache works.
A12. include_usage (Phase 7) and tokenize (Phase 8) are orthogonal —
different endpoints, different code paths.
§1 expanded to 5 pillars (pillar 5 = enforce_budget extension).
§3 context.lua row updated to reference the enforce_budget change
+ per-turn _tokens cache. §9 risk row added: accurate counts mean
the default token_budget=4096 is finally ENFORCED — sessions that
spilled silently under char/4 may now evict earlier.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
00869ba412 |
docs/PHASE8: formulate — accurate tokenization (resolves Q1)
Phase 8 formulate manifest + PHASE0 §11 amendment to add the Phase 8
row (substrate amendment per CLAUDE.md §3 lands same commit).
Four pillars:
1. Per-endpoint /tokenize probe (cached). One round-trip on first
call per (endpoint, model); capability cached for session.
hossenfelder + llama.cpp expose <endpoint>/tokenize (NOT /v1/
tokenize — per real probe; the path is endpoint-local, not
under the OpenAI /v1 prefix). Cloud (OpenRouter) 404s — silent
char/4 fallback.
2. broker.token_count(model_cfg, text) — thin wrapper; tries probe,
falls back to char/4 on miss. Always returns non-negative int;
never errors. 2s tight timeout; failures cache as not-supported.
3. Context:estimate_tokens widened. Accepts optional tokenize_fn at
Context.new; uses it when present, char/4 otherwise. repl.lua
wires `tokenize_fn = function(text) return broker.token_count(
active_cfg, text) end` when cfg.tokenize.use_endpoint = true.
Per-turn _tokens cache to amortize across estimate calls.
4. :cost detail est-vs-actual annotation. When the heuristic
disagrees with the actual prompt_tokens from broker usage by
>10%, show `~est=N`. Silent otherwise. Display-only; no
behavior change.
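The pillar 4 rule could be sketched as a small pure function. Measuring the gap relative to the actual prompt_tokens is an assumption here (the text only says "disagrees by >10%"), and the function name is hypothetical.

```lua
-- Sketch only: the gap is taken relative to `actual`, which is an assumption.
local function est_annotation(est, actual)
  if not actual or actual <= 0 then
    return ""                       -- no usage reported: nothing to compare
  end
  local gap = math.abs(est - actual) / actual
  if gap > 0.10 then
    return string.format(" ~est=%d", est)
  end
  return ""                         -- within 10%: stay silent
end
```

Under that reading, a 508-vs-558 pair (a gap of about 9%) stays silent, while 400 vs 558 would be annotated.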
Resolves Q1 (PHASE0 §13, originally Phase 3) — replace char/4
heuristic on Context:estimate_tokens. Originally targeted at Phase 3
but deferred forward each iteration; now lands.
Baseline already observed during formulate:
- /v1/tokenize -> 404 on hossenfelder; /tokenize -> works
- Body shape: {content: "..."} returns {tokens: [N1, N2, ...]}
- Accuracy gap: char/4 UNDERESTIMATES by ~10% on real code/prose
(508 vs 558 on a 2KB README sample). Material for context-
budget eviction decisions.
Doc covers scope + done-when, tech decisions table, module changes,
per-pillar deep dives, UX surface, out of scope, 6 risk rows, 6
open questions (Q-T4/T5 baseline-bound, others analyze-bound).
Scope confirmed via AskUserQuestion: tokenization (chosen over
cross-session cost persistence and hard rate-limit enforcement).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
1f34b6dce8 |
config + docs/PHASE7: example block + status -> Implement (Phase 7 commit #6)
R9-resolved single-owner of the status bump (commit #5 didn't touch PHASE7.md). N5: PHASE0 §11 amendment landed in commit |
||
|
|
0d6ff93134 |
repl: :cost meta surface (Phase 7 commit #5)
User-facing reporter of the per-session accumulator. Three shapes:
:cost one-line summary (calls / tokens / cost)
:cost detail per-model + per-category breakdown
:cost reset zero the meter; clears warn flags
All read-only against ctx.usage_totals; no broker calls.
R6 — annotation uses the per-slot is_local sticky flag, NOT a fragile
cost==0 heuristic. Summary line classifies:
cloud only -> "cost=$X.XXXXXX"
cloud + local mix -> "cost=$X.XXXXXX (cloud only; local: tokens
but no cost field)"
local only -> "cost=$X.XXXXXX (local only; no cost field)"
R7 — :cost detail rows sort by (cost desc, model asc, category asc).
Three-level key for deterministic output across equal-cost rows
(table.sort is unstable; identical costs would otherwise reorder).
R10 — all dollar values use $%.6f formatting. Sub-cent precision is
critical: a Haiku call can cost $0.000028; $%.4f would round it to
$0.0000 — indistinguishable from local $0.
Column width widened to %-26s to fit fully-qualified cloud model
names (e.g. "anthropic/claude-haiku-4.5" = 26 chars).
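The R7 sort key and the R10 format width can be illustrated as follows. The row shape (`model`, `category`, `cost`) and the category column width are assumed for the sketch; the three-level key, `$%.6f` and `%-26s` come from the notes above.

```lua
-- Sketch only: row field names and the %-16s category width are assumed.
local function sort_rows(rows)
  table.sort(rows, function(a, b)
    if a.cost ~= b.cost then return a.cost > b.cost end       -- cost desc
    if a.model ~= b.model then return a.model < b.model end   -- model asc
    return a.category < b.category                            -- category asc
  end)
  return rows
end

local function fmt_row(r)
  return string.format("%-26s %-16s $%.6f", r.model, r.category, r.cost)
end
```

Because every pair of distinct rows differs in at least one key, the order is the same on every run even though table.sort is not stable.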
E2E verified against live cloud + local broker:
:cost (empty session) -> "0 calls, $0.000000"
...after mixed-mode session...
:cost -> "5 calls, prompt=472 / completion=26
tokens, cost=$0.000377 (cloud only;
local: tokens but no cost field)"
:cost detail -> 4 rows: main cloud $0.000219, probe
cloud $0.000128, delegate cloud
$0.000030, main local $0.000000
(local). Sorted by cost desc per R7.
:cost reset -> "cost meter reset"; subsequent
:cost shows zeros.
All 5 calls appeared in the same session: main (twice: cloud
+ local), delegate, probe (x2 from :safety check). Warn-threshold
firing already verified in commit #3 + #4.
HELP gains 3 :cost lines.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
b30212af0f |
safety + repl: opts.category for Norris + probe (Phase 7 commit #4)
Closes the last two broker call sites that flow through safety.lua.
Together with commits #1-#3, all 7 broker call sites in aish now
attribute usage to the cost accumulator under the right category.
Changes:
safety.lua:
- llm_probe (the YES/NO destructive checker) — broker.chat call
gains opts.category = "probe". Captures (text, usage) via
(reply, second) and, when opts.on_usage is provided AND the
call succeeded, routes second through opts.on_usage(model,
category, payload). N4 signature chain: opts already flowed
through llm_second_opinion -> M.is_destructive from #52's
work; opts.on_usage rides along naturally with no further
signature change.
- M.norris_step (Norris main broker round-trip):
* opts to broker.chat_stream gains category = "norris"
* probe_opts (passed to is_destructive inside the loop)
gains on_usage = helpers.on_usage so the LLM probe's
cost lands under "probe" too
* on_delta wrapper adds elseif kind == "usage" branch that
calls helpers.on_usage(payload.model, payload.category,
payload). Coexists cleanly with the existing text (rehydrator)
and tool_call branches.
repl.lua:
- Norris helpers table gains on_usage = _record_usage. The R5
central chokepoint (commit #3) does the warn-threshold check
AND ctx:add_usage atomically.
- :safety check meta's probe_opts always carries on_usage now
(independently of whether secrets_session is set). secrets-aware
scrub_msgs/rehydrate added conditionally as before.
E2E verified against live broker (safety.llm_model = "cloud"):
- :safety check ls -la /tmp -> 2 cloud probe calls
- "[aish] session cost $0.000128 has crossed warn_at_dollars=$0.000100"
- probe category visible in accumulator (would appear in :cost detail
once commit #5 ships the meta).
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
8adebd52cc |
repl: _record_usage helper + opts.category at 5 sites (Phase 7 commit #3)
Wires broker.lua's on_delta("usage", payload) and broker.chat's
(text, usage) return to the ctx accumulator via a single chokepoint.
Changes:
- Forward decl `local _record_usage` near _bg_spawn — same pattern;
the summarize-on-evict closure in make_summarize_fn (built at
line 299) needs lexical access to _record_usage (assigned at
line 695), so forward-declare and assign-without-`local`.
- _record_usage(model, category, usage) — R5 central chokepoint:
routes to ctx:add_usage, then checks the per-threshold warn
state. R4: cost_warn_state has two independent flags (dollars
and tokens) so first-to-fire doesn't suppress the other. R10:
warn message uses $%.6f for sub-cent precision.
- call_broker wrapper: wrapped on_delta now branches on
kind == "usage" -> _record_usage(payload.model, payload.category,
payload). R2: keys by payload.model (set inside broker.lua from
model_cfg.model). When fallback fires, broker is called with
fb_cfg, so payload.model IS the fallback's name automatically —
wrapper doesn't track primary-vs-fallback itself.
- 5 caller sites wired with opts.category:
ask_ai call_broker -> category="main"
summarize-on-evict -> category="summarize"
DELEGATE: handler -> category="delegate"
:memory summarize -> category="memory_summarize"
:delegate meta -> category="delegate"
- All 4 broker.chat call sites switched from
local reply, err = broker.chat(...)
to
local reply, second = broker.chat(...)
branching on reply nil-ness to interpret second (err on failure,
usage on success). Captured usage routes through _record_usage.
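The R5 chokepoint with R4's two independent flags might be sketched like this. It is written as a factory so the dependencies are explicit: `ctx`, `cost_cfg` and `status` are injected stand-ins, the `>=` comparison is assumed, and the wording of the token warning is hypothetical (only the dollar message appears in the E2E output).

```lua
-- Sketch only: make_recorder and its parameters are illustrative names.
local function make_recorder(ctx, cost_cfg, status)
  return function(model, category, usage)
    ctx:add_usage(model, category, usage)
    local warn = ctx.cost_warn_state

    local dollars = ctx:total_cost()
    if cost_cfg.warn_at_dollars and not warn.dollars
       and dollars >= cost_cfg.warn_at_dollars then
      warn.dollars = true               -- one-shot, independent of tokens
      status(string.format(
        "session cost $%.6f has crossed warn_at_dollars=$%.6f",
        dollars, cost_cfg.warn_at_dollars))
    end

    local p, c = ctx:total_tokens()
    if cost_cfg.warn_at_tokens and not warn.tokens
       and p + c >= cost_cfg.warn_at_tokens then
      warn.tokens = true                -- one-shot, independent of dollars
      status(string.format(            -- hypothetical message wording
        "session tokens %d have crossed warn_at_tokens=%d",
        p + c, cost_cfg.warn_at_tokens))
    end
  end
end
```

Each flag trips at most once, and tripping one leaves the other armed.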
E2E verified against live cloud broker:
- cloud prompt -> reply "Hi! 👋"
- Warn fired: "session cost $0.000219 has crossed warn_at_dollars=$0.000010"
- R10 sub-cent precision visible in both numbers.
Norris + safety paths still untouched — commit #4 wires those.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
7b4a9becc2 |
context: cost/usage accumulator (Phase 7 commit #2)
Adds the per-conversation accumulator that broker.lua's
on_delta("usage", ...) payload feeds into. No callers yet —
commit #3 wires the broker callback to ctx:add_usage in repl.lua,
commit #4 in safety.lua.
Changes:
- Context.new: new fields `usage_totals = {}` and
`cost_warn_state = { dollars = false, tokens = false }`. R4:
two independent flags so warn_at_dollars firing doesn't
suppress warn_at_tokens (or vice versa).
- Context:add_usage(model_name, category, usage):
Increments usage_totals[model_name][category] slot. R6: when
usage.cost is nil (local llama.cpp per B3), sets a sticky
`is_local = true` flag on the slot AND does NOT add to cost
(preserves the local-vs-cloud-zero distinction for :cost detail
annotation). When usage.cost is a number (cloud), accumulates.
- Context:total_cost() / total_tokens() — pure-Lua summation
across all slots; total_tokens returns (prompt, completion).
- Context:reset_usage() — explicit :cost reset path; zeros
usage_totals AND clears both flags atomically.
- Context:reset() — R8 parity: does NOT clear usage_totals OR
cost_warn_state. Matches the Phase 4 memory_items / Phase 6
project rule ("ambient context survives a user-driven
conversation reset").
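The accumulator behavior described above can be sketched as follows. The slot field names (`calls`, `prompt_tokens`, `completion_tokens`, `cost`) are assumed for illustration; the `is_local` sticky flag, the nil-cost rule and the reset split come from the change list.

```lua
-- Sketch only: slot layout is assumed; this is not the context.lua source.
local Context = {}
Context.__index = Context

function Context.new()
  return setmetatable({
    usage_totals = {},
    cost_warn_state = { dollars = false, tokens = false },
  }, Context)
end

function Context:add_usage(model_name, category, usage)
  local by_model = self.usage_totals[model_name] or {}
  self.usage_totals[model_name] = by_model
  local slot = by_model[category]
  if not slot then
    slot = { calls = 0, prompt_tokens = 0, completion_tokens = 0, cost = 0 }
    by_model[category] = slot
  end
  slot.calls = slot.calls + 1
  slot.prompt_tokens = slot.prompt_tokens + (usage.prompt_tokens or 0)
  slot.completion_tokens = slot.completion_tokens + (usage.completion_tokens or 0)
  if usage.cost == nil then
    slot.is_local = true        -- sticky: later cloud calls do not clear it
  else
    slot.cost = slot.cost + usage.cost
  end
end

function Context:total_cost()
  local sum = 0
  for _, by_model in pairs(self.usage_totals) do
    for _, slot in pairs(by_model) do sum = sum + slot.cost end
  end
  return sum
end

function Context:reset_usage()
  self.usage_totals = {}
  self.cost_warn_state = { dollars = false, tokens = false }
end
```

A conversation-level `reset` would simply leave both fields alone, which is the R8 parity rule.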
Smoke verified (20/20 unit cases):
- Empty zeros; cloud cost accumulation; local nil-cost preserves
is_local=true sticky; calls counter; cost summation across
multiple cloud calls; is_local sticky after a later nil-cost
call on a cloud slot; separate slots per (model, category);
:reset preserves; :reset_usage zeros both totals and flags.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
7364963b00 |
broker: usage capture + opts widening (Phase 7 commit #1)
Foundation for Phase 7. broker.chat_stream now emits a third
on_delta kind ("usage") after the stream completes successfully;
broker.chat returns (text, usage). Backward-compatible — existing
callers that ignore the new kind / second value continue working
via Lua's drop-extra-returns semantics.
Changes:
- build_request widens (A3 + R3) — `(model_cfg, msgs, stream, opts)`.
opts.tools / opts.max_tokens / opts.include_usage / opts.category
all live inside opts now. Both internal call sites updated.
- opts.include_usage defaults to true for streaming requests; sets
`stream_options: { include_usage: true }` in the request body.
B1: required for local llama.cpp to emit usage; cloud honors as
a no-op (emits anyway).
- on_event captures `doc.usage` into a closure-local `final_usage`.
N1: the check is INDEPENDENT of the choice/delta branches — local
emits usage on choices=[] chunks (choice nil) while cloud emits
with non-empty choices + finish_reason. Both shapes funnel here.
- After curl.post_sse returns successfully (NOT on transport/api
errors), if final_usage is set, emit on_delta("usage", {prompt_tokens,
completion_tokens, total_tokens, cost, model, category}). cost is
nil for local (R6 preserves the nil vs 0 distinction the
accumulator needs). model is model_cfg.model — caller-stable per
B4 + R2 so call_broker's fallback retry attributes usage to the
fallback's model name without wrapper-side tracking.
- M.chat (R1 — BLOCKER fix): on_delta now also captures kind=="usage"
alongside "text"; M.chat returns (text, usage). Without this fix
4 of 5 non-streaming categories (summarize / delegate /
memory_summarize / probe) would silently report zero usage.
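A rough sketch of the capture path follows. Transport and JSON decoding are left out: `events` is a hypothetical array of already-decoded SSE chunks, and the two functions take it as a parameter instead of calling curl, so the signatures differ from the real broker.lua.

```lua
-- Sketch only: `events` replaces the real SSE stream for illustration.
local function chat_stream(events, model_cfg, opts, on_delta)
  local final_usage
  for _, doc in ipairs(events) do
    local choice = doc.choices and doc.choices[1]
    if choice and choice.delta and choice.delta.content then
      on_delta("text", choice.delta.content)
    end
    -- Checked independently of the choice branch (N1): local sends usage
    -- with empty choices, cloud sends it alongside a non-empty choices array.
    if doc.usage then final_usage = doc.usage end
  end
  if final_usage then
    on_delta("usage", {
      prompt_tokens = final_usage.prompt_tokens,
      completion_tokens = final_usage.completion_tokens,
      total_tokens = final_usage.total_tokens,
      cost = final_usage.cost,            -- stays nil for local backends
      model = model_cfg.model,            -- caller-stable name
      category = opts and opts.category,
    })
  end
end

local function chat(events, model_cfg, opts)
  local parts, usage = {}, nil
  chat_stream(events, model_cfg, opts, function(kind, payload)
    if kind == "text" then
      parts[#parts + 1] = payload
    elseif kind == "usage" then
      usage = payload
    end
  end)
  return table.concat(parts), usage
end
```

Callers written as `local reply = chat(...)` keep working, since Lua drops the extra return value.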
Smoke verified against live hossenfelder:8082:
- CLOUD chat -> (text, usage); cost=2.9e-05, model=anthropic/...
- LOCAL chat -> (text, usage); cost=NIL (correct per R6),
model=qwen-coder-7b-snappy-8k
- CLOUD stream -> on_delta("usage", {...}) with category="test"
echoed; model name caller-stable.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
d4c20f09df |
docs/PHASE7: review fold-in — 3 BLOCKERs + 7 CONCERNs + 5 NITs
Sonnet-reviewed (per the reviews-use-sonnet feedback memory).
BLOCKERs (RESOLVED in-place):
R1. M.chat would silently return (text, nil) for ALL non-streaming
callers — 4 of 5 categories (summarize/delegate/memory_summarize/
probe) flow through broker.chat, NOT chat_stream. §4 now shows
the explicit M.chat update that captures kind=="usage" alongside
"text" and returns (text, usage).
R2. call_broker fallback retry would credit usage to the wrong model
name. Fix: broker emits payload.model = model_cfg.model (which IS
the fallback's name when called with fb_cfg — chat_stream's
upvar). Wrapper keys by payload.model, NOT outer model_name. §4
+ §13 commit 3 reflect.
R3. build_request has TWO internal callers inside broker.lua itself,
not just the public surface. Plan §13 commit 1 risk row now
spells this out explicitly so the implementer doesn't read "every
caller already passes opts" as "external-only".
CONCERNs (FOLDED):
R4. Single cost_warn_fired flag covers two thresholds — first-to-fire
suppresses the other. Split into ctx.cost_warn_state = { dollars
= false, tokens = false }; :cost reset clears both. §7 + §13.
R5. Warn-check centralization — single _record_usage helper in
repl.lua wraps ctx:add_usage AND does threshold check. safety.lua
routes via helpers.on_usage / opts.on_usage callbacks. context.lua
stays decoupled from renderer.
R6. Preserve nil-vs-0 cost distinction. Accumulator slot gains
`is_local = true` (sticky) when ANY recorded usage had cost==nil.
`:cost detail` annotation comes from is_local flag, not a
fragile cost==0 heuristic.
R7. :cost detail sort needs 3-level deterministic key:
(cost desc, model asc, category asc) — table.sort is unstable.
R8. call_broker fallback passes opts.include_usage unchanged.
Documented as known assumption (B1 confirms both backends
accept; future-broken fallback can pass include_usage=false).
R9. :resume does NOT restore historical usage_totals. Per-turn usage
IS in session JSONL for scripting; cross-session aggregation is
Q-C2 deferred. Documented in §8.
R10. $%.4f loses sub-cent precision (cloud cost 0.000028 -> $0.0000).
Widened to $%.6f in §6 + §7 warn message format.
NITs (APPLIED):
N1. §4 pseudocode comment notes `if doc.usage` branch is independent
of choice branch (handles both B2 emission shapes).
N2. §2 stale "B7" reference corrected to B3.
N3. §13 commit 3 row gains explicit dependency note on commit 1's R1.
N4. §13 commit 4 spells out llm_probe -> llm_second_opinion ->
M.is_destructive signature chain widening.
N5. §3 + §13 commit 6 — PHASE0 §11 amendment already in tree
(
|
||
|
|
0f14dc1727 |
docs/PHASE7: plan — §13 commit roadmap
Status: Analyze -> Plan.
Q-C4 was the last open question pending baseline; now resolved per
B1 (stream_options accepted by both backends; required for local).
§13 Implementation Plan added — 6 commits, bottom-up:
1. broker.lua: usage extraction from final SSE chunk; build_request
signature widening to (model_cfg, msgs, stream, opts); on_delta
("usage", payload); chat returns (text, usage); opts.category
passthrough.
2. context.lua: usage_totals + cost_warn_fired fields; add_usage /
total_cost / total_tokens helpers; :reset preserves both.
3. repl.lua: wire opts.category at 5 non-Norris call sites (main,
delegate x2, summarize, memory_summarize); on_delta("usage")
branch routes to ctx:add_usage.
4. safety.lua: wire opts.category for Norris main broker + is_
destructive LLM probe; helpers.on_usage callback convention
(no new module dep — matches #52's scrub_msgs pattern).
5. repl.lua: :cost meta surface + warn-threshold check + HELP.
6. config.lua: commented cost example block + PHASE7.md status
bump to Implement.
Per-commit risk index covers signature-change blast radius, missed
call-site lint, and warn-flag one-shot semantics. Lua's multi-
return semantics keep broker.chat backwards-compat automatic.
Two items left open at plan, resolve at implement:
- is_destructive opts.on_usage vs cfg.helpers threading
- per-turn verbose mode (deferred; v1 = :cost on demand only)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
2244a3f1ee |
docs/PHASE7-baseline: live broker probes for usage shape
Real probes against hossenfelder.fritz.box:8082, covering both backends.
Five findings, all align with the formulate/analyze design — no
structural changes.
B1. `stream_options.include_usage = true` is safely accepted by
both backends. REQUIRED for local llama.cpp to emit usage;
no-op for cloud (which emits anyway). Default-true is correct.
B2. Two emission patterns observed:
- Cloud (Bedrock): usage rides the FINAL delta chunk with
non-empty `choices` carrying finish_reason.
- Local: usage rides a SEPARATE chunk with `choices: []`
preceding `[DONE]`.
Both shapes are handled by the same `if doc.usage then ...`
check; the existing on_event choices-branch short-circuits
safely when choices is empty.
B3. `cost` field is dollar-denominated (number) and cloud-only.
Local returns `timings` instead (perf, not cost). Accumulator
captures `usage.cost` as-is; nil treated as 0. :cost detail
annotates local lines so $0 isn't misread.
B4. `doc.model` in the usage event reflects the upstream-API-version
(e.g., Bedrock rewrites `anthropic/claude-haiku-4.5` to
`anthropic/claude-4.5-haiku-20251001`). Accumulator keys by
caller-intended `model_cfg.model`, NOT `doc.model`, for stable
cross-call comparison.
B5. Usage event is always the LAST data event before `[DONE]`.
Emission of `on_delta("usage", ...)` happens after curl.post_sse
returns — one call per stream, after all text + tool_calls.
Q-C4 RESOLVED: hossenfelder forwards `stream_options.include_usage`
to all backends correctly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
f0bccdec48 |
docs/PHASE7: analyze — probe broker surface + resolve Qs in-place
Status: Formulate -> Analyze (tree at |
||
|
|
3bad07b2da |
docs/PHASE7: formulate — cost / usage observability
Phase 7 formulate manifest + PHASE0 §11 amendment to add the Phase 7
row (substrate amendment per CLAUDE.md §3, lands in the same commit).
Four pillars:
1. Usage capture in broker.chat_stream — extract `usage` from the
final SSE chunk (OpenAI streaming spec with `stream_options:
{include_usage: true}`). Surface via new on_delta("usage",
payload) kind. broker.chat returns (text, usage) — backward-
compat: existing callers ignore the second value.
2. Per-session accumulator on ctx — ctx.usage_totals[model][category]
tables (categories: main / delegate / summarize / memory_summarize
/ probe / norris, tagged at the call site via opts.category).
:reset preserves usage_totals (R8 parity with memory_items /
project). Session JSONL gains an optional `usage` field on
assistant turns for after-the-fact analysis.
3. :cost meta surface — :cost (summary), :cost detail (per-model +
per-category breakdown), :cost reset (zero the meter). Pure-Lua
read of ctx.usage_totals; no broker calls.
4. Optional warn thresholds — cfg.cost.warn_at_dollars /
warn_at_tokens emit a one-shot status when crossed. Default off;
useful with cloud presets configured.
Doc covers scope + done-when criteria, tech decisions table, module
changes, per-pillar deep dive with code sketches, UX surface, out of
scope, risks, 6 open questions to resolve in analyze.
Open at formulate:
Q-C1 — provider-without-usage handling (local llama.cpp probably)
Q-C2 — cross-session persistence (defer to phase 8)
Q-C3 — categories closed-set vs free-form
Q-C4 — does hossenfelder forward stream_options to all backends?
Q-C5 — warn fires on the call that crosses, or the next one?
Q-C6 — :reset clears cost_warn_fired too, or only :cost reset?
Scope confirmed via AskUserQuestion: cost/usage observability
(chosen over project-local config overlay and session search/tag).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
955bd82efb |
safety + repl: wire secrets into safety.lua (closes #52)
Closes the last #13 gap — Norris broker call + is_destructive LLM
second-opinion probe were the two egress points NOT covered by the
scrub-at-egress design in commit
|
||
|
|
ac58b19da2 |
config + docs/PHASE6: example block + status -> Implement (Phase 6 commit #6)
R9-resolved single-owner of the status bump (commit #5 didn't touch
PHASE6.md per the review fold-in).
config.lua:
- Commented-out `project = { auto_tree, tree_depth, tree_max_chars }`
block with the same shape as the Phase 1-5 example blocks.
- Note that :diff / :tree / :highlight all work without config; the
`project` block ONLY controls the startup auto-inject.
- Note about :highlight v1 having no config flag (runtime-only),
cross-references the in-REPL install hint.
docs/PHASE6.md:
- Status header bumped: "Plan + review fold-in" -> "Implement"
- Lists the 6 implement commits in the header for traceability:
|
||
|
|
11d0e599cd |
repl + renderer: tree-sitter highlighter (Phase 6 commit #5)
The largest Phase 6 commit — fence-aware stream filter in renderer.lua
+ external tree-sitter dispatch + :highlight meta in repl.lua.
renderer.lua — fence-aware filter wrapping assistant_delta:
M.set_highlight(enabled, detected, highlight_fn)
Called by repl.lua at startup AND on every :highlight toggle.
Stores state in module-locals (off by default).
State machine inside _hl_push:
outside: pass chunks through; HOLD trailing partial-fence chars
(per R1 — local llama.cpp splits ```python as `'``'`
then `'`python\n'`, so naive pass-through drops the
leading "``" and never recovers).
inside: buffer cumulatively until "\n```" appears; emit
highlight_fn(body, lang) then the closing fence verbatim.
Recursive call handles "rest" after the closing fence.
N1: fences only open at start-of-stream OR after a newline
(`^```` or `\n```` only). Inline backticks in prose
("use ``` to mark code") do not open a fence.
R3 (PTY raw-mode toggle per highlight call): no change here — every
executor.exec call already toggles raw-mode (existing behavior
since Phase 1). The risk is theoretical; smoke-test interactively
after install if multi-fence renders show flicker.
assistant_flush handles end-of-stream gracefully: drains any held
partial-fence tail OR an unterminated inside-fence buffer.
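The "hold trailing partial-fence chars" step of the outside state can be illustrated in isolation. This is a simplified sketch, not the `_hl_push` code: the function name and the `at_stream_start` flag are hypothetical, and the inside-fence buffering is omitted.

```lua
-- Sketch only: splits a chunk into (emit_now, held_tail). The held tail is
-- the longest suffix that could still grow into a newline-anchored fence.
local FENCE = "\n" .. string.rep("`", 3)

local function split_held_tail(text, at_stream_start)
  -- At start-of-stream a fence needs no preceding newline, so a chunk made
  -- of one or two backticks is held whole.
  if at_stream_start and (text == "`" or text == "``") then
    return "", text
  end
  -- Try the longest candidate first: 3, 2, then 1 trailing characters.
  for n = math.min(#FENCE - 1, #text), 1, -1 do
    if text:sub(-n) == FENCE:sub(1, n) then
      return text:sub(1, #text - n), text:sub(-n)
    end
  end
  return text, ""
end
```

Mid-line backticks are never held, which matches the N1 rule that only start-of-line fences open.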
repl.lua — _detect_treesitter + highlighted + :highlight meta:
_detect_treesitter() one-shot popen probe of `tree-sitter --version`.
Run once at startup; cached as
highlight_detected.
highlighted(body, lang_tag) R2-placed in repl.lua (has _shq +
executor access). Translates the fence
tag (`py`, `python`, `lua`, etc.) to
a canonical lang via LANG_TAG, picks
the canonical extension via LANG_EXTENSION,
writes body to a tmpfile with that
extension, runs `tree-sitter highlight
<tmpfile>` via executor.exec, returns
the output. On ANY failure (CLI absent,
non-zero exit, empty output), returns
`body` unchanged — silent pass-through.
R4 RESOLVED VIA REAL INSTALL: probed `tree-sitter highlight --help`
on noether; confirmed:
- NO `--lang` flag exists (formulate-time assumption wrong)
- takes a PATH; language inferred from file extension
- alternative `--scope source.X` exists but also unreliable
without configured grammars
Resolution: write tmpfile with `os.tmpname() .. LANG_EXTENSION[lang]`
and pass the path. Matches the documented upstream contract.
B4-followup: even with the CLI installed, highlighting requires
`~/.config/tree-sitter/config.json` parser-directories with
cloned + built `tree-sitter-<lang>` grammars. Without parsers,
every call exits non-zero and we silently pass through. The
:highlight install hint surfaces all three install steps so the
user knows what's actually needed.
:highlight [on|off|status] meta:
no arg -> flip
on/off -> set explicit
status -> report toggle + CLI detection state
When toggled on AND CLI absent: emit a 4-line install hint
(CLI install, init-config, grammar clone reminder).
When toggled on AND CLI present: emit a 1-line note that
parser-directories must be set up for actual highlighting.
HELP gains :highlight entry.
Tested:
10/10 unit cases on the renderer state machine, including:
- plain prose passthrough
- single-chunk fence
- B2 split fence ("``" + "`python\n" + "x=42" + "\n```")
- N1 SOL anchor (mid-line ``` does not open)
- trailing \n properly emitted across chunks
- SOL-only fence open
- prose after closing fence preserved
- two fences in one stream
- highlight off = passthrough (callback never fires)
E2E :highlight meta verified:
:highlight status -> off / detected
:highlight on -> toggles + emits parser-dir reminder
:highlight status -> on / detected
:highlight off -> off
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Pillars 1 + 2 + 3 of Phase 6 now all implemented. Commit #6 is config
example block + status -> Implement.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
0d63f01601 |
repl: expand_mentions tiered @<r1>..<r2> diff retry (Phase 6 commit #4)
Per A6 (tiered resolution): @<token> tries file lookup first; if the
file doesn't exist AND the token contains "..", retry as a git
ref-range and substitute with a fenced `diff` block. Preserves the
existing peel-on-trailing-punct logic (e.g., `@HEAD~1..HEAD,` peels
the comma, resolves the ref, restores the comma after the closing
fence).
Resolution order for @<token>:
1. io.open(token, "rb") -- file lookup, with trailing-punct peel
2. if (1) fails and token contains "..":
git --no-pager -c color.ui=never diff <r1>..<r2>
on exit 0 + non-empty body: substitute as ```diff fenced block
3. else: leave literal `@token` + emit "[aish] @X: not found" status
Examples:
@README.md -> file (path branch)
@../sibling.txt -> file (path branch; `..` only triggers retry
when path lookup FAILS, so existing
paths with `..` segments are unaffected)
@HEAD~1..HEAD -> diff (path fails, ref succeeds)
@origin/main..feature -> diff (path fails — no such literal file;
ref succeeds; `/` in ref is fine because
we don't use the path's `/`-absence as
a discriminator)
@nonsense..gibberish -> literal preserved (both fail)
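The resolution order can be sketched with the I/O injected. `file_read` and `git_diff` are hypothetical helpers standing in for `io.open` and the executor-backed git call; the trailing-punctuation peel is left out to keep the sketch short.

```lua
-- Sketch only:
--   file_read(path) -> contents, or nil when the file does not exist
--   git_diff(range) -> diff body, or nil on non-zero exit
local TICKS = string.rep("`", 3)

local function resolve_mention(token, file_read, git_diff)
  local body = file_read(token)
  if body then
    return "file", body                      -- step 1: path wins when it exists
  end
  if token:find("..", 1, true) then          -- plain find: ".." is literal
    local diff = git_diff(token)
    if diff and diff ~= "" then              -- step 2: ref-range retry
      return "diff", TICKS .. "diff\n" .. diff .. "\n" .. TICKS
    end
  end
  return "literal", "@" .. token             -- step 3: leave the text alone
end
```

Because the ref-range retry only runs after the file lookup fails, an existing path such as `../sibling.txt` never reaches git.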
Required restructuring:
- _shq and _git_clean_cmd lifted from M.run closure scope to module
scope (above expand_mentions). Single source of truth for the
B1 prefix shared with commit #3's :diff. The in-M.run duplicates
are removed.
- expand_mentions now references `executor` (already required at
module scope on line 7) for the diff retry.
Status messages updated:
- File expansion: "@<path> expanded (N bytes, truncated)" (existing)
- Diff expansion: "@<path> expanded (N bytes, diff)" (new)
Tested with the 7 existing #7 cases + 7 new diff-retry cases (14/14):
ref-range expansion shape, body contains `diff --git`, trailing
prose preserved, @../path stays as file (not diff), neither-path-
nor-ref preserves literal, trailing-comma peel composes with ref
retry.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
4d5f93aaa5 |
repl: :diff meta + _git_clean_cmd helper (Phase 6 commit #3)
User-driven git diff injection. The model sees the diff on the next
ask_ai turn through the existing exec_output channel.
Changes:
- _git_clean_cmd(subcmd_and_args) helper near _scan_project_tree.
B1: every git invocation that flows into context MUST use
`--no-pager -c color.ui=never`. Forkpty makes git think stdout
is a TTY, enabling both color and the pager's keypad/line-clear
escapes — these would pollute the captured context block. The
helper is the single chokepoint; commit #4's @<r1>..<r2> retry
will reuse it.
- :diff [<args>] meta:
- Reads cwd at meta invocation (R6: differs from :tree's
scan-time cwd capture; documented in §5).
- Runs `_git_clean_cmd("diff " .. args)` via executor.exec.
- Empty output -> "(no diff): <label>" status, no context append.
- Non-zero exit -> "diff failed (exit N): <label>" status,
no context append. git's stderr already streamed to the
user via executor.exec's live multiplex, so the failure
reason is visible.
- Success -> appends "[diff <label>]\n<output>" via
ctx:append_exec_output. Label is "(working tree)" for empty
args, else verbatim args.
- Status confirms injection size: "diff injected: <label> (N bytes)".
- HELP gains :diff line with three example arg shapes; N3-resolved
(no `staged` alias — the meta is thin pass-through to git's grammar).
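A minimal sketch of the B1 chokepoint and the label rule described above. The function names here are hypothetical stand-ins; the real `_git_clean_cmd` in repl.lua may be shaped differently, and only the two git flags and the "(working tree)" label are taken from this message.

```lua
-- Hypothetical stand-in for _git_clean_cmd: prefix every git call
-- with the flags that keep pager and color escapes out of captured
-- output, even when forkpty makes stdout look like a TTY.
local function git_clean_cmd(subcmd_and_args)
  -- --no-pager: never hand output to a pager.
  -- -c color.ui=never: override any user color config for this call.
  return "git --no-pager -c color.ui=never " .. subcmd_and_args
end

-- Label rule for the :diff meta: "(working tree)" for empty args,
-- otherwise the args verbatim.
local function diff_label(args)
  if args == nil or args == "" then
    return "(working tree)"
  end
  return args
end

print(git_clean_cmd("diff " .. "--cached"))
print(diff_label(""))
```

Keeping the flags in one function means later git verbs only need to supply the subcommand and its arguments.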
Smoke verified across four scenarios in an ephemeral test repo:
- Working-tree dirty -> 110-byte diff injected, no ANSI escapes
- --cached -> 118-byte staged diff injected, clean
- garbage..nonexistent -> exit 128, status + skip
- Clean working tree -> "(no diff)", status + skip
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
d1dce832da |
repl: _scan_project_tree + :tree meta + auto_tree (Phase 6 commit #2)
First user-visible Phase 6 verb. Builds on commit #1's compose_project
plumbing: sets ctx.project from either the :tree meta or the
cfg.project.auto_tree startup hook.
Changes:
- _scan_project_tree(dir, opts) helper near _run_hook:
  git -C <dir> ls-files --cached --others --exclude-standard
  when <dir> is inside a git repo (N4: no subshell);
  find <dir> -mindepth 1 -maxdepth <depth+1> -type f -not -path '*/.*'
  otherwise.
  Returns (body, info={file_count, truncated, in_git}). Sorted paths,
  truncated to max_chars (default 4096 per cfg).
- :tree [<depth>|refresh|off] meta:
  no arg -> scan with config defaults; resets _project_opts
  <N> -> scan with depth=N; caches as _project_opts
  refresh -> re-scan with cached _project_opts (else defaults)
  off -> clear ctx.project AND ctx._project_opts (R5)
  Status line reports file count + truncation flag + which backend
  fired (git/find).
- cfg.project.auto_tree startup hook before the main loop: if true,
  scan libc.getcwd() once and set ctx.project. Failures status-logged
  once; REPL continues. Default off (existing configs unchanged).
- HELP updated with three :tree lines. Plan §12 deliberately defers
  the config.lua example block to commit #6 along with the status
  header bump (R9 single-owner).
Smoke (aish repo cwd):
- :tree no-arg -> "33 files (git ls-files)"
- :tree refresh -> same
- :tree off -> "project tree cleared"
- :tree 1 -> rescans
- cfg.project.auto_tree=true at startup -> auto-injected status visible
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c4fc7fde01 |
context: [project] block plumbing (Phase 6 commit #1)
Foundation for Phase 6: adds the field + composer + composition order
with no callers yet. Nothing sets ctx.project; the meta hookup and
startup auto-inject land in commit #2.
Changes:
- Context.new gains `project` (string, nil) and `_project_opts`
  (cached scan opts for `:tree refresh`; R7).
- compose_project(text) helper mirrors compose_background /
  compose_summary. Returns "" for nil/empty; otherwise emits
  "\n\n[project]\n" + text.
- to_messages inserts compose_project BETWEEN compose_background and
  compose_summary so the model reads memory facts -> project tree ->
  earlier conversation -> NORRIS suffix.
- Same Norris-suppression guard as the other two dynamic blocks
  (R-C1 / R-C4 parity; planner stays on goal anchor).
- Context:reset preserves ctx.project (R8: matches the Phase 4
  memory_items rule; startup-injected facts survive a user-driven
  context reset).
Smoke verified (14/14 inline cases):
- project nil -> no [project] block in sys_content
- project set -> block present with contents
- ordering: [background] < [project] < [earlier conversation summary]
- norris_active suppresses all three; NORRIS suffix still appears
- :reset clears turns/pending_exec_output/summary; preserves
  memory_items AND project
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
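For reference, the compose_project contract stated in this commit can be sketched as below. This is an illustration of the nil/empty rule and the block header only; the actual helper in context.lua, and how to_messages concatenates it with the other blocks, are not shown in this log.

```lua
-- Sketch of the stated contract: "" for nil/empty input, otherwise a
-- [project] block separated from the preceding text by a blank line.
local function compose_project(text)
  if text == nil or text == "" then
    return ""
  end
  return "\n\n[project]\n" .. text
end

-- An unset project contributes nothing to the system prompt.
print(#compose_project(nil))
print(compose_project("repl.lua\ncontext.lua"))
```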
||
|
|
261b230be8 |
docs/PHASE6: review fold-in — 2 BLOCKERs resolved, 7 CONCERNs, 6 NITs
Independent agent review of PHASE6 (manifest + baseline + plan at
|
||
|
|
4407029296 |
docs/PHASE6: plan — fold B1/B3/B4 + add §12 commit roadmap
Status header: Analyze -> Plan.
Baseline findings folded into the design sections:
§1 (highlighter pillar) gains B4: tree-sitter absent on every
probed host; :highlight on emits install-hint when missing.
§4 (highlighter sketch) revised per B3: io.popen():close() doesn't
expose exit codes in LuaJIT. Route via executor.exec("cat tmp |
tree-sitter ...") which uses pty.spawn+waitpid and returns code
reliably. Tmpfile design retained (avoids ARGMAX + shell-escape).
§5 (:diff impl + @<r1>..<r2> retry) revised per B1: every git
invocation must use `--no-pager -c color.ui=never` to suppress
the color/keypad/line-clear escapes forkpty triggers. Factored
recommendation: helper `_git_clean_cmd(subcmd)` shared by :diff
and the @-mention diff retry.
New §12 Implementation Plan — 6 commits, bottom-up:
1. context.lua: ctx.project + compose_project + composition order
2. repl.lua: _scan_project_tree helper + :tree meta
3. repl.lua: :diff meta + _git_clean_cmd helper (B1)
4. repl.lua: expand_mentions tiered resolution (@<r1>..<r2> per A6)
5. renderer.lua + repl.lua: tree-sitter detect + fence filter +
:highlight meta (B3-revised tmpfile dispatch)
6. config.lua project example + status -> Implement
Per-commit risk index + smoke criteria. Highlighter (commit 5) is
the largest experimental surface — placed last so the rest of Phase 6
ships even if highlighter slips. Order is independent enough that
swapping 3<->4 or 5<->6 doesn't break anything; bottom-up keeps each
commit individually green.
Things deliberately not split: _shq reuse, lang map duplication for
v1, streaming-rehydration order (rehydrate -> highlight -> emit
inherits naturally from existing chunk pipeline).
Two items open at plan time, resolve at implement: _scan_project_tree
dir-arg vs hardcoded getcwd; :highlight status probing
tree-sitter --print-langs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
9f50206ca6 |
docs/PHASE6-baseline: substrate probes ahead of implementation
Six findings from probing the world before tree-sitter / diff / project
tree implementation lands:
B1. `git` subcommands through executor.exec emit ANSI color + DEC
keypad/line-clear escapes by default (forkpty enables interactive
mode). `:diff` impl MUST use `git --no-pager -c color.ui=never <args>`.
Same flags apply to any future git verbs.
B2. SSE chunk size envelope: local llama.cpp delivers tiny chunks
(median 4 chars, max 13) AND splits code fences across boundaries
(`'``'` then `'`'`). Cloud (Anthropic via OpenRouter) delivers
big chunks (median 26 chars), fences intact. The §4 fence-aware
filter accumulator design covers both — confirmed necessary by
local-model behavior.
B3. **LuaJIT io.popen():close() does NOT return exit codes** — Lua
5.1 contract, not 5.2+. Breaks the A4 highlighter resolution.
Revised: route via `executor.exec("cat tmp | tree-sitter ...")`
which uses pty.spawn + waitpid and returns (out, code) reliably.
B4. tree-sitter CLI absent on both probed hosts (noether, higgs).
Highlighter is opt-in by design; absent-CLI path should emit a
clear install hint, not silently no-op.
B5. Project-tree envelope: aish 32 files / 449 chars; similar local
repos 15-25 files; scan time ~1-5ms. The 4096-char default cap
accommodates ~290 typical paths. Large repos handled via
tree_depth or cap tuning per existing §9 risk row.
B6. os.tmpname returns POSIX /tmp/lua_XXXXXX paths; acceptable for
the B3-revised tmpfile-roundtrip pattern.
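To illustrate why B2 forces an accumulator, here is a rough sketch of a fence detector that survives a fence split across chunks. It is not the §4 filter itself: `new_fence_filter` and the event shape are hypothetical, and the only behavior taken from B2 is that a three-backtick fence may arrive in pieces.

```lua
-- Build the fence marker at runtime so this listing never contains
-- a literal run of three backticks.
local FENCE = string.rep("`", 3)

-- Hypothetical filter: push(chunk) returns a list of events
-- {kind = "text" | "fence", value = ...}; flush() drains the tail.
local function new_fence_filter()
  local held = ""  -- trailing backticks that may start a fence
  local self = {}

  function self.push(chunk)
    local out = {}
    local buf = held .. chunk
    local pos = 1
    while true do
      local s, e = buf:find(FENCE, pos, true)  -- plain-text find
      if not s then break end
      if s > pos then
        out[#out + 1] = { kind = "text", value = buf:sub(pos, s - 1) }
      end
      out[#out + 1] = { kind = "fence", value = FENCE }
      pos = e + 1
    end
    local rest = buf:sub(pos)
    -- No complete fence is left in rest, so any trailing backtick run
    -- is one or two characters long; keep it for the next chunk.
    held = rest:match("`+$") or ""
    local body = rest:sub(1, #rest - #held)
    if body ~= "" then
      out[#out + 1] = { kind = "text", value = body }
    end
    return out
  end

  function self.flush()
    local out = {}
    if held ~= "" then
      out[#out + 1] = { kind = "text", value = held }
      held = ""
    end
    return out
  end

  return self
end

local f = new_fence_filter()
for _, chunk in ipairs({ "see ``", "`lua\n", "x = 1\n" }) do
  for _, ev in ipairs(f.push(chunk)) do
    print(ev.kind, (ev.value:gsub("\n", "\\n")))
  end
end
```

With large cloud-style chunks the held tail is almost always empty, so the same code path covers both delivery patterns.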
No structural changes to formulate/analyze. B1, B3, B4 will fold into
PHASE6.md §4 / §5 / §1 during plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
ad52fe4538 |
docs/PHASE6: analyze — substrate probes + Q resolutions in-place
Analyze pass against tree at
|
||
|
|
f596743834 |
docs/PHASE6: formulate — tree-sitter highlight + diff + project tree
Phase 6 formulate manifest. Three pillars per PHASE0 §11 row 6:
1. Tree-sitter syntax highlighting hooks
External `tree-sitter` CLI when present, no-op otherwise.
Honors PHASE0 §3 (no compiled extensions). Toggleable
at runtime; off by default so existing UX is unchanged.
2. Diff-aware code injection
:diff [args] meta + @<ref1>..<ref2> @-mention extension.
Shells out to `git diff`; output flows through the existing
exec-output context channel.
3. Project-level file-tree context
:tree meta + optional cfg.project.auto_tree startup inject.
git ls-files in a repo, find fallback otherwise. Composed
into the system prompt as a new [project] block between
[background] and [earlier summary]. Suppressed under Norris
(R-C1 / R-C4 parity).
Module changes: renderer.lua (fence-aware highlight filter), context.lua
(compose_project), repl.lua (3 new metas, 3 new helpers, expand_mentions
extension). No new module files in v1.
Doc covers: scope + done-when criteria, tech decisions table, module
changes table, per-pillar deep dive with example code, UX surface
summary, out-of-scope list, risks, and 6 open questions to resolve
in analyze (Q-H1/Q-H2 highlighter, Q-D1/Q-D2 diff, Q-T1/Q-T2 tree).
Scope confirmed via AskUserQuestion: all three subsurfaces in scope;
tree-sitter approach is external CLI w/ no-op fallback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
d852acadc2 |
repl: wire #13 secrets — scrub outbound, rehydrate stream + tool args
Plumbs the secrets.lua module (commit
|
||
|
|
e4b818b0e9 |
secrets: vault loader + scrub/rehydrate + autodetect (#13 commit 1)
Standalone module, no wiring yet. Lands the substrate for issue #13:
  secrets.load(path): vault file loader; refuses non-0600
  secrets.make_session(vault): per-conversation scrub/rehydrate state
  session:scrub(text, mode): substitute literals (+ autodetect)
  session:rehydrate(text): restore placeholders
  secrets.streaming_rehydrator: chunk-boundary-tolerant streaming wrapper
Mode semantics (chosen per call by the caller):
  "off": identity, no mapping
  "vault": vault literals only, placeholders, rehydratable
  "vault+autodetect": + heuristic regexes, placeholders, rehydratable
  "stealth": + heuristic regexes, opaque decoys, one-way
Placeholders are stable across the session: the same literal always
maps to the same $AISH_SECRET_NNN slot, so re-scrubbing the same
context is idempotent and the model sees a consistent vocabulary.
AUTODETECT_PATTERNS (ordered; longer prefixes first):
  sk-or-v<N>-...                    OpenRouter
  ghp_/gho_/ghs_                    GitHub PATs
  AKIA<16>                          AWS access keys
  eyJ...x.y.z                       JWTs
  sk-...                            OpenAI (generic; matched after openrouter)
  -----BEGIN ... PRIVATE KEY-----   SSH/GPG key headers
Streaming rehydrator: tolerates a placeholder split across SSE chunks
($AISH_SE then CRET_001). It holds back the trailing partial-match in
a buffer, emits the rest, and resolves on the next push or flush.
Verified with 20 unit cases (vault sub, stable mapping, autodetect
across all label kinds, stealth decoys, mode=off, streaming with
mid-placeholder splits, non-placeholder $-prose pass-through).
Vault file mode enforcement: 0600 only, matching ssh's behavior for
~/.ssh/id_rsa. Loud failure (status + skip) if mode is wider.
Next commit (issue #13 follow-up): wire into broker / tool dispatch /
display, add per-broker `redact` policy, :secrets meta, config example
block.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
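The hold-back behavior of the streaming rehydrator can be sketched as follows. This is an assumption-laden illustration, not the secrets.lua implementation: `new_streaming_rehydrator`, the `mapping` table, and the fixed three-digit slot width (read off the `$AISH_SECRET_NNN` notation) are all hypothetical.

```lua
local PREFIX = "$AISH_SECRET_"
local FULL_LEN = #PREFIX + 3  -- assumption: exactly three digits (NNN)

-- True when tail is a proper prefix of some placeholder, i.e. it
-- could still be completed by a later chunk.
local function is_partial(tail)
  if #tail >= FULL_LEN then return false end
  local n = math.min(#tail, #PREFIX)
  if tail:sub(1, n) ~= PREFIX:sub(1, n) then return false end
  return tail:sub(#PREFIX + 1):match("^%d*$") ~= nil
end

-- mapping: placeholder -> original literal (hypothetical shape).
local function new_streaming_rehydrator(mapping)
  local held = ""
  local function restore(text)
    return (text:gsub("%$AISH_SECRET_%d%d%d", function(ph)
      return mapping[ph]  -- nil keeps an unknown placeholder as-is
    end))
  end

  local self = {}
  function self.push(chunk)
    local buf = held .. chunk
    held = ""
    -- Locate the last "$"; only that suffix can be an unfinished
    -- placeholder.
    local at = buf:find("%$[^%$]*$")
    if at and is_partial(buf:sub(at)) then
      held = buf:sub(at)
      buf = buf:sub(1, at - 1)
    end
    return restore(buf)
  end
  function self.flush()
    local out = restore(held)
    held = ""
    return out
  end
  return self
end

local r = new_streaming_rehydrator({ ["$AISH_SECRET_001"] = "hunter2" })
io.write(r.push("key=$AISH_SE"), r.push("CRET_001 ok"), r.flush(), "\n")
```

Ordinary prose such as "costs $5" is emitted immediately because "$5" cannot grow into a placeholder.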
||
|
|
cdf4e86679 |
repl: sub-broker delegation via DELEGATE: marker (closes #6)
Cost and context-window control: a "heavy" preset's model can offload
work to a cheaper preset without spending its own tokens on the result.
Example: deep model is mid-conversation and asks fast to summarize a
20k-line build log; the summary comes back as exec-output for the
next turn, deep stays small.
Marker syntax: DELEGATE: <preset> "<prompt>"
(Single or double quotes; one DELEGATE per line; lines without the
quoted shape are dropped — let the user write about delegation in
prose without accidental dispatch.)
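A minimal sketch of how such a marker could be extracted with Lua patterns. It is not the repl.lua extractor: the line-start anchoring and the dropping of empty prompts are assumptions made for the illustration, and only the marker shape, quote styles, and case sensitivity come from this message.

```lua
-- Hypothetical extractor: returns a list of {preset, prompt} tables,
-- one per line that has the full DELEGATE: <preset> "<prompt>" shape.
local function extract_delegate_lines(response)
  local out = {}
  for line in (response or ""):gmatch("[^\r\n]+") do
    -- %2 is a back-reference: the closing quote must equal the
    -- opening one (single or double).
    local preset, _, prompt =
      line:match("^DELEGATE:%s*(%S+)%s+([\"'])(.-)%2%s*$")
    -- Assumption: empty prompts are dropped like unquoted lines.
    if preset and prompt ~= "" then
      out[#out + 1] = { preset = preset, prompt = prompt }
    end
  end
  return out
end

local reply = table.concat({
  "I could DELEGATE: this, but here is the plan.",
  'DELEGATE: fast "summarize the build log"',
  "DELEGATE: fast no quotes here",
}, "\n")
for _, d in ipairs(extract_delegate_lines(reply)) do
  print(d.preset, d.prompt)
end
```

Requiring the quoted shape is what lets prose mention delegation without triggering a dispatch.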
Dispatch flow (mirrors CMD: / CMD&: extraction):
1. ask_ai's stream completes
2. extract_delegate_lines walks the final response
3. For each {preset, prompt}: broker.chat(config.models[preset], ...)
synchronously; result is appended via ctx:append_exec_output as
"[delegate <preset>]: <result>"
4. The model sees the delegate result on its next turn
Implementation choice — marker over tool: option 1 from the issue
("inline delegate marker") works with any model regardless of
tool_calls support. Option 2 (aish_delegate as a tool dispatched in
the existing Phase 2 sub-loop) is the better UX for capable models
since it returns the result mid-turn — filed as follow-up if needed.
Meta surface:
:delegate <preset> <prompt> one-shot direct invocation (useful for
testing without depending on the model
emitting DELEGATE:, and as a manual
"ask <preset> something" verb)
Scope:
- Plan mode: emits "PLAN: DELEGATE <preset> <prompt>" without dispatch
- Norris: not extended; the planner's model anchor would conflict with
mid-plan switching (R-C3-adjacent risk)
- No self-delegation guard: each DELEGATE is a separate broker call,
not recursive; a delegate result reaching the next turn could
contain another DELEGATE but that's bounded by max_tool_depth-style
iteration cap on the parent
- No cost prompt: configuring a paid cloud preset already implies
consent to spend on it
- Unknown preset → error status + exec-output note "[delegate X failed:
unknown preset]"
Extractor unit-tested with 8 cases (single-quote, double-quote, multi-
line prose, empty prompt, no-quotes, case-sensitive, wrong prefix).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|