a3c1813465d28bec3aa6b43269721b93a1113cf5
148 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
a3c1813465 |
context: proactive periodic summarization (closes #101)
Closes #101 (FR-A from the 2026-05-17 German strategy analysis,
small-model improvement strategy 5: "History-Zusammenfassung via local").
Phase 5 summarize-on-evict only fires at budget pressure, exactly
when the local model is already suffering. Small models benefit from
tight context from turn 1, not "after eviction". This commit adds
CADENCE-triggered summarization that fires every N appends regardless
of budget, folding turns older than `summarize_keep_recent` into
ctx.summary via the existing Phase 5 summarize_fn closure.
context.lua additions:
- New ctx fields: summarize_every_n_turns, summarize_keep_recent
(default 4), _turns_since_summarize (counter).
- Context:append bumps the counter on every store.
- Context:enforce_cadence: the new entry point. Returns the number
of turns folded (0 on no-op). Guards:
* disabled (cfg unset OR summarize_fn unset) -> 0
* not yet due (_turns_since_summarize < N) -> 0
* Norris-active (Phase 5 R-C4 parity; planner stays on goal) -> 0
* #turns <= keep_recent (nothing to fold) -> 0
* summarize_fn returns nil/empty -> 0 (defer to enforce_budget later)
Orphan-tool guard: when the fold slice would end on an
assistant-with-tool_calls, peel back the right edge until the next
live turn isn't role=tool. Strict chat templates reject
tool-without-assistant-anchor (#87 already encountered this).
- If ctx.summary grows past max_summary_chars after the fold,
compress in a second pass (same shape as enforce_budget's Phase 5
logic).
repl.lua wiring:
- ctx_opts continues to copy all config.context keys; the new
summarize_every_n_turns / summarize_keep_recent fields flow through
automatically.
- make_summarize_fn is now wired when EITHER summarize_on_evict OR
summarize_every_n_turns is set (same closure, different trigger:
Phase 5's #51 eviction path uses it on budget; #101 uses it on
cadence).
- New status_cadence_fold helper: "[aish] proactively summarized N
older turns".
- ask_ai's existing enforce_budget call site now first fires
enforce_cadence, then enforce_budget. Cadence comes first so the
token estimate enforce_budget sees is the tighter post-fold one; no
spurious eviction of turns we just summarized.
- Norris path NOT wired: enforce_cadence is a no-op there via the
norris_active guard (consistent with Phase 5 R-C4).
18 inline unit cases for enforce_cadence:
- cfg disabled / no summarize_fn / below cadence -> 0
- cadence met -> exact fold count (N - keep)
- summary contains folded contents; first/last live turn IDs match
- cadence counter resets; second fold fires after another N appends
- Norris-active -> suppressed
- orphan-tool guard: peels back when last folded = asst+tool_calls
- summary compression triggers when over max_summary_chars
E2E verified on hossenfelder:8082, summarize_every_n_turns=4 /
summarize_keep_recent=2:
5 user turns -> 2 cadence fires:
[aish] proactively summarized 2 older turns
[aish] proactively summarized 4 older turns
:cost detail shows main=5 calls, summarize=2 calls (matches fires).
Estimated ctx token count: 180 (vs ~1000 unsummarized).
Flag-off path: no status, identical to pre-#101 behavior.
Regression: 87/87 safety, 31/31 router_model, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
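The guard order in a3c1813465 can be sketched roughly as follows. This is a simplified illustration, not the code in context.lua: field names come from the commit message, while the `summarize_fn(folded, previous_summary)` signature and the turn shape `{ role, content }` are assumptions, and the max_summary_chars second pass is left out.

```lua
-- Hypothetical sketch of a cadence-triggered fold; returns turns folded.
local function enforce_cadence(ctx)
  local n = ctx.summarize_every_n_turns
  local keep = ctx.summarize_keep_recent or 4
  if not n or not ctx.summarize_fn then return 0 end          -- disabled
  if (ctx._turns_since_summarize or 0) < n then return 0 end  -- not yet due
  if ctx.norris_active then return 0 end                      -- planner stays on goal
  if #ctx.turns <= keep then return 0 end                     -- nothing to fold

  -- Right edge of the fold slice: everything up to `last` gets folded.
  local last = #ctx.turns - keep
  -- Orphan-tool guard: the first live turn must not be role=tool.
  while last > 0 do
    local nxt = ctx.turns[last + 1]
    if nxt and nxt.role == "tool" then last = last - 1 else break end
  end
  if last == 0 then return 0 end

  local folded = {}
  for i = 1, last do folded[i] = ctx.turns[i] end
  local summary = ctx.summarize_fn(folded, ctx.summary)       -- assumed signature
  if not summary or summary == "" then return 0 end           -- defer to enforce_budget

  local live = {}
  for i = last + 1, #ctx.turns do live[#live + 1] = ctx.turns[i] end
  ctx.summary = summary
  ctx.turns = live
  ctx._turns_since_summarize = 0
  return last
end
```

With summarize_every_n_turns=4 and summarize_keep_recent=2, four plain turns fold two, which lines up with the first E2E status line above.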
|
|
c9009399d6 |
config: example block for cfg.memory.auto_summarize_on_quit (#102)
Documents the new opt-in keys (auto_summarize_on_quit, min_turns_for_summary, summary_model) inline with the existing Phase 4 memory block. Notes that the older summarizer_model key is still honored for back-compat. Config-only commit; no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
299719f4de |
repl: auto-summarize on :q into memory.jsonl (closes #102)
Closes #102 (FR-B from the 2026-05-17 German strategy analysis,
small-model improvement strategy 5: "History-Zusammenfassung via local").
Today the `:memory summarize` distill flow is a manual meta: users
have to remember to run it before quitting. This commit wires the
same flow into shutdown_session under an opt-in cfg flag, so the
local fast model can absorb each non-trivial session into the
persistent memory.jsonl without user burden. Next-session startup's
[background] block picks the new entries up automatically (Phase 4).
Implementation:
- Extract the `:memory summarize` body into _do_memory_summarize(opts).
opts.auto = true: skip the per-candidate readline keep?[y/N/edit]
loop and auto-add every parsed candidate (trust the model + the
explicit opt-in via cfg.memory.auto_summarize_on_quit).
opts.min_turns is the silent-no-op cutoff. Status messages
suppressed for fast-path no-ops so :q stays quiet on trivial
sessions.
- :memory summarize meta now one line:
_do_memory_summarize({ auto=false }).
- shutdown_session checks cfg.memory.auto_summarize_on_quit; if set,
pcall(_do_memory_summarize, { auto=true, min_turns=N }). pcall so a
broker failure NEVER blocks :q (memory is best-effort).
New config keys (all opt-in; default behavior unchanged):
    memory = {
      enabled = true,
      auto_summarize_on_quit = true,
      min_turns_for_summary = 5,  -- skip trivial sessions
      summary_model = "fast",     -- cfg.memory.summarizer_model is
                                  -- still honored for back-compat
    }
E2E verified on hossenfelder:8082 with qwen-coder-7b as summary_model:
3 user turns ("remember venus...", "remember mars...", "remember pluto..."):
:q -> "[aish] summarizing session for memory via fast ..."
-> "[aish] auto-summarize: added 3 memory items"
-> memory.jsonl gained 3 fact: entries (correctly extracted)
Below threshold (1 user turn, min=10):
:q -> silent, no broker call, no memory.jsonl change
Flag off (default behavior, 4 turns):
:q -> silent, identical to pre-#102 behavior
Regression: 87/87 safety, 31/31 router_model, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
cb37fa861a |
phase10: config.lua example for cfg.norris.{preplanner,executor}
Documents the new Phase 10 / #89 surface for users: when, why, and how to set cfg.norris.preplanner + .executor + .tasks_max. Notes the graceful single-model fall-back when preplanner is unset OR fails, and the design choice that preplan does NOT route via call_broker (retry would silently swap planning models). C5: config-only commit; no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
76a8f97009 |
repl: cloud preplanner + local executor split for Norris (closes #89)
Phase 10 C4 — the orchestration commit. Splits Norris autonomous
mode into a one-shot cloud preplan + per-step local executor flow,
with graceful fall-back to single-model Norris when preplan is
disabled or fails.
run_norris additions (in order):
1. R4 fix: clear ctx.norris_active/_goal/_tasks at the TOP so a
prior crashed Norris can't leak stale state into the new launch.
2. Preplan block (gated on cfg.norris.preplanner):
- Look up the preplanner preset in cfg.models; warn + skip if
absent.
- Build a system prompt asking for TASK: <imperative> lines
(R1: %d via string.format — gsub("N", ...) would corrupt
"No prose / commentary / numbering" to "16o prose").
- Scrub messages per the preplan model's redact policy; run
broker.chat (non-streaming, per Q-PP2) with category
"norris-preplan"; R7: respect pre_cfg.timeout_ms.
- On success: rehydrate; record usage via _record_usage;
extract_task_lines; cap to tasks_max; populate
ctx.norris_tasks = { current = 1, list = parsed }.
- On ANY failure (transport err / empty list / bogus preset):
status log + leave ctx.norris_tasks nil → single-model
fall-back. R3 design: NOT routed via call_broker; a fallback
retry would silently swap planning models which is worse
than a clean hard-fail.
3. Executor cfg resolution (independent of preplan per Q-PP1):
cfg.norris.executor names a preset → executor_cfg = that cfg.
Unset / missing preset → executor_cfg = active_cfg (existing
:model-selection behavior).
4. Loop body: pass executor_cfg (not active_cfg) to
safety.norris_step. After each "continue" result, advance
ctx.norris_tasks.current. When current > #list, exit with
synthesized status "tasks_complete" + reason "all N preplanned
tasks executed".
5. Exit cleanup: clear ctx.norris_tasks alongside the existing
norris_active/_goal clears so a re-launch starts fresh.
renderer.norris_end gains "tasks_complete" as a non-error status
(cyan, same as "done"). Distinct from "done" (executor said
GOAL: complete) — executor exhausted the plan but didn't confirm
goal, which is a clean exit, not an error.
E2E verified (preplanner=fast, executor=fast on hossenfelder:8082):
:norris print the date and the current uptime
→ preplanned 2 tasks via fast
→ ─ step 1/3 ─ Print the current date.
→ CMD: date → Sun May 17 ...
→ ─ step 2/3 ─ Print the current uptime.
→ CMD: uptime → ... up 1 day ...
→ NORRIS TASKS COMPLETE: all 2 preplanned tasks executed
:cost detail correctly shows two rows for the same model:
norris-preplan 1 calls, 95 / 12 tokens
norris 1 calls, 364 / 9 tokens
Fall-back verified:
cfg.norris.preplanner = "doesnotexist" →
"[aish] preplanner 'doesnotexist' is not in cfg.models;
running single-model" → Norris runs as Phase 6.
No-preplan path verified (no cfg.norris block):
Norris runs exactly as Phase 6, no behavior change.
Regression: 87/87 safety, 31/31 router_model, repl loads.
Closes #89.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
fa2cfc66ed |
safety: pass current task descr to render_step (Phase 10 C3)
helpers.render_step has supported a (step_n, max_n, descr) signature since Phase 3 (renderer.lua:246) but safety.norris_step has only ever called it with two args. Phase 10 lights up the descr slot: when ctx.norris_tasks is populated (cloud preplanner ran at :norris launch), the current task text becomes the per-step description so the user sees `─ step k/N ─ <task>` in real time. ctx.norris_tasks is nil when: - preplan disabled (cfg.norris.preplanner unset) - preplan failed (transport / parse / empty) - preplan emitted TASKs but already exhausted In all those cases descr falls through to nil → renderer prints just the step bar (Phase 3 behavior, no regression). Regression: 87/87 safety, 31/31 router_model, repl loads. No e2e visible change yet — ctx.norris_tasks is always nil until C4 wires the preplan call. R5 fix: this commit touches safety.lua ONLY (no repl.lua change as the prior plan implied). Executor cfg resolution + preplan wiring lands in C4 (next commit). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
477d8a76cc |
context: norris_tasks anchor + task-hint composition + reset clear
Phase 10 C2. Three additive changes; no regression.
- compose_norris_task_hint(self): module-scope helper. Returns ""
when norris_tasks is nil OR list empty OR current pointer past end.
Otherwise returns "\n\nCurrent step k/N:\n <task text>".
- Context:to_messages appends the hint AFTER the NORRIS suffix,
inside the existing `if self.norris_active and self.norris_goal`
branch. NORRIS_SUFFIX_TEMPLATE is UNCHANGED (R2 fix); the hint is a
separate concatenation. Goal anchor stays the primary per-step
instruction; task hint sharpens current focus.
- Context:reset() now clears self.norris_tasks (R6 fix). :reset is
unreachable mid-Norris (planner runs without readline prompt), but
if a Norris session crashed leaving stale state, :reset recovers
cleanly. One line; defensive.
15 unit cases verified:
- nil/empty/exhausted norris_tasks -> no hint block
- current=1/3 -> "Current step 1/3" + task text in output
- NORRIS suffix precedes hint (ordering preserved)
- hint suppressed when norris_active=false even if tasks set
- self.turns + self.norris_tasks table identity unmutated
- Context:reset clears norris_tasks AND turns
Regression: 87/87 safety, 31/31 router_model, repl loads.
C2 isn't called from anywhere yet (ctx.norris_tasks is always nil
until C4 wires the preplan call). No behavior change in the live
tree until then.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
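A minimal sketch of the task-hint helper described in 477d8a76cc. The return format follows the commit message as extracted here (the exact whitespace before the task text may differ in the real helper); the `{ current = k, list = {...} }` table shape is taken from the C4 commit. This is an illustration, not the code in context.lua.

```lua
-- Hypothetical sketch: returns "" unless a valid current task exists.
local function compose_norris_task_hint(ctx)
  local tasks = ctx.norris_tasks
  if not tasks or not tasks.list or #tasks.list == 0 then return "" end
  local k, n = tasks.current, #tasks.list
  if not k or k < 1 or k > n then return "" end  -- pointer past end: exhausted
  return string.format("\n\nCurrent step %d/%d:\n %s", k, n, tasks.list[k])
end
```

Returning an empty string rather than nil lets the caller concatenate the hint unconditionally after the NORRIS suffix.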
|
|
e4780483ad |
executor: extract_task_lines for Phase 10 preplan parsing
Pure function parallel to extract_cmd_lines, but more permissive to accommodate cloud-model output variation: tolerates leading whitespace (cloud often indents), tolerates extra whitespace after the colon, strips trailing whitespace. Strict on the literal "TASK:" prefix. Returns an array of trimmed strings; empty TASKs and non-TASK lines dropped silently. Callers cap the list size per cfg.norris.tasks_max. 10 inline unit cases verified: empty/nil, single TASK, mixed CMD+TASK (only TASKs returned), leading whitespace tolerated, empty-body TASKs dropped, trailing whitespace stripped, extra-spaces-after-colon AND no-space-after-colon both tolerated, prose interleaving (3 TASKs extracted from a realistic cloud response with intro+outro prose), TASK content with embedded quotes/punctuation preserved. Nothing in the tree calls this yet (Phase 10 C1 is the foundation commit; C4 lights it up). No regression. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
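The parsing rules listed in e4780483ad (leading whitespace tolerated, any amount of space after the colon, trailing whitespace stripped, empty TASKs dropped) can be sketched with a single anchored pattern. This is an illustrative reimplementation under those stated rules, not the function in executor.lua.

```lua
-- Hypothetical sketch of a permissive TASK: line extractor.
local function extract_task_lines(text)
  local tasks = {}
  if type(text) ~= "string" then return tasks end
  -- Append a newline so the final line is matched like every other line.
  for line in (text .. "\n"):gmatch("(.-)\r?\n") do
    -- Strict on the literal "TASK:" prefix; lenient on surrounding whitespace.
    local body = line:match("^%s*TASK:%s*(.-)%s*$")
    if body and body ~= "" then
      tasks[#tasks + 1] = body
    end
  end
  return tasks
end

-- Usage: prose and CMD lines are ignored, empty TASKs are dropped.
local plan = extract_task_lines(
  "Here is the plan:\n  TASK: print the date\nCMD: ls\nTASK:\nTASK:check uptime  \nDone.")
-- plan is { "print the date", "check uptime" }
```

Callers would still cap the result to cfg.norris.tasks_max, as the commit message notes.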
|
|
cbef05ff40 |
phase10: fold in Sonnet review — 2 blockers + 4 important + 2 nits
All 8 actionable findings accepted; R9-R11 were confirmations.
Blockers:
- R1: sys:gsub("N", ...) would corrupt "No prose / commentary /
numbering" → "16o prose" etc. Switch to %d + string.format.
- R2: §5 had a 2-slot NORRIS_SUFFIX_TEMPLATE redesign that
contradicted §11's "don't change the template; append helper
output after". §5 now shows the helper-append approach.
Important:
- R3: preplan bypasses call_broker (no fallback retry) — keep that
by design; retry would silently swap planning models. Documented
in §10 Risks so it doesn't get "fixed" later.
- R4: no pcall around run_norris → ctx.norris_active/_goal/_tasks
can leak across launches if a Norris step crashes. Fix: clear all
three at the TOP of run_norris before preplan. Cheaper than full
pcall wrap; handles the stale-tasks vector.
- R5: clarified C3 commit scope — safety.lua ONLY in C3; the
executor cfg resolution + preplan wiring lands in C4.
- R6: Context:reset() now also clears self.norris_tasks (defensive;
:reset is unreachable mid-Norris but one line is cheap).
Nits:
- R7: timeout_ms = pre_cfg.timeout_ms or 60000 (respect the
configured per-model timeout).
- R8: "Status:" → "Terminal output:" in §1 acceptance criterion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
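The R1 blocker is easy to reproduce in isolation. The prompt text below is a made-up stand-in for the real preplanner prompt; only the "No prose / commentary / numbering" fragment and the value 16 come from the commit message.

```lua
-- Placeholder substitution via gsub replaces EVERY capital "N" in the text.
local template_n = "Emit at most N tasks. No prose / commentary / numbering."
local corrupted = template_n:gsub("N", "16")
-- corrupted is "Emit at most 16 tasks. 16o prose / commentary / numbering."

-- A %d slot filled by string.format touches only the intended position.
local template_d = "Emit at most %d tasks. No prose / commentary / numbering."
local safe = string.format(template_d, 16)
-- safe is "Emit at most 16 tasks. No prose / commentary / numbering."
```

One caveat with the string.format approach: any literal percent sign in the prompt has to be written as `%%`.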
|
|
cb2f948e76 |
phase10: analyze + plan — answer Q-PP1..6, 5-commit roadmap
Analysis resolves 6 OQs from the formulate:
- executor cfg independent of preplanner cfg (Q-PP1)
- preplan non-streaming for v1 (Q-PP2)
- re-launch fires preplan again, naturally (Q-PP3)
- executor sees goal + current task (Q-PP4)
- :norris introspection out-of-scope v1 (Q-PP5)
- 1-task degenerate case runs as normal (Q-PP6)
Code-reading findings: safety.norris_step signature unchanged
(executor cfg flows in as model_cfg param); NORRIS_SUFFIX_TEMPLATE
stays stable (task hint appends after); renderer.norris_step already
accepts descr (just unused by safety.norris_step today).
Plan: 5 commits: executor / context / safety / repl /
config-and-memory. Each commit verifiable in isolation; the
orchestration lights up at C4 (repl preplan wiring); C5 documents.
Sonnet review next (per ~/.claude/projects/.../memory rule).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
a7cbe22d1d |
phase10: formulate manifest — cloud preplanner / local executor split
Resolves direction for #89. Splits Norris into two roles:
- Preplanner (cloud) fires ONCE at :norris launch; emits TASK: list.
- Executor (local) handles each TASK; existing HALT protocol intact.
ctx.norris_tasks anchor survives eviction (mirrors ctx.norris_goal).
Cost category 'norris-preplan' separates the cloud preplan call from
per-step executor cost in :cost detail.
Graceful fall-back when cfg.norris.preplanner is unset OR preplan
call fails: Norris runs as today (single-model). No regression for
existing users.
PHASE0 §11 amended to add Phase 10 row.
Manifest declares 6 Open Questions for analyze step; 12 design
decisions table; module-touch table; 4-pillar plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c55077bc07 |
context + repl + config: route-aware context compression (closes #87)
Small local models effectively use a fraction of their advertised
context window. Per-request compression for routes that hit a
local-compress-flagged model preset: keeps only the last N turns
and tail-truncates oversized content. Cloud routes get the full
context unchanged.
Changes:
- context.lua _compress_turns(turns, keep, max_chars): returns a
new list (self.turns NEVER mutated) with the last `keep` turns
preserved + content tail-truncated to `max_chars`. Defensive:
drops tool turns at the slice head (orphaned without their
assistant-with-tool_calls anchor — strict chat templates would
reject them; same gotcha PHASE0 §6 warned about for user/user).
- Context:to_messages(opts) — opts.compress = { keep_turns,
max_turn_chars } swaps the turn iterable for the compressed
view. Affects BOTH the use_tool_role=true path and the
use_tool_role=false fallback (PHASE2.md Q18 strict-template
workaround). Persistence + display via :history see the full
uncompressed ctx.turns.
- repl.lua ask_ai: when req_cfg (the routed model's cfg) has
`local_compress = true`, build compress_opts from
config.context.compress (defaults keep_turns=2, max_turn_chars=800).
Pass through ctx:to_messages alongside the existing
system_prompt_override (#86) — orthogonal opts that compose.
- Norris unaffected: safety.norris_step builds its own messages
array; the planner needs full history per PHASE3 design.
- config.lua gains a header comment explaining the per-model opt-in
+ the context.compress defaults block + the documented tool-turn
truncation trade-off.
13 unit cases verified:
- no opts -> full turn list (no regression)
- keep_turns=2 -> exactly last 2 emitted
- long content tail-truncated to max_chars
- self.turns unchanged after render
- orphan tool-turn at slice head dropped (no chat-template violation)
- tool turn included WITH its assistant anchor when keep_turns >= 3
E2E against live local broker:
- models.fast.local_compress = true; keep_turns=1; max=200
- 4-turn session: each broker call sees ONLY the current turn
(verified by short coherent CMD replies despite no cross-turn
memory available to the model). FR-promised small-model
friendliness in action; conversation continuity is the
documented trade-off.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
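A rough sketch of the per-request compression described in c55077bc07. It assumes turns are tables with `role` and `content` fields, and it reads "tail-truncated" as cutting off the tail and keeping the first `max_chars` characters; the real _compress_turns may differ on both points.

```lua
-- Hypothetical sketch: returns a NEW list; the input list is never mutated.
local function compress_turns(turns, keep, max_chars)
  local first = math.max(1, #turns - keep + 1)
  -- Drop tool turns at the slice head: their assistant anchor was cut away.
  while first <= #turns and turns[first].role == "tool" do
    first = first + 1
  end
  local out = {}
  for i = first, #turns do
    local copy = {}
    for k, v in pairs(turns[i]) do copy[k] = v end  -- shallow copy of the turn
    if type(copy.content) == "string" and #copy.content > max_chars then
      copy.content = copy.content:sub(1, max_chars)
    end
    out[#out + 1] = copy
  end
  return out
end
```

Because each turn is copied before truncation, persistence and `:history` can keep reading the full uncompressed list.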
|
|
74e4bffb37 |
broker + repl + safety: GBNF grammar-sampling passthrough (closes #88)
llama.cpp constrains the sampler to ONLY emit tokens matching a GBNF
grammar. For small models this kills format drift at the token level:
`CMD: <cmd>` is enforced by the sampler rather than hoped for via
prompt discipline.
Probe finding (this commit's pre-implementation): cloud (Anthropic
via Bedrock) silently IGNORES the `grammar` field and returns
normally via standard sampling. Default passthrough is safe for all
routes; no per-model opt-in/opt-out needed in v1.
Changes:
- broker.lua build_request: `if opts.grammar then req.grammar =
opts.grammar end`. Malformed grammar surfaces at request time via
the existing transport-error path.
- repl.lua ask_ai: `grammar_override =
config.routing.grammars[req_class]` (same gating shape as #86's
system_prompts override). Passed via opts.grammar in the call_broker
invocation.
- safety.lua is_destructive threads cfg.safety.probe_grammar through
opts.grammar so llm_probe constrains the YES/NO output. Skips the
regex-match dance entirely when the model can't drift.
Caller-provided opts.grammar takes precedence over cfg.
- config.lua gains two commented examples:
* routing.grammars per class
* safety.probe_grammar for the destructive probe
6 unit cases verified (stubbed curl.post_sse / broker.chat):
- default: no grammar in body
- opts.grammar -> body contains grammar JSON-encoded
- safety probe_grammar reaches llm_probe via opts
- no probe_grammar configured -> opts.grammar nil
- caller opts.grammar takes precedence over cfg.safety.probe_grammar
E2E against live local broker:
- `routing.grammars.default = "root ::= \\"ACK\\""` configured;
prompted "tell me a long story about a fox" -> model output EXACTLY
"ACK" (sampler forced; would normally produce paragraphs). Grammar
passthrough end-to-end confirmed.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
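The passthrough and precedence rules from 74e4bffb37 can be sketched as two small functions. The request shape and the `probe_opts` helper name are assumptions for illustration; only the `opts.grammar` passthrough line and the "caller wins over cfg.safety.probe_grammar" rule come from the commit message.

```lua
-- Hypothetical request builder: grammar is forwarded only when provided.
local function build_request(model_cfg, messages, opts)
  opts = opts or {}
  local req = { model = model_cfg.model, messages = messages }
  if opts.grammar then req.grammar = opts.grammar end
  return req
end

-- Hypothetical helper: caller-provided grammar takes precedence over cfg.
local function probe_opts(cfg, caller_opts)
  local opts = {}
  for k, v in pairs(caller_opts or {}) do opts[k] = v end
  if opts.grammar == nil and cfg.safety then
    opts.grammar = cfg.safety.probe_grammar
  end
  return opts
end

-- Usage: a YES/NO grammar for the destructive-command probe.
local cfg = { safety = { probe_grammar = 'root ::= "YES" | "NO"' } }
local req = build_request({ model = "fast" }, {}, probe_opts(cfg))
-- req.grammar is the YES/NO grammar; without cfg.safety it would be absent.
```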
|
|
047d629a66 |
context + repl + config: per-class system_prompt override (closes #86)
Small local models follow precise structured instructions better than
natural language. Per-routing-class system_prompt override gives them
tighter instructions for THAT request while preserving ambient context.
Changes:
- Context:to_messages(opts) — opts.system_prompt_override REPLACES
the base system_prompt for THIS render only (state unchanged).
Dynamic blocks ([background], [project], [earlier summary], NORRIS
suffix) still compose on top. opts is optional; nil-safe for old
callers.
- repl.lua ask_ai — captures req_class from router.classify_model
(already returned by Phase 5; previously discarded after the
status line). Looks up config.routing.system_prompts[req_class];
passes as opts.system_prompt_override to ctx:to_messages each
iteration of the tool-sub-loop.
- Gating: override fires only when routing.auto is on (no class ->
no override). If system_prompts[class] absent for a class, fall
through to the default system_prompt (no surprise).
- Norris unaffected: safety.norris_step builds its own messages
array; doesn't go through this path.
- config.lua gains a commented-out example showing
routing.system_prompts with the code/default examples from the FR body.
Smoke verified:
- 12-case context.lua unit test: opts nil/absent/present, override
replaces base, dynamic blocks still compose, state unchanged
after call, Norris-mode coexistence (suffix still present;
background still suppressed).
- E2E against cloud broker with routing.system_prompts.code set:
triple-backtick prompt -> code class -> override fires; model
emits terse code-only output. Non-code prompt -> default class
-> no override -> normal verbose-ish reply.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
df59ee2f2c |
config + docs/PHASE9: template comment + status -> Implement (Phase 9 commit #4)
config.lua header gains a Phase 9 paragraph documenting the
project-overlay feature + the R7 shallow-merge warning ("if your
.aish.lua sets a top-level block, it REPLACES the user's entire
block — list every entry OR omit the block"). Inspect at runtime
via `:config show`.
docs/PHASE9.md status header bumped: "Plan + review fold-in" ->
"Implement". Lists the 4 implement commits inline:
|
||
|
|
5b6ee553db |
repl: :config show meta + HELP (Phase 9 commit #3)
User-facing diagnostic for the project-overlay layer. Reads
config._sources (R3 cfg-embedded by main.lua's
load_config_with_overlay in commit #2) + the effective config;
surfaces which file contributed each top-level key.
:config show top-level keys + which source set each
(nested tables collapsed to inner-key list)
:config show full recursive dump with sensitive-key masking
Masking heuristic (any key containing token/secret/auth/key,
case-insensitive) -> "(set)" instead of the value. R6: applied
RECURSIVELY in full mode so the actual leak vector
(mcp.servers.<alias>.auth_token, models.<x>.auth_token) is caught.
Defensive depth cap (5) prevents pathological recursion.
When config._sources is absent (caller didn't go through
load_config_with_overlay), status: "(unknown — main didn't pass
_sources)"; the meta still runs, just labels source as "?".
N2 known cosmetic false-positive: `key_env` / `auth_env` config
fields hold env-var NAMES (not secrets) but match the heuristic.
Future polish exempts `*_env` patterns. Same for `token_budget`
(contains "token"): also masked despite being a plain number.
Acceptable; errs toward over-masking.
HELP gains 1 :config line.
E2E verified across 4 scenarios with AISH_TRUST_FILE + isolated HOME:
A. No project overlay: 6 user keys; nested tables collapsed.
`secrets` masked as (set) at top level.
B. Project overlay accepted: source map cleanly partitioned (user
has 4 keys; project has 2: default_model + models); each top-level
row tagged [user] or [project].
C. :config show full: nested dump; auth_token in models.cloud
correctly masked as (set); SECRET_VAL never appears in output
(grep count = 0).
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Commit #4 next: config.lua template comment + PHASE9.md status
header -> Implement.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
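The masking heuristic from 5b6ee553db, sketched as a recursive copy. The four substrings, the "(set)" placeholder and the depth cap of 5 come from the commit message; the function names and the "(max depth)" marker are assumptions made for this illustration.

```lua
-- Hypothetical sketch of recursive sensitive-key masking.
local SENSITIVE = { "token", "secret", "auth", "key" }

local function is_sensitive(name)
  local lower = tostring(name):lower()
  for _, needle in ipairs(SENSITIVE) do
    if lower:find(needle, 1, true) then return true end  -- plain substring match
  end
  return false
end

local function mask(value, depth)
  depth = depth or 0
  if type(value) ~= "table" then return value end
  if depth >= 5 then return "(max depth)" end  -- defensive recursion cap
  local out = {}
  for k, v in pairs(value) do
    if is_sensitive(k) then
      out[k] = "(set)"            -- the value itself is never copied
    else
      out[k] = mask(v, depth + 1)
    end
  end
  return out
end
```

As the commit notes, this errs toward over-masking: `token_budget` and `key_env` match the heuristic even though they hold no secrets.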
|
|
34b465d6dc |
main: project-overlay loader (Phase 9 commit #2)
Wires the project-overlay step around the existing load_config.
Activates only when a trusted .aish.lua is found in/above cwd.
Changes:
- _find_project_config() walks libc.getcwd() up to $HOME, returning
first .aish.lua found. R1 fix folded: proper-prefix check (`dir ==
home OR dir starts with home .. "/"`) avoids the false positive
where /home/user2 matches HOME=/home/user via byte prefix.
- _trust_file_path() resolves via $AISH_TRUST_FILE env override,
else ~/.aish/trusted-projects. Plan-time decision per N3.
- _check_and_maybe_prompt(project_path, history) — calls
history._sha256_file ONCE; routes through history.is_trusted; on
miss prompts via rl.readline; on accept persists via
history.add_trusted. A8 mitigation: if rl.readline fails to load,
decline silently (no io.read fallback that would consume stdin).
- load_config_with_overlay(opts):
* Calls existing load_config; seeds sources={k="user", ...}
* Walks for .aish.lua; if found:
- In opts.prompt mode (-p, R2): skip the prompt entirely;
only PRE-TRUSTED overlays load. Avoids io consuming the
piped stdin that -p will read for context.
- Else: interactive trust check + prompt.
* On accept + successful dofile: shallow-merge top-level keys
ONTO user config; update sources[k]="project" for overlapping.
* R3: embeds sources on cfg._sources for repl.lua's :config
show meta to read. No global.
* Returns (cfg, user_path, project_path | nil).
- main() now calls load_config_with_overlay; on project layer
active, emits the "[aish] project config: <path> (overlaid on
<user>)" status line per A4 (AFTER the user-config status).
E2E verified across 4 scenarios with AISH_TRUST_FILE + isolated HOME:
1. Decline -> overlay skipped; user config active.
2. Accept -> overlay loaded; project_model active; status line
"[aish] project config: ... (overlaid on ...)" visible.
3. Re-startup -> NO prompt (cached via sha); overlay loaded
transparently. R4 single-sha-call confirmed.
4. -p mode with untrusted overlay -> skipped silently; piped
stdin preserved for run_one_shot.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Commit #3 lands :config show + HELP next; commit #4 the config
template comment + status -> Implement.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
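The R1 proper-prefix check from 34b465d6dc, shown next to the naive byte-prefix comparison it replaces. The function name is hypothetical, and the sketch assumes `home` carries no trailing slash.

```lua
-- Hypothetical sketch: true only when dir is home itself or lies beneath it.
local function is_within_home(dir, home)
  if dir == home then return true end
  return dir:sub(1, #home + 1) == home .. "/"
end

-- The naive check treats /home/user2 as being inside /home/user.
local function naive_prefix(dir, home)
  return dir:sub(1, #home) == home
end
```

Requiring the path separator after the prefix is what stops a sibling directory with a longer name from passing the boundary test.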
|
|
e525063df3 |
history: trust file helpers for Phase 9 (commit #1)
Foundation for the project-overlay trust mechanism. No callers yet;
commit #2 wires main.lua to use these.
Three new functions:
history._sha256_file(path) -> hex digest or nil
Shells `sha256sum`; parses first whitespace-separated field;
validates 64-hex-char length. nil on any failure (path missing,
binary missing, file unreadable). Caller treats nil as "skip the
trust path" and never crashes.
history.is_trusted(trust_path, project_path, sha256) -> bool
Reads trust_path as JSONL; returns true iff an entry exists matching
BOTH project_path AND sha256. Missing / corrupt / unreadable trust
file -> false (re-prompt). Per-line JSON decode means partial-write
corruption affects at most one line.
history.add_trusted(trust_path, project_path, sha256) -> bool
mkdir -p parent; append JSONL line {path, sha256, ts (ISO)}; chmod
600 the trust file (best-effort; ignore failure). Single writer per
call; append-only.
11 unit cases verified:
- sha256 known value matches manual `sha256sum`
- nil / missing-file -> nil (no crash)
- is_trusted on missing trust file -> false
- add_trusted + is_trusted roundtrip works
- Different sha -> not trusted (content-binding)
- Different path -> not trusted
- Multi-entry trust file: each entry independently checked
- chmod 600 verified via stat
Regression: test_safety 87/87, test_router_model 31/31.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
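The output-parsing half of history._sha256_file can be sketched as a pure function. It covers only the "first whitespace-separated field, 64 hex characters" validation from e525063df3; the shell-out to `sha256sum` is omitted, and the function name is hypothetical.

```lua
-- Hypothetical sketch: extract and validate the digest from one output line.
local function parse_sha256sum_output(line)
  if type(line) ~= "string" then return nil end
  local digest = line:match("^(%S+)")  -- first whitespace-separated field
  if not digest or #digest ~= 64 then return nil end
  if digest:find("%X") then return nil end  -- any non-hex character rejects
  return digest:lower()
end

-- Usage: the SHA-256 of empty input, in sha256sum's "<digest>  <path>" format.
local empty = parse_sha256sum_output(
  "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  /dev/null\n")
```

Returning nil for anything unexpected matches the commit's stance that a failed digest means "skip the trust path" rather than an error.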
|
|
e796142a23 |
docs/PHASE9: review fold-in — 0 BLOCKERs + 7 CONCERNs + 5 NITs
Sonnet review of PHASE9 (formulate + analyze + baseline + plan at
|
||
|
|
31e5de5ad5 |
docs/PHASE9: analyze + baseline + plan (single bundled commit)
Bundled the three doc steps since the surface is small (4-commit
impl, no major redesigns from formulate).
Analyze findings (12, A1-A12):
A1-A2 — main.lua surface clean; no new FFI needed
A3 — Q-P2 RESOLVED via baseline: sha256sum (GNU coreutils)
A4 — Q-P1: trust prompt AFTER user-config status line
A5 — Q-P3: don't log walk-up by default; :config show on demand
A6 — Q-P5: :cfg show top-level by default; `full` for deep
A7 — Q-P6: project may set secrets.vault (covered by trust prompt)
A8 — Q-P4 DEFERRED: rl.readline early-startup smoke at impl time
A9 — walk-up perf <1ms even pessimistic
A10 — trust-file race: JSONL append-only handles concurrent writes
A11 — sandboxed dofile out of scope (trust prompt IS the gate)
A12 — bootstrap order is correct: user→project→secrets_session
Baseline:
B1 — sha256sum + openssl agree byte-for-byte on noether;
sha256sum chosen (universal + simpler parse).
§10 Open Qs table now shows resolutions inline (5/6 done; Q-P4
deferred to implement-time smoke).
§13 Implementation Plan added — 4 commits:
1. history.lua: trust file helpers (read/add/is_trusted + _sha256_file)
2. main.lua: walk-up + load_config_with_overlay + trust prompt
3. repl.lua: :config show meta + startup status line
4. config.lua header note + status -> Implement
Per-commit risk index covers sha256sum-missing case, JSONL partial
write, A8 rl.readline early-startup, symlink-loop walk-up,
:config show token leakage via conservative masking heuristic.
Open at plan-time (resolve at impl):
- A8 rl.readline behavior; fall back to io.read if broken
- $AISH_TRUST_FILE env override for CI isolation
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
4f5c3aeba9 |
docs/PHASE9: formulate — project-local config overlay (.aish.lua)
Phase 9 formulate manifest + PHASE0 §11 amendment (adds Phase 9 row)
+ PHASE0 §10 amendment (config resolution order now references Phase
9's overlay step). Substrate-touch lands same commit per CLAUDE.md §3.
Four pillars:
1. .aish.lua walk-up from cwd; stops at $HOME or filesystem root.
First found file becomes the project layer. Absence = no-op.
2. Shallow merge over user config: project top-level keys REPLACE
user keys. Predictable; deep merge surprises with array/table
semantics. Users compose full blocks explicitly.
3. Trust prompt + sha256-pinned persistence in
~/.aish/trusted-projects (JSONL, mode 0600). First encounter prompts; subsequent
startups load only if recorded sha matches. Content change ->
re-prompt. Matches direnv-allow security posture.
4. :config show meta — lists each source path with the top-level
keys it contributed + sanitized effective config dump
(token-bearing fields masked).
Key design decisions documented:
- Trust mechanism is explicit (not default-trust-all-cwds) —
.aish.lua runs arbitrary Lua via dofile; hostile cloned-repo
case is a real concern.
- $HOME boundary on walk-up — don't search /tmp or /. Repos
outside $HOME get no project layer.
- Reload on cd: NO. Config resolved at startup only.
- sha256 via shelled `sha256sum` (widely available, though not
POSIX-specified; avoids vendoring a Lua impl).
§9 risk table covers: hostile repo (trust prompt), corrupted trust
file (best-effort skip), updated repo (sha mismatch re-prompts),
dofile errors (pcall-protected), walk-up safety ($HOME boundary).
6 open questions for analyze:
Q-P1 — trust prompt before/after startup status
Q-P2 — sha256sum vs openssl dgst (baseline)
Q-P3 — log walk-up path?
Q-P4 — rl.readline safe at startup?
Q-P5 — :config show full vs top-level
Q-P6 — project-set secrets.vault security
Scope confirmed via AskUserQuestion: project-local overlay (chosen
over cost preflight enforcement and cross-session cost persistence,
both deferred as Phase 10 candidates per §11).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
08dba69fce |
config + docs/PHASE8: example block + status -> Implement (Phase 8 commit #5)
config.lua:
- Commented-out `tokenize = { use_endpoint = true }` block with
parity to the Phase 1-7 example blocks.
- Documents the two consequences: (1) per-turn network cost
(~30ms first time, cached after) and (2) token_budget is now
actually enforced — sessions that fit under char/4 may evict
earlier under accurate counts.
- Notes cloud /tokenize 404 fallback path.
docs/PHASE8.md:
- Status header bumped: "Plan + review fold-in" -> "Implement"
- Lists the 5 implement commits inline for traceability:
|
||
|
|
94b7d86926 |
repl: wire tokenize_fn + :cost detail estimate row (Phase 8 commit #4)
Activates Phase 8 pillars 2+3+5 end-to-end and adds the R3-revised
:cost detail trailing line.
Changes:
- When cfg.tokenize.use_endpoint is true, ctx_opts.tokenize_fn is
set to `function(text) return broker.token_count(active_cfg, text) end`
before Context.new fires. R4: the closure body references
active_cfg DIRECTLY (upvalue) — Lua resolves upvalues at call
time, so subsequent :model switches re-route to the new model's
tokenizer automatically (verified by E2E: :model cloud after the
fast call still produces clean estimate row).
- :cost detail gains a trailing line per R3:
estimated session ctx: <N> tokens; token_budget=<M> (X.Y% used)
N comes from ctx:estimate_tokens() (current in-memory snapshot,
NOT a comparison against the accumulator sum above which is
cumulative across calls + evicted turns). Gives at-a-glance
budget utilization.
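A small sketch of the R4 point, assuming a `:model` switch reassigns the `active_cfg` variable (that mechanism is an assumption here). `broker.token_count` is replaced by a stub so the example runs standalone.

```lua
-- Stub standing in for broker.token_count; it just reports which model
-- config it was handed, plus the text length.
local broker = {}
function broker.token_count(model_cfg, text)
  return model_cfg.model .. ":" .. #text
end

local active_cfg = { model = "fast" }

-- Correct pattern: the closure refers to the active_cfg VARIABLE (an upvalue),
-- so it sees whatever value the variable holds when the closure is called.
local tokenize_fn = function(text)
  return broker.token_count(active_cfg, text)
end

-- Wrong pattern: copy the current value into another local first. The
-- closure then keeps pointing at the table that was active at build time.
local frozen = active_cfg
local stale_fn = function(text)
  return broker.token_count(frozen, text)
end

-- What a :model switch amounts to in this sketch.
active_cfg = { model = "cloud" }

local followed = tokenize_fn("hello")  -- routed to the new model config
local stale    = stale_fn("hello")     -- still routed to the old one
```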
E2E verified against live broker:
- fast model call -> 168 tokens estimated (real BPE via /tokenize)
- :model cloud + cloud call -> 178 tokens estimated (closure
follows :model switch correctly per R4)
- 21% / 22.3% budget utilization shown
- Accumulator sums and estimate are intentionally different
(sums are cumulative, estimate is current snapshot) — R3-
correctly displayed as separate lines
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
With this commit landed, Phase 8 is functionally complete; commit
#5 is config example + status bump.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
db26d0ccb7 |
context: enforce_budget honors token_budget + R2 guard (Phase 8 commit #3)
Pillar 5 (analyze finding A1): the real value-add of Phase 8. Until
now, ctx.token_budget = 4096 was set but never enforced; enforce_budget
only looked at max_turns. With commit #2's accurate tokenization wired
in (via commit #4), eviction now finally fires when the actual context
fills the budget.
Loop condition change:
before: while #self.turns > self.max_turns do
after: while (#self.turns > self.max_turns
or self:estimate_tokens() > self.token_budget)
and #self.turns > 0 do
R2 guard: the `and #self.turns > 0` clause is essential. When
system_prompt alone exceeds token_budget (e.g. a 5000-token [project]
block with token_budget=4096), the OR-condition stays true even when
turns are empty; table.remove on a 0-length list would no-op forever
while evicted++ spins. Sonnet review caught this; without the guard,
real users could hit an infinite loop just by setting a small
token_budget + opening a large project tree.
Per-pair eviction logic (summarize callback + pair-pop) inside the
loop is unchanged. The estimate_tokens call is potentially expensive
under tokenize_fn; commit #2's per-turn cache amortizes to O(N) per
iteration after first fill; for max_turns=40 + budget=4096 sessions
the worst case is microseconds per call.
Unit-verified across 5 cases (with and without tokenize_fn):
1. max_turns eviction unchanged (no behavior regression).
2. char/4 path: tight budget evicts to 0 when sys > budget, exits via
R2 guard.
3. char/4 path: practical budget evicts to a stable count.
4. tokenize_fn stub: evicts to exactly the (budget - sys)/per-turn
count.
5. R2 critical: zero turns + oversize sys -> immediate exit,
evicted=0, no spin.
Behavior change for existing users: a session that fit under
token_budget=4096 by char/4 (~16K chars) may now evict earlier because
accurate counts are HIGHER for most natural-text inputs (per baseline
B2). Users on cloud presets with very large context windows (Claude
200K) should raise token_budget to match; see §9 risk row in PHASE8.md.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
8502517021 |
context: tokenize_fn + per-turn _tokens cache (Phase 8 commit #2)
Foundation for accurate Context:estimate_tokens. When the optional
tokenize_fn is wired (Phase 8 commit #4 wires it from repl.lua),
estimate_tokens uses it with per-turn caching for O(1) amortized cost.
char/4 path unchanged when tokenize_fn nil.
Changes:
- Context.new accepts opts.tokenize_fn -> stored as self.tokenize_fn.
- Context:estimate_tokens:
if tokenize_fn nil -> existing char/4 (no behavior change).
if tokenize_fn set ->
- tokenize self.system_prompt every call (dynamic per
compose_background/project/summary; can't cache).
- for each turn: if t._tokens nil -> compute + cache; else use
cached. Turn content immutable after append (we never mutate
stored turns) so cache never goes stale.
- :reset wipes self.turns which takes the _tokens cache with them;
new turns start with t._tokens == nil and lazy-set on first count.
8/8 unit cases verified:
- char/4 path unchanged when no tokenize_fn
- tokenize_fn called 1 + N times on first estimate (sys + N turns)
- subsequent estimates fire only 1 tokenize call (sys; turns cached)
- new turn fires +1 tokenize call on next estimate
- :reset + fresh turn fires fresh tokenize call (cache died with turn)
No callers wire tokenize_fn yet; Phase 8 commit #4 lands the repl.lua
wiring (after commit #3 adds the enforce_budget extension that's the
real beneficiary of accuracy).
Regression: test_safety 87/87, test_router_model 31/31.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
7ef2a6ed5c |
broker: token_count + endpoint capability cache (Phase 8 commit #1)
Foundation for Phase 8 — accurate tokenization via <endpoint>/tokenize
where supported, char/4 fallback otherwise.
Changes:
- `M.token_count(model_cfg, text)`:
Empty text -> 0.
No endpoint -> char/4 immediately.
Capability cache says false -> char/4.
Otherwise -> POST `<endpoint>/tokenize` with `{content, model}`,
2s timeout. On 200 + parseable `{tokens=[...]}`: cache true,
return #tokens. Anything else (non-200 / parse-fail / transport
err / timeout): cache false, char/4.
- `_tokenize_capable` cache keyed by ENDPOINT ONLY per R6 — B1
confirmed /tokenize ignores the model field, so same-endpoint
presets share one cache entry. If a future broker honors the
model field, revisit.
- `M.tokenize_supported(model_cfg)`: returns nil/true/false for
the cached state (introspection for tests + future :tokenize meta).
- `M._reset_tokenize_cache()`: test hook so the session-local cache
doesn't leak between test runs sharing a LuaJIT VM.
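The decision ladder in the Changes list can be sketched as follows. This is an illustration under assumptions, not the broker.lua code: the third `post_fn` parameter is a hypothetical injected transport (returning `status, decoded_table`) so the sketch runs without a network, the 2s timeout is omitted, and char/4 is taken as floor division because that reproduces the 901-char -> 225 figure quoted in this commit.

```lua
local M = {}
local _tokenize_capable = {}  -- endpoint -> true/false; nil = not probed yet

local function char4(text) return math.floor(#text / 4) end

-- post_fn(url, body_table) -> status, decoded_table   (hypothetical transport)
function M.token_count(model_cfg, text, post_fn)
  if text == nil or text == "" then return 0 end
  local endpoint = model_cfg and model_cfg.endpoint
  if not endpoint or _tokenize_capable[endpoint] == false then
    return char4(text)                      -- no endpoint, or known-unsupported
  end
  local ok, status, doc = pcall(post_fn, endpoint .. "/tokenize",
                                { content = text, model = model_cfg.model })
  if ok and status == 200 and type(doc) == "table"
     and type(doc.tokens) == "table" then
    _tokenize_capable[endpoint] = true      -- cache keyed by endpoint only (R6)
    return #doc.tokens
  end
  _tokenize_capable[endpoint] = false       -- any failure: stop probing
  return char4(text)
end

function M.tokenize_supported(model_cfg)
  return _tokenize_capable[model_cfg.endpoint]
end

function M._reset_tokenize_cache()
  _tokenize_capable = {}
end

-- Fake transports for a dry run.
local calls = 0
local function fake_ok(url, body)  calls = calls + 1; return 200, { tokens = { 11, 22 } } end
local function fake_404(url, body) calls = calls + 1; return 404, nil end

local good = { endpoint = "http://good.example", model = "m" }
local bad  = { endpoint = "http://bad.example",  model = "m" }
local n_good = M.token_count(good, "hello world", fake_ok)
local n_bad  = M.token_count(bad, string.rep("x", 901), fake_404)
```

Once an endpoint is cached as unsupported, later calls skip the transport entirely and fall straight through to char/4.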
Live verified against hossenfelder + a deliberately-broken endpoint:
- "hello world" -> 2 tokens (matches manual curl probe)
- 901-char text -> 201 real tokens vs 225 char/4 (24-token gap;
real is LOWER here, opposite direction from the README probe
where it was higher — confirms heuristic is inaccurate in both
directions)
- Pre-probe: tokenize_supported() returns nil
- Post-probe: tokenize_supported() returns true (local) / false (broken)
- Broken endpoint second call: still char/4, no re-probe
- Empty / nil text edge cases handled
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
467e573d24 |
docs/PHASE8: review fold-in — 2 BLOCKERs + 4 CONCERNs + 4 NITs
Sonnet-reviewed per reviews-use-sonnet memory directive.
BLOCKERs (RESOLVED in-place):
R1. §5 estimate_tokens pseudocode missing per-turn cache pattern.
Prose described it; code block called tokenize_fn unconditionally.
Implementer following code verbatim would hit the O(N round-
trips per call) perf gap the prose flagged. Code block now
shows explicit `if t._tokens then ... else t._tokens = ... end`.
R2. enforce_budget loop can spin forever when system_prompt alone
exceeds token_budget (e.g. 5KB project block + budget=4096 +
zero turns -> turns can't shrink further but OR-condition stays
true). Fix: AND `#self.turns > 0` guard on the loop. §13 commit
3 row shows the explicit Lua-syntax condition.
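Both blockers can be seen together in a compact sketch. It assumes a stripped-down Context (no summarize callback, and single-turn eviction instead of the real per-pair logic); field and method names follow the ones used in these notes, everything else is illustrative.

```lua
local Context = {}
Context.__index = Context

function Context.new(opts)
  opts = opts or {}
  return setmetatable({
    turns         = {},
    system_prompt = opts.system_prompt or "",
    max_turns     = opts.max_turns or 40,
    token_budget  = opts.token_budget or 4096,
    tokenize_fn   = opts.tokenize_fn,
  }, Context)
end

function Context:append(role, content)
  self.turns[#self.turns + 1] = { role = role, content = content }
end

function Context:estimate_tokens()
  if not self.tokenize_fn then              -- char/4 path, no caching
    local chars = #self.system_prompt
    for _, t in ipairs(self.turns) do chars = chars + #t.content end
    return math.floor(chars / 4)
  end
  local total = self.tokenize_fn(self.system_prompt)  -- recomputed each call
  for _, t in ipairs(self.turns) do
    if t._tokens then                       -- R1: reuse the per-turn cache
      total = total + t._tokens
    else
      t._tokens = self.tokenize_fn(t.content)
      total = total + t._tokens
    end
  end
  return total
end

function Context:enforce_budget()
  local evicted = 0
  while (#self.turns > self.max_turns
         or self:estimate_tokens() > self.token_budget)
        and #self.turns > 0 do              -- R2: nothing left to evict -> stop
    table.remove(self.turns, 1)             -- simplified: oldest single turn
    evicted = evicted + 1
  end
  return evicted
end

-- Oversized system prompt: 50 "tokens" against a budget of 10.
local tokenize_calls = 0
local ctx = Context.new({
  system_prompt = string.rep("s", 50),
  token_budget  = 10,
  tokenize_fn   = function(text) tokenize_calls = tokenize_calls + 1; return #text end,
})
ctx:append("user", "hello")
local evicted = ctx:enforce_budget()        -- terminates thanks to the guard
```

Without the `#self.turns > 0` clause the loop condition stays true once the turn list is empty, which is the spin R2 describes.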
CONCERNs (FOLDED):
R3. :cost detail per-slot ~est=N annotation was semantically
undefined — accumulator sum (cumulative across calls + evicted
turns) vs current-snapshot estimate are incommensurable. §6
reworked: ONE trailing summary line "[estimated session ctx:
N tokens; token_budget=M (X% used)]" instead of per-slot
annotations. §13 commit 4 aligned.
R4. tokenize_fn closure MUST reference active_cfg as upvalue (NOT
capture by value). Subtle but easy to miss — §13 commit 4 now
spells out the correct vs wrong patterns explicitly.
R5. 2s tokenize timeout can spuriously cache-as-unsupported when
llama.cpp is busy with a concurrent completion (single-threaded
inference; /tokenize queues behind). Documented in §9; v1
ships 2s, revisit during verify if it bites.
R6. Per-endpoint cache key conflated two same-endpoint/different-
model presets (B1: /tokenize ignores the model field). Cache
key simplified to endpoint-only. One probe per endpoint per
session; if a future broker honors the model field, revisit.
NITs (APPLIED):
N1. §13 commit 3 `OR`/`AND` -> Lua-syntax `or`/`and`.
N2. §10 Q-T5 Resolution-target cell filled in (was blank after B1).
N3. §6 / §8 / §13 commit 4 now describe a CONSISTENT approach
(trailing summary line; per-slot annotation dropped).
N4. Status header tree-hash updated to current (
|
||
|
|
aa64ad3eec |
docs/PHASE8: plan — §13 commit roadmap (5 commits)
Status: Analyze -> Plan. All open Qs resolved (Q-T5 via baseline B1).
5-commit roadmap, bottom-up:
1. broker.lua — M.token_count helper + per-endpoint capability
cache. <endpoint>/tokenize probe with 2s timeout; cache true/false
per (endpoint, model) for the session. char/4 fallback on any
non-200 / parse-fail / transport err. M.tokenize_supported
introspection helper.
2. context.lua — Context.new accepts opts.tokenize_fn; estimate_
tokens widens to use it when set, with per-turn `_tokens` cache.
char/4 path unchanged when tokenize_fn nil.
3. context.lua — enforce_budget consults token_budget too (pillar
5 from A1). Loop condition: turns>max_turns OR estimate_tokens
>token_budget. Existing summarize-on-evict callback unchanged.
4. repl.lua — wire tokenize_fn when cfg.tokenize.use_endpoint=true.
Closure captures active_cfg upval (A5 — follows :model switches
naturally). :cost detail extension: trailing line showing
estimated session ctx tokens for comparison with the per-slot
prompt_tokens sums in the accumulator.
5. config.lua commented `tokenize = { use_endpoint = true }`
example + PHASE8.md status -> Implement.
Per-commit risk index covers: probe latency cap (2s, one-shot),
per-turn cache correctness (immutable post-append), enforce_budget
performance (O(N) per call after cache fill), and the intentional
behavior change of token_budget actually being enforced (sessions
fitting under char/4 may evict earlier under accurate counts —
documented in §9).
Two items open at plan, resolve at implement:
- exact :cost detail layout for estimated session ctx row
- whether to add a :tokenize debug meta (defer unless useful in verify)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
79bd40db79 |
docs/PHASE8-baseline: live /tokenize probes
Four findings, all align with formulate/analyze:
B1. /tokenize IGNORES the `model` request field — returns the
tokenization of whichever model is currently loaded on the
proxy backend, NOT the requested model. Acceptable: a real BPE
count is still much better than char/4, and the gap between
Qwen/Llama tokenizers is small. Cloud (OpenRouter) 404s
regardless, so cloud falls back to char/4 via the capability
cache.
B2. Latency 23-34ms per call, FLAT across input sizes 50-5000 chars.
Network round-trip dominates. Per-turn _tokens cache amortizes
to O(1); worst case 40 uncached turns × ~30ms = 1.2s one-time
cost on first enforce_budget call. Acceptable.
B3. Response shape confirmed: `{"tokens":[N1,N2,...]}` (token IDs;
we use #response.tokens for count, discard the IDs). JSON not
SSE; ffi.curl.M.post is the right call.
B4. Cloud /tokenize 404s as expected. Capability cache marks it
unsupported on first probe; char/4 fallback silent thereafter.
No design change.
Q-T5 RESOLVED per B1. All open questions now resolved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
1a136d81b7 |
docs/PHASE8: analyze — adds pillar 5 (enforce_budget honors token_budget)
Status: Formulate -> Analyze. 12 findings (A1-A12); 5/6 open Qs
resolved in-place (Q-T5 deferred to baseline).
MAJOR FINDING:
A1. enforce_budget ONLY checks max_turns, NOT token_budget — even
with accurate tokenization, eviction decisions are unaffected.
The new estimate_tokens() would just feed the prompt template
display. Pillar 5 added: enforce_budget evicts when EITHER
max_turns OR token_budget is exceeded. This is the real
motivation for accurate tokenization.
Other findings:
A2. ffi.curl.M.post signature confirmed (body, status) / (nil, err).
A3. Single caller of estimate_tokens today; enforce_budget becomes
the second (more frequent) caller — per-turn _tokens cache
becomes important.
A4. Q-T1: cache lives on turn dict; dies with turns on :reset.
A5. Q-T2: closure captures active_cfg upval; follows :model switch
naturally.
A6. Q-T3: opt-out skips the probe entirely (no wiring).
A7. Q-T6: tools-schema tokens deferred to follow-up (fixed per
session; under-count bounded).
A8. _tokens cache invalidation: only :reset; turn content is
immutable after append.
A9. Probe latency ~50ms/call locally; per-turn cache amortizes to
O(1) after first count.
A10. estimate_tokens called OUTSIDE streaming callback; no race.
A11. role:"tool" turns tokenize identically; per-turn cache works.
A12. include_usage (Phase 7) and tokenize (Phase 8) are orthogonal —
different endpoints, different code paths.
§1 expanded to 5 pillars (pillar 5 = enforce_budget extension).
§3 context.lua row updated to reference the enforce_budget change
+ per-turn _tokens cache. §9 risk row added: accurate counts mean
the default token_budget=4096 is finally ENFORCED — sessions that
spilled silently under char/4 may now evict earlier.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
00869ba412 |
docs/PHASE8: formulate — accurate tokenization (resolves Q1)
Phase 8 formulate manifest + PHASE0 §11 amendment to add the Phase 8
row (substrate amendment per CLAUDE.md §3 lands same commit).
Four pillars:
1. Per-endpoint /tokenize probe (cached). One round-trip on first
call per (endpoint, model); capability cached for session.
hossenfelder + llama.cpp expose <endpoint>/tokenize (NOT /v1/
tokenize — per real probe; the path is endpoint-local, not
under the OpenAI /v1 prefix). Cloud (OpenRouter) 404s — silent
char/4 fallback.
2. broker.token_count(model_cfg, text) — thin wrapper; tries probe,
falls back to char/4 on miss. Always returns non-negative int;
never errors. 2s tight timeout; failures cache as not-supported.
3. Context:estimate_tokens widened. Accepts optional tokenize_fn at
Context.new; uses it when present, char/4 otherwise. repl.lua
wires `tokenize_fn = function(text) return broker.token_count(
active_cfg, text) end` when cfg.tokenize.use_endpoint = true.
Per-turn _tokens cache to amortize across estimate calls.
4. :cost detail est-vs-actual annotation. When the heuristic
disagrees with the actual prompt_tokens from broker usage by
>10%, show `~est=N`. Silent otherwise. Display-only; no
behavior change.
Resolves Q1 (PHASE0 §13, originally Phase 3) — replace char/4
heuristic on Context:estimate_tokens. Originally targeted at Phase 3
but deferred forward each iteration; now lands.
Baseline already observed during formulate:
- /v1/tokenize -> 404 on hossenfelder; /tokenize -> works
- Body shape: {content: "..."} returns {tokens: [N1, N2, ...]}
- Accuracy gap: char/4 UNDERESTIMATES by ~10% on real code/prose
(508 vs 558 on a 2KB README sample). Material for context-
budget eviction decisions.
Doc covers scope + done-when, tech decisions table, module changes,
per-pillar deep dives, UX surface, out of scope, 6 risk rows, 6
open questions (Q-T4/T5 baseline-bound, others analyze-bound).
Scope confirmed via AskUserQuestion: tokenization (chosen over
cross-session cost persistence and hard rate-limit enforcement).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
1f34b6dce8 |
config + docs/PHASE7: example block + status -> Implement (Phase 7 commit #6)
R9-resolved single-owner of the status bump (commit #5 didn't touch PHASE7.md). N5: PHASE0 §11 amendment landed in commit |
||
|
|
0d6ff93134 |
repl: :cost meta surface (Phase 7 commit #5)
User-facing reporter of the per-session accumulator. Three shapes:
:cost one-line summary (calls / tokens / cost)
:cost detail per-model + per-category breakdown
:cost reset zero the meter; clears warn flags
All read-only against ctx.usage_totals; no broker calls.
R6 — annotation uses the per-slot is_local sticky flag, NOT a fragile
cost==0 heuristic. Summary line classifies:
cloud only -> "cost=$X.XXXXXX"
cloud + local mix -> "cost=$X.XXXXXX (cloud only; local: tokens
but no cost field)"
local only -> "cost=$X.XXXXXX (local only; no cost field)"
R7 — :cost detail rows sort by (cost desc, model asc, category asc).
Three-level key for deterministic output across equal-cost rows
(table.sort is unstable; identical costs would otherwise reorder).
R10 — all dollar values use $%.6f formatting. Sub-cent precision is
critical: a Haiku call can cost $0.000028; $%.4f would round it to
$0.0000 — indistinguishable from local $0.
Column width widened to %-26s to fit fully-qualified cloud model
names (e.g. "anthropic/claude-haiku-4.5" = 26 chars).
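The R7 sort key and the R10 format can be sketched like this. The row shape (`model`, `category`, `cost`) and the extra zero-cost `summarize` row are assumptions added to show the tie-break; the dollar figures are the ones from the E2E run in this commit.

```lua
-- Hypothetical row shape for :cost detail.
local rows = {
  { model = "qwen-coder-7b-snappy-8k",    category = "summarize", cost = 0 },
  { model = "anthropic/claude-haiku-4.5", category = "delegate",  cost = 0.000030 },
  { model = "qwen-coder-7b-snappy-8k",    category = "main",      cost = 0 },
  { model = "anthropic/claude-haiku-4.5", category = "main",      cost = 0.000219 },
  { model = "anthropic/claude-haiku-4.5", category = "probe",     cost = 0.000128 },
}

-- Three-level key: cost desc, then model asc, then category asc.
-- table.sort gives no ordering guarantee for equal elements, so the
-- comparator itself has to break every tie.
local function by_cost_model_category(a, b)
  if a.cost ~= b.cost then return a.cost > b.cost end
  if a.model ~= b.model then return a.model < b.model end
  return a.category < b.category
end

table.sort(rows, by_cost_model_category)

local function fmt_row(r)
  return string.format("%-26s %-10s $%.6f", r.model, r.category, r.cost)
end

local six  = string.format("$%.6f", 0.000028)  -- keeps the sub-cent digits
local four = string.format("$%.4f", 0.000028)  -- rounds them away
local first_line = fmt_row(rows[1])
```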
E2E verified against live cloud + local broker:
:cost (empty session) -> "0 calls, $0.000000"
...after mixed-mode session...
:cost -> "5 calls, prompt=472 / completion=26
tokens, cost=$0.000377 (cloud only;
local: tokens but no cost field)"
:cost detail -> 4 rows: main cloud $0.000219, probe
cloud $0.000128, delegate cloud
$0.000030, main local $0.000000
(local). Sorted by cost desc (model, then
category, break ties).
:cost reset -> "cost meter reset"; subsequent
:cost shows zeros.
All 5 calls are accounted for in the same session: main (twice: cloud
+ local), delegate, probe (x2 from :safety check). Warn-threshold
firing already verified in commit #3 + #4.
HELP gains 3 :cost lines.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
b30212af0f |
safety + repl: opts.category for Norris + probe (Phase 7 commit #4)
Closes the last two broker call sites that flow through safety.lua.
Together with commits #1-#3, all 7 broker call sites in aish now
attribute usage to the cost accumulator under the right category.
Changes:
safety.lua:
- llm_probe (the YES/NO destructive checker) — broker.chat call
gains opts.category = "probe". Captures (text, usage) via
(reply, second) and, when opts.on_usage is provided AND the
call succeeded, routes second through opts.on_usage(model,
category, payload). N4 signature chain: opts already flowed
through llm_second_opinion -> M.is_destructive from #52's
work; opts.on_usage rides along naturally with no further
signature change.
- M.norris_step (Norris main broker round-trip):
* opts to broker.chat_stream gains category = "norris"
* probe_opts (passed to is_destructive inside the loop)
gains on_usage = helpers.on_usage so the LLM probe's
cost lands under "probe" too
* on_delta wrapper adds elseif kind == "usage" branch that
calls helpers.on_usage(payload.model, payload.category,
payload). Coexists cleanly with the existing text (rehydrator)
and tool_call branches.
repl.lua:
- Norris helpers table gains on_usage = _record_usage. The R5
central chokepoint (commit #3) does the warn-threshold check
AND ctx:add_usage atomically.
- :safety check meta's probe_opts always carries on_usage now
(independently of whether secrets_session is set). secrets-aware
scrub_msgs/rehydrate added conditionally as before.
E2E verified against live broker (safety.llm_model = "cloud"):
- :safety check ls -la /tmp -> 2 cloud probe calls
- "[aish] session cost $0.000128 has crossed warn_at_dollars=$0.000100"
- probe category visible in accumulator (would appear in :cost detail
once commit #5 ships the meta).
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
8adebd52cc |
repl: _record_usage helper + opts.category at 5 sites (Phase 7 commit #3)
Wires broker.lua's on_delta("usage", payload) and broker.chat's
(text, usage) return to the ctx accumulator via a single chokepoint.
Changes:
- Forward decl `local _record_usage` near _bg_spawn — same pattern;
the summarize-on-evict closure in make_summarize_fn (built at
line 299) needs lexical access to _record_usage (assigned at
line 695), so forward-declare and assign-without-`local`.
- _record_usage(model, category, usage) — R5 central chokepoint:
routes to ctx:add_usage, then checks the per-threshold warn
state. R4: cost_warn_state has two independent flags (dollars
and tokens) so first-to-fire doesn't suppress the other. R10:
warn message uses $%.6f for sub-cent precision.
- call_broker wrapper: wrapped on_delta now branches on
kind == "usage" -> _record_usage(payload.model, payload.category,
payload). R2: keys by payload.model (set inside broker.lua from
model_cfg.model). When fallback fires, broker is called with
fb_cfg, so payload.model IS the fallback's name automatically —
wrapper doesn't track primary-vs-fallback itself.
- 5 caller sites wired with opts.category:
ask_ai call_broker -> category="main"
summarize-on-evict -> category="summarize"
DELEGATE: handler -> category="delegate"
:memory summarize -> category="memory_summarize"
:delegate meta -> category="delegate"
- All 4 broker.chat call sites switched from
local reply, err = broker.chat(...)
to
local reply, second = broker.chat(...)
branching on reply nil-ness to interpret second (err on failure,
usage on success). Captured usage routes through _record_usage.
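The call-site pattern described above, sketched with a stub broker. The wrapper name `summarize` and the `fail` flag on the config are hypothetical; the `(text, usage)` / `(nil, err)` return convention and the `_record_usage(model, category, usage)` signature are the ones stated in this commit.

```lua
-- Stub broker: (text, usage) on success, (nil, err) on failure.
local broker = {}
function broker.chat(model_cfg, msgs, opts)
  if model_cfg.fail then return nil, "transport error" end
  return "ok", { prompt_tokens = 12, completion_tokens = 3 }
end

-- Stand-in for the central chokepoint; here it only records what it saw.
local recorded = {}
local function _record_usage(model, category, usage)
  recorded[#recorded + 1] = { model = model, category = category, usage = usage }
end

local function summarize(model_cfg, msgs)
  local reply, second = broker.chat(model_cfg, msgs, { category = "summarize" })
  if reply == nil then
    return nil, second                      -- failure: second is the error
  end
  if second then                            -- success: second is the usage table
    _record_usage(model_cfg.model, "summarize", second)
  end
  return reply
end

local ok_reply = summarize({ model = "m" }, {})
local bad_reply, bad_err = summarize({ model = "m", fail = true }, {})
```

The nil check on `reply` is what disambiguates the second return value; checking the type of `second` instead would also work but spreads the convention across every call site.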
E2E verified against live cloud broker:
- cloud prompt -> reply "Hi! 👋"
- Warn fired: "session cost $0.000219 has crossed warn_at_dollars=$0.000010"
- R10 sub-cent precision visible in both numbers.
Norris + safety paths still untouched — commit #4 wires those.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
7b4a9becc2 |
context: cost/usage accumulator (Phase 7 commit #2)
Adds the per-conversation accumulator that broker.lua's
on_delta("usage", ...) payload feeds into. No callers yet —
commit #3 wires the broker callback to ctx:add_usage in repl.lua,
commit #4 in safety.lua.
Changes:
- Context.new: new fields `usage_totals = {}` and
`cost_warn_state = { dollars = false, tokens = false }`. R4:
two independent flags so warn_at_dollars firing doesn't
suppress warn_at_tokens (or vice versa).
- Context:add_usage(model_name, category, usage):
Increments usage_totals[model_name][category] slot. R6: when
usage.cost is nil (local llama.cpp per B3), sets a sticky
`is_local = true` flag on the slot AND does NOT add to cost
(preserves the local-vs-cloud-zero distinction for :cost detail
annotation). When usage.cost is a number (cloud), accumulates.
- Context:total_cost() / total_tokens() — pure-Lua summation
across all slots; total_tokens returns (prompt, completion).
- Context:reset_usage() — explicit :cost reset path; zeros
usage_totals AND clears both flags atomically.
- Context:reset() — R8 parity: does NOT clear usage_totals OR
cost_warn_state. Matches the Phase 4 memory_items / Phase 6
project rule ("ambient context survives a user-driven
conversation reset").
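A sketch of the accumulator semantics listed above, under assumptions: the slot field names (`calls`, `prompt_tokens`, `completion_tokens`, `cost`) are guesses at a plausible shape, and only `usage_totals`, `cost_warn_state`, `is_local` and the method names come from this commit.

```lua
local Context = {}
Context.__index = Context

function Context.new()
  return setmetatable({
    usage_totals    = {},
    cost_warn_state = { dollars = false, tokens = false },
  }, Context)
end

function Context:add_usage(model_name, category, usage)
  local by_model = self.usage_totals[model_name] or {}
  self.usage_totals[model_name] = by_model
  local slot = by_model[category]
  if not slot then
    slot = { calls = 0, prompt_tokens = 0, completion_tokens = 0,
             cost = 0, is_local = false }
    by_model[category] = slot
  end
  slot.calls = slot.calls + 1
  slot.prompt_tokens = slot.prompt_tokens + (usage.prompt_tokens or 0)
  slot.completion_tokens = slot.completion_tokens + (usage.completion_tokens or 0)
  if type(usage.cost) == "number" then
    slot.cost = slot.cost + usage.cost      -- cloud: accumulate
  else
    slot.is_local = true                    -- R6: sticky, cost left untouched
  end
end

function Context:total_cost()
  local sum = 0
  for _, by_model in pairs(self.usage_totals) do
    for _, slot in pairs(by_model) do sum = sum + slot.cost end
  end
  return sum
end

function Context:total_tokens()
  local prompt, completion = 0, 0
  for _, by_model in pairs(self.usage_totals) do
    for _, slot in pairs(by_model) do
      prompt = prompt + slot.prompt_tokens
      completion = completion + slot.completion_tokens
    end
  end
  return prompt, completion
end

function Context:reset_usage()
  self.usage_totals = {}
  self.cost_warn_state = { dollars = false, tokens = false }
end

local ctx = Context.new()
ctx:add_usage("cloud-model", "main", { prompt_tokens = 10, completion_tokens = 2, cost = 0.000028 })
ctx:add_usage("local-model", "main", { prompt_tokens = 5, completion_tokens = 1 })  -- cost is nil
```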
Smoke verified (20/20 unit cases):
- Empty zeros; cloud cost accumulation; local nil-cost preserves
is_local=true sticky; calls counter; cost summation across
multiple cloud calls; is_local sticky after a later nil-cost
call on a cloud slot; separate slots per (model, category);
:reset preserves; :reset_usage zeros both totals and flags.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
7364963b00 |
broker: usage capture + opts widening (Phase 7 commit #1)
Foundation for Phase 7. broker.chat_stream now emits a third
on_delta kind ("usage") after the stream completes successfully;
broker.chat returns (text, usage). Backward-compatible — existing
callers that ignore the new kind / second value continue working
via Lua's drop-extra-returns semantics.
Changes:
- build_request widens (A3 + R3) — `(model_cfg, msgs, stream, opts)`.
opts.tools / opts.max_tokens / opts.include_usage / opts.category
all live inside opts now. Both internal call sites updated.
- opts.include_usage defaults to true for streaming requests; sets
`stream_options: { include_usage: true }` in the request body.
B1: required for local llama.cpp to emit usage; cloud honors as
a no-op (emits anyway).
- on_event captures `doc.usage` into a closure-local `final_usage`.
N1: the check is INDEPENDENT of the choice/delta branches — local
emits usage on choices=[] chunks (choice nil) while cloud emits
with non-empty choices + finish_reason. Both shapes funnel here.
- After curl.post_sse returns successfully (NOT on transport/api
errors), if final_usage is set, emit on_delta("usage", {prompt_tokens,
completion_tokens, total_tokens, cost, model, category}). cost is
nil for local (R6 preserves the nil vs 0 distinction the
accumulator needs). model is model_cfg.model — caller-stable per
B4 + R2 so call_broker's fallback retry attributes usage to the
fallback's model name without wrapper-side tracking.
- M.chat (R1 — BLOCKER fix): on_delta now also captures kind=="usage"
alongside "text"; M.chat returns (text, usage). Without this fix
4 of the 5 categories, the non-streaming ones (summarize / delegate /
memory_summarize / probe), would silently report zero usage.
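The N1 point about the usage check being independent of the choices branch can be sketched as below. It assumes SSE chunks have already been JSON-decoded into Lua tables; `make_on_event` and the `state` table are illustrative names, not the broker.lua internals.

```lua
-- Sketch only: docs are already-decoded chunks; decoding and transport omitted.
local function make_on_event(on_delta)
  local state = { final_usage = nil }
  local function on_event(doc)
    -- Checked on every chunk, regardless of what the choices look like.
    if doc.usage then state.final_usage = doc.usage end
    local choice = doc.choices and doc.choices[1]
    if choice and choice.delta and choice.delta.content then
      on_delta("text", choice.delta.content)
    end
  end
  return on_event, state
end

-- Local shape: usage arrives on a separate chunk with an empty choices list.
local text = {}
local on_event, state = make_on_event(function(kind, payload)
  if kind == "text" then text[#text + 1] = payload end
end)
on_event({ choices = { { delta = { content = "hi" } } } })
on_event({ choices = {}, usage = { prompt_tokens = 9, completion_tokens = 1 } })
local local_usage = state.final_usage

-- Cloud shape: usage rides the final chunk next to finish_reason.
local on_event2, state2 = make_on_event(function() end)
on_event2({
  choices = { { delta = {}, finish_reason = "stop" } },
  usage   = { prompt_tokens = 9, completion_tokens = 1, cost = 2.9e-05 },
})
local cloud_usage = state2.final_usage
```

Had the usage check lived inside the `if choice ...` branch, the local shape (empty `choices`) would never reach it.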
Smoke verified against live hossenfelder:8082:
- CLOUD chat -> (text, usage); cost=2.9e-05, model=anthropic/...
- LOCAL chat -> (text, usage); cost=NIL (correct per R6),
model=qwen-coder-7b-snappy-8k
- CLOUD stream -> on_delta("usage", {...}) with category="test"
echoed; model name caller-stable.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
d4c20f09df |
docs/PHASE7: review fold-in — 3 BLOCKERs + 6 CONCERNs + 5 NITs
Sonnet-reviewed (per the reviews-use-sonnet feedback memory).
BLOCKERs (RESOLVED in-place):
R1. M.chat would silently return (text, nil) for ALL non-streaming
callers — 4 of 5 categories (summarize/delegate/memory_summarize/
probe) flow through broker.chat, NOT chat_stream. §4 now shows
the explicit M.chat update that captures kind=="usage" alongside
"text" and returns (text, usage).
R2. call_broker fallback retry would credit usage to the wrong model
name. Fix: broker emits payload.model = model_cfg.model (which IS
the fallback's name when called with fb_cfg — chat_stream's
upvar). Wrapper keys by payload.model, NOT outer model_name. §4
+ §13 commit 3 reflect.
R3. build_request has TWO internal callers inside broker.lua itself,
not just the public surface. Plan §13 commit 1 risk row now
spells this out explicitly so the implementer doesn't read "every
caller already passes opts" as "external-only".
CONCERNs (FOLDED):
R4. Single cost_warn_fired flag covers two thresholds — first-to-fire
suppresses the other. Split into ctx.cost_warn_state = { dollars
= false, tokens = false }; :cost reset clears both. §7 + §13.
R5. Warn-check centralization — single _record_usage helper in
repl.lua wraps ctx:add_usage AND does threshold check. safety.lua
routes via helpers.on_usage / opts.on_usage callbacks. context.lua
stays decoupled from renderer.
R6. Preserve nil-vs-0 cost distinction. Accumulator slot gains
`is_local = true` (sticky) when ANY recorded usage had cost==nil.
`:cost detail` annotation comes from is_local flag, not a
fragile cost==0 heuristic.
R7. :cost detail sort needs 3-level deterministic key:
(cost desc, model asc, category asc) — table.sort is unstable.
R8. call_broker fallback passes opts.include_usage unchanged.
Documented as known assumption (B1 confirms both backends
accept; future-broken fallback can pass include_usage=false).
R9. :resume does NOT restore historical usage_totals. Per-turn usage
IS in session JSONL for scripting; cross-session aggregation is
Q-C2 deferred. Documented in §8.
R10. $%.4f loses sub-cent precision (cloud cost 0.000028 -> $0.0000).
Widened to $%.6f in §6 + §7 warn message format.
NITs (APPLIED):
N1. §4 pseudocode comment notes `if doc.usage` branch is independent
of choice branch (handles both B2 emission shapes).
N2. §2 stale "B7" reference corrected to B3.
N3. §13 commit 3 row gains explicit dependency note on commit 1's R1.
N4. §13 commit 4 spells out llm_probe -> llm_second_opinion ->
M.is_destructive signature chain widening.
N5. §3 + §13 commit 6 — PHASE0 §11 amendment already in tree
(
|
||
|
|
0f14dc1727 |
docs/PHASE7: plan — §13 commit roadmap
Status: Analyze -> Plan.
Q-C4 was the last open question pending baseline; now resolved per
B1 (stream_options accepted by both backends; required for local).
§13 Implementation Plan added — 6 commits, bottom-up:
1. broker.lua: usage extraction from final SSE chunk; build_request
signature widening to (model_cfg, msgs, stream, opts); on_delta
("usage", payload); chat returns (text, usage); opts.category
passthrough.
2. context.lua: usage_totals + cost_warn_fired fields; add_usage /
total_cost / total_tokens helpers; :reset preserves both.
3. repl.lua: wire opts.category at 5 non-Norris call sites (main,
delegate x2, summarize, memory_summarize); on_delta("usage")
branch routes to ctx:add_usage.
4. safety.lua: wire opts.category for Norris main broker + is_
destructive LLM probe; helpers.on_usage callback convention
(no new module dep — matches #52's scrub_msgs pattern).
5. repl.lua: :cost meta surface + warn-threshold check + HELP.
6. config.lua: commented cost example block + PHASE7.md status
bump to Implement.
Per-commit risk index covers signature-change blast radius, missed
call-site lint, and warn-flag one-shot semantics. Lua's multi-
return semantics keep broker.chat backwards-compat automatic.
Two items left open at plan, resolve at implement:
- is_destructive opts.on_usage vs cfg.helpers threading
- per-turn verbose mode (deferred; v1 = :cost on demand only)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
2244a3f1ee |
docs/PHASE7-baseline: live broker probes for usage shape
Real probes through hossenfelder.fritz.box:8082 against both backends.
Five findings, all align with the formulate/analyze design — no
structural changes.
B1. `stream_options.include_usage = true` is safely accepted by
both backends. REQUIRED for local llama.cpp to emit usage;
no-op for cloud (which emits anyway). Default-true is correct.
B2. Two emission patterns observed:
- Cloud (Bedrock): usage rides the FINAL delta chunk with
non-empty `choices` carrying finish_reason.
- Local: usage rides a SEPARATE chunk with `choices: []`
preceding `[DONE]`.
Both shapes are handled by the same `if doc.usage then ...`
check; the existing on_event choices-branch short-circuits
safely when choices is empty.
B3. `cost` field is dollar-denominated (number) and cloud-only.
Local returns `timings` instead (perf, not cost). Accumulator
captures `usage.cost` as-is; nil treated as 0. :cost detail
annotates local lines so $0 isn't misread.
B4. `doc.model` in the usage event reflects the upstream API's versioned model ID
(e.g., Bedrock rewrites `anthropic/claude-haiku-4.5` to
`anthropic/claude-4.5-haiku-20251001`). Accumulator keys by
caller-intended `model_cfg.model`, NOT `doc.model`, for stable
cross-call comparison.
B5. Usage event is always the LAST data event before `[DONE]`.
Emission of `on_delta("usage", ...)` happens after curl.post_sse
returns — one call per stream, after all text + tool_calls.
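Findings B2 and B5 can be sketched together as below. This assumes the SSE payload has already been decoded into a Lua table (JSON decoding omitted); `make_on_event` and `finish` are hypothetical names, and only the `if doc.usage` check and the on_delta("usage", ...) kind come from the findings.

```lua
-- Build an event handler plus a finish hook to call once the stream returns.
local function make_on_event(on_delta)
  local pending_usage

  local function on_event(doc)
    -- Choice branch: skipped naturally when `choices` is empty (local shape).
    local choice = doc.choices and doc.choices[1]
    if choice and choice.delta and choice.delta.content then
      on_delta("text", choice.delta.content)
    end
    -- Usage branch: independent of the choice branch, so it covers both the
    -- cloud shape (usage on the final delta) and the local shape (own chunk).
    if doc.usage then
      pending_usage = doc.usage
    end
  end

  -- Called after the stream has returned: at most one usage delta per stream.
  local function finish()
    if pending_usage then
      on_delta("usage", pending_usage)
    end
  end

  return on_event, finish
end
```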
Q-C4 RESOLVED: hossenfelder forwards `stream_options.include_usage`
to all backends correctly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
f0bccdec48 |
docs/PHASE7: analyze — probe broker surface + resolve Qs in-place
Status: Formulate -> Analyze (tree at |
||
|
|
3bad07b2da |
docs/PHASE7: formulate — cost / usage observability
Phase 7 formulate manifest + PHASE0 §11 amendment to add the Phase 7
row (substrate amendment per CLAUDE.md §3, lands in the same commit).
Four pillars:
1. Usage capture in broker.chat_stream — extract `usage` from the
final SSE chunk (OpenAI streaming spec with `stream_options:
{include_usage: true}`). Surface via new on_delta("usage",
payload) kind. broker.chat returns (text, usage) — backward-
compat: existing callers ignore the second value.
2. Per-session accumulator on ctx — ctx.usage_totals[model][category]
tables (categories: main / delegate / summarize / memory_summarize
/ probe / norris, tagged at the call site via opts.category).
:reset preserves usage_totals (R8 parity with memory_items /
project). Session JSONL gains an optional `usage` field on
assistant turns for after-the-fact analysis.
3. :cost meta surface — :cost (summary), :cost detail (per-model +
per-category breakdown), :cost reset (zero the meter). Pure-Lua
read of ctx.usage_totals; no broker calls.
4. Optional warn thresholds — cfg.cost.warn_at_dollars /
warn_at_tokens emit a one-shot status when crossed. Default off;
useful with cloud presets configured.
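The one-shot semantics of pillar 4 might be sketched as follows. The message wording and the `check_warn` name are hypothetical, and firing on the call that crosses the threshold is an assumption here, since Q-C5 is still open at this stage.

```lua
-- Returns a status string the first time a threshold is crossed, else nil.
-- `state` carries the cost_warn_fired flag; `cfg_cost` is cfg.cost (may be nil).
local function check_warn(state, cfg_cost, total_cost, total_tokens)
  if state.cost_warn_fired or not cfg_cost then
    return nil -- already warned, or thresholds not configured (default off)
  end
  local at_dollars, at_tokens = cfg_cost.warn_at_dollars, cfg_cost.warn_at_tokens
  if at_dollars and total_cost >= at_dollars then
    state.cost_warn_fired = true
    return string.format("cost warning: $%.6f >= $%.6f", total_cost, at_dollars)
  end
  if at_tokens and total_tokens >= at_tokens then
    state.cost_warn_fired = true
    return string.format("cost warning: %d tokens >= %d", total_tokens, at_tokens)
  end
  return nil
end
```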
Doc covers scope + done-when criteria, tech decisions table, module
changes, per-pillar deep dive with code sketches, UX surface, out of
scope, risks, 6 open questions to resolve in analyze.
Open at formulate:
Q-C1 — provider-without-usage handling (local llama.cpp probably)
Q-C2 — cross-session persistence (defer to phase 8)
Q-C3 — categories closed-set vs free-form
Q-C4 — does hossenfelder forward stream_options to all backends?
Q-C5 — warn fires on the call that crosses, or the next one?
Q-C6 — :reset clears cost_warn_fired too, or only :cost reset?
Scope confirmed via AskUserQuestion: cost/usage observability
(chosen over project-local config overlay and session search/tag).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
955bd82efb |
safety + repl: wire secrets into safety.lua (closes #52)
Closes the last #13 gap — Norris broker call + is_destructive LLM
second-opinion probe were the two egress points NOT covered by the
scrub-at-egress design in commit
|
||
|
|
ac58b19da2 |
config + docs/PHASE6: example block + status -> Implement (Phase 6 commit #6)
R9-resolved single-owner of the status bump (commit #5 didn't touch
PHASE6.md per the review fold-in).
config.lua:
- Commented-out `project = { auto_tree, tree_depth, tree_max_chars }`
block with the same shape as the Phase 1-5 example blocks.
- Note that :diff / :tree / :highlight all work without config; the
`project` block ONLY controls the startup auto-inject.
- Note about :highlight v1 having no config flag (runtime-only),
cross-references the in-REPL install hint.
docs/PHASE6.md:
- Status header bumped: "Plan + review fold-in" -> "Implement"
- Lists the 6 implement commits in the header for traceability:
|
||
|
|
11d0e599cd |
repl + renderer: tree-sitter highlighter (Phase 6 commit #5)
The largest Phase 6 commit — fence-aware stream filter in renderer.lua
+ external tree-sitter dispatch + :highlight meta in repl.lua.
renderer.lua — fence-aware filter wrapping assistant_delta:
M.set_highlight(enabled, detected, highlight_fn)
Called by repl.lua at startup AND on every :highlight toggle.
Stores state in module-locals (off by default).
State machine inside _hl_push:
outside: pass chunks through; HOLD trailing partial-fence chars
(per R1 — local llama.cpp splits ```python as `'``'`
then `'`python\n'`, so naive pass-through drops the
leading "``" and never recovers).
inside: buffer cumulatively until "\n```" appears; emit
highlight_fn(body, lang) then the closing fence verbatim.
Recursive call handles "rest" after the closing fence.
N1: fences only open at start-of-stream OR after a newline
(`^```` or `\n```` only). Inline backticks in prose
("use ``` to mark code") do not open a fence.
R3 (PTY raw-mode toggle per highlight call): no change here — every
executor.exec call already toggles raw-mode (existing behavior
since Phase 1). The risk is theoretical; smoke-test interactively
after install if multi-fence renders show flicker.
assistant_flush handles end-of-stream gracefully: drains any held
partial-fence tail OR an unterminated inside-fence buffer.
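A compact sketch of the state machine described above, written from the description and not taken from renderer.lua. `new_fence_filter`, `emit`, `push` and `flush` are hypothetical names. It simplifies one case: an empty code block, where the closing fence directly follows the opener line, is not recognised until flush.

```lua
-- emit(text) writes output; highlight_fn(body, lang) returns the rendered body.
local function new_fence_filter(emit, highlight_fn)
  local mode, held, lang = "outside", "", nil
  local at_sol = true -- start-of-stream counts as start-of-line

  local function out(s)
    if s ~= "" then emit(s) end
  end

  local push
  push = function(chunk)
    local text = held .. chunk
    held = ""
    if mode == "inside" then
      -- Buffer cumulatively until the closing fence shows up.
      local close_at = text:find("\n```", 1, true)
      if not close_at then held = text; return end
      out(highlight_fn(text:sub(1, close_at - 1), lang))
      out("\n```") -- closing fence verbatim
      mode, at_sol = "outside", false
      return push(text:sub(close_at + 4))
    end
    -- Outside: a fence opens only at start-of-stream or after a newline.
    local open_at
    if at_sol and text:sub(1, 3) == "```" then
      open_at = 1
    else
      local nl = text:find("\n```", 1, true)
      open_at = nl and nl + 1
    end
    if open_at then
      out(text:sub(1, open_at - 1))
      at_sol = true
      local eol = text:find("\n", open_at, true)
      if not eol then held = text:sub(open_at); return end -- tag incomplete
      out(text:sub(open_at, eol)) -- opening fence line verbatim
      lang = text:sub(open_at + 3, eol - 1)
      mode = "inside"
      return push(text:sub(eol + 1))
    end
    -- No opener yet: hold a trailing run of 1-2 backticks at start-of-line.
    local body, tail = text:match("^(.-)(`?`?)$")
    local tail_at_sol = (body == "" and at_sol) or body:sub(-1) == "\n"
    if not tail_at_sol then body, tail = text, "" end
    out(body)
    if body ~= "" then at_sol = body:sub(-1) == "\n" end
    held = tail
  end

  -- End of stream: drain a held tail or an unterminated fence buffer.
  local function flush()
    out(held)
    held, mode, at_sol = "", "outside", true
  end

  return push, flush
end
```

Feeding the split-fence chunks from the log ("``", "`python\n", "x=42", "\n```") through `push` holds the first two backticks until the third arrives, so the opening fence is emitted whole and the body reaches `highlight_fn` with the tag `python`.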
repl.lua — _detect_treesitter + highlighted + :highlight meta:
_detect_treesitter() one-shot popen probe of `tree-sitter --version`.
Run once at startup; cached as
highlight_detected.
highlighted(body, lang_tag) R2-placed in repl.lua (has _shq +
executor access). Translates the fence
tag (`py`, `python`, `lua`, etc.) to
a canonical lang via LANG_TAG, picks
the canonical extension via LANG_EXTENSION,
writes body to a tmpfile with that
extension, runs `tree-sitter highlight
<tmpfile>` via executor.exec, returns
the output. On ANY failure (CLI absent,
non-zero exit, empty output), returns
`body` unchanged — silent pass-through.
R4 RESOLVED VIA REAL INSTALL: probed `tree-sitter highlight --help`
on noether; confirmed:
- NO `--lang` flag exists (formulate-time assumption wrong)
- takes a PATH; language inferred from file extension
- alternative `--scope source.X` exists but also unreliable
without configured grammars
Resolution: write tmpfile with `os.tmpname() .. LANG_EXTENSION[lang]`
and pass the path. Matches the documented upstream contract.
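The extension-based resolution might be sketched as follows. The table contents are hypothetical (the real LANG_TAG / LANG_EXTENSION entries are not shown in this log), and the injectable `tmpname` parameter is an assumption added so the sketch can run without touching the filesystem.

```lua
-- Hypothetical table contents, for illustration only.
local LANG_TAG = { py = "python", python = "python", lua = "lua" }
local LANG_EXTENSION = { python = ".py", lua = ".lua" }

-- Map a fence tag to a tmpfile path whose extension lets the CLI infer
-- the language. Returns nil for an unknown tag so the caller can pass
-- the body through unchanged.
local function highlight_tmp_path(lang_tag, tmpname)
  local lang = LANG_TAG[lang_tag]
  local ext = lang and LANG_EXTENSION[lang]
  if not ext then
    return nil
  end
  return (tmpname or os.tmpname)() .. ext
end
```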
B4-followup: even with the CLI installed, highlighting requires
`~/.config/tree-sitter/config.json` parser-directories with
cloned + built `tree-sitter-<lang>` grammars. Without parsers,
every call exits non-zero and we silently pass through. The
:highlight install hint surfaces all three install steps so the
user knows what's actually needed.
:highlight [on|off|status] meta:
no arg -> flip
on/off -> set explicit
status -> report toggle + CLI detection state
When toggled on AND CLI absent: emit a 4-line install hint
(CLI install, init-config, grammar clone reminder).
When toggled on AND CLI present: emit a 1-line note that
parser-directories must be set up for actual highlighting.
HELP gains :highlight entry.
Tested:
10/10 unit cases on the renderer state machine, including:
- plain prose passthrough
- single-chunk fence
- B2 split fence ("``" + "`python\n" + "x=42" + "\n```")
- N1 SOL anchor (mid-line ``` does not open)
- trailing \n properly emitted across chunks
- SOL-only fence open
- prose after closing fence preserved
- two fences in one stream
- highlight off = passthrough (callback never fires)
E2E :highlight meta verified:
:highlight status -> off / detected
:highlight on -> toggles + emits parser-dir reminder
:highlight status -> on / detected
:highlight off -> off
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Pillars 1 + 2 + 3 of Phase 6 now all implemented. Commit #6 is config
example block + status -> Implement.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
0d63f01601 |
repl: expand_mentions tiered @<r1>..<r2> diff retry (Phase 6 commit #4)
Per A6 (tiered resolution): @<token> tries file lookup first; if the
file doesn't exist AND the token contains "..", retry as a git
ref-range and substitute with a fenced `diff` block. Preserves the
existing peel-on-trailing-punct logic (e.g., `@HEAD~1..HEAD,` peels
the comma, resolves the ref, restores the comma after the closing
fence).
Resolution order for @<token>:
1. io.open(token, "rb") -- file lookup, with trailing-punct peel
2. if (1) fails and token contains "..":
git --no-pager -c color.ui=never diff <r1>..<r2>
on exit 0 + non-empty body: substitute as ```diff fenced block
3. else: leave literal `@token` + emit "[aish] @X: not found" status
Examples:
@README.md -> file (path branch)
@../sibling.txt -> file (path branch; `..` only triggers retry
when path lookup FAILS, so existing
paths with `..` segments are unaffected)
@HEAD~1..HEAD -> diff (path fails, ref succeeds)
@origin/main..feature -> diff (path fails — no such literal file;
ref succeeds; `/` in ref is fine because
we don't use the path's `/`-absence as
a discriminator)
@nonsense..gibberish -> literal preserved (both fail)
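The tiered order can be sketched with the file and git lookups injected as functions. `resolve_mention` and the `deps` table are hypothetical names; the trailing-punctuation peel and status messages are left out.

```lua
-- deps.read_file(path) -> contents or nil
-- deps.git_diff(range) -> diff body, or nil on non-zero exit
local function resolve_mention(token, deps)
  -- Tier 1: file lookup always wins, even for paths containing "..".
  local body = deps.read_file(token)
  if body then
    return "file", body
  end
  -- Tier 2: retry as a ref-range only when the token contains "..".
  if token:find("..", 1, true) then
    local diff = deps.git_diff(token)
    if diff and diff ~= "" then
      return "diff", "```diff\n" .. diff .. "\n```"
    end
  end
  -- Tier 3: leave the literal mention in place.
  return "literal", "@" .. token
end
```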
Required restructuring:
- _shq and _git_clean_cmd lifted from M.run closure scope to module
scope (above expand_mentions). Single source of truth for the
B1 prefix shared with commit #3's :diff. The in-M.run duplicates
are removed.
- expand_mentions now references `executor` (already required at
module scope on line 7) for the diff retry.
Status messages updated:
- File expansion: "@<path> expanded (N bytes, truncated)" (existing)
- Diff expansion: "@<path> expanded (N bytes, diff)" (new)
Tested with the 7 existing #7 cases + 7 new diff-retry cases (14/14):
ref-range expansion shape, body contains `diff --git`, trailing
prose preserved, @../path stays as file (not diff), neither-path-
nor-ref preserves literal, trailing-comma peel composes with ref
retry.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
4d5f93aaa5 |
repl: :diff meta + _git_clean_cmd helper (Phase 6 commit #3)
User-driven git diff injection. The model sees the diff on the next
ask_ai turn through the existing exec_output channel.
Changes:
- _git_clean_cmd(subcmd_and_args) helper near _scan_project_tree.
B1: every git invocation that flows into context MUST use
`--no-pager -c color.ui=never`. Forkpty makes git think stdout
is a TTY, enabling both color and the pager's keypad/line-clear
escapes — these would pollute the captured context block. The
helper is the single chokepoint; commit #4's @<r1>..<r2> retry
will reuse it.
- :diff [<args>] meta:
- Reads cwd at meta invocation (R6: differs from :tree's
scan-time cwd capture; documented in §5).
- Runs `_git_clean_cmd("diff " .. args)` via executor.exec.
- Empty output -> "(no diff): <label>" status, no context append.
- Non-zero exit -> "diff failed (exit N): <label>" status,
no context append. git's stderr already streamed to the
user via executor.exec's live multiplex, so the failure
reason is visible.
- Success -> appends "[diff <label>]\n<output>" via
ctx:append_exec_output. Label is "(working tree)" for empty
args, else verbatim args.
- Status confirms injection size: "diff injected: <label> (N bytes)".
- HELP gains :diff line with three example arg shapes; N3-resolved
(no `staged` alias — the meta is thin pass-through to git's grammar).
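A rough sketch of the helper and the three :diff outcomes listed above. `git_clean_cmd` mirrors the described prefix; `diff_outcome` is a hypothetical name, checking the exit code before the empty-output case is an assumption, and so is counting N as the byte length of the raw output.

```lua
-- Single chokepoint for git invocations whose output flows into context.
local function git_clean_cmd(subcmd_and_args)
  return "git --no-pager -c color.ui=never " .. subcmd_and_args
end

-- Returns (context_block_or_nil, status_message).
local function diff_outcome(args, exit_code, output)
  local label = (args == nil or args == "") and "(working tree)" or args
  if exit_code ~= 0 then
    return nil, string.format("diff failed (exit %d): %s", exit_code, label)
  end
  if output == "" then
    return nil, "(no diff): " .. label
  end
  return "[diff " .. label .. "]\n" .. output,
    string.format("diff injected: %s (%d bytes)", label, #output)
end
```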
Smoke verified across four scenarios in an ephemeral test repo:
- Working-tree dirty -> 110-byte diff injected, no ANSI escapes
- --cached -> 118-byte staged diff injected, clean
- garbage..nonexistent -> exit 128, status + skip
- Clean working tree -> "(no diff)", status + skip
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
d1dce832da |
repl: _scan_project_tree + :tree meta + auto_tree (Phase 6 commit #2)
First user-visible Phase 6 verb. Builds on commit #1's compose_project
plumbing: sets ctx.project from either the :tree meta or the
cfg.project.auto_tree startup hook.
Changes:
- _scan_project_tree(dir, opts) helper near _run_hook:
git -C <dir> ls-files --cached --others --exclude-standard
when <dir> is inside a git repo (N4: no subshell);
find <dir> -mindepth 1 -maxdepth <depth+1> -type f -not -path '*/.*'
otherwise.
Returns (body, info={file_count, truncated, in_git}). Sorted paths,
truncated to max_chars (default 4096 per cfg).
- :tree [<depth>|refresh|off] meta:
no arg -> scan with config defaults; resets _project_opts
<N> -> scan with depth=N; caches as _project_opts
refresh -> re-scan with cached _project_opts (else defaults)
off -> clear ctx.project AND ctx._project_opts (R5)
Status line reports file count + truncation flag + which backend
fired (git/find).
- cfg.project.auto_tree startup hook before the main loop: if true,
scan libc.getcwd() once and set ctx.project. Failures status-logged
once; REPL continues. Default off (existing configs unchanged).
- HELP updated with three :tree lines.
Plan §12 deliberately defers the config.lua example block to commit #6
along with the status header bump (R9 single-owner).
Smoke (aish repo cwd):
- :tree no-arg -> "33 files (git ls-files)"
- :tree refresh -> same
- :tree off -> "project tree cleared"
- :tree 1 -> rescans
- cfg.project.auto_tree=true at startup -> auto-injected status visible
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c4fc7fde01 |
context: [project] block plumbing (Phase 6 commit #1)
Foundation for Phase 6: adds the field + composer + composition order
with no callers yet. Nothing sets ctx.project; the meta hookup and
startup auto-inject land in commit #2.
Changes:
- Context.new gains `project` (string, nil) and `_project_opts`
(cached scan opts for `:tree refresh`; R7).
- compose_project(text) helper mirrors compose_background /
compose_summary. Returns "" for nil/empty; otherwise emits
"\n\n[project]\n" + text.
- to_messages inserts compose_project BETWEEN compose_background and
compose_summary so the model reads memory facts -> project tree ->
earlier conversation -> NORRIS suffix.
- Same Norris-suppression guard as the other two dynamic blocks
(R-C1 / R-C4 parity; planner stays on goal anchor).
- Context:reset preserves ctx.project (R8: matches the Phase 4
memory_items rule; startup-injected facts survive a user-driven
context reset).
Smoke verified (14/14 inline cases):
- project nil -> no [project] block in sys_content
- project set -> block present with contents
- ordering: [background] < [project] < [earlier conversation summary]
- norris_active suppresses all three; NORRIS suffix still appears
- :reset clears turns/pending_exec_output/summary; preserves
memory_items AND project
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
261b230be8 |
docs/PHASE6: review fold-in — 2 BLOCKERs resolved, 7 CONCERNs, 6 NITs
Independent agent review of PHASE6 (manifest + baseline + plan at
|