diff --git a/docs/PHASE7.md b/docs/PHASE7.md index 110e81f..7871fc7 100644 --- a/docs/PHASE7.md +++ b/docs/PHASE7.md @@ -2,7 +2,7 @@ **Project:** aish — AI-augmented conversational shell **Document:** Phase 7 Requirements, Architecture & Design Decisions -**Status:** Analyze (formulate complete; tree at `3bad07b` probed) +**Status:** Plan (formulate + analyze + baseline complete; tree at `2244a3f`) **Date:** 2026-05-16 **Analyze findings (2026-05-16):** @@ -444,7 +444,7 @@ One-shot per session. `:cost reset` clears the flag. | Q-C1 | Provider-without-usage handling | A10 — defensive silent skip; baseline probe will confirm shape on local llama.cpp. | | Q-C2 | Cross-session cost persistence (`cost.jsonl`) | Deferred to follow-up phase 8; v1 is session-only. | | Q-C3 | Categories closed-set vs free-form | A4 — **free-form**; caller decides. Matches Phase 6 helpers/skills convention. | -| Q-C4 | `stream_options` forwarding by hossenfelder | **Baseline** — probe required against the live broker. | +| Q-C4 | `stream_options` forwarding by hossenfelder | B1 RESOLVED — both backends accept; flag is REQUIRED for local llama.cpp, no-op for cloud. Default-true is correct. | | Q-C5 | Warn fires on the crossed call or the next | A5 — **on the crossed call** (no UX-defeating delay). | | Q-C6 | `:reset` clears `cost_warn_fired` | A6 — **no**, only `:cost reset` clears the flag (R8 parity). | @@ -463,3 +463,123 @@ Candidate follow-ups (non-binding): Indirectly improves accuracy of any future "preflight cost predictor". Phase 7 itself is self-contained — no upstream dependencies. + +--- + +## 13. Implementation Plan (commit-by-commit) + +Bottom-up; broker first (it's the egress point that all callers +depend on), then context (the accumulator), then the call-site +rewires, then the user-facing meta + warn surface, then config + +status bump. Each commit leaves the tree green (existing tests + +load smoke + per-commit feature smoke). + +### Order + +1. **`broker.lua` — usage capture + signature widening.** + - `build_request(model_cfg, messages, stream, opts)` widened to + take an opts table; opts.tools / opts.max_tokens fold in from + the existing positional args. Opts.include_usage (default true) + adds `stream_options.include_usage = true` to the request body + (per B1, required for local). + - `M.chat_stream` event loop adds `if doc.usage then final_usage = + doc.usage end`; after `curl.post_sse` returns, if `final_usage` + is set, `on_delta("usage", payload)` is called. Payload includes + `model = model_cfg.model` (caller-stable per B4), the raw token + counts, and `cost` as a number (nil for local per B3). + - opts.category passthrough — the broker just echoes it into the + emitted usage payload; doesn't validate (per A4 free-form). + - `M.chat` (the non-streaming wrapper) returns `(text, usage)` — + backward-compatible (existing callers ignore the second value). + - Smoke: hand-build a request with stream_options, capture all + three on_delta kinds (text, tool_call when applicable, usage), + confirm usage payload matches what curl shows. + +2. **`context.lua` — accumulator + helpers.** + - `Context.new`: `self.usage_totals = {}` + `self.cost_warn_fired = false`. + - `Context:add_usage(model, category, usage)` — increments + `usage_totals[model][category]` slots. + - `Context:total_cost()` — sums all cost fields across all models/categories. + - `Context:total_tokens()` — sums prompt + completion separately. + - `Context:reset` — does NOT touch `usage_totals` or `cost_warn_fired` + (R8 parity with `memory_items` and `project`). + - Smoke: 4-case inline test of add_usage / totals / reset preservation. + +3. **`repl.lua` — wire opts.category + on_delta("usage") at non-Norris call sites.** + - call_broker wrapper (used by ask_ai): pass `opts.category = + "main"`; the on_delta wrapper handles `kind == "usage"` by + calling `ctx:add_usage(req_name, "main", payload)`. + - DELEGATE: handler: opts.category = "delegate". + - :delegate meta: opts.category = "delegate". + - summarize-on-evict callback: opts.category = "summarize". + - :memory summarize: opts.category = "memory_summarize". + - For broker.chat callers (non-streaming): capture the new second + return value and feed to ctx:add_usage. + - Smoke: send one cloud prompt, observe ctx.usage_totals grows. + +4. **`safety.lua` — opts.category for Norris + probe.** + - safety.norris_step's broker.chat_stream call: pass opts.category = + "norris"; the helpers.on_usage callback (added to the helpers + table by repl.lua) routes back to ctx:add_usage. OR — simpler — + safety.lua wraps on_delta itself with a "usage"-kind branch that + calls helpers.on_usage. + - safety.is_destructive's llm_probe broker.chat call: pass + opts.category = "probe"; capture the (text, usage) return and + forward via opts.on_usage callback (added to is_destructive opts). + - Smoke: a Norris session shows both "norris" and "probe" category + entries in :cost detail. + +5. **`repl.lua` — :cost meta + warn-threshold + HELP.** + - :cost (summary), :cost detail (per-model+category breakdown), + :cost reset (zero totals + clear cost_warn_fired). + - After every ctx:add_usage call (centralized in a helper if + possible), check cfg.cost.warn_at_dollars / warn_at_tokens; + emit one-shot status if crossed AND cost_warn_fired is false. + - HELP gains 3 lines for :cost. + - Smoke: :cost shows totals; :cost detail breaks down; warn fires + once when threshold crossed; :cost reset re-arms. + +6. **`config.lua` example block + `docs/PHASE7.md` status bump.** + - Commented-out `cost = { warn_at_dollars = 0.50, warn_at_tokens + = 100000 }` block in config.lua. + - PHASE7.md status header → **Implement** (matches Phase 5/6 + cadence — manifest tracks implementation state). + +### Risk index per commit + +| Commit | Risk | Mitigation | +|---|---|---| +| 1 (broker) | build_request signature change breaks all existing callers | All callers of chat_stream/chat use opts already; we move tools/max_tokens INTO opts — temporary positional fallback (`opts.tools = old_tools` if positional was used) is unnecessary because every caller already passes opts table | +| 1 (broker) | `M.chat` second return value confuses callers that do `local r = broker.chat(...)` discarding the second | Lua doesn't error on dropped return values; backward-compat preserved automatically | +| 2 (context) | usage_totals nil on old ctx serializations | Defensive `self.usage_totals = self.usage_totals or {}` in add_usage; no migration needed | +| 3 (repl wires) | Forgetting one call site = silent under-count | Lint by grep for `broker.chat\(` and `broker.chat_stream\(` after the wire commit; ensure each is tagged | +| 4 (safety wires) | safety.lua must NOT require("secrets")-style introduce new module dep | Use helpers.on_usage callback convention (same shape as #52's scrub_msgs) — no module dep | +| 5 (:cost + warn) | warn fires multiple times when threshold is much exceeded by one call | cost_warn_fired one-shot flag; explicit :cost reset to re-arm | +| 6 (config + status) | none | | + +### Tests + smoke per commit + +Each commit: +- Pass `luajit test_safety.lua` (87/87) and `luajit test_router_model.lua` (31/31) +- Load cleanly via `luajit -e 'package.path=...; require("repl"); print("ok")'` +- Pass a per-feature smoke (described in each row above) + +### Things deliberately NOT split + +- broker.chat backward-compat shim — Lua's multiple-return-values + semantics handle it automatically (existing `local r = broker.chat(..)` + drops the new `usage` value). +- Per-category sub-tables — flat `model -> category -> counters` is + simple enough; nesting deeper for e.g. timestamps is v2. +- Cross-session persistence — explicitly Q-C2 deferred to phase 8. + +### Open at plan-time (resolve at implement) + +- Whether `safety.is_destructive`'s opts should carry `on_usage` + callback explicitly OR thread through cfg.helpers (the latter + matches the Norris helpers convention but is more coupling). + Decide at commit 4. Default to explicit opts.on_usage for minimum + surface. +- Whether to emit a `[aish] usage: model=X prompt=N completion=M cost=$X` + status line PER TURN (verbose mode) or only via :cost on demand. + v1 = on demand only; verbose mode is a follow-up nice-to-have.