docs/PHASE7: plan — §13 commit roadmap

Status: Analyze -> Plan.

Q-C4 was the last open question pending baseline; now resolved per
B1 (stream_options accepted by both backends; required for local).

§13 Implementation Plan added — 6 commits, bottom-up:

  1. broker.lua: usage extraction from final SSE chunk; build_request
     signature widening to (model_cfg, msgs, stream, opts); on_delta
     ("usage", payload); chat returns (text, usage); opts.category
     passthrough.

  2. context.lua: usage_totals + cost_warn_fired fields; add_usage /
     total_cost / total_tokens helpers; :reset preserves both.

  3. repl.lua: wire opts.category at 5 non-Norris call sites (main,
     delegate x2, summarize, memory_summarize); on_delta("usage")
     branch routes to ctx:add_usage.

  4. safety.lua: wire opts.category for Norris main broker + is_
     destructive LLM probe; helpers.on_usage callback convention
     (no new module dep — matches #52's scrub_msgs pattern).

  5. repl.lua: :cost meta surface + warn-threshold check + HELP.

  6. config.lua: commented cost example block + PHASE7.md status
     bump to Implement.

Per-commit risk index covers signature-change blast radius, missed
call-site lint, and warn-flag one-shot semantics. Lua's multi-
return semantics keep broker.chat backwards-compat automatic.

Two items left open at plan, resolve at implement:
  - is_destructive opts.on_usage vs cfg.helpers threading
  - per-turn verbose mode (deferred; v1 = :cost on demand only)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-16 22:50:39 +00:00
parent 2244a3f1ee
commit 0f14dc1727
+122 -2
View File
@@ -2,7 +2,7 @@
**Project:** aish — AI-augmented conversational shell **Project:** aish — AI-augmented conversational shell
**Document:** Phase 7 Requirements, Architecture & Design Decisions **Document:** Phase 7 Requirements, Architecture & Design Decisions
**Status:** Analyze (formulate complete; tree at `3bad07b` probed) **Status:** Plan (formulate + analyze + baseline complete; tree at `2244a3f`)
**Date:** 2026-05-16 **Date:** 2026-05-16
**Analyze findings (2026-05-16):** **Analyze findings (2026-05-16):**
@@ -444,7 +444,7 @@ One-shot per session. `:cost reset` clears the flag.
| Q-C1 | Provider-without-usage handling | A10 — defensive silent skip; baseline probe will confirm shape on local llama.cpp. | | Q-C1 | Provider-without-usage handling | A10 — defensive silent skip; baseline probe will confirm shape on local llama.cpp. |
| Q-C2 | Cross-session cost persistence (`cost.jsonl`) | Deferred to follow-up phase 8; v1 is session-only. | | Q-C2 | Cross-session cost persistence (`cost.jsonl`) | Deferred to follow-up phase 8; v1 is session-only. |
| Q-C3 | Categories closed-set vs free-form | A4 — **free-form**; caller decides. Matches Phase 6 helpers/skills convention. | | Q-C3 | Categories closed-set vs free-form | A4 — **free-form**; caller decides. Matches Phase 6 helpers/skills convention. |
| Q-C4 | `stream_options` forwarding by hossenfelder | **Baseline** — probe required against the live broker. | | Q-C4 | `stream_options` forwarding by hossenfelder | B1 RESOLVED — both backends accept; flag is REQUIRED for local llama.cpp, no-op for cloud. Default-true is correct. |
| Q-C5 | Warn fires on the crossed call or the next | A5 — **on the crossed call** (no UX-defeating delay). | | Q-C5 | Warn fires on the crossed call or the next | A5 — **on the crossed call** (no UX-defeating delay). |
| Q-C6 | `:reset` clears `cost_warn_fired` | A6 — **no**, only `:cost reset` clears the flag (R8 parity). | | Q-C6 | `:reset` clears `cost_warn_fired` | A6 — **no**, only `:cost reset` clears the flag (R8 parity). |
@@ -463,3 +463,123 @@ Candidate follow-ups (non-binding):
Indirectly improves accuracy of any future "preflight cost predictor". Indirectly improves accuracy of any future "preflight cost predictor".
Phase 7 itself is self-contained — no upstream dependencies. Phase 7 itself is self-contained — no upstream dependencies.
---
## 13. Implementation Plan (commit-by-commit)
Bottom-up; broker first (it's the egress point that all callers
depend on), then context (the accumulator), then the call-site
rewires, then the user-facing meta + warn surface, then config +
status bump. Each commit leaves the tree green (existing tests +
load smoke + per-commit feature smoke).
### Order
1. **`broker.lua` — usage capture + signature widening.**
- `build_request(model_cfg, messages, stream, opts)` widened to
take an opts table; opts.tools / opts.max_tokens fold in from
the existing positional args. Opts.include_usage (default true)
adds `stream_options.include_usage = true` to the request body
(per B1, required for local).
- `M.chat_stream` event loop adds `if doc.usage then final_usage =
doc.usage end`; after `curl.post_sse` returns, if `final_usage`
is set, `on_delta("usage", payload)` is called. Payload includes
`model = model_cfg.model` (caller-stable per B4), the raw token
counts, and `cost` as a number (nil for local per B3).
- opts.category passthrough — the broker just echoes it into the
emitted usage payload; doesn't validate (per A4 free-form).
- `M.chat` (the non-streaming wrapper) returns `(text, usage)` —
backward-compatible (existing callers ignore the second value).
- Smoke: hand-build a request with stream_options, capture all
three on_delta kinds (text, tool_call when applicable, usage),
confirm usage payload matches what curl shows.
2. **`context.lua` — accumulator + helpers.**
- `Context.new`: `self.usage_totals = {}` + `self.cost_warn_fired = false`.
- `Context:add_usage(model, category, usage)` — increments
`usage_totals[model][category]` slots.
- `Context:total_cost()` — sums all cost fields across all models/categories.
- `Context:total_tokens()` — sums prompt + completion separately.
- `Context:reset` — does NOT touch `usage_totals` or `cost_warn_fired`
(R8 parity with `memory_items` and `project`).
- Smoke: 4-case inline test of add_usage / totals / reset preservation.
3. **`repl.lua` — wire opts.category + on_delta("usage") at non-Norris call sites.**
- call_broker wrapper (used by ask_ai): pass `opts.category =
"main"`; the on_delta wrapper handles `kind == "usage"` by
calling `ctx:add_usage(req_name, "main", payload)`.
- DELEGATE: handler: opts.category = "delegate".
- :delegate meta: opts.category = "delegate".
- summarize-on-evict callback: opts.category = "summarize".
- :memory summarize: opts.category = "memory_summarize".
- For broker.chat callers (non-streaming): capture the new second
return value and feed to ctx:add_usage.
- Smoke: send one cloud prompt, observe ctx.usage_totals grows.
4. **`safety.lua` — opts.category for Norris + probe.**
- safety.norris_step's broker.chat_stream call: pass opts.category =
"norris"; the helpers.on_usage callback (added to the helpers
table by repl.lua) routes back to ctx:add_usage. OR — simpler —
safety.lua wraps on_delta itself with a "usage"-kind branch that
calls helpers.on_usage.
- safety.is_destructive's llm_probe broker.chat call: pass
opts.category = "probe"; capture the (text, usage) return and
forward via opts.on_usage callback (added to is_destructive opts).
- Smoke: a Norris session shows both "norris" and "probe" category
entries in :cost detail.
5. **`repl.lua` — :cost meta + warn-threshold + HELP.**
- :cost (summary), :cost detail (per-model+category breakdown),
:cost reset (zero totals + clear cost_warn_fired).
- After every ctx:add_usage call (centralized in a helper if
possible), check cfg.cost.warn_at_dollars / warn_at_tokens;
emit one-shot status if crossed AND cost_warn_fired is false.
- HELP gains 3 lines for :cost.
- Smoke: :cost shows totals; :cost detail breaks down; warn fires
once when threshold crossed; :cost reset re-arms.
6. **`config.lua` example block + `docs/PHASE7.md` status bump.**
- Commented-out `cost = { warn_at_dollars = 0.50, warn_at_tokens
= 100000 }` block in config.lua.
- PHASE7.md status header → **Implement** (matches Phase 5/6
cadence — manifest tracks implementation state).
### Risk index per commit
| Commit | Risk | Mitigation |
|---|---|---|
| 1 (broker) | build_request signature change breaks all existing callers | All callers of chat_stream/chat use opts already; we move tools/max_tokens INTO opts — temporary positional fallback (`opts.tools = old_tools` if positional was used) is unnecessary because every caller already passes opts table |
| 1 (broker) | `M.chat` second return value confuses callers that do `local r = broker.chat(...)` discarding the second | Lua doesn't error on dropped return values; backward-compat preserved automatically |
| 2 (context) | usage_totals nil on old ctx serializations | Defensive `self.usage_totals = self.usage_totals or {}` in add_usage; no migration needed |
| 3 (repl wires) | Forgetting one call site = silent under-count | Lint by grep for `broker.chat\(` and `broker.chat_stream\(` after the wire commit; ensure each is tagged |
| 4 (safety wires) | safety.lua must NOT require("secrets")-style introduce new module dep | Use helpers.on_usage callback convention (same shape as #52's scrub_msgs) — no module dep |
| 5 (:cost + warn) | warn fires multiple times when threshold is much exceeded by one call | cost_warn_fired one-shot flag; explicit :cost reset to re-arm |
| 6 (config + status) | none | |
### Tests + smoke per commit
Each commit:
- Pass `luajit test_safety.lua` (87/87) and `luajit test_router_model.lua` (31/31)
- Load cleanly via `luajit -e 'package.path=...; require("repl"); print("ok")'`
- Pass a per-feature smoke (described in each row above)
### Things deliberately NOT split
- broker.chat backward-compat shim — Lua's multiple-return-values
semantics handle it automatically (existing `local r = broker.chat(..)`
drops the new `usage` value).
- Per-category sub-tables — flat `model -> category -> counters` is
simple enough; nesting deeper for e.g. timestamps is v2.
- Cross-session persistence — explicitly Q-C2 deferred to phase 8.
### Open at plan-time (resolve at implement)
- Whether `safety.is_destructive`'s opts should carry `on_usage`
callback explicitly OR thread through cfg.helpers (the latter
matches the Norris helpers convention but is more coupling).
Decide at commit 4. Default to explicit opts.on_usage for minimum
surface.
- Whether to emit a `[aish] usage: model=X prompt=N completion=M cost=$X`
status line PER TURN (verbose mode) or only via :cost on demand.
v1 = on demand only; verbose mode is a follow-up nice-to-have.