docs/PHASE7: plan — §13 commit roadmap

Status: Analyze -> Plan.

Q-C4 was the last open question pending baseline; now resolved per
B1 (stream_options accepted by both backends; required for local).

§13 Implementation Plan added — 6 commits, bottom-up:

1. broker.lua: usage extraction from final SSE chunk; build_request
   signature widening to (model_cfg, msgs, stream, opts);
   on_delta("usage", payload); chat returns (text, usage);
   opts.category passthrough.
2. context.lua: usage_totals + cost_warn_fired fields; add_usage /
   total_cost / total_tokens helpers; :reset preserves both.
3. repl.lua: wire opts.category at 5 non-Norris call sites (main,
   delegate x2, summarize, memory_summarize); on_delta("usage")
   branch routes to ctx:add_usage.
4. safety.lua: wire opts.category for Norris main broker +
   is_destructive LLM probe; helpers.on_usage callback convention
   (no new module dep — matches #52's scrub_msgs pattern).
5. repl.lua: :cost meta surface + warn-threshold check + HELP.
6. config.lua: commented cost example block + PHASE7.md status
   bump to Implement.

Per-commit risk index covers signature-change blast radius, missed
call-site lint, and warn-flag one-shot semantics. Lua's
multi-return semantics keep broker.chat backwards-compat automatic.

Two items left open at plan, resolve at implement:
- is_destructive opts.on_usage vs cfg.helpers threading
- per-turn verbose mode (deferred; v1 = :cost on demand only)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

122 insertions(+), 2 deletions(-)

@@ -2,7 +2,7 @@

 **Project:** aish — AI-augmented conversational shell
 **Document:** Phase 7 Requirements, Architecture & Design Decisions
-**Status:** Analyze (formulate complete; tree at `3bad07b` probed)
+**Status:** Plan (formulate + analyze + baseline complete; tree at `2244a3f`)
 **Date:** 2026-05-16

 **Analyze findings (2026-05-16):**

@@ -444,7 +444,7 @@ One-shot per session. `:cost reset` clears the flag.
 | Q-C1 | Provider-without-usage handling | A10 — defensive silent skip; baseline probe will confirm shape on local llama.cpp. |
 | Q-C2 | Cross-session cost persistence (`cost.jsonl`) | Deferred to follow-up phase 8; v1 is session-only. |
 | Q-C3 | Categories closed-set vs free-form | A4 — **free-form**; caller decides. Matches Phase 6 helpers/skills convention. |
-| Q-C4 | `stream_options` forwarding by hossenfelder | **Baseline** — probe required against the live broker. |
+| Q-C4 | `stream_options` forwarding by hossenfelder | B1 RESOLVED — both backends accept; flag is REQUIRED for local llama.cpp, no-op for cloud. Default-true is correct. |
 | Q-C5 | Warn fires on the crossed call or the next | A5 — **on the crossed call** (no UX-defeating delay). |
 | Q-C6 | `:reset` clears `cost_warn_fired` | A6 — **no**, only `:cost reset` clears the flag (R8 parity). |

@@ -463,3 +463,123 @@ Candidate follow-ups (non-binding):
Indirectly improves accuracy of any future "preflight cost predictor".

Phase 7 itself is self-contained — no upstream dependencies.

---

## 13. Implementation Plan (commit-by-commit)

Bottom-up; broker first (it's the egress point that all callers
depend on), then context (the accumulator), then the call-site
rewires, then the user-facing meta + warn surface, then config +
status bump. Each commit leaves the tree green (existing tests +
load smoke + per-commit feature smoke).

### Order

1. **`broker.lua` — usage capture + signature widening.**
   - `build_request(model_cfg, messages, stream, opts)` widened to
     take an opts table; `opts.tools` / `opts.max_tokens` fold in from
     the existing positional args. `opts.include_usage` (default true)
     adds `stream_options.include_usage = true` to the request body
     (per B1, required for local).
   - `M.chat_stream` event loop adds
     `if doc.usage then final_usage = doc.usage end`; after
     `curl.post_sse` returns, if `final_usage` is set,
     `on_delta("usage", payload)` is called. Payload includes
     `model = model_cfg.model` (caller-stable per B4), the raw token
     counts, and `cost` as a number (nil for local per B3).
   - `opts.category` passthrough — the broker just echoes it into the
     emitted usage payload; doesn't validate (per A4 free-form).
   - `M.chat` (the non-streaming wrapper) returns `(text, usage)` —
     backward-compatible (existing callers ignore the second value).
   - Smoke: hand-build a request with stream_options, capture all
     three on_delta kinds (text, tool_call when applicable, usage),
     confirm usage payload matches what curl shows.

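The widened signature and the usage capture from commit 1 could look roughly like the sketch below. It is not the actual `broker.lua`: `handle_events` is a hypothetical stand-in for the `M.chat_stream` event loop, `docs` is assumed to be a list of already-decoded SSE chunks, the `prompt_tokens` / `completion_tokens` field names are assumed to follow the OpenAI-style usage object, and attaching `stream_options` only when `stream` is true is an assumption of this sketch.

```lua
-- Sketch only: build_request taking an opts table instead of positional tools/max_tokens.
local function build_request(model_cfg, messages, stream, opts)
  opts = opts or {}
  local body = {
    model = model_cfg.model,
    messages = messages,
    stream = stream and true or false,
    tools = opts.tools,
    max_tokens = opts.max_tokens,
  }
  -- include_usage defaults to true; only an explicit false turns it off.
  -- Assumption: the flag is attached for streaming requests only.
  if stream and opts.include_usage ~= false then
    body.stream_options = { include_usage = true }
  end
  return body
end

-- Hypothetical stand-in for the chat_stream loop over decoded SSE chunks.
local function handle_events(model_cfg, docs, on_delta, opts)
  opts = opts or {}
  local final_usage
  for _, doc in ipairs(docs) do
    if doc.text then on_delta("text", doc.text) end
    if doc.usage then final_usage = doc.usage end  -- last chunk carrying usage wins
  end
  if final_usage then
    on_delta("usage", {
      model = model_cfg.model,                        -- caller-stable name
      category = opts.category,                       -- echoed, never validated
      prompt_tokens = final_usage.prompt_tokens,      -- assumed field name
      completion_tokens = final_usage.completion_tokens,
      cost = final_usage.cost,                        -- nil when the backend reports none
    })
  end
end
```

Emitting the usage event once, after the loop, keeps the caller's `on_delta` contract simple: at most one `"usage"` event per request, and none at all when the provider sends no usage object.
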
2. **`context.lua` — accumulator + helpers.**
   - `Context.new`: `self.usage_totals = {}` + `self.cost_warn_fired = false`.
   - `Context:add_usage(model, category, usage)` — increments
     `usage_totals[model][category]` slots.
   - `Context:total_cost()` — sums all cost fields across all models/categories.
   - `Context:total_tokens()` — sums prompt + completion separately.
   - `Context:reset` — does NOT touch `usage_totals` or `cost_warn_fired`
     (R8 parity with `memory_items` and `project`).
   - Smoke: 4-case inline test of add_usage / totals / reset preservation.

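A minimal sketch of the accumulator from commit 2, assuming a flat `model -> category -> counters` table. The slot field names (`prompt_tokens`, `completion_tokens`, `cost`, `calls`), the `messages` field that `reset` clears, and the `"uncategorized"` fallback are illustrative assumptions, not taken from the real `context.lua`.

```lua
local Context = {}
Context.__index = Context

function Context.new()
  return setmetatable({
    messages = {},            -- hypothetical conversation state
    usage_totals = {},
    cost_warn_fired = false,
  }, Context)
end

function Context:add_usage(model, category, usage)
  self.usage_totals = self.usage_totals or {}   -- defensive for older ctx tables
  category = category or "uncategorized"        -- assumption: nil category gets a fallback key
  local by_model = self.usage_totals[model] or {}
  self.usage_totals[model] = by_model
  local slot = by_model[category]
    or { prompt_tokens = 0, completion_tokens = 0, cost = 0, calls = 0 }
  by_model[category] = slot
  slot.prompt_tokens = slot.prompt_tokens + (usage.prompt_tokens or 0)
  slot.completion_tokens = slot.completion_tokens + (usage.completion_tokens or 0)
  slot.cost = slot.cost + (usage.cost or 0)     -- nil cost (local backend) counts as 0
  slot.calls = slot.calls + 1
end

function Context:total_cost()
  local sum = 0
  for _, by_model in pairs(self.usage_totals or {}) do
    for _, slot in pairs(by_model) do sum = sum + slot.cost end
  end
  return sum
end

-- Returns prompt and completion totals as two separate values.
function Context:total_tokens()
  local prompt, completion = 0, 0
  for _, by_model in pairs(self.usage_totals or {}) do
    for _, slot in pairs(by_model) do
      prompt = prompt + slot.prompt_tokens
      completion = completion + slot.completion_tokens
    end
  end
  return prompt, completion
end

function Context:reset()
  self.messages = {}  -- usage_totals and cost_warn_fired deliberately survive
end
```
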
3. **`repl.lua` — wire opts.category + on_delta("usage") at non-Norris call sites.**
   - call_broker wrapper (used by ask_ai): pass `opts.category = "main"`;
     the on_delta wrapper handles `kind == "usage"` by
     calling `ctx:add_usage(req_name, "main", payload)`.
   - DELEGATE: handler: opts.category = "delegate".
   - :delegate meta: opts.category = "delegate".
   - summarize-on-evict callback: opts.category = "summarize".
   - :memory summarize: opts.category = "memory_summarize".
   - For broker.chat callers (non-streaming): capture the new second
     return value and feed to ctx:add_usage.
   - Smoke: send one cloud prompt, observe ctx.usage_totals grows.

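The two wiring shapes in commit 3 (streaming via an `on_delta` wrapper, non-streaming via the second return value) might be sketched as follows. `with_usage` and `record_chat` are hypothetical helper names, `chat_fn` stands in for `broker.chat`, and swallowing the `"usage"` event instead of forwarding it to the inner callback is an assumption of this sketch.

```lua
-- Hypothetical helper: wrap a caller's on_delta so "usage" events reach the accumulator.
local function with_usage(ctx, req_name, category, on_delta)
  return function(kind, payload)
    if kind == "usage" then
      ctx:add_usage(req_name, category, payload)
      return  -- assumption: usage is not forwarded to the display callback
    end
    if on_delta then return on_delta(kind, payload) end
  end
end

-- Hypothetical helper for non-streaming callers: chat_fn stands in for broker.chat
-- and is assumed to return (text, usage), with usage possibly nil.
local function record_chat(ctx, req_name, category, chat_fn, ...)
  local text, usage = chat_fn(...)
  if usage then ctx:add_usage(req_name, category, usage) end
  return text
end
```

Routing every call site through one of these two helpers would also make the "forgot one call site" risk easier to audit, since a bare broker call then stands out.
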
4. **`safety.lua` — opts.category for Norris + probe.**
   - safety.norris_step's broker.chat_stream call: pass
     opts.category = "norris"; the helpers.on_usage callback (added to
     the helpers table by repl.lua) routes back to ctx:add_usage. OR —
     simpler — safety.lua wraps on_delta itself with a "usage"-kind
     branch that calls helpers.on_usage.
   - safety.is_destructive's llm_probe broker.chat call: pass
     opts.category = "probe"; capture the (text, usage) return and
     forward via opts.on_usage callback (added to is_destructive opts).
   - Smoke: a Norris session shows both "norris" and "probe" category
     entries in :cost detail.

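One way the callback convention from commit 4 could look, keeping `safety.lua` free of any `require` on the REPL or context modules. Everything here is a sketch: `wrap_norris_delta` and `llm_probe` are hypothetical names, `chat_fn` stands in for `broker.chat`, and the single-argument `on_usage(payload)` signature with the category carried inside the payload is an assumption.

```lua
-- Sketch of the safety.lua side: usage flows out only through injected callbacks.
local function wrap_norris_delta(helpers, on_delta)
  return function(kind, payload)
    if kind == "usage" then
      if helpers and helpers.on_usage then helpers.on_usage(payload) end
      return
    end
    return on_delta(kind, payload)
  end
end

-- chat_fn is assumed to return (text, usage); opts.on_usage is optional.
local function llm_probe(chat_fn, cmd, opts)
  opts = opts or {}
  local text, usage = chat_fn(cmd, { category = "probe" })
  if usage and opts.on_usage then
    usage.category = usage.category or "probe"  -- tag it if the broker did not
    opts.on_usage(usage)
  end
  return text
end
```

On the REPL side the injected callback would simply close over `ctx`, so the accumulator stays the only place that knows how usage is stored.
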
5. **`repl.lua` — :cost meta + warn-threshold + HELP.**
   - :cost (summary), :cost detail (per-model+category breakdown),
     :cost reset (zero totals + clear cost_warn_fired).
   - After every ctx:add_usage call (centralized in a helper if
     possible), check cfg.cost.warn_at_dollars / warn_at_tokens;
     emit one-shot status if crossed AND cost_warn_fired is false.
   - HELP gains 3 lines for :cost.
   - Smoke: :cost shows totals; :cost detail breaks down; warn fires
     once when threshold crossed; :cost reset re-arms.

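The one-shot warn check from commit 5 can be sketched as a single helper called after each `ctx:add_usage`. The config keys `cfg.cost.warn_at_dollars` and `cfg.cost.warn_at_tokens` come from the plan; the helper name `check_cost_warn`, the `emit` callback, the message wording, the `>=` comparison, and comparing the token threshold against prompt plus completion are all assumptions.

```lua
-- Sketch: returns true only on the call that fires the warning.
local function check_cost_warn(ctx, cfg, emit)
  local cost_cfg = cfg and cfg.cost
  if not cost_cfg or ctx.cost_warn_fired then return false end

  local dollars = ctx:total_cost()
  local prompt, completion = ctx:total_tokens()
  local tokens = prompt + completion  -- assumption: threshold applies to the sum

  local over_dollars = cost_cfg.warn_at_dollars and dollars >= cost_cfg.warn_at_dollars
  local over_tokens = cost_cfg.warn_at_tokens and tokens >= cost_cfg.warn_at_tokens
  if over_dollars or over_tokens then
    ctx.cost_warn_fired = true        -- one-shot until ":cost reset" clears it
    emit(string.format("[aish] cost warning: $%.4f, %d tokens this session", dollars, tokens))
    return true
  end
  return false
end
```

Because the flag is set on the same call that crosses the threshold, the warning appears on the crossing call itself and stays silent afterwards, however far later calls push the totals.
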
6. **`config.lua` example block + `docs/PHASE7.md` status bump.**
   - Commented-out
     `cost = { warn_at_dollars = 0.50, warn_at_tokens = 100000 }`
     block in config.lua.
   - PHASE7.md status header → **Implement** (matches Phase 5/6
     cadence — manifest tracks implementation state).

### Risk index per commit

| Commit | Risk | Mitigation |
|---|---|---|
| 1 (broker) | build_request signature change breaks all existing callers | All callers of chat_stream/chat already pass an opts table; tools/max_tokens move INTO opts, so a temporary positional fallback (`opts.tools = old_tools` if positional was used) is unnecessary |
| 1 (broker) | `M.chat` second return value confuses callers that do `local r = broker.chat(...)`, discarding the second | Lua doesn't error on dropped return values; backward-compat preserved automatically |
| 2 (context) | usage_totals nil on old ctx serializations | Defensive `self.usage_totals = self.usage_totals or {}` in add_usage; no migration needed |
| 3 (repl wires) | Forgetting one call site = silent under-count | Lint by grep for `broker.chat\(` and `broker.chat_stream\(` after the wire commit; ensure each is tagged |
| 4 (safety wires) | safety.lua must NOT introduce a new module dep (`require("secrets")`-style) | Use helpers.on_usage callback convention (same shape as #52's scrub_msgs) — no module dep |
| 5 (:cost + warn) | warn fires multiple times when one call exceeds the threshold by a wide margin | cost_warn_fired one-shot flag; explicit :cost reset to re-arm |
| 6 (config + status) | none | |

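The call-site lint in the commit 3 row is specified as a grep. As an illustration only, the same check can be approximated in Lua, which is a swapped-in technique and not the plan's method. `untagged_calls` is a hypothetical helper, and the same-line heuristic (a broker call with no `category` on that line) will also flag calls whose opts table is built on another line, so its output is a review list, not a verdict.

```lua
-- Sketch: report line numbers of broker calls with no "category" on the same line.
local function untagged_calls(source)
  local hits, lineno = {}, 0
  for line in (source .. "\n"):gmatch("(.-)\n") do
    lineno = lineno + 1
    local is_call = line:find("broker%.chat%(") or line:find("broker%.chat_stream%(")
    if is_call and not line:find("category", 1, true) then
      hits[#hits + 1] = lineno
    end
  end
  return hits
end
```
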
### Tests + smoke per commit

Each commit must:
- Pass `luajit test_safety.lua` (87/87) and `luajit test_router_model.lua` (31/31)
- Load cleanly via `luajit -e 'package.path=...; require("repl"); print("ok")'`
- Pass a per-feature smoke (the Smoke bullet of each commit under Order above)

### Things deliberately NOT split

- broker.chat backward-compat shim — Lua's multiple-return-values
  semantics handle it automatically (existing `local r = broker.chat(...)`
  drops the new `usage` value).
- Per-category sub-tables — flat `model -> category -> counters` is
  simple enough; nesting deeper for e.g. timestamps is v2.
- Cross-session persistence — explicitly Q-C2 deferred to phase 8.

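The multiple-return behaviour that the first bullet relies on can be checked in isolation. In this sketch `chat` is a stand-in for `broker.chat`, not the real function. The second half shows a general Lua caveat that is independent of this codebase: when the call is the last expression in an argument list or table constructor, all return values are passed through, and wrapping the call in parentheses truncates it to one.

```lua
-- Stand-in for broker.chat returning (text, usage).
local function chat()
  return "hello", { prompt_tokens = 3, completion_tokens = 1 }
end

local r = chat()            -- old-style caller: the usage value is silently dropped
local text, usage = chat()  -- new-style caller: both values captured

-- Caveat: a call in final position expands to all of its results.
local expanded = { chat() }     -- table receives both values
local truncated = { (chat()) }  -- parentheses keep only the first
```

Call sites that pass `broker.chat(...)` directly as the final argument of another function are therefore the ones worth a second look when the return arity changes.
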
### Open at plan-time (resolve at implement)

- Whether `safety.is_destructive`'s opts should carry `on_usage`
  callback explicitly OR thread through cfg.helpers (the latter
  matches the Norris helpers convention but adds more coupling).
  Decide at commit 4. Default to explicit opts.on_usage for minimum
  surface.
- Whether to emit a `[aish] usage: model=X prompt=N completion=M cost=$X`
  status line PER TURN (verbose mode) or only via :cost on demand.
  v1 = on demand only; verbose mode is a follow-up nice-to-have.