aish/docs/PHASE7.md
marfrit 0f14dc1727 docs/PHASE7: plan — §13 commit roadmap
Status: Analyze -> Plan.

Q-C4 was the last open question pending baseline; now resolved per
B1 (stream_options accepted by both backends; required for local).

§13 Implementation Plan added — 6 commits, bottom-up:

  1. broker.lua: usage extraction from final SSE chunk; build_request
     signature widening to (model_cfg, msgs, stream, opts); on_delta
     ("usage", payload); chat returns (text, usage); opts.category
     passthrough.

  2. context.lua: usage_totals + cost_warn_fired fields; add_usage /
     total_cost / total_tokens helpers; :reset preserves both.

  3. repl.lua: wire opts.category at 5 non-Norris call sites (main,
     delegate x2, summarize, memory_summarize); on_delta("usage")
     branch routes to ctx:add_usage.

  4. safety.lua: wire opts.category for Norris main broker +
     is_destructive LLM probe; helpers.on_usage callback convention
     (no new module dep — matches #52's scrub_msgs pattern).

  5. repl.lua: :cost meta surface + warn-threshold check + HELP.

  6. config.lua: commented cost example block + PHASE7.md status
     bump to Implement.

Per-commit risk index covers signature-change blast radius, missed
call-site lint, and warn-flag one-shot semantics. Lua's multi-
return semantics keep broker.chat backwards-compat automatic.

Two items left open at plan, resolve at implement:
  - is_destructive opts.on_usage vs cfg.helpers threading
  - per-turn verbose mode (deferred; v1 = :cost on demand only)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:50:39 +00:00

# aish — Phase 7 Manifest
**Project:** aish — AI-augmented conversational shell
**Document:** Phase 7 Requirements, Architecture & Design Decisions
**Status:** Plan (formulate + analyze + baseline complete; tree at `2244a3f`)
**Date:** 2026-05-16
**Analyze findings (2026-05-16):**
A1. **broker.chat_stream surface is clean for the extension.** The
existing `on_event(data)` closure inside `M.chat_stream` already
parses `doc.error` / `doc.choices` / `delta` / tool_calls — adding
`if doc.usage then final_usage = ... end` is one block. Emission
happens via a closure-local `final_usage` that the post-loop code
in `chat_stream` reads and calls `on_delta("usage", final_usage)`
on. `build_request` needs a minor extension OR (cleaner in principle)
`chat_stream` inserts `stream_options.include_usage = true` into the
body table BEFORE `json.encode`, but we currently encode inside
`build_request`, so `chat_stream` never sees the table.
Cleanest: extend `build_request(model_cfg, messages, stream, opts)`
so it can read `opts.include_usage`. Phase 7 simplifies the
signature in passing (see A3).
A2. **7 caller sites** identified for `opts.category` threading:

| Site | Category |
|---|---|
| `safety.lua:191` (LLM probe) | `"probe"` |
| `safety.lua:354` (norris main) | `"norris"` |
| `repl.lua:326` (summarize-on-evict) | `"summarize"` |
| `repl.lua:685` (call_broker wrapper, used by ask_ai) | `"main"` |
| `repl.lua:1104` (DELEGATE: handler) | `"delegate"` |
| `repl.lua:1587` (:memory summarize) | `"memory_summarize"` |
| `repl.lua:2156` (:delegate meta) | `"delegate"` |

All callers pass `opts` already; adding a `category` field is
additive and backward-compatible (default to `"main"` when absent).
A3. **`build_request` signature simplification.** Today it takes
`(model_cfg, messages, stream, tools, max_tokens)` — five positional
args. With Phase 7 needing `include_usage` AND `stream_options`,
positional growth gets unwieldy. **Resolution:** widen to
`(model_cfg, messages, stream, opts)` where opts carries
`{tools, max_tokens, include_usage, stream_options}`. Callers in
`M.chat_stream` and `M.chat` pass their existing opts table through.
This is a refactor but contained inside broker.lua.
A4. **Q-C3 RESOLVED: free-form categories.** The closed-set vs free-form
debate resolved in favor of free-form per the helpers/skills
convention already in place (Phase 6 :tree / :diff metas don't
validate sub-args either). `:cost detail` will show whatever
categories appear; in practice that is a small, documented set
(6 distinct categories across the 7 call sites in A2), so no surprises.
A5. **Q-C5 RESOLVED: warn fires on the call that crossed.** The crossed
call's usage IS in the accumulator at the moment we check (we
check AFTER `add_usage`). Firing on the NEXT call would mean a
delay of one full broker round-trip before the user sees the
warn — defeats the purpose. Just emit-on-cross.
A6. **Q-C6 RESOLVED: `:reset` does NOT clear `cost_warn_fired`.**
Parity with `usage_totals` itself (per the §2 decision row); the
user reset their conversation, not their cost meter. The flag
AND the totals are reset only by the explicit `:cost reset` verb.
A7. **Norris call-graph rewires (existing safety.lua:354 path):** with
issue #52 wired (commit `955bd82`), the Norris broker call now
passes `helpers.scrub_msgs` / `helpers.streaming_rehydrator`. The
on_delta wrapping pattern means I need to be careful that the new
`("usage", payload)` kind also flows through any wrapper. Since
secrets streaming_rehydrator only matches on `kind == "text"`, the
"usage" kind passes through unchanged. No new entanglement.
A8. **`ctx.usage_totals` survives `:reset` per R8** — same invariant
as `memory_items` (Phase 4) and `project` (Phase 6). Documented in
§5 of the manifest; reinforces the "ambient context survives
conversation reset" rule.
A9. **Session JSONL serialization** — assistant turn dict gets an
optional `usage` field. `history.lua` log_turn currently calls
`json.encode(turn)` opaquely; the dkjson serializer handles nested
tables. No code change needed; the new field flows through
automatically when the assistant turn carries one.
A10. **Q-C1 PARTIAL: local providers may not emit `usage`.** The
formulate-time assumption was "treat absence as zero-cost / unknown".
A real probe against `qwen-coder-7b-snappy-8k` is a baseline
action — see B-probes below. The implementation will be defensive:
if `doc.usage` never appears in the stream, no "usage" event is
emitted, and the accumulator is unchanged for that turn. `:cost`
output naturally reflects "0 calls counted for local model" if
that's the case.
A11. **Q-C4 deferred to baseline**: actual `stream_options` forwarding
by the hossenfelder proxy must be probed against a live broker.
If the proxy strips the option, we get no `usage` events even
for cloud calls. Baseline action.
PHASE0 is the locked substrate; PHASE1-6 are layered on top. This manifest
specifies what Phase 7 adds — **cost / usage observability**: the ability
to know, mid-session, how many tokens you've spent and how much money the
paid-cloud calls have cost.
PHASE0 §11 originally listed phases only through 6; this commit amends
§11 to add Phase 7.
---
## 1. Scope of Phase 7
Four pillars:
1. **Usage capture in broker** — `broker.chat_stream` extracts the
provider's `usage` block (and `cost` where present) from the response
stream. Surfaces it to the caller via a new `on_delta("usage", ...)`
kind. The existing `broker.chat` buffering wrapper exposes it as a
second return value `(text, usage)`. Backward-compatible: callers
that don't handle the new kind / second value simply ignore it.
2. **Per-session accumulator on `ctx`** — running totals per-model AND
per-call-category (main / delegate / summarize / memory_summarize / probe / norris) accumulate on
`ctx.usage_totals`. No persistence across sessions in v1 (Q-C2
defers cross-session); the session-log JSONL files DO carry per-turn
usage so historical analysis is possible after the fact.
3. **`:cost` meta** — a `:cost` reporter that shows the current session
totals, with optional `:cost detail` for the per-model + per-category
breakdown. Zero broker calls (purely local read of `ctx.usage_totals`).
4. **Optional warning thresholds** — `cfg.cost.warn_at_dollars` and
`cfg.cost.warn_at_tokens` emit a status the first time the running
total crosses the configured threshold. Default off (no warnings
without config). Useful when cloud presets are configured and you
want a "you've spent $1 this session" nudge before runaway cost.
**Phase 7 is done when:**
- `broker.chat_stream` exposes usage via the new `on_delta("usage", ...)`
callback kind; `broker.chat` returns `(text, usage)`. Backward compat
preserved (no existing caller breaks).
- After a session with mixed local + cloud calls, `:cost` prints a
total like:
```
[aish] session usage: 24 calls, prompt=12,450 / completion=3,190 tokens
cost=$0.0234 (cloud only; local: 0)
```
- `:cost detail` breaks down by model + category:
```
fast main: 14 calls, 8200/2100 tokens
cloud main: 8 calls, 3850/980 tokens, $0.0180
cloud delegate: 1 call, 250/80 tokens, $0.0012
cloud probe: 1 call, 150/30 tokens, $0.0042
```
- Session JSONL gains a `usage` field on assistant turns (when the
broker returned one).
- With `cfg.cost.warn_at_dollars = 0.50` set, crossing $0.50 cumulative
emits exactly one status line.
- Existing configs without `cfg.cost` behave exactly like Phase 6
(Phase 6 regression coverage).
---
## 2. Technology Decisions (delta from Phase 6)
| Decision | Choice | Rationale |
|---|---|---|
| Where to extract usage | In the `broker.chat_stream` event loop, checking each SSE event for a `usage` field (present only on the final chunk) | The OpenAI streaming spec puts `usage` on the FINAL chunk when `stream_options: { include_usage: true }` is in the request body. The Anthropic-via-Bedrock path through OpenRouter is expected to respect this, but that is still to be verified at baseline. |
| New on_delta kind | `on_delta("usage", { prompt_tokens, completion_tokens, total_tokens, cost?, model?, native_finish_reason? })` | Mirrors the existing `("text", chunk)` / `("tool_call", call)` shape. Callers ignore unknown kinds; backward-compatible. |
| Where to enable usage on the wire | `opts.include_usage = true` (default `true`) sets `stream_options.include_usage = true` in the outbound request body | Off-switch for hosts that reject `stream_options`. Defaults on; baseline probe confirms current broker tolerates it. (A3: `build_request` signature widens to take an `opts` table; positional growth was getting unwieldy.) |
| Accumulator location | `ctx.usage_totals[model_name][category]` table | ctx is per-conversation; matches the `:reset`-survives-or-not rules already in place. |
| Categories | `"main"` (ask_ai), `"delegate"`, `"summarize"`, `"memory_summarize"`, `"probe"`, `"norris"` | One-tag-per-call-site. Tagged at the caller site (caller passes `opts.category` to `broker.chat_stream`). |
| Cost extraction | `usage.cost` (OpenRouter convention; dollars as a number) plus `usage.cost_details.upstream_inference_cost` (more detailed). For Anthropic/Bedrock the cost arrives in dollars on `usage.cost`. For pure local llama.cpp: no `cost` field — record 0. | Single field name across all observed providers (per baseline B7 — to be confirmed). |
| Cost precision | Store as `number` (Lua double = 53-bit mantissa, ~15 decimal digits — plenty for sub-cent precision) | No floating-point cumulative-error concerns at this scale. |
| Warning trigger | First crossing of either threshold emits a single status: `[aish] session cost $X.XXXX has crossed warn_at_dollars=$Y.YYYY`. Crossed-flag stored on ctx; reset only on session end / `:cost reset`. | One-shot to avoid spamming. |
| `:reset` interaction | `:reset` does NOT clear `ctx.usage_totals` (parity with `memory_items`/`project`) — the user reset their conversation, not their cost tracking. `:cost reset` is the explicit reset verb. | Matches R8 invariant from Phase 6. |
| Session-log persistence | Assistant turn entries gain an optional `usage` field when broker returned one. `history.lua` log_turn writes it through verbatim. | Per-turn granularity preserved for after-the-fact analysis. No new file. |
---
## 3. Module Changes
| File | State after Phase 6 | Phase 7 changes |
|---|---|---|
| `broker.lua` | `chat_stream(cfg, msgs, on_delta, opts)` with text + tool_call kinds; `chat` returns text | Extract usage from final SSE chunk; emit `on_delta("usage", payload)`; `chat` returns `(text, usage)`. New `opts.include_usage` (default true); new `opts.category` (passed through as a tag in the usage payload). |
| `context.lua` | system prompt + turns + memory + project + summary | Add `self.usage_totals` (table) + `self.cost_warn_fired` (bool). New helpers: `Context:add_usage(model, category, usage)`, `Context:total_cost()`, `Context:total_tokens()`. `Context:reset` does NOT clear `usage_totals` (parity with memory_items / project per R8). |
| `repl.lua` | ask_ai + delegate + summarize callbacks + Norris helpers | Wire `opts.category` at each broker call site (main / delegate / summarize / memory_summarize). Wire `on_delta("usage", ...)` -> `ctx:add_usage(...)`. New `:cost` and `:cost detail` / `:cost reset` metas. Cost-warn check after each `add_usage` call. |
| `safety.lua` | norris_step + is_destructive | Pass `opts.category = "norris"` (for the main chat_stream call) and `"probe"` (for the is_destructive LLM probe). Surfaces probe-cost in the breakdown — useful since `safety.llm_model = "cloud"` is the recommended setting. |
| `history.lua` | session.log_turn appends JSONL entries | log_turn already takes turn opaquely; assistant turns will carry `usage` if present and it'll serialize via dkjson. No code change unless filter desired. |
| `config.lua` | example blocks for mcp/safety/memory/routing/secrets/hooks/project | Add commented-out `cost = { warn_at_dollars, warn_at_tokens }` block. |
| `docs/PHASE0.md` | §11 lists phases 0-6 | **Amendment**: add Phase 7 row to §11. |
No new module files.
---
## 4. Pillar 1 — Usage capture in broker
### SSE shape (provider-by-provider — confirm in baseline)
For OpenAI-compatible streams with `stream_options: { include_usage: true }`:
```json
data: {"id":"...","choices":[{"index":0,"delta":{"content":"Hi"}, ...}]}
data: {"id":"...","choices":[{"index":0,"delta":{}, "finish_reason":"stop"}]}
data: {"id":"...","choices":[],"usage":{"prompt_tokens":15,"completion_tokens":3,"total_tokens":18,"cost":0.00004,"cost_details":{...}}}
data: [DONE]
```
The final usage event arrives AFTER `finish_reason` but BEFORE `[DONE]`.
`choices` is empty `[]` on the usage event.
For non-streaming `chat`: usage is in the response body at the top level.
broker.chat is a wrapper around chat_stream, so it inherits the on_delta
path.
For local llama.cpp via hossenfelder: usage may or may not be present
depending on the proxy's version. Treat absence as zero-cost / unknown.
### Extraction algorithm
```lua
local final_usage = nil

local function on_event(data)
  -- ... existing decode of `data` into `doc` + error handling ...
  if doc.usage then
    -- Provider sent usage; capture for emission after the stream.
    final_usage = {
      prompt_tokens     = doc.usage.prompt_tokens or 0,
      completion_tokens = doc.usage.completion_tokens or 0,
      total_tokens      = doc.usage.total_tokens or 0,
      cost              = doc.usage.cost,        -- nil for local
      model             = model_cfg.model,       -- caller-stable (see §13, commit 1)
      category          = (opts and opts.category) or "main",
    }
    -- Don't emit yet; the [DONE] event marks stream end. Emit
    -- once we exit the curl.post_sse loop so the caller sees
    -- usage as the LAST event in the stream order.
  end
  -- ... existing text + tool_call handling ...
end

-- After curl.post_sse returns (stream complete):
if final_usage then on_delta("usage", final_usage) end
```
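The ordering guarantee can be exercised in isolation. The sketch below replays pre-decoded SSE documents through the same capture-then-emit logic. `replay_stream` and the `docs` list are hypothetical stand-ins for `curl.post_sse` plus JSON decoding, introduced only so the example runs without network or dkjson; they are not part of broker.lua.

```lua
-- Hypothetical stand-in for the chat_stream event loop. `docs` is a list of
-- already-decoded SSE payloads, so no curl or JSON library is needed here.
local function replay_stream(docs, model_cfg, on_delta, opts)
  opts = opts or {}
  local final_usage = nil

  local function on_event(doc)
    if doc.usage then
      final_usage = {
        prompt_tokens     = doc.usage.prompt_tokens or 0,
        completion_tokens = doc.usage.completion_tokens or 0,
        total_tokens      = doc.usage.total_tokens or 0,
        cost              = doc.usage.cost,        -- nil when the provider sends none
        model             = model_cfg.model,
        category          = opts.category or "main",
      }
    end
    for _, choice in ipairs(doc.choices or {}) do
      local content = choice.delta and choice.delta.content
      if content then on_delta("text", content) end
    end
  end

  for _, doc in ipairs(docs) do on_event(doc) end
  -- Stream finished: usage goes out last, and only if the provider sent it.
  if final_usage then on_delta("usage", final_usage) end
end

local seen = {}
replay_stream({
  { choices = { { delta = { content = "Hi" } } } },
  { choices = { { delta = {}, finish_reason = "stop" } } },
  { choices = {}, usage = { prompt_tokens = 15, completion_tokens = 3, total_tokens = 18 } },
}, { model = "fast" }, function(kind, payload)
  seen[#seen + 1] = { kind = kind, payload = payload }
end, { category = "probe" })
```

With a stream that carries no `usage` document, `seen` would contain only the text events, which matches the "absence means unchanged accumulator" rule from A10.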
### Outbound include_usage
```lua
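opts = opts or {}
local body_table = { model = model_cfg.model, messages = messages, stream = true }
-- Default on: only an explicit `include_usage = false` opts out.
if opts.include_usage ~= false then
  body_table.stream_options = { include_usage = true }
end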
```
Risk: some providers reject unrecognized fields. Baseline check; if any
host throws on `stream_options`, the opt-out is one line
(`opts.include_usage = false` at the affected call site).
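
As a rough illustration of the widened signature from A3, the sketch below builds the request body from an `opts` table. `build_body` is a hypothetical name; it returns a plain Lua table rather than encoded JSON so the example stays dependency-free. Gating `stream_options` on `stream` is an assumption here (the OpenAI API documents the option for streaming requests only), not something the manifest specifies.

```lua
-- Sketch only: the real build_request also JSON-encodes the body.
local function build_body(model_cfg, messages, stream, opts)
  opts = opts or {}
  local body = {
    model    = model_cfg.model,
    messages = messages,
    stream   = stream and true or false,
  }
  -- Former positional args now ride in opts.
  if opts.tools then body.tools = opts.tools end
  if opts.max_tokens then body.max_tokens = opts.max_tokens end
  -- include_usage defaults to on; only an explicit false disables it.
  if stream and opts.include_usage ~= false then
    body.stream_options = { include_usage = true }
  end
  return body
end

local msgs = { { role = "user", content = "hello" } }
local default_body = build_body({ model = "fast" }, msgs, true)
local opted_out    = build_body({ model = "fast" }, msgs, true, { include_usage = false })
local capped       = build_body({ model = "fast" }, msgs, true, { max_tokens = 64 })
```

A caller that passes no `opts` at all still gets usage reporting, which is the behavior the default-true decision in §2 asks for.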
### Category tagging
`opts.category` is a string set by the caller. broker echoes it into the
emitted usage payload so the accumulator knows what to credit. Default
category if absent: `"main"`.
---
## 5. Pillar 2 — Accumulator on ctx
### Shape
```lua
ctx.usage_totals = {
  -- [model_name] = { [category] = { prompt = N, completion = N,
  --                                 calls = N, cost = N } }
  fast = {
    main     = { prompt = 8200, completion = 2100, calls = 14, cost = 0 },
  },
  cloud = {
    main     = { prompt = 3850, completion = 980, calls = 8, cost = 0.0180 },
    delegate = { prompt = 250,  completion = 80,  calls = 1, cost = 0.0012 },
    probe    = { prompt = 150,  completion = 30,  calls = 1, cost = 0.0042 },
  },
}
ctx.cost_warn_fired = false
```
### add_usage
```lua
function Context:add_usage(model, category, u)
  model = model or "?"
  category = category or "main"
  self.usage_totals = self.usage_totals or {}
  local m = self.usage_totals[model] or {}
  local c = m[category] or { prompt = 0, completion = 0, calls = 0, cost = 0 }
  c.prompt = c.prompt + (u.prompt_tokens or 0)
  c.completion = c.completion + (u.completion_tokens or 0)
  c.calls = c.calls + 1
  c.cost = c.cost + (u.cost or 0)
  m[category] = c
  self.usage_totals[model] = m
end

function Context:total_cost()
  local total = 0
  for _, m in pairs(self.usage_totals or {}) do
    for _, c in pairs(m) do total = total + c.cost end
  end
  return total
end

function Context:total_tokens()
  local p, comp = 0, 0
  for _, m in pairs(self.usage_totals or {}) do
    for _, c in pairs(m) do
      p = p + c.prompt
      comp = comp + c.completion
    end
  end
  return p, comp
end
```
### Reset semantics
`Context:reset()` deliberately does NOT clear `usage_totals` —
matches R8 invariant from Phase 6 (`:reset` clears `turns`,
`pending_exec_output`, `summary`; preserves `memory_items`, `project`,
and now `usage_totals`). The user reset their conversation, not their
cost meter. `:cost reset` is the explicit reset verb for the meter.
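
The split between the two reset verbs can be sketched as follows. This is a minimal stand-in for context.lua, assuming the field names used in this manifest; the constructor shape and the method name `reset_cost` are hypothetical and exist only for illustration.

```lua
-- Minimal stand-in for context.lua. Field names follow the manifest;
-- Context.new's shape and reset_cost are assumptions for this sketch.
local Context = {}
Context.__index = Context

function Context.new()
  return setmetatable({
    turns = {}, summary = nil, pending_exec_output = nil,
    memory_items = {}, project = nil,
    usage_totals = {}, cost_warn_fired = false,
  }, Context)
end

-- :reset clears conversation state only.
function Context:reset()
  self.turns = {}
  self.summary = nil
  self.pending_exec_output = nil
end

-- :cost reset is the only path that zeroes the meter and re-arms the warning.
function Context:reset_cost()
  self.usage_totals = {}
  self.cost_warn_fired = false
end

local ctx = Context.new()
ctx.turns[1] = { role = "user", content = "hi" }
ctx.usage_totals.cloud = { main = { prompt = 10, completion = 2, calls = 1, cost = 0.001 } }
ctx.cost_warn_fired = true

ctx:reset()
local kept_calls = ctx.usage_totals.cloud.main.calls  -- meter survives :reset
local kept_flag  = ctx.cost_warn_fired                -- so does the one-shot flag

ctx:reset_cost()
```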
---
## 6. Pillar 3 — `:cost` meta
```
:cost           summary line
:cost detail    per-model + per-category breakdown
:cost reset     zero out ctx.usage_totals + cost_warn_fired
```
Summary format:
```
[aish] session usage: 24 calls, prompt=12,450 / completion=3,190 tokens
cost=$0.0234 (cloud only; local: 0)
```
Detail format (sorted by total cost desc, then by model):
```
[aish] session usage detail:
cloud main 8 calls, 3,850 / 980 tokens, $0.0180
cloud probe 1 call, 150 / 30 tokens, $0.0042
cloud delegate 1 call, 250 / 80 tokens, $0.0012
fast main 14 calls, 8,200 / 2,100 tokens, $0 (local)
```
Implementation: pure Lua iteration over `ctx.usage_totals`; no broker
calls. Sorting uses `table.sort` on a flattened list.
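
One possible shape for the flatten-and-sort step is sketched below. `commas` and `cost_detail_lines` are hypothetical helper names, the column widths are arbitrary, and treating `cost == 0` as "local" is an assumption (a zero-cost cloud call would be mislabeled). The comparator adds category as a final tie-break so the output order is deterministic.

```lua
-- Insert thousands separators: 12450 -> "12,450".
local function commas(n)
  local out, count = tostring(math.floor(n)), 0
  repeat
    out, count = out:gsub("^(%d+)(%d%d%d)", "%1,%2")
  until count == 0
  return out
end

local function cost_detail_lines(usage_totals)
  -- Flatten model -> category -> counters into one sortable list.
  local rows = {}
  for model, cats in pairs(usage_totals) do
    for category, c in pairs(cats) do
      rows[#rows + 1] = { model = model, category = category, c = c }
    end
  end
  -- Cost descending, then model, then category.
  table.sort(rows, function(a, b)
    if a.c.cost ~= b.c.cost then return a.c.cost > b.c.cost end
    if a.model ~= b.model then return a.model < b.model end
    return a.category < b.category
  end)
  local lines = {}
  for _, r in ipairs(rows) do
    local cost = r.c.cost > 0 and ("$%.4f"):format(r.c.cost) or "$0 (local)"
    lines[#lines + 1] = ("%-6s %-9s %d %s, %s / %s tokens, %s"):format(
      r.model, r.category, r.c.calls, r.c.calls == 1 and "call" or "calls",
      commas(r.c.prompt), commas(r.c.completion), cost)
  end
  return lines
end

local lines = cost_detail_lines({
  fast  = { main = { prompt = 8200, completion = 2100, calls = 14, cost = 0 } },
  cloud = {
    main     = { prompt = 3850, completion = 980, calls = 8, cost = 0.0180 },
    delegate = { prompt = 250,  completion = 80,  calls = 1, cost = 0.0012 },
    probe    = { prompt = 150,  completion = 30,  calls = 1, cost = 0.0042 },
  },
})
```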
---
## 7. Pillar 4 — Warning thresholds
Config:
```lua
cost = {
  warn_at_dollars = 0.50,    -- emit once when cumulative cost crosses
  warn_at_tokens  = 100000,  -- emit once when cumulative tokens crosses
}
```
After every `ctx:add_usage`, check:
```lua
if config.cost and not ctx.cost_warn_fired then
  local cost = ctx:total_cost()
  if config.cost.warn_at_dollars and cost >= config.cost.warn_at_dollars then
    renderer.status(("session cost $%.4f has crossed warn_at_dollars=$%.4f")
      :format(cost, config.cost.warn_at_dollars))
    ctx.cost_warn_fired = true
  end
  -- (similar for warn_at_tokens; share the flag or use two)
end
```
One-shot per session. `:cost reset` clears the flag.
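
The sketch below fills in the token branch under one possible reading: both thresholds share the single `cost_warn_fired` flag, so whichever crosses first consumes the one-shot. That sharing is an assumption, since the manifest leaves "share the flag or use two" open. `check_cost_warn` is a hypothetical helper that returns the message instead of calling `renderer.status`, which keeps it testable without the renderer.

```lua
-- Returns the status message to emit, or nil when nothing should fire.
-- `state` is anything carrying a cost_warn_fired field (ctx in practice).
local function check_cost_warn(cost_cfg, state, total_cost, total_tokens)
  if not cost_cfg or state.cost_warn_fired then return nil end
  local msg
  if cost_cfg.warn_at_dollars and total_cost >= cost_cfg.warn_at_dollars then
    msg = ("session cost $%.4f has crossed warn_at_dollars=$%.4f")
      :format(total_cost, cost_cfg.warn_at_dollars)
  elseif cost_cfg.warn_at_tokens and total_tokens >= cost_cfg.warn_at_tokens then
    msg = ("session tokens %d has crossed warn_at_tokens=%d")
      :format(total_tokens, cost_cfg.warn_at_tokens)
  end
  if msg then state.cost_warn_fired = true end  -- one-shot until :cost reset
  return msg
end

local state = { cost_warn_fired = false }
local cost_cfg = { warn_at_dollars = 0.50, warn_at_tokens = 100000 }
local first  = check_cost_warn(cost_cfg, state, 0.20, 5000)    -- below both thresholds
local second = check_cost_warn(cost_cfg, state, 0.62, 9000)    -- crosses the dollar threshold
local third  = check_cost_warn(cost_cfg, state, 0.90, 200000)  -- suppressed: already fired
local no_cfg = check_cost_warn(nil, { cost_warn_fired = false }, 99, 99)
```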
---
## 8. UX Surface Summary
| Meta | Behavior |
|---|---|
| `:cost` | One-line summary: calls / tokens / cost |
| `:cost detail` | Per-model + per-category breakdown |
| `:cost reset` | Zero out totals + clear warn-fired flag |

| Config | Default | Effect |
|---|---|---|
| `cfg.cost.warn_at_dollars` | nil | Status when cumulative cost first crosses this dollar amount |
| `cfg.cost.warn_at_tokens` | nil | Status when cumulative total tokens first crosses |
| (broker `opts.include_usage`) | true | Adds `stream_options.include_usage = true` to outbound request |
---
## 9. Out of Scope (Phase 7)
- **Cross-session cost persistence** — Q-C2 defers `<history.dir>/cost.jsonl`
rollup; v1 is session-only. Per-turn usage IS in the session JSONL for
after-the-fact aggregation if anyone wants to script it.
- **Per-model rate limiting / cost caps that REFUSE the call** — v1 only
warns. A future phase could add a hard cap that aborts before the
broker call.
- **Pricing-table fallback for local models** — if a local model doesn't
emit `usage.cost`, we record 0. Estimating cost from token count + a
static pricing table is a future polish (most users won't care about
local "cost" anyway — local is free).
- **Pretty token-bandwidth charts / sparklines** — out of scope; the
detail breakdown is text-only.
- **Estimated cost for future turns** — no preflight cost prediction.
- **MCP tool-call usage** — MCP servers don't expose token usage;
broker calls invoked DURING MCP tool dispatch ARE captured (because
they go through the same path), but the MCP tool call itself isn't.
---
## 10. Risks
| Risk | Mitigation |
|---|---|
| Some providers reject `stream_options` -> SSE errors at the top of the stream | `opts.include_usage = false` opt-out per call site; baseline-time probe of the actual hossenfelder broker behavior |
| OpenRouter `cost` field shape varies between providers (Bedrock vs. Baidu vs. Together vs. ...) | Capture `usage.cost` as-is (number); document that the same provider must be used for cross-call comparison |
| Local llama.cpp returns no `cost` -> displayed `$0` could mislead user "is this REALLY free?" | `:cost detail` annotates local lines with `(local)` literal; summary says `cost=$X (cloud only; local: 0)` |
| `ctx.usage_totals` grows unboundedly with new model names mid-session | Bounded by `#models in config` × `#categories` — small constants. No mitigation needed. |
| Warn threshold fires once and never again for a long-running session that crosses 2x / 10x the threshold | Acceptable for v1; user can `:cost reset` to re-arm. Future polish: warn at each Nx multiple. |
---
## 11. Open Questions (Phase 7)
| # | Question | Resolution |
|---|---|---|
| Q-C1 | Provider-without-usage handling | A10 — defensive silent skip; baseline probe will confirm shape on local llama.cpp. |
| Q-C2 | Cross-session cost persistence (`cost.jsonl`) | Deferred to follow-up phase 8; v1 is session-only. |
| Q-C3 | Categories closed-set vs free-form | A4 — **free-form**; caller decides. Matches Phase 6 helpers/skills convention. |
| Q-C4 | `stream_options` forwarding by hossenfelder | B1 RESOLVED — both backends accept; flag is REQUIRED for local llama.cpp, no-op for cloud. Default-true is correct. |
| Q-C5 | Warn fires on the crossed call or the next | A5 — **on the crossed call** (no UX-defeating delay). |
| Q-C6 | `:reset` clears `cost_warn_fired` | A6 — **no**, only `:cost reset` clears the flag (R8 parity). |
---
## 12. Phase 7 → Phase 8+ Out-of-band
Candidate follow-ups (non-binding):
- **Phase 8**: cross-session cost persistence (Q-C2 deferral), with
optional cost dashboards / weekly rollup reporter.
- **Hard rate limits / cost caps that REFUSE the call** — an extension
of the warn surface that promotes warnings into preflight enforcement.
- **Better tokenization** (Q1 deferred-from-Phase-3): replace the char/4
heuristic on `Context:estimate_tokens()` with model `/tokenize` calls.
Indirectly improves accuracy of any future "preflight cost predictor".
Phase 7 itself is self-contained — no upstream dependencies.
---
## 13. Implementation Plan (commit-by-commit)
Bottom-up; broker first (it's the egress point that all callers
depend on), then context (the accumulator), then the call-site
rewires, then the user-facing meta + warn surface, then config +
status bump. Each commit leaves the tree green (existing tests +
load smoke + per-commit feature smoke).
### Order
1. **`broker.lua` — usage capture + signature widening.**
- `build_request(model_cfg, messages, stream, opts)` widened to
take an opts table; opts.tools / opts.max_tokens fold in from
the existing positional args. `opts.include_usage` (default true)
adds `stream_options.include_usage = true` to the request body
(per B1, required for local).
- `M.chat_stream` event loop adds `if doc.usage then final_usage =
doc.usage end`; after `curl.post_sse` returns, if `final_usage`
is set, `on_delta("usage", payload)` is called. Payload includes
`model = model_cfg.model` (caller-stable per B4), the raw token
counts, and `cost` as a number (nil for local per B3).
- opts.category passthrough — the broker just echoes it into the
emitted usage payload; doesn't validate (per A4 free-form).
- `M.chat` (the non-streaming wrapper) returns `(text, usage)` —
backward-compatible (existing callers ignore the second value).
- Smoke: hand-build a request with stream_options, capture all
three on_delta kinds (text, tool_call when applicable, usage),
confirm usage payload matches what curl shows.
2. **`context.lua` — accumulator + helpers.**
- `Context.new`: `self.usage_totals = {}` + `self.cost_warn_fired = false`.
- `Context:add_usage(model, category, usage)` — increments
`usage_totals[model][category]` slots.
- `Context:total_cost()` — sums all cost fields across all models/categories.
- `Context:total_tokens()` — sums prompt + completion separately.
- `Context:reset` — does NOT touch `usage_totals` or `cost_warn_fired`
(R8 parity with `memory_items` and `project`).
- Smoke: 4-case inline test of add_usage / totals / reset preservation.
3. **`repl.lua` — wire opts.category + on_delta("usage") at non-Norris call sites.**
- call_broker wrapper (used by ask_ai): pass `opts.category =
"main"`; the on_delta wrapper handles `kind == "usage"` by
calling `ctx:add_usage(req_name, "main", payload)`.
- DELEGATE: handler: opts.category = "delegate".
- :delegate meta: opts.category = "delegate".
- summarize-on-evict callback: opts.category = "summarize".
- :memory summarize: opts.category = "memory_summarize".
- For broker.chat callers (non-streaming): capture the new second
return value and feed to ctx:add_usage.
- Smoke: send one cloud prompt, observe ctx.usage_totals grows.
4. **`safety.lua` — opts.category for Norris + probe.**
- safety.norris_step's broker.chat_stream call: pass opts.category =
"norris"; the helpers.on_usage callback (added to the helpers
table by repl.lua) routes back to ctx:add_usage. OR — simpler —
safety.lua wraps on_delta itself with a "usage"-kind branch that
calls helpers.on_usage.
- safety.is_destructive's llm_probe broker.chat call: pass
opts.category = "probe"; capture the (text, usage) return and
forward via opts.on_usage callback (added to is_destructive opts).
- Smoke: a Norris session shows both "norris" and "probe" category
entries in :cost detail.
5. **`repl.lua` — :cost meta + warn-threshold + HELP.**
- :cost (summary), :cost detail (per-model+category breakdown),
:cost reset (zero totals + clear cost_warn_fired).
- After every ctx:add_usage call (centralized in a helper if
possible), check cfg.cost.warn_at_dollars / warn_at_tokens;
emit one-shot status if crossed AND cost_warn_fired is false.
- HELP gains 3 lines for :cost.
- Smoke: :cost shows totals; :cost detail breaks down; warn fires
once when threshold crossed; :cost reset re-arms.
6. **`config.lua` example block + `docs/PHASE7.md` status bump.**
- Commented-out `cost = { warn_at_dollars = 0.50, warn_at_tokens
= 100000 }` block in config.lua.
- PHASE7.md status header → **Implement** (matches Phase 5/6
cadence — manifest tracks implementation state).
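
The call-site wiring in commits 3 and 4 could take roughly this shape. `with_usage` is a hypothetical helper name, and the `ctx` table here is a stub that only records what `add_usage` receives. Swallowing the "usage" event instead of also forwarding it to the inner callback is a design assumption of this sketch.

```lua
-- Hypothetical helper: wraps a caller's on_delta so "usage" events are
-- credited to the accumulator while every other kind passes through untouched.
local function with_usage(ctx, req_name, category, on_delta)
  return function(kind, payload)
    if kind == "usage" then
      ctx:add_usage(req_name, category, payload)
      return
    end
    return on_delta(kind, payload)
  end
end

-- Stub accumulator that records calls; stands in for the real Context.
local ctx = { calls = {} }
function ctx:add_usage(model, category, u)
  self.calls[#self.calls + 1] = { model = model, category = category, usage = u }
end

local text = {}
local on_delta = with_usage(ctx, "cloud", "delegate", function(kind, payload)
  if kind == "text" then text[#text + 1] = payload end
end)

on_delta("text", "Hel")
on_delta("text", "lo")
on_delta("usage", { prompt_tokens = 250, completion_tokens = 80 })
```

Centralizing the branch in one wrapper also gives commit 5 a single place to run the warn-threshold check after each `add_usage`.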
### Risk index per commit
| Commit | Risk | Mitigation |
|---|---|---|
| 1 (broker) | build_request signature change breaks existing callers | Blast radius is contained in broker.lua: only `M.chat_stream` and `M.chat` call `build_request` (A3), and both already hold an opts table. tools/max_tokens move INTO opts, so no temporary positional fallback is needed |
| 1 (broker) | `M.chat` second return value confuses callers that do `local r = broker.chat(...)` discarding the second | Lua silently drops extra return values in a single-target assignment, so such callers are unaffected. The one case to grep for is `broker.chat(...)` used as the last argument of another call or as the last item in a table constructor, where both values are expanded |
| 2 (context) | usage_totals nil on old ctx serializations | Defensive `self.usage_totals = self.usage_totals or {}` in add_usage; no migration needed |
| 3 (repl wires) | Forgetting one call site = silent under-count | Lint by grep for `broker.chat\(` and `broker.chat_stream\(` after the wire commit; ensure each is tagged |
| 4 (safety wires) | safety.lua must NOT introduce a new module dependency (no `require("secrets")`-style import) | Use helpers.on_usage callback convention (same shape as #52's scrub_msgs); no module dep |
| 5 (:cost + warn) | warn fires multiple times when threshold is much exceeded by one call | cost_warn_fired one-shot flag; explicit :cost reset to re-arm |
| 6 (config + status) | none | |
### Tests + smoke per commit
Each commit:
- Pass `luajit test_safety.lua` (87/87) and `luajit test_router_model.lua` (31/31)
- Load cleanly via `luajit -e 'package.path=...; require("repl"); print("ok")'`
- Pass a per-feature smoke (described in each row above)
### Things deliberately NOT split
- broker.chat backward-compat shim — Lua's multiple-return-values
semantics handle it automatically (existing `local r = broker.chat(..)`
drops the new `usage` value).
- Per-category sub-tables — flat `model -> category -> counters` is
simple enough; nesting deeper for e.g. timestamps is v2.
- Cross-session persistence — explicitly Q-C2 deferred to phase 8.
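
The multiple-return behavior that the first bullet relies on can be checked with a stand-in function. `chat` below is a hypothetical substitute for `broker.chat` after commit 1. The last two lines show standard Lua semantics worth keeping in mind when grepping call sites: a call in the final position of a table constructor or argument list expands to all of its values.

```lua
-- Stand-in for broker.chat after commit 1: returns (text, usage).
local function chat()
  return "hello", { prompt_tokens = 15, completion_tokens = 3 }
end

local r = chat()             -- pre-Phase-7 caller: usage is dropped silently
local text, usage = chat()   -- Phase 7 caller: picks up the second value

local packed    = { chat() }     -- last position: both values are kept
local truncated = { (chat()) }   -- extra parentheses keep only the first value
```

An existing call such as `table.insert(t, broker.chat(...))` would therefore start receiving three arguments; wrapping the call in parentheses restores the old single-value behavior.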
### Open at plan-time (resolve at implement)
- Whether `safety.is_destructive`'s opts should carry `on_usage`
callback explicitly OR thread through cfg.helpers (the latter
matches the Norris helpers convention but is more coupling).
Decide at commit 4. Default to explicit opts.on_usage for minimum
surface.
- Whether to emit a `[aish] usage: model=X prompt=N completion=M cost=$X`
status line PER TURN (verbose mode) or only via :cost on demand.
v1 = on demand only; verbose mode is a follow-up nice-to-have.
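
For the first open item, the explicit-callback option might look like the sketch below. Everything about this `is_destructive` is hypothetical: the real function's signature, probe prompt, and verdict parsing are not shown in this manifest, and `broker_chat` is injected through `opts` purely so the example runs standalone.

```lua
-- Sketch of the explicit opts.on_usage option; not the real safety.lua code.
local function is_destructive(cmd, opts)
  opts = opts or {}
  -- The probe is tagged so its cost shows up under the "probe" category.
  local text, usage = opts.broker_chat(cmd, { category = "probe" })
  -- Local providers may return no usage; skip the callback in that case.
  if usage and opts.on_usage then opts.on_usage(usage) end
  return text == "yes"
end

local credited = {}
local verdict = is_destructive("rm -rf build/", {
  broker_chat = function(_, call_opts)
    return "yes", { prompt_tokens = 150, completion_tokens = 30,
                    category = call_opts.category }
  end,
  on_usage = function(u) credited[#credited + 1] = u end,
})

-- A provider that reports no usage leaves the accumulator untouched.
local silent = is_destructive("ls", {
  broker_chat = function() return "no", nil end,
  on_usage = function(u) credited[#credited + 1] = u end,
})
```

Threading through `cfg.helpers` instead would remove the extra `opts` field at the cost of the coupling noted above.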