marfrit 0f14dc1727 docs/PHASE7: plan — §13 commit roadmap

Status: Analyze -> Plan.

Q-C4 was the last open question pending baseline; now resolved per
B1 (stream_options accepted by both backends; required for local).

§13 Implementation Plan added — 6 commits, bottom-up:

  1. broker.lua: usage extraction from final SSE chunk; build_request
     signature widening to (model_cfg, msgs, stream, opts); on_delta
     ("usage", payload); chat returns (text, usage); opts.category
     passthrough.

  2. context.lua: usage_totals + cost_warn_fired fields; add_usage /
     total_cost / total_tokens helpers; :reset preserves both.

  3. repl.lua: wire opts.category at 5 non-Norris call sites (main,
     delegate x2, summarize, memory_summarize); on_delta("usage")
     branch routes to ctx:add_usage.

  4. safety.lua: wire opts.category for Norris main broker + is_
     destructive LLM probe; helpers.on_usage callback convention
     (no new module dep — matches #52's scrub_msgs pattern).

  5. repl.lua: :cost meta surface + warn-threshold check + HELP.

  6. config.lua: commented cost example block + PHASE7.md status
     bump to Implement.

Per-commit risk index covers signature-change blast radius, missed
call-site lint, and warn-flag one-shot semantics. Lua's multi-
return semantics keep broker.chat backwards-compat automatic.

Two items left open at plan, resolve at implement:
  - is_destructive opts.on_usage vs cfg.helpers threading
  - per-turn verbose mode (deferred; v1 = :cost on demand only)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-16 22:50:39 +00:00


aish — Phase 7 Manifest

- Project: aish (AI-augmented conversational shell)
- Document: Phase 7 Requirements, Architecture & Design Decisions
- Status: Plan (formulate + analyze + baseline complete; tree at 2244a3f)
- Date: 2026-05-16

Analyze findings (2026-05-16):

A1. broker.chat_stream surface is clean for the extension. The existing on_event(data) closure inside M.chat_stream already parses doc.error / doc.choices / delta / tool_calls, so adding if doc.usage then final_usage = ... end is one block. Emission happens via a closure-local final_usage: the post-loop code in chat_stream reads it and calls on_delta("usage", final_usage). On the request side there are two options: a minor extension of build_request, or chat_stream inserting stream_options.include_usage = true into the body table before it is JSON-encoded. The second is not possible today because the encoding happens inside build_request. Cleanest: extend build_request(model_cfg, messages, stream, opts) so it can read opts.include_usage. Phase 7 simplifies the signature in passing.

A2. 7 caller sites identified for opts.category threading:

| Site | Category |
|---|---|
| `safety.lua:191` (LLM probe) | `"probe"` |
| `safety.lua:354` (norris main) | `"norris"` |
| `repl.lua:326` (summarize-on-evict) | `"summarize"` |
| `repl.lua:685` (call_broker wrapper, used by ask_ai) | `"main"` |
| `repl.lua:1104` (DELEGATE: handler) | `"delegate"` |
| `repl.lua:1587` (:memory summarize) | `"memory_summarize"` |
| `repl.lua:2156` (:delegate meta) | `"delegate"` |

All callers pass `opts` already; adding a `category` field is
additive and backward-compatible (default to `"main"` when absent).

A3. build_request signature simplification. Today it takes (model_cfg, messages, stream, tools, max_tokens) — five positional args. With Phase 7 needing include_usage AND stream_options, positional growth gets unwieldy. Resolution: widen to (model_cfg, messages, stream, opts) where opts carries {tools, max_tokens, include_usage, stream_options}. Callers in M.chat_stream and M.chat pass their existing opts table through. This is a refactor but contained inside broker.lua.

A4. Q-C3 RESOLVED: free-form categories. The closed-set vs free-form debate resolved in favor of free-form per the helpers/skills convention already in place (Phase 6 :tree / :diff metas don't validate sub-args either). :cost detail will show whatever categories appear; in practice that is a small, documented set (6 distinct categories across the 7 call sites in A2), so no surprises.

A5. Q-C5 RESOLVED: warn fires on the call that crossed. The crossed call's usage IS in the accumulator at the moment we check (we check AFTER add_usage). Firing on the NEXT call would mean a delay of one full broker round-trip before the user sees the warn — defeats the purpose. Just emit-on-cross.

A6. Q-C6 RESOLVED: :reset does NOT clear cost_warn_fired. Parity with usage_totals itself (per the §2 decision row); the user reset their conversation, not their cost meter. The flag AND the totals are reset only by the explicit :cost reset verb.

A7. Norris call-graph rewires (existing safety.lua:354 path): with issue #52 wired (commit 955bd82), the Norris broker call now passes helpers.scrub_msgs / helpers.streaming_rehydrator. The on_delta wrapping pattern means I need to be careful that the new ("usage", payload) kind also flows through any wrapper. Since secrets streaming_rehydrator only matches on kind == "text", the "usage" kind passes through unchanged. No new entanglement.

A8. ctx.usage_totals survives :reset per R8 — same invariant as memory_items (Phase 4) and project (Phase 6). Documented in §5 of the manifest; reinforces the "ambient context survives conversation reset" rule.

A9. Session JSONL serialization — assistant turn dict gets an optional usage field. history.lua log_turn currently calls json.encode(turn) opaquely; the dkjson serializer handles nested tables. No code change needed; the new field flows through automatically when the assistant turn carries one.

A10. Q-C1 PARTIAL: local providers may not emit usage. The formulate-time assumption was "treat absence as zero-cost / unknown". A real probe against qwen-coder-7b-snappy-8k is a baseline action — see B-probes below. The implementation will be defensive: if doc.usage never appears in the stream, no "usage" event is emitted, and the accumulator is unchanged for that turn. :cost output naturally reflects "0 calls counted for local model" if that's the case.

A11. Q-C4 deferred to baseline: actual stream_options forwarding by the hossenfelder proxy must be probed against a live broker. If the proxy strips the option, we get no usage events even for cloud calls. Baseline action.
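The pass-through argument in A7 can be illustrated with a small stand-in wrapper. This is a sketch under assumptions: `wrap_text_only` and the `<SECRET_1>` placeholder are hypothetical names, not the real streaming_rehydrator. It only shows why a wrapper that matches on kind == "text" leaves a "usage" payload untouched.

```lua
-- Hypothetical stand-in for a text-only on_delta wrapper (illustrative names).
local function wrap_text_only(on_delta, transform)
    return function(kind, payload)
        if kind == "text" then
            return on_delta(kind, transform(payload))
        end
        -- Any other kind ("tool_call", "usage", ...) is forwarded unchanged.
        return on_delta(kind, payload)
    end
end

-- Sink that records every event it receives.
local seen = {}
local function sink(kind, payload)
    seen[#seen + 1] = { kind = kind, payload = payload }
end

-- Assumed rehydration step: swap a placeholder back for its value.
local function rehydrate(s)
    return (s:gsub("<SECRET_1>", "hunter2"))
end

local wrapped = wrap_text_only(sink, rehydrate)
local usage = { prompt_tokens = 15, completion_tokens = 3, total_tokens = 18 }

wrapped("text", "pw is <SECRET_1>")
wrapped("usage", usage)
```

The text chunk is rewritten while the usage table reaches the sink as the same object, so the accumulator would see exactly what the broker emitted.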

PHASE0 is the locked substrate; PHASE1-6 are layered on top. This manifest specifies what Phase 7 adds — cost / usage observability: the ability to know, mid-session, how many tokens you've spent and how much money the paid-cloud calls have cost.

PHASE0 §11 originally listed phases only through 6; this commit amends §11 to add Phase 7.

1. Scope of Phase 7

Four pillars:

1. Usage capture in broker: broker.chat_stream extracts the provider's usage block (and cost where present) from the response stream. Surfaces it to the caller via a new on_delta("usage", ...) kind. The existing broker.chat buffering wrapper exposes it as a second return value (text, usage). Backward-compatible: callers that don't handle the new kind / second value simply ignore it.
2. Per-session accumulator on ctx: running totals per-model AND per-call-category (main / delegate / summarize / memory_summarize / probe / norris) accumulate on ctx.usage_totals. No persistence across sessions in v1 (Q-C2 defers cross-session); the session-log JSONL files DO carry per-turn usage so historical analysis is possible after the fact.
3. :cost meta: a :cost reporter that shows the current session totals, with optional :cost detail for the per-model + per-category breakdown. Zero broker calls (purely local read of ctx.usage_totals).
4. Optional warning thresholds: cfg.cost.warn_at_dollars and cfg.cost.warn_at_tokens emit a status the first time the running total crosses the configured threshold. Default off (no warnings without config). Useful when cloud presets are configured and you want a "you've spent $1 this session" nudge before runaway cost.

Phase 7 is done when:

- broker.chat_stream exposes usage via the new on_delta("usage", ...) callback kind; broker.chat returns (text, usage). Backward compat preserved (no existing caller breaks).
- After a session with mixed local + cloud calls, :cost prints a total like:

  ```
  [aish] session usage: 24 calls, prompt=12,450 / completion=3,190 tokens
                         cost=$0.0234 (cloud only; local: 0)
  ```

- :cost detail breaks down by model + category:

  ```
  fast    main: 14 calls, 8200/2100 tokens
  cloud   main: 8 calls, 3850/980 tokens, $0.0180
  cloud   delegate: 1 call, 250/80 tokens, $0.0012
  cloud   probe: 1 call, 150/30 tokens, $0.0042
  ```

- Session JSONL gains a usage field on assistant turns (when the broker returned one).
- With cfg.cost.warn_at_dollars = 0.50 set, crossing $0.50 cumulative emits exactly one status line.
- Existing configs without cfg.cost behave exactly like Phase 6 (Phase 6 regression coverage).

2. Technology Decisions (delta from Phase 6)

| Decision | Choice | Rationale |
|---|---|---|
| Where to extract usage | In `broker.chat_stream` event loop, looking at each SSE event's `usage` field on the final chunk | The OpenAI streaming spec puts `usage` on the FINAL chunk when `stream_options: { include_usage: true }` is in the request body. The Anthropic-via-Bedrock path through OpenRouter respects this; need to verify (baseline). |
| New on_delta kind | `on_delta("usage", { prompt_tokens, completion_tokens, total_tokens, cost?, model?, native_finish_reason? })` | Mirrors the existing `("text", chunk)` / `("tool_call", call)` shape. Callers ignore unknown kinds; backward-compatible. |
| Where to enable usage on the wire | `opts.include_usage = true` (default `true`) sets `stream_options.include_usage = true` in the outbound request body | Off-switch for hosts that reject `stream_options`. Defaults on; baseline probe confirms current broker tolerates it. (A3: `build_request` signature widens to take an `opts` table; positional growth was getting unwieldy.) |
| Accumulator location | `ctx.usage_totals[model_name][category]` table | ctx is per-conversation; matches the `:reset`-survives-or-not rules already in place. |
| Categories | `"main"` (ask_ai), `"delegate"`, `"summarize"`, `"memory_summarize"`, `"probe"`, `"norris"` | One-tag-per-call-site. Tagged at the caller site (caller passes `opts.category` to `broker.chat_stream`). |
| Cost extraction | `usage.cost` (OpenRouter convention; dollars as a number) plus `usage.cost_details.upstream_inference_cost` (more detailed). For Anthropic/Bedrock the cost arrives in dollars on `usage.cost`. For pure local llama.cpp: no `cost` field, so record 0. | Single field name across all observed providers (per baseline B7, to be confirmed). |
| Cost precision | Store as `number` (Lua double = 53-bit mantissa, ~15 decimal digits; plenty for sub-cent precision) | No floating-point cumulative-error concerns at this scale. |
| Warning trigger | First crossing of either threshold emits a single status: `[aish] session cost $X.XXXX has crossed warn_at_dollars=$Y.YYYY`. Crossed-flag stored on ctx; reset only on session end / `:cost reset`. | One-shot to avoid spamming. |
| `:reset` interaction | `:reset` does NOT clear `ctx.usage_totals` (parity with `memory_items`/`project`): the user reset their conversation, not their cost tracking. `:cost reset` is the explicit reset verb. | Matches R8 invariant from Phase 6. |
| Session-log persistence | Assistant turn entries gain an optional `usage` field when broker returned one. `history.lua` log_turn writes it through verbatim. | Per-turn granularity preserved for after-the-fact analysis. No new file. |

3. Module Changes

| File | State after Phase 6 | Phase 7 changes |
|---|---|---|
| `broker.lua` | `chat_stream(cfg, msgs, on_delta, opts)` with text + tool_call kinds; `chat` returns text | Extract usage from final SSE chunk; emit `on_delta("usage", payload)`; `chat` returns `(text, usage)`. New `opts.include_usage` (default true); new `opts.category` (passed through as a tag in the usage payload). |
| `context.lua` | system prompt + turns + memory + project + summary | Add `self.usage_totals` (table) + `self.cost_warn_fired` (bool). New helpers: `Context:add_usage(model, category, usage)`, `Context:total_cost()`, `Context:total_tokens()`. `Context:reset` does NOT clear `usage_totals` (parity with memory_items / project per R8). |
| `repl.lua` | ask_ai + delegate + summarize callbacks + Norris helpers | Wire `opts.category` at each broker call site (main / delegate / summarize / memory_summarize). Wire `on_delta("usage", ...)` -> `ctx:add_usage(...)`. New `:cost` and `:cost detail` / `:cost reset` metas. Cost-warn check after each `add_usage` call. |
| `safety.lua` | norris_step + is_destructive | Pass `opts.category = "norris"` (for the main chat_stream call) and `"probe"` (for the is_destructive LLM probe). Surfaces probe-cost in the breakdown; useful since `safety.llm_model = "cloud"` is the recommended setting. |
| `history.lua` | session.log_turn appends JSONL entries | log_turn already takes turn opaquely; assistant turns will carry `usage` if present and it'll serialize via dkjson. No code change unless filter desired. |
| `config.lua` | example blocks for mcp/safety/memory/routing/secrets/hooks/project | Add commented-out `cost = { warn_at_dollars, warn_at_tokens }` block. |
| `docs/PHASE0.md` | §11 lists phases 0-6 | Amendment: add Phase 7 row to §11. |

No new module files.

4. Pillar 1 — Usage capture in broker

SSE shape (provider-by-provider — confirm in baseline)

For OpenAI-compatible streams with stream_options: { include_usage: true }:

```
data: {"id":"...","choices":[{"index":0,"delta":{"content":"Hi"}, ...}]}
data: {"id":"...","choices":[{"index":0,"delta":{}, "finish_reason":"stop"}]}
data: {"id":"...","choices":[],"usage":{"prompt_tokens":15,"completion_tokens":3,"total_tokens":18,"cost":0.00004,"cost_details":{...}}}
data: [DONE]
```

The final usage event arrives AFTER finish_reason but BEFORE [DONE]. choices is empty [] on the usage event.

For non-streaming chat: usage is in the response body at the top level. broker.chat is a wrapper around chat_stream, so it inherits the on_delta path.

For local llama.cpp via hossenfelder: usage may or may not be present depending on the proxy's version. Treat absence as zero-cost / unknown.

Extraction algorithm

```lua
local final_usage = nil

local function on_event(data)
    -- ... existing decode of `data` into `doc` + doc.error handling ...
    if doc.usage then
        -- Provider sent usage; capture for emission after the stream.
        final_usage = {
            prompt_tokens     = doc.usage.prompt_tokens or 0,
            completion_tokens = doc.usage.completion_tokens or 0,
            total_tokens      = doc.usage.total_tokens or 0,
            cost              = doc.usage.cost,   -- nil for local
            model             = doc.model or model_cfg.model,
            -- Caller's tag echoed into the payload (see "Category tagging").
            category          = (opts and opts.category) or "main",
        }
        -- Don't emit yet: the [DONE] event marks stream end; emit
        -- once we exit the curl.post_sse loop so the caller sees
        -- usage as the LAST event in the stream order.
    end
    -- ... existing text + tool_call handling ...
end

-- After curl.post_sse returns (stream complete):
if final_usage then on_delta("usage", final_usage) end
```

Outbound include_usage

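```lua
-- Inside build_request(model_cfg, messages, stream, opts):
local body_table = { model = model_cfg.model, messages = messages, stream = true }
-- Default on: only an explicit include_usage = false opts out.
if not opts or opts.include_usage ~= false then
    body_table.stream_options = { include_usage = true }
end
```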

Risk: some providers reject unrecognized fields. Baseline check; if any host throws on stream_options, the per-model opt-out is one line.
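A minimal sketch of the widened signature, under stated assumptions: `build_request_body` is a hypothetical name, and it returns the body table rather than an encoded JSON string so the example runs without a JSON library. Gating stream_options on `stream` is also an assumption of this sketch, not something the plan specifies.

```lua
-- Sketch only: the real build_request also JSON-encodes the body.
local function build_request_body(model_cfg, messages, stream, opts)
    opts = opts or {}
    local body = {
        model    = model_cfg.model,
        messages = messages,
        stream   = stream and true or false,
    }
    -- Former positional args now ride in opts.
    if opts.tools then body.tools = opts.tools end
    if opts.max_tokens then body.max_tokens = opts.max_tokens end
    -- Assumption: only attach stream_options to streaming requests.
    -- Default is on; an explicit include_usage = false opts out.
    if stream and opts.include_usage ~= false then
        body.stream_options = { include_usage = true }
    end
    return body
end

local cfg  = { model = "example-model" }
local msgs = { { role = "user", content = "hi" } }

local with_usage    = build_request_body(cfg, msgs, true, { max_tokens = 64 })
local without_usage = build_request_body(cfg, msgs, true, { include_usage = false })
local no_opts       = build_request_body(cfg, msgs, true)
```

Because the opt-out is a plain field on opts, a host that rejects stream_options can be handled by setting `include_usage = false` wherever that model's opts are assembled.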

Category tagging

opts.category is a string set by the caller. broker echoes it into the emitted usage payload so the accumulator knows what to credit. Default category if absent: "main".
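Putting the extraction, the deferred emission and the category echo together, the flow can be sketched as a small simulation. `simulate_chat_stream` is a hypothetical stand-in: it iterates over already-decoded event tables instead of a live SSE stream, and the event shapes follow the example stream shown earlier in this section.

```lua
-- Sketch: `events` stands in for SSE payloads that were already JSON-decoded.
local function simulate_chat_stream(model_cfg, events, on_delta, opts)
    opts = opts or {}
    local final_usage = nil
    for _, doc in ipairs(events) do
        if doc.usage then
            final_usage = {
                prompt_tokens     = doc.usage.prompt_tokens or 0,
                completion_tokens = doc.usage.completion_tokens or 0,
                total_tokens      = doc.usage.total_tokens or 0,
                cost              = doc.usage.cost,
                model             = doc.model or model_cfg.model,
                category          = opts.category or "main",
            }
        end
        local choice  = doc.choices and doc.choices[1]
        local content = choice and choice.delta and choice.delta.content
        if content then on_delta("text", content) end
    end
    -- Emit usage last, and only if the provider sent one.
    if final_usage then on_delta("usage", final_usage) end
end

-- Returns a log table plus a callback that appends to it.
local function collect()
    local log = {}
    return log, function(kind, payload)
        log[#log + 1] = { kind = kind, payload = payload }
    end
end

local cloud_events = {
    { choices = { { index = 0, delta = { content = "Hi" } } } },
    { choices = { { index = 0, delta = {}, finish_reason = "stop" } } },
    { choices = {}, usage = { prompt_tokens = 15, completion_tokens = 3,
                              total_tokens = 18, cost = 0.00004 } },
}
local cloud_log, cloud_cb = collect()
simulate_chat_stream({ model = "cloud" }, cloud_events, cloud_cb, { category = "probe" })

-- A provider that never sends usage produces no "usage" event at all.
local local_events = { { choices = { { index = 0, delta = { content = "ok" } } } } }
local local_log, local_cb = collect()
simulate_chat_stream({ model = "fast" }, local_events, local_cb)
```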

5. Pillar 2 — Accumulator on ctx

Shape

```lua
ctx.usage_totals = {
    -- [model_name] = { [category] = { prompt = N, completion = N,
    --                                 calls = N, cost = N } }
    fast = {
        main      = { prompt = 8200, completion = 2100, calls = 14, cost = 0   },
    },
    cloud = {
        main      = { prompt = 3850, completion = 980, calls = 8,  cost = 0.0180 },
        delegate  = { prompt = 250,  completion = 80,  calls = 1,  cost = 0.0012 },
        probe     = { prompt = 150,  completion = 30,  calls = 1,  cost = 0.0042 },
    },
}
ctx.cost_warn_fired = false
```

add_usage

```lua
-- Credit one broker call's usage to usage_totals[model][category].
function Context:add_usage(model, category, u)
    model    = model    or "?"
    category = category or "main"
    self.usage_totals = self.usage_totals or {}
    local m = self.usage_totals[model] or {}
    local c = m[category] or { prompt = 0, completion = 0, calls = 0, cost = 0 }
    c.prompt     = c.prompt     + (u.prompt_tokens or 0)
    c.completion = c.completion + (u.completion_tokens or 0)
    c.calls      = c.calls      + 1
    c.cost       = c.cost       + (u.cost or 0)   -- nil cost (local) counts as 0
    m[category] = c
    self.usage_totals[model] = m
end

-- Sum of cost over every model and category.
function Context:total_cost()
    local total = 0
    for _, m in pairs(self.usage_totals or {}) do
        for _, c in pairs(m) do total = total + c.cost end
    end
    return total
end

-- Returns two values: total prompt tokens, total completion tokens.
function Context:total_tokens()
    local p, comp = 0, 0
    for _, m in pairs(self.usage_totals or {}) do
        for _, c in pairs(m) do
            p    = p    + c.prompt
            comp = comp + c.completion
        end
    end
    return p, comp
end
```

Reset semantics

Context:reset() deliberately does NOT clear usage_totals — matches R8 invariant from Phase 6 (:reset clears turns, pending_exec_output, summary; preserves memory_items, project, and now usage_totals). The user reset their conversation, not their cost meter. :cost reset is the explicit reset verb for the meter.
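The accumulator and its reset rule can be exercised in isolation. The following is a self-contained sketch, not the real context.lua: the `Context` class here is a reduced stand-in that keeps only `turns`, `usage_totals` and `cost_warn_fired`, and the token and cost figures are made up for the example.

```lua
-- Reduced stand-in for Context; only the fields this section discusses.
local Context = {}
Context.__index = Context

function Context.new()
    return setmetatable({ turns = {}, usage_totals = {}, cost_warn_fired = false }, Context)
end

function Context:add_usage(model, category, u)
    model, category = model or "?", category or "main"
    local m = self.usage_totals[model] or {}
    local c = m[category] or { prompt = 0, completion = 0, calls = 0, cost = 0 }
    c.prompt     = c.prompt     + (u.prompt_tokens or 0)
    c.completion = c.completion + (u.completion_tokens or 0)
    c.calls      = c.calls      + 1
    c.cost       = c.cost       + (u.cost or 0)
    m[category] = c
    self.usage_totals[model] = m
end

function Context:total_cost()
    local total = 0
    for _, m in pairs(self.usage_totals) do
        for _, c in pairs(m) do total = total + c.cost end
    end
    return total
end

function Context:total_tokens()
    local p, comp = 0, 0
    for _, m in pairs(self.usage_totals) do
        for _, c in pairs(m) do
            p, comp = p + c.prompt, comp + c.completion
        end
    end
    return p, comp
end

-- Clears conversation state only; the meter is deliberately left alone.
function Context:reset()
    self.turns = {}
end

local ctx = Context.new()
ctx:add_usage("cloud", "main", { prompt_tokens = 100, completion_tokens = 20, cost = 0.0010 })
ctx:add_usage("cloud", "main", { prompt_tokens = 50,  completion_tokens = 10, cost = 0.0005 })
-- Local call: no cost field, category omitted so it defaults to "main".
ctx:add_usage("fast", nil, { prompt_tokens = 30, completion_tokens = 5 })
ctx:reset()

local prompt_total, completion_total = ctx:total_tokens()
print(prompt_total, completion_total, ctx:total_cost())
```

After the reset the totals are unchanged, which is the R8-style invariant the section describes.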

6. Pillar 3 — `:cost` meta

```
:cost                       summary line
:cost detail                per-model + per-category breakdown
:cost reset                 zero out ctx.usage_totals + cost_warn_fired
```

Summary format:

```
[aish] session usage: 24 calls, prompt=12,450 / completion=3,190 tokens
                       cost=$0.0234 (cloud only; local: 0)
```

Detail format (sorted by total cost desc, then by model):

```
[aish] session usage detail:
  cloud     main      8 calls,  3,850 / 980 tokens,   $0.0180
  cloud     probe     1 call,     150 / 30  tokens,   $0.0042
  cloud     delegate  1 call,     250 / 80  tokens,   $0.0012
  fast      main     14 calls,  8,200 / 2,100 tokens, $0     (local)
```

Implementation: pure Lua iteration over ctx.usage_totals; no broker calls. Sorting uses table.sort on a flattened list.
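A minimal sketch of that flatten-and-sort step, assuming the usage_totals shape from §5. The helper names `detail_rows` and `commas` are hypothetical, the category tie-break is an addition of this sketch, and the printed layout only approximates the detail format above.

```lua
-- Insert thousands separators: 12450 -> "12,450" (non-negative integers only).
local function commas(n)
    local s = string.format("%d", n)
    local out = s:reverse():gsub("(%d%d%d)", "%1,"):reverse()
    return (out:gsub("^,", ""))
end

-- Flatten model -> category -> counters into a list, then sort it
-- by cost descending, then model name, then category name.
local function detail_rows(usage_totals)
    local rows = {}
    for model, cats in pairs(usage_totals) do
        for category, c in pairs(cats) do
            rows[#rows + 1] = {
                model = model, category = category,
                prompt = c.prompt, completion = c.completion,
                calls = c.calls, cost = c.cost,
            }
        end
    end
    table.sort(rows, function(a, b)
        if a.cost ~= b.cost then return a.cost > b.cost end
        if a.model ~= b.model then return a.model < b.model end
        return a.category < b.category
    end)
    return rows
end

local totals = {
    fast  = { main = { prompt = 8200, completion = 2100, calls = 14, cost = 0 } },
    cloud = {
        main     = { prompt = 3850, completion = 980, calls = 8, cost = 0.0180 },
        delegate = { prompt = 250,  completion = 80,  calls = 1, cost = 0.0012 },
        probe    = { prompt = 150,  completion = 30,  calls = 1, cost = 0.0042 },
    },
}

local rows = detail_rows(totals)
for _, r in ipairs(rows) do
    print(("  %-9s %-9s %3d %s, %s / %s tokens, $%.4f"):format(
        r.model, r.category, r.calls, r.calls == 1 and "call" or "calls",
        commas(r.prompt), commas(r.completion), r.cost))
end
```

The explicit tie-breaks matter because pairs() iterates in no defined order; without them, rows with equal cost could swap places between runs.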

7. Pillar 4 — Warning thresholds

Config:

```lua
cost = {
    warn_at_dollars = 0.50,    -- emit once when cumulative cost crosses this
    warn_at_tokens  = 100000,  -- emit once when cumulative tokens cross this
}
```

After every ctx:add_usage, check:

```lua
if config.cost and not ctx.cost_warn_fired then
    local cost = ctx:total_cost()
    if config.cost.warn_at_dollars and cost >= config.cost.warn_at_dollars then
        renderer.status(("session cost $%.4f has crossed warn_at_dollars=$%.4f")
                        :format(cost, config.cost.warn_at_dollars))
        ctx.cost_warn_fired = true
    end
    -- (similar for warn_at_tokens; share the flag or use two)
end
```

One-shot per session. :cost reset clears the flag.
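One way the one-shot check could look with both thresholds sharing a single flag. This is a sketch under assumptions: `check_cost_warn` is a hypothetical helper, the ctx object is a stub exposing only the two totals methods, the status sink is injected instead of calling renderer.status, and the wording of the token message is invented for the example.

```lua
-- Returns true if a warning was emitted on this call.
local function check_cost_warn(ctx, cost_cfg, status)
    if not cost_cfg or ctx.cost_warn_fired then return false end
    local cost = ctx:total_cost()
    local p, c = ctx:total_tokens()
    local tokens = p + c
    if cost_cfg.warn_at_dollars and cost >= cost_cfg.warn_at_dollars then
        status(("session cost $%.4f has crossed warn_at_dollars=$%.4f")
               :format(cost, cost_cfg.warn_at_dollars))
    elseif cost_cfg.warn_at_tokens and tokens >= cost_cfg.warn_at_tokens then
        -- Assumed wording; the manifest only specifies the dollar message.
        status(("session tokens %d have crossed warn_at_tokens=%d")
               :format(tokens, cost_cfg.warn_at_tokens))
    else
        return false
    end
    ctx.cost_warn_fired = true   -- shared flag: one warning per session
    return true
end

-- Stub ctx: totals are plain fields so the test can move them directly.
local ctx = {
    cost = 0, prompt = 0, completion = 0, cost_warn_fired = false,
    total_cost   = function(self) return self.cost end,
    total_tokens = function(self) return self.prompt, self.completion end,
}
local messages = {}
local function status(msg) messages[#messages + 1] = msg end
local cfg = { warn_at_dollars = 0.50, warn_at_tokens = 100000 }

ctx.cost = 0.30
local fired_below = check_cost_warn(ctx, cfg, status)   -- under threshold
ctx.cost = 0.60
local fired_cross = check_cost_warn(ctx, cfg, status)   -- the crossing call
ctx.cost = 5.00
local fired_again = check_cost_warn(ctx, cfg, status)   -- suppressed by flag
```

Checking right after add_usage means the warning lands on the call that crossed the line, matching A5. With a shared flag, a session that trips the dollar threshold will never warn on tokens; two flags would be needed if both nudges are wanted.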

8. UX Surface Summary

| Meta | Behavior |
|---|---|
| `:cost` | One-line summary: calls / tokens / cost |
| `:cost detail` | Per-model + per-category breakdown |
| `:cost reset` | Zero out totals + clear warn-fired flag |

| Config | Default | Effect |
|---|---|---|
| `cfg.cost.warn_at_dollars` | nil | Status when cumulative cost first crosses this dollar amount |
| `cfg.cost.warn_at_tokens` | nil | Status when cumulative total tokens first cross this count |
| (broker `opts.include_usage`) | true | Adds `stream_options.include_usage = true` to outbound request |

9. Out of Scope (Phase 7)

- Cross-session cost persistence: Q-C2 defers <history.dir>/cost.jsonl rollup; v1 is session-only. Per-turn usage IS in the session JSONL for after-the-fact aggregation if anyone wants to script it.
- Per-model rate limiting / cost caps that REFUSE the call: v1 only warns. A future phase could add a hard cap that aborts before the broker call.
- Pricing-table fallback for local models: if a local model doesn't emit usage.cost, we record 0. Estimating cost from token count + a static pricing table is a future polish (most users won't care about local "cost" anyway; local is free).
- Pretty token-bandwidth charts / sparklines: out of scope; the detail breakdown is text-only.
- Estimated cost for future turns: no preflight cost prediction.
- MCP tool-call usage: MCP servers don't expose token usage; broker calls invoked DURING MCP tool dispatch ARE captured (because they go through the same path), but the MCP tool call itself isn't.

10. Risks

| Risk | Mitigation |
|---|---|
| Some providers reject `stream_options` -> SSE errors at the top of the stream | `opts.include_usage = false` opt-out per call site; baseline-time probe of the actual hossenfelder broker behavior |
| OpenRouter `cost` field shape varies between providers (Bedrock vs. Baidu vs. Together vs. ...) | Capture `usage.cost` as-is (number); document that the same provider must be used for cross-call comparison |
| Local llama.cpp returns no `cost` -> displayed `$0` could mislead user "is this REALLY free?" | `:cost detail` annotates local lines with `(local)` literal; summary says `cost=$X (cloud only; local: 0)` |
| `ctx.usage_totals` grows unboundedly with new model names mid-session | Bounded by `#models in config` × `#categories`, both small constants. No mitigation needed. |
| Warn threshold fires once and never again for a long-running session that crosses 2x / 10x the threshold | Acceptable for v1; user can `:cost reset` to re-arm. Future polish: warn at each Nx multiple. |

11. Open Questions (Phase 7)

| # | Question | Impact |
|---|---|---|
| Q-C1 | Provider-without-usage handling | A10: defensive silent skip; baseline probe will confirm shape on local llama.cpp. |
| Q-C2 | Cross-session cost persistence (`cost.jsonl`) | Deferred to follow-up phase 8; v1 is session-only. |
| Q-C3 | Categories closed-set vs free-form | A4: free-form; caller decides. Matches Phase 6 helpers/skills convention. |
| Q-C4 | `stream_options` forwarding by hossenfelder | B1 RESOLVED: both backends accept; flag is REQUIRED for local llama.cpp, no-op for cloud. Default-true is correct. |
| Q-C5 | Warn fires on the crossed call or the next | A5: on the crossed call (no UX-defeating delay). |
| Q-C6 | `:reset` clears `cost_warn_fired` | A6: no, only `:cost reset` clears the flag (R8 parity). |

12. Phase 7 → Phase 8+ Out-of-band

Candidate follow-ups (non-binding):

- Phase 8: cross-session cost persistence (Q-C2 deferral), with optional cost dashboards / weekly rollup reporter.
- Hard rate limits / cost caps that REFUSE the call: an extension of the warn surface that promotes warnings into preflight enforcement.
- Better tokenization (Q1 deferred-from-Phase-3): replace the char/4 heuristic on Context:estimate_tokens() with model /tokenize calls. Indirectly improves accuracy of any future "preflight cost predictor".

Phase 7 itself is self-contained — no upstream dependencies.

13. Implementation Plan (commit-by-commit)

Bottom-up; broker first (it's the egress point that all callers depend on), then context (the accumulator), then the call-site rewires, then the user-facing meta + warn surface, then config + status bump. Each commit leaves the tree green (existing tests + load smoke + per-commit feature smoke).

Order

1. broker.lua: usage capture + signature widening.
   - build_request(model_cfg, messages, stream, opts) widened to take an opts table; opts.tools / opts.max_tokens fold in from the existing positional args. opts.include_usage (default true) adds stream_options.include_usage = true to the request body (per B1, required for local).
   - M.chat_stream event loop adds if doc.usage then final_usage = doc.usage end; after curl.post_sse returns, if final_usage is set, on_delta("usage", payload) is called. Payload includes model = model_cfg.model (caller-stable per B4), the raw token counts, and cost as a number (nil for local per B3).
   - opts.category passthrough: the broker just echoes it into the emitted usage payload; doesn't validate (per A4 free-form).
   - M.chat (the non-streaming wrapper) returns (text, usage); backward-compatible (existing callers ignore the second value).
   - Smoke: hand-build a request with stream_options, capture all three on_delta kinds (text, tool_call when applicable, usage), confirm usage payload matches what curl shows.
2. context.lua: accumulator + helpers.
   - Context.new: self.usage_totals = {} + self.cost_warn_fired = false.
   - Context:add_usage(model, category, usage): increments usage_totals[model][category] slots.
   - Context:total_cost(): sums all cost fields across all models/categories.
   - Context:total_tokens(): sums prompt + completion separately.
   - Context:reset: does NOT touch usage_totals or cost_warn_fired (R8 parity with memory_items and project).
   - Smoke: 4-case inline test of add_usage / totals / reset preservation.
3. repl.lua: wire opts.category + on_delta("usage") at non-Norris call sites.
   - call_broker wrapper (used by ask_ai): pass opts.category = "main"; the on_delta wrapper handles kind == "usage" by calling ctx:add_usage(req_name, "main", payload).
   - DELEGATE: handler: opts.category = "delegate".
   - :delegate meta: opts.category = "delegate".
   - summarize-on-evict callback: opts.category = "summarize".
   - :memory summarize: opts.category = "memory_summarize".
   - For broker.chat callers (non-streaming): capture the new second return value and feed to ctx:add_usage.
   - Smoke: send one cloud prompt, observe ctx.usage_totals grows.
4. safety.lua: opts.category for Norris + probe.
   - safety.norris_step's broker.chat_stream call: pass opts.category = "norris"; the helpers.on_usage callback (added to the helpers table by repl.lua) routes back to ctx:add_usage. Or, simpler, safety.lua wraps on_delta itself with a "usage"-kind branch that calls helpers.on_usage.
   - safety.is_destructive's llm_probe broker.chat call: pass opts.category = "probe"; capture the (text, usage) return and forward via opts.on_usage callback (added to is_destructive opts).
   - Smoke: a Norris session shows both "norris" and "probe" category entries in :cost detail.
5. repl.lua: :cost meta + warn-threshold + HELP.
   - :cost (summary), :cost detail (per-model+category breakdown), :cost reset (zero totals + clear cost_warn_fired).
   - After every ctx:add_usage call (centralized in a helper if possible), check cfg.cost.warn_at_dollars / warn_at_tokens; emit one-shot status if crossed AND cost_warn_fired is false.
   - HELP gains 3 lines for :cost.
   - Smoke: :cost shows totals; :cost detail breaks down; warn fires once when threshold crossed; :cost reset re-arms.
6. config.lua example block + docs/PHASE7.md status bump.
   - Commented-out cost = { warn_at_dollars = 0.50, warn_at_tokens = 100000 } block in config.lua.
   - PHASE7.md status header → Implement (matches Phase 5/6 cadence; manifest tracks implementation state).

Risk index per commit

| Commit | Risk | Mitigation |
|---|---|---|
| 1 (broker) | build_request signature change breaks all existing callers | Every caller of chat_stream / chat already passes an opts table, so tools / max_tokens simply move INTO opts; no temporary positional fallback (`opts.tools = old_tools`) is needed |
| 1 (broker) | `M.chat` second return value confuses callers that do `local r = broker.chat(...)` discarding the second | Lua doesn't error on dropped return values; backward-compat preserved automatically |
| 2 (context) | usage_totals nil on old ctx serializations | Defensive `self.usage_totals = self.usage_totals or {}` in add_usage; no migration needed |
| 3 (repl wires) | Forgetting one call site = silent under-count | Lint by grep for `broker.chat\(` and `broker.chat_stream\(` after the wire commit; ensure each is tagged |
| 4 (safety wires) | safety.lua must NOT introduce a new module dependency (`require("secrets")`-style) | Use helpers.on_usage callback convention (same shape as #52's scrub_msgs); no module dep |
| 5 (:cost + warn) | warn fires multiple times when threshold is much exceeded by one call | cost_warn_fired one-shot flag; explicit :cost reset to re-arm |
| 6 (config + status) | none | n/a |

Tests + smoke per commit

Each commit:

- Pass luajit test_safety.lua (87/87) and luajit test_router_model.lua (31/31)
- Load cleanly via luajit -e 'package.path=...; require("repl"); print("ok")'
- Pass a per-feature smoke (described in each row above)

Things deliberately NOT split

- broker.chat backward-compat shim: Lua's multiple-return-values semantics handle it automatically (existing local r = broker.chat(..) drops the new usage value).
- Per-category sub-tables: flat model -> category -> counters is simple enough; nesting deeper for e.g. timestamps is v2.
- Cross-session persistence: explicitly Q-C2 deferred to phase 8.
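The multiple-return argument can be checked directly. In this sketch `chat_v7` is a hypothetical stand-in for the widened broker.chat; it shows the assignment case the plan relies on, plus one Lua rule worth keeping in mind during the call-site lint: a call in the last position of an argument list, table constructor or return statement expands to all of its results.

```lua
-- Stand-in for the widened broker.chat: returns (text, usage).
local function chat_v7()
    return "hello", { prompt_tokens = 15, completion_tokens = 3 }
end

-- Old-style caller: the extra value is silently dropped.
local r = chat_v7()

-- New-style caller picks up both values.
local text, usage = chat_v7()

-- Caveat: in last position the call expands, so the usage table leaks in.
local expanded = { chat_v7() }

-- Wrapping the call in parentheses truncates it to exactly one value.
local truncated = { (chat_v7()) }

print(r, text, #expanded, #truncated)
```

So plain assignments stay compatible without a shim, while any existing site that passes broker.chat(...) straight through as a final argument or return value would start forwarding the usage table too.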

Open at plan-time (resolve at implement)

- Whether safety.is_destructive's opts should carry on_usage callback explicitly OR thread through cfg.helpers (the latter matches the Norris helpers convention but is more coupling). Decide at commit 4. Default to explicit opts.on_usage for minimum surface.
- Whether to emit a [aish] usage: model=X prompt=N completion=M cost=$X status line PER TURN (verbose mode) or only via :cost on demand. v1 = on demand only; verbose mode is a follow-up nice-to-have.
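For the first open item, the explicit opts.on_usage variant might look like the following. This is a sketch, not the real safety.lua: `fake_broker_chat` and this `is_destructive` are hypothetical stand-ins, the probe reply and token figures are made up, and the caller-side closure stands in for routing into ctx:add_usage.

```lua
-- Hypothetical broker stub: always answers "no" and reports fixed usage.
local function fake_broker_chat(model_cfg, msgs, opts)
    return "no", {
        prompt_tokens = 150, completion_tokens = 30, cost = 0.0042,
        model = model_cfg.model, category = opts.category,
    }
end

-- Hypothetical probe: tags the call as "probe" and forwards usage
-- through an optional callback, so no new module dependency is needed.
local function is_destructive(cmd, opts)
    opts = opts or {}
    local msgs = { { role = "user", content = "Is this destructive? " .. cmd } }
    local text, usage = fake_broker_chat({ model = "cloud" }, msgs, { category = "probe" })
    if usage and opts.on_usage then opts.on_usage(usage) end
    return text == "yes"
end

-- Caller side: the closure is where repl.lua could call ctx:add_usage.
local recorded = {}
local verdict = is_destructive("ls -la", {
    on_usage = function(u) recorded[#recorded + 1] = u end,
})

-- Callers that pass no callback keep working unchanged.
local verdict_plain = is_destructive("ls -la")
```

The callback keeps safety.lua unaware of ctx, which is the "minimum surface" argument; threading through cfg.helpers would give the same routing with one shared table instead of a per-call option.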
