aish/docs/PHASE7.md
marfrit 0f14dc1727 docs/PHASE7: plan — §13 commit roadmap
Status: Analyze -> Plan.

Q-C4 was the last open question pending baseline; now resolved per
B1 (stream_options accepted by both backends; required for local).

§13 Implementation Plan added — 6 commits, bottom-up:

  1. broker.lua: usage extraction from final SSE chunk; build_request
     signature widening to (model_cfg, msgs, stream, opts); on_delta
     ("usage", payload); chat returns (text, usage); opts.category
     passthrough.

  2. context.lua: usage_totals + cost_warn_fired fields; add_usage /
     total_cost / total_tokens helpers; :reset preserves both.

  3. repl.lua: wire opts.category at 5 non-Norris call sites (main,
     delegate x2, summarize, memory_summarize); on_delta("usage")
     branch routes to ctx:add_usage.

  4. safety.lua: wire opts.category for Norris main broker +
     is_destructive LLM probe; helpers.on_usage callback convention
     (no new module dep; matches #52's scrub_msgs pattern).

  5. repl.lua: :cost meta surface + warn-threshold check + HELP.

  6. config.lua: commented cost example block + PHASE7.md status
     bump to Implement.

Per-commit risk index covers signature-change blast radius, missed
call-site lint, and warn-flag one-shot semantics. Lua's multi-
return semantics keep broker.chat backwards-compat automatic.

Two items left open at plan, resolve at implement:
  - is_destructive opts.on_usage vs cfg.helpers threading
  - per-turn verbose mode (deferred; v1 = :cost on demand only)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:50:39 +00:00


aish — Phase 7 Manifest

Project: aish, AI-augmented conversational shell
Document: Phase 7 Requirements, Architecture & Design Decisions
Status: Plan (formulate + analyze + baseline complete; tree at 2244a3f)
Date: 2026-05-16

Analyze findings (2026-05-16):

A1. broker.chat_stream surface is clean for the extension. The existing on_event(data) closure inside M.chat_stream already parses doc.error / doc.choices / delta / tool_calls, so adding if doc.usage then final_usage = ... end is one block. Emission happens via a closure-local final_usage: the post-loop code in chat_stream reads it and calls on_delta("usage", final_usage). On the request side, either build_request gets a minor extension OR (cleaner in principle) chat_stream inserts stream_options.include_usage = true into the body table BEFORE json.encode; the latter is not possible today because the encode happens inside build_request. Cleanest: extend build_request(model_cfg, messages, stream, opts) so it can read opts.include_usage. Phase 7 simplifies the signature in passing.

A2. 7 caller sites identified for opts.category threading:

| Site | Category |
|---|---|
| `safety.lua:191` (LLM probe) | `"probe"` |
| `safety.lua:354` (norris main) | `"norris"` |
| `repl.lua:326` (summarize-on-evict) | `"summarize"` |
| `repl.lua:685` (call_broker wrapper, used by ask_ai) | `"main"` |
| `repl.lua:1104` (DELEGATE: handler) | `"delegate"` |
| `repl.lua:1587` (:memory summarize) | `"memory_summarize"` |
| `repl.lua:2156` (:delegate meta) | `"delegate"` |

All callers pass `opts` already; adding a `category` field is
additive and backward-compatible (default to `"main"` when absent).

A3. build_request signature simplification. Today it takes (model_cfg, messages, stream, tools, max_tokens) — five positional args. With Phase 7 needing include_usage AND stream_options, positional growth gets unwieldy. Resolution: widen to (model_cfg, messages, stream, opts) where opts carries {tools, max_tokens, include_usage, stream_options}. Callers in M.chat_stream and M.chat pass their existing opts table through. This is a refactor but contained inside broker.lua.
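To make the widened signature concrete, here is a minimal sketch of what an opts-based build_request could look like. It is an illustration under assumptions, not the code in broker.lua: it returns the body table instead of encoded JSON (so the example needs no JSON library), and gating stream_options on the stream flag is a choice made for this sketch.

```lua
-- Hypothetical sketch of the widened signature. Returns the body table
-- rather than a JSON string so the example is self-contained.
local function build_request(model_cfg, messages, stream, opts)
    opts = opts or {}
    local body = {
        model    = model_cfg.model,
        messages = messages,
        stream   = stream and true or false,
    }
    -- Former positional args now ride in the opts table.
    if opts.tools      then body.tools      = opts.tools      end
    if opts.max_tokens then body.max_tokens = opts.max_tokens end
    -- Defaults on; only an explicit include_usage = false turns it off.
    -- (Assumption: only attached to streaming requests.)
    if stream and opts.include_usage ~= false then
        body.stream_options = { include_usage = true }
    end
    return body
end

local msgs = { { role = "user", content = "hi" } }
local on  = build_request({ model = "cloud" }, msgs, true, { max_tokens = 256 })
local off = build_request({ model = "cloud" }, msgs, true, { include_usage = false })
print(on.stream_options ~= nil, off.stream_options ~= nil)
```

Because the default is expressed as `~= false`, callers that never mention include_usage get the new behavior without any change at the call site.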

A4. Q-C3 RESOLVED: free-form categories. The closed-set vs free-form debate resolved in favor of free-form per the helpers/skills convention already in place (Phase 6 :tree / :diff metas don't validate sub-args either). :cost detail will show whatever categories appear; in practice that is a small, documented set (6 distinct categories across the 7 call sites in A2), so no surprises.

A5. Q-C5 RESOLVED: warn fires on the call that crossed. The crossed call's usage IS in the accumulator at the moment we check (we check AFTER add_usage). Firing on the NEXT call would mean a delay of one full broker round-trip before the user sees the warn — defeats the purpose. Just emit-on-cross.

A6. Q-C6 RESOLVED: :reset does NOT clear cost_warn_fired. Parity with usage_totals itself (per the §2 decision row); the user reset their conversation, not their cost meter. The flag AND the totals are reset only by the explicit :cost reset verb.

A7. Norris call-graph rewires (existing safety.lua:354 path): with issue #52 wired (commit 955bd82), the Norris broker call now passes helpers.scrub_msgs / helpers.streaming_rehydrator. The on_delta wrapping pattern means I need to be careful that the new ("usage", payload) kind also flows through any wrapper. Since secrets streaming_rehydrator only matches on kind == "text", the "usage" kind passes through unchanged. No new entanglement.
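The pass-through property A7 relies on can be sketched as follows. The wrapper name wrap_text_only and the use of string.upper as the transform are illustrative assumptions; the real streaming_rehydrator is not shown in this document.

```lua
-- Hypothetical wrapper in the style A7 describes: it rewrites "text"
-- chunks and forwards every other kind (including "usage") untouched.
local function wrap_text_only(on_delta, transform)
    return function(kind, payload)
        if kind == "text" then
            return on_delta(kind, transform(payload))
        end
        return on_delta(kind, payload)
    end
end

local seen = {}
local function sink(kind, payload)
    seen[#seen + 1] = { kind = kind, payload = payload }
end

local wrapped = wrap_text_only(sink, string.upper)
wrapped("text", "hello")
wrapped("usage", { prompt_tokens = 15, completion_tokens = 3 })
```

A wrapper that instead dropped unknown kinds would silently under-count, so any new wrapper added around on_delta is worth checking against this shape.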

A8. ctx.usage_totals survives :reset per R8 — same invariant as memory_items (Phase 4) and project (Phase 6). Documented in §5 of the manifest; reinforces the "ambient context survives conversation reset" rule.

A9. Session JSONL serialization — assistant turn dict gets an optional usage field. history.lua log_turn currently calls json.encode(turn) opaquely; the dkjson serializer handles nested tables. No code change needed; the new field flows through automatically when the assistant turn carries one.

A10. Q-C1 PARTIAL: local providers may not emit usage. The formulate-time assumption was "treat absence as zero-cost / unknown". A real probe against qwen-coder-7b-snappy-8k is a baseline action — see B-probes below. The implementation will be defensive: if doc.usage never appears in the stream, no "usage" event is emitted, and the accumulator is unchanged for that turn. :cost output naturally reflects "0 calls counted for local model" if that's the case.

A11. Q-C4 deferred to baseline: actual stream_options forwarding by the hossenfelder proxy must be probed against a live broker. If the proxy strips the option, we get no usage events even for cloud calls. Baseline action.

PHASE0 is the locked substrate; PHASE1-6 are layered on top. This manifest specifies what Phase 7 adds — cost / usage observability: the ability to know, mid-session, how many tokens you've spent and how much money the paid-cloud calls have cost.

PHASE0 §11 originally listed phases only through 6; this commit amends §11 to add Phase 7.


1. Scope of Phase 7

Four pillars:

  1. Usage capture in broker: broker.chat_stream extracts the provider's usage block (and cost where present) from the response stream. Surfaces it to the caller via a new on_delta("usage", ...) kind. The existing broker.chat buffering wrapper exposes it as a second return value (text, usage). Backward-compatible: callers that don't handle the new kind / second value simply ignore it.

  2. Per-session accumulator on ctx — running totals per-model AND per-call-category (main / delegate / summarize / probe) accumulate on ctx.usage_totals. No persistence across sessions in v1 (Q-C2 defers cross-session); the session-log JSONL files DO carry per-turn usage so historical analysis is possible after the fact.

  3. :cost meta — a :cost reporter that shows the current session totals, with optional :cost detail for the per-model + per-category breakdown. Zero broker calls (purely local read of ctx.usage_totals).

  4. Optional warning thresholds: cfg.cost.warn_at_dollars and cfg.cost.warn_at_tokens emit a status the first time the running total crosses the configured threshold. Default off (no warnings without config). Useful when cloud presets are configured and you want a "you've spent $1 this session" nudge before runaway cost.

Phase 7 is done when:

  • broker.chat_stream exposes usage via the new on_delta("usage", ...) callback kind; broker.chat returns (text, usage). Backward compat preserved (no existing caller breaks).
  • After a session with mixed local + cloud calls, :cost prints a total like:
    [aish] session usage: 24 calls, prompt=12,450 / completion=3,190 tokens
                                    cost=$0.0234 (cloud only; local: 0)
    
  • :cost detail breaks down by model + category (most expensive first):
    cloud   main: 8 calls, 3850/980 tokens, $0.0180
    cloud   probe: 1 call, 150/30 tokens, $0.0042
    cloud   delegate: 1 call, 250/80 tokens, $0.0012
    fast    main: 14 calls, 8200/2100 tokens
    
  • Session JSONL gains a usage field on assistant turns (when the broker returned one).
  • With cfg.cost.warn_at_dollars = 0.50 set, crossing $0.50 cumulative emits exactly one status line.
  • Existing configs without cfg.cost behave exactly like Phase 6 (Phase 6 regression coverage).

2. Technology Decisions (delta from Phase 6)

| Decision | Choice | Rationale |
|---|---|---|
| Where to extract usage | In `broker.chat_stream` event loop, looking at each SSE event's `usage` field on the final chunk | The OpenAI streaming spec puts usage on the FINAL chunk when `stream_options: { include_usage: true }` is in the request body. The Anthropic-via-Bedrock path through OpenRouter respects this; need to verify (baseline). |
| New on_delta kind | `on_delta("usage", { prompt_tokens, completion_tokens, total_tokens, cost?, model?, native_finish_reason? })` | Mirrors the existing `("text", chunk)` / `("tool_call", call)` shape. Callers ignore unknown kinds; backward-compatible. |
| Where to enable usage on the wire | `opts.include_usage = true` (default true) sets `stream_options.include_usage = true` in the outbound request body | Off-switch for hosts that reject `stream_options`. Defaults on; baseline probe confirms current broker tolerates it. (A3: `build_request` signature widens to take an opts table; positional growth was getting unwieldy.) |
| Accumulator location | `ctx.usage_totals[model_name][category]` table | ctx is per-conversation; matches the :reset-survives-or-not rules already in place. |
| Categories | `"main"` (ask_ai), `"delegate"`, `"summarize"`, `"memory_summarize"`, `"probe"`, `"norris"` | One-tag-per-call-site. Tagged at the caller site (caller passes `opts.category` to `broker.chat_stream`). |
| Cost extraction | `usage.cost` (OpenRouter convention; dollars as a number) plus `usage.cost_details.upstream_inference_cost` (more detailed). For Anthropic/Bedrock the cost arrives in dollars on `usage.cost`. For pure local llama.cpp: no cost field, record 0. | Single field name across all observed providers (per baseline B7; to be confirmed). |
| Cost precision | Store as number (Lua double = 53-bit mantissa, ~15 decimal digits; plenty for sub-cent precision) | No floating-point cumulative-error concerns at this scale. |
| Warning trigger | First crossing of either threshold emits a single status: `[aish] session cost $X.XXXX has crossed warn_at_dollars=$Y.YYYY`. | Crossed-flag stored on ctx; reset only on session end / `:cost reset`. One-shot to avoid spamming. |
| :reset interaction | `:reset` does NOT clear `ctx.usage_totals` (parity with memory_items/project): the user reset their conversation, not their cost tracking. `:cost reset` is the explicit reset verb. | Matches R8 invariant from Phase 6. |
| Session-log persistence | Assistant turn entries gain an optional `usage` field when broker returned one. history.lua `log_turn` writes it through verbatim. | Per-turn granularity preserved for after-the-fact analysis. No new file. |

3. Module Changes

| File | State after Phase 6 | Phase 7 changes |
|---|---|---|
| `broker.lua` | `chat_stream(cfg, msgs, on_delta, opts)` with text + tool_call kinds; `chat` returns text | Extract usage from final SSE chunk; emit `on_delta("usage", payload)`; `chat` returns `(text, usage)`. New `opts.include_usage` (default true); new `opts.category` (passed through as a tag in the usage payload). |
| `context.lua` | system prompt + turns + memory + project + summary | Add `self.usage_totals` (table) + `self.cost_warn_fired` (bool). New helpers: `Context:add_usage(model, category, usage)`, `Context:total_cost()`, `Context:total_tokens()`. `Context:reset` does NOT clear `usage_totals` (parity with memory_items / project per R8). |
| `repl.lua` | ask_ai + delegate + summarize callbacks + Norris helpers | Wire `opts.category` at each broker call site (main / delegate / summarize / memory_summarize). Wire `on_delta("usage", ...)` -> `ctx:add_usage(...)`. New `:cost` and `:cost detail` / `:cost reset` metas. Cost-warn check after each `add_usage` call. |
| `safety.lua` | norris_step + is_destructive | Pass `opts.category = "norris"` (for the main chat_stream call) and `"probe"` (for the is_destructive LLM probe). Surfaces probe-cost in the breakdown; useful since `safety.llm_model = "cloud"` is the recommended setting. |
| `history.lua` | session.log_turn appends JSONL entries | `log_turn` already takes turn opaquely; assistant turns will carry `usage` if present and it'll serialize via dkjson. No code change unless filter desired. |
| `config.lua` | example blocks for mcp/safety/memory/routing/secrets/hooks/project | Add commented-out `cost = { warn_at_dollars, warn_at_tokens }` block. |
| `docs/PHASE0.md` | §11 lists phases 0-6 | Amendment: add Phase 7 row to §11. |

No new module files.


4. Pillar 1 — Usage capture in broker

SSE shape (provider-by-provider — confirm in baseline)

For OpenAI-compatible streams with stream_options: { include_usage: true }:

data: {"id":"...","choices":[{"index":0,"delta":{"content":"Hi"}, ...}]}
data: {"id":"...","choices":[{"index":0,"delta":{}, "finish_reason":"stop"}]}
data: {"id":"...","choices":[],"usage":{"prompt_tokens":15,"completion_tokens":3,"total_tokens":18,"cost":0.00004,"cost_details":{...}}}
data: [DONE]

The final usage event arrives AFTER finish_reason but BEFORE [DONE]. choices is empty [] on the usage event.

For non-streaming chat: usage is in the response body at the top level. broker.chat is a wrapper around chat_stream, so it inherits the on_delta path.

For local llama.cpp via hossenfelder: usage may or may not be present depending on the proxy's version. Treat absence as zero-cost / unknown.

Extraction algorithm

local final_usage = nil

local function on_event(data)
    ...
    if doc.usage then
        -- Provider sent usage; capture for emission after the stream.
        final_usage = {
            prompt_tokens     = doc.usage.prompt_tokens or 0,
            completion_tokens = doc.usage.completion_tokens or 0,
            total_tokens      = doc.usage.total_tokens or 0,
            cost              = doc.usage.cost,   -- nil for local
            model             = doc.model or model_cfg.model,
            category          = (opts and opts.category) or "main",  -- echoed for the accumulator
        }
        -- Don't emit yet — the [DONE] event marks stream end; emit
        -- once we exit the curl.post_sse loop so the caller sees
        -- usage as the LAST event in the stream order.
    end
    -- ... existing text + tool_call handling ...
end

-- After curl.post_sse returns (stream complete):
if final_usage then on_delta("usage", final_usage) end
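The ordering guarantee (usage emitted last, and only if the provider sent it) can be exercised without a network by replaying pre-decoded events. The sketch below assumes a hypothetical replay function standing in for the curl.post_sse loop plus JSON decoding; the event tables mirror the SSE sample shown earlier in this section.

```lua
-- Pre-decoded SSE documents stand in for curl.post_sse + JSON decoding.
local events = {
    { choices = { { index = 0, delta = { content = "Hi" } } } },
    { choices = { { index = 0, delta = {}, finish_reason = "stop" } } },
    { choices = {}, usage = { prompt_tokens = 15, completion_tokens = 3,
                              total_tokens = 18, cost = 0.00004 } },
}

-- Hypothetical stand-in for the core of chat_stream.
local function replay(docs, model_cfg, on_delta, opts)
    opts = opts or {}
    local final_usage = nil
    for _, doc in ipairs(docs) do
        if doc.usage then
            -- Capture only; emission waits until the stream is done.
            final_usage = {
                prompt_tokens     = doc.usage.prompt_tokens or 0,
                completion_tokens = doc.usage.completion_tokens or 0,
                total_tokens      = doc.usage.total_tokens or 0,
                cost              = doc.usage.cost,
                model             = doc.model or model_cfg.model,
                category          = opts.category or "main",
            }
        end
        local choice = doc.choices and doc.choices[1]
        local text = choice and choice.delta and choice.delta.content
        if text then on_delta("text", text) end
    end
    if final_usage then on_delta("usage", final_usage) end
end

local order, last_usage = {}, nil
local function record(kind, payload)
    order[#order + 1] = kind
    if kind == "usage" then last_usage = payload end
end
replay(events, { model = "cloud" }, record, { category = "delegate" })

-- A provider that never sends usage produces no "usage" event at all.
local silent = {}
replay({ events[1], events[2] }, { model = "fast" },
       function(kind) silent[#silent + 1] = kind end)
```

The second replay covers the defensive case from A10: with no usage document in the stream, the accumulator is simply never called for that turn.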

Outbound include_usage

local body_table = { model = ..., messages = ..., stream = true }
if opts.include_usage ~= false then
    body_table.stream_options = { include_usage = true }
end

Risk: some providers reject unrecognized fields. Baseline check; if any host throws on stream_options, the per-model opt-out is one line.

Category tagging

opts.category is a string set by the caller. broker echoes it into the emitted usage payload so the accumulator knows what to credit. Default category if absent: "main".


5. Pillar 2 — Accumulator on ctx

Shape

ctx.usage_totals = {
    -- [model_name] = { [category] = { prompt = N, completion = N,
    --                                 calls = N, cost = N } }
    fast = {
        main      = { prompt = 8200, completion = 2100, calls = 14, cost = 0   },
    },
    cloud = {
        main      = { prompt = 3850, completion = 980, calls = 8,  cost = 0.0180 },
        delegate  = { prompt = 250,  completion = 80,  calls = 1,  cost = 0.0012 },
        probe     = { prompt = 150,  completion = 30,  calls = 1,  cost = 0.0042 },
    },
}
ctx.cost_warn_fired = false

add_usage

function Context:add_usage(model, category, u)
    model    = model    or "?"
    category = category or "main"
    self.usage_totals = self.usage_totals or {}
    local m = self.usage_totals[model] or {}
    local c = m[category] or { prompt = 0, completion = 0, calls = 0, cost = 0 }
    c.prompt     = c.prompt     + (u.prompt_tokens or 0)
    c.completion = c.completion + (u.completion_tokens or 0)
    c.calls      = c.calls      + 1
    c.cost       = c.cost       + (u.cost or 0)
    m[category] = c
    self.usage_totals[model] = m
end

function Context:total_cost()
    local total = 0
    for _, m in pairs(self.usage_totals or {}) do
        for _, c in pairs(m) do total = total + c.cost end
    end
    return total
end

function Context:total_tokens()
    local p, comp = 0, 0
    for _, m in pairs(self.usage_totals or {}) do
        for _, c in pairs(m) do
            p    = p    + c.prompt
            comp = comp + c.completion
        end
    end
    return p, comp
end

Reset semantics

Context:reset() deliberately does NOT clear usage_totals — matches R8 invariant from Phase 6 (:reset clears turns, pending_exec_output, summary; preserves memory_items, project, and now usage_totals). The user reset their conversation, not their cost meter. :cost reset is the explicit reset verb for the meter.
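A minimal sketch of these reset semantics, assuming a stripped-down Context with only the fields named above. The method name reset_cost is hypothetical (the document only names the :cost reset verb, not the function behind it).

```lua
-- Stripped-down stand-in for context.lua; field names follow the text,
-- reset_cost is a hypothetical name for what ":cost reset" would call.
local Context = {}
Context.__index = Context

function Context.new()
    return setmetatable({
        turns = {}, pending_exec_output = nil, summary = nil,
        memory_items = {}, project = nil,
        usage_totals = {}, cost_warn_fired = false,
    }, Context)
end

-- Conversation reset: clears conversational state only.
function Context:reset()
    self.turns = {}
    self.pending_exec_output = nil
    self.summary = nil
    -- memory_items, project, usage_totals, cost_warn_fired: untouched.
end

-- Meter reset: the only path that zeroes totals and re-arms the warning.
function Context:reset_cost()
    self.usage_totals = {}
    self.cost_warn_fired = false
end

local ctx = Context.new()
ctx.turns[1] = { role = "user", content = "hi" }
ctx.usage_totals.cloud = {
    main = { prompt = 10, completion = 2, calls = 1, cost = 0.001 },
}
ctx.cost_warn_fired = true

ctx:reset()
local calls_after_reset = ctx.usage_totals.cloud.main.calls
local flag_after_reset  = ctx.cost_warn_fired

ctx:reset_cost()
```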


6. Pillar 3 — :cost meta

:cost                       summary line
:cost detail                per-model + per-category breakdown
:cost reset                 zero out ctx.usage_totals + cost_warn_fired

Summary format:

[aish] session usage: 24 calls, prompt=12,450 / completion=3,190 tokens
                       cost=$0.0234 (cloud only; local: 0)

Detail format (sorted by total cost desc, then by model):

[aish] session usage detail:
  cloud     main      8 calls,  3,850 / 980 tokens,   $0.0180
  cloud     probe     1 call,     150 / 30  tokens,   $0.0042
  cloud     delegate  1 call,     250 / 80  tokens,   $0.0012
  fast      main     14 calls,  8,200 / 2,100 tokens, $0     (local)

Implementation: pure Lua iteration over ctx.usage_totals; no broker calls. Sorting uses table.sort on a flattened list.
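The flatten-then-sort step might look like the sketch below. The helper name detail_rows is an assumption, and the final tie-break on category is an addition made here so the order is deterministic (pairs iteration order is unspecified); thousands separators and the (local) annotation are left out for brevity.

```lua
local usage_totals = {
    fast  = { main = { prompt = 8200, completion = 2100, calls = 14, cost = 0 } },
    cloud = {
        main     = { prompt = 3850, completion = 980, calls = 8, cost = 0.0180 },
        delegate = { prompt = 250,  completion = 80,  calls = 1, cost = 0.0012 },
        probe    = { prompt = 150,  completion = 30,  calls = 1, cost = 0.0042 },
    },
}

-- Hypothetical helper: one row per (model, category), most expensive first.
local function detail_rows(totals)
    local rows = {}
    for model, cats in pairs(totals) do
        for category, c in pairs(cats) do
            rows[#rows + 1] = { model = model, category = category, c = c }
        end
    end
    table.sort(rows, function(a, b)
        if a.c.cost ~= b.c.cost then return a.c.cost > b.c.cost end
        if a.model ~= b.model then return a.model < b.model end
        return a.category < b.category  -- extra tie-break (assumption)
    end)
    return rows
end

local rows = detail_rows(usage_totals)
for _, r in ipairs(rows) do
    print(("  %-9s %-9s %2d %s, %d / %d tokens, $%.4f"):format(
        r.model, r.category, r.c.calls,
        r.c.calls == 1 and "call" or "calls",
        r.c.prompt, r.c.completion, r.c.cost))
end
```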


7. Pillar 4 — Warning thresholds

Config:

cost = {
    warn_at_dollars = 0.50,    -- emit once when cumulative cost crosses
    warn_at_tokens  = 100000,  -- emit once when cumulative tokens crosses
}

After every ctx:add_usage, check:

if config.cost and not ctx.cost_warn_fired then
    local cost = ctx:total_cost()
    if config.cost.warn_at_dollars and cost >= config.cost.warn_at_dollars then
        renderer.status(("session cost $%.4f has crossed warn_at_dollars=$%.4f")
                        :format(cost, config.cost.warn_at_dollars))
        ctx.cost_warn_fired = true
    end
    -- (similar for warn_at_tokens; share the flag or use two)
end

One-shot per session. :cost reset clears the flag.
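A sketch of the check covering both thresholds, under the assumption that they share the single cost_warn_fired flag (the text leaves "share the flag or use two" open). check_cost_warn, the emit callback, the token message wording, and the stub ctx are all hypothetical stand-ins for repl.lua, renderer.status, and the real Context.

```lua
-- Hypothetical helper; emit stands in for renderer.status.
-- Assumption: dollars and tokens share one one-shot flag.
local function check_cost_warn(ctx, cost_cfg, emit)
    if not cost_cfg or ctx.cost_warn_fired then return false end
    local cost = ctx:total_cost()
    local prompt, completion = ctx:total_tokens()
    local tokens = prompt + completion
    if cost_cfg.warn_at_dollars and cost >= cost_cfg.warn_at_dollars then
        emit(("session cost $%.4f has crossed warn_at_dollars=$%.4f")
             :format(cost, cost_cfg.warn_at_dollars))
        ctx.cost_warn_fired = true
    elseif cost_cfg.warn_at_tokens and tokens >= cost_cfg.warn_at_tokens then
        emit(("session tokens %d have crossed warn_at_tokens=%d")
             :format(tokens, cost_cfg.warn_at_tokens))
        ctx.cost_warn_fired = true
    end
    return ctx.cost_warn_fired
end

-- Stub context exposing just the two totals the check reads.
local ctx = {
    cost_warn_fired = false, cost = 0, prompt = 0, completion = 0,
    total_cost   = function(self) return self.cost end,
    total_tokens = function(self) return self.prompt, self.completion end,
}
local lines = {}
local function emit(s) lines[#lines + 1] = s end
local cfg = { warn_at_dollars = 0.50, warn_at_tokens = 100000 }

ctx.cost = 0.30; check_cost_warn(ctx, cfg, emit)  -- below threshold: silent
ctx.cost = 0.62; check_cost_warn(ctx, cfg, emit)  -- the crossing call: warns
ctx.cost = 0.95; check_cost_warn(ctx, cfg, emit)  -- flag already set: silent
```

Running the check immediately after add_usage is what makes the warning fire on the crossing call rather than one round-trip later.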


8. UX Surface Summary

| Meta | Behavior |
|---|---|
| `:cost` | One-line summary: calls / tokens / cost |
| `:cost detail` | Per-model + per-category breakdown |
| `:cost reset` | Zero out totals + clear warn-fired flag |

| Config | Default | Effect |
|---|---|---|
| `cfg.cost.warn_at_dollars` | `nil` | Status when cumulative cost first crosses this dollar amount |
| `cfg.cost.warn_at_tokens` | `nil` | Status when cumulative total tokens first crosses |
| (broker `opts.include_usage`) | `true` | Adds `stream_options.include_usage = true` to outbound request |

9. Out of Scope (Phase 7)

  • Cross-session cost persistence — Q-C2 defers <history.dir>/cost.jsonl rollup; v1 is session-only. Per-turn usage IS in the session JSONL for after-the-fact aggregation if anyone wants to script it.
  • Per-model rate limiting / cost caps that REFUSE the call — v1 only warns. A future phase could add a hard cap that aborts before the broker call.
  • Pricing-table fallback for local models — if a local model doesn't emit usage.cost, we record 0. Estimating cost from token count + a static pricing table is a future polish (most users won't care about local "cost" anyway — local is free).
  • Pretty token-bandwidth charts / sparklines — out of scope; the detail breakdown is text-only.
  • Estimated cost for future turns — no preflight cost prediction.
  • MCP tool-call usage — MCP servers don't expose token usage; broker calls invoked DURING MCP tool dispatch ARE captured (because they go through the same path), but the MCP tool call itself isn't.

10. Risks

| Risk | Mitigation |
|---|---|
| Some providers reject `stream_options` -> SSE errors at the top of the stream | `opts.include_usage = false` opt-out per call site; baseline-time probe of the actual hossenfelder broker behavior |
| OpenRouter cost field shape varies between providers (Bedrock vs. Baidu vs. Together vs. ...) | Capture `usage.cost` as-is (number); document that the same provider must be used for cross-call comparison |
| Local llama.cpp returns no cost -> displayed $0 could mislead user "is this REALLY free?" | `:cost detail` annotates local lines with `(local)` literal; summary says `cost=$X (cloud only; local: 0)` |
| `ctx.usage_totals` grows unboundedly with new model names mid-session | Bounded by #models in config × #categories, both small constants. No mitigation needed. |
| Warn threshold fires once and never again for a long-running session that crosses 2x / 10x the threshold | Acceptable for v1; user can `:cost reset` to re-arm. Future polish: warn at each Nx multiple. |

11. Open Questions (Phase 7)

| # | Question | Impact / Resolution target |
|---|---|---|
| Q-C1 | Provider-without-usage handling | A10: defensive silent skip; baseline probe will confirm shape on local llama.cpp. |
| Q-C2 | Cross-session cost persistence (cost.jsonl) | Deferred to follow-up phase 8; v1 is session-only. |
| Q-C3 | Categories closed-set vs free-form | A4: free-form; caller decides. Matches Phase 6 helpers/skills convention. |
| Q-C4 | stream_options forwarding by hossenfelder | B1 RESOLVED: both backends accept; flag is REQUIRED for local llama.cpp, no-op for cloud. Default-true is correct. |
| Q-C5 | Warn fires on the crossed call or the next | A5: on the crossed call (no UX-defeating delay). |
| Q-C6 | :reset clears cost_warn_fired | A6: no, only :cost reset clears the flag (R8 parity). |

12. Phase 7 → Phase 8+ Out-of-band

Candidate follow-ups (non-binding):

  • Phase 8: cross-session cost persistence (Q-C2 deferral), with optional cost dashboards / weekly rollup reporter.
  • Hard rate limits / cost caps that REFUSE the call — an extension of the warn surface that promotes warnings into preflight enforcement.
  • Better tokenization (Q1 deferred-from-Phase-3): replace the char/4 heuristic on Context:estimate_tokens() with model /tokenize calls. Indirectly improves accuracy of any future "preflight cost predictor".

Phase 7 itself is self-contained — no upstream dependencies.


13. Implementation Plan (commit-by-commit)

Bottom-up; broker first (it's the egress point that all callers depend on), then context (the accumulator), then the call-site rewires, then the user-facing meta + warn surface, then config + status bump. Each commit leaves the tree green (existing tests + load smoke + per-commit feature smoke).

Order

  1. broker.lua — usage capture + signature widening.

    • build_request(model_cfg, messages, stream, opts) widened to take an opts table; opts.tools / opts.max_tokens fold in from the existing positional args. opts.include_usage (default true) adds stream_options.include_usage = true to the request body (per B1, required for local).
    • M.chat_stream event loop adds if doc.usage then final_usage = doc.usage end; after curl.post_sse returns, if final_usage is set, on_delta("usage", payload) is called. Payload includes model = model_cfg.model (caller-stable per B4), the raw token counts, and cost as a number (nil for local per B3).
    • opts.category passthrough — the broker just echoes it into the emitted usage payload; doesn't validate (per A4 free-form).
    • M.chat (the non-streaming wrapper) returns (text, usage) — backward-compatible (existing callers ignore the second value).
    • Smoke: hand-build a request with stream_options, capture all three on_delta kinds (text, tool_call when applicable, usage), confirm usage payload matches what curl shows.
  2. context.lua — accumulator + helpers.

    • Context.new: self.usage_totals = {} + self.cost_warn_fired = false.
    • Context:add_usage(model, category, usage) — increments usage_totals[model][category] slots.
    • Context:total_cost() — sums all cost fields across all models/categories.
    • Context:total_tokens() — sums prompt + completion separately.
    • Context:reset — does NOT touch usage_totals or cost_warn_fired (R8 parity with memory_items and project).
    • Smoke: 4-case inline test of add_usage / totals / reset preservation.
  3. repl.lua — wire opts.category + on_delta("usage") at non-Norris call sites.

    • call_broker wrapper (used by ask_ai): pass opts.category = "main"; the on_delta wrapper handles kind == "usage" by calling ctx:add_usage(req_name, "main", payload).
    • DELEGATE: handler: opts.category = "delegate".
    • :delegate meta: opts.category = "delegate".
    • summarize-on-evict callback: opts.category = "summarize".
    • :memory summarize: opts.category = "memory_summarize".
    • For broker.chat callers (non-streaming): capture the new second return value and feed to ctx:add_usage.
    • Smoke: send one cloud prompt, observe ctx.usage_totals grows.
  4. safety.lua — opts.category for Norris + probe.

    • safety.norris_step's broker.chat_stream call: pass opts.category = "norris"; the helpers.on_usage callback (added to the helpers table by repl.lua) routes back to ctx:add_usage. OR — simpler — safety.lua wraps on_delta itself with a "usage"-kind branch that calls helpers.on_usage.
    • safety.is_destructive's llm_probe broker.chat call: pass opts.category = "probe"; capture the (text, usage) return and forward via opts.on_usage callback (added to is_destructive opts).
    • Smoke: a Norris session shows both "norris" and "probe" category entries in :cost detail.
  5. repl.lua — :cost meta + warn-threshold + HELP.

    • :cost (summary), :cost detail (per-model+category breakdown), :cost reset (zero totals + clear cost_warn_fired).
    • After every ctx:add_usage call (centralized in a helper if possible), check cfg.cost.warn_at_dollars / warn_at_tokens; emit one-shot status if crossed AND cost_warn_fired is false.
    • HELP gains 3 lines for :cost.
    • Smoke: :cost shows totals; :cost detail breaks down; warn fires once when threshold crossed; :cost reset re-arms.
  6. config.lua example block + docs/PHASE7.md status bump.

    • Commented-out cost = { warn_at_dollars = 0.50, warn_at_tokens = 100000 } block in config.lua.
    • PHASE7.md status header → Implement (matches Phase 5/6 cadence — manifest tracks implementation state).
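For commit 4, the "safety.lua wraps on_delta itself" variant could be sketched as below. Only the helpers.on_usage callback shape comes from the plan; the function name norris_on_delta, the argument order, and the choice to consume the usage event rather than also forward it are assumptions for illustration.

```lua
-- Hypothetical wrapper inside safety.lua: routes "usage" to the
-- caller-supplied helpers.on_usage, so safety.lua needs no new require().
local function norris_on_delta(on_delta, helpers, model_name)
    return function(kind, payload)
        if kind == "usage" then
            if helpers and helpers.on_usage then
                helpers.on_usage(model_name, "norris", payload)
            end
            return  -- assumption: usage is consumed here, not forwarded
        end
        return on_delta(kind, payload)
    end
end

-- What repl.lua might put in the helpers table (stand-in for ctx:add_usage).
local credited = {}
local helpers = {
    on_usage = function(model, category, usage)
        credited[#credited + 1] =
            { model = model, category = category, usage = usage }
    end,
}

local texts = {}
local on_delta = norris_on_delta(function(kind, payload)
    texts[#texts + 1] = payload
end, helpers, "cloud")

on_delta("text", "ls -la")
on_delta("usage", { prompt_tokens = 150, completion_tokens = 30, cost = 0.0042 })
```

With helpers.on_usage absent the wrapper degrades to a silent skip, which keeps existing Norris callers working unchanged.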

Risk index per commit

| Commit | Risk | Mitigation |
|---|---|---|
| 1 (broker) | `build_request` signature change breaks all existing callers | All callers of chat_stream/chat use opts already; we move tools/max_tokens INTO opts. A temporary positional fallback (`opts.tools = old_tools` if positional was used) is unnecessary because every caller already passes an opts table |
| 1 (broker) | `M.chat` second return value confuses callers that do `local r = broker.chat(...)` discarding the second | Lua doesn't error on dropped return values; backward-compat preserved automatically |
| 2 (context) | `usage_totals` nil on old ctx serializations | Defensive `self.usage_totals = self.usage_totals or {}` in add_usage; no migration needed |
| 3 (repl wires) | Forgetting one call site = silent under-count | Lint by grep for `broker.chat\(` and `broker.chat_stream\(` after the wire commit; ensure each is tagged |
| 4 (safety wires) | safety.lua must NOT introduce a new module dependency (no `require("secrets")`-style import) | Use helpers.on_usage callback convention (same shape as #52's scrub_msgs); no module dep |
| 5 (:cost + warn) | warn fires multiple times when threshold is much exceeded by one call | `cost_warn_fired` one-shot flag; explicit `:cost reset` to re-arm |
| 6 (config + status) | none | |

Tests + smoke per commit

Each commit:

  • Pass luajit test_safety.lua (87/87) and luajit test_router_model.lua (31/31)
  • Load cleanly via luajit -e 'package.path=...; require("repl"); print("ok")'
  • Pass a per-feature smoke (described in each row above)

Things deliberately NOT split

  • broker.chat backward-compat shim: Lua's multiple-return-values semantics handle it automatically (an existing local r = broker.chat(...) drops the new usage value).
  • Per-category sub-tables — flat model -> category -> counters is simple enough; nesting deeper for e.g. timestamps is v2.
  • Cross-session persistence — explicitly Q-C2 deferred to phase 8.
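The multiple-return behavior the first bullet leans on is standard Lua and easy to confirm in isolation. In the sketch below, chat is a hypothetical stand-in for broker.chat after commit 1.

```lua
-- Hypothetical stand-in for broker.chat returning (text, usage).
local function chat()
    return "hello", { prompt_tokens = 15, completion_tokens = 3, total_tokens = 18 }
end

local r = chat()               -- old-style caller: extra value is dropped
local text, usage = chat()     -- new-style caller: captures both
local n = select("#", chat())  -- number of values actually returned

-- Caveat: a call in the LAST position of a table constructor or argument
-- list expands to all of its results.
local packed = { chat() }
-- Wrapping the call in parentheses truncates it to one value.
local truncated = { (chat()) }
```

One thing the lint pass could also look for: call sites that pass broker.chat(...) directly as the last argument to another function would start forwarding the usage table too, and an extra pair of parentheses restores the old single-value behavior.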

Open at plan-time (resolve at implement)

  • Whether safety.is_destructive's opts should carry on_usage callback explicitly OR thread through cfg.helpers (the latter matches the Norris helpers convention but is more coupling). Decide at commit 4. Default to explicit opts.on_usage for minimum surface.
  • Whether to emit a [aish] usage: model=X prompt=N completion=M cost=$X status line PER TURN (verbose mode) or only via :cost on demand. v1 = on demand only; verbose mode is a follow-up nice-to-have.