Sonnet-reviewed (per the reviews-use-sonnet feedback memory).
BLOCKERs (RESOLVED in-place):
R1. M.chat would silently return (text, nil) for ALL non-streaming
callers — 4 of 5 categories (summarize/delegate/memory_summarize/
probe) flow through broker.chat, NOT chat_stream. §4 now shows
the explicit M.chat update that captures kind=="usage" alongside
"text" and returns (text, usage).
R2. call_broker fallback retry would credit usage to the wrong model
name. Fix: broker emits payload.model = model_cfg.model (which IS
the fallback's name when called with fb_cfg — chat_stream's
upvar). Wrapper keys by payload.model, NOT outer model_name. §4
+ §13 commit 3 reflect.
R3. build_request has TWO internal callers inside broker.lua itself,
not just the public surface. Plan §13 commit 1 risk row now
spells this out explicitly so the implementer doesn't read "every
caller already passes opts" as "external-only".
CONCERNs (FOLDED):
R4. Single cost_warn_fired flag covers two thresholds — first-to-fire
suppresses the other. Split into ctx.cost_warn_state = { dollars
= false, tokens = false }; :cost reset clears both. §7 + §13.
R5. Warn-check centralization — single _record_usage helper in
repl.lua wraps ctx:add_usage AND does threshold check. safety.lua
routes via helpers.on_usage / opts.on_usage callbacks. context.lua
stays decoupled from renderer.
R6. Preserve nil-vs-0 cost distinction. Accumulator slot gains
`is_local = true` (sticky) when ANY recorded usage had cost==nil.
`:cost detail` annotation comes from is_local flag, not a
fragile cost==0 heuristic.
R7. :cost detail sort needs 3-level deterministic key:
(cost desc, model asc, category asc) — table.sort is unstable.
R8. call_broker fallback passes opts.include_usage unchanged.
Documented as known assumption (B1 confirms both backends
accept; future-broken fallback can pass include_usage=false).
R9. :resume does NOT restore historical usage_totals. Per-turn usage
IS in session JSONL for scripting; cross-session aggregation is
Q-C2 deferred. Documented in §8.
R10. $%.4f loses sub-cent precision (cloud cost 0.000028 -> $0.0000).
Widened to $%.6f in §6 + §7 warn message format.
NITs (APPLIED):
N1. §4 pseudocode comment notes `if doc.usage` branch is independent
of choice branch (handles both B2 emission shapes).
N2. §2 stale "B7" reference corrected to B3.
N3. §13 commit 3 row gains explicit dependency note on commit 1's R1.
N4. §13 commit 4 spells out llm_probe -> llm_second_opinion ->
M.is_destructive signature chain widening.
N5. §3 + §13 commit 6 — PHASE0 §11 amendment already in tree
(3bad07b); commit 6 must NOT re-apply.
PHASE7.md now 803 lines (was 528 after plan). +275/-57. Ready for
implementation phase pending user gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
aish — Phase 7 Manifest
Project: aish — AI-augmented conversational shell
Document: Phase 7 Requirements, Architecture & Design Decisions
Status: Plan + review fold-in (tree at 0f14dc1)
Date: 2026-05-16
Review findings (independent Sonnet agent, 2026-05-16) — 3 BLOCKERs resolved in-place, 6 CONCERNs folded, 5 NITs applied:
R1 (BLOCKER, RESOLVED). M.chat would silently return (text, nil)
for ALL non-streaming callers. M.chat's internal on_delta only
captures kind == "text". Without explicit handling of
kind == "usage", four out of five categories that go through
broker.chat (summarize / delegate / memory_summarize / probe)
would report zero usage even after a cloud round-trip. Fix
folded into §4 + §13 commit 1: M.chat's on_delta also captures
the usage payload and returns it as the second value.
R2 (BLOCKER, RESOLVED). call_broker fallback retry — usage
payload's model field credits the WRONG model name. The
wrapped on_delta in call_broker is closed over the PRIMARY's
name; if the wrapped function uses an outer-scope model_name
variable to key the accumulator, the fallback's usage gets
misattributed. Resolution: the broker emits payload.model = model_cfg.model (which IS the fallback's model when called with
fb_cfg — chat_stream's local upvar). The wrapper keys by
payload.model, NOT by the outer model_name. Documented in
§4 emission code + §13 commit 3 (wrapped on_delta uses
payload.model for accumulator keying).
R3 (BLOCKER, RESOLVED — promoted to docs). build_request has
TWO internal callers inside broker.lua itself, not just the
public surface. Migration is contained but both internal sites
must be updated in commit 1. Plan §13 commit 1 risk row updated
to call this out explicitly so the implementer doesn't read
"every caller already passes opts" as "only external callers
need touching".
R4 (CONCERN, FOLDED). Single cost_warn_fired flag for two
thresholds is broken. When both warn_at_dollars AND
warn_at_tokens are configured, the first-to-fire suppresses the
other. Fix: ctx.cost_warn_fired becomes ctx.cost_warn_state = { dollars = false, tokens = false }. Each threshold has its
own flag; :cost reset clears both. §7 pseudocode updated.
R5 (CONCERN, FOLDED). Warn-check centralization decided: use a
single _record_usage(model, category, usage) helper inside
repl.lua that wraps ctx:add_usage AND does the threshold check
AND calls renderer.status when crossed. context.lua stays
decoupled from renderer. safety.lua call sites get
helpers.on_usage = _record_usage in the helpers table; probe
callsite gets opts.on_usage = _record_usage. Single chokepoint
for the warn check. §3 + §7 + §13 commits 3-5 reflect.
R6 (CONCERN, FOLDED). nil vs 0 cost distinction must be
preserved at the accumulator level. Local-model $0 (no cost
field) vs cloud-call-that-happens-to-cost-zero need to be
distinguishable for :cost detail annotation. Fix: accumulator
slot gains is_local = true when ANY recorded usage for that
slot had cost == nil. Cloud calls with cost = 0 (rare) stay
annotated as cloud. §5 pseudocode + §6 annotation logic updated.
R7 (CONCERN, FOLDED). :cost detail sort needs three-level key
for determinism. Lua's table.sort is unstable; equal-cost
rows would have arbitrary order. Fix: sort key is
(cost desc, model asc, category asc). §6 updated.
R8 (CONCERN, FOLDED). call_broker fallback passes opts.include_usage
unchanged. Documented as a known assumption (B1 confirms both
backends accept; if a future fallback host rejects, the call-site
can pass include_usage = false explicitly). §10 risk row added.
R9 (CONCERN, FOLDED). :resume does NOT restore historical
usage_totals. Per-turn usage IS in the session JSONL but
:resume reloads turns for conversation continuity only; the
accumulator stays empty. Documented in §8 surface notes; users
who want cross-session totals can script the jsonl or wait for
the deferred Q-C2 follow-up.
R10 (CONCERN, FOLDED). $%.4f loses sub-cent precision. A
0.000028 cloud cost displays as $0.0000 — indistinguishable
from $0 local. Fix: format strings widened to $%.6f in
§6 (and the warn message in §7). 6 decimal places accommodates
the smallest observed real cost.
R-N1..N5 (NITs, APPLIED):
N1. §4 extraction pseudocode gains a comment noting the
if doc.usage branch is INDEPENDENT of the choice branch and
must be checked regardless of choice nil-ness (handles both
B2 emission shapes).
N2. §2 "Cost extraction" row referenced stale "B7"; corrected to B3.
N3. §13 commit 3 row gains an explicit dependency note: commit 3's
"capture the new second return value" requires commit 1's M.chat
fix from R1 to ship first.
N4. §3 safety.lua row + §13 commit 4 row spell out the signature
chain: llm_probe → llm_second_opinion → M.is_destructive
all widen to thread opts.on_usage through.
N5. §3 PHASE0.md row + §13 commit 6 row — the PHASE0 §11 amendment
is ALREADY in tree (committed at 3bad07b with the formulate
doc). Commit 6 should NOT re-apply; only adds config.lua block
+ bumps PHASE7 status header.
Analyze findings (2026-05-16):
A1. broker.chat_stream surface is clean for the extension. The
existing on_event(data) closure inside M.chat_stream already
parses doc.error / doc.choices / delta / tool_calls — adding
if doc.usage then final_usage = ... end is one block. Emission
happens via a closure-local final_usage that the post-loop code
in chat_stream reads and calls on_delta("usage", final_usage)
on. build_request needs a minor extension OR (cleaner) chat_stream
inserts stream_options.include_usage = true into the body table
BEFORE json.encode; but the encode currently happens inside build_request.
Cleanest: extend build_request(model_cfg, messages, stream, opts)
so it can read opts.include_usage. Phase 7 simplifies the
signature in passing.
A2. 7 caller sites identified for opts.category threading:
| Site | Category |
|---|---|
| `safety.lua:191` (LLM probe) | `"probe"` |
| `safety.lua:354` (norris main) | `"norris"` |
| `repl.lua:326` (summarize-on-evict) | `"summarize"` |
| `repl.lua:685` (call_broker wrapper, used by ask_ai) | `"main"` |
| `repl.lua:1104` (DELEGATE: handler) | `"delegate"` |
| `repl.lua:1587` (:memory summarize) | `"memory_summarize"` |
| `repl.lua:2156` (:delegate meta) | `"delegate"` |
All callers pass `opts` already; adding a `category` field is
additive and backward-compatible (default to `"main"` when absent).
A3. build_request signature simplification. Today it takes
(model_cfg, messages, stream, tools, max_tokens) — five positional
args. With Phase 7 needing include_usage AND stream_options,
positional growth gets unwieldy. Resolution: widen to
(model_cfg, messages, stream, opts) where opts carries
{tools, max_tokens, include_usage, stream_options}. Callers in
M.chat_stream and M.chat pass their existing opts table through.
This is a refactor but contained inside broker.lua.
A4. Q-C3 RESOLVED: free-form categories. The closed-set vs free-form
debate resolved in favor of free-form per the helpers/skills
convention already in place (Phase 6 :tree / :diff metas don't
validate sub-args either). :cost detail will show whatever
categories appear — small + documented closed set in practice
(7 entries from A2), no surprise.
A5. Q-C5 RESOLVED: warn fires on the call that crossed. The crossed
call's usage IS in the accumulator at the moment we check (we
check AFTER add_usage). Firing on the NEXT call would mean a
delay of one full broker round-trip before the user sees the
warn — defeats the purpose. Just emit-on-cross.
A6. Q-C6 RESOLVED: :reset does NOT clear cost_warn_fired.
Parity with usage_totals itself (per the §2 decision row); the
user reset their conversation, not their cost meter. The flag
AND the totals are reset only by the explicit :cost reset verb.
A7. Norris call-graph rewires (existing safety.lua:354 path): with
issue #52 wired (commit 955bd82), the Norris broker call now
passes helpers.scrub_msgs / helpers.streaming_rehydrator. The
on_delta wrapping pattern means I need to be careful that the new
("usage", payload) kind also flows through any wrapper. Since
secrets streaming_rehydrator only matches on kind == "text", the
"usage" kind passes through unchanged. No new entanglement.
A8. ctx.usage_totals survives :reset per R8 — same invariant
as memory_items (Phase 4) and project (Phase 6). Documented in
§5 of the manifest; reinforces the "ambient context survives
conversation reset" rule.
A9. Session JSONL serialization — assistant turn dict gets an
optional usage field. history.lua log_turn currently calls
json.encode(turn) opaquely; the dkjson serializer handles nested
tables. No code change needed; the new field flows through
automatically when the assistant turn carries one.
A10. Q-C1 PARTIAL: local providers may not emit usage. The
formulate-time assumption was "treat absence as zero-cost / unknown".
A real probe against qwen-coder-7b-snappy-8k is a baseline
action — see B-probes below. The implementation will be defensive:
if doc.usage never appears in the stream, no "usage" event is
emitted, and the accumulator is unchanged for that turn. :cost
output naturally reflects "0 calls counted for local model" if
that's the case.
A11. Q-C4 deferred to baseline: actual stream_options forwarding
by the hossenfelder proxy must be probed against a live broker.
If the proxy strips the option, we get no usage events even
for cloud calls. Baseline action.
PHASE0 is the locked substrate; PHASE1-6 are layered on top. This manifest specifies what Phase 7 adds — cost / usage observability: the ability to know, mid-session, how many tokens you've spent and how much money the paid-cloud calls have cost.
PHASE0 §11 originally listed phases only through 6; this commit amends §11 to add Phase 7.
1. Scope of Phase 7
Four pillars:
- Usage capture in broker: broker.chat_stream extracts the provider's usage block (and cost where present) from the response stream. Surfaces it to the caller via a new on_delta("usage", ...) kind. The existing broker.chat buffering wrapper exposes it as a second return value (text, usage). Backward-compatible: callers that don't handle the new kind / second value simply ignore it.
- Per-session accumulator on ctx: running totals per-model AND per-call-category (main / delegate / summarize / probe) accumulate on ctx.usage_totals. No persistence across sessions in v1 (Q-C2 defers cross-session); the session-log JSONL files DO carry per-turn usage so historical analysis is possible after the fact.
- :cost meta: a :cost reporter that shows the current session totals, with optional :cost detail for the per-model + per-category breakdown. Zero broker calls (purely local read of ctx.usage_totals).
- Optional warning thresholds: cfg.cost.warn_at_dollars and cfg.cost.warn_at_tokens emit a status the first time the running total crosses the configured threshold. Default off (no warnings without config). Useful when cloud presets are configured and you want a "you've spent $1 this session" nudge before runaway cost.
Phase 7 is done when:
- broker.chat_stream exposes usage via the new on_delta("usage", ...) callback kind; broker.chat returns (text, usage). Backward compat preserved (no existing caller breaks).
- After a session with mixed local + cloud calls, :cost prints a total like:

      [aish] session usage: 24 turns, prompt=12,450 / completion=3,210 tokens
             cost=$0.0234 (cloud only; local: 0)

- :cost detail breaks down by model + category:

      fast  main:     14 turns, 8200/2100 tokens
      cloud main:      8 turns, 3850/980 tokens, $0.0180
      cloud delegate:  1 turn, 250/80 tokens, $0.0012
      cloud probe:     1 turn, 150/30 tokens, $0.0042

- Session JSONL gains a usage field on assistant turns (when the broker returned one).
- With cfg.cost.warn_at_dollars = 0.50 set, crossing $0.50 cumulative emits exactly one status line.
- Existing configs without cfg.cost behave exactly like Phase 6 (Phase 6 regression coverage).
2. Technology Decisions (delta from Phase 6)
| Decision | Choice | Rationale |
|---|---|---|
| Where to extract usage | In broker.chat_stream event loop, looking at each SSE event's usage field on the final chunk | The OpenAI streaming spec puts usage on the FINAL chunk when stream_options: { include_usage: true } is in the request body. The Anthropic-via-Bedrock path through OpenRouter respects this; need to verify (baseline). |
| New on_delta kind | on_delta("usage", { prompt_tokens, completion_tokens, total_tokens, cost?, model?, native_finish_reason? }) | Mirrors the existing ("text", chunk) / ("tool_call", call) shape. Callers ignore unknown kinds; backward-compatible. |
| Where to enable usage on the wire | opts.include_usage = true (default true) sets stream_options.include_usage = true in the outbound request body | Off-switch for hosts that reject stream_options. Defaults on; baseline probe confirms current broker tolerates it. (A3: build_request signature widens to take an opts table; positional growth was getting unwieldy.) |
| Accumulator location | ctx.usage_totals[model_name][category] table | ctx is per-conversation; matches the :reset-survives-or-not rules already in place. |
| Categories | "main" (ask_ai), "delegate", "summarize", "memory_summarize", "probe", "norris" | One-tag-per-call-site. Tagged at the caller site (caller passes opts.category to broker.chat_stream). |
| Cost extraction | usage.cost (OpenRouter convention; dollars as a number). For Anthropic/Bedrock the cost arrives in dollars on usage.cost. For pure local llama.cpp: no cost field, so record as nil (R6: preserves the local-vs-cloud-zero distinction in the accumulator). | Single field name across observed providers per baseline B3. |
| Cost precision | Store as number (Lua double = 53-bit mantissa, ~15 decimal digits; plenty for sub-cent precision) | No floating-point cumulative-error concerns at this scale. |
| Warning trigger | First crossing of either threshold emits a single status: [aish] session cost $X.XXXX has crossed warn_at_dollars=$Y.YYYY. Crossed-flag stored on ctx; reset only on session end / :cost reset. | One-shot to avoid spamming. |
| :reset interaction | :reset does NOT clear ctx.usage_totals (parity with memory_items/project): the user reset their conversation, not their cost tracking. :cost reset is the explicit reset verb. | Matches R8 invariant from Phase 6. |
| Session-log persistence | Assistant turn entries gain an optional usage field when broker returned one. history.lua log_turn writes it through verbatim. | Per-turn granularity preserved for after-the-fact analysis. No new file. |
3. Module Changes
| File | State after Phase 6 | Phase 7 changes |
|---|---|---|
| broker.lua | chat_stream(cfg, msgs, on_delta, opts) with text + tool_call kinds; chat returns text | Extract usage from final SSE chunk; emit on_delta("usage", payload); chat returns (text, usage). New opts.include_usage (default true); new opts.category (passed through as a tag in the usage payload). |
| context.lua | system prompt + turns + memory + project + summary | Add self.usage_totals (table) + self.cost_warn_fired (bool). New helpers: Context:add_usage(model, category, usage), Context:total_cost(), Context:total_tokens(). Context:reset does NOT clear usage_totals (parity with memory_items / project per R8). |
| repl.lua | ask_ai + delegate + summarize callbacks + Norris helpers | Wire opts.category at each broker call site (main / delegate / summarize / memory_summarize). Wire on_delta("usage", ...) -> ctx:add_usage(...). New :cost and :cost detail / :cost reset metas. Cost-warn check after each add_usage call. |
| safety.lua | norris_step + is_destructive | Pass opts.category = "norris" (for the main chat_stream call) and "probe" (for the is_destructive LLM probe). Surfaces probe-cost in the breakdown; useful since safety.llm_model = "cloud" is the recommended setting. |
| history.lua | session.log_turn appends JSONL entries | log_turn already takes turn opaquely; assistant turns will carry usage if present and it'll serialize via dkjson. No code change unless filter desired. |
| config.lua | example blocks for mcp/safety/memory/routing/secrets/hooks/project | Add commented-out cost = { warn_at_dollars, warn_at_tokens } block. |
| docs/PHASE0.md | §11 lists phases 0-6 | Amendment landed at 3bad07b (formulate commit). N5: commit 6 does NOT re-apply. |
No new module files.
4. Pillar 1 — Usage capture in broker
SSE shape (provider-by-provider — confirm in baseline)
For OpenAI-compatible streams with stream_options: { include_usage: true }:
data: {"id":"...","choices":[{"index":0,"delta":{"content":"Hi"}, ...}]}
data: {"id":"...","choices":[{"index":0,"delta":{}, "finish_reason":"stop"}]}
data: {"id":"...","choices":[],"usage":{"prompt_tokens":15,"completion_tokens":3,"total_tokens":18,"cost":0.00004,"cost_details":{...}}}
data: [DONE]
The final usage event arrives AFTER finish_reason but BEFORE [DONE].
choices is empty [] on the usage event.
For non-streaming chat: usage is in the response body at the top level.
broker.chat is a wrapper around chat_stream, so it inherits the on_delta
path.
For local llama.cpp via hossenfelder: usage may or may not be present depending on the proxy's version. Treat absence as zero-cost / unknown.
Extraction algorithm
    local final_usage = nil
    local function on_event(data)
      ...
      -- N1: this branch is INDEPENDENT of the choice branch below;
      -- check unconditionally. Per B2, local emits usage on a
      -- choices=[] chunk (choice nil); cloud emits on a non-empty
      -- choices chunk (with finish_reason). Both shapes funnel here.
      if doc.usage then
        -- R2: payload.model is ALWAYS the caller-stable model_cfg.model
        -- (chat_stream's local upvar). When called via call_broker's
        -- fallback retry, this naturally reflects the fallback's
        -- model name, so wrapper callers can key by payload.model
        -- without tracking primary-vs-fallback themselves.
        final_usage = {
          prompt_tokens     = doc.usage.prompt_tokens or 0,
          completion_tokens = doc.usage.completion_tokens or 0,
          total_tokens      = doc.usage.total_tokens or 0,
          -- R6: keep nil-vs-0 distinction at this layer; the
          -- accumulator decides how to tag local-vs-cloud-zero.
          cost              = doc.usage.cost,        -- nil for local
          model             = model_cfg.model,       -- caller-stable per B4
          category          = opts.category or "main",
        }
        -- Don't emit yet: the [DONE] event marks stream end; emit
        -- once we exit the curl.post_sse loop so the caller sees
        -- usage as the LAST event in the stream order.
      end
      -- ... existing text + tool_call handling (unchanged) ...
    end

    -- After curl.post_sse returns (stream complete). R3-related:
    -- only emit on successful streams; transport / api errors skip
    -- the usage event (caller sees the error path and accumulator
    -- stays unchanged).
    if api_err then return nil, "api: " .. api_err end
    if not ok then return nil, "transport: " .. tostring(err) end
    if final_usage then on_delta("usage", final_usage) end
    return true
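To make the N1 note concrete, the extraction loop can be exercised against both B2 shapes without a live broker. This is a minimal sketch, assuming the SSE documents have already been JSON-decoded into Lua tables; `collect` and the two sample streams are hypothetical stand-ins, not the real chat_stream code.

```lua
-- Hypothetical stand-in for chat_stream's event loop. `events` is a list of
-- SSE documents that have already been JSON-decoded into Lua tables.
local function collect(events, model_name, category)
  local parts, final_usage = {}, nil
  for _, doc in ipairs(events) do
    -- Usage branch: checked on every event, whether or not a choice exists.
    if doc.usage then
      final_usage = {
        prompt_tokens     = doc.usage.prompt_tokens or 0,
        completion_tokens = doc.usage.completion_tokens or 0,
        total_tokens      = doc.usage.total_tokens or 0,
        cost              = doc.usage.cost, -- stays nil when the provider omits it
        model             = model_name,
        category          = category or "main",
      }
    end
    -- Text branch: only fires when the event carries a content delta.
    local choice = doc.choices and doc.choices[1]
    if choice and choice.delta and choice.delta.content then
      parts[#parts + 1] = choice.delta.content
    end
  end
  return table.concat(parts), final_usage
end

-- Shape 1 (local, per B2): usage arrives on a chunk with empty choices.
local local_stream = {
  { choices = { { delta = { content = "Hi" } } } },
  { choices = {},
    usage = { prompt_tokens = 15, completion_tokens = 3, total_tokens = 18 } },
}
-- Shape 2 (cloud, per B2): usage rides on the finish_reason chunk.
local cloud_stream = {
  { choices = { { delta = { content = "Hi" } } } },
  { choices = { { delta = {}, finish_reason = "stop" } },
    usage = { prompt_tokens = 15, completion_tokens = 3, total_tokens = 18,
              cost = 0.00004 } },
}

local text_l, usage_l = collect(local_stream, "fast", "main")
local text_c, usage_c = collect(cloud_stream, "cloud", "probe")
```

Both streams yield the same text and token counts; only the cloud payload carries a non-nil cost, which is the distinction R6 relies on downstream.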
M.chat capture (R1 — BLOCKER fix)
M.chat is the non-streaming buffering wrapper. Its existing on_delta
only captured text. Under Phase 7 it MUST also capture the usage
payload — otherwise EVERY non-streaming caller (summarize, delegate,
memory_summarize, probe — 4 of 5 categories) silently reports zero.
    function M.chat(model_cfg, messages, opts)
      local parts = {}
      local captured_usage  -- R1: required so M.chat returns (text, usage)
      local ok, err = M.chat_stream(model_cfg, messages,
        function(kind, payload)
          if kind == "text" then parts[#parts + 1] = payload
          elseif kind == "usage" then captured_usage = payload
          end
        end, opts)
      if not ok then return nil, err end
      return table.concat(parts), captured_usage
    end
Existing callers that do local r = broker.chat(...) automatically
drop the second value (Lua semantics). Callers that want usage do
local r, u = broker.chat(...).
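The two-value contract can be checked against a stubbed stream. In this sketch `fake_chat_stream` is a hypothetical test double and `chat` takes the stream function as a parameter purely so it runs standalone; only the buffering logic mirrors the M.chat shown above.

```lua
-- Hypothetical test double: emits two text chunks, then one usage event.
local function fake_chat_stream(model_cfg, messages, on_delta, opts)
  on_delta("text", "Hel")
  on_delta("text", "lo")
  on_delta("usage", { prompt_tokens = 15, completion_tokens = 3,
                      total_tokens = 18, model = model_cfg.model })
  return true
end

-- Same buffering logic as M.chat, parameterised on the stream function
-- so it can run without a live broker.
local function chat(stream_fn, model_cfg, messages, opts)
  local parts, captured_usage = {}, nil
  local ok, err = stream_fn(model_cfg, messages, function(kind, payload)
    if kind == "text" then parts[#parts + 1] = payload
    elseif kind == "usage" then captured_usage = payload
    end
  end, opts)
  if not ok then return nil, err end
  return table.concat(parts), captured_usage
end

local cfg = { model = "cloud" }
local legacy = chat(fake_chat_stream, cfg, {})        -- second value dropped
local text, usage = chat(fake_chat_stream, cfg, {})   -- second value kept
```

One consequence of the signature: on failure the second value is the error string, so a caller should check the first value before treating the second as a usage table.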
Outbound include_usage
    local body_table = { model = ..., messages = ..., stream = true }
    if opts.include_usage ~= false then
      body_table.stream_options = { include_usage = true }
    end
Risk: some providers reject unrecognized fields. Baseline check; if any
host throws on stream_options, the per-model opt-out is one line.
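A sketch of how the widened opts table from A3 might feed the request body. `build_body` is a hypothetical name, and it returns the body table rather than encoded JSON so the example has no dependencies. Gating stream_options on `stream` is an assumption of this sketch (the OpenAI API documents the option for streaming requests only); the snippet above always streams.

```lua
-- Hypothetical helper: assembles the request body table that build_request
-- would then JSON-encode. Encoding is left out so the sketch has no deps.
local function build_body(model_cfg, messages, stream, opts)
  opts = opts or {}
  local body = {
    model      = model_cfg.model,
    messages   = messages,
    stream     = stream and true or false,
    tools      = opts.tools,       -- nil when the caller passes none
    max_tokens = opts.max_tokens,  -- nil when the caller passes none
  }
  -- Default-on: only an explicit `false` suppresses the option.
  -- Assumption: the option is only attached to streaming requests.
  if stream and opts.include_usage ~= false then
    body.stream_options = { include_usage = true }
  end
  return body
end

local default_body = build_body({ model = "cloud" }, {}, true)
local opted_out    = build_body({ model = "cloud" }, {}, true,
                                { include_usage = false })
local capped       = build_body({ model = "cloud" }, {}, true,
                                { max_tokens = 64 })
```

Comparing against `false` rather than testing truthiness is what keeps the default on when a caller passes an opts table without the field.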
Category tagging
opts.category is a string set by the caller. broker echoes it into the
emitted usage payload so the accumulator knows what to credit. Default
category if absent: "main".
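The R2 keying rule and the category echo can be illustrated together. In this sketch `fake_chat_stream`, `call_with_fallback`, and the model names are hypothetical; the real call_broker has more responsibilities, but the point is that the wrapper credits whatever the payload says.

```lua
-- Hypothetical broker double: the primary fails, the fallback succeeds.
-- As in §4, the usage payload carries model_cfg.model and opts.category.
local function fake_chat_stream(model_cfg, messages, on_delta, opts)
  if model_cfg.model == "primary" then return nil, "transport: refused" end
  on_delta("text", "ok")
  on_delta("usage", { prompt_tokens = 10, completion_tokens = 2,
                      total_tokens = 12, model = model_cfg.model,
                      category = opts.category or "main" })
  return true
end

local credited = {}  -- records which model/category pairs received usage

-- Hypothetical wrapper in the spirit of call_broker: retry on the fallback,
-- and key by the payload, never by the outer primary name.
local function call_with_fallback(primary_cfg, fb_cfg, messages, on_delta, opts)
  local function wrapped(kind, payload)
    if kind == "usage" then
      credited[#credited + 1] = payload.model .. "/" .. payload.category
    else
      on_delta(kind, payload)
    end
  end
  local ok, err = fake_chat_stream(primary_cfg, messages, wrapped, opts)
  if not ok and fb_cfg then
    ok, err = fake_chat_stream(fb_cfg, messages, wrapped, opts)
  end
  return ok, err
end

local chunks = {}
local ok = call_with_fallback({ model = "primary" }, { model = "fallback" }, {},
  function(kind, payload) chunks[#chunks + 1] = payload end,
  { category = "delegate" })
```

Because the failed primary never emits a usage event, the only credited entry belongs to the fallback, under the category the caller supplied.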
5. Pillar 2 — Accumulator on ctx
Shape
    ctx.usage_totals = {
      -- [model_name] = { [category] = { prompt = N, completion = N,
      --                                 calls = N, cost = N } }
      fast = {
        main = { prompt = 1234, completion = 567, calls = 14, cost = 0 },
      },
      cloud = {
        main     = { prompt = 3850, completion = 980, calls = 8, cost = 0.0180 },
        delegate = { prompt = 250,  completion = 80,  calls = 1, cost = 0.0012 },
        probe    = { prompt = 150,  completion = 30,  calls = 1, cost = 0.0042 },
      },
    }
    -- R4: one flag per threshold (replaces the single cost_warn_fired bool)
    ctx.cost_warn_state = { dollars = false, tokens = false }
add_usage
    function Context:add_usage(model, category, u)
      model = model or "?"
      category = category or "main"
      self.usage_totals = self.usage_totals or {}
      local m = self.usage_totals[model] or {}
      local c = m[category] or {
        prompt = 0, completion = 0, calls = 0, cost = 0,
        is_local = false,  -- R6: cloud unless any usage came w/o cost
      }
      c.prompt     = c.prompt + (u.prompt_tokens or 0)
      c.completion = c.completion + (u.completion_tokens or 0)
      c.calls      = c.calls + 1
      -- R6: preserve nil-vs-0 distinction. A `nil` cost means the
      -- provider doesn't emit cost (i.e., local llama.cpp). Sticky:
      -- once a slot has seen any nil-cost call, it's flagged is_local.
      if u.cost == nil then
        c.is_local = true
      else
        c.cost = c.cost + u.cost
      end
      m[category] = c
      self.usage_totals[model] = m
    end

    function Context:total_cost()
      local total = 0
      for _, m in pairs(self.usage_totals or {}) do
        for _, c in pairs(m) do total = total + c.cost end
      end
      return total
    end

    function Context:total_tokens()
      local p, comp = 0, 0
      for _, m in pairs(self.usage_totals or {}) do
        for _, c in pairs(m) do
          p = p + c.prompt
          comp = comp + c.completion
        end
      end
      return p, comp
    end
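The R6 rule is easiest to see with a slot that receives a genuine zero cost next to a slot that receives no cost at all. This sketch uses a compact, hypothetical `add_usage` over a plain table instead of the Context method, with made-up token counts.

```lua
-- Compact stand-in for Context:add_usage operating on a plain table.
local function add_usage(totals, model, category, u)
  totals[model] = totals[model] or {}
  local c = totals[model][category]
      or { prompt = 0, completion = 0, calls = 0, cost = 0, is_local = false }
  c.prompt     = c.prompt + (u.prompt_tokens or 0)
  c.completion = c.completion + (u.completion_tokens or 0)
  c.calls      = c.calls + 1
  -- nil cost flags the slot as local; a numeric cost (even 0) accumulates.
  if u.cost == nil then c.is_local = true else c.cost = c.cost + u.cost end
  totals[model][category] = c
end

local totals = {}
-- Local call: the provider sent no cost field.
add_usage(totals, "fast", "main", { prompt_tokens = 100, completion_tokens = 20 })
-- Cloud call that happens to cost exactly zero.
add_usage(totals, "cloud", "main",
          { prompt_tokens = 50, completion_tokens = 10, cost = 0 })
-- Ordinary paid cloud call.
add_usage(totals, "cloud", "main",
          { prompt_tokens = 50, completion_tokens = 10, cost = 0.25 })
```

After these three calls the `fast` slot is flagged is_local with a cost of 0, while the `cloud` slot stays unflagged even though one of its calls was free.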
Reset semantics
Context:reset() deliberately does NOT clear usage_totals —
matches R8 invariant from Phase 6 (:reset clears turns,
pending_exec_output, summary; preserves memory_items, project,
and now usage_totals). The user reset their conversation, not their
cost meter. :cost reset is the explicit reset verb for the meter.
6. Pillar 3 — :cost meta
    :cost          summary line
    :cost detail   per-model + per-category breakdown
    :cost reset    zero out ctx.usage_totals + both cost_warn_state flags (R4)
Summary format (R10 — 6-decimal precision for sub-cent costs):
    [aish] session usage: 24 calls, prompt=12,450 / completion=3,210 tokens
           cost=$0.023400 (cloud only; local: 0)
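The precision problem behind R10 can be reproduced with string.format alone. The sketch below also includes `commas`, a hypothetical helper for the comma-grouped token counts in the summary line; it assumes non-negative integer counts.

```lua
-- R10 in miniature: the same cost under the old and new format strings.
local cost = 0.000028
local old_fmt = ("$%.4f"):format(cost)  -- sub-cent value rounds away
local new_fmt = ("$%.6f"):format(cost)  -- sub-cent value survives

-- Hypothetical helper: group digits in threes for display.
-- Assumes n is a non-negative integer.
local function commas(n)
  local s = ("%d"):format(n)
  -- Group from the right by reversing, then drop a leading comma if any.
  local grouped = s:reverse():gsub("(%d%d%d)", "%1,"):reverse()
  return (grouped:gsub("^,", ""))
end

local line = ("[aish] session usage: %d calls, prompt=%s / completion=%s tokens")
  :format(24, commas(12450), commas(3210))
```

Under `$%.4f` the 0.000028 cost is indistinguishable from a free local call, which is exactly the ambiguity the wider format removes.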
Detail format (R7 — sort key is (cost desc, model asc, category asc)
for deterministic ordering on equal-cost rows; R6 — annotation comes
from the slot's is_local flag, NOT a cost == 0 heuristic):
    [aish] session usage detail:
      cloud  main       8 calls, 3,850 / 980 tokens, $0.018000
      cloud  probe      1 call, 150 / 30 tokens, $0.004200
      cloud  delegate   1 call, 250 / 80 tokens, $0.001200
      fast   main      14 calls, 8,200 / 2,100 tokens, $0 (local)
Implementation: pure Lua iteration over ctx.usage_totals; no broker
calls. Sort flattens into a list, sorts via table.sort with explicit
3-level comparator: cost desc, model asc, category asc.
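A minimal sketch of the flatten-and-sort step, assuming the accumulator shape from §5. `sorted_rows` is a hypothetical name and the `alt` model exists only to exercise the model-name tiebreak.

```lua
-- Flatten usage_totals into rows, then sort with a three-level comparator.
local function sorted_rows(usage_totals)
  local rows = {}
  for model, cats in pairs(usage_totals) do
    for category, c in pairs(cats) do
      rows[#rows + 1] = { model = model, category = category,
                          cost = c.cost, calls = c.calls }
    end
  end
  table.sort(rows, function(a, b)
    if a.cost ~= b.cost then return a.cost > b.cost end      -- cost desc
    if a.model ~= b.model then return a.model < b.model end  -- model asc
    return a.category < b.category                           -- category asc
  end)
  return rows
end

local rows = sorted_rows({
  fast  = { main = { cost = 0, calls = 14 } },
  alt   = { main = { cost = 0, calls = 2 } },  -- ties with fast/main on cost
  cloud = {
    main     = { cost = 0.018,  calls = 8 },
    delegate = { cost = 0.0012, calls = 1 },
    probe    = { cost = 0.0042, calls = 1 },
  },
})

local order = {}
for i, r in ipairs(rows) do order[i] = r.model .. "/" .. r.category end
```

Since each (model, category) pair is unique, the comparator never reports two distinct rows as equal, so the output order does not depend on the iteration order of pairs or on table.sort's instability.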
7. Pillar 4 — Warning thresholds
Config:
    cost = {
      warn_at_dollars = 0.50,    -- emit once when cumulative cost crosses
      warn_at_tokens  = 100000,  -- emit once when cumulative tokens crosses
    }
R5 centralizes the check inside a single _record_usage(model, cat, u)
helper in repl.lua. This is the ONLY place that calls
ctx:add_usage; safety.lua call sites route through it via the
helpers.on_usage / opts.on_usage callback. Keeps context.lua
decoupled from renderer (no module-coupling violation).
R4: two independent flags (one per threshold) — first-to-fire must NOT suppress the other.
    -- repl.lua (sketch):
    local function _record_usage(model, category, u)
      ctx:add_usage(model, category, u)
      if not (config.cost) then return end
      ctx.cost_warn_state = ctx.cost_warn_state or { dollars = false, tokens = false }
      local cw = ctx.cost_warn_state
      if config.cost.warn_at_dollars and not cw.dollars then
        local cost = ctx:total_cost()
        if cost >= config.cost.warn_at_dollars then
          -- R10: 6-decimal format for sub-cent visibility
          renderer.status(("session cost $%.6f has crossed warn_at_dollars=$%.6f")
            :format(cost, config.cost.warn_at_dollars))
          cw.dollars = true
        end
      end
      if config.cost.warn_at_tokens and not cw.tokens then
        local p, c = ctx:total_tokens()
        if (p + c) >= config.cost.warn_at_tokens then
          renderer.status(("session tokens %d has crossed warn_at_tokens=%d")
            :format(p + c, config.cost.warn_at_tokens))
          cw.tokens = true
        end
      end
    end
One-shot per threshold per session. :cost reset clears both
totals AND both warn flags atomically.
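The one-shot and re-arm behaviour can be traced with a miniature of the dollar threshold. Everything here is a hypothetical stand-in: `ctx.total` replaces the accumulator, the `statuses` list replaces renderer.status, and `cost_reset` plays the role of the :cost reset verb.

```lua
-- Hypothetical miniature of the one-shot logic for a single threshold.
local ctx = { total = 0, cost_warn_state = { dollars = false, tokens = false } }
local statuses = {}           -- stands in for renderer.status output
local warn_at_dollars = 0.50

local function record_cost(cost)
  ctx.total = ctx.total + cost
  local cw = ctx.cost_warn_state
  -- Check AFTER accumulating, so the crossing call itself triggers (A5).
  if not cw.dollars and ctx.total >= warn_at_dollars then
    statuses[#statuses + 1] =
      ("session cost $%.6f has crossed warn_at_dollars=$%.6f")
        :format(ctx.total, warn_at_dollars)
    cw.dollars = true
  end
end

-- Clears the totals and both flags together.
local function cost_reset()
  ctx.total = 0
  ctx.cost_warn_state = { dollars = false, tokens = false }
end

record_cost(0.25)  -- below threshold: silent
record_cost(0.25)  -- reaches 0.50: warns on this call
record_cost(0.50)  -- already warned: silent
local before_reset = #statuses
cost_reset()
record_cost(0.75)  -- re-armed after the reset: warns again
```

Resetting the flags together with the totals matters: clearing only the totals would leave the threshold permanently silenced for the rest of the session.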
8. UX Surface Summary
| Meta | Behavior |
|---|---|
| :cost | One-line summary: calls / tokens / cost |
| :cost detail | Per-model + per-category breakdown |
| :cost reset | Zero out totals + clear warn-fired flag |

| Config | Default | Effect |
|---|---|---|
| cfg.cost.warn_at_dollars | nil | Status when cumulative cost first crosses this dollar amount |
| cfg.cost.warn_at_tokens | nil | Status when cumulative total tokens first crosses |
| (broker opts.include_usage) | true | Adds stream_options.include_usage = true to outbound request |
R9 boundary note: :resume <name> reloads turns for conversation
continuity but does NOT reconstruct ctx.usage_totals from the
per-turn usage fields stored in the session JSONL. After :resume,
the cost meter starts fresh from zero for the resumed session's live
calls. The historical usage IS in the JSONL for after-the-fact
scripting; cross-session aggregation is Q-C2 deferred work.
9. Out of Scope (Phase 7)
- Cross-session cost persistence: Q-C2 defers <history.dir>/cost.jsonl rollup; v1 is session-only. Per-turn usage IS in the session JSONL for after-the-fact aggregation if anyone wants to script it.
- Per-model rate limiting / cost caps that REFUSE the call: v1 only warns. A future phase could add a hard cap that aborts before the broker call.
- Pricing-table fallback for local models: if a local model doesn't emit usage.cost, we record 0. Estimating cost from token count + a static pricing table is a future polish (most users won't care about local "cost" anyway; local is free).
- Pretty token-bandwidth charts / sparklines: out of scope; the detail breakdown is text-only.
- Estimated cost for future turns: no preflight cost prediction.
- MCP tool-call usage: MCP servers don't expose token usage; broker calls invoked DURING MCP tool dispatch ARE captured (because they go through the same path), but the MCP tool call itself isn't.
10. Risks
| Risk | Mitigation |
|---|---|
| Some providers reject stream_options -> SSE errors at the top of the stream | opts.include_usage = false opt-out per call site; baseline-time probe of the actual hossenfelder broker behavior |
| OpenRouter cost field shape varies between providers (Bedrock vs. Baidu vs. Together vs. ...) | Capture usage.cost as-is (number); document that the same provider must be used for cross-call comparison |
| Local llama.cpp returns no cost -> displayed $0 could mislead user "is this REALLY free?" | :cost detail annotates local lines with (local) literal; summary says cost=$X (cloud only; local: 0) |
| ctx.usage_totals grows unboundedly with new model names mid-session | Bounded by #models in config × #categories, both small constants. No mitigation needed. |
| Warn threshold fires once and never again for a long-running session that crosses 2x / 10x the threshold | Acceptable for v1; user can :cost reset to re-arm. Future polish: warn at each Nx multiple. |
| R8: call_broker fallback retry passes opts.include_usage unchanged | Documented assumption: B1 confirmed both backends accept the flag. If a future fallback host rejects, the call-site that knows can pass opts.include_usage = false explicitly. |
11. Open Questions (Phase 7)
| # | Question | Impact | Resolution target |
|---|---|---|---|
| Q-C1 | Provider-without-usage handling | A10: defensive silent skip; baseline probe will confirm shape on local llama.cpp. | |
| Q-C2 | Cross-session cost persistence (cost.jsonl) | Deferred to follow-up phase 8; v1 is session-only. | |
| Q-C3 | Categories closed-set vs free-form | A4: free-form; caller decides. Matches Phase 6 helpers/skills convention. | |
| Q-C4 | stream_options forwarding by hossenfelder | B1 RESOLVED: both backends accept; flag is REQUIRED for local llama.cpp, no-op for cloud. Default-true is correct. | |
| Q-C5 | Warn fires on the crossed call or the next | A5: on the crossed call (no UX-defeating delay). | |
| Q-C6 | :reset clears cost_warn_fired | A6: no, only :cost reset clears the flag (R8 parity). | |
12. Phase 7 → Phase 8+ Out-of-band
Candidate follow-ups (non-binding):
- Phase 8: cross-session cost persistence (Q-C2 deferral), with optional cost dashboards / weekly rollup reporter.
- Hard rate limits / cost caps that REFUSE the call — an extension of the warn surface that promotes warnings into preflight enforcement.
- Better tokenization (Q1 deferred-from-Phase-3): replace the char/4 heuristic on Context:estimate_tokens() with model /tokenize calls. Indirectly improves accuracy of any future "preflight cost predictor".
Phase 7 itself is self-contained — no upstream dependencies.
13. Implementation Plan (commit-by-commit)
Bottom-up; broker first (it's the egress point that all callers depend on), then context (the accumulator), then the call-site rewires, then the user-facing meta + warn surface, then config + status bump. Each commit leaves the tree green (existing tests + load smoke + per-commit feature smoke).
Order
1. `broker.lua`: usage capture + signature widening.
   - `build_request(model_cfg, messages, stream, opts)` is widened to take an opts table; opts.tools / opts.max_tokens fold in from the existing positional args.
   - R3: TWO internal callers of `build_request` exist inside broker.lua itself (`M.chat_stream` at lines 65-66, and indirectly via `M.chat`). Both must be updated in this commit; the migration is CONTAINED but not zero-touch. "Every caller already passes opts" refers to the public surface; the internal `build_request` was positional.
   - opts.include_usage (default true) adds `stream_options.include_usage = true` to the request body (per B1, required for local).
   - The `M.chat_stream` event loop adds `if doc.usage then final_usage = doc.usage end`; after `curl.post_sse` returns, if `final_usage` is set, `on_delta("usage", payload)` is called. The payload includes `model = model_cfg.model` (caller-stable per B4 + R2), the raw token counts, and `cost` as a number (nil for local per B3).
   - opts.category passthrough: the broker just echoes it into the emitted usage payload and does not validate it (per A4, free-form).
   - R1: `M.chat` (non-streaming wrapper) MUST capture usage in its internal on_delta and return `(text, usage)`. Without this, the four of five categories that flow through broker.chat rather than chat_stream silently report zero. §4 shows the explicit update.
   - Smoke: hand-build a request with stream_options, capture all three on_delta kinds (text, tool_call when applicable, usage), and confirm the usage payload matches what curl shows. Also smoke that `broker.chat(...)` returns non-nil usage for cloud calls.
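The signature widening in commit 1 can be sketched as follows. This is a minimal illustration, not the actual broker.lua code: the body field names follow the OpenAI-compatible chat-completions shape, and attaching `stream_options` only when `stream` is true is an assumption made here.

```lua
-- Hypothetical sketch of the widened build_request (not the real broker.lua).
-- Assumption: the request body uses OpenAI-compatible field names.
local function build_request(model_cfg, messages, stream, opts)
  opts = opts or {}
  local body = {
    model = model_cfg.model,
    messages = messages,
    stream = stream and true or false,
    tools = opts.tools,           -- a nil value simply leaves the field out
    max_tokens = opts.max_tokens,
  }
  -- include_usage defaults to true; only an explicit `false` opts out (R8).
  local include_usage = opts.include_usage ~= false
  -- Assumption: stream_options is only meaningful on streaming requests.
  if body.stream and include_usage then
    body.stream_options = { include_usage = true }
  end
  return body
end
```

Testing against `false` rather than truthiness keeps the default-true behaviour for callers that pass no opts at all, which matters for the two internal callers named in R3.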
2. `context.lua`: accumulator + helpers.
   - `Context.new`: `self.usage_totals = {}` + `self.cost_warn_state = { dollars = false, tokens = false }` (R4: one flag per threshold).
   - `Context:add_usage(model, category, usage)`: increments the `usage_totals[model][category]` slots.
   - `Context:total_cost()`: sums all cost fields across all models/categories.
   - `Context:total_tokens()`: sums prompt + completion separately.
   - `Context:reset`: does NOT touch `usage_totals` or `cost_warn_state` (R8 parity with `memory_items` and `project`).
   - Smoke: 4-case inline test of add_usage / totals / reset preservation.
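A minimal sketch of the accumulator described in commit 2, under stated assumptions: the usage payload carries `prompt_tokens`, `completion_tokens` and an optional `cost`, and the `messages` field stands in for whatever `Context:reset` really clears. The sticky `is_local` flag follows R6.

```lua
-- Hypothetical sketch of the commit-2 accumulator (not the real context.lua).
local Context = {}
Context.__index = Context

function Context.new()
  return setmetatable({
    messages = {},      -- placeholder for per-conversation state
    usage_totals = {},  -- model -> category -> counters
  }, Context)
end

function Context:add_usage(model, category, usage)
  self.usage_totals = self.usage_totals or {}  -- defensive for old ctx objects
  local by_cat = self.usage_totals[model] or {}
  self.usage_totals[model] = by_cat
  local slot = by_cat[category]
  if not slot then
    slot = { prompt_tokens = 0, completion_tokens = 0, cost = 0 }
    by_cat[category] = slot
  end
  slot.prompt_tokens = slot.prompt_tokens + (usage.prompt_tokens or 0)
  slot.completion_tokens = slot.completion_tokens + (usage.completion_tokens or 0)
  if usage.cost == nil then
    slot.is_local = true  -- sticky: preserves the nil-vs-0 distinction (R6)
  else
    slot.cost = slot.cost + usage.cost
  end
end

function Context:total_cost()
  local sum = 0
  for _, by_cat in pairs(self.usage_totals) do
    for _, slot in pairs(by_cat) do sum = sum + slot.cost end
  end
  return sum
end

function Context:total_tokens()
  local prompt, completion = 0, 0
  for _, by_cat in pairs(self.usage_totals) do
    for _, slot in pairs(by_cat) do
      prompt = prompt + slot.prompt_tokens
      completion = completion + slot.completion_tokens
    end
  end
  return prompt, completion
end

function Context:reset()
  self.messages = {}  -- usage_totals is intentionally left untouched
end
```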
3. `repl.lua`: wire opts.category + on_delta("usage") at non-Norris call sites.
   - N3: depends on commit 1's R1 M.chat fix shipping first. This commit's "capture the second return value" pattern only works after M.chat actually returns one.
   - `_record_usage(model, category, usage)` helper (R5): the single chokepoint that wraps `ctx:add_usage` AND does the warn check. Replaces all direct `ctx:add_usage(...)` invocations in repl.lua.
   - call_broker wrapper (used by ask_ai): pass `opts.category = "main"`; the wrapped on_delta handles `kind == "usage"` by calling `_record_usage(payload.model, payload.category, payload)`. It keys by payload.model per R2, which handles the fallback retry correctly without tracking primary-vs-fallback at the wrapper.
   - DELEGATE: handler: opts.category = "delegate"; capture the second return value from broker.chat and feed it to `_record_usage`.
   - :delegate meta: opts.category = "delegate"; same.
   - summarize-on-evict callback: opts.category = "summarize"; same.
   - :memory summarize: opts.category = "memory_summarize"; same.
   - Smoke: send one cloud prompt, observe ctx.usage_totals grows; also smoke the fallback path with a deliberately-broken primary and confirm usage credits the fallback model name (R2 verification).
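The R2 keying rule in commit 3 can be illustrated with a stubbed broker. Everything here is a stand-in: `fake_chat_stream`, its argument order, and the choice to consume the usage event rather than forward it are assumptions for the sketch, and `_record_usage` only logs instead of touching a real context.

```lua
-- Hypothetical sketch of the commit-3 wrapper (not the real repl.lua).
local recorded = {}

-- Stand-in for repl.lua's _record_usage; here it only appends to a log.
local function _record_usage(model, category, usage)
  recorded[#recorded + 1] = { model = model, category = category, usage = usage }
end

-- Wrap the caller's on_delta: "usage" events are recorded, the rest forwarded.
local function wrap_on_delta(on_delta)
  return function(kind, payload)
    if kind == "usage" then
      -- Key by payload.model (set inside the broker), never by an outer name.
      _record_usage(payload.model, payload.category, payload)
      return
    end
    return on_delta(kind, payload)
  end
end

-- Stub broker: the model named "primary" always fails, any other one succeeds.
local function fake_chat_stream(model_cfg, messages, on_delta, opts)
  if model_cfg.model == "primary" then return nil, "connection refused" end
  on_delta("text", "hi")
  on_delta("usage", {
    model = model_cfg.model, category = opts.category,
    prompt_tokens = 7, completion_tokens = 1, cost = 0.01,
  })
  return true
end

local function call_broker(primary_cfg, fb_cfg, messages, on_delta)
  local opts = { category = "main" }
  local wrapped = wrap_on_delta(on_delta)
  local ok = fake_chat_stream(primary_cfg, messages, wrapped, opts)
  if not ok and fb_cfg then
    -- The fallback retry reuses opts unchanged (R8).
    ok = fake_chat_stream(fb_cfg, messages, wrapped, opts)
  end
  return ok
end
```

Because the wrapper never looks at which config it was called with, the retry needs no primary-vs-fallback bookkeeping: the usage lands under whatever model name the broker put in the payload.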
4. `safety.lua`: opts.category for Norris + probe.
   - safety.norris_step's broker.chat_stream call: pass `opts.category = "norris"`. The on_delta wrapper inside safety.lua was already widened (post-#52) to handle `kind == "text"` (rehydration); it now also handles `kind == "usage"` by calling `helpers.on_usage(payload.model, payload.category, payload)`. R5: helpers.on_usage IS repl.lua's `_record_usage`.
   - N4 signature chain widening: `llm_probe`, `llm_second_opinion`, and `M.is_destructive` all widen to thread `opts.on_usage` through:
     - `llm_probe(model_cfg, system, cmd, opts)`: pass `opts.category = "probe"` to broker.chat; on the `(text, usage)` return, if `opts.on_usage` AND `usage`, call `opts.on_usage(usage.model, usage.category, usage)`.
     - `llm_second_opinion(cmd, cfg, opts)`: pass opts through to both llm_probe calls (probe 1 + probe 2 re-roll).
     - `M.is_destructive(cmd, cfg, opts)`: opts.on_usage already in the table from #52's scrub_msgs/rehydrate addition; threads through naturally.
   - Smoke: a Norris session shows both "norris" and "probe" category entries in :cost detail; the probe model is named correctly (e.g. "cloud" if safety.llm_model = "cloud").
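The N4 callback threading can be sketched with a stubbed broker. The `broker.chat(model_cfg, messages, opts)` argument order, the `cfg.model_cfg` / `cfg.system` fields, and returning both probe verdicts are assumptions for illustration only; the real safety.lua decides how the two probes are combined.

```lua
-- Hypothetical sketch of the commit-4 signature chain (not the real safety.lua).
local broker = {}

-- Stub for the post-commit-1 broker.chat, which returns (text, usage).
function broker.chat(model_cfg, messages, opts)
  return "SAFE", {
    model = model_cfg.model, category = opts.category,
    prompt_tokens = 12, completion_tokens = 1,  -- cost stays nil (local model)
  }
end

local function llm_probe(model_cfg, system, cmd, opts)
  opts = opts or {}
  local text, usage = broker.chat(model_cfg, {
    { role = "system", content = system },
    { role = "user", content = cmd },
  }, { category = "probe" })
  -- Report only when both a callback and a usage payload are present.
  if opts.on_usage and usage then
    opts.on_usage(usage.model, usage.category, usage)
  end
  return text
end

local function llm_second_opinion(cmd, cfg, opts)
  -- The same opts go to both probes, so each one reports its own usage.
  local first = llm_probe(cfg.model_cfg, cfg.system, cmd, opts)
  local second = llm_probe(cfg.model_cfg, cfg.system, cmd, opts)
  return first, second
end
```

Passing the callback in opts keeps safety.lua free of any dependency on repl.lua or the context module, in line with the commit-4 risk row.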
5. `repl.lua`: :cost meta + warn-threshold + HELP.
   - :cost (summary), :cost detail (per-model+category breakdown), :cost reset (zero totals + clear both cost_warn_state flags).
   - Inside `_record_usage` (the R5 chokepoint from commit 3), right after ctx:add_usage, check cfg.cost.warn_at_dollars / warn_at_tokens; emit a one-shot status for each threshold that is crossed AND whose cost_warn_state flag is still false (R4).
   - HELP gains 3 lines for :cost.
   - Smoke: :cost shows totals; :cost detail breaks down; warn fires once when threshold crossed; :cost reset re-arms.
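The one-shot threshold check from commit 5 might look like the sketch below, using the per-threshold flags from R4. The function name `check_warn`, the message wording, and treating "crossed" as greater-than-or-equal are all assumptions.

```lua
-- Hypothetical sketch of the commit-5 warn check (not the real repl.lua).
-- Returns the list of warnings to emit and flips the matching one-shot flags.
local function check_warn(cost_cfg, warn_state, total_cost, total_tokens)
  local fired = {}
  if cost_cfg.warn_at_dollars and not warn_state.dollars
      and total_cost >= cost_cfg.warn_at_dollars then
    warn_state.dollars = true
    fired[#fired + 1] = string.format(
      "session cost $%.2f reached warn_at_dollars ($%.2f)",
      total_cost, cost_cfg.warn_at_dollars)
  end
  if cost_cfg.warn_at_tokens and not warn_state.tokens
      and total_tokens >= cost_cfg.warn_at_tokens then
    warn_state.tokens = true
    fired[#fired + 1] = string.format(
      "session tokens %d reached warn_at_tokens (%d)",
      total_tokens, cost_cfg.warn_at_tokens)
  end
  return fired
end
```

Each threshold has its own flag, so the dollar warning firing first does not suppress a later token warning, and a single call that overshoots a threshold by a wide margin still warns only once.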
6. `config.lua` example block + `docs/PHASE7.md` status bump.
   - Commented-out `cost = { warn_at_dollars = 0.50, warn_at_tokens = 100000 }` block in config.lua.
   - N5: PHASE0.md §11 amendment is already in tree (committed at `3bad07b` with the formulate doc). Commit 6 must NOT re-apply it.
   - PHASE7.md status header → Implement (matches Phase 5/6 cadence; manifest tracks implementation state).
Risk index per commit
| Commit | Risk | Mitigation |
|---|---|---|
| 1 (broker) | R3: build_request has TWO INTERNAL callers in broker.lua; both must be updated in this commit | Explicit in commit-1 note above; grep build_request\( to confirm |
| 1 (broker) | R1: M.chat must capture usage in on_delta and return (text, usage) | §4 shows the explicit M.chat update; smoke test verifies non-nil usage on cloud call |
| 1 (broker) | M.chat second return value confuses callers that do local r = broker.chat(...) discarding the second | Lua doesn't error on dropped return values; backward-compat preserved automatically |
| 2 (context) | usage_totals nil on old ctx serializations | Defensive self.usage_totals = self.usage_totals or {} in add_usage; no migration needed |
| 3 (repl wires) | Forgetting one call site = silent under-count | Lint by grep for broker.chat\( and broker.chat_stream\( after the wire commit; ensure each is tagged with opts.category |
| 3 (repl wires) | R2: fallback retry credits usage to wrong model | wrapped on_delta keys by payload.model (set inside broker per R2), NOT by outer model_name; smoke a deliberately-broken-primary case |
| 4 (safety wires) | safety.lua must NOT introduce new module dep | Use helpers.on_usage callback convention (matches #52's scrub_msgs) |
| 4 (safety wires) | N4: llm_probe → llm_second_opinion → is_destructive signature chain widening | Spelled out in commit-4 note above |
| 5 (:cost + warn) | warn fires multiple times when threshold is much exceeded by one call | per-threshold one-shot flag in ctx.cost_warn_state; explicit :cost reset to re-arm both |
| 5 (:cost + warn) | R4: single shared flag covers two thresholds | RESOLVED — split into cost_warn_state.dollars + .tokens |
| 6 (config + status) | N5: PHASE0 §11 already amended at 3bad07b | This commit does NOT re-apply the amendment |
Tests + smoke per commit
Each commit must:
- Pass `luajit test_safety.lua` (87/87) and `luajit test_router_model.lua` (31/31).
- Load cleanly via `luajit -e 'package.path=...; require("repl"); print("ok")'`.
- Pass a per-feature smoke (described in each commit's Smoke bullet above).
Things deliberately NOT split
- broker.chat backward-compat shim: Lua's multiple-return-values semantics handle it automatically (existing `local r = broker.chat(...)` drops the new `usage` value).
- Per-category sub-tables: flat `model -> category -> counters` is simple enough; nesting deeper for e.g. timestamps is v2.
- Cross-session persistence: explicitly Q-C2, deferred to phase 8.
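The backward-compat claim rests on how Lua adjusts multiple return values. The sketch below uses a stand-in `chat` function (not the real broker.chat) to show the adjustment, plus the two positions where the extra value is not dropped and existing callers may deserve a quick grep.

```lua
-- Stand-in for broker.chat after commit 1: returns (text, usage).
local function chat()
  return "hello", { prompt_tokens = 3, completion_tokens = 1 }
end

-- Old-style caller: the second value is silently discarded.
local r = chat()

-- New-style caller: both values are captured.
local text, usage = chat()

-- Caveat: as the LAST expression in a table constructor or argument list,
-- a call expands to all of its results.
local expanded = { chat() }

-- Wrapping the call in parentheses truncates it to exactly one value.
local truncated = { (chat()) }
```

A plain `local r = broker.chat(...)` is therefore safe as-is; only call sites that pass the call straight through as a final argument or table element would see the new value.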
Open at plan-time (resolve at implement)
- Whether `safety.is_destructive`'s opts should carry the `on_usage` callback explicitly OR thread through cfg.helpers (the latter matches the Norris helpers convention but adds coupling). Decide at commit 4. Default to explicit opts.on_usage for minimum surface.
- Whether to emit a `[aish] usage: model=X prompt=N completion=M cost=$X` status line PER TURN (verbose mode) or only via :cost on demand. v1 = on demand only; verbose mode is a follow-up nice-to-have.