marfrit 1f34b6dce8 config + docs/PHASE7: example block + status -> Implement (Phase 7 commit #6)
R9-resolved single-owner of the status bump (commit #5 didn't touch
PHASE7.md). N5: PHASE0 §11 amendment landed in commit 3bad07b
(formulate); not re-applied here.

config.lua:
  - Commented-out `cost = { warn_at_dollars, warn_at_tokens }` block
    with parity to the Phase 1-6 example blocks.
  - Notes warn flags are independent (R4) and per-turn usage flows
    to session/*.jsonl for after-the-fact analysis.

docs/PHASE7.md:
  - Status header bumped: "Plan + review fold-in" -> "Implement"
  - Lists the 6 implement commits inline for traceability:
      7364963  broker: usage capture + opts widening
      7b4a9be  context: accumulator helpers
      8adebd5  repl: _record_usage + opts.category at 5 sites
      b30212a  safety + repl: opts.category for Norris + probe
      0d6ff93  repl: :cost meta surface
      this     config example + status bump

Phase 7 implementation is complete. Next inner-loop step is verify
(7) — user-driven smoke tests, then memory-update (8).

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:02:55 +00:00


aish — Phase 7 Manifest

Project: aish (AI-augmented conversational shell)
Document: Phase 7 Requirements, Architecture & Design Decisions
Status: Implement (6 commits landed: 7364963, 7b4a9be, 8adebd5, b30212a, 0d6ff93, this)
Date: 2026-05-16

Review findings (independent Sonnet agent, 2026-05-16) — 3 BLOCKERs resolved in-place, 6 CONCERNs folded, 5 NITs applied:

R1 (BLOCKER, RESOLVED). M.chat would silently return (text, nil) for ALL non-streaming callers. M.chat's internal on_delta only captures kind == "text". Without explicit handling of kind == "usage", four out of five categories that go through broker.chat (summarize / delegate / memory_summarize / probe) would report zero usage even after a cloud round-trip. Fix folded into §4 + §13 commit 1: M.chat's on_delta also captures the usage payload and returns it as the second value.

R2 (BLOCKER, RESOLVED). call_broker fallback retry — usage payload's model field credits the WRONG model name. The wrapped on_delta in call_broker is closed over the PRIMARY's name; if the wrapped function uses an outer-scope model_name variable to key the accumulator, the fallback's usage gets misattributed. Resolution: the broker emits payload.model = model_cfg.model (which IS the fallback's model when called with fb_cfg — chat_stream's local upvar). The wrapper keys by payload.model, NOT by the outer model_name. Documented in §4 emission code + §13 commit 3 (wrapped on_delta uses payload.model for accumulator keying).

R3 (BLOCKER, RESOLVED — promoted to docs). build_request has TWO internal callers inside broker.lua itself, not just the public surface. Migration is contained but both internal sites must be updated in commit 1. Plan §13 commit 1 risk row updated to call this out explicitly so the implementer doesn't read "every caller already passes opts" as "only external callers need touching".

R4 (CONCERN, FOLDED). Single cost_warn_fired flag for two thresholds is broken. When both warn_at_dollars AND warn_at_tokens are configured, the first-to-fire suppresses the other. Fix: ctx.cost_warn_fired becomes ctx.cost_warn_state = { dollars = false, tokens = false }. Each threshold has its own flag; :cost reset clears both. §7 pseudocode updated.

R5 (CONCERN, FOLDED). Warn-check centralization decided: use a single _record_usage(model, category, usage) helper inside repl.lua that wraps ctx:add_usage AND does the threshold check AND calls renderer.status when crossed. context.lua stays decoupled from renderer. safety.lua call sites get helpers.on_usage = _record_usage in the helpers table; probe callsite gets opts.on_usage = _record_usage. Single chokepoint for the warn check. §3 + §7 + §13 commits 3-5 reflect.

R6 (CONCERN, FOLDED). nil vs 0 cost distinction must be preserved at the accumulator level. Local-model $0 (no cost field) vs cloud-call-that-happens-to-cost-zero need to be distinguishable for :cost detail annotation. Fix: accumulator slot gains is_local = true when ANY recorded usage for that slot had cost == nil. Cloud calls with cost = 0 (rare) stay annotated as cloud. §5 pseudocode + §6 annotation logic updated.

R7 (CONCERN, FOLDED). :cost detail sort needs three-level key for determinism. Lua's table.sort is unstable; equal-cost rows would have arbitrary order. Fix: sort key is (cost desc, model asc, category asc). §6 updated.

R8 (CONCERN, FOLDED). call_broker fallback passes opts.include_usage unchanged. Documented as a known assumption (B1 confirms both backends accept; if a future fallback host rejects, the call-site can pass include_usage = false explicitly). §10 risk row added.

R9 (CONCERN, FOLDED). :resume does NOT restore historical usage_totals. Per-turn usage IS in the session JSONL but :resume reloads turns for conversation continuity only; the accumulator stays empty. Documented in §8 surface notes; users who want cross-session totals can script the jsonl or wait for the deferred Q-C2 follow-up.

R10 (CONCERN, FOLDED). $%.4f loses sub-cent precision. A 0.000028 cloud cost displays as $0.0000 — indistinguishable from $0 local. Fix: format strings widened to $%.6f in §6 (and the warn message in §7). 6 decimal places accommodates the smallest observed real cost.
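
The precision loss described in R10 can be reproduced with plain string.format. The snippet below is only an illustration; the variable names are made up for this example and do not come from the codebase.

```lua
-- Illustration of R10: the same sub-cent cost under both format strings.
local tiny_cost = 0.000028   -- example cloud cost quoted in R10

local four_dp    = string.format("$%.4f", tiny_cost)
local six_dp     = string.format("$%.6f", tiny_cost)
local local_zero = string.format("$%.4f", 0)

print(four_dp)   -- $0.0000, the same text a true zero produces
print(six_dp)    -- $0.000028
```

With four decimals the cloud cost and a genuine zero render identically, which is the ambiguity the wider format removes.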

R-N1..N5 (NITs, APPLIED):

N1. §4 extraction pseudocode gains a comment noting the if doc.usage branch is INDEPENDENT of the choice branch and must be checked regardless of choice nil-ness (handles both B2 emission shapes).

N2. §2 "Cost extraction" row referenced stale "B7"; corrected to B3.

N3. §13 commit 3 row gains an explicit dependency note: commit 3's "capture the new second return value" requires commit 1's M.chat fix from R1 to ship first.

N4. §3 safety.lua row + §13 commit 4 row spell out the signature chain: llm_probe -> llm_second_opinion -> M.is_destructive all widen to thread opts.on_usage through.

N5. §3 PHASE0.md row + §13 commit 6 row: the PHASE0 §11 amendment is ALREADY in tree (committed at 3bad07b with the formulate doc). Commit 6 should NOT re-apply; it only adds the config.lua block and bumps the PHASE7 status header.

Analyze findings (2026-05-16):

A1. broker.chat_stream surface is clean for the extension. The existing on_event(data) closure inside M.chat_stream already parses doc.error / doc.choices / delta / tool_calls — adding if doc.usage then final_usage = ... end is one block. Emission happens via a closure-local final_usage that the post-loop code in chat_stream reads and calls on_delta("usage", final_usage) on. build_request needs minor extension OR (cleaner) chat_stream inserts stream_options.include_usage = true into the body table AFTER json.encode — but we currently encode in build_request. Cleanest: extend build_request(model_cfg, messages, stream, opts) so it can read opts.include_usage. Phase 7 simplifies the signature in passing.

A2. 7 caller sites identified for opts.category threading:

| Site | Category |
|---|---|
| `safety.lua:191` (LLM probe) | `"probe"` |
| `safety.lua:354` (norris main) | `"norris"` |
| `repl.lua:326` (summarize-on-evict) | `"summarize"` |
| `repl.lua:685` (call_broker wrapper, used by ask_ai) | `"main"` |
| `repl.lua:1104` (DELEGATE: handler) | `"delegate"` |
| `repl.lua:1587` (:memory summarize) | `"memory_summarize"` |
| `repl.lua:2156` (:delegate meta) | `"delegate"` |

All callers pass `opts` already; adding a `category` field is
additive and backward-compatible (default to `"main"` when absent).

A3. build_request signature simplification. Today it takes (model_cfg, messages, stream, tools, max_tokens) — five positional args. With Phase 7 needing include_usage AND stream_options, positional growth gets unwieldy. Resolution: widen to (model_cfg, messages, stream, opts) where opts carries {tools, max_tokens, include_usage, stream_options}. Callers in M.chat_stream and M.chat pass their existing opts table through. This is a refactor but contained inside broker.lua.

A4. Q-C3 RESOLVED: free-form categories. The closed-set vs free-form debate resolved in favor of free-form per the helpers/skills convention already in place (Phase 6 :tree / :diff metas don't validate sub-args either). :cost detail will show whatever categories appear; in practice that is a small, documented set (6 distinct categories across the 7 call sites in A2), so no surprises.

A5. Q-C5 RESOLVED: warn fires on the call that crossed. The crossed call's usage IS in the accumulator at the moment we check (we check AFTER add_usage). Firing on the NEXT call would mean a delay of one full broker round-trip before the user sees the warn — defeats the purpose. Just emit-on-cross.

A6. Q-C6 RESOLVED: :reset does NOT clear cost_warn_fired. Parity with usage_totals itself (per the §2 decision row); the user reset their conversation, not their cost meter. The flag AND the totals are reset only by the explicit :cost reset verb.

A7. Norris call-graph rewires (existing safety.lua:354 path): with issue #52 wired (commit 955bd82), the Norris broker call now passes helpers.scrub_msgs / helpers.streaming_rehydrator. The on_delta wrapping pattern means I need to be careful that the new ("usage", payload) kind also flows through any wrapper. Since secrets streaming_rehydrator only matches on kind == "text", the "usage" kind passes through unchanged. No new entanglement.

A8. ctx.usage_totals survives :reset per the R8 invariant from Phase 6 (not this review's R8), the same invariant that covers memory_items (Phase 4) and project (Phase 6). Documented in §5 of the manifest; reinforces the "ambient context survives conversation reset" rule.

A9. Session JSONL serialization — assistant turn dict gets an optional usage field. history.lua log_turn currently calls json.encode(turn) opaquely; the dkjson serializer handles nested tables. No code change needed; the new field flows through automatically when the assistant turn carries one.

A10. Q-C1 PARTIAL: local providers may not emit usage. The formulate-time assumption was "treat absence as zero-cost / unknown". A real probe against qwen-coder-7b-snappy-8k is a baseline action — see B-probes below. The implementation will be defensive: if doc.usage never appears in the stream, no "usage" event is emitted, and the accumulator is unchanged for that turn. :cost output naturally reflects "0 calls counted for local model" if that's the case.

A11. Q-C4 deferred to baseline: actual stream_options forwarding by the hossenfelder proxy must be probed against a live broker. If the proxy strips the option, we get no usage events even for cloud calls. Baseline action.

PHASE0 is the locked substrate; PHASE1-6 are layered on top. This manifest specifies what Phase 7 adds — cost / usage observability: the ability to know, mid-session, how many tokens you've spent and how much money the paid-cloud calls have cost.

PHASE0 §11 originally listed phases only through 6; the formulate commit (3bad07b) amended §11 to add Phase 7, so later Phase 7 commits do not re-apply it (N5).


1. Scope of Phase 7

Four pillars:

  1. Usage capture in broker: broker.chat_stream extracts the provider's usage block (and cost where present) from the response stream. Surfaces it to the caller via a new on_delta("usage", ...) kind. The existing broker.chat buffering wrapper exposes it as a second return value (text, usage). Backward-compatible: callers that don't handle the new kind / second value simply ignore it.

  2. Per-session accumulator on ctx — running totals per-model AND per-call-category (main / delegate / summarize / probe) accumulate on ctx.usage_totals. No persistence across sessions in v1 (Q-C2 defers cross-session); the session-log JSONL files DO carry per-turn usage so historical analysis is possible after the fact.

  3. :cost meta — a :cost reporter that shows the current session totals, with optional :cost detail for the per-model + per-category breakdown. Zero broker calls (purely local read of ctx.usage_totals).

  4. Optional warning thresholds: cfg.cost.warn_at_dollars and cfg.cost.warn_at_tokens emit a status the first time the running total crosses the configured threshold. Default off (no warnings without config). Useful when cloud presets are configured and you want a "you've spent $1 this session" nudge before runaway cost.

Phase 7 is done when:

  • broker.chat_stream exposes usage via the new on_delta("usage", ...) callback kind; broker.chat returns (text, usage). Backward compat preserved (no existing caller breaks).
  • After a session with mixed local + cloud calls, :cost prints a total like:
    [aish] session usage: 24 turns, prompt=12,450 / completion=3,190 tokens
                                    cost=$0.0234 (cloud only; local: 0)
    
  • :cost detail breaks down by model + category:
    fast    main: 14 turns, 8200/2100 tokens
    cloud   main: 8 turns, 3850/980 tokens, $0.0180
    cloud   delegate: 1 turn, 250/80 tokens, $0.0012
    cloud   probe: 1 turn, 150/30 tokens, $0.0042
    
  • Session JSONL gains a usage field on assistant turns (when the broker returned one).
  • With cfg.cost.warn_at_dollars = 0.50 set, crossing $0.50 cumulative emits exactly one status line.
  • Existing configs without cfg.cost behave exactly like Phase 6 (Phase 6 regression coverage).

2. Technology Decisions (delta from Phase 6)

| Decision | Choice | Rationale |
|---|---|---|
| Where to extract usage | In broker.chat_stream event loop, looking at each SSE event's usage field on the final chunk | The OpenAI streaming spec puts usage on the FINAL chunk when stream_options: { include_usage: true } is in the request body. The Anthropic-via-Bedrock path through OpenRouter respects this; need to verify (baseline). |
| New on_delta kind | on_delta("usage", { prompt_tokens, completion_tokens, total_tokens, cost?, model?, native_finish_reason? }) | Mirrors the existing ("text", chunk) / ("tool_call", call) shape. Callers ignore unknown kinds; backward-compatible. |
| Where to enable usage on the wire | opts.include_usage = true (default true) sets stream_options.include_usage = true in the outbound request body | Off-switch for hosts that reject stream_options. Defaults on; baseline probe confirms current broker tolerates it. (A3: build_request signature widens to take an opts table; positional growth was getting unwieldy.) |
| Accumulator location | ctx.usage_totals[model_name][category] table | ctx is per-conversation; matches the :reset-survives-or-not rules already in place. |
| Categories | "main" (ask_ai), "delegate", "summarize", "memory_summarize", "probe", "norris" | One-tag-per-call-site. Tagged at the caller site (caller passes opts.category to broker.chat_stream). |
| Cost extraction | usage.cost (OpenRouter convention; dollars as a number). For Anthropic/Bedrock the cost arrives in dollars on usage.cost. For pure local llama.cpp: no cost field, so record as nil (R6: preserves the local-vs-cloud-zero distinction in the accumulator). | Single field name across observed providers per baseline B3. |
| Cost precision | Store as number (Lua double = 53-bit mantissa, ~15 decimal digits; plenty for sub-cent precision) | No floating-point cumulative-error concerns at this scale. |
| Warning trigger | First crossing of either threshold emits a single status: [aish] session cost $X.XXXX has crossed warn_at_dollars=$Y.YYYY. Crossed-flag stored on ctx; reset only on session end / :cost reset. | One-shot to avoid spamming. |
| :reset interaction | :reset does NOT clear ctx.usage_totals (parity with memory_items/project); the user reset their conversation, not their cost tracking. :cost reset is the explicit reset verb. | Matches R8 invariant from Phase 6. |
| Session-log persistence | Assistant turn entries gain an optional usage field when broker returned one. history.lua log_turn writes it through verbatim. | Per-turn granularity preserved for after-the-fact analysis. No new file. |

3. Module Changes

| File | State after Phase 6 | Phase 7 changes |
|---|---|---|
| broker.lua | chat_stream(cfg, msgs, on_delta, opts) with text + tool_call kinds; chat returns text | Extract usage from final SSE chunk; emit on_delta("usage", payload); chat returns (text, usage). New opts.include_usage (default true); new opts.category (passed through as a tag in the usage payload). |
| context.lua | system prompt + turns + memory + project + summary | Add self.usage_totals (table) + self.cost_warn_fired (bool). New helpers: Context:add_usage(model, category, usage), Context:total_cost(), Context:total_tokens(). Context:reset does NOT clear usage_totals (parity with memory_items / project per R8). |
| repl.lua | ask_ai + delegate + summarize callbacks + Norris helpers | Wire opts.category at each broker call site (main / delegate / summarize / memory_summarize). Wire on_delta("usage", ...) -> ctx:add_usage(...). New :cost and :cost detail / :cost reset metas. Cost-warn check after each add_usage call. |
| safety.lua | norris_step + is_destructive | Pass opts.category = "norris" (for the main chat_stream call) and "probe" (for the is_destructive LLM probe). Surfaces probe-cost in the breakdown, which is useful since safety.llm_model = "cloud" is the recommended setting. |
| history.lua | session.log_turn appends JSONL entries | log_turn already takes turn opaquely; assistant turns will carry usage if present and it'll serialize via dkjson. No code change unless filter desired. |
| config.lua | example blocks for mcp/safety/memory/routing/secrets/hooks/project | Add commented-out cost = { warn_at_dollars, warn_at_tokens } block. |
| docs/PHASE0.md | §11 lists phases 0-6 | Amendment landed at 3bad07b (formulate commit). N5: commit 6 does NOT re-apply. |

No new module files.


4. Pillar 1 — Usage capture in broker

SSE shape (provider-by-provider — confirm in baseline)

For OpenAI-compatible streams with stream_options: { include_usage: true }:

data: {"id":"...","choices":[{"index":0,"delta":{"content":"Hi"}, ...}]}
data: {"id":"...","choices":[{"index":0,"delta":{}, "finish_reason":"stop"}]}
data: {"id":"...","choices":[],"usage":{"prompt_tokens":15,"completion_tokens":3,"total_tokens":18,"cost":0.00004,"cost_details":{...}}}
data: [DONE]

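In this shape the final usage event arrives AFTER finish_reason but BEFORE [DONE], and choices is empty [] on the usage event. Per B2, the cloud path instead attaches usage to a non-empty choices chunk (the one carrying finish_reason), so the extraction below must handle both shapes.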

For non-streaming chat: usage is in the response body at the top level. broker.chat is a wrapper around chat_stream, so it inherits the on_delta path.

For local llama.cpp via hossenfelder: usage may or may not be present depending on the proxy's version. Treat absence as zero-cost / unknown.

Extraction algorithm

local final_usage = nil

local function on_event(data)
    ...
    -- N1: this branch is INDEPENDENT of the choice branch below;
    -- check unconditionally. Per B2, local emits usage on a
    -- choices=[] chunk (choice nil); cloud emits on a non-empty
    -- choices chunk (with finish_reason). Both shapes funnel here.
    if doc.usage then
        -- R2: payload.model is ALWAYS the caller-stable model_cfg.model
        -- (chat_stream's local upvar). When called via call_broker's
        -- fallback retry, this naturally reflects the fallback's
        -- model name — wrapper callers can key by payload.model
        -- without tracking primary-vs-fallback themselves.
        final_usage = {
            prompt_tokens     = doc.usage.prompt_tokens or 0,
            completion_tokens = doc.usage.completion_tokens or 0,
            total_tokens      = doc.usage.total_tokens or 0,
            -- R6: keep nil-vs-0 distinction at this layer; the
            -- accumulator decides how to tag local-vs-cloud-zero.
            cost              = doc.usage.cost,   -- nil for local
            model             = model_cfg.model,  -- caller-stable per B4
            category          = opts.category or "main",
        }
        -- Don't emit yet — the [DONE] event marks stream end; emit
        -- once we exit the curl.post_sse loop so the caller sees
        -- usage as the LAST event in the stream order.
    end
    -- ... existing text + tool_call handling (unchanged) ...
end

-- After curl.post_sse returns (stream complete). R3-related:
-- only emit on successful streams; transport / api errors skip
-- the usage event (caller sees the error path and accumulator
-- stays unchanged).
if api_err then return nil, "api: " .. api_err end
if not ok    then return nil, "transport: " .. tostring(err) end
if final_usage then on_delta("usage", final_usage) end
return true
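
The extraction flow above can be exercised without a network by feeding pre-decoded events through a stand-in loop. This is a sketch under stated assumptions: simulate_stream, collect, and the "demo-model" name are hypothetical, and the events are plain Lua tables rather than SSE lines parsed by curl.post_sse.

```lua
-- Hypothetical stand-in for the chat_stream event loop: events are
-- already-decoded tables instead of SSE lines.
local function simulate_stream(model_cfg, events, on_delta, opts)
    opts = opts or {}
    local final_usage = nil
    for _, doc in ipairs(events) do
        -- Usage is checked on every event, whatever choices holds.
        if doc.usage then
            final_usage = {
                prompt_tokens     = doc.usage.prompt_tokens or 0,
                completion_tokens = doc.usage.completion_tokens or 0,
                total_tokens      = doc.usage.total_tokens or 0,
                cost              = doc.usage.cost,   -- stays nil if absent
                model             = model_cfg.model,
                category          = opts.category or "main",
            }
        end
        local choice = doc.choices and doc.choices[1]
        local text = choice and choice.delta and choice.delta.content
        if text then on_delta("text", text) end
    end
    -- Emit usage last, after the whole stream has been consumed.
    if final_usage then on_delta("usage", final_usage) end
    return true
end

-- Shape A: usage on a chunk with empty choices.
local shape_a = {
    { choices = { { delta = { content = "Hi" } } } },
    { choices = { { delta = {}, finish_reason = "stop" } } },
    { choices = {}, usage = { prompt_tokens = 15, completion_tokens = 3,
                              total_tokens = 18, cost = 0.00004 } },
}
-- Shape B: usage rides on the chunk that carries finish_reason.
local shape_b = {
    { choices = { { delta = { content = "Hi" } } } },
    { choices = { { delta = {}, finish_reason = "stop" } },
      usage = { prompt_tokens = 15, completion_tokens = 3, total_tokens = 18 } },
}

local function collect(events, opts)
    local seen = {}
    simulate_stream({ model = "demo-model" }, events, function(kind, payload)
        seen[#seen + 1] = { kind = kind, payload = payload }
    end, opts)
    return seen
end

local a = collect(shape_a, { category = "probe" })
local b = collect(shape_b)
```

Both shapes end with a single "usage" event after the text, and the missing cost in shape B survives as nil rather than being coerced to 0.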

M.chat capture (R1 — BLOCKER fix)

M.chat is the non-streaming buffering wrapper. Its existing on_delta only captured text. Under Phase 7 it MUST also capture the usage payload — otherwise EVERY non-streaming caller (summarize, delegate, memory_summarize, probe — 4 of 5 categories) silently reports zero.

function M.chat(model_cfg, messages, opts)
    local parts        = {}
    local captured_usage  -- R1: required so M.chat returns (text, usage)
    local ok, err = M.chat_stream(model_cfg, messages,
        function(kind, payload)
            if kind == "text"  then parts[#parts + 1] = payload
            elseif kind == "usage" then captured_usage = payload
            end
        end, opts)
    if not ok then return nil, err end
    return table.concat(parts), captured_usage
end

Existing callers that do local r = broker.chat(...) automatically drop the second value (Lua semantics). Callers that want usage do local r, u = broker.chat(...).
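
A runnable sketch of both calling styles, assuming a stubbed chat_stream in place of the real broker (the stub, its canned payload, and "demo-model" are hypothetical; only the M.chat wrapper mirrors the code above):

```lua
-- Hypothetical stub broker: chat_stream is faked so the snippet runs
-- without a network.
local M = {}

function M.chat_stream(model_cfg, messages, on_delta, opts)
    on_delta("text", "Hello, ")
    on_delta("text", "world")
    on_delta("usage", {
        prompt_tokens = 15, completion_tokens = 3, total_tokens = 18,
        model    = model_cfg.model,
        category = (opts and opts.category) or "main",
    })
    return true
end

function M.chat(model_cfg, messages, opts)
    local parts = {}
    local captured_usage
    local ok, err = M.chat_stream(model_cfg, messages,
        function(kind, payload)
            if kind == "text" then parts[#parts + 1] = payload
            elseif kind == "usage" then captured_usage = payload
            end
        end, opts)
    if not ok then return nil, err end
    return table.concat(parts), captured_usage
end

local cfg = { model = "demo-model" }

-- Older-style caller: the second return value is silently dropped.
local text_only = M.chat(cfg, {}, { category = "summarize" })

-- Phase 7 style caller: captures usage alongside the text.
local text, usage = M.chat(cfg, {}, { category = "summarize" })
```

The older-style call site needs no change, which is the backward-compatibility property the plan relies on.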

Outbound include_usage

local body_table = { model = ..., messages = ..., stream = true }
if opts.include_usage ~= false then
    body_table.stream_options = { include_usage = true }
end

Risk: some providers reject unrecognized fields. Baseline check; if any host throws on stream_options, the per-model opt-out is one line.
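
A minimal sketch of the widened opts-table signature with the opt-out in place. build_body is a hypothetical name; unlike the real build_request it returns the table without JSON-encoding it, so the switch can be inspected directly.

```lua
-- Hypothetical helper: returns the request body as a table (no JSON
-- encoding) so the include_usage switch is easy to inspect.
local function build_body(model_cfg, messages, stream, opts)
    opts = opts or {}
    local body = {
        model      = model_cfg.model,
        messages   = messages,
        stream     = stream,
        tools      = opts.tools,        -- nil leaves the key absent
        max_tokens = opts.max_tokens,
    }
    -- Default on: only an explicit false disables the option.
    if opts.include_usage ~= false then
        body.stream_options = { include_usage = true }
    end
    return body
end

local default_body = build_body({ model = "demo-model" }, {}, true)
local opted_out    = build_body({ model = "demo-model" }, {}, true,
                                { include_usage = false, max_tokens = 64 })
```

Comparing against false (rather than testing truthiness) is what makes an absent opts.include_usage behave as "on".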

Category tagging

opts.category is a string set by the caller. broker echoes it into the emitted usage payload so the accumulator knows what to credit. Default category if absent: "main".
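
The interaction between category echoing and the R2 fallback rule can be sketched as follows. Everything here is hypothetical scaffolding (fake_chat_stream, call_with_fallback, the credited table, and the model names); it only illustrates keying by payload.model instead of an outer-scope model name.

```lua
-- Hypothetical stub: fails for the primary config and succeeds for the
-- fallback, echoing model and category into the usage payload.
local function fake_chat_stream(model_cfg, messages, on_delta, opts)
    if model_cfg.fail then return nil, "transport: simulated failure" end
    on_delta("text", "ok")
    on_delta("usage", {
        prompt_tokens = 10, completion_tokens = 2, total_tokens = 12,
        model    = model_cfg.model,
        category = (opts and opts.category) or "main",
    })
    return true
end

local credited = {}   -- [model][category] = call count

-- Wrapper in the spirit of call_broker: retries on the fallback config
-- and keys the tally by payload.model, never by the primary's name.
local function call_with_fallback(primary, fallback, messages, opts)
    local function on_delta(kind, payload)
        if kind == "usage" then
            local per_model = credited[payload.model] or {}
            per_model[payload.category] = (per_model[payload.category] or 0) + 1
            credited[payload.model] = per_model
        end
    end
    local ok, err = fake_chat_stream(primary, messages, on_delta, opts)
    if not ok and fallback then
        ok, err = fake_chat_stream(fallback, messages, on_delta, opts)
    end
    return ok, err
end

local ok = call_with_fallback(
    { model = "primary-model", fail = true },
    { model = "fallback-model" },
    {}, { category = "delegate" })
```

Because the closure never reads the primary's name, the fallback's usage lands under the fallback's key without the wrapper tracking which attempt succeeded.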


5. Pillar 2 — Accumulator on ctx

Shape

ctx.usage_totals = {
    -- [model_name] = { [category] = { prompt = N, completion = N,
    --                                 calls = N, cost = N,
    --                                 is_local = bool } }   -- R6
    fast = {
        main      = { prompt = 1234, completion = 567, calls = 14, cost = 0,      is_local = true  },
    },
    cloud = {
        main      = { prompt = 3850, completion = 980, calls = 8,  cost = 0.0180, is_local = false },
        delegate  = { prompt = 250,  completion = 80,  calls = 1,  cost = 0.0012, is_local = false },
        probe     = { prompt = 150,  completion = 30,  calls = 1,  cost = 0.0042, is_local = false },
    },
}
-- R4: one flag per threshold (replaces the single cost_warn_fired bool)
ctx.cost_warn_state = { dollars = false, tokens = false }

add_usage

function Context:add_usage(model, category, u)
    model    = model    or "?"
    category = category or "main"
    self.usage_totals = self.usage_totals or {}
    local m = self.usage_totals[model] or {}
    local c = m[category] or {
        prompt = 0, completion = 0, calls = 0, cost = 0,
        is_local = false,  -- R6: cloud unless any usage came w/o cost
    }
    c.prompt     = c.prompt     + (u.prompt_tokens or 0)
    c.completion = c.completion + (u.completion_tokens or 0)
    c.calls      = c.calls      + 1
    -- R6: preserve nil-vs-0 distinction. A `nil` cost means the
    -- provider doesn't emit cost (i.e., local llama.cpp). Sticky:
    -- once a slot has seen any nil-cost call, it's flagged is_local.
    if u.cost == nil then
        c.is_local = true
    else
        c.cost = c.cost + u.cost
    end
    m[category] = c
    self.usage_totals[model] = m
end

function Context:total_cost()
    local total = 0
    for _, m in pairs(self.usage_totals or {}) do
        for _, c in pairs(m) do total = total + c.cost end
    end
    return total
end

function Context:total_tokens()
    local p, comp = 0, 0
    for _, m in pairs(self.usage_totals or {}) do
        for _, c in pairs(m) do
            p    = p    + c.prompt
            comp = comp + c.completion
        end
    end
    return p, comp
end

Reset semantics

Context:reset() deliberately does NOT clear usage_totals — matches R8 invariant from Phase 6 (:reset clears turns, pending_exec_output, summary; preserves memory_items, project, and now usage_totals). The user reset their conversation, not their cost meter. :cost reset is the explicit reset verb for the meter.
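
The accumulator and reset rules can be checked together with a stripped-down stand-in. This Context is hypothetical and carries only the fields needed for the demonstration; add_usage and total_cost follow the sketches above.

```lua
-- Minimal hypothetical Context: only the fields needed for the demo.
local Context = {}
Context.__index = Context

function Context.new()
    return setmetatable({ turns = {}, usage_totals = {} }, Context)
end

function Context:add_usage(model, category, u)
    model, category = model or "?", category or "main"
    local m = self.usage_totals[model] or {}
    local c = m[category] or { prompt = 0, completion = 0, calls = 0,
                               cost = 0, is_local = false }
    c.prompt     = c.prompt     + (u.prompt_tokens or 0)
    c.completion = c.completion + (u.completion_tokens or 0)
    c.calls      = c.calls      + 1
    if u.cost == nil then
        c.is_local = true          -- sticky: provider sent no cost
    else
        c.cost = c.cost + u.cost
    end
    m[category] = c
    self.usage_totals[model] = m
end

function Context:total_cost()
    local total = 0
    for _, m in pairs(self.usage_totals) do
        for _, c in pairs(m) do total = total + c.cost end
    end
    return total
end

-- Reset drops the conversation but leaves the meter alone.
function Context:reset()
    self.turns = {}
end

local ctx = Context.new()
ctx.turns[1] = { role = "user", content = "hi" }
ctx:add_usage("fast",  "main", { prompt_tokens = 100, completion_tokens = 20 })
ctx:add_usage("cloud", "main", { prompt_tokens = 50, completion_tokens = 10, cost = 0 })
ctx:add_usage("cloud", "main", { prompt_tokens = 50, completion_tokens = 10, cost = 0.25 })
ctx:reset()
```

After the reset the turns are gone but the totals remain, and the cloud slot that saw an explicit cost of 0 is still not flagged as local.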


6. Pillar 3 — :cost meta

:cost                       summary line
:cost detail                per-model + per-category breakdown
:cost reset                 zero out ctx.usage_totals + both cost_warn_state flags

Summary format (R10 — 6-decimal precision for sub-cent costs):

[aish] session usage: 24 calls, prompt=12,450 / completion=3,190 tokens
                       cost=$0.023400 (cloud only; local: 0)

Detail format (R7 — sort key is (cost desc, model asc, category asc) for deterministic ordering on equal-cost rows; R6 — annotation comes from the slot's is_local flag, NOT a cost == 0 heuristic):

[aish] session usage detail:
  cloud     main      8 calls,  3,850 / 980 tokens,   $0.018000
  cloud     probe     1 call,     150 / 30  tokens,   $0.004200
  cloud     delegate  1 call,     250 / 80  tokens,   $0.001200
  fast      main     14 calls,  8,200 / 2,100 tokens, $0       (local)

Implementation: pure Lua iteration over ctx.usage_totals; no broker calls. Sort flattens into a list, sorts via table.sort with explicit 3-level comparator: cost desc, model asc, category asc.
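
One way the flatten-sort-format step could look, as a sketch only. The helper names (commas, detail_rows, format_row) and the column widths are hypothetical; the comparator implements the three-level key from R7 and the annotation reads is_local per R6.

```lua
-- Hypothetical helpers; names and column widths are illustrative only.
local function commas(n)
    -- 12450 -> "12,450": group digits from the right.
    local reversed = ("%d"):format(n):reverse():gsub("(%d%d%d)", "%1,")
    return (reversed:reverse():gsub("^,", ""))
end

local function detail_rows(usage_totals)
    local rows = {}
    for model, cats in pairs(usage_totals) do
        for category, c in pairs(cats) do
            rows[#rows + 1] = { model = model, category = category, slot = c }
        end
    end
    -- table.sort is not stable, so ties are broken explicitly:
    -- cost desc, then model asc, then category asc.
    table.sort(rows, function(a, b)
        if a.slot.cost ~= b.slot.cost then return a.slot.cost > b.slot.cost end
        if a.model ~= b.model then return a.model < b.model end
        return a.category < b.category
    end)
    return rows
end

local function format_row(r)
    local c = r.slot
    -- Annotation comes from the is_local flag, not from cost == 0.
    local cost = c.is_local and "$0 (local)" or ("$%.6f"):format(c.cost)
    return ("%-8s %-10s %d %s, %s / %s tokens, %s"):format(
        r.model, r.category, c.calls, c.calls == 1 and "call" or "calls",
        commas(c.prompt), commas(c.completion), cost)
end

local totals = {
    fast  = { main = { prompt = 8200, completion = 2100, calls = 14,
                       cost = 0, is_local = true } },
    cloud = {
        main     = { prompt = 3850, completion = 980, calls = 8,
                     cost = 0.0180, is_local = false },
        delegate = { prompt = 250, completion = 80, calls = 1,
                     cost = 0.0012, is_local = false },
        probe    = { prompt = 150, completion = 30, calls = 1,
                     cost = 0.0042, is_local = false },
    },
}

local rows = detail_rows(totals)
for _, r in ipairs(rows) do print(format_row(r)) end
```

Since pairs iterates in no defined order, the explicit comparator is the only thing that makes the printed order repeatable.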


7. Pillar 4 — Warning thresholds

Config:

cost = {
    warn_at_dollars = 0.50,    -- emit once when cumulative cost crosses
    warn_at_tokens  = 100000,  -- emit once when cumulative tokens crosses
}

R5 centralizes the check inside a single _record_usage(model, cat, u) helper in repl.lua. This is the ONLY place that calls ctx:add_usage; safety.lua call sites route through it via the helpers.on_usage / opts.on_usage callback. Keeps context.lua decoupled from renderer (no module-coupling violation).

R4: two independent flags (one per threshold) — first-to-fire must NOT suppress the other.

-- repl.lua (sketch):
local function _record_usage(model, category, u)
    ctx:add_usage(model, category, u)
    if not (config.cost) then return end
    ctx.cost_warn_state = ctx.cost_warn_state or { dollars = false, tokens = false }
    local cw = ctx.cost_warn_state
    if config.cost.warn_at_dollars and not cw.dollars then
        local cost = ctx:total_cost()
        if cost >= config.cost.warn_at_dollars then
            -- R10: 6-decimal format for sub-cent visibility
            renderer.status(("session cost $%.6f has crossed warn_at_dollars=$%.6f")
                            :format(cost, config.cost.warn_at_dollars))
            cw.dollars = true
        end
    end
    if config.cost.warn_at_tokens and not cw.tokens then
        local p, c = ctx:total_tokens()
        if (p + c) >= config.cost.warn_at_tokens then
            renderer.status(("session tokens %d has crossed warn_at_tokens=%d")
                            :format(p + c, config.cost.warn_at_tokens))
            cw.tokens = true
        end
    end
end

One-shot per threshold per session. :cost reset clears both totals AND both warn flags atomically.
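
The independence of the two flags (R4) can be demonstrated with stubs. In this sketch ctx, config, renderer, and the threshold values are hypothetical stand-ins; record_usage follows the _record_usage sketch above.

```lua
-- Hypothetical stubs standing in for ctx, config and renderer.
local statuses = {}
local renderer = { status = function(msg) statuses[#statuses + 1] = msg end }
local config   = { cost = { warn_at_dollars = 0.50, warn_at_tokens = 1000 } }

local ctx = { cost = 0, prompt = 0, completion = 0 }
function ctx:add_usage(_, _, u)
    self.cost       = self.cost + (u.cost or 0)
    self.prompt     = self.prompt + (u.prompt_tokens or 0)
    self.completion = self.completion + (u.completion_tokens or 0)
end
function ctx:total_cost() return self.cost end
function ctx:total_tokens() return self.prompt, self.completion end

local function record_usage(model, category, u)
    ctx:add_usage(model, category, u)
    if not config.cost then return end
    ctx.cost_warn_state = ctx.cost_warn_state or { dollars = false, tokens = false }
    local cw = ctx.cost_warn_state
    if config.cost.warn_at_dollars and not cw.dollars then
        local cost = ctx:total_cost()
        if cost >= config.cost.warn_at_dollars then
            renderer.status(("session cost $%.6f has crossed warn_at_dollars=$%.6f")
                :format(cost, config.cost.warn_at_dollars))
            cw.dollars = true
        end
    end
    if config.cost.warn_at_tokens and not cw.tokens then
        local p, c = ctx:total_tokens()
        if (p + c) >= config.cost.warn_at_tokens then
            renderer.status(("session tokens %d has crossed warn_at_tokens=%d")
                :format(p + c, config.cost.warn_at_tokens))
            cw.tokens = true
        end
    end
end

-- Call 1 crosses the token threshold only (1100 tokens, $0.10).
record_usage("cloud", "main", { prompt_tokens = 900, completion_tokens = 200, cost = 0.10 })
-- Call 2 crosses the dollar threshold ($0.55 cumulative).
record_usage("cloud", "main", { prompt_tokens = 100, completion_tokens = 50, cost = 0.45 })
-- Call 3 crosses nothing new; both flags are already set.
record_usage("cloud", "main", { prompt_tokens = 100, completion_tokens = 50, cost = 0.45 })
```

The token warning firing first does not suppress the later dollar warning, and the third call stays silent.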


8. UX Surface Summary

| Meta | Behavior |
|---|---|
| :cost | One-line summary: calls / tokens / cost |
| :cost detail | Per-model + per-category breakdown |
| :cost reset | Zero out totals + clear warn-fired flag |

| Config | Default | Effect |
|---|---|---|
| cfg.cost.warn_at_dollars | nil | Status when cumulative cost first crosses this dollar amount |
| cfg.cost.warn_at_tokens | nil | Status when cumulative total tokens first crosses |
| (broker opts.include_usage) | true | Adds stream_options.include_usage = true to outbound request |

R9 boundary note: :resume <name> reloads turns for conversation continuity but does NOT reconstruct ctx.usage_totals from the per-turn usage fields stored in the session JSONL. After :resume, the cost meter starts fresh from zero for the resumed session's live calls. The historical usage IS in the JSONL for after-the-fact scripting; cross-session aggregation is Q-C2 deferred work.


9. Out of Scope (Phase 7)

  • Cross-session cost persistence — Q-C2 defers <history.dir>/cost.jsonl rollup; v1 is session-only. Per-turn usage IS in the session JSONL for after-the-fact aggregation if anyone wants to script it.
  • Per-model rate limiting / cost caps that REFUSE the call — v1 only warns. A future phase could add a hard cap that aborts before the broker call.
  • Pricing-table fallback for local models: if a local model doesn't emit usage.cost, the slot is flagged is_local and contributes $0 to the total (R6). Estimating cost from token count + a static pricing table is a future polish (most users won't care about local "cost" anyway, since local is free).
  • Pretty token-bandwidth charts / sparklines — out of scope; the detail breakdown is text-only.
  • Estimated cost for future turns — no preflight cost prediction.
  • MCP tool-call usage — MCP servers don't expose token usage; broker calls invoked DURING MCP tool dispatch ARE captured (because they go through the same path), but the MCP tool call itself isn't.

10. Risks

| Risk | Mitigation |
|---|---|
| Some providers reject stream_options -> SSE errors at the top of the stream | opts.include_usage = false opt-out per call site; baseline-time probe of the actual hossenfelder broker behavior |
| OpenRouter cost field shape varies between providers (Bedrock vs. Baidu vs. Together vs. ...) | Capture usage.cost as-is (number); document that the same provider must be used for cross-call comparison |
| Local llama.cpp returns no cost -> displayed $0 could mislead user "is this REALLY free?" | :cost detail annotates local lines with (local) literal; summary says cost=$X (cloud only; local: 0) |
| ctx.usage_totals grows unboundedly with new model names mid-session | Bounded by #models in config × #categories, both small constants. No mitigation needed. |
| Warn threshold fires once and never again for a long-running session that crosses 2x / 10x the threshold | Acceptable for v1; user can :cost reset to re-arm. Future polish: warn at each Nx multiple. |
| R8: call_broker fallback retry passes opts.include_usage unchanged | Documented assumption: B1 confirmed both backends accept the flag. If a future fallback host rejects, the call-site that knows can pass opts.include_usage = false explicitly. |

11. Open Questions (Phase 7)

# Question Impact Resolution target
Q-C1 Provider-without-usage handling A10 — defensive silent skip; baseline probe will confirm shape on local llama.cpp.
Q-C2 Cross-session cost persistence (cost.jsonl) Deferred to follow-up phase 8; v1 is session-only.
Q-C3 Categories closed-set vs free-form A4 — free-form; caller decides. Matches Phase 6 helpers/skills convention.
Q-C4 stream_options forwarding by hossenfelder B1 RESOLVED — both backends accept; flag is REQUIRED for local llama.cpp, no-op for cloud. Default-true is correct.
Q-C5 Warn fires on the crossed call or the next A5 — on the crossed call (no UX-defeating delay).
Q-C6 :reset clears cost_warn_fired A6 — no, only :cost reset clears the flag (R8 parity).

12. Phase 7 → Phase 8+ Out-of-band

Candidate follow-ups (non-binding):

  • Phase 8: cross-session cost persistence (Q-C2 deferral), with optional cost dashboards / weekly rollup reporter.
  • Hard rate limits / cost caps that REFUSE the call — an extension of the warn surface that promotes warnings into preflight enforcement.
  • Better tokenization (Q1 deferred-from-Phase-3): replace the char/4 heuristic on Context:estimate_tokens() with model /tokenize calls. Indirectly improves accuracy of any future "preflight cost predictor".

Phase 7 itself is self-contained — no upstream dependencies.


13. Implementation Plan (commit-by-commit)

Bottom-up; broker first (it's the egress point that all callers depend on), then context (the accumulator), then the call-site rewires, then the user-facing meta + warn surface, then config + status bump. Each commit leaves the tree green (existing tests + load smoke + per-commit feature smoke).

Order

  1. broker.lua — usage capture + signature widening.

    • build_request(model_cfg, messages, stream, opts) widened to take an opts table; opts.tools / opts.max_tokens fold in from the existing positional args.
    • R3: TWO internal callers of build_request exist inside broker.lua itself (M.chat_stream at line 65-66 and indirectly via M.chat). Both must be updated in this commit; the migration is CONTAINED but not zero-touch. "Every caller already passes opts" refers to the public surface — internal build_request was positional.
    • opts.include_usage (default true) adds stream_options.include_usage = true to the request body (per B1, required for local).
    • M.chat_stream event loop adds if doc.usage then final_usage = doc.usage end; after curl.post_sse returns, if final_usage is set, on_delta("usage", payload) is called. Payload includes model = model_cfg.model (caller-stable per B4 + R2), the raw token counts, and cost as a number (nil for local per B3).
    • opts.category passthrough — the broker just echoes it into the emitted usage payload; doesn't validate (per A4 free-form).
    • R1: M.chat (non-streaming wrapper) MUST capture usage in its internal on_delta and return (text, usage). Without this, the four categories that go through broker.chat (summarize, delegate, memory_summarize, probe) silently report zero usage. §4 shows the explicit update.
    • Smoke: hand-build a request with stream_options, capture all three on_delta kinds (text, tool_call when applicable, usage), and confirm the usage payload matches what curl shows. Also smoke that broker.chat(...) returns non-nil usage for cloud calls.
  2. context.lua — accumulator + helpers.

    • Context.new: self.usage_totals = {} + self.cost_warn_state = { dollars = false, tokens = false } (per-threshold flags, R4).
    • Context:add_usage(model, category, usage) — increments usage_totals[model][category] slots.
    • Context:total_cost() — sums all cost fields across all models/categories.
    • Context:total_tokens() — sums prompt + completion separately.
    • Context:reset does NOT touch usage_totals or cost_warn_state (R8 parity with memory_items and project).
    • Smoke: 4-case inline test of add_usage / totals / reset preservation.
  3. repl.lua — wire opts.category + on_delta("usage") at non-Norris call sites. N3: depends on commit 1's R1 M.chat fix shipping first. This commit's "capture the second return value" pattern only works after M.chat actually returns one.

    • _record_usage(model, category, usage) helper (R5) — the single chokepoint that wraps ctx:add_usage AND does the warn check. Replaces all direct ctx:add_usage(...) invocations in repl.lua.
    • call_broker wrapper (used by ask_ai): pass opts.category = "main"; the wrapped on_delta handles kind == "usage" by calling _record_usage(payload.model, payload.category, payload) — keys by payload.model per R2 (handles fallback retry correctly without tracking primary-vs-fallback at the wrapper).
    • DELEGATE: handler: opts.category = "delegate"; capture second return value from broker.chat and feed to _record_usage.
    • :delegate meta: opts.category = "delegate"; same.
    • summarize-on-evict callback: opts.category = "summarize"; same.
    • :memory summarize: opts.category = "memory_summarize"; same.
    • Smoke: send one cloud prompt, observe ctx.usage_totals grows; also smoke the fallback path with a deliberately-broken primary and confirm usage credits the fallback model name (R2 verification).
  4. safety.lua — opts.category for Norris + probe.

    • safety.norris_step's broker.chat_stream call: pass opts.category = "norris". The on_delta wrapper inside safety.lua was already widened (post-#52) to handle kind == "text" (rehydration); it now also handles kind == "usage" by calling helpers.on_usage(payload.model, payload.category, payload). R5: helpers.on_usage IS repl.lua's _record_usage.
    • N4 signature chain widening: llm_probe, llm_second_opinion, and M.is_destructive all widen to thread opts.on_usage through:
      • llm_probe(model_cfg, system, cmd, opts) — pass opts.category = "probe" to broker.chat; on the (text, usage) return, if opts.on_usage AND usage, call opts.on_usage(usage.model, usage.category, usage).
      • llm_second_opinion(cmd, cfg, opts) — pass opts through to both llm_probe calls (probe 1 + probe 2 re-roll).
      • M.is_destructive(cmd, cfg, opts) — opts.on_usage already in the table from #52's scrub_msgs/rehydrate addition; threads through naturally.
    • Smoke: a Norris session shows both "norris" and "probe" category entries in :cost detail; the probe model is named correctly (e.g. "cloud" if safety.llm_model = "cloud").
  5. repl.lua — :cost meta + warn-threshold + HELP.

    • :cost (summary), :cost detail (per-model+category breakdown), :cost reset (zero totals + clear both cost_warn_state flags).
    • Warn check (lives in _record_usage per R5): after every ctx:add_usage call, compare the running totals against cfg.cost.warn_at_dollars / warn_at_tokens; emit a one-shot status for each threshold that is crossed while its flag in cost_warn_state is still false (R4: .dollars and .tokens are independent).
    • HELP gains 3 lines for :cost.
    • Smoke: :cost shows totals; :cost detail breaks down; warn fires once when threshold crossed; :cost reset re-arms.
  6. config.lua example block + docs/PHASE7.md status bump.

    • Commented-out cost = { warn_at_dollars = 0.50, warn_at_tokens = 100000 } block in config.lua.
    • N5: PHASE0.md §11 amendment is already in tree (committed at 3bad07b with the formulate doc). Commit 6 must NOT re-apply.
    • PHASE7.md status header → Implement (matches Phase 5/6 cadence — manifest tracks implementation state).
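
The broker changes in commit 1 (opts widening, include_usage, the usage event, and the R1 fix to M.chat) can be sketched roughly as below. This is a minimal illustration, not the shipped broker.lua: fake_post_sse is a hypothetical stand-in for curl.post_sse, the argument order of M.chat_stream and M.chat is assumed, the usage field names (prompt_tokens, completion_tokens, cost) are assumed to follow the OpenAI-style payload, and gating stream_options on stream == true is also an assumption.

```lua
local M = {}

-- Widened signature: tools / max_tokens / include_usage all arrive via opts.
local function build_request(model_cfg, messages, stream, opts)
  opts = opts or {}
  local body = {
    model = model_cfg.model,
    messages = messages,
    stream = stream,
    tools = opts.tools,
    max_tokens = opts.max_tokens,
  }
  -- include_usage defaults to true; only an explicit false disables it.
  -- Assumption: the flag is only attached to streaming requests.
  if stream and opts.include_usage ~= false then
    body.stream_options = { include_usage = true }
  end
  return body
end

-- Hypothetical transport stub: replays three decoded SSE documents.
local function fake_post_sse(_body, on_doc)
  on_doc({ choices = { { delta = { content = "hel" } } } })
  on_doc({ choices = { { delta = { content = "lo" } } } })
  on_doc({ choices = {}, usage = { prompt_tokens = 12, completion_tokens = 3, cost = 0.0004 } })
end

function M.chat_stream(model_cfg, messages, on_delta, opts)
  opts = opts or {}
  local body = build_request(model_cfg, messages, true, opts)
  local final_usage
  fake_post_sse(body, function(doc)
    local choice = doc.choices and doc.choices[1]
    local text = choice and choice.delta and choice.delta.content
    if text then on_delta("text", text) end
    if doc.usage then final_usage = doc.usage end
  end)
  -- Emit the usage event once, after the stream has finished.
  if final_usage then
    on_delta("usage", {
      model = model_cfg.model,      -- caller-stable name (B4 + R2)
      category = opts.category,     -- echoed, not validated (A4)
      prompt_tokens = final_usage.prompt_tokens,
      completion_tokens = final_usage.completion_tokens,
      cost = final_usage.cost,      -- nil for local backends (B3)
    })
  end
end

-- R1: the non-streaming wrapper captures "usage" as well as "text".
function M.chat(model_cfg, messages, opts)
  local parts, usage = {}, nil
  M.chat_stream(model_cfg, messages, function(kind, payload)
    if kind == "text" then
      parts[#parts + 1] = payload
    elseif kind == "usage" then
      usage = payload
    end
  end, opts)
  return table.concat(parts), usage
end

local text, usage = M.chat({ model = "cloud" }, {}, { category = "summarize" })
```

With the stub above, text is the concatenated deltas and usage carries the model name and category that the caller passed in, which is what the repl-side wrappers in commit 3 rely on.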

Risk index per commit

| Commit | Risk | Mitigation |
|---|---|---|
| 1 (broker) | R3: build_request has TWO INTERNAL callers in broker.lua; both must be updated in this commit | Explicit in commit-1 note above; grep build_request\( to confirm |
| 1 (broker) | R1: M.chat must capture usage in on_delta and return (text, usage) | §4 shows the explicit M.chat update; smoke test verifies non-nil usage on cloud call |
| 1 (broker) | M.chat second return value confuses callers that do local r = broker.chat(...) discarding the second | Lua doesn't error on dropped return values; backward-compat preserved automatically |
| 2 (context) | usage_totals nil on old ctx serializations | Defensive self.usage_totals = self.usage_totals or {} in add_usage; no migration needed |
| 3 (repl wires) | Forgetting one call site = silent under-count | Lint by grep for broker.chat\( and broker.chat_stream\( after the wire commit; ensure each is tagged with opts.category |
| 3 (repl wires) | R2: fallback retry credits usage to wrong model | Wrapped on_delta keys by payload.model (set inside broker per R2), NOT by outer model_name; smoke a deliberately-broken-primary case |
| 4 (safety wires) | safety.lua must NOT introduce new module dep | Use helpers.on_usage callback convention (matches #52's scrub_msgs) |
| 4 (safety wires) | N4: llm_probe → llm_second_opinion → is_destructive signature chain widening | Spelled out in commit-4 note above |
| 5 (:cost + warn) | Warn fires multiple times when threshold is much exceeded by one call | Per-threshold one-shot flag in ctx.cost_warn_state; explicit :cost reset to re-arm both |
| 5 (:cost + warn) | R4: single shared flag covers two thresholds | RESOLVED: split into cost_warn_state.dollars + .tokens |
| 6 (config + status) | N5: PHASE0 §11 already amended at 3bad07b | This commit does NOT re-apply the amendment |
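
The mitigations listed for commits 2 and 5 (defensive usage_totals initialisation, per-threshold one-shot flags) might look roughly like the sketch below. It is an illustration under assumptions, not the actual context.lua or repl.lua: record_usage mirrors the role of _record_usage but takes ctx and the cost config as explicit arguments, the counter field names are assumed, the token threshold is assumed to compare against prompt + completion, and ">=" is assumed to count as "crossed".

```lua
local Context = {}
Context.__index = Context

function Context.new()
  return setmetatable({
    usage_totals = {},
    cost_warn_state = { dollars = false, tokens = false },  -- R4: independent flags
  }, Context)
end

-- Flat model -> category -> counters accumulator.
function Context:add_usage(model, category, usage)
  self.usage_totals = self.usage_totals or {}  -- defensive: old ctx may lack the field
  local by_model = self.usage_totals[model] or {}
  self.usage_totals[model] = by_model
  local slot = by_model[category]
    or { prompt_tokens = 0, completion_tokens = 0, cost = 0 }
  by_model[category] = slot
  slot.prompt_tokens = slot.prompt_tokens + (usage.prompt_tokens or 0)
  slot.completion_tokens = slot.completion_tokens + (usage.completion_tokens or 0)
  slot.cost = slot.cost + (usage.cost or 0)    -- cost is nil for local models
end

function Context:total_cost()
  local sum = 0
  for _, cats in pairs(self.usage_totals or {}) do
    for _, slot in pairs(cats) do sum = sum + slot.cost end
  end
  return sum
end

-- Returns prompt and completion totals separately.
function Context:total_tokens()
  local prompt, completion = 0, 0
  for _, cats in pairs(self.usage_totals or {}) do
    for _, slot in pairs(cats) do
      prompt = prompt + slot.prompt_tokens
      completion = completion + slot.completion_tokens
    end
  end
  return prompt, completion
end

-- Hypothetical stand-in for _record_usage: accumulate, then run the warn check.
-- Returns the names of the thresholds that fired on THIS call.
local function record_usage(ctx, cost_cfg, model, category, usage)
  ctx:add_usage(model, category, usage)
  local fired, state = {}, ctx.cost_warn_state
  local prompt, completion = ctx:total_tokens()
  if cost_cfg.warn_at_dollars and not state.dollars
     and ctx:total_cost() >= cost_cfg.warn_at_dollars then
    state.dollars = true
    fired[#fired + 1] = "dollars"
  end
  if cost_cfg.warn_at_tokens and not state.tokens
     and prompt + completion >= cost_cfg.warn_at_tokens then
    state.tokens = true
    fired[#fired + 1] = "tokens"
  end
  return fired
end

local ctx = Context.new()
local cost_cfg = { warn_at_dollars = 0.50, warn_at_tokens = 100000 }
local first = record_usage(ctx, cost_cfg, "cloud", "main",
  { prompt_tokens = 1000, completion_tokens = 200, cost = 0.60 })
local second = record_usage(ctx, cost_cfg, "cloud", "main",
  { prompt_tokens = 10, completion_tokens = 5, cost = 0.10 })
```

In this run the dollar threshold fires on the call that crosses it (Q-C5) and stays quiet on the next call, while the token flag remains armed because the token threshold has not been reached.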

Tests + smoke per commit

Each commit:

  • Pass luajit test_safety.lua (87/87) and luajit test_router_model.lua (31/31)
  • Load cleanly via luajit -e 'package.path=...; require("repl"); print("ok")'
  • Pass a per-feature smoke (described in each row above)

Things deliberately NOT split

  • broker.chat backward-compat shim: Lua's multiple-return-values semantics handle it automatically (an existing local r = broker.chat(...) drops the new usage value).
  • Per-category sub-tables — flat model -> category -> counters is simple enough; nesting deeper for e.g. timestamps is v2.
  • Cross-session persistence — explicitly Q-C2 deferred to phase 8.
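
The first two points above can be illustrated with a few lines of plain Lua. The chat function here is a hypothetical stub standing in for broker.chat after the R1 fix, and the counter field names are assumed.

```lua
-- Hypothetical stub: returns (text, usage) like the widened broker.chat.
local function chat()
  return "hello", { model = "cloud", prompt_tokens = 12, completion_tokens = 3 }
end

-- Existing caller style: the extra return value is silently discarded.
local r = chat()

-- Phase 7 caller style: the second value is captured.
local text, usage = chat()

-- Flat model -> category -> counters shape (no deeper nesting in v1).
local usage_totals = {
  cloud = {
    main = {
      prompt_tokens = usage.prompt_tokens,
      completion_tokens = usage.completion_tokens,
      cost = 0,
    },
  },
}

-- A call in the LAST position of an argument or table list expands to all
-- of its results; wrapping it in parentheses truncates to the first one.
local expanded = select("#", chat())
local truncated = select("#", (chat()))
```

One caveat worth keeping in mind: the automatic drop applies to assignments like local r = ..., but a caller that passes broker.chat(...) as the last argument of another call would now forward two values. Wrapping the call in parentheses restores the single-value behaviour.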

Open at plan-time (resolve at implement)

  • Whether safety.is_destructive's opts should carry on_usage callback explicitly OR thread through cfg.helpers (the latter matches the Norris helpers convention but is more coupling). Decide at commit 4. Default to explicit opts.on_usage for minimum surface.
  • Whether to emit a [aish] usage: model=X prompt=N completion=M cost=$X status line PER TURN (verbose mode) or only via :cost on demand. v1 = on demand only; verbose mode is a follow-up nice-to-have.
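
For the first open item, the "explicit opts.on_usage" default could be threaded through the probe chain roughly as follows. This is a sketch under assumptions: broker.chat is stubbed, the cfg fields (model_cfg, system) are hypothetical, both probes are run unconditionally, and the verdict logic is a placeholder rather than the real safety.lua decision.

```lua
-- Hypothetical stub for broker.chat: returns (text, usage).
local broker = {}
function broker.chat(model_cfg, _messages, opts)
  return "SAFE", {
    model = model_cfg.model,
    category = opts.category,
    prompt_tokens = 40,
    completion_tokens = 1,
  }
end

local function llm_probe(model_cfg, system, cmd, opts)
  opts = opts or {}
  local text, usage = broker.chat(model_cfg, {
    { role = "system", content = system },
    { role = "user", content = cmd },
  }, { category = "probe" })
  -- Report usage only when the caller supplied a callback AND usage exists.
  if opts.on_usage and usage then
    opts.on_usage(usage.model, usage.category, usage)
  end
  return text
end

local function llm_second_opinion(cmd, cfg, opts)
  -- Same opts goes to both probes (probe 1 + probe 2 re-roll).
  local first = llm_probe(cfg.model_cfg, cfg.system, cmd, opts)
  local second = llm_probe(cfg.model_cfg, cfg.system, cmd, opts)
  -- Placeholder verdict: destructive unless both probes say SAFE.
  return not (first == "SAFE" and second == "SAFE")
end

local function is_destructive(cmd, cfg, opts)
  return llm_second_opinion(cmd, cfg, opts)
end

-- Usage: the repl side passes its recorder as on_usage.
local seen = {}
local cfg = { model_cfg = { model = "cloud" }, system = "classify this command" }
local verdict = is_destructive("ls -la", cfg, {
  on_usage = function(model, category, _usage)
    seen[#seen + 1] = model .. "/" .. category
  end,
})

-- Callers that pass no opts keep working; usage is simply not reported.
local verdict_no_opts = is_destructive("ls -la", cfg)
```

Keeping the callback in opts means safety.lua never requires repl.lua, which is the constraint the commit-4 risk row calls out.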