docs/PHASE7: review fold-in — 3 BLOCKERs + 6 CONCERNs + 5 NITs

Sonnet-reviewed (per the reviews-use-sonnet feedback memory). BLOCKERs (RESOLVED in-place): R1. M.chat would silently return (text, nil) for ALL non-streaming callers — 4 of 5 categories (summarize/delegate/memory_summarize/ probe) flow through broker.chat, NOT chat_stream. §4 now shows the explicit M.chat update that captures kind=="usage" alongside "text" and returns (text, usage). R2. call_broker fallback retry would credit usage to the wrong model name. Fix: broker emits payload.model = model_cfg.model (which IS the fallback's name when called with fb_cfg — chat_stream's upvar). Wrapper keys by payload.model, NOT outer model_name. §4 + §13 commit 3 reflect. R3. build_request has TWO internal callers inside broker.lua itself, not just the public surface. Plan §13 commit 1 risk row now spells this out explicitly so the implementer doesn't read "every caller already passes opts" as "external-only". CONCERNs (FOLDED): R4. Single cost_warn_fired flag covers two thresholds — first-to-fire suppresses the other. Split into ctx.cost_warn_state = { dollars = false, tokens = false }; :cost reset clears both. §7 + §13. R5. Warn-check centralization — single _record_usage helper in repl.lua wraps ctx:add_usage AND does threshold check. safety.lua routes via helpers.on_usage / opts.on_usage callbacks. context.lua stays decoupled from renderer. R6. Preserve nil-vs-0 cost distinction. Accumulator slot gains `is_local = true` (sticky) when ANY recorded usage had cost==nil. `:cost detail` annotation comes from is_local flag, not a fragile cost==0 heuristic. R7. :cost detail sort needs 3-level deterministic key: (cost desc, model asc, category asc) — table.sort is unstable. R8. call_broker fallback passes opts.include_usage unchanged. Documented as known assumption (B1 confirms both backends accept; future-broken fallback can pass include_usage=false). R9. :resume does NOT restore historical usage_totals. Per-turn usage IS in session JSONL for scripting; cross-session aggregation is Q-C2 deferred. Documented in §8. R10. $%.4f loses sub-cent precision (cloud cost 0.000028 -> $0.0000). Widened to $%.6f in §6 + §7 warn message format. NITs (APPLIED): N1. §4 pseudocode comment notes `if doc.usage` branch is independent of choice branch (handles both B2 emission shapes). N2. §2 stale "B7" reference corrected to B3. N3. §13 commit 3 row gains explicit dependency note on commit 1's R1. N4. §13 commit 4 spells out llm_probe -> llm_second_opinion -> M.is_destructive signature chain widening. N5. §3 + §13 commit 6 — PHASE0 §11 amendment already in tree (3bad07b); commit 6 must NOT re-apply. PHASE7.md now 803 lines (was 528 after plan). +275/-57. Ready for implementation phase pending user gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:55:59 +00:00
parent 0f14dc1727
commit d4c20f09df
1 changed files with 275 additions and 57 deletions
@@ -2,9 +2,106 @@

 **Project:** aish — AI-augmented conversational shell
 **Document:** Phase 7 Requirements, Architecture & Design Decisions
-**Status:** Plan (formulate + analyze + baseline complete; tree at `2244a3f`)
+**Status:** Plan + review fold-in (tree at `0f14dc1`)
 **Date:** 2026-05-16

+**Review findings (independent Sonnet agent, 2026-05-16) — 3 BLOCKERs
+resolved in-place, 6 CONCERNs folded, 5 NITs applied:**
+
+R1 (BLOCKER, RESOLVED). **`M.chat` would silently return `(text, nil)`
+    for ALL non-streaming callers.** `M.chat`'s internal on_delta only
+    captures `kind == "text"`. Without explicit handling of
+    `kind == "usage"`, four out of five categories that go through
+    `broker.chat` (summarize / delegate / memory_summarize / probe)
+    would report zero usage even after a cloud round-trip. **Fix
+    folded into §4 + §13 commit 1:** M.chat's on_delta also captures
+    the usage payload and returns it as the second value.
+
+R2 (BLOCKER, RESOLVED). **`call_broker` fallback retry — usage
+    payload's `model` field credits the WRONG model name.** The
+    `wrapped` on_delta in call_broker is closed over the PRIMARY's
+    name; if the wrapped function uses an outer-scope `model_name`
+    variable to key the accumulator, the fallback's usage gets
+    misattributed. **Resolution:** the broker emits `payload.model =
+    model_cfg.model` (which IS the fallback's model when called with
+    `fb_cfg` — chat_stream's local upvar). The wrapper keys by
+    `payload.model`, NOT by the outer `model_name`. Documented in
+    §4 emission code + §13 commit 3 (wrapped on_delta uses
+    `payload.model` for accumulator keying).
+
+R3 (BLOCKER, RESOLVED — promoted to docs). **`build_request` has
+    TWO internal callers inside broker.lua itself**, not just the
+    public surface. Migration is contained but both internal sites
+    must be updated in commit 1. Plan §13 commit 1 risk row updated
+    to call this out explicitly so the implementer doesn't read
+    "every caller already passes opts" as "only external callers
+    need touching".
+
+R4 (CONCERN, FOLDED). **Single `cost_warn_fired` flag for two
+    thresholds is broken.** When both warn_at_dollars AND
+    warn_at_tokens are configured, the first-to-fire suppresses the
+    other. **Fix:** `ctx.cost_warn_fired` becomes `ctx.cost_warn_state
+    = { dollars = false, tokens = false }`. Each threshold has its
+    own flag; `:cost reset` clears both. §7 pseudocode updated.
+
+R5 (CONCERN, FOLDED). **Warn-check centralization decided:** use a
+    single `_record_usage(model, category, usage)` helper inside
+    repl.lua that wraps `ctx:add_usage` AND does the threshold check
+    AND calls renderer.status when crossed. `context.lua` stays
+    decoupled from `renderer`. safety.lua call sites get
+    `helpers.on_usage = _record_usage` in the helpers table; probe
+    callsite gets `opts.on_usage = _record_usage`. Single chokepoint
+    for the warn check. §3 + §7 + §13 commits 3-5 reflect.
+
+R6 (CONCERN, FOLDED). **`nil` vs `0` cost distinction must be
+    preserved at the accumulator level.** Local-model `$0` (no cost
+    field) vs cloud-call-that-happens-to-cost-zero need to be
+    distinguishable for `:cost detail` annotation. **Fix:** accumulator
+    slot gains `is_local = true` when ANY recorded usage for that
+    slot had `cost == nil`. Cloud calls with `cost = 0` (rare) stay
+    annotated as cloud. §5 pseudocode + §6 annotation logic updated.
+
+R7 (CONCERN, FOLDED). **`:cost detail` sort needs three-level key
+    for determinism.** Lua's `table.sort` is unstable; equal-cost
+    rows would have arbitrary order. **Fix:** sort key is
+    `(cost desc, model asc, category asc)`. §6 updated.
+
+R8 (CONCERN, FOLDED). **`call_broker` fallback passes `opts.include_usage`
+    unchanged.** Documented as a known assumption (B1 confirms both
+    backends accept; if a future fallback host rejects, the call-site
+    can pass `include_usage = false` explicitly). §10 risk row added.
+
+R9 (CONCERN, FOLDED). **`:resume` does NOT restore historical
+    `usage_totals`.** Per-turn usage IS in the session JSONL but
+    `:resume` reloads turns for conversation continuity only; the
+    accumulator stays empty. Documented in §8 surface notes; users
+    who want cross-session totals can script the jsonl or wait for
+    the deferred Q-C2 follow-up.
+
+R10 (CONCERN, FOLDED). **`$%.4f` loses sub-cent precision.** A
+     `0.000028` cloud cost displays as `$0.0000` — indistinguishable
+     from `$0` local. **Fix:** format strings widened to `$%.6f` in
+     §6 (and the warn message in §7). 6 decimal places accommodates
+     the smallest observed real cost.
+
+R-N1..N5 (NITs, APPLIED):
+
+  N1. §4 extraction pseudocode gains a comment noting the
+      `if doc.usage` branch is INDEPENDENT of the choice branch and
+      must be checked regardless of choice nil-ness (handles both
+      B2 emission shapes).
+  N2. §2 "Cost extraction" row referenced stale "B7"; corrected to B3.
+  N3. §13 commit 3 row gains an explicit dependency note: commit 3's
+      "capture the new second return value" requires commit 1's M.chat
+      fix from R1 to ship first.
+  N4. §3 safety.lua row + §13 commit 4 row spell out the signature
+      chain: `llm_probe` → `llm_second_opinion` → `M.is_destructive`
+      all widen to thread `opts.on_usage` through.
+  N5. §3 PHASE0.md row + §13 commit 6 row — the PHASE0 §11 amendment
+      is ALREADY in tree (committed at `3bad07b` with the formulate
+      doc). Commit 6 should NOT re-apply; only adds config.lua block
+      + bumps PHASE7 status header.
+
 **Analyze findings (2026-05-16):**

 A1. **broker.chat_stream surface is clean for the extension.** The
@@ -168,7 +265,7 @@ Four pillars:
 | Where to enable usage on the wire | `opts.include_usage = true` (default `true`) sets `stream_options.include_usage = true` in the outbound request body | Off-switch for hosts that reject `stream_options`. Defaults on; baseline probe confirms current broker tolerates it. (A3: `build_request` signature widens to take an `opts` table; positional growth was getting unwieldy.) |
 | Accumulator location | `ctx.usage_totals[model_name][category]` table | ctx is per-conversation; matches the `:reset`-survives-or-not rules already in place. |
 | Categories | `"main"` (ask_ai), `"delegate"`, `"summarize"`, `"memory_summarize"`, `"probe"`, `"norris"` | One-tag-per-call-site. Tagged at the caller site (caller passes `opts.category` to `broker.chat_stream`). |
-| Cost extraction | `usage.cost` (OpenRouter convention; dollars as a number) plus `usage.cost_details.upstream_inference_cost` (more detailed). For Anthropic/Bedrock the cost arrives in dollars on `usage.cost`. For pure local llama.cpp: no `cost` field — record 0. | Single field name across all observed providers (per baseline B7 — to be confirmed). |
+| Cost extraction | `usage.cost` (OpenRouter convention; dollars as a number). For Anthropic/Bedrock the cost arrives in dollars on `usage.cost`. For pure local llama.cpp: no `cost` field — record as nil (R6 — preserves the local-vs-cloud-zero distinction in the accumulator). | Single field name across observed providers per baseline B3. |
 | Cost precision | Store as `number` (Lua double = 53-bit mantissa, ~15 decimal digits — plenty for sub-cent precision) | No floating-point cumulative-error concerns at this scale. |
 | Warning trigger | First crossing of either threshold emits a single status: `[aish] session cost $X.XXXX has crossed warn_at_dollars=$Y.YYYY`. Crossed-flag stored on ctx; reset only on session end / `:cost reset`. | One-shot to avoid spamming. |
 | `:reset` interaction | `:reset` does NOT clear `ctx.usage_totals` (parity with `memory_items`/`project`) — the user reset their conversation, not their cost tracking. `:cost reset` is the explicit reset verb. | Matches R8 invariant from Phase 6. |
@@ -186,7 +283,7 @@ Four pillars:
 | `safety.lua` | norris_step + is_destructive | Pass `opts.category = "norris"` (for the main chat_stream call) and `"probe"` (for the is_destructive LLM probe). Surfaces probe-cost in the breakdown — useful since `safety.llm_model = "cloud"` is the recommended setting. |
 | `history.lua` | session.log_turn appends JSONL entries | log_turn already takes turn opaquely; assistant turns will carry `usage` if present and it'll serialize via dkjson. No code change unless filter desired. |
 | `config.lua` | example blocks for mcp/safety/memory/routing/secrets/hooks/project | Add commented-out `cost = { warn_at_dollars, warn_at_tokens }` block. |
-| `docs/PHASE0.md` | §11 lists phases 0-6 | **Amendment**: add Phase 7 row to §11. |
+| `docs/PHASE0.md` | §11 lists phases 0-6 | Amendment landed at `3bad07b` (formulate commit). N5: commit 6 does NOT re-apply. |

 No new module files.

@@ -222,26 +319,69 @@ local final_usage = nil

 local function on_event(data)
    ...
+    -- N1: this branch is INDEPENDENT of the choice branch below;
+    -- check unconditionally. Per B2, local emits usage on a
+    -- choices=[] chunk (choice nil); cloud emits on a non-empty
+    -- choices chunk (with finish_reason). Both shapes funnel here.
    if doc.usage then
-        -- Provider sent usage; capture for emission after the stream.
+        -- R2: payload.model is ALWAYS the caller-stable model_cfg.model
+        -- (chat_stream's local upvar). When called via call_broker's
+        -- fallback retry, this naturally reflects the fallback's
+        -- model name — wrapper callers can key by payload.model
+        -- without tracking primary-vs-fallback themselves.
        final_usage = {
            prompt_tokens     = doc.usage.prompt_tokens or 0,
            completion_tokens = doc.usage.completion_tokens or 0,
            total_tokens      = doc.usage.total_tokens or 0,
+            -- R6: keep nil-vs-0 distinction at this layer; the
+            -- accumulator decides how to tag local-vs-cloud-zero.
            cost              = doc.usage.cost,   -- nil for local
-            model             = doc.model or model_cfg.model,
+            model             = model_cfg.model,  -- caller-stable per B4
+            category          = opts.category or "main",
        }
        -- Don't emit yet — the [DONE] event marks stream end; emit
        -- once we exit the curl.post_sse loop so the caller sees
        -- usage as the LAST event in the stream order.
    end
-    -- ... existing text + tool_call handling ...
+    -- ... existing text + tool_call handling (unchanged) ...
 end

-- After curl.post_sse returns (stream complete):
+-- After curl.post_sse returns (stream complete). R3-related:
+-- only emit on successful streams; transport / api errors skip
+-- the usage event (caller sees the error path and accumulator
+-- stays unchanged).
+if api_err then return nil, "api: " .. api_err end
+if not ok    then return nil, "transport: " .. tostring(err) end
 if final_usage then on_delta("usage", final_usage) end
+return true
 ```

+### `M.chat` capture (R1 — BLOCKER fix)
+
+`M.chat` is the non-streaming buffering wrapper. Its existing on_delta
+only captured text. Under Phase 7 it MUST also capture the usage
+payload — otherwise EVERY non-streaming caller (summarize, delegate,
+memory_summarize, probe — 4 of 5 categories) silently reports zero.
+
+```lua
+function M.chat(model_cfg, messages, opts)
+    local parts        = {}
+    local captured_usage  -- R1: required so M.chat returns (text, usage)
+    local ok, err = M.chat_stream(model_cfg, messages,
+        function(kind, payload)
+            if kind == "text"  then parts[#parts + 1] = payload
+            elseif kind == "usage" then captured_usage = payload
+            end
+        end, opts)
+    if not ok then return nil, err end
+    return table.concat(parts), captured_usage
+end
+```
+
+Existing callers that do `local r = broker.chat(...)` automatically
+drop the second value (Lua semantics). Callers that want usage do
+`local r, u = broker.chat(...)`.
+
 ### Outbound include_usage

 ```lua
@@ -290,11 +430,21 @@ function Context:add_usage(model, category, u)
    category = category or "main"
    self.usage_totals = self.usage_totals or {}
    local m = self.usage_totals[model] or {}
-    local c = m[category] or { prompt = 0, completion = 0, calls = 0, cost = 0 }
+    local c = m[category] or {
+        prompt = 0, completion = 0, calls = 0, cost = 0,
+        is_local = false,  -- R6: cloud unless any usage came w/o cost
+    }
    c.prompt     = c.prompt     + (u.prompt_tokens or 0)
    c.completion = c.completion + (u.completion_tokens or 0)
    c.calls      = c.calls      + 1
-    c.cost       = c.cost       + (u.cost or 0)
+    -- R6: preserve nil-vs-0 distinction. A `nil` cost means the
+    -- provider doesn't emit cost (i.e., local llama.cpp). Sticky:
+    -- once a slot has seen any nil-cost call, it's flagged is_local.
+    if u.cost == nil then
+        c.is_local = true
+    else
+        c.cost = c.cost + u.cost
+    end
    m[category] = c
    self.usage_totals[model] = m
 end
@@ -337,25 +487,28 @@ cost meter. `:cost reset` is the explicit reset verb for the meter.
 :cost reset                 zero out ctx.usage_totals + cost_warn_fired
 ```

-Summary format:
+Summary format (R10 — 6-decimal precision for sub-cent costs):

 ```
 [aish] session usage: 24 calls, prompt=12,450 / completion=3,210 tokens
-                       cost=$0.0234 (cloud only; local: 0)
+                       cost=$0.023400 (cloud only; local: 0)
 ```

-Detail format (sorted by total cost desc, then by model):
+Detail format (R7 — sort key is `(cost desc, model asc, category asc)`
+for deterministic ordering on equal-cost rows; R6 — annotation comes
+from the slot's `is_local` flag, NOT a `cost == 0` heuristic):

 ```
 [aish] session usage detail:
-  cloud     main      8 calls,  3,850 / 980 tokens,   $0.0180
-  cloud     delegate  1 call,     250 / 80  tokens,   $0.0012
-  cloud     probe     1 call,     150 / 30  tokens,   $0.0042
+  cloud     main      8 calls,  3,850 / 980 tokens,   $0.018000
+  cloud     delegate  1 call,     250 / 80  tokens,   $0.001200
+  cloud     probe     1 call,     150 / 30  tokens,   $0.004200
  fast      main     14 calls,  8,200 / 2,100 tokens, $0       (local)
 ```

 Implementation: pure Lua iteration over `ctx.usage_totals`; no broker
-calls. Sorting uses `table.sort` on a flattened list.
+calls. Sort flattens into a list, sorts via `table.sort` with explicit
+3-level comparator: `cost desc, model asc, category asc`.

 ---

@@ -370,21 +523,44 @@ cost = {
 }
 ```

-After every `ctx:add_usage`, check:
+R5 centralizes the check inside a single `_record_usage(model, cat, u)`
+helper in repl.lua. This is the ONLY place that calls
+`ctx:add_usage`; safety.lua call sites route through it via the
+`helpers.on_usage` / `opts.on_usage` callback. Keeps `context.lua`
+decoupled from `renderer` (no module-coupling violation).
+
+R4: two independent flags (one per threshold) — first-to-fire must
+NOT suppress the other.

 ```lua
-if config.cost and not ctx.cost_warn_fired then
+-- repl.lua (sketch):
+local function _record_usage(model, category, u)
+    ctx:add_usage(model, category, u)
+    if not (config.cost) then return end
+    ctx.cost_warn_state = ctx.cost_warn_state or { dollars = false, tokens = false }
+    local cw = ctx.cost_warn_state
+    if config.cost.warn_at_dollars and not cw.dollars then
        local cost = ctx:total_cost()
-    if config.cost.warn_at_dollars and cost >= config.cost.warn_at_dollars then
-        renderer.status(("session cost $%.4f has crossed warn_at_dollars=$%.4f")
+        if cost >= config.cost.warn_at_dollars then
+            -- R10: 6-decimal format for sub-cent visibility
+            renderer.status(("session cost $%.6f has crossed warn_at_dollars=$%.6f")
                            :format(cost, config.cost.warn_at_dollars))
-        ctx.cost_warn_fired = true
+            cw.dollars = true
+        end
+    end
+    if config.cost.warn_at_tokens and not cw.tokens then
+        local p, c = ctx:total_tokens()
+        if (p + c) >= config.cost.warn_at_tokens then
+            renderer.status(("session tokens %d has crossed warn_at_tokens=%d")
+                            :format(p + c, config.cost.warn_at_tokens))
+            cw.tokens = true
+        end
    end
-    -- (similar for warn_at_tokens; share the flag or use two)
 end
 ```

-One-shot per session. `:cost reset` clears the flag.
+One-shot per threshold per session. `:cost reset` clears both
+totals AND both warn flags atomically.

 ---

@@ -402,6 +578,13 @@ One-shot per session. `:cost reset` clears the flag.
 | `cfg.cost.warn_at_tokens` | nil | Status when cumulative total tokens first crosses |
 | (broker `opts.include_usage`) | true | Adds `stream_options.include_usage = true` to outbound request |

+R9 boundary note: `:resume <name>` reloads turns for conversation
+continuity but does NOT reconstruct `ctx.usage_totals` from the
+per-turn `usage` fields stored in the session JSONL. After `:resume`,
+the cost meter starts fresh from zero for the resumed session's live
+calls. The historical usage IS in the JSONL for after-the-fact
+scripting; cross-session aggregation is Q-C2 deferred work.
+
 ---

 ## 9. Out of Scope (Phase 7)
@@ -434,6 +617,7 @@ One-shot per session. `:cost reset` clears the flag.
 | Local llama.cpp returns no `cost` -> displayed `$0` could mislead user "is this REALLY free?" | `:cost detail` annotates local lines with `(local)` literal; summary says `cost=$X (cloud only; local: 0)` |
 | `ctx.usage_totals` grows unboundedly with new model names mid-session | Bounded by `#models in config` × `#categories` — small constants. No mitigation needed. |
 | Warn threshold fires once and never again for a long-running session that crosses 2x / 10x the threshold | Acceptable for v1; user can `:cost reset` to re-arm. Future polish: warn at each Nx multiple. |
+| R8: `call_broker` fallback retry passes `opts.include_usage` unchanged | Documented assumption: B1 confirmed both backends accept the flag. If a future fallback host rejects, the call-site that knows can pass `opts.include_usage = false` explicitly. |

 ---

@@ -479,21 +663,30 @@ load smoke + per-commit feature smoke).
 1. **`broker.lua` — usage capture + signature widening.**
   - `build_request(model_cfg, messages, stream, opts)` widened to
     take an opts table; opts.tools / opts.max_tokens fold in from
-     the existing positional args. Opts.include_usage (default true)
-     adds `stream_options.include_usage = true` to the request body
-     (per B1, required for local).
+     the existing positional args.
+   - **R3: TWO internal callers of `build_request` exist inside
+     broker.lua itself** (`M.chat_stream` at line 65-66 and indirectly
+     via `M.chat`). Both must be updated in this commit; the
+     migration is CONTAINED but not zero-touch. "Every caller already
+     passes opts" refers to the public surface — internal `build_request`
+     was positional.
+   - Opts.include_usage (default true) adds `stream_options.include_usage
+     = true` to the request body (per B1, required for local).
   - `M.chat_stream` event loop adds `if doc.usage then final_usage =
     doc.usage end`; after `curl.post_sse` returns, if `final_usage`
     is set, `on_delta("usage", payload)` is called. Payload includes
-     `model = model_cfg.model` (caller-stable per B4), the raw token
-     counts, and `cost` as a number (nil for local per B3).
+     `model = model_cfg.model` (caller-stable per B4 + R2), the raw
+     token counts, and `cost` as a number (nil for local per B3).
   - opts.category passthrough — the broker just echoes it into the
     emitted usage payload; doesn't validate (per A4 free-form).
-   - `M.chat` (the non-streaming wrapper) returns `(text, usage)` —
-     backward-compatible (existing callers ignore the second value).
+   - **R1: `M.chat` (non-streaming wrapper) MUST capture usage in its
+     internal on_delta and return `(text, usage)`. Without this, four
+     out of five non-streaming categories silently report zero.** §4
+     shows the explicit update.
   - Smoke: hand-build a request with stream_options, capture all
     three on_delta kinds (text, tool_call when applicable, usage),
-     confirm usage payload matches what curl shows.
+     confirm usage payload matches what curl shows. Also smoke
+     `broker.chat(...)` returns non-nil usage for cloud calls.

 2. **`context.lua` — accumulator + helpers.**
   - `Context.new`: `self.usage_totals = {}` + `self.cost_warn_fired = false`.
@@ -506,28 +699,47 @@ load smoke + per-commit feature smoke).
   - Smoke: 4-case inline test of add_usage / totals / reset preservation.

 3. **`repl.lua` — wire opts.category + on_delta("usage") at non-Norris call sites.**
+   **N3: depends on commit 1's R1 M.chat fix shipping first.** This
+   commit's "capture the second return value" pattern only works
+   after M.chat actually returns one.
+   - `_record_usage(model, category, usage)` helper (R5) — the single
+     chokepoint that wraps `ctx:add_usage` AND does the warn check.
+     Replaces all direct `ctx:add_usage(...)` invocations in repl.lua.
   - call_broker wrapper (used by ask_ai): pass `opts.category =
-     "main"`; the on_delta wrapper handles `kind == "usage"` by
-     calling `ctx:add_usage(req_name, "main", payload)`.
-   - DELEGATE: handler: opts.category = "delegate".
-   - :delegate meta: opts.category = "delegate".
-   - summarize-on-evict callback: opts.category = "summarize".
-   - :memory summarize: opts.category = "memory_summarize".
-   - For broker.chat callers (non-streaming): capture the new second
-     return value and feed to ctx:add_usage.
-   - Smoke: send one cloud prompt, observe ctx.usage_totals grows.
+     "main"`; the wrapped on_delta handles `kind == "usage"` by
+     calling `_record_usage(payload.model, payload.category, payload)`
+     — keys by **payload.model** per R2 (handles fallback retry
+     correctly without tracking primary-vs-fallback at the wrapper).
+   - DELEGATE: handler: opts.category = "delegate"; capture second
+     return value from broker.chat and feed to `_record_usage`.
+   - :delegate meta: opts.category = "delegate"; same.
+   - summarize-on-evict callback: opts.category = "summarize"; same.
+   - :memory summarize: opts.category = "memory_summarize"; same.
+   - Smoke: send one cloud prompt, observe ctx.usage_totals grows;
+     also smoke the fallback path with a deliberately-broken primary
+     and confirm usage credits the fallback model name (R2 verification).

 4. **`safety.lua` — opts.category for Norris + probe.**
-   - safety.norris_step's broker.chat_stream call: pass opts.category =
-     "norris"; the helpers.on_usage callback (added to the helpers
-     table by repl.lua) routes back to ctx:add_usage. OR — simpler —
-     safety.lua wraps on_delta itself with a "usage"-kind branch that
-     calls helpers.on_usage.
-   - safety.is_destructive's llm_probe broker.chat call: pass
-     opts.category = "probe"; capture the (text, usage) return and
-     forward via opts.on_usage callback (added to is_destructive opts).
+   - safety.norris_step's broker.chat_stream call: pass `opts.category
+     = "norris"`. The on_delta wrapper inside safety.lua already
+     widens (post-#52) to handle `kind == "text"` (rehydration);
+     now also handles `kind == "usage"` by calling
+     `helpers.on_usage(payload.model, payload.category, payload)`.
+     R5: helpers.on_usage IS repl.lua's `_record_usage`.
+   - **N4 signature chain widening**: `llm_probe`, `llm_second_opinion`,
+     and `M.is_destructive` all widen to thread `opts.on_usage` through:
+       - `llm_probe(model_cfg, system, cmd, opts)` — pass `opts.category
+         = "probe"` to broker.chat; on the `(text, usage)` return,
+         if `opts.on_usage` AND `usage`, call `opts.on_usage(usage.model,
+         usage.category, usage)`.
+       - `llm_second_opinion(cmd, cfg, opts)` — pass opts through to
+         both llm_probe calls (probe 1 + probe 2 re-roll).
+       - `M.is_destructive(cmd, cfg, opts)` — opts.on_usage already in
+         the table from #52's scrub_msgs/rehydrate addition; threads
+         through naturally.
   - Smoke: a Norris session shows both "norris" and "probe" category
-     entries in :cost detail.
+     entries in :cost detail; the probe model is named correctly
+     (e.g. "cloud" if safety.llm_model = "cloud").

 5. **`repl.lua` — :cost meta + warn-threshold + HELP.**
   - :cost (summary), :cost detail (per-model+category breakdown),
@@ -542,6 +754,8 @@ load smoke + per-commit feature smoke).
 6. **`config.lua` example block + `docs/PHASE7.md` status bump.**
   - Commented-out `cost = { warn_at_dollars = 0.50, warn_at_tokens
     = 100000 }` block in config.lua.
+   - **N5: PHASE0.md §11 amendment is already in tree** (committed
+     at `3bad07b` with the formulate doc). Commit 6 must NOT re-apply.
   - PHASE7.md status header → **Implement** (matches Phase 5/6
     cadence — manifest tracks implementation state).

@@ -549,13 +763,17 @@ load smoke + per-commit feature smoke).

 | Commit | Risk | Mitigation |
 |---|---|---|
-| 1 (broker) | build_request signature change breaks all existing callers | All callers of chat_stream/chat use opts already; we move tools/max_tokens INTO opts — temporary positional fallback (`opts.tools = old_tools` if positional was used) is unnecessary because every caller already passes opts table |
+| 1 (broker) | R3: build_request has TWO INTERNAL callers in broker.lua; both must be updated in this commit | Explicit in commit-1 note above; grep `build_request\(` to confirm |
+| 1 (broker) | R1: M.chat must capture usage in on_delta and return (text, usage) | §4 shows the explicit M.chat update; smoke test verifies non-nil usage on cloud call |
 | 1 (broker) | `M.chat` second return value confuses callers that do `local r = broker.chat(...)` discarding the second | Lua doesn't error on dropped return values; backward-compat preserved automatically |
 | 2 (context) | usage_totals nil on old ctx serializations | Defensive `self.usage_totals = self.usage_totals or {}` in add_usage; no migration needed |
-| 3 (repl wires) | Forgetting one call site = silent under-count | Lint by grep for `broker.chat\(` and `broker.chat_stream\(` after the wire commit; ensure each is tagged |
-| 4 (safety wires) | safety.lua must NOT require("secrets")-style introduce new module dep | Use helpers.on_usage callback convention (same shape as #52's scrub_msgs) — no module dep |
-| 5 (:cost + warn) | warn fires multiple times when threshold is much exceeded by one call | cost_warn_fired one-shot flag; explicit :cost reset to re-arm |
-| 6 (config + status) | none |  |
+| 3 (repl wires) | Forgetting one call site = silent under-count | Lint by grep for `broker.chat\(` and `broker.chat_stream\(` after the wire commit; ensure each is tagged with opts.category |
+| 3 (repl wires) | R2: fallback retry credits usage to wrong model | wrapped on_delta keys by `payload.model` (set inside broker per R2), NOT by outer `model_name`; smoke a deliberately-broken-primary case |
+| 4 (safety wires) | safety.lua must NOT introduce new module dep | Use helpers.on_usage callback convention (matches #52's scrub_msgs) |
+| 4 (safety wires) | N4: llm_probe → llm_second_opinion → is_destructive signature chain widening | Spelled out in commit-4 note above |
+| 5 (:cost + warn) | warn fires multiple times when threshold is much exceeded by one call | per-threshold one-shot flag in `ctx.cost_warn_state`; explicit :cost reset to re-arm both |
+| 5 (:cost + warn) | R4: single shared flag covers two thresholds | RESOLVED — split into `cost_warn_state.dollars` + `.tokens` |
+| 6 (config + status) | N5: PHASE0 §11 already amended at `3bad07b` | This commit does NOT re-apply the amendment |

 ### Tests + smoke per commit