5 Commits

Author SHA1 Message Date
marfrit 1f34b6dce8 config + docs/PHASE7: example block + status -> Implement (Phase 7 commit #6)
R9-resolved single-owner of the status bump (commit #5 didn't touch
PHASE7.md). N5: PHASE0 §11 amendment landed in commit 3bad07b
(formulate); not re-applied here.

config.lua:
  - Commented-out `cost = { warn_at_dollars, warn_at_tokens }` block
    with parity to the Phase 1-6 example blocks.
  - Notes warn flags are independent (R4) and per-turn usage flows
    to session/*.jsonl for after-the-fact analysis.
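
A rough sketch of what such a commented-out block could look like.
Only the names cost, warn_at_dollars and warn_at_tokens come from this
commit; the threshold values and the surrounding table are placeholder
assumptions.

```lua
-- Hypothetical excerpt in the style of config.lua; values are placeholders.
local config = {
  -- Uncomment to enable one-shot cost warnings. The two thresholds are
  -- independent: crossing one does not suppress the other.
  -- cost = {
  --   warn_at_dollars = 1.00,    -- warn once when session cost reaches $1.00
  --   warn_at_tokens  = 500000,  -- warn once when session tokens reach 500k
  -- },
}
```

With the block left commented out, config.cost is nil and no warning
is ever emitted, which matches the "default off" behaviour.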

docs/PHASE7.md:
  - Status header bumped: "Plan + review fold-in" -> "Implement"
  - Lists the 6 implement commits inline for traceability:
      7364963  broker: usage capture + opts widening
      7b4a9be  context: accumulator helpers
      8adebd5  repl: _record_usage + opts.category at 5 sites
      b30212a  safety + repl: opts.category for Norris + probe
      0d6ff93  repl: :cost meta surface
      this     config example + status bump

Phase 7 implementation is complete. Next inner-loop step is verify
(7) — user-driven smoke tests, then memory-update (8).

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:02:55 +00:00
marfrit d4c20f09df docs/PHASE7: review fold-in — 3 BLOCKERs + 7 CONCERNs + 5 NITs
Sonnet-reviewed (per the reviews-use-sonnet feedback memory).

BLOCKERs (RESOLVED in-place):

R1. M.chat would silently return (text, nil) for ALL non-streaming
    callers — 4 of 5 categories (summarize/delegate/memory_summarize/
    probe) flow through broker.chat, NOT chat_stream. §4 now shows
    the explicit M.chat update that captures kind=="usage" alongside
    "text" and returns (text, usage).

R2. call_broker fallback retry would credit usage to the wrong model
    name. Fix: broker emits payload.model = model_cfg.model (which IS
    the fallback's name when called with fb_cfg — chat_stream's
    upvar). Wrapper keys by payload.model, NOT outer model_name. §4
    + §13 commit 3 reflect.
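
The R2 attribution rule can be illustrated with a small sketch. The
chat_stream stub, the fail flag and the total_tokens field are
assumptions made for the example; only payload.model is taken from
the review text.

```lua
-- Stub broker: reports the model it was actually called with.
local function chat_stream(model_cfg, on_delta)
  if model_cfg.fail then return false end   -- simulate a failed primary call
  on_delta("usage", { model = model_cfg.model, total_tokens = 10 })
  return true
end

local credited = {}
local function on_delta(kind, payload)
  if kind == "usage" then
    -- Key by payload.model, not by the name the caller first asked for.
    credited[payload.model] = (credited[payload.model] or 0) + payload.total_tokens
  end
end

local primary  = { model = "primary-model", fail = true }
local fallback = { model = "fallback-model" }
if not chat_stream(primary, on_delta) then
  chat_stream(fallback, on_delta)           -- fallback retry
end
```

After the retry, the tokens are credited to "fallback-model" and
nothing is recorded against "primary-model".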

R3. build_request has TWO internal callers inside broker.lua itself,
    not just the public surface. Plan §13 commit 1 risk row now
    spells this out explicitly so the implementer doesn't read "every
    caller already passes opts" as "external-only".
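
For R1, a minimal sketch of a non-streaming wrapper that returns
(text, usage). The chat_stream stub and its argument order are
assumptions, not the actual broker.lua code, and error handling is
left out.

```lua
-- Sketch only: chat_stream is a stand-in stub, not the real broker.
local M = {}

-- Stub: emits two text deltas, then one usage payload.
function M.chat_stream(model_cfg, msgs, on_delta, opts)
  on_delta("text", "Hello, ")
  on_delta("text", "world")
  on_delta("usage", { model = model_cfg.model, prompt_tokens = 12 })
  return true
end

-- Collect "text" deltas and remember the "usage" payload, so callers
-- get (text, usage) instead of (text, nil).
function M.chat(model_cfg, msgs, opts)
  local parts, usage = {}, nil
  M.chat_stream(model_cfg, msgs, function(kind, payload)
    if kind == "text" then
      parts[#parts + 1] = payload
    elseif kind == "usage" then
      usage = payload
    end
  end, opts)
  return table.concat(parts), usage
end

local text, usage = M.chat({ model = "example-model" }, {}, { category = "summarize" })
```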

CONCERNs (FOLDED):

R4. Single cost_warn_fired flag covers two thresholds — first-to-fire
    suppresses the other. Split into ctx.cost_warn_state = { dollars
    = false, tokens = false }; :cost reset clears both. §7 + §13.
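
A sketch of the split flags, assuming a hypothetical check_thresholds
helper and a >= comparison for "crossed"; the state shape matches
ctx.cost_warn_state as described in R4.

```lua
-- Each threshold has its own one-shot flag, so the first to fire
-- cannot suppress the other.
local function check_thresholds(state, cost_cfg, total_dollars, total_tokens)
  local fired = {}
  if cost_cfg.warn_at_dollars and not state.dollars
     and total_dollars >= cost_cfg.warn_at_dollars then
    state.dollars = true
    fired[#fired + 1] = "dollars"
  end
  if cost_cfg.warn_at_tokens and not state.tokens
     and total_tokens >= cost_cfg.warn_at_tokens then
    state.tokens = true
    fired[#fired + 1] = "tokens"
  end
  return fired
end

local state = { dollars = false, tokens = false }
local cfg = { warn_at_dollars = 1.0, warn_at_tokens = 1000 }

local first  = check_thresholds(state, cfg, 0.10, 1500)  -- tokens cross first
local second = check_thresholds(state, cfg, 1.20, 1600)  -- dollars cross later
local third  = check_thresholds(state, cfg, 1.30, 1700)  -- both already fired
```

The check runs against the totals after the current call is added, so
the warning would fire on the call that crossed the threshold.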

R5. Warn-check centralization — single _record_usage helper in
    repl.lua wraps ctx:add_usage AND does threshold check. safety.lua
    routes via helpers.on_usage / opts.on_usage callbacks. context.lua
    stays decoupled from renderer.

R6. Preserve nil-vs-0 cost distinction. Accumulator slot gains
    `is_local = true` (sticky) when ANY recorded usage had cost==nil.
    `:cost detail` annotation comes from is_local flag, not a
    fragile cost==0 heuristic.
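
A sketch of the sticky flag, assuming a usage payload with optional
cost and total_tokens fields and a hypothetical add_to_slot helper.

```lua
-- A slot remembers whether any usage arrived without a cost, which is
-- different from a usage that reported an explicit cost of 0.
local function add_to_slot(slot, usage)
  slot.tokens = slot.tokens + (usage.total_tokens or 0)
  if usage.cost == nil then
    slot.is_local = true          -- sticky: never cleared by later calls
  else
    slot.cost = slot.cost + usage.cost
  end
end

local local_slot = { tokens = 0, cost = 0 }
add_to_slot(local_slot, { total_tokens = 100 })           -- no cost reported
add_to_slot(local_slot, { total_tokens = 50, cost = 0 })  -- explicit zero later

local free_slot = { tokens = 0, cost = 0 }
add_to_slot(free_slot, { total_tokens = 10, cost = 0 })   -- explicit zero only
```

Both slots end with cost == 0, but only local_slot carries is_local,
so a cost==0 check alone could not tell them apart.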

R7. :cost detail sort needs 3-level deterministic key:
    (cost desc, model asc, category asc) — table.sort is unstable.
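
A comparator for that key could look like the sketch below; the row
shape (model, category, cost) is an assumption for illustration.

```lua
-- Total order on (cost desc, model asc, category asc), so the result
-- does not depend on table.sort's handling of equal elements.
local function by_cost_model_category(a, b)
  if a.cost ~= b.cost then return a.cost > b.cost end       -- cost desc
  if a.model ~= b.model then return a.model < b.model end   -- model asc
  return a.category < b.category                            -- category asc
end

local rows = {
  { model = "b-model", category = "main",      cost = 0.5 },
  { model = "a-model", category = "summarize", cost = 0.5 },
  { model = "a-model", category = "main",      cost = 0.5 },
  { model = "c-model", category = "probe",     cost = 2.0 },
}
table.sort(rows, by_cost_model_category)
```

Because every (model, category) pair is unique, no two rows compare
equal and the output order is fully determined.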

R8. call_broker fallback passes opts.include_usage unchanged.
    Documented as known assumption (B1 confirms both backends
    accept; future-broken fallback can pass include_usage=false).

R9. :resume does NOT restore historical usage_totals. Per-turn usage
    IS in session JSONL for scripting; cross-session aggregation is
    Q-C2 deferred. Documented in §8.

R10. $%.4f cannot resolve costs below $0.0001 (cloud cost 0.000028 ->
     $0.0000). Widened to $%.6f in §6 + §7 warn message format.
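
The effect of the two format strings on the example cost, using plain
string.format:

```lua
local cost = 0.000028

local narrow = string.format("$%.4f", cost)  -- four decimals: value vanishes
local wide   = string.format("$%.6f", cost)  -- six decimals: value survives
```

Here narrow is "$0.0000" and wide is "$0.000028".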

NITs (APPLIED):

N1. §4 pseudocode comment notes `if doc.usage` branch is independent
    of choice branch (handles both B2 emission shapes).
N2. §2 stale "B7" reference corrected to B3.
N3. §13 commit 3 row gains explicit dependency note on commit 1's R1.
N4. §13 commit 4 spells out llm_probe -> llm_second_opinion ->
    M.is_destructive signature chain widening.
N5. §3 + §13 commit 6 — PHASE0 §11 amendment already in tree
    (3bad07b); commit 6 must NOT re-apply.

PHASE7.md now 803 lines (was 528 after plan). +275/-57. Ready for
implementation phase pending user gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:55:59 +00:00
marfrit 0f14dc1727 docs/PHASE7: plan — §13 commit roadmap
Status: Analyze -> Plan.

Q-C4 was the last open question pending baseline; now resolved per
B1 (stream_options accepted by both backends; required for local).

§13 Implementation Plan added — 6 commits, bottom-up:

  1. broker.lua: usage extraction from final SSE chunk; build_request
     signature widening to (model_cfg, msgs, stream, opts); on_delta
     ("usage", payload); chat returns (text, usage); opts.category
     passthrough.

  2. context.lua: usage_totals + cost_warn_fired fields; add_usage /
     total_cost / total_tokens helpers; :reset preserves both.

  3. repl.lua: wire opts.category at 5 non-Norris call sites (main,
     delegate x2, summarize, memory_summarize); on_delta("usage")
     branch routes to ctx:add_usage.

  4. safety.lua: wire opts.category for Norris main broker +
     is_destructive LLM probe; helpers.on_usage callback convention
     (no new module dep — matches #52's scrub_msgs pattern).

  5. repl.lua: :cost meta surface + warn-threshold check + HELP.

  6. config.lua: commented cost example block + PHASE7.md status
     bump to Implement.
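
As an illustration of item 2, a minimal sketch of the accumulator
helpers. The slot fields (tokens, cost, calls), the payload fields
(total_tokens, cost), the messages field and the new_ctx constructor
are assumptions; only usage_totals, add_usage, total_cost,
total_tokens and the :reset behaviour come from the plan.

```lua
local Ctx = {}
Ctx.__index = Ctx

local function new_ctx()
  return setmetatable({ usage_totals = {}, messages = {} }, Ctx)
end

-- Add one usage payload under usage_totals[model][category].
function Ctx:add_usage(model, category, usage)
  local by_model = self.usage_totals[model] or {}
  self.usage_totals[model] = by_model
  local slot = by_model[category] or { tokens = 0, cost = 0, calls = 0 }
  by_model[category] = slot
  slot.tokens = slot.tokens + (usage.total_tokens or 0)
  slot.cost   = slot.cost + (usage.cost or 0)
  slot.calls  = slot.calls + 1
end

-- Sum one numeric field across every model and category.
local function sum(self, field)
  local total = 0
  for _, by_model in pairs(self.usage_totals) do
    for _, slot in pairs(by_model) do total = total + slot[field] end
  end
  return total
end

function Ctx:total_cost()   return sum(self, "cost") end
function Ctx:total_tokens() return sum(self, "tokens") end

-- Reset clears the conversation but keeps the meter.
function Ctx:reset() self.messages = {} end

local ctx = new_ctx()
ctx:add_usage("model-a", "main",      { total_tokens = 120, cost = 0.25 })
ctx:add_usage("model-a", "summarize", { total_tokens = 30,  cost = 0.25 })
ctx:add_usage("model-b", "probe",     { total_tokens = 50 })
ctx:reset()
```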

Per-commit risk index covers signature-change blast radius, missed
call-site lint, and warn-flag one-shot semantics. Lua's multi-return
semantics keep broker.chat backward-compatible automatically: existing
callers simply ignore the second return value.
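
The multi-return behaviour relied on here is standard Lua and can be
shown with a stand-in function (chat_v2 is a hypothetical name, not
the broker's):

```lua
-- Stand-in for a chat function that gained a second return value.
local function chat_v2()
  return "reply text", { total_tokens = 42 }
end

local text_only = chat_v2()      -- legacy caller: second value is dropped
local text, usage = chat_v2()    -- new caller: receives both values
local wrapped = { (chat_v2()) }  -- parentheses truncate to one value
```

One caveat: a call placed last in an argument list or table
constructor expands to all of its results, so a legacy call site of
that shape would start passing the usage table along unless the call
is wrapped in parentheses.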

Two items left open at plan, resolve at implement:
  - is_destructive opts.on_usage vs cfg.helpers threading
  - per-turn verbose mode (deferred; v1 = :cost on demand only)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:50:39 +00:00
marfrit f0bccdec48 docs/PHASE7: analyze — probe broker surface + resolve Qs in-place
Status: Formulate -> Analyze (tree at 3bad07b probed).

11 findings (A1-A11), 5/6 open Qs resolved (Q-C4 deferred to baseline):

A1.  broker.chat_stream surface clean — usage capture via closure-local
     + on_delta("usage") emission after curl.post_sse returns.
A2.  7 caller sites for opts.category threading (probe / norris /
     summarize / main / delegate x2 / memory_summarize).
A3.  build_request signature widens to (model_cfg, msgs, stream, opts)
     to absorb tools / max_tokens / include_usage / stream_options
     without further positional growth.
A4.  Q-C3 RESOLVED: free-form categories (caller decides); matches
     Phase 6 helpers/skills convention.
A5.  Q-C5 RESOLVED: warn fires on the call that crossed (no NEXT-call
     delay).
A6.  Q-C6 RESOLVED: :reset does NOT clear cost_warn_fired; only
     :cost reset clears.
A7.  Norris call-graph rewires (commit 955bd82) — secrets streaming
     rehydrator wraps only "text" kind; new "usage" kind passes
     through unchanged. No new entanglement.
A8.  ctx.usage_totals survives :reset (R8 parity with memory_items,
     project).
A9.  Session JSONL inherits the new field automatically (dkjson
     opaque encoding).
A10. Q-C1 PARTIAL: defensive silent skip when provider omits usage.
     Real probe required for local model — baseline action.
A11. Q-C4 deferred to baseline (real broker probe).
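
The pass-through described in A7 can be sketched as a wrapper around
on_delta. The wrap_on_delta name, the rehydrate callback and the
<SECRET_1> placeholder format are assumptions for illustration, not
the actual secrets code.

```lua
-- Rewrite only "text" payloads; every other kind (including the new
-- "usage" kind) is forwarded untouched.
local function wrap_on_delta(on_delta, rehydrate)
  return function(kind, payload)
    if kind == "text" then
      payload = rehydrate(payload)
    end
    return on_delta(kind, payload)
  end
end

local seen = {}
local wrapped = wrap_on_delta(
  function(kind, payload) seen[#seen + 1] = { kind = kind, payload = payload } end,
  function(text) return (text:gsub("<SECRET_1>", "hunter2")) end
)

local usage = { model = "example-model", total_tokens = 7 }
wrapped("text", "password is <SECRET_1>")
wrapped("usage", usage)
```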

§2 build_request row updated to mention the A3 refactor.
§11 Open Qs table now shows all 6 with resolutions; only Q-C4
remains as a baseline-time probe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:49:03 +00:00
marfrit 3bad07b2da docs/PHASE7: formulate — cost / usage observability
Phase 7 formulate manifest + PHASE0 §11 amendment to add the Phase 7
row (substrate amendment per CLAUDE.md §3, lands in the same commit).

Four pillars:

  1. Usage capture in broker.chat_stream — extract `usage` from the
     final SSE chunk (OpenAI streaming spec with `stream_options:
     {include_usage: true}`). Surface via new on_delta("usage",
     payload) kind. broker.chat returns (text, usage) — backward-
     compat: existing callers ignore the second value.

  2. Per-session accumulator on ctx — ctx.usage_totals[model][category]
     tables (categories: main / delegate / summarize / memory_summarize
     / probe / norris, tagged at the call site via opts.category).
     :reset preserves usage_totals (R8 parity with memory_items /
     project). Session JSONL gains an optional `usage` field on
     assistant turns for after-the-fact analysis.

  3. :cost meta surface — :cost (summary), :cost detail (per-model +
     per-category breakdown), :cost reset (zero the meter). Pure-Lua
     read of ctx.usage_totals; no broker calls.

  4. Optional warn thresholds — cfg.cost.warn_at_dollars /
     warn_at_tokens emit a one-shot status when crossed. Default off;
     useful with cloud presets configured.

Doc covers scope + done-when criteria, tech decisions table, module
changes, per-pillar deep dive with code sketches, UX surface, out of
scope, risks, 6 open questions to resolve in analyze.

Open at formulate:
  Q-C1 — provider-without-usage handling (local llama.cpp probably)
  Q-C2 — cross-session persistence (defer to phase 8)
  Q-C3 — categories closed-set vs free-form
  Q-C4 — does hossenfelder forward stream_options to all backends?
  Q-C5 — warn fires on the call that crosses, or the next one?
  Q-C6 — :reset clears cost_warn_fired too, or only :cost reset?

Scope confirmed via AskUserQuestion: cost/usage observability
(chosen over project-local config overlay and session search/tag).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:47:58 +00:00