From f0bccdec48636248b437e0abfcf3f2ea875caaaf Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Sat, 16 May 2026 22:49:03 +0000 Subject: [PATCH] =?UTF-8?q?docs/PHASE7:=20analyze=20=E2=80=94=20probe=20br?= =?UTF-8?q?oker=20surface=20+=20resolve=20Qs=20in-place?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Status: Formulate -> Analyze (tree at 3bad07b probed). 11 findings (A1-A11), 5/6 open Qs resolved (Q-C4 deferred to baseline): A1. broker.chat_stream surface clean — usage capture via closure-local + on_delta("usage") emission after curl.post_sse returns. A2. 7 caller sites for opts.category threading (probe / norris / summarize / main / delegate x2 / memory_summarize). A3. build_request signature widens to (model_cfg, msgs, stream, opts) to absorb tools / max_tokens / include_usage / stream_options without further positional growth. A4. Q-C3 RESOLVED: free-form categories (caller decides); matches Phase 6 helpers/skills convention. A5. Q-C5 RESOLVED: warn fires on the call that crossed (no NEXT-call delay). A6. Q-C6 RESOLVED: :reset does NOT clear cost_warn_fired; only :cost reset clears. A7. Norris call-graph rewires (commit 955bd82) — secrets streaming rehydrator wraps only "text" kind; new "usage" kind passes through unchanged. No new entanglement. A8. ctx.usage_totals survives :reset (R8 parity with memory_items, project). A9. Session JSONL inherits the new field automatically (dkjson opaque encoding). A10. Q-C1 PARTIAL: defensive silent skip when provider omits usage. Real probe required for local model — baseline action. A11. Q-C4 deferred to baseline (real broker probe). §2 build_request row updated to mention the A3 refactor. §11 Open Qs table now shows all 6 with resolutions; only Q-C4 remains as a baseline-time probe. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/PHASE7.md | 106 +++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 98 insertions(+), 8 deletions(-) diff --git a/docs/PHASE7.md b/docs/PHASE7.md index 0bc1746..110e81f 100644 --- a/docs/PHASE7.md +++ b/docs/PHASE7.md @@ -2,9 +2,99 @@ **Project:** aish — AI-augmented conversational shell **Document:** Phase 7 Requirements, Architecture & Design Decisions -**Status:** Formulate (pre-analyze) +**Status:** Analyze (formulate complete; tree at `3bad07b` probed) **Date:** 2026-05-16 +**Analyze findings (2026-05-16):** + +A1. **broker.chat_stream surface is clean for the extension.** The + existing `on_event(data)` closure inside `M.chat_stream` already + parses `doc.error` / `doc.choices` / `delta` / tool_calls — adding + `if doc.usage then final_usage = ... end` is one block. Emission + happens via a closure-local `final_usage` that the post-loop code + in `chat_stream` reads and calls `on_delta("usage", final_usage)` + on. `build_request` needs minor extension OR (cleaner) `chat_stream` + inserts `stream_options.include_usage = true` into the body table + AFTER `json.encode` — but we currently encode in `build_request`. + Cleanest: extend `build_request(model_cfg, messages, stream, opts)` + so it can read `opts.include_usage`. Phase 7 simplifies the + signature in passing. + +A2. **7 caller sites** identified for `opts.category` threading: + + | Site | Category | + |---|---| + | `safety.lua:191` (LLM probe) | `"probe"` | + | `safety.lua:354` (norris main) | `"norris"` | + | `repl.lua:326` (summarize-on-evict) | `"summarize"` | + | `repl.lua:685` (call_broker wrapper, used by ask_ai) | `"main"` | + | `repl.lua:1104` (DELEGATE: handler) | `"delegate"` | + | `repl.lua:1587` (:memory summarize) | `"memory_summarize"` | + | `repl.lua:2156` (:delegate meta) | `"delegate"` | + + All callers pass `opts` already; adding a `category` field is + additive and backward-compatible (default to `"main"` when absent). + +A3. **`build_request` signature simplification.** Today it takes + `(model_cfg, messages, stream, tools, max_tokens)` — five positional + args. With Phase 7 needing `include_usage` AND `stream_options`, + positional growth gets unwieldy. **Resolution:** widen to + `(model_cfg, messages, stream, opts)` where opts carries + `{tools, max_tokens, include_usage, stream_options}`. Callers in + `M.chat_stream` and `M.chat` pass their existing opts table through. + This is a refactor but contained inside broker.lua. + +A4. **Q-C3 RESOLVED: free-form categories.** The closed-set vs free-form + debate resolved in favor of free-form per the helpers/skills + convention already in place (Phase 6 :tree / :diff metas don't + validate sub-args either). `:cost detail` will show whatever + categories appear — small + documented closed set in practice + (7 entries from A2), no surprise. + +A5. **Q-C5 RESOLVED: warn fires on the call that crossed.** The crossed + call's usage IS in the accumulator at the moment we check (we + check AFTER `add_usage`). Firing on the NEXT call would mean a + delay of one full broker round-trip before the user sees the + warn — defeats the purpose. Just emit-on-cross. + +A6. **Q-C6 RESOLVED: `:reset` does NOT clear `cost_warn_fired`.** + Parity with `usage_totals` itself (per the §2 decision row); the + user reset their conversation, not their cost meter. The flag + AND the totals are reset only by the explicit `:cost reset` verb. + +A7. **Norris call-graph rewires (existing safety.lua:354 path):** with + issue #52 wired (commit `955bd82`), the Norris broker call now + passes `helpers.scrub_msgs` / `helpers.streaming_rehydrator`. The + on_delta wrapping pattern means I need to be careful that the new + `("usage", payload)` kind also flows through any wrapper. Since + secrets streaming_rehydrator only matches on `kind == "text"`, the + "usage" kind passes through unchanged. No new entanglement. + +A8. **`ctx.usage_totals` survives `:reset` per R8** — same invariant + as `memory_items` (Phase 4) and `project` (Phase 6). Documented in + §5 of the manifest; reinforces the "ambient context survives + conversation reset" rule. + +A9. **Session JSONL serialization** — assistant turn dict gets an + optional `usage` field. `history.lua` log_turn currently calls + `json.encode(turn)` opaquely; the dkjson serializer handles nested + tables. No code change needed; the new field flows through + automatically when the assistant turn carries one. + +A10. **Q-C1 PARTIAL: local providers may not emit `usage`.** The + formulate-time assumption was "treat absence as zero-cost / unknown". + A real probe against `qwen-coder-7b-snappy-8k` is a baseline + action — see B-probes below. The implementation will be defensive: + if `doc.usage` never appears in the stream, no "usage" event is + emitted, and the accumulator is unchanged for that turn. `:cost` + output naturally reflects "0 calls counted for local model" if + that's the case. + +A11. **Q-C4 deferred to baseline**: actual `stream_options` forwarding + by the hossenfelder proxy must be probed against a live broker. + If the proxy strips the option, we get no `usage` events even + for cloud calls. Baseline action. + PHASE0 is the locked substrate; PHASE1-6 are layered on top. This manifest specifies what Phase 7 adds — **cost / usage observability**: the ability to know, mid-session, how many tokens you've spent and how much money the @@ -75,7 +165,7 @@ Four pillars: |---|---|---| | Where to extract usage | In `broker.chat_stream` event loop, looking at each SSE event's `usage` field on the final chunk | The OpenAI streaming spec puts `usage` on the FINAL chunk when `stream_options: { include_usage: true }` is in the request body. The Anthropic-via-Bedrock path through OpenRouter respects this; need to verify (baseline). | | New on_delta kind | `on_delta("usage", { prompt_tokens, completion_tokens, total_tokens, cost?, model?, native_finish_reason? })` | Mirrors the existing `("text", chunk)` / `("tool_call", call)` shape. Callers ignore unknown kinds; backward-compatible. | -| Where to enable usage on the wire | `opts.include_usage = true` (default `true`) sets `stream_options.include_usage = true` in the outbound request body | Off-switch for hosts that reject `stream_options`. Defaults on; baseline probe confirms current broker tolerates it. | +| Where to enable usage on the wire | `opts.include_usage = true` (default `true`) sets `stream_options.include_usage = true` in the outbound request body | Off-switch for hosts that reject `stream_options`. Defaults on; baseline probe confirms current broker tolerates it. (A3: `build_request` signature widens to take an `opts` table; positional growth was getting unwieldy.) | | Accumulator location | `ctx.usage_totals[model_name][category]` table | ctx is per-conversation; matches the `:reset`-survives-or-not rules already in place. | | Categories | `"main"` (ask_ai), `"delegate"`, `"summarize"`, `"memory_summarize"`, `"probe"`, `"norris"` | One-tag-per-call-site. Tagged at the caller site (caller passes `opts.category` to `broker.chat_stream`). | | Cost extraction | `usage.cost` (OpenRouter convention; dollars as a number) plus `usage.cost_details.upstream_inference_cost` (more detailed). For Anthropic/Bedrock the cost arrives in dollars on `usage.cost`. For pure local llama.cpp: no `cost` field — record 0. | Single field name across all observed providers (per baseline B7 — to be confirmed). | @@ -351,12 +441,12 @@ One-shot per session. `:cost reset` clears the flag. | # | Question | Impact | Resolution target | |---|---|---|---| -| Q-C1 | What to do when a provider doesn't emit `usage` (local llama.cpp, some misconfigured proxies)? Record zero / silently skip / estimate via char/4? | Accumulator entries for those models | Analyze (probe what hossenfelder actually returns per model class) | -| Q-C2 | Should `/cost.jsonl` accumulate cross-session totals? `:cost --all-time` reporter? | New file + persistence layer | Defer to follow-up phase (v1 = session only) | -| Q-C3 | Categories vs free-form tags — should `opts.category` be free-form (caller decides) or validated against a closed set? | Predictability of `:cost detail` output | Analyze | -| Q-C4 | Does the hossenfelder broker forward `stream_options` to all backends? Some backends may strip it. | include_usage default | Baseline (real probe) | -| Q-C5 | Should `cfg.cost.warn_at_dollars` trigger on the FIRST broker call that crosses, or on the NEXT one (so user sees the call complete before the warn)? | UX detail | Analyze | -| Q-C6 | When `:reset` is called, should `cost_warn_fired` be cleared too (so the next warn-cross fires again)? Or only clear via `:cost reset`? | Reset surface area | Analyze | +| Q-C1 | Provider-without-usage handling | A10 — defensive silent skip; baseline probe will confirm shape on local llama.cpp. | +| Q-C2 | Cross-session cost persistence (`cost.jsonl`) | Deferred to follow-up phase 8; v1 is session-only. | +| Q-C3 | Categories closed-set vs free-form | A4 — **free-form**; caller decides. Matches Phase 6 helpers/skills convention. | +| Q-C4 | `stream_options` forwarding by hossenfelder | **Baseline** — probe required against the live broker. | +| Q-C5 | Warn fires on the crossed call or the next | A5 — **on the crossed call** (no UX-defeating delay). | +| Q-C6 | `:reset` clears `cost_warn_fired` | A6 — **no**, only `:cost reset` clears the flag (R8 parity). | ---