docs/PHASE7: analyze — probe broker surface + resolve Qs in-place

Status: Formulate -> Analyze (tree at 3bad07b probed).

11 findings (A1-A11), 5/6 open Qs resolved (Q-C4 deferred to baseline):

A1.  broker.chat_stream surface clean — usage capture via closure-local
     + on_delta("usage") emission after curl.post_sse returns.
A2.  7 caller sites for opts.category threading (probe / norris /
     summarize / main / delegate x2 / memory_summarize).
A3.  build_request signature widens to (model_cfg, msgs, stream, opts)
     to absorb tools / max_tokens / include_usage / stream_options
     without further positional growth.
A4.  Q-C3 RESOLVED: free-form categories (caller decides); matches
     Phase 6 helpers/skills convention.
A5.  Q-C5 RESOLVED: warn fires on the call that crossed (no NEXT-call
     delay).
A6.  Q-C6 RESOLVED: :reset does NOT clear cost_warn_fired; only
     :cost reset clears.
A7.  Norris call-graph rewires (commit 955bd82) — secrets streaming
     rehydrator wraps only "text" kind; new "usage" kind passes
     through unchanged. No new entanglement.
A8.  ctx.usage_totals survives :reset (R8 parity with memory_items,
     project).
A9.  Session JSONL inherits the new field automatically (dkjson
     opaque encoding).
A10. Q-C1 PARTIAL: defensive silent skip when provider omits usage.
     Real probe required for local model — baseline action.
A11. Q-C4 deferred to baseline (real broker probe).

§2 build_request row updated to mention the A3 refactor.
§11 Open Qs table now shows all 6 with resolutions; only Q-C4
remains as a baseline-time probe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-16 22:49:03 +00:00
parent 3bad07b2da
commit f0bccdec48
+98 -8
View File
@@ -2,9 +2,99 @@
**Project:** aish — AI-augmented conversational shell **Project:** aish — AI-augmented conversational shell
**Document:** Phase 7 Requirements, Architecture & Design Decisions **Document:** Phase 7 Requirements, Architecture & Design Decisions
**Status:** Formulate (pre-analyze) **Status:** Analyze (formulate complete; tree at `3bad07b` probed)
**Date:** 2026-05-16 **Date:** 2026-05-16
**Analyze findings (2026-05-16):**
A1. **broker.chat_stream surface is clean for the extension.** The
existing `on_event(data)` closure inside `M.chat_stream` already
parses `doc.error` / `doc.choices` / `delta` / tool_calls — adding
`if doc.usage then final_usage = ... end` is one block. Emission
happens via a closure-local `final_usage` that the post-loop code
in `chat_stream` reads and calls `on_delta("usage", final_usage)`
on. `build_request` needs minor extension OR (cleaner) `chat_stream`
inserts `stream_options.include_usage = true` into the body table
AFTER `json.encode` — but we currently encode in `build_request`.
Cleanest: extend `build_request(model_cfg, messages, stream, opts)`
so it can read `opts.include_usage`. Phase 7 simplifies the
signature in passing.
A2. **7 caller sites** identified for `opts.category` threading:
| Site | Category |
|---|---|
| `safety.lua:191` (LLM probe) | `"probe"` |
| `safety.lua:354` (norris main) | `"norris"` |
| `repl.lua:326` (summarize-on-evict) | `"summarize"` |
| `repl.lua:685` (call_broker wrapper, used by ask_ai) | `"main"` |
| `repl.lua:1104` (DELEGATE: handler) | `"delegate"` |
| `repl.lua:1587` (:memory summarize) | `"memory_summarize"` |
| `repl.lua:2156` (:delegate meta) | `"delegate"` |
All callers pass `opts` already; adding a `category` field is
additive and backward-compatible (default to `"main"` when absent).
A3. **`build_request` signature simplification.** Today it takes
`(model_cfg, messages, stream, tools, max_tokens)` — five positional
args. With Phase 7 needing `include_usage` AND `stream_options`,
positional growth gets unwieldy. **Resolution:** widen to
`(model_cfg, messages, stream, opts)` where opts carries
`{tools, max_tokens, include_usage, stream_options}`. Callers in
`M.chat_stream` and `M.chat` pass their existing opts table through.
This is a refactor but contained inside broker.lua.
A4. **Q-C3 RESOLVED: free-form categories.** The closed-set vs free-form
debate resolved in favor of free-form per the helpers/skills
convention already in place (Phase 6 :tree / :diff metas don't
validate sub-args either). `:cost detail` will show whatever
categories appear — small + documented closed set in practice
(7 entries from A2), no surprise.
A5. **Q-C5 RESOLVED: warn fires on the call that crossed.** The crossed
call's usage IS in the accumulator at the moment we check (we
check AFTER `add_usage`). Firing on the NEXT call would mean a
delay of one full broker round-trip before the user sees the
warn — defeats the purpose. Just emit-on-cross.
A6. **Q-C6 RESOLVED: `:reset` does NOT clear `cost_warn_fired`.**
Parity with `usage_totals` itself (per the §2 decision row); the
user reset their conversation, not their cost meter. The flag
AND the totals are reset only by the explicit `:cost reset` verb.
A7. **Norris call-graph rewires (existing safety.lua:354 path):** with
issue #52 wired (commit `955bd82`), the Norris broker call now
passes `helpers.scrub_msgs` / `helpers.streaming_rehydrator`. The
on_delta wrapping pattern means I need to be careful that the new
`("usage", payload)` kind also flows through any wrapper. Since
secrets streaming_rehydrator only matches on `kind == "text"`, the
"usage" kind passes through unchanged. No new entanglement.
A8. **`ctx.usage_totals` survives `:reset` per R8** — same invariant
as `memory_items` (Phase 4) and `project` (Phase 6). Documented in
§5 of the manifest; reinforces the "ambient context survives
conversation reset" rule.
A9. **Session JSONL serialization** — assistant turn dict gets an
optional `usage` field. `history.lua` log_turn currently calls
`json.encode(turn)` opaquely; the dkjson serializer handles nested
tables. No code change needed; the new field flows through
automatically when the assistant turn carries one.
A10. **Q-C1 PARTIAL: local providers may not emit `usage`.** The
formulate-time assumption was "treat absence as zero-cost / unknown".
A real probe against `qwen-coder-7b-snappy-8k` is a baseline
action — see B-probes below. The implementation will be defensive:
if `doc.usage` never appears in the stream, no "usage" event is
emitted, and the accumulator is unchanged for that turn. `:cost`
output naturally reflects "0 calls counted for local model" if
that's the case.
A11. **Q-C4 deferred to baseline**: actual `stream_options` forwarding
by the hossenfelder proxy must be probed against a live broker.
If the proxy strips the option, we get no `usage` events even
for cloud calls. Baseline action.
PHASE0 is the locked substrate; PHASE1-6 are layered on top. This manifest PHASE0 is the locked substrate; PHASE1-6 are layered on top. This manifest
specifies what Phase 7 adds — **cost / usage observability**: the ability specifies what Phase 7 adds — **cost / usage observability**: the ability
to know, mid-session, how many tokens you've spent and how much money the to know, mid-session, how many tokens you've spent and how much money the
@@ -75,7 +165,7 @@ Four pillars:
|---|---|---| |---|---|---|
| Where to extract usage | In `broker.chat_stream` event loop, looking at each SSE event's `usage` field on the final chunk | The OpenAI streaming spec puts `usage` on the FINAL chunk when `stream_options: { include_usage: true }` is in the request body. The Anthropic-via-Bedrock path through OpenRouter respects this; need to verify (baseline). | | Where to extract usage | In `broker.chat_stream` event loop, looking at each SSE event's `usage` field on the final chunk | The OpenAI streaming spec puts `usage` on the FINAL chunk when `stream_options: { include_usage: true }` is in the request body. The Anthropic-via-Bedrock path through OpenRouter respects this; need to verify (baseline). |
| New on_delta kind | `on_delta("usage", { prompt_tokens, completion_tokens, total_tokens, cost?, model?, native_finish_reason? })` | Mirrors the existing `("text", chunk)` / `("tool_call", call)` shape. Callers ignore unknown kinds; backward-compatible. | | New on_delta kind | `on_delta("usage", { prompt_tokens, completion_tokens, total_tokens, cost?, model?, native_finish_reason? })` | Mirrors the existing `("text", chunk)` / `("tool_call", call)` shape. Callers ignore unknown kinds; backward-compatible. |
| Where to enable usage on the wire | `opts.include_usage = true` (default `true`) sets `stream_options.include_usage = true` in the outbound request body | Off-switch for hosts that reject `stream_options`. Defaults on; baseline probe confirms current broker tolerates it. | | Where to enable usage on the wire | `opts.include_usage = true` (default `true`) sets `stream_options.include_usage = true` in the outbound request body | Off-switch for hosts that reject `stream_options`. Defaults on; baseline probe confirms current broker tolerates it. (A3: `build_request` signature widens to take an `opts` table; positional growth was getting unwieldy.) |
| Accumulator location | `ctx.usage_totals[model_name][category]` table | ctx is per-conversation; matches the `:reset`-survives-or-not rules already in place. | | Accumulator location | `ctx.usage_totals[model_name][category]` table | ctx is per-conversation; matches the `:reset`-survives-or-not rules already in place. |
| Categories | `"main"` (ask_ai), `"delegate"`, `"summarize"`, `"memory_summarize"`, `"probe"`, `"norris"` | One-tag-per-call-site. Tagged at the caller site (caller passes `opts.category` to `broker.chat_stream`). | | Categories | `"main"` (ask_ai), `"delegate"`, `"summarize"`, `"memory_summarize"`, `"probe"`, `"norris"` | One-tag-per-call-site. Tagged at the caller site (caller passes `opts.category` to `broker.chat_stream`). |
| Cost extraction | `usage.cost` (OpenRouter convention; dollars as a number) plus `usage.cost_details.upstream_inference_cost` (more detailed). For Anthropic/Bedrock the cost arrives in dollars on `usage.cost`. For pure local llama.cpp: no `cost` field — record 0. | Single field name across all observed providers (per baseline B7 — to be confirmed). | | Cost extraction | `usage.cost` (OpenRouter convention; dollars as a number) plus `usage.cost_details.upstream_inference_cost` (more detailed). For Anthropic/Bedrock the cost arrives in dollars on `usage.cost`. For pure local llama.cpp: no `cost` field — record 0. | Single field name across all observed providers (per baseline B7 — to be confirmed). |
@@ -351,12 +441,12 @@ One-shot per session. `:cost reset` clears the flag.
| # | Question | Impact | Resolution target | | # | Question | Impact | Resolution target |
|---|---|---|---| |---|---|---|---|
| Q-C1 | What to do when a provider doesn't emit `usage` (local llama.cpp, some misconfigured proxies)? Record zero / silently skip / estimate via char/4? | Accumulator entries for those models | Analyze (probe what hossenfelder actually returns per model class) | | Q-C1 | Provider-without-usage handling | A10 — defensive silent skip; baseline probe will confirm shape on local llama.cpp. |
| Q-C2 | Should `<history.dir>/cost.jsonl` accumulate cross-session totals? `:cost --all-time` reporter? | New file + persistence layer | Defer to follow-up phase (v1 = session only) | | Q-C2 | Cross-session cost persistence (`cost.jsonl`) | Deferred to follow-up phase 8; v1 is session-only. |
| Q-C3 | Categories vs free-form tags — should `opts.category` be free-form (caller decides) or validated against a closed set? | Predictability of `:cost detail` output | Analyze | | Q-C3 | Categories closed-set vs free-form | A4 — **free-form**; caller decides. Matches Phase 6 helpers/skills convention. |
| Q-C4 | Does the hossenfelder broker forward `stream_options` to all backends? Some backends may strip it. | include_usage default | Baseline (real probe) | | Q-C4 | `stream_options` forwarding by hossenfelder | **Baseline** — probe required against the live broker. |
| Q-C5 | Should `cfg.cost.warn_at_dollars` trigger on the FIRST broker call that crosses, or on the NEXT one (so user sees the call complete before the warn)? | UX detail | Analyze | | Q-C5 | Warn fires on the crossed call or the next | A5 — **on the crossed call** (no UX-defeating delay). |
| Q-C6 | When `:reset` is called, should `cost_warn_fired` be cleared too (so the next warn-cross fires again)? Or only clear via `:cost reset`? | Reset surface area | Analyze | | Q-C6 | `:reset` clears `cost_warn_fired` | A6 — **no**, only `:cost reset` clears the flag (R8 parity). |
--- ---