docs/PHASE7: analyze — probe broker surface + resolve Qs in-place
Status: Formulate -> Analyze (tree at3bad07bprobed). 11 findings (A1-A11), 5/6 open Qs resolved (Q-C4 deferred to baseline): A1. broker.chat_stream surface clean — usage capture via closure-local + on_delta("usage") emission after curl.post_sse returns. A2. 7 caller sites for opts.category threading (probe / norris / summarize / main / delegate x2 / memory_summarize). A3. build_request signature widens to (model_cfg, msgs, stream, opts) to absorb tools / max_tokens / include_usage / stream_options without further positional growth. A4. Q-C3 RESOLVED: free-form categories (caller decides); matches Phase 6 helpers/skills convention. A5. Q-C5 RESOLVED: warn fires on the call that crossed (no NEXT-call delay). A6. Q-C6 RESOLVED: :reset does NOT clear cost_warn_fired; only :cost reset clears. A7. Norris call-graph rewires (commit955bd82) — secrets streaming rehydrator wraps only "text" kind; new "usage" kind passes through unchanged. No new entanglement. A8. ctx.usage_totals survives :reset (R8 parity with memory_items, project). A9. Session JSONL inherits the new field automatically (dkjson opaque encoding). A10. Q-C1 PARTIAL: defensive silent skip when provider omits usage. Real probe required for local model — baseline action. A11. Q-C4 deferred to baseline (real broker probe). §2 build_request row updated to mention the A3 refactor. §11 Open Qs table now shows all 6 with resolutions; only Q-C4 remains as a baseline-time probe. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
+98
-8
@@ -2,9 +2,99 @@
|
||||
|
||||
**Project:** aish — AI-augmented conversational shell
|
||||
**Document:** Phase 7 Requirements, Architecture & Design Decisions
|
||||
**Status:** Formulate (pre-analyze)
|
||||
**Status:** Analyze (formulate complete; tree at `3bad07b` probed)
|
||||
**Date:** 2026-05-16
|
||||
|
||||
**Analyze findings (2026-05-16):**
|
||||
|
||||
A1. **broker.chat_stream surface is clean for the extension.** The
|
||||
existing `on_event(data)` closure inside `M.chat_stream` already
|
||||
parses `doc.error` / `doc.choices` / `delta` / tool_calls — adding
|
||||
`if doc.usage then final_usage = ... end` is one block. Emission
|
||||
happens via a closure-local `final_usage` that the post-loop code
|
||||
in `chat_stream` reads and calls `on_delta("usage", final_usage)`
|
||||
on. `build_request` needs minor extension OR (cleaner) `chat_stream`
|
||||
inserts `stream_options.include_usage = true` into the body table
|
||||
AFTER `json.encode` — but we currently encode in `build_request`.
|
||||
Cleanest: extend `build_request(model_cfg, messages, stream, opts)`
|
||||
so it can read `opts.include_usage`. Phase 7 simplifies the
|
||||
signature in passing.
|
||||
|
||||
A2. **7 caller sites** identified for `opts.category` threading:
|
||||
|
||||
| Site | Category |
|
||||
|---|---|
|
||||
| `safety.lua:191` (LLM probe) | `"probe"` |
|
||||
| `safety.lua:354` (norris main) | `"norris"` |
|
||||
| `repl.lua:326` (summarize-on-evict) | `"summarize"` |
|
||||
| `repl.lua:685` (call_broker wrapper, used by ask_ai) | `"main"` |
|
||||
| `repl.lua:1104` (DELEGATE: handler) | `"delegate"` |
|
||||
| `repl.lua:1587` (:memory summarize) | `"memory_summarize"` |
|
||||
| `repl.lua:2156` (:delegate meta) | `"delegate"` |
|
||||
|
||||
All callers pass `opts` already; adding a `category` field is
|
||||
additive and backward-compatible (default to `"main"` when absent).
|
||||
|
||||
A3. **`build_request` signature simplification.** Today it takes
|
||||
`(model_cfg, messages, stream, tools, max_tokens)` — five positional
|
||||
args. With Phase 7 needing `include_usage` AND `stream_options`,
|
||||
positional growth gets unwieldy. **Resolution:** widen to
|
||||
`(model_cfg, messages, stream, opts)` where opts carries
|
||||
`{tools, max_tokens, include_usage, stream_options}`. Callers in
|
||||
`M.chat_stream` and `M.chat` pass their existing opts table through.
|
||||
This is a refactor but contained inside broker.lua.
|
||||
|
||||
A4. **Q-C3 RESOLVED: free-form categories.** The closed-set vs free-form
|
||||
debate resolved in favor of free-form per the helpers/skills
|
||||
convention already in place (Phase 6 :tree / :diff metas don't
|
||||
validate sub-args either). `:cost detail` will show whatever
|
||||
categories appear — small + documented closed set in practice
|
||||
(7 entries from A2), no surprise.
|
||||
|
||||
A5. **Q-C5 RESOLVED: warn fires on the call that crossed.** The crossed
|
||||
call's usage IS in the accumulator at the moment we check (we
|
||||
check AFTER `add_usage`). Firing on the NEXT call would mean a
|
||||
delay of one full broker round-trip before the user sees the
|
||||
warn — defeats the purpose. Just emit-on-cross.
|
||||
|
||||
A6. **Q-C6 RESOLVED: `:reset` does NOT clear `cost_warn_fired`.**
|
||||
Parity with `usage_totals` itself (per the §2 decision row); the
|
||||
user reset their conversation, not their cost meter. The flag
|
||||
AND the totals are reset only by the explicit `:cost reset` verb.
|
||||
|
||||
A7. **Norris call-graph rewires (existing safety.lua:354 path):** with
|
||||
issue #52 wired (commit `955bd82`), the Norris broker call now
|
||||
passes `helpers.scrub_msgs` / `helpers.streaming_rehydrator`. The
|
||||
on_delta wrapping pattern means I need to be careful that the new
|
||||
`("usage", payload)` kind also flows through any wrapper. Since
|
||||
secrets streaming_rehydrator only matches on `kind == "text"`, the
|
||||
"usage" kind passes through unchanged. No new entanglement.
|
||||
|
||||
A8. **`ctx.usage_totals` survives `:reset` per R8** — same invariant
|
||||
as `memory_items` (Phase 4) and `project` (Phase 6). Documented in
|
||||
§5 of the manifest; reinforces the "ambient context survives
|
||||
conversation reset" rule.
|
||||
|
||||
A9. **Session JSONL serialization** — assistant turn dict gets an
|
||||
optional `usage` field. `history.lua` log_turn currently calls
|
||||
`json.encode(turn)` opaquely; the dkjson serializer handles nested
|
||||
tables. No code change needed; the new field flows through
|
||||
automatically when the assistant turn carries one.
|
||||
|
||||
A10. **Q-C1 PARTIAL: local providers may not emit `usage`.** The
|
||||
formulate-time assumption was "treat absence as zero-cost / unknown".
|
||||
A real probe against `qwen-coder-7b-snappy-8k` is a baseline
|
||||
action — see B-probes below. The implementation will be defensive:
|
||||
if `doc.usage` never appears in the stream, no "usage" event is
|
||||
emitted, and the accumulator is unchanged for that turn. `:cost`
|
||||
output naturally reflects "0 calls counted for local model" if
|
||||
that's the case.
|
||||
|
||||
A11. **Q-C4 deferred to baseline**: actual `stream_options` forwarding
|
||||
by the hossenfelder proxy must be probed against a live broker.
|
||||
If the proxy strips the option, we get no `usage` events even
|
||||
for cloud calls. Baseline action.
|
||||
|
||||
PHASE0 is the locked substrate; PHASE1-6 are layered on top. This manifest
|
||||
specifies what Phase 7 adds — **cost / usage observability**: the ability
|
||||
to know, mid-session, how many tokens you've spent and how much money the
|
||||
@@ -75,7 +165,7 @@ Four pillars:
|
||||
|---|---|---|
|
||||
| Where to extract usage | In `broker.chat_stream` event loop, looking at each SSE event's `usage` field on the final chunk | The OpenAI streaming spec puts `usage` on the FINAL chunk when `stream_options: { include_usage: true }` is in the request body. The Anthropic-via-Bedrock path through OpenRouter respects this; need to verify (baseline). |
|
||||
| New on_delta kind | `on_delta("usage", { prompt_tokens, completion_tokens, total_tokens, cost?, model?, native_finish_reason? })` | Mirrors the existing `("text", chunk)` / `("tool_call", call)` shape. Callers ignore unknown kinds; backward-compatible. |
|
||||
| Where to enable usage on the wire | `opts.include_usage = true` (default `true`) sets `stream_options.include_usage = true` in the outbound request body | Off-switch for hosts that reject `stream_options`. Defaults on; baseline probe confirms current broker tolerates it. |
|
||||
| Where to enable usage on the wire | `opts.include_usage = true` (default `true`) sets `stream_options.include_usage = true` in the outbound request body | Off-switch for hosts that reject `stream_options`. Defaults on; baseline probe confirms current broker tolerates it. (A3: `build_request` signature widens to take an `opts` table; positional growth was getting unwieldy.) |
|
||||
| Accumulator location | `ctx.usage_totals[model_name][category]` table | ctx is per-conversation; matches the `:reset`-survives-or-not rules already in place. |
|
||||
| Categories | `"main"` (ask_ai), `"delegate"`, `"summarize"`, `"memory_summarize"`, `"probe"`, `"norris"` | One-tag-per-call-site. Tagged at the caller site (caller passes `opts.category` to `broker.chat_stream`). |
|
||||
| Cost extraction | `usage.cost` (OpenRouter convention; dollars as a number) plus `usage.cost_details.upstream_inference_cost` (more detailed). For Anthropic/Bedrock the cost arrives in dollars on `usage.cost`. For pure local llama.cpp: no `cost` field — record 0. | Single field name across all observed providers (per baseline B7 — to be confirmed). |
|
||||
@@ -351,12 +441,12 @@ One-shot per session. `:cost reset` clears the flag.
|
||||
|
||||
| # | Question | Impact | Resolution target |
|
||||
|---|---|---|---|
|
||||
| Q-C1 | What to do when a provider doesn't emit `usage` (local llama.cpp, some misconfigured proxies)? Record zero / silently skip / estimate via char/4? | Accumulator entries for those models | Analyze (probe what hossenfelder actually returns per model class) |
|
||||
| Q-C2 | Should `<history.dir>/cost.jsonl` accumulate cross-session totals? `:cost --all-time` reporter? | New file + persistence layer | Defer to follow-up phase (v1 = session only) |
|
||||
| Q-C3 | Categories vs free-form tags — should `opts.category` be free-form (caller decides) or validated against a closed set? | Predictability of `:cost detail` output | Analyze |
|
||||
| Q-C4 | Does the hossenfelder broker forward `stream_options` to all backends? Some backends may strip it. | include_usage default | Baseline (real probe) |
|
||||
| Q-C5 | Should `cfg.cost.warn_at_dollars` trigger on the FIRST broker call that crosses, or on the NEXT one (so user sees the call complete before the warn)? | UX detail | Analyze |
|
||||
| Q-C6 | When `:reset` is called, should `cost_warn_fired` be cleared too (so the next warn-cross fires again)? Or only clear via `:cost reset`? | Reset surface area | Analyze |
|
||||
| Q-C1 | Provider-without-usage handling | A10 — defensive silent skip; baseline probe will confirm shape on local llama.cpp. |
|
||||
| Q-C2 | Cross-session cost persistence (`cost.jsonl`) | Deferred to follow-up phase 8; v1 is session-only. |
|
||||
| Q-C3 | Categories closed-set vs free-form | A4 — **free-form**; caller decides. Matches Phase 6 helpers/skills convention. |
|
||||
| Q-C4 | `stream_options` forwarding by hossenfelder | **Baseline** — probe required against the live broker. |
|
||||
| Q-C5 | Warn fires on the crossed call or the next | A5 — **on the crossed call** (no UX-defeating delay). |
|
||||
| Q-C6 | `:reset` clears `cost_warn_fired` | A6 — **no**, only `:cost reset` clears the flag (R8 parity). |
|
||||
|
||||
---
|
||||
|
||||
|
||||
Reference in New Issue
Block a user