1 Commits

Author SHA1 Message Date
marfrit 2244a3f1ee docs/PHASE7-baseline: live broker probes for usage shape
Real probes against hossenfelder.fritz.box:8082 against both backends.
Five findings, all align with the formulate/analyze design — no
structural changes.

B1. `stream_options.include_usage = true` is safely accepted by
    both backends. REQUIRED for local llama.cpp to emit usage;
    no-op for cloud (which emits anyway). Default-true is correct.

B2. Two emission patterns observed:
    - Cloud (Bedrock): usage rides the FINAL delta chunk with
      non-empty `choices` carrying finish_reason.
    - Local: usage rides a SEPARATE chunk with `choices: []`
      preceding `[DONE]`.
    Both shapes are handled by the same `if doc.usage then ...`
    check; the existing on_event choices-branch short-circuits
    safely when choices is empty.

B3. `cost` field is dollar-denominated (number) and cloud-only.
    Local returns `timings` instead (perf, not cost). Accumulator
    captures `usage.cost` as-is; nil treated as 0. :cost detail
    annotates local lines so $0 isn't misread.

B4. `doc.model` in the usage event reflects the upstream-API-version
    (e.g., Bedrock rewrites `anthropic/claude-haiku-4.5` to
    `anthropic/claude-4.5-haiku-20251001`). Accumulator keys by
    caller-intended `model_cfg.model`, NOT `doc.model`, for stable
    cross-call comparison.

B5. Usage event is always the LAST data event before `[DONE]`.
    Emission of `on_delta("usage", ...)` happens after curl.post_sse
    returns — one call per stream, after all text + tool_calls.

Q-C4 RESOLVED: hossenfelder forwards `stream_options.include_usage`
to all backends correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:49:53 +00:00