Live probes of hossenfelder.fritz.box:8082, run against both backends.
Five findings, all consistent with the formulate/analyze design; no
structural changes needed.
B1. `stream_options.include_usage = true` is safely accepted by
both backends. REQUIRED for local llama.cpp to emit usage;
no-op for cloud (which emits anyway). Default-true is correct.
B2. Two emission patterns observed:
- Cloud (Bedrock): usage rides the FINAL delta chunk with
non-empty `choices` carrying finish_reason.
- Local: usage rides a SEPARATE chunk with `choices: []`
preceding `[DONE]`.
Both shapes are handled by the same `if doc.usage then ...`
check; the existing on_event choices-branch short-circuits
safely when choices is empty.
B3. `cost` field is dollar-denominated (number) and cloud-only.
Local returns `timings` instead (perf, not cost). Accumulator
captures `usage.cost` as-is; nil treated as 0. :cost detail
annotates local lines so $0 isn't misread.
B4. `doc.model` in the usage event reflects the upstream API's
versioned model name (e.g., Bedrock rewrites
`anthropic/claude-haiku-4.5` to
`anthropic/claude-4.5-haiku-20251001`). Accumulator keys by
caller-intended `model_cfg.model`, NOT `doc.model`, for stable
cross-call comparison.
B5. Usage event is always the LAST data event before `[DONE]`.
`on_delta("usage", ...)` is emitted after curl.post_sse returns:
one call per stream, after all text + tool_calls.
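
Illustration of B2/B5 (a minimal sketch in Python, not the actual
on_event handler): one `usage` check covers both emission shapes,
and the choices branch is skipped when `choices` is empty. The
function name `extract_usage` and the sample payloads are made up
for this example; real chunks carry more fields.

```python
import json

def extract_usage(sse_lines):
    """Collect streamed text and the usage object from SSE data lines.

    Hypothetical helper: works whether usage rides the final delta
    chunk (cloud) or a separate chunk with empty choices (local).
    """
    text_parts = []
    usage = None
    for line in sse_lines:
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        doc = json.loads(payload)
        # Same check for both shapes.
        if doc.get("usage"):
            usage = doc["usage"]
        # Short-circuit when choices is missing or empty.
        choices = doc.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        if delta.get("content"):
            text_parts.append(delta["content"])
    return "".join(text_parts), usage

# Made-up sample streams, shaped like the two observed patterns.
CLOUD = [
    'data: {"choices":[{"delta":{"content":"hi"}}]}',
    'data: {"choices":[{"delta":{},"finish_reason":"stop"}],'
    '"usage":{"prompt_tokens":3,"completion_tokens":1,"cost":0.0002}}',
    'data: [DONE]',
]
LOCAL = [
    'data: {"choices":[{"delta":{"content":"hi"}}]}',
    'data: {"choices":[{"delta":{},"finish_reason":"stop"}]}',
    'data: {"choices":[],"usage":{"prompt_tokens":3,"completion_tokens":1}}',
    'data: [DONE]',
]
```

Both `extract_usage(CLOUD)` and `extract_usage(LOCAL)` return the
same text; only the cloud usage object carries `cost`.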
Q-C4 RESOLVED: hossenfelder forwards `stream_options.include_usage`
to all backends correctly.
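
Illustration of B3/B4 (a hedged sketch in Python, not the actual
accumulator): totals are keyed by the caller-intended model name,
and a missing `cost` counts as 0. The class name `UsageAccumulator`
and its field layout are assumptions made for this example.

```python
class UsageAccumulator:
    """Hypothetical per-model usage totals, keyed by the intended model."""

    def __init__(self):
        self.totals = {}

    def add(self, intended_model, usage):
        # Key by the caller-intended name (model_cfg.model in the
        # findings above), never by the name the backend echoes back.
        row = self.totals.setdefault(
            intended_model,
            {"prompt_tokens": 0, "completion_tokens": 0, "cost": 0.0, "calls": 0},
        )
        row["prompt_tokens"] += usage.get("prompt_tokens", 0)
        row["completion_tokens"] += usage.get("completion_tokens", 0)
        # Local backends send no cost; treat a missing or None cost as 0.
        row["cost"] += usage.get("cost") or 0.0
        row["calls"] += 1
        return row
```

Keying this way means two calls made with
`anthropic/claude-haiku-4.5` land in one row even if the backend
reports a rewritten model name in each usage event.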
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>