aa64ad3eec
Status: Analyze -> Plan. All open Qs resolved (Q-T5 via baseline B1).
5-commit roadmap, bottom-up:
1. broker.lua — M.token_count helper + per-endpoint capability
cache. <endpoint>/tokenize probe with 2s timeout; cache true/false
per (endpoint, model) for the session. char/4 fallback on any
non-200 / parse-fail / transport err. M.tokenize_supported
introspection helper.
2. context.lua — Context.new accepts opts.tokenize_fn; estimate_
tokens widens to use it when set, with per-turn `_tokens` cache.
char/4 path unchanged when tokenize_fn nil.
3. context.lua — enforce_budget consults token_budget too (pillar
5 from A1). Loop condition: turns>max_turns OR estimate_tokens
>token_budget. Existing summarize-on-evict callback unchanged.
4. repl.lua — wire tokenize_fn when cfg.tokenize.use_endpoint=true.
Closure captures active_cfg upval (A5 — follows :model switches
naturally). :cost detail extension: trailing line showing
estimated session ctx tokens for comparison with the per-slot
prompt_tokens sums in the accumulator.
5. config.lua commented `tokenize = { use_endpoint = true }`
example + PHASE8.md status -> Implement.
Per-commit risk index covers: probe latency cap (2s, one-shot),
per-turn cache correctness (immutable post-append), enforce_budget
performance (O(N) per call after cache fill), and the intentional
behavior change of token_budget actually being enforced (sessions
fitting under char/4 may evict earlier under accurate counts —
documented in §9).
Two items open at plan, resolve at implement:
- exact :cost detail layout for estimated session ctx row
- whether to add a :tokenize debug meta (defer unless useful in verify)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>