docs/PHASE8: plan — §13 commit roadmap (5 commits)

Status: Analyze -> Plan. All open Qs resolved (Q-T5 via baseline B1).

5-commit roadmap, bottom-up:

  1. broker.lua — M.token_count helper + per-endpoint capability
     cache. <endpoint>/tokenize probe with 2s timeout; cache true/false
     per (endpoint, model) for the session. char/4 fallback on any
     non-200 / parse-fail / transport err. M.tokenize_supported
     introspection helper.

  2. context.lua — Context.new accepts opts.tokenize_fn; estimate_
     tokens widens to use it when set, with per-turn `_tokens` cache.
     char/4 path unchanged when tokenize_fn nil.

  3. context.lua — enforce_budget consults token_budget too (pillar
     5 from A1). Loop condition: turns>max_turns OR estimate_tokens
     >token_budget. Existing summarize-on-evict callback unchanged.

  4. repl.lua — wire tokenize_fn when cfg.tokenize.use_endpoint=true.
     Closure captures active_cfg upval (A5 — follows :model switches
     naturally). :cost detail extension: trailing line showing
     estimated session ctx tokens for comparison with the per-slot
     prompt_tokens sums in the accumulator.

  5. config.lua commented `tokenize = { use_endpoint = true }`
     example + PHASE8.md status -> Implement.

Per-commit risk index covers: probe latency cap (2s, one-shot),
per-turn cache correctness (immutable post-append), enforce_budget
performance (O(N) per call after cache fill), and the intentional
behavior change of token_budget actually being enforced (sessions
fitting under char/4 may evict earlier under accurate counts —
documented in §9).

Two items open at plan, resolve at implement:
  - exact :cost detail layout for estimated session ctx row
  - whether to add a :tokenize debug meta (defer unless useful in verify)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-16 23:24:41 +00:00
parent 79bd40db79
commit aa64ad3eec
+118 -1
View File
@@ -2,7 +2,7 @@
**Project:** aish — AI-augmented conversational shell
**Document:** Phase 8 Requirements, Architecture & Design Decisions
**Status:** Analyze (formulate complete; tree at `00869ba` probed)
**Status:** Plan (formulate + analyze + baseline complete; tree at `79bd40d`)
**Date:** 2026-05-16
**Analyze findings (2026-05-16):**
@@ -396,3 +396,120 @@ Candidate follow-ups (non-binding):
independent of any current line.
Phase 8 itself is self-contained — no upstream dependencies.
---
## 13. Implementation Plan (commit-by-commit)
Bottom-up: broker first (the egress capability all callers depend
on), then context (the consumer + the new pillar 5 budget extension),
then repl.lua wiring + display, then config + status bump. Each
commit leaves the tree green (existing tests + load smoke + per-
commit feature smoke).
### Order
1. **`broker.lua``M.token_count` helper + per-endpoint capability cache.**
- Module-local `_tokenize_capable` table keyed by `endpoint .. "/" .. model`.
- `M.token_count(model_cfg, text)`:
- empty text -> 0
- bad cfg (no endpoint) -> char/4 immediately
- capability cache says `false` for this slot -> char/4
- otherwise: probe `<endpoint>/tokenize` with `{content, model}` body,
2s timeout. On `status == 200 + parseable {tokens=[...]}`:
cache `true`, return `#tokens`. Anything else (non-200, parse
fail, transport err): cache `false`, char/4.
- `M.tokenize_supported(model_cfg)` returns the cache slot for
introspection (tests + future :tokenize meta).
- Smoke: hand-call `M.token_count(local_cfg, "hello world")` -> 2;
`M.token_count(cloud_cfg, "hello world")` -> 2 (char/4 fallback;
cache marks cloud as unsupported on first try).
2. **`context.lua` — estimate_tokens widening + per-turn cache.**
- Context.new accepts `opts.tokenize_fn` -> stored as `self.tokenize_fn`.
- `Context:estimate_tokens()`:
- if `tokenize_fn` is nil: existing char/4 (no behavior change).
- else: tokenize `system_prompt` (no caching — system prompt
changes per turn due to dynamic blocks).
For each turn: if `turn._tokens` is set use it; else
compute via tokenize_fn AND cache on turn._tokens.
- No new helper; the change is internal to estimate_tokens.
- Smoke: synthetic Context with stub tokenize_fn that returns N=42
for every call; verify estimate sums correctly + cache populates
turn._tokens.
3. **`context.lua` — enforce_budget honors token_budget (pillar 5).**
- Existing `while #self.turns > self.max_turns` loop extended to:
`while #self.turns > self.max_turns OR self:estimate_tokens() > self.token_budget do`.
- Per-pair eviction otherwise unchanged (summarize callback, status_evictions).
- The estimate_tokens call inside the loop is potentially expensive
under tokenize_fn — but commit #2's per-turn cache means each
iteration is O(#turns) dict-lookups. Acceptable for the eviction
hot path.
- Smoke: Context with `token_budget = 100`, max_turns = 100, fill
with turns until `estimate_tokens() > 100`, then call enforce_budget
— should evict until under budget. With char/4: eviction happens
at known char counts. With tokenize_fn stub returning 50/turn:
evicts down to <= 100/50 = 2 turns.
4. **`repl.lua` — tokenize_fn wiring + :cost detail est-vs-actual.**
- When `config.tokenize and config.tokenize.use_endpoint`, build
`ctx_opts.tokenize_fn = function(text)
return broker.token_count(active_cfg, text)
end`. The closure captures `active_cfg` upval per A5 — follows
`:model` switches naturally.
- `:cost detail` extension: per slot, compute the sum of actual
`prompt_tokens` (already in usage_totals) AND an "estimated"
count via... actually this needs a rethink: the accumulator
doesn't store per-turn data, just running sums. The estimate
for comparison would need to be over ctx's CURRENT content
(a single snapshot, not per-call). Simplest: show a SECOND
line under `:cost detail` with the current `estimate_tokens()`
value, labeled "[estimated session ctx: N tokens]". Disagreement
with the accumulator's prompt_tokens TOTAL surfaced inline.
- Smoke: with use_endpoint=true on a local-only session, observe
enforce_budget eviction timing vs disable; observe :cost detail
estimate row.
5. **`config.lua` example block + `docs/PHASE8.md` status bump.**
- Commented-out `tokenize = { use_endpoint = true }` block in
config.lua with parity to Phase 1-7 example blocks. Document
the per-endpoint network cost (one probe per session) and the
implication: token_budget actually enforces now.
- PHASE8.md status header -> **Implement**.
### Risk index per commit
| Commit | Risk | Mitigation |
|---|---|---|
| 1 (broker) | Per-endpoint cache leaks across model_cfg deletions (e.g., user removes a model from config mid-session) | Cache is keyed by string; stale entries don't grow without bound (bounded by #configured models × 1). No GC needed. |
| 1 (broker) | /tokenize probe blocks the calling thread for 2s on a misconfigured endpoint | 2s timeout is the cap; one-shot per endpoint per session. |
| 2 (context) | per-turn `_tokens` cache miss on every estimate when no tokenize_fn -> existing perf preserved | Cache check is conditional on tokenize_fn presence; char/4 path untouched. |
| 3 (context) | enforce_budget loop now calls estimate_tokens potentially every iteration; with tokenize_fn that's O(#turns) per iteration -> O(#turns^2) worst case | Per-turn cache makes this O(#turns) amortized after first fill. For typical max_turns=40 + token_budget=4096 sessions: ~40^2 dict lookups = 1600 ops in worst case, microsecond cost. |
| 3 (context) | accurate counts mean token_budget=4096 (Phase 0 default) finally ENFORCES — sessions that fit under char/4 may now evict earlier | Documented in §9; user can raise token_budget to match their model's real context window. |
| 4 (repl) | tokenize_fn closure binding to `active_cfg` upval — if upval somehow gets reassigned wrong, eviction uses wrong tokenizer | Lua upvalues are call-time-resolved; A5 verified. Test by smoke after `:model` switch. |
| 5 (config + status) | none | |
### Tests + smoke per commit
Each commit:
- Pass `luajit test_safety.lua` (87/87) and `luajit test_router_model.lua` (31/31)
- Load cleanly via `luajit -e 'package.path=...; require("repl"); print("ok")'`
- Pass a per-feature smoke (described in each row above)
### Things deliberately NOT split
- New module file for tokenize — small enough to live in broker.lua.
- Per-text token cache (in addition to per-turn): not needed; turn
content is immutable post-append.
- :tokenize meta for introspecting the cache — `M.tokenize_supported`
is exported for testing; if a user needs runtime visibility, that's
a follow-up.
### Open at plan-time (resolve at implement)
- :cost detail layout — how exactly to show "estimated session ctx"
relative to the existing per-slot rows. Pick at commit 4 (likely
a single trailing line under the detail table).
- Whether to expose `:tokenize <text>` for direct-probe debugging.
Nice-to-have; defer unless useful during verify.