From aa64ad3eecff312ab2b28b5d58a806afdf6170ae Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Sat, 16 May 2026 23:24:41 +0000 Subject: [PATCH] =?UTF-8?q?docs/PHASE8:=20plan=20=E2=80=94=20=C2=A713=20co?= =?UTF-8?q?mmit=20roadmap=20(5=20commits)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Status: Analyze -> Plan. All open Qs resolved (Q-T5 via baseline B1). 5-commit roadmap, bottom-up: 1. broker.lua — M.token_count helper + per-endpoint capability cache. /tokenize probe with 2s timeout; cache true/false per (endpoint, model) for the session. char/4 fallback on any non-200 / parse-fail / transport err. M.tokenize_supported introspection helper. 2. context.lua — Context.new accepts opts.tokenize_fn; estimate_ tokens widens to use it when set, with per-turn `_tokens` cache. char/4 path unchanged when tokenize_fn nil. 3. context.lua — enforce_budget consults token_budget too (pillar 5 from A1). Loop condition: turns>max_turns OR estimate_tokens >token_budget. Existing summarize-on-evict callback unchanged. 4. repl.lua — wire tokenize_fn when cfg.tokenize.use_endpoint=true. Closure captures active_cfg upval (A5 — follows :model switches naturally). :cost detail extension: trailing line showing estimated session ctx tokens for comparison with the per-slot prompt_tokens sums in the accumulator. 5. config.lua commented `tokenize = { use_endpoint = true }` example + PHASE8.md status -> Implement. Per-commit risk index covers: probe latency cap (2s, one-shot), per-turn cache correctness (immutable post-append), enforce_budget performance (O(N) per call after cache fill), and the intentional behavior change of token_budget actually being enforced (sessions fitting under char/4 may evict earlier under accurate counts — documented in §9). Two items open at plan, resolve at implement: - exact :cost detail layout for estimated session ctx row - whether to add a :tokenize debug meta (defer unless useful in verify) Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/PHASE8.md | 119 ++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 118 insertions(+), 1 deletion(-) diff --git a/docs/PHASE8.md b/docs/PHASE8.md index 59c85d4..4978935 100644 --- a/docs/PHASE8.md +++ b/docs/PHASE8.md @@ -2,7 +2,7 @@ **Project:** aish — AI-augmented conversational shell **Document:** Phase 8 Requirements, Architecture & Design Decisions -**Status:** Analyze (formulate complete; tree at `00869ba` probed) +**Status:** Plan (formulate + analyze + baseline complete; tree at `79bd40d`) **Date:** 2026-05-16 **Analyze findings (2026-05-16):** @@ -396,3 +396,120 @@ Candidate follow-ups (non-binding): independent of any current line. Phase 8 itself is self-contained — no upstream dependencies. + +--- + +## 13. Implementation Plan (commit-by-commit) + +Bottom-up: broker first (the egress capability all callers depend +on), then context (the consumer + the new pillar 5 budget extension), +then repl.lua wiring + display, then config + status bump. Each +commit leaves the tree green (existing tests + load smoke + per- +commit feature smoke). + +### Order + +1. **`broker.lua` — `M.token_count` helper + per-endpoint capability cache.** + - Module-local `_tokenize_capable` table keyed by `endpoint .. "/" .. model`. + - `M.token_count(model_cfg, text)`: + - empty text -> 0 + - bad cfg (no endpoint) -> char/4 immediately + - capability cache says `false` for this slot -> char/4 + - otherwise: probe `/tokenize` with `{content, model}` body, + 2s timeout. On `status == 200 + parseable {tokens=[...]}`: + cache `true`, return `#tokens`. Anything else (non-200, parse + fail, transport err): cache `false`, char/4. + - `M.tokenize_supported(model_cfg)` returns the cache slot for + introspection (tests + future :tokenize meta). + - Smoke: hand-call `M.token_count(local_cfg, "hello world")` -> 2; + `M.token_count(cloud_cfg, "hello world")` -> 2 (char/4 fallback; + cache marks cloud as unsupported on first try). + +2. **`context.lua` — estimate_tokens widening + per-turn cache.** + - Context.new accepts `opts.tokenize_fn` -> stored as `self.tokenize_fn`. + - `Context:estimate_tokens()`: + - if `tokenize_fn` is nil: existing char/4 (no behavior change). + - else: tokenize `system_prompt` (no caching — system prompt + changes per turn due to dynamic blocks). + For each turn: if `turn._tokens` is set use it; else + compute via tokenize_fn AND cache on turn._tokens. + - No new helper; the change is internal to estimate_tokens. + - Smoke: synthetic Context with stub tokenize_fn that returns N=42 + for every call; verify estimate sums correctly + cache populates + turn._tokens. + +3. **`context.lua` — enforce_budget honors token_budget (pillar 5).** + - Existing `while #self.turns > self.max_turns` loop extended to: + `while #self.turns > self.max_turns OR self:estimate_tokens() > self.token_budget do`. + - Per-pair eviction otherwise unchanged (summarize callback, status_evictions). + - The estimate_tokens call inside the loop is potentially expensive + under tokenize_fn — but commit #2's per-turn cache means each + iteration is O(#turns) dict-lookups. Acceptable for the eviction + hot path. + - Smoke: Context with `token_budget = 100`, max_turns = 100, fill + with turns until `estimate_tokens() > 100`, then call enforce_budget + — should evict until under budget. With char/4: eviction happens + at known char counts. With tokenize_fn stub returning 50/turn: + evicts down to <= 100/50 = 2 turns. + +4. **`repl.lua` — tokenize_fn wiring + :cost detail est-vs-actual.** + - When `config.tokenize and config.tokenize.use_endpoint`, build + `ctx_opts.tokenize_fn = function(text) + return broker.token_count(active_cfg, text) + end`. The closure captures `active_cfg` upval per A5 — follows + `:model` switches naturally. + - `:cost detail` extension: per slot, compute the sum of actual + `prompt_tokens` (already in usage_totals) AND an "estimated" + count via... actually this needs a rethink: the accumulator + doesn't store per-turn data, just running sums. The estimate + for comparison would need to be over ctx's CURRENT content + (a single snapshot, not per-call). Simplest: show a SECOND + line under `:cost detail` with the current `estimate_tokens()` + value, labeled "[estimated session ctx: N tokens]". Disagreement + with the accumulator's prompt_tokens TOTAL surfaced inline. + - Smoke: with use_endpoint=true on a local-only session, observe + enforce_budget eviction timing vs disable; observe :cost detail + estimate row. + +5. **`config.lua` example block + `docs/PHASE8.md` status bump.** + - Commented-out `tokenize = { use_endpoint = true }` block in + config.lua with parity to Phase 1-7 example blocks. Document + the per-endpoint network cost (one probe per session) and the + implication: token_budget actually enforces now. + - PHASE8.md status header -> **Implement**. + +### Risk index per commit + +| Commit | Risk | Mitigation | +|---|---|---| +| 1 (broker) | Per-endpoint cache leaks across model_cfg deletions (e.g., user removes a model from config mid-session) | Cache is keyed by string; stale entries don't grow without bound (bounded by #configured models × 1). No GC needed. | +| 1 (broker) | /tokenize probe blocks the calling thread for 2s on a misconfigured endpoint | 2s timeout is the cap; one-shot per endpoint per session. | +| 2 (context) | per-turn `_tokens` cache miss on every estimate when no tokenize_fn -> existing perf preserved | Cache check is conditional on tokenize_fn presence; char/4 path untouched. | +| 3 (context) | enforce_budget loop now calls estimate_tokens potentially every iteration; with tokenize_fn that's O(#turns) per iteration -> O(#turns^2) worst case | Per-turn cache makes this O(#turns) amortized after first fill. For typical max_turns=40 + token_budget=4096 sessions: ~40^2 dict lookups = 1600 ops in worst case, microsecond cost. | +| 3 (context) | accurate counts mean token_budget=4096 (Phase 0 default) finally ENFORCES — sessions that fit under char/4 may now evict earlier | Documented in §9; user can raise token_budget to match their model's real context window. | +| 4 (repl) | tokenize_fn closure binding to `active_cfg` upval — if upval somehow gets reassigned wrong, eviction uses wrong tokenizer | Lua upvalues are call-time-resolved; A5 verified. Test by smoke after `:model` switch. | +| 5 (config + status) | none | | + +### Tests + smoke per commit + +Each commit: +- Pass `luajit test_safety.lua` (87/87) and `luajit test_router_model.lua` (31/31) +- Load cleanly via `luajit -e 'package.path=...; require("repl"); print("ok")'` +- Pass a per-feature smoke (described in each row above) + +### Things deliberately NOT split + +- New module file for tokenize — small enough to live in broker.lua. +- Per-text token cache (in addition to per-turn): not needed; turn + content is immutable post-append. +- :tokenize meta for introspecting the cache — `M.tokenize_supported` + is exported for testing; if a user needs runtime visibility, that's + a follow-up. + +### Open at plan-time (resolve at implement) + +- :cost detail layout — how exactly to show "estimated session ctx" + relative to the existing per-slot rows. Pick at commit 4 (likely + a single trailing line under the detail table). +- Whether to expose `:tokenize ` for direct-probe debugging. + Nice-to-have; defer unless useful during verify.