docs/PHASE8: plan — §13 commit roadmap (5 commits)
Status: Analyze -> Plan. All open Qs resolved (Q-T5 via baseline B1).
5-commit roadmap, bottom-up:
1. broker.lua — M.token_count helper + per-endpoint capability
cache. <endpoint>/tokenize probe with 2s timeout; cache true/false
per (endpoint, model) for the session. char/4 fallback on any
non-200 / parse-fail / transport err. M.tokenize_supported
introspection helper.
2. context.lua — Context.new accepts opts.tokenize_fn; estimate_
tokens widens to use it when set, with per-turn `_tokens` cache.
char/4 path unchanged when tokenize_fn nil.
3. context.lua — enforce_budget consults token_budget too (pillar
5 from A1). Loop condition: turns>max_turns OR estimate_tokens
>token_budget. Existing summarize-on-evict callback unchanged.
4. repl.lua — wire tokenize_fn when cfg.tokenize.use_endpoint=true.
Closure captures active_cfg upval (A5 — follows :model switches
naturally). :cost detail extension: trailing line showing
estimated session ctx tokens for comparison with the per-slot
prompt_tokens sums in the accumulator.
5. config.lua commented `tokenize = { use_endpoint = true }`
example + PHASE8.md status -> Implement.
Per-commit risk index covers: probe latency cap (2s, one-shot),
per-turn cache correctness (immutable post-append), enforce_budget
performance (O(N) per call after cache fill), and the intentional
behavior change of token_budget actually being enforced (sessions
fitting under char/4 may evict earlier under accurate counts —
documented in §9).
Two items open at plan, resolve at implement:
- exact :cost detail layout for estimated session ctx row
- whether to add a :tokenize debug meta (defer unless useful in verify)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
+118
-1
@@ -2,7 +2,7 @@
|
||||
|
||||
**Project:** aish — AI-augmented conversational shell
|
||||
**Document:** Phase 8 Requirements, Architecture & Design Decisions
|
||||
**Status:** Analyze (formulate complete; tree at `00869ba` probed)
|
||||
**Status:** Plan (formulate + analyze + baseline complete; tree at `79bd40d`)
|
||||
**Date:** 2026-05-16
|
||||
|
||||
**Analyze findings (2026-05-16):**
|
||||
@@ -396,3 +396,120 @@ Candidate follow-ups (non-binding):
|
||||
independent of any current line.
|
||||
|
||||
Phase 8 itself is self-contained — no upstream dependencies.
|
||||
|
||||
---
|
||||
|
||||
## 13. Implementation Plan (commit-by-commit)
|
||||
|
||||
Bottom-up: broker first (the egress capability all callers depend
|
||||
on), then context (the consumer + the new pillar 5 budget extension),
|
||||
then repl.lua wiring + display, then config + status bump. Each
|
||||
commit leaves the tree green (existing tests + load smoke + per-
|
||||
commit feature smoke).
|
||||
|
||||
### Order
|
||||
|
||||
1. **`broker.lua` — `M.token_count` helper + per-endpoint capability cache.**
|
||||
- Module-local `_tokenize_capable` table keyed by `endpoint .. "/" .. model`.
|
||||
- `M.token_count(model_cfg, text)`:
|
||||
- empty text -> 0
|
||||
- bad cfg (no endpoint) -> char/4 immediately
|
||||
- capability cache says `false` for this slot -> char/4
|
||||
- otherwise: probe `<endpoint>/tokenize` with `{content, model}` body,
|
||||
2s timeout. On `status == 200 + parseable {tokens=[...]}`:
|
||||
cache `true`, return `#tokens`. Anything else (non-200, parse
|
||||
fail, transport err): cache `false`, char/4.
|
||||
- `M.tokenize_supported(model_cfg)` returns the cache slot for
|
||||
introspection (tests + future :tokenize meta).
|
||||
- Smoke: hand-call `M.token_count(local_cfg, "hello world")` -> 2;
|
||||
`M.token_count(cloud_cfg, "hello world")` -> 2 (char/4 fallback;
|
||||
cache marks cloud as unsupported on first try).
|
||||
|
||||
2. **`context.lua` — estimate_tokens widening + per-turn cache.**
|
||||
- Context.new accepts `opts.tokenize_fn` -> stored as `self.tokenize_fn`.
|
||||
- `Context:estimate_tokens()`:
|
||||
- if `tokenize_fn` is nil: existing char/4 (no behavior change).
|
||||
- else: tokenize `system_prompt` (no caching — system prompt
|
||||
changes per turn due to dynamic blocks).
|
||||
For each turn: if `turn._tokens` is set use it; else
|
||||
compute via tokenize_fn AND cache on turn._tokens.
|
||||
- No new helper; the change is internal to estimate_tokens.
|
||||
- Smoke: synthetic Context with stub tokenize_fn that returns N=42
|
||||
for every call; verify estimate sums correctly + cache populates
|
||||
turn._tokens.
|
||||
|
||||
3. **`context.lua` — enforce_budget honors token_budget (pillar 5).**
|
||||
- Existing `while #self.turns > self.max_turns` loop extended to:
|
||||
`while #self.turns > self.max_turns OR self:estimate_tokens() > self.token_budget do`.
|
||||
- Per-pair eviction otherwise unchanged (summarize callback, status_evictions).
|
||||
- The estimate_tokens call inside the loop is potentially expensive
|
||||
under tokenize_fn — but commit #2's per-turn cache means each
|
||||
iteration is O(#turns) dict-lookups. Acceptable for the eviction
|
||||
hot path.
|
||||
- Smoke: Context with `token_budget = 100`, max_turns = 100, fill
|
||||
with turns until `estimate_tokens() > 100`, then call enforce_budget
|
||||
— should evict until under budget. With char/4: eviction happens
|
||||
at known char counts. With tokenize_fn stub returning 50/turn:
|
||||
evicts down to <= 100/50 = 2 turns.
|
||||
|
||||
4. **`repl.lua` — tokenize_fn wiring + :cost detail est-vs-actual.**
|
||||
- When `config.tokenize and config.tokenize.use_endpoint`, build
|
||||
`ctx_opts.tokenize_fn = function(text)
|
||||
return broker.token_count(active_cfg, text)
|
||||
end`. The closure captures `active_cfg` upval per A5 — follows
|
||||
`:model` switches naturally.
|
||||
- `:cost detail` extension: per slot, compute the sum of actual
|
||||
`prompt_tokens` (already in usage_totals) AND an "estimated"
|
||||
count via... actually this needs a rethink: the accumulator
|
||||
doesn't store per-turn data, just running sums. The estimate
|
||||
for comparison would need to be over ctx's CURRENT content
|
||||
(a single snapshot, not per-call). Simplest: show a SECOND
|
||||
line under `:cost detail` with the current `estimate_tokens()`
|
||||
value, labeled "[estimated session ctx: N tokens]". Disagreement
|
||||
with the accumulator's prompt_tokens TOTAL surfaced inline.
|
||||
- Smoke: with use_endpoint=true on a local-only session, observe
|
||||
enforce_budget eviction timing vs disable; observe :cost detail
|
||||
estimate row.
|
||||
|
||||
5. **`config.lua` example block + `docs/PHASE8.md` status bump.**
|
||||
- Commented-out `tokenize = { use_endpoint = true }` block in
|
||||
config.lua with parity to Phase 1-7 example blocks. Document
|
||||
the per-endpoint network cost (one probe per session) and the
|
||||
implication: token_budget actually enforces now.
|
||||
- PHASE8.md status header -> **Implement**.
|
||||
|
||||
### Risk index per commit
|
||||
|
||||
| Commit | Risk | Mitigation |
|
||||
|---|---|---|
|
||||
| 1 (broker) | Per-endpoint cache leaks across model_cfg deletions (e.g., user removes a model from config mid-session) | Cache is keyed by string; stale entries don't grow without bound (bounded by #configured models × 1). No GC needed. |
|
||||
| 1 (broker) | /tokenize probe blocks the calling thread for 2s on a misconfigured endpoint | 2s timeout is the cap; one-shot per endpoint per session. |
|
||||
| 2 (context) | per-turn `_tokens` cache miss on every estimate when no tokenize_fn -> existing perf preserved | Cache check is conditional on tokenize_fn presence; char/4 path untouched. |
|
||||
| 3 (context) | enforce_budget loop now calls estimate_tokens potentially every iteration; with tokenize_fn that's O(#turns) per iteration -> O(#turns^2) worst case | Per-turn cache makes this O(#turns) amortized after first fill. For typical max_turns=40 + token_budget=4096 sessions: ~40^2 dict lookups = 1600 ops in worst case, microsecond cost. |
|
||||
| 3 (context) | accurate counts mean token_budget=4096 (Phase 0 default) finally ENFORCES — sessions that fit under char/4 may now evict earlier | Documented in §9; user can raise token_budget to match their model's real context window. |
|
||||
| 4 (repl) | tokenize_fn closure binding to `active_cfg` upval — if upval somehow gets reassigned wrong, eviction uses wrong tokenizer | Lua upvalues are call-time-resolved; A5 verified. Test by smoke after `:model` switch. |
|
||||
| 5 (config + status) | none | |
|
||||
|
||||
### Tests + smoke per commit
|
||||
|
||||
Each commit:
|
||||
- Pass `luajit test_safety.lua` (87/87) and `luajit test_router_model.lua` (31/31)
|
||||
- Load cleanly via `luajit -e 'package.path=...; require("repl"); print("ok")'`
|
||||
- Pass a per-feature smoke (described in each row above)
|
||||
|
||||
### Things deliberately NOT split
|
||||
|
||||
- New module file for tokenize — small enough to live in broker.lua.
|
||||
- Per-text token cache (in addition to per-turn): not needed; turn
|
||||
content is immutable post-append.
|
||||
- :tokenize meta for introspecting the cache — `M.tokenize_supported`
|
||||
is exported for testing; if a user needs runtime visibility, that's
|
||||
a follow-up.
|
||||
|
||||
### Open at plan-time (resolve at implement)
|
||||
|
||||
- :cost detail layout — how exactly to show "estimated session ctx"
|
||||
relative to the existing per-slot rows. Pick at commit 4 (likely
|
||||
a single trailing line under the detail table).
|
||||
- Whether to expose `:tokenize <text>` for direct-probe debugging.
|
||||
Nice-to-have; defer unless useful during verify.
|
||||
|
||||
Reference in New Issue
Block a user