marfrit/aish - aish - marfrit's space

Author	SHA1	Message	Date
marfrit	467e573d24	docs/PHASE8: review fold-in — 2 BLOCKERs + 4 CONCERNs + 4 NITs Sonnet-reviewed per reviews-use-sonnet memory directive. BLOCKERs (RESOLVED in-place): R1. §5 estimate_tokens pseudocode missing per-turn cache pattern. Prose described it; code block called tokenize_fn unconditionally. Implementer following code verbatim would hit the O(N round- trips per call) perf gap the prose flagged. Code block now shows explicit `if t._tokens then ... else t._tokens = ... end`. R2. enforce_budget loop can spin forever when system_prompt alone exceeds token_budget (e.g. 5KB project block + budget=4096 + zero turns -> turns can't shrink further but OR-condition stays true). Fix: AND `#self.turns > 0` guard on the loop. §13 commit 3 row shows the explicit Lua-syntax condition. CONCERNs (FOLDED): R3. :cost detail per-slot ~est=N annotation was semantically undefined — accumulator sum (cumulative across calls + evicted turns) vs current-snapshot estimate are incommensurable. §6 reworked: ONE trailing summary line "[estimated session ctx: N tokens; token_budget=M (X% used)]" instead of per-slot annotations. §13 commit 4 aligned. R4. tokenize_fn closure MUST reference active_cfg as upvalue (NOT capture by value). Subtle but easy to miss — §13 commit 4 now spells out the correct vs wrong patterns explicitly. R5. 2s tokenize timeout can spuriously cache-as-unsupported when llama.cpp is busy with a concurrent completion (single-threaded inference; /tokenize queues behind). Documented in §9; v1 ships 2s, revisit during verify if it bites. R6. Per-endpoint cache key conflated two same-endpoint/different- model presets (B1: /tokenize ignores the model field). Cache key simplified to endpoint-only. One probe per endpoint per session; if a future broker honors the model field, revisit. NITs (APPLIED): N1. §13 commit 3 `OR`/`AND` -> Lua-syntax `or`/`and`. N2. §10 Q-T5 Resolution-target cell filled in (was blank after B1). N3. §6 / §8 / §13 commit 4 now describe a CONSISTENT approach (trailing summary line; per-slot annotation dropped). N4. Status header tree-hash updated to current (`aa64ad3` -> stays fresh through review fold-in; commit 5 will refresh again at "Implement" status). PHASE8.md now 622 lines (was 454 after plan). +168/-61. Ready for implementation phase 6 of the inner loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:28:27 +00:00
marfrit	aa64ad3eec	docs/PHASE8: plan — §13 commit roadmap (5 commits) Status: Analyze -> Plan. All open Qs resolved (Q-T5 via baseline B1). 5-commit roadmap, bottom-up: 1. broker.lua — M.token_count helper + per-endpoint capability cache. <endpoint>/tokenize probe with 2s timeout; cache true/false per (endpoint, model) for the session. char/4 fallback on any non-200 / parse-fail / transport err. M.tokenize_supported introspection helper. 2. context.lua — Context.new accepts opts.tokenize_fn; estimate_ tokens widens to use it when set, with per-turn `_tokens` cache. char/4 path unchanged when tokenize_fn nil. 3. context.lua — enforce_budget consults token_budget too (pillar 5 from A1). Loop condition: turns>max_turns OR estimate_tokens >token_budget. Existing summarize-on-evict callback unchanged. 4. repl.lua — wire tokenize_fn when cfg.tokenize.use_endpoint=true. Closure captures active_cfg upval (A5 — follows :model switches naturally). :cost detail extension: trailing line showing estimated session ctx tokens for comparison with the per-slot prompt_tokens sums in the accumulator. 5. config.lua commented `tokenize = { use_endpoint = true }` example + PHASE8.md status -> Implement. Per-commit risk index covers: probe latency cap (2s, one-shot), per-turn cache correctness (immutable post-append), enforce_budget performance (O(N) per call after cache fill), and the intentional behavior change of token_budget actually being enforced (sessions fitting under char/4 may evict earlier under accurate counts — documented in §9). Two items open at plan, resolve at implement: - exact :cost detail layout for estimated session ctx row - whether to add a :tokenize debug meta (defer unless useful in verify) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:24:41 +00:00
marfrit	1a136d81b7	docs/PHASE8: analyze — adds pillar 5 (enforce_budget honors token_budget) Status: Formulate -> Analyze. 12 findings (A1-A12); 5/6 open Qs resolved in-place (Q-T5 deferred to baseline). MAJOR FINDING: A1. enforce_budget ONLY checks max_turns, NOT token_budget — even with accurate tokenization, eviction decisions are unaffected. The new estimate_tokens() would just feed the prompt template display. Pillar 5 added: enforce_budget evicts when EITHER max_turns OR token_budget is exceeded. This is the real motivation for accurate tokenization. Other findings: A2. ffi.curl.M.post signature confirmed (body, status) / (nil, err). A3. Single caller of estimate_tokens today; enforce_budget becomes the second (more frequent) caller — per-turn _tokens cache becomes important. A4. Q-T1: cache lives on turn dict; dies with turns on :reset. A5. Q-T2: closure captures active_cfg upval; follows :model switch naturally. A6. Q-T3: opt-out skips the probe entirely (no wiring). A7. Q-T6: tools-schema tokens deferred to follow-up (fixed per session; under-count bounded). A8. _tokens cache invalidation: only :reset; turn content is immutable after append. A9. Probe latency ~50ms/call locally; per-turn cache amortizes to O(1) after first count. A10. estimate_tokens called OUTSIDE streaming callback; no race. A11. role:"tool" turns tokenize identically; per-turn cache works. A12. include_usage (Phase 7) and tokenize (Phase 8) are orthogonal — different endpoints, different code paths. §1 expanded to 5 pillars (pillar 5 = enforce_budget extension). §3 context.lua row updated to reference the enforce_budget change + per-turn _tokens cache. §9 risk row added: accurate counts mean the default token_budget=4096 is finally ENFORCED — sessions that spilled silently under char/4 may now evict earlier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:21:24 +00:00
marfrit	00869ba412	docs/PHASE8: formulate — accurate tokenization (resolves Q1) Phase 8 formulate manifest + PHASE0 §11 amendment to add the Phase 8 row (substrate amendment per CLAUDE.md §3 lands same commit). Four pillars: 1. Per-endpoint /tokenize probe (cached). One round-trip on first call per (endpoint, model); capability cached for session. hossenfelder + llama.cpp expose <endpoint>/tokenize (NOT /v1/ tokenize — per real probe; the path is endpoint-local, not under the OpenAI /v1 prefix). Cloud (OpenRouter) 404s — silent char/4 fallback. 2. broker.token_count(model_cfg, text) — thin wrapper; tries probe, falls back to char/4 on miss. Always returns non-negative int; never errors. 2s tight timeout; failures cache as not-supported. 3. Context:estimate_tokens widened. Accepts optional tokenize_fn at Context.new; uses it when present, char/4 otherwise. repl.lua wires `tokenize_fn = function(text) return broker.token_count( active_cfg, text) end` when cfg.tokenize.use_endpoint = true. Per-turn _tokens cache to amortize across estimate calls. 4. :cost detail est-vs-actual annotation. When the heuristic disagrees with the actual prompt_tokens from broker usage by >10%, show `~est=N`. Silent otherwise. Display-only; no behavior change. Resolves Q1 (PHASE0 §13, originally Phase 3) — replace char/4 heuristic on Context:estimate_tokens. Originally targeted at Phase 3 but deferred forward each iteration; now lands. Baseline already observed during formulate: - /v1/tokenize -> 404 on hossenfelder; /tokenize -> works - Body shape: {content: "..."} returns {tokens: [N1, N2, ...]} - Accuracy gap: char/4 UNDERESTIMATES by ~10% on real code/prose (508 vs 558 on a 2KB README sample). Material for context- budget eviction decisions. Doc covers scope + done-when, tech decisions table, module changes, per-pillar deep dives, UX surface, out of scope, 6 risk rows, 6 open questions (Q-T4/T5 baseline-bound, others analyze-bound). Scope confirmed via AskUserQuestion: tokenization (chosen over cross-session cost persistence and hard rate-limit enforcement). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:19:53 +00:00

4 Commits