467e573d24b3e12cf5632c807b3bf008efde08b0
4 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
467e573d24 |
docs/PHASE8: review fold-in — 2 BLOCKERs + 4 CONCERNs + 4 NITs
Sonnet-reviewed per reviews-use-sonnet memory directive.
BLOCKERs (RESOLVED in-place):
R1. §5 estimate_tokens pseudocode missing per-turn cache pattern.
Prose described it; code block called tokenize_fn unconditionally.
Implementer following code verbatim would hit the O(N round-
trips per call) perf gap the prose flagged. Code block now
shows explicit `if t._tokens then ... else t._tokens = ... end`.
R2. enforce_budget loop can spin forever when system_prompt alone
exceeds token_budget (e.g. 5KB project block + budget=4096 +
zero turns -> turns can't shrink further but OR-condition stays
true). Fix: AND `#self.turns > 0` guard on the loop. §13 commit
3 row shows the explicit Lua-syntax condition.
CONCERNs (FOLDED):
R3. :cost detail per-slot ~est=N annotation was semantically
undefined — accumulator sum (cumulative across calls + evicted
turns) vs current-snapshot estimate are incommensurable. §6
reworked: ONE trailing summary line "[estimated session ctx:
N tokens; token_budget=M (X% used)]" instead of per-slot
annotations. §13 commit 4 aligned.
R4. tokenize_fn closure MUST reference active_cfg as upvalue (NOT
capture by value). Subtle but easy to miss — §13 commit 4 now
spells out the correct vs wrong patterns explicitly.
R5. 2s tokenize timeout can spuriously cache-as-unsupported when
llama.cpp is busy with a concurrent completion (single-threaded
inference; /tokenize queues behind). Documented in §9; v1
ships 2s, revisit during verify if it bites.
R6. Per-endpoint cache key conflated two same-endpoint/different-
model presets (B1: /tokenize ignores the model field). Cache
key simplified to endpoint-only. One probe per endpoint per
session; if a future broker honors the model field, revisit.
NITs (APPLIED):
N1. §13 commit 3 `OR`/`AND` -> Lua-syntax `or`/`and`.
N2. §10 Q-T5 Resolution-target cell filled in (was blank after B1).
N3. §6 / §8 / §13 commit 4 now describe a CONSISTENT approach
(trailing summary line; per-slot annotation dropped).
N4. Status header tree-hash updated to current (
|
||
|
|
aa64ad3eec |
docs/PHASE8: plan — §13 commit roadmap (5 commits)
Status: Analyze -> Plan. All open Qs resolved (Q-T5 via baseline B1).
5-commit roadmap, bottom-up:
1. broker.lua — M.token_count helper + per-endpoint capability
cache. <endpoint>/tokenize probe with 2s timeout; cache true/false
per (endpoint, model) for the session. char/4 fallback on any
non-200 / parse-fail / transport err. M.tokenize_supported
introspection helper.
2. context.lua — Context.new accepts opts.tokenize_fn; estimate_
tokens widens to use it when set, with per-turn `_tokens` cache.
char/4 path unchanged when tokenize_fn nil.
3. context.lua — enforce_budget consults token_budget too (pillar
5 from A1). Loop condition: turns>max_turns OR estimate_tokens
>token_budget. Existing summarize-on-evict callback unchanged.
4. repl.lua — wire tokenize_fn when cfg.tokenize.use_endpoint=true.
Closure captures active_cfg upval (A5 — follows :model switches
naturally). :cost detail extension: trailing line showing
estimated session ctx tokens for comparison with the per-slot
prompt_tokens sums in the accumulator.
5. config.lua commented `tokenize = { use_endpoint = true }`
example + PHASE8.md status -> Implement.
Per-commit risk index covers: probe latency cap (2s, one-shot),
per-turn cache correctness (immutable post-append), enforce_budget
performance (O(N) per call after cache fill), and the intentional
behavior change of token_budget actually being enforced (sessions
fitting under char/4 may evict earlier under accurate counts —
documented in §9).
Two items open at plan, resolve at implement:
- exact :cost detail layout for estimated session ctx row
- whether to add a :tokenize debug meta (defer unless useful in verify)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
1a136d81b7 |
docs/PHASE8: analyze — adds pillar 5 (enforce_budget honors token_budget)
Status: Formulate -> Analyze. 12 findings (A1-A12); 5/6 open Qs
resolved in-place (Q-T5 deferred to baseline).
MAJOR FINDING:
A1. enforce_budget ONLY checks max_turns, NOT token_budget — even
with accurate tokenization, eviction decisions are unaffected.
The new estimate_tokens() would just feed the prompt template
display. Pillar 5 added: enforce_budget evicts when EITHER
max_turns OR token_budget is exceeded. This is the real
motivation for accurate tokenization.
Other findings:
A2. ffi.curl.M.post signature confirmed (body, status) / (nil, err).
A3. Single caller of estimate_tokens today; enforce_budget becomes
the second (more frequent) caller — per-turn _tokens cache
becomes important.
A4. Q-T1: cache lives on turn dict; dies with turns on :reset.
A5. Q-T2: closure captures active_cfg upval; follows :model switch
naturally.
A6. Q-T3: opt-out skips the probe entirely (no wiring).
A7. Q-T6: tools-schema tokens deferred to follow-up (fixed per
session; under-count bounded).
A8. _tokens cache invalidation: only :reset; turn content is
immutable after append.
A9. Probe latency ~50ms/call locally; per-turn cache amortizes to
O(1) after first count.
A10. estimate_tokens called OUTSIDE streaming callback; no race.
A11. role:"tool" turns tokenize identically; per-turn cache works.
A12. include_usage (Phase 7) and tokenize (Phase 8) are orthogonal —
different endpoints, different code paths.
§1 expanded to 5 pillars (pillar 5 = enforce_budget extension).
§3 context.lua row updated to reference the enforce_budget change
+ per-turn _tokens cache. §9 risk row added: accurate counts mean
the default token_budget=4096 is finally ENFORCED — sessions that
spilled silently under char/4 may now evict earlier.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
00869ba412 |
docs/PHASE8: formulate — accurate tokenization (resolves Q1)
Phase 8 formulate manifest + PHASE0 §11 amendment to add the Phase 8
row (substrate amendment per CLAUDE.md §3 lands same commit).
Four pillars:
1. Per-endpoint /tokenize probe (cached). One round-trip on first
call per (endpoint, model); capability cached for session.
hossenfelder + llama.cpp expose <endpoint>/tokenize (NOT /v1/
tokenize — per real probe; the path is endpoint-local, not
under the OpenAI /v1 prefix). Cloud (OpenRouter) 404s — silent
char/4 fallback.
2. broker.token_count(model_cfg, text) — thin wrapper; tries probe,
falls back to char/4 on miss. Always returns non-negative int;
never errors. 2s tight timeout; failures cache as not-supported.
3. Context:estimate_tokens widened. Accepts optional tokenize_fn at
Context.new; uses it when present, char/4 otherwise. repl.lua
wires `tokenize_fn = function(text) return broker.token_count(
active_cfg, text) end` when cfg.tokenize.use_endpoint = true.
Per-turn _tokens cache to amortize across estimate calls.
4. :cost detail est-vs-actual annotation. When the heuristic
disagrees with the actual prompt_tokens from broker usage by
>10%, show `~est=N`. Silent otherwise. Display-only; no
behavior change.
Resolves Q1 (PHASE0 §13, originally Phase 3) — replace char/4
heuristic on Context:estimate_tokens. Originally targeted at Phase 3
but deferred forward each iteration; now lands.
Baseline already observed during formulate:
- /v1/tokenize -> 404 on hossenfelder; /tokenize -> works
- Body shape: {content: "..."} returns {tokens: [N1, N2, ...]}
- Accuracy gap: char/4 UNDERESTIMATES by ~10% on real code/prose
(508 vs 558 on a 2KB README sample). Material for context-
budget eviction decisions.
Doc covers scope + done-when, tech decisions table, module changes,
per-pillar deep dives, UX surface, out of scope, 6 risk rows, 6
open questions (Q-T4/T5 baseline-bound, others analyze-bound).
Scope confirmed via AskUserQuestion: tokenization (chosen over
cross-session cost persistence and hard rate-limit enforcement).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|