Commit Graph

1 Commits

Author SHA1 Message Date
marfrit 00869ba412 docs/PHASE8: formulate — accurate tokenization (resolves Q1)
Phase 8 formulate manifest + PHASE0 §11 amendment to add the Phase 8
row (substrate amendment per CLAUDE.md §3 lands same commit).

Four pillars:

  1. Per-endpoint /tokenize probe (cached). One round-trip on first
     call per (endpoint, model); capability cached for session.
     hossenfelder + llama.cpp expose <endpoint>/tokenize (NOT /v1/
     tokenize — per real probe; the path is endpoint-local, not
     under the OpenAI /v1 prefix). Cloud (OpenRouter) 404s — silent
     char/4 fallback.

  2. broker.token_count(model_cfg, text) — thin wrapper; tries probe,
     falls back to char/4 on miss. Always returns non-negative int;
     never errors. 2s tight timeout; failures cache as not-supported.

  3. Context:estimate_tokens widened. Accepts optional tokenize_fn at
     Context.new; uses it when present, char/4 otherwise. repl.lua
     wires `tokenize_fn = function(text) return broker.token_count(
     active_cfg, text) end` when cfg.tokenize.use_endpoint = true.
     Per-turn _tokens cache to amortize across estimate calls.

  4. :cost detail est-vs-actual annotation. When the heuristic
     disagrees with the actual prompt_tokens from broker usage by
     >10%, show `~est=N`. Silent otherwise. Display-only; no
     behavior change.

Resolves Q1 (PHASE0 §13, originally Phase 3) — replace char/4
heuristic on Context:estimate_tokens. Originally targeted at Phase 3
but deferred forward each iteration; now lands.

Baseline already observed during formulate:
  - /v1/tokenize -> 404 on hossenfelder; /tokenize -> works
  - Body shape: {content: "..."} returns {tokens: [N1, N2, ...]}
  - Accuracy gap: char/4 UNDERESTIMATES by ~10% on real code/prose
    (508 vs 558 on a 2KB README sample). Material for context-
    budget eviction decisions.

Doc covers scope + done-when, tech decisions table, module changes,
per-pillar deep dives, UX surface, out of scope, 6 risk rows, 6
open questions (Q-T4/T5 baseline-bound, others analyze-bound).

Scope confirmed via AskUserQuestion: tokenization (chosen over
cross-session cost persistence and hard rate-limit enforcement).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:19:53 +00:00