marfrit/aish - aish - marfrit's space

marfrit/aish

Fork 0

Commit Graph

Author	SHA1	Message	Date
marfrit	1a136d81b7	docs/PHASE8: analyze — adds pillar 5 (enforce_budget honors token_budget) Status: Formulate -> Analyze. 12 findings (A1-A12); 5/6 open Qs resolved in-place (Q-T5 deferred to baseline). MAJOR FINDING: A1. enforce_budget ONLY checks max_turns, NOT token_budget — even with accurate tokenization, eviction decisions are unaffected. The new estimate_tokens() would just feed the prompt template display. Pillar 5 added: enforce_budget evicts when EITHER max_turns OR token_budget is exceeded. This is the real motivation for accurate tokenization. Other findings: A2. ffi.curl.M.post signature confirmed (body, status) / (nil, err). A3. Single caller of estimate_tokens today; enforce_budget becomes the second (more frequent) caller — per-turn _tokens cache becomes important. A4. Q-T1: cache lives on turn dict; dies with turns on :reset. A5. Q-T2: closure captures active_cfg upval; follows :model switch naturally. A6. Q-T3: opt-out skips the probe entirely (no wiring). A7. Q-T6: tools-schema tokens deferred to follow-up (fixed per session; under-count bounded). A8. _tokens cache invalidation: only :reset; turn content is immutable after append. A9. Probe latency ~50ms/call locally; per-turn cache amortizes to O(1) after first count. A10. estimate_tokens called OUTSIDE streaming callback; no race. A11. role:"tool" turns tokenize identically; per-turn cache works. A12. include_usage (Phase 7) and tokenize (Phase 8) are orthogonal — different endpoints, different code paths. §1 expanded to 5 pillars (pillar 5 = enforce_budget extension). §3 context.lua row updated to reference the enforce_budget change + per-turn _tokens cache. §9 risk row added: accurate counts mean the default token_budget=4096 is finally ENFORCED — sessions that spilled silently under char/4 may now evict earlier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:21:24 +00:00
marfrit	00869ba412	docs/PHASE8: formulate — accurate tokenization (resolves Q1) Phase 8 formulate manifest + PHASE0 §11 amendment to add the Phase 8 row (substrate amendment per CLAUDE.md §3 lands same commit). Four pillars: 1. Per-endpoint /tokenize probe (cached). One round-trip on first call per (endpoint, model); capability cached for session. hossenfelder + llama.cpp expose <endpoint>/tokenize (NOT /v1/ tokenize — per real probe; the path is endpoint-local, not under the OpenAI /v1 prefix). Cloud (OpenRouter) 404s — silent char/4 fallback. 2. broker.token_count(model_cfg, text) — thin wrapper; tries probe, falls back to char/4 on miss. Always returns non-negative int; never errors. 2s tight timeout; failures cache as not-supported. 3. Context:estimate_tokens widened. Accepts optional tokenize_fn at Context.new; uses it when present, char/4 otherwise. repl.lua wires `tokenize_fn = function(text) return broker.token_count( active_cfg, text) end` when cfg.tokenize.use_endpoint = true. Per-turn _tokens cache to amortize across estimate calls. 4. :cost detail est-vs-actual annotation. When the heuristic disagrees with the actual prompt_tokens from broker usage by >10%, show `~est=N`. Silent otherwise. Display-only; no behavior change. Resolves Q1 (PHASE0 §13, originally Phase 3) — replace char/4 heuristic on Context:estimate_tokens. Originally targeted at Phase 3 but deferred forward each iteration; now lands. Baseline already observed during formulate: - /v1/tokenize -> 404 on hossenfelder; /tokenize -> works - Body shape: {content: "..."} returns {tokens: [N1, N2, ...]} - Accuracy gap: char/4 UNDERESTIMATES by ~10% on real code/prose (508 vs 558 on a 2KB README sample). Material for context- budget eviction decisions. Doc covers scope + done-when, tech decisions table, module changes, per-pillar deep dives, UX surface, out of scope, 6 risk rows, 6 open questions (Q-T4/T5 baseline-bound, others analyze-bound). Scope confirmed via AskUserQuestion: tokenization (chosen over cross-session cost persistence and hard rate-limit enforcement). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:19:53 +00:00

Author

SHA1

Message

Date

marfrit

1a136d81b7

docs/PHASE8: analyze — adds pillar 5 (enforce_budget honors token_budget)

Status: Formulate -> Analyze. 12 findings (A1-A12); 5/6 open Qs
resolved in-place (Q-T5 deferred to baseline).

MAJOR FINDING:

A1. enforce_budget ONLY checks max_turns, NOT token_budget — even
    with accurate tokenization, eviction decisions are unaffected.
    The new estimate_tokens() would just feed the prompt template
    display. Pillar 5 added: enforce_budget evicts when EITHER
    max_turns OR token_budget is exceeded. This is the real
    motivation for accurate tokenization.

Other findings:

A2.  ffi.curl.M.post signature confirmed (body, status) / (nil, err).
A3.  Single caller of estimate_tokens today; enforce_budget becomes
     the second (more frequent) caller — per-turn _tokens cache
     becomes important.
A4.  Q-T1: cache lives on turn dict; dies with turns on :reset.
A5.  Q-T2: closure captures active_cfg upval; follows :model switch
     naturally.
A6.  Q-T3: opt-out skips the probe entirely (no wiring).
A7.  Q-T6: tools-schema tokens deferred to follow-up (fixed per
     session; under-count bounded).
A8.  _tokens cache invalidation: only :reset; turn content is
     immutable after append.
A9.  Probe latency ~50ms/call locally; per-turn cache amortizes to
     O(1) after first count.
A10. estimate_tokens called OUTSIDE streaming callback; no race.
A11. role:"tool" turns tokenize identically; per-turn cache works.
A12. include_usage (Phase 7) and tokenize (Phase 8) are orthogonal —
     different endpoints, different code paths.

§1 expanded to 5 pillars (pillar 5 = enforce_budget extension).
§3 context.lua row updated to reference the enforce_budget change
+ per-turn _tokens cache. §9 risk row added: accurate counts mean
the default token_budget=4096 is finally ENFORCED — sessions that
spilled silently under char/4 may now evict earlier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-16 23:21:24 +00:00

marfrit

00869ba412

docs/PHASE8: formulate — accurate tokenization (resolves Q1)

Phase 8 formulate manifest + PHASE0 §11 amendment to add the Phase 8
row (substrate amendment per CLAUDE.md §3 lands same commit).

Four pillars:

  1. Per-endpoint /tokenize probe (cached). One round-trip on first
     call per (endpoint, model); capability cached for session.
     hossenfelder + llama.cpp expose <endpoint>/tokenize (NOT /v1/
     tokenize — per real probe; the path is endpoint-local, not
     under the OpenAI /v1 prefix). Cloud (OpenRouter) 404s — silent
     char/4 fallback.

  2. broker.token_count(model_cfg, text) — thin wrapper; tries probe,
     falls back to char/4 on miss. Always returns non-negative int;
     never errors. 2s tight timeout; failures cache as not-supported.

  3. Context:estimate_tokens widened. Accepts optional tokenize_fn at
     Context.new; uses it when present, char/4 otherwise. repl.lua
     wires `tokenize_fn = function(text) return broker.token_count(
     active_cfg, text) end` when cfg.tokenize.use_endpoint = true.
     Per-turn _tokens cache to amortize across estimate calls.

  4. :cost detail est-vs-actual annotation. When the heuristic
     disagrees with the actual prompt_tokens from broker usage by
     >10%, show `~est=N`. Silent otherwise. Display-only; no
     behavior change.

Resolves Q1 (PHASE0 §13, originally Phase 3) — replace char/4
heuristic on Context:estimate_tokens. Originally targeted at Phase 3
but deferred forward each iteration; now lands.

Baseline already observed during formulate:
  - /v1/tokenize -> 404 on hossenfelder; /tokenize -> works
  - Body shape: {content: "..."} returns {tokens: [N1, N2, ...]}
  - Accuracy gap: char/4 UNDERESTIMATES by ~10% on real code/prose
    (508 vs 558 on a 2KB README sample). Material for context-
    budget eviction decisions.

Doc covers scope + done-when, tech decisions table, module changes,
per-pillar deep dives, UX surface, out of scope, 6 risk rows, 6
open questions (Q-T4/T5 baseline-bound, others analyze-bound).

Scope confirmed via AskUserQuestion: tokenization (chosen over
cross-session cost persistence and hard rate-limit enforcement).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-16 23:19:53 +00:00

2 Commits