marfrit/aish - aish - marfrit's space

marfrit/aish

Fork 0

Commit Graph

Author	SHA1	Message	Date
marfrit	79bd40db79	docs/PHASE8-baseline: live /tokenize probes Four findings, all align with formulate/analyze: B1. /tokenize IGNORES the `model` request field — returns the tokenization of whichever model is currently loaded on the proxy backend, NOT the requested model. Acceptable: a real BPE count is still much better than char/4, and the gap between Qwen/Llama tokenizers is small. Cloud (OpenRouter) 404s regardless, so cloud falls back to char/4 via the capability cache. B2. Latency 23-34ms per call, FLAT across input sizes 50-5000 chars. Network round-trip dominates. Per-turn _tokens cache amortizes to O(1); worst case 40 cached turns × ~30ms = 1.2s one-time cost on first enforce_budget call. Acceptable. B3. Response shape confirmed: `{"tokens":[N1,N2,...]}` (token IDs; we use #response.tokens for count, discard the IDs). JSON not SSE; ffi.curl.M.post is the right call. B4. Cloud /tokenize 404s as expected. Capability cache marks it unsupported on first probe; char/4 fallback silent thereafter. No design change. Q-T5 RESOLVED per B1. All open questions now resolved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:22:05 +00:00

Author

SHA1

Message

Date

marfrit

79bd40db79

docs/PHASE8-baseline: live /tokenize probes

Four findings, all align with formulate/analyze:

B1. /tokenize IGNORES the `model` request field — returns the
    tokenization of whichever model is currently loaded on the
    proxy backend, NOT the requested model. Acceptable: a real BPE
    count is still much better than char/4, and the gap between
    Qwen/Llama tokenizers is small. Cloud (OpenRouter) 404s
    regardless, so cloud falls back to char/4 via the capability
    cache.

B2. Latency 23-34ms per call, FLAT across input sizes 50-5000 chars.
    Network round-trip dominates. Per-turn _tokens cache amortizes
    to O(1); worst case 40 cached turns × ~30ms = 1.2s one-time
    cost on first enforce_budget call. Acceptable.

B3. Response shape confirmed: `{"tokens":[N1,N2,...]}` (token IDs;
    we use #response.tokens for count, discard the IDs). JSON not
    SSE; ffi.curl.M.post is the right call.

B4. Cloud /tokenize 404s as expected. Capability cache marks it
    unsupported on first probe; char/4 fallback silent thereafter.
    No design change.

Q-T5 RESOLVED per B1. All open questions now resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-16 23:22:05 +00:00

1 Commits