From db26d0ccb7211762d31316dd373df09563f46e11 Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Sat, 16 May 2026 23:30:37 +0000
Subject: [PATCH] context: enforce_budget honors token_budget + R2 guard (Phase 8 commit #3)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Pillar 5 (analyze finding A1): the real value-add of Phase 8. Until
now, ctx.token_budget = 4096 was set but never enforced; enforce_budget
only looked at max_turns. With commit #2's accurate tokenization wired
in (via commit #4), eviction now finally fires when the actual context
fills the budget.

Loop condition change:
  before: while #self.turns > self.max_turns do
  after:  while (#self.turns > self.max_turns
                 or self:estimate_tokens() > self.token_budget)
                and #self.turns > 0 do

R2 guard: the `and #self.turns > 0` clause is essential. When
system_prompt alone exceeds token_budget (e.g. a 5000-token [project]
block with token_budget=4096), the OR-condition stays true even when
turns are empty: table.remove on a 0-length list would no-op forever
while evicted++ spins. Sonnet review caught this; without the guard,
real users could hit an infinite loop just by setting a small
token_budget + opening a large project tree.

Per-pair eviction logic (summarize callback + pair-pop) inside the loop
is unchanged. The estimate_tokens call is potentially expensive under
tokenize_fn: commit #2's per-turn cache amortizes to O(N) per iteration
after first fill; for max_turns=40 + budget=4096 sessions the worst
case is microseconds per call.

Unit-verified across 5 cases (with and without tokenize_fn):
  1. max_turns eviction unchanged (no behavior regression).
  2. char/4 path: tight budget evicts to 0 when sys > budget, exits via
     R2 guard.
  3. char/4 path: practical budget evicts to a stable count.
  4. tokenize_fn stub: evicts to exactly the (budget - sys)/per-turn
     count.
  5. R2 critical: zero turns + oversize sys -> immediate exit,
     evicted=0, no spin.
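
The guarded loop and cases 1, 2, and 5 above can be reproduced with a
minimal standalone sketch. It assumes a hypothetical stand-in for
Context: `new_context`, the char/4 `estimate_tokens` stub, and the
simplified pair-pop are illustrations only, not the code in context.lua
(the real loop also runs the summarize callback and may use
tokenize_fn).

```lua
-- Hypothetical stand-in for Context. Field names mirror the patch;
-- everything else is an assumption made for illustration.
local Context = {}
Context.__index = Context

local function new_context(system_prompt, max_turns, token_budget)
  return setmetatable({
    system_prompt = system_prompt,
    turns = {},
    max_turns = max_turns,
    token_budget = token_budget,
  }, Context)
end

-- Assumed char/4 estimate over the system prompt plus all turns.
function Context:estimate_tokens()
  local chars = #self.system_prompt
  for _, turn in ipairs(self.turns) do
    chars = chars + #turn.content
  end
  return math.ceil(chars / 4)
end

function Context:enforce_budget()
  local evicted = 0
  while (#self.turns > self.max_turns
         or self:estimate_tokens() > self.token_budget)
        and #self.turns > 0 do
    -- Simplified pair-pop: drop the oldest turn, then its partner
    -- if one is still present.
    table.remove(self.turns, 1)
    evicted = evicted + 1
    if #self.turns > 0 then
      table.remove(self.turns, 1)
      evicted = evicted + 1
    end
  end
  return evicted
end

-- Helper for the usage lines below: append n short turns.
local function add_turns(ctx, n, content)
  for i = 1, n do
    ctx.turns[#ctx.turns + 1] = {
      role = (i % 2 == 1) and "user" or "assistant",
      content = content,
    }
  end
end

-- Case 5: zero turns + oversize system prompt (20000 chars is about
-- 5000 tokens by char/4). The R2 guard makes the loop exit at once.
local oversize = new_context(string.rep("x", 20000), 40, 4096)
print(oversize:enforce_budget())
```

Removing the `and #self.turns > 0` clause from this sketch makes the
final call loop forever, which is the failure mode the R2 guard
prevents.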

Behavior change for existing users: a session that fit under
token_budget=4096 by char/4 (~16K chars) may now evict earlier because
accurate counts are HIGHER for most natural-text inputs (per baseline
B2). Users on cloud presets with very large context windows (Claude
200K) should raise token_budget to match; see §9 risk row in PHASE8.md.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 context.lua | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/context.lua b/context.lua
index 71058cc..095d110 100644
--- a/context.lua
+++ b/context.lua
@@ -319,11 +319,22 @@ function Context:to_messages()
   return msgs
 end
 
--- Evict the oldest pair (user + assistant) while we exceed max_turns. Returns
--- total turns evicted. Caller is responsible for rendering the §8 status line.
+-- Evict the oldest pair (user + assistant) while we exceed max_turns
+-- OR token_budget (Phase 8 pillar 5). Returns total turns evicted.
+-- Caller is responsible for rendering the §8 status line.
+--
+-- R2 guard: when system_prompt alone exceeds token_budget, the OR
+-- condition stays true even when turns are empty, so the loop would
+-- spin forever calling table.remove on a 0-length list. The `and
+-- #self.turns > 0` clause ensures we exit when there's nothing
+-- left to evict. Over-budget system_prompts (large [project]
+-- blocks, etc.) are then on the user to shrink via :tree off /
+-- :memory clear / etc.
 function Context:enforce_budget()
   local evicted = 0
-  while #self.turns > self.max_turns do
+  while (#self.turns > self.max_turns
+         or self:estimate_tokens() > self.token_budget)
+        and #self.turns > 0 do
     -- Collect evicted slice (pair: user + assistant)
     local pair = {}
     pair[#pair + 1] = self.turns[1]