context: enforce_budget honors token_budget + R2 guard (Phase 8 commit #3)

Pillar 5 (analyze finding A1): the real value-add of Phase 8. Until now, ctx.token_budget = 4096 was set but never enforced; enforce_budget only looked at max_turns. With commit #2's accurate tokenization wired in (via commit #4), eviction now finally fires when the actual context fills the budget.

Loop condition change:

    before: while #self.turns > self.max_turns do
    after:  while (#self.turns > self.max_turns
                   or self:estimate_tokens() > self.token_budget)
                  and #self.turns > 0 do

R2 guard: the `and #self.turns > 0` clause is essential. When system_prompt alone exceeds token_budget (e.g. a 5000-token [project] block with token_budget=4096), the OR-condition stays true even when turns are empty: table.remove on a 0-length list would no-op forever while evicted++ spins. Sonnet review caught this; without the guard, real users could hit an infinite loop just by setting a small token_budget + opening a large project tree.

Per-pair eviction logic (summarize callback + pair-pop) inside the loop is unchanged.

The estimate_tokens call is potentially expensive under tokenize_fn. Commit #2's per-turn cache amortizes to O(N) per iteration after first fill; for max_turns=40 + budget=4096 sessions the worst case is microseconds per call.

Unit-verified across 5 cases (with and without tokenize_fn):

1. max_turns eviction unchanged (no behavior regression).
2. char/4 path: tight budget evicts to 0 when sys > budget, exits via R2 guard.
3. char/4 path: practical budget evicts to a stable count.
4. tokenize_fn stub: evicts to exactly the (budget - sys)/per-turn count.
5. R2 critical: zero turns + oversize sys -> immediate exit, evicted=0, no spin.

Behavior change for existing users: a session that fit under token_budget=4096 by char/4 (~16K chars) may now evict earlier because accurate counts are HIGHER for most natural-text inputs (per baseline B2). Users on cloud presets with very large context windows (Claude 200K) should raise token_budget to match; see §9 risk row in PHASE8.md.
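
The guarded loop and two of the verified cases can be sketched in isolation. This is a minimal illustration, not the project's actual Context: the constructor `Context.new`, the `{ role, content }` turn shape, and the char/4 `estimate_tokens` body are assumptions made for the example, and the summarize callback is left out. Only the `while` condition is taken from the commit.

```lua
-- Hypothetical stand-in for the real Context class (illustration only).
local Context = {}
Context.__index = Context

function Context.new(system_prompt, max_turns, token_budget)
    return setmetatable({
        system_prompt = system_prompt,
        max_turns = max_turns,
        token_budget = token_budget,
        turns = {},
    }, Context)
end

-- Assumed char/4 estimate over the system prompt plus every turn.
function Context:estimate_tokens()
    local chars = #self.system_prompt
    for _, turn in ipairs(self.turns) do
        chars = chars + #turn.content
    end
    return math.ceil(chars / 4)
end

function Context:enforce_budget()
    local evicted = 0
    -- Same condition as the commit: evict while over max_turns OR over
    -- token_budget, but only while there is something left to evict.
    while (#self.turns > self.max_turns
           or self:estimate_tokens() > self.token_budget)
          and #self.turns > 0 do
        -- Pop the oldest pair (user + assistant); a lone turn pops alone.
        for _ = 1, math.min(2, #self.turns) do
            table.remove(self.turns, 1)
            evicted = evicted + 1
        end
    end
    return evicted
end

-- Helper for the examples: append n turns of the given content length.
local function fill(ctx, n, content_len)
    for i = 1, n do
        local role = (i % 2 == 1) and "user" or "assistant"
        ctx.turns[#ctx.turns + 1] = {
            role = role,
            content = string.rep("x", content_len),
        }
    end
end

-- Case 5 shape: zero turns, 20000-char system prompt (~5000 tokens by
-- char/4) against a 4096-token budget. The guard makes the loop exit.
local empty_ctx = Context.new(string.rep("s", 20000), 40, 4096)
local evicted_empty = empty_ctx:enforce_budget()

-- Case 2 shape: same oversize prompt with four turns; everything is
-- evicted, then the guard stops the loop.
local tight_ctx = Context.new(string.rep("s", 20000), 40, 4096)
fill(tight_ctx, 4, 5)
local evicted_tight = tight_ctx:enforce_budget()

-- Case 3 shape: 100-token prompt, 200-token budget, eight 20-token turns.
local stable_ctx = Context.new(string.rep("s", 400), 40, 200)
fill(stable_ctx, 8, 80)
local evicted_stable = stable_ctx:enforce_budget()
```

Dropping the `and #self.turns > 0` clause from this sketch would make the first example loop forever, since the estimate never falls below the budget once the turn list is empty.
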
Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:30:37 +00:00
parent 8502517021
commit db26d0ccb7
1 changed files with 14 additions and 3 deletions
@@ -319,11 +319,22 @@ function Context:to_messages()
    return msgs
 end

-- Evict the oldest pair (user + assistant) while we exceed max_turns. Returns
-- total turns evicted. Caller is responsible for rendering the §8 status line.
+-- Evict the oldest pair (user + assistant) while we exceed max_turns
+-- OR token_budget (Phase 8 pillar 5). Returns total turns evicted.
+-- Caller is responsible for rendering the §8 status line.
+--
+-- R2 guard: when system_prompt alone exceeds token_budget, the OR
+-- condition stays true even when turns are empty — would spin
+-- forever calling table.remove on a 0-length list. The `and
+-- #self.turns > 0` clause ensures we exit when there's nothing
+-- left to evict. Over-budget system_prompts (large [project]
+-- blocks, etc.) are then on the user to shrink via :tree off /
+-- :memory clear / etc.
 function Context:enforce_budget()
    local evicted = 0
-    while #self.turns > self.max_turns do
+    while (#self.turns > self.max_turns
+           or self:estimate_tokens() > self.token_budget)
+          and #self.turns > 0 do
        -- Collect evicted slice (pair: user + assistant)
        local pair = {}
        pair[#pair + 1] = self.turns[1]