From db26d0ccb7211762d31316dd373df09563f46e11 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Sat, 16 May 2026 23:30:37 +0000
Subject: [PATCH] context: enforce_budget honors token_budget + R2 guard (Phase
 8 commit #3)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Pillar 5 (analyze finding A1) — the real value-add of Phase 8.
Until now, ctx.token_budget = 4096 was set but never enforced;
enforce_budget only looked at max_turns. With commit #2's accurate
tokenization wired in (via commit #4), eviction now fires as soon
as the actual context exceeds the budget.

Loop condition change:

  before:
    while #self.turns > self.max_turns do

  after:
    while (#self.turns > self.max_turns
           or self:estimate_tokens() > self.token_budget)
          and #self.turns > 0 do

R2 guard: the `and #self.turns > 0` clause is essential. When
system_prompt alone exceeds token_budget (e.g. a 5000-token [project]
block with token_budget=4096), the OR-condition stays true even
when turns is empty: table.remove on a 0-length list is a no-op,
so the loop would never exit while evicted++ spins. Sonnet review
caught this; without the guard, real users could hit an infinite
loop just by setting a small token_budget and opening a large
project tree.
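
To make the guard concrete, here is a minimal sketch of the new loop
on a simplified stand-in for Context. The constructor, the `content`
field, and the char/4 estimate_tokens body are assumptions made for
illustration; the real context.lua also runs the summarize callback,
which is omitted here.

```lua
-- Hypothetical, simplified stand-in for Context; not the real context.lua.
local Context = {}
Context.__index = Context

local function new_context(system_prompt, max_turns, token_budget)
    return setmetatable({
        system_prompt = system_prompt,
        turns = {},
        max_turns = max_turns,
        token_budget = token_budget,
    }, Context)
end

-- Assumed char/4 estimate over the system prompt plus every turn.
function Context:estimate_tokens()
    local chars = #self.system_prompt
    for _, turn in ipairs(self.turns) do
        chars = chars + #turn.content
    end
    return math.ceil(chars / 4)
end

function Context:enforce_budget()
    local evicted = 0
    while (#self.turns > self.max_turns
           or self:estimate_tokens() > self.token_budget)
          and #self.turns > 0 do
        -- Pop the oldest pair (user + assistant), or a lone trailing turn.
        for _ = 1, math.min(2, #self.turns) do
            table.remove(self.turns, 1)
            evicted = evicted + 1
        end
    end
    return evicted
end

-- Oversize system prompt (5000 tokens by char/4) against a 4096 budget.
local ctx = new_context(string.rep("x", 20000), 40, 4096)
local evicted_empty = ctx:enforce_budget()  -- zero turns: exits at once

ctx.turns = {
    { role = "user", content = "hi" },
    { role = "assistant", content = "hello" },
}
local evicted_pair = ctx:enforce_budget()   -- evicts the pair, then exits
print(evicted_empty, evicted_pair, #ctx.turns)
```

In this sketch the first call returns 0 and the second returns 2 with
no turns left. Dropping the `and #self.turns > 0` clause would leave
the first call with a condition that can never become false.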

Per-pair eviction logic (summarize callback + pair-pop) inside the
loop is unchanged. The estimate_tokens call now runs on every
iteration and is potentially expensive under tokenize_fn, but
commit #2's per-turn cache keeps it at O(N) per iteration after
the first fill; for max_turns=40 + budget=4096 sessions the worst
case is microseconds per call.

Unit-verified across 5 cases (with and without tokenize_fn):
  1. max_turns eviction unchanged (no behavior regression).
  2. char/4 path: tight budget evicts to 0 when sys > budget,
     exits via R2 guard.
  3. char/4 path: practical budget evicts to a stable count.
  4. tokenize_fn stub: evicts to exactly the (budget - sys)/per-turn
     count.
  5. R2 critical: zero turns + oversize sys -> immediate exit,
     evicted=0, no spin.
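
The arithmetic behind case 4 can be sketched with a fixed-cost stub.
The numbers below (96 system tokens, 100 tokens per turn, 60 starting
turns) are hypothetical and chosen for illustration; they are not the
values used in the actual unit test.

```lua
-- Hypothetical stub numbers; the real test values are not shown here.
local SYS_TOKENS, PER_TURN, BUDGET = 96, 100, 4096

-- Stand-in for estimate_tokens under a fixed-cost tokenize_fn stub.
local function estimate_tokens(turn_count)
    return SYS_TOKENS + turn_count * PER_TURN
end

local turns = 60
local evicted = 0
while estimate_tokens(turns) > BUDGET and turns > 0 do
    local pair = math.min(2, turns)  -- oldest user + assistant pair
    turns = turns - pair
    evicted = evicted + pair
end

-- Largest turn count that still fits next to the system prompt.
local expected = math.floor((BUDGET - SYS_TOKENS) / PER_TURN)
print(turns, evicted, expected)
```

With these numbers the loop stops at 40 turns, which equals
(budget - sys)/per-turn. Because eviction removes turns two at a
time, the match is exact only when the starting count and that
quotient have the same parity; otherwise the loop ends one turn lower.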

Behavior change for existing users: a session that fit under
token_budget=4096 by char/4 (~16K chars) may now evict earlier
because accurate counts are HIGHER than the char/4 estimate for
most natural-text inputs (per baseline B2). Users on cloud presets
with very large context windows (Claude 200K) should raise
token_budget to match; see the §9 risk row in PHASE8.md.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 context.lua | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/context.lua b/context.lua
index 71058cc..095d110 100644
--- a/context.lua
+++ b/context.lua
@@ -319,11 +319,22 @@ function Context:to_messages()
     return msgs
 end
 
--- Evict the oldest pair (user + assistant) while we exceed max_turns. Returns
--- total turns evicted. Caller is responsible for rendering the §8 status line.
+-- Evict the oldest pair (user + assistant) while we exceed max_turns
+-- OR token_budget (Phase 8 pillar 5). Returns total turns evicted.
+-- Caller is responsible for rendering the §8 status line.
+--
+-- R2 guard: when system_prompt alone exceeds token_budget, the OR
+-- condition stays true even when turns are empty — would spin
+-- forever calling table.remove on a 0-length list. The `and
+-- #self.turns > 0` clause ensures we exit when there's nothing
+-- left to evict. Over-budget system_prompts (large [project]
+-- blocks, etc.) are then on the user to shrink via :tree off /
+-- :memory clear / etc.
 function Context:enforce_budget()
     local evicted = 0
-    while #self.turns > self.max_turns do
+    while (#self.turns > self.max_turns
+           or self:estimate_tokens() > self.token_budget)
+          and #self.turns > 0 do
         -- Collect evicted slice (pair: user + assistant)
         local pair = {}
         pair[#pair + 1] = self.turns[1]