context: enforce_budget honors token_budget + R2 guard (Phase 8 commit #3)

Pillar 5 (analyze finding A1) — the real value-add of Phase 8.
Until now, ctx.token_budget = 4096 was set but never enforced;
enforce_budget only looked at max_turns. With commit #2's accurate
tokenization wired in (via commit #4), eviction now finally fires
when the actual context fills the budget.

Loop condition change:

  before:
    while #self.turns > self.max_turns do

  after:
    while (#self.turns > self.max_turns
           or self:estimate_tokens() > self.token_budget)
          and #self.turns > 0 do

R2 guard: the `and #self.turns > 0` clause is essential. When
system_prompt alone exceeds token_budget (e.g. a 5000-token [project]
block with token_budget=4096), the OR-condition stays true even
when turns are empty. table.remove on a 0-length list is a no-op,
so nothing shrinks and the loop would spin forever. Sonnet review
caught this; without the guard, real users could hit an infinite
loop just by setting a small token_budget and opening a large
project tree.
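
A minimal sketch of the guarded loop, assuming a stripped-down Context
with only the fields named in this message. The constructor, the char/4
estimator body, and the simplified pair-pop are stand-ins for
illustration, not the project's actual code:

```lua
-- Hypothetical stand-in for the real Context.
local Context = {}
Context.__index = Context

local function new_context(system_prompt, max_turns, token_budget)
  return setmetatable({
    system_prompt = system_prompt,
    turns = {},
    max_turns = max_turns,
    token_budget = token_budget,
  }, Context)
end

-- char/4 heuristic (assumed shape of the fallback estimator).
function Context:estimate_tokens()
  local chars = #self.system_prompt
  for _, turn in ipairs(self.turns) do
    chars = chars + #turn.content
  end
  return math.ceil(chars / 4)
end

function Context:enforce_budget()
  local evicted = 0
  while (#self.turns > self.max_turns
         or self:estimate_tokens() > self.token_budget)
        and #self.turns > 0 do
    -- Simplified pair-pop: drop the oldest user + assistant turns.
    for _ = 1, math.min(2, #self.turns) do
      table.remove(self.turns, 1)
      evicted = evicted + 1
    end
  end
  return evicted
end

-- 20000 chars is 5000 tokens at char/4, above the 4096 budget.
local ctx = new_context(string.rep("x", 20000), 40, 4096)
local evicted_empty = ctx:enforce_budget() -- zero turns: guard exits at once

for i = 1, 6 do
  local role = (i % 2 == 1) and "user" or "assistant"
  ctx.turns[#ctx.turns + 1] = { role = role, content = "hello" }
end
local evicted_full = ctx:enforce_budget()  -- drains all turns, then stops
```

With the `and #self.turns > 0` clause removed, the first call in this
sketch would never return, because the system prompt alone keeps the
estimate above the budget.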

Per-pair eviction logic (summarize callback + pair-pop) inside the
loop is unchanged. The estimate_tokens call in the loop condition is
potentially expensive under tokenize_fn, but with commit #2's
per-turn cache each call after the first fill is an O(N) sum over
cached counts (N = number of turns); for max_turns=40 + budget=4096
sessions the worst case is microseconds per call.
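
One way such a per-turn cache could look, as a sketch only. The
`_tokens` and `_sys_tokens` field names and the word-counting stub
tokenizer are assumptions made here; commit #2's real cache may be
structured differently:

```lua
local calls = 0

-- Stub tokenizer (assumption): one token per whitespace-separated word.
local function tokenize_fn(text)
  calls = calls + 1
  local n = 0
  for _ in text:gmatch("%S+") do n = n + 1 end
  return n
end

-- Tokenize each piece of text once, then reuse the stored count.
local function estimate_tokens(ctx)
  if ctx._sys_tokens == nil then
    ctx._sys_tokens = tokenize_fn(ctx.system_prompt)
  end
  local total = ctx._sys_tokens
  for _, turn in ipairs(ctx.turns) do
    if turn._tokens == nil then
      turn._tokens = tokenize_fn(turn.content)
    end
    total = total + turn._tokens
  end
  return total
end

local ctx = {
  system_prompt = "you are helpful",
  turns = {
    { content = "one two three" },
    { content = "four five" },
  },
}

local first = estimate_tokens(ctx)  -- fills the cache: 3 tokenizer calls
local calls_after_first = calls
local second = estimate_tokens(ctx) -- cached: only sums stored counts
```

Evicting a turn drops its cached count along with it, so repeated
calls from the eviction loop never re-tokenize surviving turns.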

Unit-verified across 5 cases (with and without tokenize_fn):
  1. max_turns eviction unchanged (no behavior regression).
  2. char/4 path: tight budget evicts to 0 when sys > budget,
     exits via R2 guard.
  3. char/4 path: practical budget evicts to a stable count.
  4. tokenize_fn stub: evicts to exactly the (budget - sys)/per-turn
     count.
  5. R2 critical: zero turns + oversize sys -> immediate exit,
     evicted=0, no spin.
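
The arithmetic behind case 4 can be sketched as follows. The numbers
(100-token system prompt, 50 tokens per turn, budget of 400) and the
flat-cost estimator are hypothetical, chosen so the division is exact;
the actual test fixture may use different values:

```lua
-- Hypothetical figures, not taken from the real test.
local SYS_TOKENS, PER_TURN, BUDGET = 100, 50, 400

local ctx = { turns = {}, max_turns = 40, token_budget = BUDGET }
for i = 1, 10 do
  ctx.turns[i] = { content = "turn " .. i }
end

-- Stub estimator: fixed system-prompt cost plus a flat cost per turn.
local function estimate_tokens(c)
  return SYS_TOKENS + PER_TURN * #c.turns
end

local function enforce_budget(c)
  local evicted = 0
  while (#c.turns > c.max_turns or estimate_tokens(c) > c.token_budget)
        and #c.turns > 0 do
    -- Simplified pair-pop: drop the oldest two turns.
    for _ = 1, math.min(2, #c.turns) do
      table.remove(c.turns, 1)
      evicted = evicted + 1
    end
  end
  return evicted
end

local evicted = enforce_budget(ctx)
-- (400 - 100) / 50 = 6 turns fit alongside the system prompt.
local expected_remaining = (BUDGET - SYS_TOKENS) / PER_TURN
```

Starting from 10 turns (600 tokens), the loop pops two pairs and stops
at 6 turns, where the estimate equals the budget and the strict `>`
comparison no longer holds.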

Behavior change for existing users: a session that fit under
token_budget=4096 by char/4 (~16K chars) may now evict earlier
because accurate counts are HIGHER for most natural-text inputs
(per baseline B2). Users on cloud presets with very large context
windows (Claude 200K) should raise token_budget to match — see §9
risk row in PHASE8.md.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:30:37 +00:00
parent 8502517021
commit db26d0ccb7
+14 -3
@@ -319,11 +319,22 @@ function Context:to_messages()
   return msgs
 end
--- Evict the oldest pair (user + assistant) while we exceed max_turns. Returns
--- total turns evicted. Caller is responsible for rendering the §8 status line.
+-- Evict the oldest pair (user + assistant) while we exceed max_turns
+-- OR token_budget (Phase 8 pillar 5). Returns total turns evicted.
+-- Caller is responsible for rendering the §8 status line.
+--
+-- R2 guard: when system_prompt alone exceeds token_budget, the OR
+-- condition stays true even when turns are empty — would spin
+-- forever calling table.remove on a 0-length list. The `and
+-- #self.turns > 0` clause ensures we exit when there's nothing
+-- left to evict. Over-budget system_prompts (large [project]
+-- blocks, etc.) are then on the user to shrink via :tree off /
+-- :memory clear / etc.
 function Context:enforce_budget()
   local evicted = 0
-  while #self.turns > self.max_turns do
+  while (#self.turns > self.max_turns
+         or self:estimate_tokens() > self.token_budget)
+        and #self.turns > 0 do
     -- Collect evicted slice (pair: user + assistant)
     local pair = {}
     pair[#pair + 1] = self.turns[1]