context: enforce_budget honors token_budget + R2 guard (Phase 8 commit #3)
Pillar 5 (analyze finding A1): the real value-add of Phase 8. Until now, ctx.token_budget = 4096 was set but never enforced; enforce_budget only looked at max_turns. With commit #2's accurate tokenization wired in (via commit #4), eviction now finally fires when the actual context fills the budget.

Loop condition change:
  before: while #self.turns > self.max_turns do
  after:  while (#self.turns > self.max_turns or self:estimate_tokens() > self.token_budget) and #self.turns > 0 do

R2 guard: the `and #self.turns > 0` clause is essential. When system_prompt alone exceeds token_budget (e.g. a 5000-token [project] block with token_budget=4096), the OR-condition stays true even when turns are empty, so table.remove on a 0-length list would no-op forever while evicted++ spins. Sonnet review caught this; without the guard, real users could hit an infinite loop just by setting a small token_budget + opening a large project tree.

Per-pair eviction logic (summarize callback + pair-pop) inside the loop is unchanged.

The estimate_tokens call is potentially expensive under tokenize_fn. Commit #2's per-turn cache amortizes to O(N) per iteration after first fill; for max_turns=40 + budget=4096 sessions the worst case is microseconds per call.

Unit-verified across 5 cases (with and without tokenize_fn):
  1. max_turns eviction unchanged (no behavior regression).
  2. char/4 path: tight budget evicts to 0 when sys > budget, exits via R2 guard.
  3. char/4 path: practical budget evicts to a stable count.
  4. tokenize_fn stub: evicts to exactly the (budget - sys)/per-turn count.
  5. R2 critical: zero turns + oversize sys -> immediate exit, evicted=0, no spin.

Behavior change for existing users: a session that fit under token_budget=4096 by char/4 (~16K chars) may now evict earlier because accurate counts are HIGHER for most natural-text inputs (per baseline B2). Users on cloud presets with very large context windows (Claude 200K) should raise token_budget to match; see §9 risk row in PHASE8.md.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
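
The per-turn cache described in the message can be pictured with a minimal sketch. This is not the code from commit #2: the `Context.new`, `count`, and `add_turn` helpers and the `tokens` / `_sys_text` / `_sys_tokens` fields are hypothetical names chosen for illustration, and only `estimate_tokens`, `tokenize_fn`, `system_prompt`, `turns`, and the char/4 fallback come from the commit message.

```lua
-- Hypothetical sketch of a cached token estimate; not the project's code.
local Context = {}
Context.__index = Context

function Context.new(opts)
    return setmetatable({
        system_prompt = opts.system_prompt or "",
        tokenize_fn = opts.tokenize_fn, -- optional: function(text) -> count
        turns = {},
    }, Context)
end

-- Count one string: accurate tokenizer if configured, else char/4.
function Context:count(text)
    if self.tokenize_fn then
        return self.tokenize_fn(text)
    end
    return math.ceil(#text / 4)
end

function Context:add_turn(role, content)
    self.turns[#self.turns + 1] = { role = role, content = content }
end

-- Sum system prompt + turns. Each count is computed once and stored, so
-- later calls only walk the turn list (O(N) additions, no re-tokenizing).
function Context:estimate_tokens()
    if self._sys_text ~= self.system_prompt then
        self._sys_text = self.system_prompt
        self._sys_tokens = self:count(self.system_prompt)
    end
    local total = self._sys_tokens
    for _, turn in ipairs(self.turns) do
        if turn.tokens == nil then
            turn.tokens = self:count(turn.content)
        end
        total = total + turn.tokens
    end
    return total
end
```

With this shape, calling `estimate_tokens` once per loop iteration in `enforce_budget` only invokes the tokenizer for turns that have not been counted yet.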
+14
-3
@@ -319,11 +319,22 @@ function Context:to_messages()
     return msgs
 end
 
--- Evict the oldest pair (user + assistant) while we exceed max_turns. Returns
--- total turns evicted. Caller is responsible for rendering the §8 status line.
+-- Evict the oldest pair (user + assistant) while we exceed max_turns
+-- OR token_budget (Phase 8 pillar 5). Returns total turns evicted.
+-- Caller is responsible for rendering the §8 status line.
+--
+-- R2 guard: when system_prompt alone exceeds token_budget, the OR
+-- condition stays true even when turns are empty — would spin
+-- forever calling table.remove on a 0-length list. The `and
+-- #self.turns > 0` clause ensures we exit when there's nothing
+-- left to evict. Over-budget system_prompts (large [project]
+-- blocks, etc.) are then on the user to shrink via :tree off /
+-- :memory clear / etc.
 function Context:enforce_budget()
     local evicted = 0
-    while #self.turns > self.max_turns do
+    while (#self.turns > self.max_turns
+           or self:estimate_tokens() > self.token_budget)
+          and #self.turns > 0 do
         -- Collect evicted slice (pair: user + assistant)
         local pair = {}
         pair[#pair + 1] = self.turns[1]
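
The hunk cuts off inside the loop body, so the following is a minimal, self-contained sketch of how the new loop condition behaves, not the project's implementation. It assumes a precomputed `sys_tokens` field and a per-turn `tokens` field (both hypothetical), and it omits the summarize callback; only the `while` condition is taken from the diff.

```lua
-- Hypothetical stand-in for Context, just enough to exercise the loop.
local Context = {}
Context.__index = Context

function Context.new(opts)
    return setmetatable({
        sys_tokens = opts.sys_tokens,     -- assumed: system prompt cost
        max_turns = opts.max_turns,
        token_budget = opts.token_budget,
        turns = {},
    }, Context)
end

function Context:estimate_tokens()
    local total = self.sys_tokens
    for _, turn in ipairs(self.turns) do
        total = total + turn.tokens
    end
    return total
end

function Context:enforce_budget()
    local evicted = 0
    while (#self.turns > self.max_turns
           or self:estimate_tokens() > self.token_budget)
          and #self.turns > 0 do
        -- Pop the oldest pair (user + assistant), or a lone trailing turn.
        for _ = 1, 2 do
            if #self.turns > 0 then
                table.remove(self.turns, 1)
                evicted = evicted + 1
            end
        end
    end
    return evicted
end

-- Oversize system prompt with no turns: the guard exits immediately.
local ctx = Context.new({ sys_tokens = 5000, max_turns = 40, token_budget = 4096 })
print(ctx:enforce_budget())
```

Without the `and #self.turns > 0` clause, the final call would never return in this sketch, because the token comparison stays true and there is nothing left to remove.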