enforce_budget runs only post-loop in run_norris; mid-loop eviction never fires (PHASE3 spec gap) #51

Closed
opened 2026-05-13 17:25:26 +00:00 by claude-noether · 0 comments
Collaborator

Surfaced during TC #38 validation run (2026-05-13)

Qwen3-30B-A3B Norris session with max_turns=4 did trigger eviction ([aish] oldest 4 turns evicted) — but the status line appears AFTER ─── NORRIS DONE ── in the run output (line 465 vs 462 in /tmp/aish-p38/run.log), confirming that enforce_budget only runs once run_norris exits the planning loop.

Spec vs implementation

docs/PHASE3.md §2 row "Context budgeting under Norris":

Same max_turns and token_budget as interactive. Sliding window evicts oldest non-system turns when budget exceeded — including mid-Norris-session if the loop runs long.

repl.lua's run_norris:

while true do
    local result = safety.norris_step(ctx, active_cfg, helpers, {...})
    if result.status == "continue" then
        step_n = step_n + 1
    else
        ...
        break
    end
end
...
status_evictions(ctx:enforce_budget())  -- ONLY HERE — post-loop

During the loop, turns accumulate without budget enforcement. For long Norris sessions on a tight max_turns (e.g. 4-6), this means the model sees the FULL conversation including all tool results — defeating the purpose of the sliding window for memory management AND preventing R-C3 (goal anchor in NORRIS suffix surviving mid-loop eviction) from being exercised end-to-end.

Repro

The TC #38 run (Qwen3-30B-A3B, max_turns=4, 3 list_dir tool_calls): all 4 Norris steps completed with full context. Eviction status appeared post-NORRIS-DONE.

Fix sketch

Option A: run_norris calls ctx:enforce_budget() after each safety.norris_step return (before the next iteration). Status line fires mid-loop. R-C3 anchor survival becomes exercisable.

Option B: safety.norris_step itself calls enforce_budget at its end before returning. Same effect; cleaner separation (norris_step owns iteration accounting).

Option B aligns better with the existing consecutive_user_skips state already on ctx that norris_step manages.

Why this didn't block #38 closure

#38 wanted to validate Qwen3-30B-A3B's tool-emission compatibility for sustained sessions. That question is unambiguously resolved (clean tool_calls protocol, no markdown drift across 4 iterations, parallel + sequential tool_call shapes both handled). The mid-loop eviction issue is orthogonal — it would be present regardless of the model used.

Label: bug (PHASE3 spec gap), architecture (cross-phase context budgeting interaction).

## Surfaced during TC #38 validation run (2026-05-13) Qwen3-30B-A3B Norris session with `max_turns=4` did trigger eviction (`[aish] oldest 4 turns evicted`) — but the status line appears AFTER `─── NORRIS DONE ──` in the run output (line 465 vs 462 in `/tmp/aish-p38/run.log`), confirming that `enforce_budget` only runs once `run_norris` exits the planning loop. ## Spec vs implementation `docs/PHASE3.md` §2 row "Context budgeting under Norris": > Same `max_turns` and `token_budget` as interactive. Sliding window evicts oldest non-system turns when budget exceeded — **including mid-Norris-session if the loop runs long**. `repl.lua`'s `run_norris`: ```lua while true do local result = safety.norris_step(ctx, active_cfg, helpers, {...}) if result.status == "continue" then step_n = step_n + 1 else ... break end end ... status_evictions(ctx:enforce_budget()) -- ONLY HERE — post-loop ``` During the loop, turns accumulate without budget enforcement. For long Norris sessions on a tight `max_turns` (e.g. 4-6), this means the model sees the FULL conversation including all tool results — defeating the purpose of the sliding window for memory management AND preventing R-C3 (goal anchor in NORRIS suffix surviving mid-loop eviction) from being exercised end-to-end. ## Repro The TC #38 run (Qwen3-30B-A3B, max_turns=4, 3 list_dir tool_calls): all 4 Norris steps completed with full context. Eviction status appeared post-NORRIS-DONE. ## Fix sketch Option A: `run_norris` calls `ctx:enforce_budget()` after each `safety.norris_step` return (before the next iteration). Status line fires mid-loop. R-C3 anchor survival becomes exercisable. Option B: `safety.norris_step` itself calls `enforce_budget` at its end before returning. Same effect; cleaner separation (norris_step owns iteration accounting). Option B aligns better with the existing `consecutive_user_skips` state already on `ctx` that `norris_step` manages. ## Why this didn't block #38 closure #38 wanted to validate Qwen3-30B-A3B's tool-emission compatibility for sustained sessions. That question is unambiguously resolved (clean tool_calls protocol, no markdown drift across 4 iterations, parallel + sequential tool_call shapes both handled). The mid-loop eviction issue is orthogonal — it would be present regardless of the model used. Label: bug (PHASE3 spec gap), architecture (cross-phase context budgeting interaction).
claude-noether added the bug label 2026-05-13 17:25:26 +00:00
Sign in to join this conversation.