test-case: [background] suppressed in Norris mode #43

Closed
opened 2026-05-13 08:21:39 +00:00 by claude-noether · 1 comment
Collaborator

Steps

  1. Boot aish with memory items (use prior test data or :remember X first).
  2. Run a normal query and confirm that the system prompt includes the [background] block (you'll need to instrument the code or trust the design).
  3. :norris find files in /tmp (use a model that emits real tool_calls).
  4. While Norris is running, watch whether the model references background items. It should NOT (the [background] block is suppressed under Norris).
  5. After Norris exits (DONE / aborted / budget), the next interactive query should again include [background].

Expected

  • Norris-mode broker calls have a shorter system prompt (NORRIS suffix instead of [background] suffix).
  • Resumed interactive mode after Norris ends → [background] returns.
  • No crash, no token-budget overflow.

What this exercises

  • R-C1 fold-in: context.to_messages() suppresses background when ctx.norris_active.
  • Token-budget hygiene over long Norris sessions.
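
As a rough illustration of the R-C1 behaviour described above, here is a minimal sketch of a context object whose to_messages() drops the [background] block while norris_active is set. Only `norris_active`, `to_messages`, and the `sys_content` name come from this issue; `Context.new`, `base_prompt`, `background_items`, `history`, and the suffix text are hypothetical stand-ins, not the actual aish implementation.

```lua
-- Hypothetical stand-in for the aish context (assumed shape, not the real code).
local Context = {}
Context.__index = Context

-- Assumed placeholder text; the real NORRIS suffix is not shown in this issue.
local NORRIS_SUFFIX = "\n[NORRIS] plan and act with tools only."

function Context.new(base_prompt, background_items)
  return setmetatable({
    base_prompt = base_prompt,
    background_items = background_items or {},
    norris_active = false,
    history = {},
  }, Context)
end

function Context:to_messages()
  local sys_content = self.base_prompt
  if not self.norris_active then
    -- Interactive mode: append the [background] block when memory items exist.
    if #self.background_items > 0 then
      sys_content = sys_content .. "\n[background]\n"
        .. table.concat(self.background_items, "\n")
    end
  else
    -- Norris mode: background is suppressed and the NORRIS suffix is used instead.
    sys_content = sys_content .. NORRIS_SUFFIX
  end
  local messages = { { role = "system", content = sys_content } }
  for _, m in ipairs(self.history) do
    messages[#messages + 1] = m
  end
  return messages
end
```

Toggling `ctx.norris_active` between two calls to `ctx:to_messages()` and inspecting `messages[1].content` is the structural check this test case relies on.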

Likely failure modes

  • Norris steps run slow because the system prompt is bloated → R-C1 didn't take; check the `if not self.norris_active then` guard in to_messages.
  • After Norris exits, [background] doesn't return → ctx.norris_active not cleared properly.
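
The second failure mode usually comes down to an exit path that skips the reset. A sketch of one way to guarantee the flag is cleared on every exit (DONE, aborted, budget, or an error mid-step) is shown below. `run_norris`, `step`, and `max_steps` are hypothetical names for illustration, and the real Norris loop may be structured differently.

```lua
-- Hypothetical wrapper: runs Norris steps and always clears ctx.norris_active.
-- `step(i)` is assumed to return "DONE", "aborted", or anything else to continue.
local function run_norris(ctx, step, max_steps)
  ctx.norris_active = true
  local ok, result = pcall(function()
    for i = 1, max_steps do
      local status = step(i)
      if status == "DONE" or status == "aborted" then
        return status
      end
    end
    return "budget" -- step budget exhausted without DONE
  end)
  -- Cleared on every exit path, including errors raised inside a step.
  ctx.norris_active = false
  if not ok then
    error(result, 0) -- re-raise the original error after cleanup
  end
  return result
end
```

With this shape, the next interactive call to to_messages() sees `norris_active == false` regardless of how the run ended.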

How to verify the suppression empirically

Temporarily print `#sys_content` (the length of the system prompt string) in context.to_messages() and compare the per-iteration sizes during Norris vs. interactive mode.
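
If editing to_messages directly is inconvenient, the same measurement can be taken from outside by wrapping the method. The sketch below assumes the system message is the first entry returned by to_messages() and that its text lives in a `content` field. `instrument`, `sizes`, and the stub `ctx` are hypothetical and exist only to make the example self-contained.

```lua
-- Wraps ctx.to_messages so every call records the system prompt's byte length.
-- Returns a function that removes the probe again.
local function instrument(ctx, sizes)
  local own = rawget(ctx, "to_messages") -- nil if the method is inherited
  local original = ctx.to_messages
  ctx.to_messages = function(self, ...)
    local messages = original(self, ...)
    local sys_content = messages[1] and messages[1].content or ""
    sizes[#sizes + 1] = {
      norris = self.norris_active == true,
      bytes = #sys_content,
    }
    io.stderr:write(string.format("[probe] norris=%s sys_bytes=%d\n",
      tostring(self.norris_active), #sys_content))
    return messages
  end
  return function() rawset(ctx, "to_messages", own) end
end

-- Stub context standing in for the real one (assumed shape).
local ctx = {
  norris_active = false,
  to_messages = function(self)
    local sys = "base prompt"
    if not self.norris_active then
      sys = sys .. "\n[background]\nremembered item one\nremembered item two"
    else
      sys = sys .. "\n[NORRIS]"
    end
    return { { role = "system", content = sys } }
  end,
}

local sizes = {}
local restore = instrument(ctx, sizes)
ctx:to_messages()          -- interactive call
ctx.norris_active = true
ctx:to_messages()          -- Norris call
restore()
```

Comparing the recorded `bytes` values for `norris = true` and `norris = false` entries gives the per-iteration size difference without leaving a print statement in the production path.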

claude-noether added the test-case label 2026-05-13 08:21:39 +00:00
Author
Collaborator

PASS structural (autonomous run, 2026-05-13). Direct probe of context:to_messages() confirms [background] block IS present in interactive mode (ctx.norris_active=false) and ABSENT when ctx.norris_active=true. NORRIS suffix correctly present in Norris mode. R-C1 suppression honored.

Real-model verification of 'planner doesn't reference background items' is subjective — would still benefit from your eyes during a real Norris run. Closing on the structural pass; reopen if you observe the model leaking background context mid-plan.
