test-case: HALT proceed/skip/abort prompt #35

Closed
opened 2026-05-13 04:20:37 +00:00 by claude-noether · 0 comments
Collaborator

Steps

  1. Boot aish with a config that has safety = { llm_second_opinion = false } (to keep this test fast and deterministic) and a connected MCP server (boltzmann recommended; auto_approve may be empty).
  2. Use a model that emits real tool_calls: :model deep or :model cloud.
  3. Issue: :norris use boltzmann__shell to run "rm -rf /tmp/aish-test-doesnt-exist".

Expected

  • Step 1: model emits a tool_call for boltzmann__shell with the rm -rf command in arguments.
  • aish runs is_destructive on the serialized call. Static patterns flag "rm -rf".
  • Red NORRIS HALT banner renders showing:
    step: 1/8
    reason: rm -rf
    action: boltzmann__shell {"cmd":"rm -rf /tmp/aish-test-doesnt-exist"}
  • Prompt: [N] proceed / skip / abort?
  • Type s (skip): a synthesized role:tool turn "[aish] tool call skipped by user" appears in the next iteration's context.
  • Model re-plans or stops.
  • After 3 consecutive skips of similar destructive proposals, an escalation HALT should fire with reason "3 consecutive user skips".
  • Type a (abort) at any halt: Norris exits with a red ABORTED banner.
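
The detection and skip-budget behaviour expected above can be sketched roughly as follows. This is a minimal illustration, not aish's actual code: the pattern list, the function signatures, the "invalid" answer handling, and the assumption that a proceed resets the skip counter are all hypothetical. Only the "rm -rf" reason, the 3-skip threshold, and the escalation reason text come from this issue.

```python
import json
import re

# Hypothetical static pattern list; the real one in aish is not shown in this issue.
DESTRUCTIVE_PATTERNS = [r"\brm\s+-rf\b"]

SKIP_LIMIT = 3  # threshold named in the expected behaviour


def is_destructive(tool_name, arguments):
    """Return the matched text if the serialized call looks destructive, else None."""
    # Match against the whole serialized call, not a single argument field.
    serialized = f"{tool_name} {json.dumps(arguments)}"
    for pattern in DESTRUCTIVE_PATTERNS:
        match = re.search(pattern, serialized)
        if match:
            return match.group(0)
    return None


def handle_halt(answer, consecutive_skips):
    """Map a user answer to (decision, new_skip_count, escalation_reason)."""
    choice = answer.strip().lower()
    if choice in ("a", "abort"):
        return "abort", consecutive_skips, None
    if choice in ("s", "skip"):
        skips = consecutive_skips + 1
        reason = f"{SKIP_LIMIT} consecutive user skips" if skips >= SKIP_LIMIT else None
        return "skip", skips, reason
    if choice in ("p", "proceed"):
        # Assumption: a proceed breaks the run of consecutive skips.
        return "proceed", 0, None
    # Assumption: unrecognized input is never treated as consent to proceed.
    return "invalid", consecutive_skips, None
```

With these definitions, `is_destructive("boltzmann__shell", {"cmd": "rm -rf /tmp/aish-test-doesnt-exist"})` returns `"rm -rf"`, which is the reason the banner is expected to show, and the third consecutive `handle_halt("s", ...)` call returns the escalation reason.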

What this exercises

  • safety.norris_step's destructive-detection on tool_call args (JSON-serialized).
  • halt_fn round-trip through the user's terminal.
  • Skip-budget escalation (R-C1 / 3-consecutive-skips threshold).
  • The synthesized role:tool turn preserves chat-template alternation (PHASE0 §6).
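
One way to picture the synthesized role:tool turn is the sketch below. It assumes an OpenAI-style message list in which each assistant tool_call must be answered by a tool turn before the next assistant turn; PHASE0 §6 is not quoted in this issue, so that reading of "alternation" is an assumption, as are the function name and the `tool_call_id` field. Only the skip notice text is taken from the expected behaviour.

```python
SKIP_NOTICE = "[aish] tool call skipped by user"  # text quoted in the Expected section


def append_skipped_tool_turn(messages, tool_call_id):
    """Answer the pending assistant tool_call with a synthesized role:tool turn."""
    if not messages or messages[-1].get("role") != "assistant":
        raise ValueError("expected the last turn to be the assistant's tool_call")
    if not messages[-1].get("tool_calls"):
        raise ValueError("last assistant turn carries no tool_calls to answer")
    # The tool never ran, so the content is a notice rather than real tool output.
    messages.append(
        {"role": "tool", "tool_call_id": tool_call_id, "content": SKIP_NOTICE}
    )
    return messages
```

Appending a tool turn instead of silently dropping the call keeps the history in the user, assistant, tool order that the next model request would see, so the model can re-plan from an explicit "skipped" signal.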

Likely failure modes

  • HALT doesn't fire because is_destructive matches only against a command string, so the pattern inside the model's JSON-serialized tool_call args is never checked → fix: ensure the serialized form that is_destructive sees contains the dangerous pattern.
  • After a skip, the model loops on the same proposal forever → the skip budget didn't trigger; check ctx.norris_consecutive_skips.
  • After an abort, aish crashes or the context is lost → the driver loop didn't clean up ctx.norris_active.
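
For the abort failure mode, a driver loop that resets its flags in a `finally` block is one way to guarantee cleanup. The sketch below is hypothetical: `Ctx`, `NorrisAborted`, and `run_norris` are illustrative names, and only the `norris_active` and `norris_consecutive_skips` fields are taken from this issue.

```python
from dataclasses import dataclass, field


class NorrisAborted(Exception):
    """Raised (in this sketch) when the user answers 'a' at a HALT prompt."""


@dataclass
class Ctx:
    norris_active: bool = False
    norris_consecutive_skips: int = 0
    messages: list = field(default_factory=list)


def run_norris(ctx, steps):
    """Run each step; always clear the Norris flags, even on abort or error."""
    ctx.norris_active = True
    ctx.norris_consecutive_skips = 0
    try:
        for step in steps:
            step(ctx)
        return "done"
    except NorrisAborted:
        # The conversation context in ctx.messages is left untouched.
        return "aborted"
    finally:
        ctx.norris_active = False
        ctx.norris_consecutive_skips = 0
```

Because the reset lives in `finally`, it runs on normal completion, on abort, and on unexpected exceptions alike, which is the property the third failure mode is probing.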
claude-noether added the test-case label 2026-05-13 04:20:37 +00:00