test-case: GOAL: complete sentinel exits the loop #37

Closed
opened 2026-05-13 04:20:37 +00:00 by claude-noether · 0 comments
Collaborator

Steps

  1. Boot aish with MCP enabled and safety = { llm_second_opinion = false } (for speed).
  2. Select :model deep or :model cloud for tool-emission compliance.
  3. Issue: :norris list the files in /tmp using boltzmann__list_dir, then emit "GOAL: complete" on its own line.

Expected

  • Loop step 1: model emits text + tool_call list_dir; tool runs (auto_approved or user proceed); result rendered.
  • Loop step 2: model emits a short summary + a separate line containing exactly GOAL: complete.
  • Norris loop exits cleanly with the cyan NORRIS DONE banner.
  • :history shows the assistant turn with the GOAL: line preserved.
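
The expected sequence can be sketched as a small loop. This is a minimal illustration, not the aish implementation: Turn, Ctx, run_norris and run_tool are hypothetical names, and the only behaviour taken from this issue is the ordering (dispatch tool calls first, then check for the sentinel), the preserved history entry, and the cleared norris_active flag.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Turn:
    """One assistant turn: free text plus zero or more tool-call names (hypothetical shape)."""
    text: str
    tool_calls: List[str] = field(default_factory=list)


@dataclass
class Ctx:
    """Hypothetical session context; only norris_active is named in the issue."""
    norris_active: bool = False
    history: List[str] = field(default_factory=list)
    tool_log: List[str] = field(default_factory=list)


def run_norris(ctx: Ctx, turns: List[Turn], run_tool: Callable[[str], str]) -> bool:
    """Return True if the loop exited because a turn carried the sentinel line."""
    ctx.norris_active = True
    try:
        for turn in turns:
            # Keep the assistant text verbatim so the GOAL: line survives in history.
            ctx.history.append(turn.text)
            # Dispatch tool calls FIRST ...
            for call in turn.tool_calls:
                ctx.tool_log.append(run_tool(call))
            # ... and only then check whether the goal sentinel was emitted.
            if any(line.strip() == "GOAL: complete" for line in turn.text.splitlines()):
                return True
        return False
    finally:
        # Always clear the flag so later prompts return to plain interactive mode.
        ctx.norris_active = False
```

Clearing the flag in a finally block is one way to make sure cleanup fires on every exit path, including a tool that raises.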

What this exercises

  • Line-level extraction of the GOAL: complete sentinel (R-C5 — must be exact, on its own line).
  • §4 step ordering (dispatch tool_calls FIRST, then check goal_done — R-C2 / Q25 resolution).
  • ctx.norris_active flag correctly cleared on exit so subsequent prompts return to plain interactive mode.
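
The line-level check in the first bullet might look like the sketch below. has_goal_sentinel is a hypothetical helper, and ignoring leading and trailing whitespace on the line is an assumption; this issue only states that the sentinel must be exact and on its own line.

```python
SENTINEL = "GOAL: complete"


def has_goal_sentinel(assistant_text: str) -> bool:
    """Return True only if some line, taken alone, equals the sentinel."""
    # Assumption: whitespace around the line is ignored; everything else must match exactly.
    return any(line.strip() == SENTINEL for line in assistant_text.splitlines())
```

A plain substring test (SENTINEL in assistant_text) would accept "Once you GOAL: complete..." and so reproduce the mid-line regression this test case is meant to catch.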

Likely failure modes

  • GOAL: detected mid-line (Once you GOAL: complete...) → line-level scan regression.
  • GOAL: detected BEFORE the tool_call runs → §4 step ordering broken.
  • Subsequent :mcp list after Norris exits still shows ⚡ in prompt → cleanup didn't fire.
claude-noether added the test-case label 2026-05-13 04:20:37 +00:00