test-case: CMD: extraction still works alongside MCP (substrate invariant) #31

Closed
opened 2026-05-12 15:50:33 +00:00 by claude-noether · 1 comment
Collaborator

Steps

  1. Boot aish with MCP configured (any setup).
  2. Use :model fast (qwen-1.5b — happens to prefer CMD:).
  3. Ask a shell-natural question: What's in /tmp? Use a shell command.
  4. Expected: model emits a CMD: ls /tmp line. Confirm prompt appears.
  5. Answer y.
  6. Observe exec output.
  7. Now ask tell me one fun fact — purely conversational, no CMD: or tool.

Expected

  • Step 5: shell exec frame renders, output shown.
  • The exec output is captured into pending_exec_output and prepended to the next user turn (PHASE0.md §6 behavior).
  • Step 7: model responds with text only, no CMD: extraction triggers, no tool_call.
  • The mcp block being present in config does NOT change Phase 0/1 behavior when the model emits a CMD: line.

What this exercises

  • PHASE0.md §3 substrate invariant: CMD: extraction marker is unchanged.
  • The fact that tools field being sent in the request body doesn't suppress CMD: extraction on the response.
  • pending_exec_output × tool-sub-loop interaction (in this test the sub-loop doesn't run, but the buffer mechanism remains valid).

Likely failure modes

  • CMD: line missing the confirm prompt → executor.extract_cmd_lines regression.
  • exec output not flushed into the next user turn → context.append_user broke.
  • Model emits BOTH a CMD: AND a tool_call → §2 "both stay live" — should work but verify both are exercised (CMD: confirm prompt + tool-call frame).
## Steps 1. Boot aish with MCP configured (any setup). 2. Use `:model fast` (qwen-1.5b — happens to prefer CMD:). 3. Ask a shell-natural question: `What's in /tmp? Use a shell command.` 4. Expected: model emits a `CMD: ls /tmp` line. Confirm prompt appears. 5. Answer `y`. 6. Observe exec output. 7. Now ask `tell me one fun fact` — purely conversational, no CMD: or tool. ## Expected - Step 5: shell exec frame renders, output shown. - The exec output is captured into `pending_exec_output` and prepended to the next user turn (PHASE0.md §6 behavior). - Step 7: model responds with text only, no CMD: extraction triggers, no tool_call. - The `mcp` block being present in config does NOT change Phase 0/1 behavior when the model emits a `CMD:` line. ## What this exercises - PHASE0.md §3 substrate invariant: `CMD:` extraction marker is unchanged. - The fact that `tools` field being sent in the request body doesn't suppress `CMD:` extraction on the response. - `pending_exec_output` × tool-sub-loop interaction (in this test the sub-loop doesn't run, but the buffer mechanism remains valid). ## Likely failure modes - `CMD:` line missing the confirm prompt → executor.extract_cmd_lines regression. - exec output not flushed into the next user turn → context.append_user broke. - Model emits BOTH a CMD: AND a tool_call → §2 "both stay live" — should work but verify both are exercised (CMD: confirm prompt + tool-call frame).
claude-noether added the test-case label 2026-05-12 15:50:33 +00:00
Owner

 mfritsche   aish   main ≡  x1  x1    luajit main.lua
aish: loaded config from ./config.lua
[aish] mcp boltzmann: 7 tools
[aish:fast]> What's in /tmp? Use a shell command.
CMD: ls /tmp
execute 'ls /tmp'? [y/N] y
─── exec output ───
bun-node-af24e281e lua_8aU6vs lua_EH242t lua_JulJCf lua_sphLvC lua_zedBQy
distccd.log lua_9aYvbH lua_eKpkmp lua_K7c8Ps lua_STOTmr lua_ZOYbQh
kdeconnect_mfritsche lua_9OM9UZ lua_EQwy2G lua_L9v5fa lua_Su9nP0 org.chromium.Chromium.WDaszS
lua_1Dx0qw lua_9SV3Kq lua_fedX7d lua_lkMEje lua_szRydn plasma-csd-generator.QtqaUp
lua_1GFHJ4 lua_As2YR2 lua_FfmqnA lua_lkWHcG lua_T8w7pJ pthread_probe.st
lua_1OlCgG lua_aSxEge lua_FsgDJj lua_lY83be lua_TCmR6m pthread_size
lua_28ItXM lua_AvMZVn lua_gP3yyh lua_mCTxpj lua_tNBgz4 pthread_size.c
lua_3pyg15 lua_AzL15H lua_gxFTma lua_mWVnxf lua_uLPWAA sddm-:0-lbOHaP
lua_49zt0i lua_bGc1rm lua_H2G7uq lua_oPM50a lua_vCtpXy sddm-auth-94101db5-7824-43ec-8a7a-43edb009e924
lua_4AN40L lua_bkPEFT lua_hDTkal lua_QoBCSc lua_VnASHc shimprobe
lua_5j1PZL lua_CI0dYg lua_HIOWGH lua_qRsXM7 lua_WSm4tQ systemd-private-bb5322f80b184a4c834e72267574e866-bluetooth.service-BV0ZrW
lua_6bKuQw lua_CjXFdp lua_HNa1zB lua_QXsLXD lua_WwQGo3 systemd-private-bb5322f80b184a4c834e72267574e866-polkit.service-d5rMlr
lua_6g10JR lua_CWzxpl lua_i6aG00 lua_ragU6G lua_WY3mcU systemd-private-bb5322f80b184a4c834e72267574e866-systemd-logind.service-k9eGM7
lua_7I0H7o lua_DEDQ3N lua_IF2uT9 lua_rnb9Vj lua_Xgm8C6 systemd-private-bb5322f80b184a4c834e72267574e866-upower.service-vvkM8v
lua_7QmMyR lua_Dofk8z lua_IxfWtj lua_RWT5dt lua_xIWvhF
lua_7TTGhk lua_efWRy2 lua_J4Czeo lua_sGZuj8 lua_YbyPH7
─── exit 0 ───
[aish:fast]> tell me one fun fact
CMD: echo "The most common file extension in /tmp is .log."
execute 'echo "The most common file extension in /tmp is .log."'? [y/N] y
─── exec output ───
The most common file extension in /tmp is .log.
─── exit 0 ───

 mfritsche   aish   main ≡  x1  x1    luajit main.lua aish: loaded config from ./config.lua [aish] mcp boltzmann: 7 tools [aish:fast]> What's in /tmp? Use a shell command. CMD: ls /tmp execute 'ls /tmp'? [y/N] y ─── exec output ─── bun-node-af24e281e lua_8aU6vs lua_EH242t lua_JulJCf lua_sphLvC lua_zedBQy distccd.log lua_9aYvbH lua_eKpkmp lua_K7c8Ps lua_STOTmr lua_ZOYbQh kdeconnect_mfritsche lua_9OM9UZ lua_EQwy2G lua_L9v5fa lua_Su9nP0 org.chromium.Chromium.WDaszS lua_1Dx0qw lua_9SV3Kq lua_fedX7d lua_lkMEje lua_szRydn plasma-csd-generator.QtqaUp lua_1GFHJ4 lua_As2YR2 lua_FfmqnA lua_lkWHcG lua_T8w7pJ pthread_probe.st lua_1OlCgG lua_aSxEge lua_FsgDJj lua_lY83be lua_TCmR6m pthread_size lua_28ItXM lua_AvMZVn lua_gP3yyh lua_mCTxpj lua_tNBgz4 pthread_size.c lua_3pyg15 lua_AzL15H lua_gxFTma lua_mWVnxf lua_uLPWAA sddm-:0-lbOHaP lua_49zt0i lua_bGc1rm lua_H2G7uq lua_oPM50a lua_vCtpXy sddm-auth-94101db5-7824-43ec-8a7a-43edb009e924 lua_4AN40L lua_bkPEFT lua_hDTkal lua_QoBCSc lua_VnASHc shimprobe lua_5j1PZL lua_CI0dYg lua_HIOWGH lua_qRsXM7 lua_WSm4tQ systemd-private-bb5322f80b184a4c834e72267574e866-bluetooth.service-BV0ZrW lua_6bKuQw lua_CjXFdp lua_HNa1zB lua_QXsLXD lua_WwQGo3 systemd-private-bb5322f80b184a4c834e72267574e866-polkit.service-d5rMlr lua_6g10JR lua_CWzxpl lua_i6aG00 lua_ragU6G lua_WY3mcU systemd-private-bb5322f80b184a4c834e72267574e866-systemd-logind.service-k9eGM7 lua_7I0H7o lua_DEDQ3N lua_IF2uT9 lua_rnb9Vj lua_Xgm8C6 systemd-private-bb5322f80b184a4c834e72267574e866-upower.service-vvkM8v lua_7QmMyR lua_Dofk8z lua_IxfWtj lua_RWT5dt lua_xIWvhF lua_7TTGhk lua_efWRy2 lua_J4Czeo lua_sGZuj8 lua_YbyPH7 ─── exit 0 ─── [aish:fast]> tell me one fun fact CMD: echo "The most common file extension in /tmp is .log." execute 'echo "The most common file extension in /tmp is .log."'? [y/N] y ─── exec output ─── The most common file extension in /tmp is .log. ─── exit 0 ───
Sign in to join this conversation.