test-case: tool-call sub-loop end-to-end #26

Closed
opened 2026-05-12 15:50:33 +00:00 by claude-noether · 1 comment
Collaborator

Steps

With boltzmann MCP connected (per TC1):

  1. Add boltzmann.list_dir to auto_approve so we skip the confirm prompt for THIS test (we test the confirm path in TC5 separately):
    auto_approve = { ["boltzmann.list_dir"] = true },
    
  2. Switch to a model that actually emits tool_calls — small models tend to fall back to CMD:. Either:
    • :model deep (qwen3-30b — slow but compliant), OR
    • :model cloud (anthropic/claude-haiku-4.5 via OpenRouter — fast and reliable)
  3. Type a tool-relevant prompt, ideally one that names the tool to nudge the model:
    Use the boltzmann.list_dir tool with path=/etc to show me what's in /etc.
    

Expected

  1. Model streams a brief intro (or none) and emits a tool_call.
  2. NO confirm prompt (auto_approve hit).
  3. A tool-call frame appears:
    ─── tool: boltzmann.list_dir ───
    {"path":"/etc"}
    <directory listing>
    ─── ok ───
    
  4. Model continues with text describing the result, then stops.
  5. :history shows: user → assistant(with tool_calls) → tool → assistant(text).

What this exercises

  • tools_schema() assembly
  • broker.chat_stream tool_call delta accumulator
  • safety.confirm_tool_call auto_approve hit path
  • dispatch_tool_call against live lmcp
  • flatten_content text-block concatenation
  • renderer.tool_call_begin + tool_call_end
  • Context append for assistant-with-tool_calls + role:"tool"
  • Sub-loop re-issuing broker with extended context
  • Sub-loop termination on pure-text response

Likely failure modes

  • Model emits CMD: boltzmann.list_dir /etc instead of a tool_call → not aish's bug, the small model isn't compliant with the OpenAI tool-calling protocol. Try :model deep or :model cloud.
  • Tool frame appears but model gets stuck in an infinite tool_call loop → max_tool_depth should cap it; verify the status [aish] tool-call depth limit reached (N); stopping sub-loop fires.
  • Tool result is empty / mangled → check flatten_content against actual response (lmcp returns content: [{type:"text", text: ...}]).
## Steps With boltzmann MCP connected (per TC1): 1. Add `boltzmann.list_dir` to `auto_approve` so we skip the confirm prompt for THIS test (we test the confirm path in TC5 separately): ```lua auto_approve = { ["boltzmann.list_dir"] = true }, ``` 2. Switch to a model that actually emits tool_calls — small models tend to fall back to `CMD:`. Either: - `:model deep` (qwen3-30b — slow but compliant), OR - `:model cloud` (anthropic/claude-haiku-4.5 via OpenRouter — fast and reliable) 3. Type a tool-relevant prompt, ideally one that names the tool to nudge the model: ``` Use the boltzmann.list_dir tool with path=/etc to show me what's in /etc. ``` ## Expected 1. Model streams a brief intro (or none) and emits a tool_call. 2. NO confirm prompt (auto_approve hit). 3. A tool-call frame appears: ``` ─── tool: boltzmann.list_dir ─── {"path":"/etc"} <directory listing> ─── ok ─── ``` 4. Model continues with text describing the result, then stops. 5. `:history` shows: user → assistant(with tool_calls) → tool → assistant(text). ## What this exercises - `tools_schema()` assembly - `broker.chat_stream` tool_call delta accumulator - `safety.confirm_tool_call` auto_approve hit path - `dispatch_tool_call` against live lmcp - `flatten_content` text-block concatenation - `renderer.tool_call_begin` + `tool_call_end` - Context append for assistant-with-tool_calls + role:"tool" - Sub-loop re-issuing broker with extended context - Sub-loop termination on pure-text response ## Likely failure modes - Model emits `CMD: boltzmann.list_dir /etc` instead of a tool_call → not aish's bug, the small model isn't compliant with the OpenAI tool-calling protocol. Try `:model deep` or `:model cloud`. - Tool frame appears but model gets stuck in an infinite tool_call loop → `max_tool_depth` should cap it; verify the status `[aish] tool-call depth limit reached (N); stopping sub-loop` fires. - Tool result is empty / mangled → check `flatten_content` against actual response (lmcp returns `content: [{type:"text", text: ...}]`).
claude-noether added the test-case label 2026-05-12 15:50:33 +00:00
Owner

(Full list available in the tool response. The directory contains typical system configuration files and directories. Would you like me to inspect a specific subdirectory or file?)
[aish:deep]> :history
[1] user: Use the boltzmann.list_dir tool with path=/etc to show me what's in /etc.
[2] assistant: Okay, the user wants to see what's in the /etc directory using the boltzmann.list_dir tool. Let me check the available functions. Looking at the tools provided, there's a function called boltzmann.list_dir. Its description says it lists directory contents, and the parameters include a path with a default of ".". The user specified path=/etc, so I need to call this function with path set to "/etc". I should structure the tool call correctly. The function name is boltzmann.list_dir, and the arguments should be {"path": "/etc"}. Let me make sure there are no typos. Yep, that's right. The default path is current directory, but the user wants /etc, so overriding that parameter. No other functions seem necessary here. The user isn't asking for anything else, just listing the directory. So the correct tool call is to use boltzmann.list_dir with the specified path.
[3] tool: alsa apparmor.d arch-release arptables.conf audisp audit avahi bash.bash_logout bash.bashrc bash_completion.d bindresvport.blacklist binfmt.d bluetooth bredos-dracut.conf ca-certificates cifs-utils colord conf.d credstore credstore.encrypted cryptsetup-keys.d crypttab cups daxctl.conf.d dconf debuginfod default depmod.d distcc dnsmasq.conf dracut.conf dracut.conf.d e2scrub.conf ebtables.conf environment environment.pacnew ethertypes fonts freetds fstab fuse.conf gai.conf gdb gnutls govctl gprofng.rc group group- group.pacnew grub.d gshadow gshadow- gshadow.pacnew gssapi_mech.conf gtk-3.0 healthd.conf host.conf hostname hosts ImageMagick-7 infiniband-diags inputrc iptables issue iwpmd.conf java-openjdk kernel keyutils krb5.conf ld.so.cache ld.so.conf ld.so.conf.d libao.conf libaudit.conf libblockdev libibverbs.d libinput libnl libpaper.d libva.conf libvirt lightdm locale.conf locale.gen locale.gen.pacnew localtime login.defs logrotate.d lsb-release lxc machine-id mailcap makepkg.conf makepkg.conf.d makepkg.conf.pacnew makepkg.d man_db.conf mc mdadm.conf mercurial mime.types mke2fs.conf mkinitcpio.conf mkinitcpio.conf.bak mkinitcpio.d ModemManager modprobe.d modules-load.d mtab mtools.conf nanorc ndctl ndctl.conf.d netconfig NetworkManager nftables.conf nginx nilfs_cleanerd.conf nohang nsswitch.conf ODBCDataSources odbc.ini odbcinst.ini OpenCL openldap os-release pacman.conf pacman.conf.bak pacman.conf.pacnew pacman.d pam.d papersize paperspecs passwd passwd- passwd.pacnew pipewire pkcs11 polkit-1 ppp profile profile.d protocols pulse qemu rc_keymaps rc_maps.cfg rdma request-key.conf request-key.d resolv.conf rhashrc rpc rsyncd.conf sasl2 screenrc sddm.conf securetty security sensors3.conf sensors.d services shadow shadow- shadow.pacnew shells signond.conf skel slsh.rc smartd.conf srp_daemon.conf ssh ssl subgid subgid- subuid subuid- sudo.conf sudoers sudoers.d sudo_logsrvd.conf sysctl.d systemd tmpfiles.d tpm2-tss ts.conf u-boot-menu udev udisks2 UPower user-tmpfiles.d vconsole.conf vde2 vdpau_wrapper.cfg vulkan wgetrc wpa_supplicant X11 xattr.conf xdg xml
[4] assistant: Okay, the user asked to list the contents of /etc using the boltzmann.list_dir tool. I made the tool call, and the response came back with a long list of directories and files. Now I need to present this information to the user in a clear way. First, I should check if the list is complete. The response seems to have a lot of entries, but maybe some are truncated. The user might want a summary or specific details. However, the user's request was straightforward—just to show what's in /etc. Since the tool's response is already provided, I should format it nicely. The original list is in a code block, so I'll keep it that way. Maybe add a note that the list is extensive and suggest specific subdirectories if needed. The user might not need all the details, so offering further assistance could be helpful. I should also make sure there are no errors in the response. The list looks normal for an /etc directory, which typically contains configuration files. No obvious issues here. Finally, wrap up by confirming the action was completed and ask if they need more information. That way, the user knows the task is done and can ask for more if needed. Here's the contents of /etc: alsa apparmor.d arch-release ... (Full list available in the tool response. The directory contains typical system configuration files and directories. Would you like me to inspect a specific subdirectory or file?)

*(Full list available in the tool response. The directory contains typical system configuration files and directories. Would you like me to inspect a specific subdirectory or file?)* [aish:deep]> :history [1] user: Use the boltzmann.list_dir tool with path=/etc to show me what's in /etc. [2] assistant: <think> Okay, the user wants to see what's in the /etc directory using the boltzmann.list_dir tool. Let me check the available functions. Looking at the tools provided, there's a function called boltzmann.list_dir. Its description says it lists directory contents, and the parameters include a path with a default of ".". The user specified path=/etc, so I need to call this function with path set to "/etc". I should structure the tool call correctly. The function name is boltzmann.list_dir, and the arguments should be {"path": "/etc"}. Let me make sure there are no typos. Yep, that's right. The default path is current directory, but the user wants /etc, so overriding that parameter. No other functions seem necessary here. The user isn't asking for anything else, just listing the directory. So the correct tool call is to use boltzmann.list_dir with the specified path. </think> [3] tool: alsa apparmor.d arch-release arptables.conf audisp audit avahi bash.bash_logout bash.bashrc bash_completion.d bindresvport.blacklist binfmt.d bluetooth bredos-dracut.conf ca-certificates cifs-utils colord conf.d credstore credstore.encrypted cryptsetup-keys.d crypttab cups daxctl.conf.d dconf debuginfod default depmod.d distcc dnsmasq.conf dracut.conf dracut.conf.d e2scrub.conf ebtables.conf environment environment.pacnew ethertypes fonts freetds fstab fuse.conf gai.conf gdb gnutls govctl gprofng.rc group group- group.pacnew grub.d gshadow gshadow- gshadow.pacnew gssapi_mech.conf gtk-3.0 healthd.conf host.conf hostname hosts ImageMagick-7 infiniband-diags inputrc iptables issue iwpmd.conf java-openjdk kernel keyutils krb5.conf ld.so.cache ld.so.conf ld.so.conf.d libao.conf libaudit.conf libblockdev libibverbs.d libinput libnl libpaper.d libva.conf libvirt lightdm locale.conf locale.gen locale.gen.pacnew localtime login.defs logrotate.d lsb-release lxc machine-id mailcap makepkg.conf makepkg.conf.d makepkg.conf.pacnew makepkg.d man_db.conf mc mdadm.conf mercurial mime.types mke2fs.conf mkinitcpio.conf mkinitcpio.conf.bak mkinitcpio.d ModemManager modprobe.d modules-load.d mtab mtools.conf nanorc ndctl ndctl.conf.d netconfig NetworkManager nftables.conf nginx nilfs_cleanerd.conf nohang nsswitch.conf ODBCDataSources odbc.ini odbcinst.ini OpenCL openldap os-release pacman.conf pacman.conf.bak pacman.conf.pacnew pacman.d pam.d papersize paperspecs passwd passwd- passwd.pacnew pipewire pkcs11 polkit-1 ppp profile profile.d protocols pulse qemu rc_keymaps rc_maps.cfg rdma request-key.conf request-key.d resolv.conf rhashrc rpc rsyncd.conf sasl2 screenrc sddm.conf securetty security sensors3.conf sensors.d services shadow shadow- shadow.pacnew shells signond.conf skel slsh.rc smartd.conf srp_daemon.conf ssh ssl subgid subgid- subuid subuid- sudo.conf sudoers sudoers.d sudo_logsrvd.conf sysctl.d systemd tmpfiles.d tpm2-tss ts.conf u-boot-menu udev udisks2 UPower user-tmpfiles.d vconsole.conf vde2 vdpau_wrapper.cfg vulkan wgetrc wpa_supplicant X11 xattr.conf xdg xml [4] assistant: <think> Okay, the user asked to list the contents of /etc using the boltzmann.list_dir tool. I made the tool call, and the response came back with a long list of directories and files. Now I need to present this information to the user in a clear way. First, I should check if the list is complete. The response seems to have a lot of entries, but maybe some are truncated. The user might want a summary or specific details. However, the user's request was straightforward—just to show what's in /etc. Since the tool's response is already provided, I should format it nicely. The original list is in a code block, so I'll keep it that way. Maybe add a note that the list is extensive and suggest specific subdirectories if needed. The user might not need all the details, so offering further assistance could be helpful. I should also make sure there are no errors in the response. The list looks normal for an /etc directory, which typically contains configuration files. No obvious issues here. Finally, wrap up by confirming the action was completed and ask if they need more information. That way, the user knows the task is done and can ask for more if needed. </think> Here's the contents of `/etc`: ``` alsa apparmor.d arch-release ... ``` *(Full list available in the tool response. The directory contains typical system configuration files and directories. Would you like me to inspect a specific subdirectory or file?)*
Sign in to join this conversation.