docs/PHASE2: review fold-in — 5 BLOCKERs + 7 CONCERNs + key NITs

Independent review of the formulate+analyze+plan draft surfaced design
gaps that would have shipped as silent bugs. Resolutions applied:

BLOCKERs:
  B1 context.lua impact widened — Phase 1 :append asserts content and
     discards extra fields. Need (a) shape-per-role assert, (b) preserve
     tool_calls/tool_call_id on store, (c) emit from to_messages().
  B2 ffi/curl.M.post extended to return (body, status_code). lmcp's
     401 returns a non-JSON-RPC body that would have been mis-decoded.
  B3 §3 typo schema -> inputSchema.
  B4 pending_exec_output × tool-call sub-loop interaction specified.
  B5 §3/§12 broker dependency contradiction — broker takes opts.tools
     from caller; no layering inversion.

CONCERNs:
  C1 M.chat return polymorphism dropped (no consumer).
  C2 tool_calls[].index absent fallback: default to 0.
  C3 Re-injection stores accumulated text, not hard-coded empty.
  C4 :mcp connect failure: no auto-retry, status-log once.
  C5/C7 JSON-RPC error AND argument-parse failure both synthesize a
     role:"tool" turn — keeps strict-template alternation legal
     exactly the way PHASE0 §6 demanded for exec output.
  C6 §9 confirms §4 amendment is additive (preserves §3 invariant).

NITs:
  N3 protocolVersion fallback (lmcp doesn't negotiate).
  N4 Alternation assert in Context:append.
  N7 Model-routing bug filed as aish#23.
  N8 Day-one fallback test for use_tool_role=false in commit #3.

Manifest status: Plan (review folded). Status line and Resolutions
sections updated; commit-by-commit roadmap reflects revised specs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-12 13:00:07 +00:00
parent 447e430254
commit d3570ccea4
+151 -66
View File
@@ -2,7 +2,7 @@
**Project:** aish — AI-augmented conversational shell **Project:** aish — AI-augmented conversational shell
**Document:** Phase 2 Requirements, Architecture & Design Decisions **Document:** Phase 2 Requirements, Architecture & Design Decisions
**Status:** Analyze (formulate complete; live-probed against lmcp v0.5.4 + hossenfelder proxy) **Status:** Plan (review pass folded in 2026-05-12)
**Date:** 2026-05-12 **Date:** 2026-05-12
PHASE0.md is the locked substrate; PHASE1.md is layered on top. This PHASE0.md is the locked substrate; PHASE1.md is layered on top. This
@@ -50,7 +50,7 @@ Three pillars per PHASE0.md §11 row 2:
| Decision | Choice | Rationale | | Decision | Choice | Rationale |
|---|---|---| |---|---|---|
| MCP transport | HTTP POST per RPC, `Connection: close` per response, **no long-lived SSE GET channel** in v1 | Analyze finding (2026-05-12): lmcp v0.5.4 only implements the trivial POST-and-respond flavor of the spec's streamable-HTTP transport. Its GET /mcp endpoint announces the POST endpoint then closes — there's no server→client notification channel to listen on. Combined with lmcp's `capabilities.tools.listChanged = false`, aish doesn't need an SSE GET listener at all for lmcp. Stdio transport is left for a possible Phase 2.1 if a stdio-only MCP server becomes necessary. | | MCP transport | HTTP POST per RPC, `Connection: close` per response, **no long-lived SSE GET channel** in v1 | Analyze finding (2026-05-12): lmcp v0.5.4 only implements the trivial POST-and-respond flavor of the spec's streamable-HTTP transport. Its GET /mcp endpoint announces the POST endpoint then closes — there's no server→client notification channel to listen on. Combined with lmcp's `capabilities.tools.listChanged = false`, aish doesn't need an SSE GET listener at all for lmcp. Stdio transport is left for a possible Phase 2.1 if a stdio-only MCP server becomes necessary. |
| MCP protocol version | `2025-03-26` (confirmed by live probe of boltzmann:8080/mcp) | lmcp pins this in `MCP_VERSION`. aish sends the same in `initialize`; future model bumps are negotiated at connect time. | | MCP protocol version | `2025-03-26` (confirmed by live probe of boltzmann:8080/mcp) | lmcp pins this in `MCP_VERSION` and **does not negotiate** — it returns its compiled-in version regardless of what the client sends (lmcp.lua:80-91). aish sends `2025-03-26` in `initialize` and accepts whatever the server returns; on mismatch it logs `[aish] mcp <alias>: protocol version mismatch (sent X, got Y); proceeding` and continues. v1 has no version-gated behavior to abort on. |
| MCP auth | Bearer token via `Authorization: Bearer <token>` header, per-server | Analyze finding: every lmcp deployment in mfritsche's fleet (boltzmann/hertz/pve*/nc/etc.) requires Bearer auth. Phase 2 config supports `auth_token` literal and `auth_env` env-var indirection per server (mirrors `key_env` in the models registry). lmcp servers without auth (broglie/higgs LAN-only) just leave the field nil. | | MCP auth | Bearer token via `Authorization: Bearer <token>` header, per-server | Analyze finding: every lmcp deployment in mfritsche's fleet (boltzmann/hertz/pve*/nc/etc.) requires Bearer auth. Phase 2 config supports `auth_token` literal and `auth_env` env-var indirection per server (mirrors `key_env` in the models registry). lmcp servers without auth (broglie/higgs LAN-only) just leave the field nil. |
| Tool-call wire format | OpenAI `tools` field on `/v1/chat/completions` body; `tool_calls` on assistant deltas; `role:"tool"` turn with `tool_call_id` for results | Standard, supported by llama.cpp and OpenRouter. Aligns with the existing `/v1/chat/completions` substrate invariant. | | Tool-call wire format | OpenAI `tools` field on `/v1/chat/completions` body; `tool_calls` on assistant deltas; `role:"tool"` turn with `tool_call_id` for results | Standard, supported by llama.cpp and OpenRouter. Aligns with the existing `/v1/chat/completions` substrate invariant. |
| Tool namespacing | `<server-alias>.<tool-name>` for both the wire-level tool name and `:mcp tools` listing | Avoids name collisions across servers. The alias comes from the config key or the connect URL hash. | | Tool namespacing | `<server-alias>.<tool-name>` for both the wire-level tool name and `:mcp tools` listing | Avoids name collisions across servers. The alias comes from the config key or the connect URL hash. |
@@ -67,14 +67,14 @@ Three pillars per PHASE0.md §11 row 2:
| File | State after Phase 1 | Phase 2 changes | | File | State after Phase 1 | Phase 2 changes |
|---|---|---| |---|---|---|
| `mcp.lua` | **New file** (not in PHASE0 §4 layout; this Phase amends the layout to add it) | Implement: `M.connect(url, opts) -> session` (opts: `alias`, `auth_token`, `auth_env`), `session:initialize()`, `session:list_tools() -> [{name, description, schema}]`, `session:call_tool(name, args) -> result | tool_error`, `session:close()`. JSON-RPC 2.0 over HTTP POST (`Content-Type: application/json`, `Accept: application/json`, `Authorization: Bearer <token>`). Per-session state: alias, base-url, auth, tools-cache, request-ID counter. No persistent SSE channel — POST is one-shot per RPC. | | `mcp.lua` | **New file** (not in PHASE0 §4 layout; this Phase amends the layout to add it) | Implement: `M.connect(url, opts) -> session` (opts: `alias`, `auth_token`, `auth_env`), `session:initialize()`, `session:list_tools() -> [{name, description, inputSchema}]`, `session:call_tool(name, args) -> (result_table, kind)` where `kind ∈ {"ok","handler_error","rpc_error"}` so callers can route the response per §4's error split, `session:close()`. JSON-RPC 2.0 over HTTP POST (`Content-Type: application/json`, `Accept: application/json`, `Authorization: Bearer <token>`). Per-session state: alias, base-url, auth, tools-cache, request-ID counter. No persistent SSE channel — POST is one-shot per RPC. Distinguishes HTTP-level failure (e.g. lmcp's `401 {"error":"unauthorized"}` body, which is NOT JSON-RPC-shaped — has no `jsonrpc`/`id` fields) from JSON-RPC envelope errors; needs `ffi/curl.M.post` extended to return status code (see ffi/curl.lua row). |
| `safety.lua` | Stub | Implement Phase 2 surface only: `M.confirm_tool_call(tool_name, args, policy) -> bool`. Reads `config.mcp.auto_approve` (per-tool and per-server) before prompting. Norris destructive-op heuristic and HALT gate stay Phase 3. | | `safety.lua` | Stub | Implement Phase 2 surface only: `M.confirm_tool_call(tool_name, args, policy) -> bool`. Reads `config.mcp.auto_approve` (per-tool and per-server) before prompting. Norris destructive-op heuristic and HALT gate stay Phase 3. |
| `broker.lua` | Streaming `chat_stream(cfg, msgs, on_delta)` | Request body grows `tools = mcp.tools_schema()` (assembled from all connected sessions). on_delta callback widens to `on_delta(kind, payload)` where `kind ∈ {"text", "tool_call"}`; tool_call payload includes id+name+arguments-delta. `M.chat` wrapper updates to buffer both. | | `broker.lua` | Streaming `chat_stream(cfg, msgs, on_delta)` | Signature widens to `chat_stream(cfg, msgs, on_delta, opts)`. `opts.tools` (optional array of `{type, function:{name, description, parameters}}`) is passed through to the request body; **omitted entirely if absent or empty** (some servers reject `"tools": []`). The on_delta callback widens to `on_delta(kind, payload)` where `kind ∈ {"text", "tool_call"}`. **`broker.lua` does NOT depend on `mcp.lua`** — repl assembles the tools array and passes it in; broker stays a transport layer. `M.chat` (non-streaming wrapper) is unchanged in this phase (no tool consumers go through it). |
| `context.lua` | turns = {{role, content}, ...} + `pending_exec_output` | New role: `"tool"`. Assistant turns may carry `tool_calls = [{id, name, arguments}]`. `to_messages()` flattens these into OpenAI-shape messages. Alternation rules: assistant-with-tool_calls is followed by N tool turns (one per call), then assistant text. | | `context.lua` | turns = {{role, content}, ...} + `pending_exec_output`; `Context:append` asserts `turn.content` and rebuilds the entry as `{role, content}` only — extra fields are dropped | Three concrete edits: (a) **loosen `:append`** so `role == "assistant"` can carry `tool_calls = [{id, name, arguments}]` with `content` allowed empty, and `role == "tool"` requires `tool_call_id` + `content` (the assert moves from "content required" to "shape per role"); (b) **preserve `tool_calls` and `tool_call_id`** in the stored turn (not just role+content); (c) `to_messages()` emits `tool_calls` on assistant turns and `tool_call_id` on tool turns. Add a debug assertion that `role == "tool"` follows an assistant turn with non-empty `tool_calls` (catches design bugs early; N4 in review). **`pending_exec_output` interaction**: the buffer **persists across the tool-call sub-loop** (the loop is internal — no user input happens — so there's no append_user to flush against). It flushes on the next genuine user turn, regardless of how many tool-call iterations preceded. |
| `repl.lua` | meta cmds + ask_ai stream loop | After ask_ai sees `tool_calls`, enter a tool-execution sub-loop: confirm-gate each call via `safety.confirm_tool_call`, dispatch via `mcp.session:call_tool`, append tool turn to context, re-issue the broker request. Loop until assistant emits text without tool_calls. New meta: `:mcp connect <url> [alias]`, `:mcp list`, `:mcp tools`, `:mcp disconnect <alias>`. | | `repl.lua` | meta cmds + ask_ai stream loop | After ask_ai sees `tool_calls`, enter a tool-execution sub-loop: confirm-gate each call via `safety.confirm_tool_call`, dispatch via `mcp.session:call_tool`, append tool turn to context, re-issue the broker request. Loop until assistant emits text without tool_calls. New meta: `:mcp connect <url> [alias]`, `:mcp list`, `:mcp tools`, `:mcp disconnect <alias>`. |
| `renderer.lua` | streaming text + exec frame | Add `tool_call_begin(name, args)`, `tool_call_end(result, ok)`. Visual style: indented, dim, parallel to the exec frame. | | `renderer.lua` | streaming text + exec frame | Add `tool_call_begin(name, args)`, `tool_call_end(result, ok)`. Visual style: indented, dim, parallel to the exec frame. |
| `config.lua` | example with models/shell/context/history | Schema additions: `mcp = { servers = { alias = { url = "..." } }, auto_approve = { ["alias.tool"] = true } }`. Documented in §10 below. | | `config.lua` | example with models/shell/context/history | Schema additions: `mcp = { servers = { alias = { url = "..." } }, auto_approve = { ["alias.tool"] = true } }`. Documented in §10 below. |
| `ffi/curl.lua` | post + post_sse | **No additions in v1** — analyze finding ruled out the long-lived SSE GET channel for lmcp. Phase 1 `M.post` (already does sync POST with response capture) is sufficient for MCP JSON-RPC. | | `ffi/curl.lua` | post + post_sse; `M.post` does not set `FAILONERROR`, so non-2xx responses return the body as a normal string | **One small extension**: `M.post` returns `(body, status_code)` instead of just `body` (or surfaces status via a second slot). MCP auth failures from lmcp arrive as HTTP `401` with a non-JSON-RPC body (`{"error":"unauthorized"}`); `mcp.lua` must distinguish HTTP-level failure from JSON-RPC envelope errors. Phase 1 callers (broker.chat in its current shape) are unaffected — they ignore the second return value. No SSE GET channel is added (analyze finding ruled it out for lmcp). |
| `history.lua` | JSONL session log | Tool turns are logged like any other turn — `{role:"tool", tool_call_id:"...", content:"..."}`. Resume reconstructs them via `ctx:append` like user/assistant turns. | | `history.lua` | JSONL session log | Tool turns are logged like any other turn — `{role:"tool", tool_call_id:"...", content:"..."}`. Resume reconstructs them via `ctx:append` like user/assistant turns. |
§4 module-layout amendment: `mcp.lua` slots between `broker.lua` and §4 module-layout amendment: `mcp.lua` slots between `broker.lua` and
@@ -107,7 +107,8 @@ are out of scope here; track for v2 if a richer server appears.
1. `initialize` request: `{ protocolVersion: "2025-03-26", capabilities: {}, clientInfo: { name: "aish", version: "..." } }`. 1. `initialize` request: `{ protocolVersion: "2025-03-26", capabilities: {}, clientInfo: { name: "aish", version: "..." } }`.
2. Server response (lmcp): `{ protocolVersion: "2025-03-26", capabilities: { tools: { listChanged: false } }, serverInfo: { name, version } }`. 2. Server response (lmcp): `{ protocolVersion: "2025-03-26", capabilities: { tools: { listChanged: false } }, serverInfo: { name, version } }`.
3. `notifications/initialized` POST (one-way; lmcp returns HTTP 202 with no body). 3. **Version mismatch**: lmcp ignores client's `protocolVersion` and always returns its compiled-in `MCP_VERSION` (lmcp.lua:80-91). aish accepts whatever lmcp returns; on mismatch it logs a status (`[aish] mcp <alias>: protocol version mismatch (sent X, got Y); proceeding`) and continues. v1 has no version-gated behavior.
4. `notifications/initialized` POST (one-way; lmcp returns HTTP 202 with no body).
### Tool discovery ### Tool discovery
@@ -118,30 +119,49 @@ are out of scope here; track for v2 if a richer server appears.
### Tool invocation ### Tool invocation
`tools/call` with `{ name, arguments }`. lmcp distinguishes two failure `tools/call` with `{ name, arguments }`. Failure has three flavors and
modes: all of them result in **a `role:"tool"` turn being appended** so the
assistant's `tool_calls` is never left orphaned in context (strict
templates reject `assistant.tool_calls` without a matching `tool`
reply — same gotcha PHASE0.md §6 warned about):
- **Tool-handler exception** → JSON-RPC `result` with `isError: true` - **Tool-handler exception** → JSON-RPC `result` with `isError: true`
and `content: [{ type:"text", text: "Error: ..." }]`. **Model-recoverable**: and `content: [{ type:"text", text: "Error: ..." }]`. Feed
feed it back to the model as the next `role:"tool"` turn content and `content` straight back as the `role:"tool"` turn body. Model-recoverable.
let it react. - **Baseline `isError: false` on actual failure** (PHASE2-baseline.md §3
- **Unknown method or unknown tool** → JSON-RPC `error` with found this — boltzmann's `read_file` returns content text containing
`code: -32601` ("Method not found" / "Tool not found"). **Transport "Error: ..." but `isError: false`). Pass content through unchanged —
level**: surface as an aish status, do not feed to model. let the model read the text. `isError` is advisory, not authoritative.
- **JSON-RPC envelope error** (e.g. `{code: -32601, message: "Tool not
found"}`) → synthesize a `role:"tool"` turn with
`content = "[aish] tool dispatch failed: <error.message>"` and the
matching `tool_call_id`. Also surface a status line for the user.
This both keeps alternation legal and tells the model what happened
so its next plan is informed.
- **HTTP-level failure** (auth, unreachable, timeout) → same shape:
synthesize a `role:"tool"` turn with
`content = "[aish] tool transport error: <reason>"`. Same alternation
rationale.
This split resolves Q21. This split resolves Q21 (with the C5/C7 review fix folded in).
### Lifecycle ### Lifecycle
- Connect on startup (from `config.mcp.servers`) — best effort; failures - Connect on startup (from `config.mcp.servers`) — best effort; failures
are status-logged, don't abort aish. "Connect" here means: do the are status-logged once, don't abort aish, and the session is **absent
`initialize` round-trip + cache `tools/list` results. from `mcp_sessions` until manually reconnected via `:mcp connect`**.
No automatic retry. "Connect" here means: do the `initialize`
round-trip + cache `tools/list` results.
- `:mcp connect <url>` adds a session at runtime; alias auto-derived - `:mcp connect <url>` adds a session at runtime; alias auto-derived
from hostname or supplied as second arg. from hostname or supplied as second arg.
- `:mcp disconnect <alias>` drops cached state. There's no long-lived - `:mcp disconnect <alias>` drops cached state. There's no long-lived
HTTP connection to close (every RPC was already `Connection: close`). HTTP connection to close (every RPC was already `Connection: close`).
- On aish quit, sessions are just forgotten — nothing to clean up - On aish quit, sessions are just forgotten — nothing to clean up
server-side. server-side.
- An unreachable server simply contributes no tools to the broker
request body — the model is not told that tools were "meant" to be
available. If `tools_schema()` returns empty across all sessions, the
broker omits the `tools` field entirely.
--- ---
@@ -186,19 +206,31 @@ data: {"choices":[{"finish_reason":"tool_calls",...}]}
On `finish_reason: tool_calls`, the accumulated calls are emitted to On `finish_reason: tool_calls`, the accumulated calls are emitted to
on_delta as `kind="tool_call"` with full payloads. on_delta as `kind="tool_call"` with full payloads.
**Index-absent fallback**: per the OpenAI spec, `index` is REQUIRED on
streaming `tool_calls[]` deltas — but some local llama.cpp builds have
been reported to omit it for single-call streams. If a delta has
`tool_calls` but no `index`, treat it as `index = 0` and accumulate
into the slot-0 buffer. Log a one-shot debug status the first time
this is observed per stream.
### Re-injection into context ### Re-injection into context
The assistant turn carries **whatever text was streamed before
`finish_reason: tool_calls`** (which may be non-empty — models often
say "Sure, let me look that up" before calling). The renderer flushes
that text first, then renders the tool-call frame around dispatch.
```lua ```lua
-- After tool execution -- After tool execution
ctx:append({ ctx:append({
role = "assistant", role = "assistant",
content = "", -- or any model-emitted text content = accumulated_text, -- may be "" if model emitted no prose
tool_calls = { {id="call_…", name="alias.tool", arguments=<json-string>} }, tool_calls = { {id="call_…", name="alias.tool", arguments=<json-string>} },
}) })
ctx:append({ ctx:append({
role = "tool", role = "tool",
tool_call_id = "call_…", tool_call_id = "call_…",
content = <tool-result-text>, content = <tool-result-text-or-synthesized-error>,
}) })
``` ```
@@ -207,7 +239,8 @@ strict-alternation issue from PHASE0.md §6 (mistral-nemo Jinja) is
handled differently here — tool turns ARE expected to follow assistant handled differently here — tool turns ARE expected to follow assistant
tool_calls per the OpenAI chat-template convention. If a model's tool_calls per the OpenAI chat-template convention. If a model's
template still rejects this shape, fall back to the `[tool: X]` prefix template still rejects this shape, fall back to the `[tool: X]` prefix
strategy used for exec output (Q18 below). strategy used for exec output (Q18 below — fallback is plumbed via the
`context.use_tool_role` flag; default `true`).
### Re-issuing the broker request ### Re-issuing the broker request
@@ -318,7 +351,11 @@ User-visible changes:
- `CMD:` lines still work exactly as before for shell. - `CMD:` lines still work exactly as before for shell.
Substrate (PHASE0.md §3) invariants: unchanged. Module layout (§4) Substrate (PHASE0.md §3) invariants: unchanged. Module layout (§4)
amended to add `mcp.lua`; that amendment ships in the manifest commit. amended to **add** `mcp.lua` (no rename of any existing file). Adding
a new file is additive and preserves the §3 module-stability invariant
("File names are stable across phases — later phases fill in bodies,
not rename files"). The amendment ships in commit #1 of the §12 plan
(C6 in the review).
`config.lua`: existing configs without an `mcp` section continue to work `config.lua`: existing configs without an `mcp` section continue to work
— no MCP servers means no tools sent in the broker request body, no — no MCP servers means no tools sent in the broker request body, no
@@ -360,7 +397,7 @@ Specifically out of Phase 2 scope despite proximity:
| Q22 | aish's own command surface as an MCP server | scope expansion | **Out of Phase 2.** Parked for Phase 4+ if interest stays. | | Q22 | aish's own command surface as an MCP server | scope expansion | **Out of Phase 2.** Parked for Phase 4+ if interest stays. |
Open-end carried forward to Phase 7 (verify): Open-end carried forward to Phase 7 (verify):
- **Hossenfelder proxy `model`-field bug** (separate from aish): the proxy at `:8082` routes all requests to the loaded fast model regardless of the request's `model` field — chunks return `"model":"qwen2.5-coder-1.5b-q4_k_m.gguf"` even when `mistral-nemo-12b-instruct` was asked for. This **blocks live-verification of mistral-nemo's chat-template tool-role behavior**. Fix lives in boltzmann (parallel to the SSE-buffering bug tracked at [aish#15](https://git.reauktion.de/marfrit/aish/issues/15)). Phase 7 needs the proxy fix to fully close Q18. - **Hossenfelder proxy `model`-field bug** (separate from aish): the proxy at `:8082` routes all requests to the loaded fast model regardless of the request's `model` field — chunks return `"model":"qwen2.5-coder-1.5b-q4_k_m.gguf"` even when `mistral-nemo-12b-instruct` was asked for. This **blocks live-verification of mistral-nemo's chat-template tool-role behavior**. Tracked as [aish#23](https://git.reauktion.de/marfrit/aish/issues/23) (filed 2026-05-12 at review). Sibling to the SSE-buffering bug at [aish#15](https://git.reauktion.de/marfrit/aish/issues/15) — both live in the boltzmann proxy code. Phase 7 needs at least #23 fixed to fully close Q18.
Resolved at formulate (above in §2 table): Resolved at formulate (above in §2 table):
- Q6 (CMD: vs tools coexistence) — both, no policy preference, substrate unchanged. - Q6 (CMD: vs tools coexistence) — both, no policy preference, substrate unchanged.
@@ -385,13 +422,18 @@ and Phase 1 implementation cadence.
1. **`mcp.lua` (new file) — JSON-RPC client.** `M.connect(url, opts)`, 1. **`mcp.lua` (new file) — JSON-RPC client.** `M.connect(url, opts)`,
`session:initialize()` + `:list_tools()` + `:call_tool(name, args)` + `session:initialize()` + `:list_tools()` + `:call_tool(name, args)` +
`:close()`. Uses Phase 1's `ffi/curl.M.post` for transport. Per-server `:close()`. Uses Phase 1's `ffi/curl.M.post` for transport — **same
Bearer auth (`auth_token` literal or `auth_env` indirection). Returns commit lands the `M.post` extension to return `(body, status_code)`
the raw `result` table for tool calls (caller distinguishes per §3 row** so `mcp.lua` can distinguish HTTP `401` (non-JSON-RPC
`isError`/content vs JSON-RPC `error`). **Test in isolation** via body `{"error":"unauthorized"}`) from JSON-RPC envelope errors.
Per-server Bearer auth (`auth_token` literal or `auth_env`
indirection). `:call_tool` returns `(result_table, kind)` where
`kind ∈ {"ok","handler_error","rpc_error"}` so callers route per
§4. **Test in isolation** via
`luajit -e 'local mcp=require("mcp"); local s=mcp.connect("http://boltzmann.fritz.box:8080/mcp",{auth_env="BOLTZMANN_MCP_TOKEN"}); s:initialize(); print(#s:list_tools())'`. `luajit -e 'local mcp=require("mcp"); local s=mcp.connect("http://boltzmann.fritz.box:8080/mcp",{auth_env="BOLTZMANN_MCP_TOKEN"}); s:initialize(); print(#s:list_tools())'`.
Also amends PHASE0.md §4 to list `mcp.lua` between `broker.lua` and Also amends PHASE0.md §4 to list `mcp.lua` between `broker.lua` and
`router.lua` in the same commit. `router.lua` in the same commit (additive — preserves §3
module-stability invariant per §9).
2. **`safety.lua` — confirm-gate surface.** Implement just 2. **`safety.lua` — confirm-gate surface.** Implement just
`M.confirm_tool_call(name, args, cfg)` per §6. Reads `M.confirm_tool_call(name, args, cfg)` per §6. Reads
@@ -399,31 +441,37 @@ and Phase 1 implementation cadence.
to `rl.readline` prompt. Norris-mode hooks stay out (Phase 3). **Test to `rl.readline` prompt. Norris-mode hooks stay out (Phase 3). **Test
in isolation** with mocked rl + various policy shapes. in isolation** with mocked rl + various policy shapes.
3. **`context.lua` extensions.** Add `role:"tool"` support: turns can 3. **`context.lua` extensions.** Three concrete edits per §3 row:
carry `tool_call_id`; assistant turns can carry `tool_calls = [...]`. (a) loosen `Context:append`'s assert from "content required" to
`to_messages()` rendering: an assistant turn with `tool_calls` keeps shape-per-role (assistant may have empty content if `tool_calls`
its `content` plus the `tool_calls` field; tool turns render with present; `tool` requires `tool_call_id` + `content`); (b) preserve
`role:"tool"` + `tool_call_id` + `content`. Alternation policy `tool_calls` / `tool_call_id` in stored turns (not just role+content);
widens: assistant-with-tool_calls is followed by N tool turns. No (c) extend `to_messages()` to emit those fields. Add alternation
change to `pending_exec_output` behavior. **Test in isolation** by assert (N4 in review). `pending_exec_output` is **unchanged**:
building a context and round-tripping `to_messages()`. buffer persists across tool-call sub-loops; flushes on next genuine
user turn (§3 row). **Tests in isolation**: (i) build a context with
assistant+tool_calls + tool turns, round-trip through `to_messages()`,
eyeball JSON shape; (ii) day-one fallback test (N8) — same context
with `use_tool_role = false` must emit the `[tool: alias.name]\n…`
prefix shape instead of a `role:"tool"` message.
4. **`renderer.lua` extensions.** Add `M.tool_call_begin(name, args)` 4. **`renderer.lua` extensions.** Add `M.tool_call_begin(name, args)`
(top rule + `name(json-snippet)` indented dim) and (top rule + `name(json-snippet)` indented dim) and
`M.tool_call_end(content, is_error)` (bottom rule with dim/red status). `M.tool_call_end(content, is_error)` (bottom rule with dim/red status).
Visual parity with the exec frame. **Test visually** with a one-liner. Visual parity with the exec frame. **Test visually** with a one-liner.
5. **`broker.lua` extensions.** `chat_stream` request body grows 5. **`broker.lua` extensions.** Signature widens:
`tools = caller-supplied`; we don't reach into mcp from here, the `chat_stream(cfg, msgs, on_delta, opts)`. `opts.tools` (optional
caller passes the assembled schema. The on_delta callback widens to array) is passed through to the request body; **omitted entirely
`on_delta(kind, payload)` where `kind ∈ {"text","tool_call"}`. Text when nil or empty**. The on_delta callback widens to
path unchanged. Tool-call path: accumulator keyed by `index`, `on_delta(kind, payload)` where `kind ∈ {"text","tool_call"}`.
concatenates `function.arguments` until `finish_reason: "tool_calls"`, Text path unchanged. Tool-call path: accumulator keyed by `index`
then emits one `on_delta("tool_call", {id,name,arguments})` per (default 0 if absent — C2), concatenates `function.arguments` until
completed call. `M.chat` (the buffering wrapper) keeps its current `finish_reason: "tool_calls"`, then emits one
string-return shape for non-tool callers but optionally returns a `on_delta("tool_call", {id,name,arguments})` per completed call.
`{text, tool_calls}` table when tools were involved. **Test against **`M.chat` shape unchanged** in this phase (C1 in review — no
hossenfelder** with `tools` declared + streaming. caller for a polymorphic return). **Test against hossenfelder**
with `tools` declared + streaming.
6. **`repl.lua` wiring.** New module-local `mcp_sessions = {alias=session,...}`, 6. **`repl.lua` wiring.** New module-local `mcp_sessions = {alias=session,...}`,
populated from `config.mcp.servers` at startup. Helpers: populated from `config.mcp.servers` at startup. Helpers:
@@ -466,24 +514,34 @@ and Phase 1 implementation cadence.
read the error text. Do NOT short-circuit on the flag. read the error text. Do NOT short-circuit on the flag.
- **JSON-RPC `error` from `tools/call`.** Surface as aish status - **JSON-RPC `error` from `tools/call`.** Surface as aish status
(`[aish] mcp: alias.tool: <error.message>`); do not feed to model. AND synthesize a `role:"tool"` turn with
Do NOT append a tool turn — there was no result. The model will `content = "[aish] tool dispatch failed: <error.message>"` and the
re-plan on the next user turn. matching `tool_call_id`. The alternation rationale (§4) requires
this — leaving the assistant's `tool_calls` orphaned breaks strict
chat templates exactly the way PHASE0.md §6 warned about. The model
receives the error and can re-plan within the same turn.
- **Tool-call sub-loop bounds.** `max_tool_depth` (default 8) per ask_ai - **Tool-call sub-loop bounds.** `max_tool_depth` (default 8) per ask_ai
invocation. When hit, surface as status and break — append the invocation. When hit, surface as status and break — append the
assistant's last text (if any) and let the user reply. assistant's last text (if any) and let the user reply.
- **Argument JSON might be invalid.** A model can stream malformed JSON - **Argument JSON might be invalid.** A model can stream malformed JSON
in `function.arguments`. `dkjson.decode` failure → status warning + in `function.arguments`. `dkjson.decode` failure → DO NOT execute on
skip that call; do NOT execute on partial parse. partial parse. Synthesize a `role:"tool"` turn with
`content = "[aish] tool arguments not parseable as JSON: <decode-err>"`
and the matching `tool_call_id` (same alternation rationale as
JSON-RPC error above; C7 in review).
- **Q18 fallback path** (strict templates rejecting `role:"tool"`). - **Q18 fallback path** (strict templates rejecting `role:"tool"`).
Plumb a `context.use_tool_role` flag (default true). If a real-world Plumb a `context.use_tool_role` flag (default true). If a real-world
rejection appears at Phase 7, flip the flag and convert tool turns to rejection appears at Phase 7, flip the flag and convert tool turns to
`[tool: alias.name]\n<content>` prefix on the next user turn (same `[tool: alias.name]\n<content>` prefix on the next user turn (same
pattern as `pending_exec_output`). v1 doesn't need to test the pattern as `pending_exec_output`). **Day-one verification** (N8 in
fallback — just have the switch. review): commit #3 includes a small in-isolation test that builds a
context with `use_tool_role = false`, appends an assistant+tool_calls
turn followed by a tool result, and confirms `to_messages()` emits
the prefix shape instead of a `role:"tool"` turn. Keeps the fallback
alive rather than dead-coded until Phase 7 first runs it under stress.
### Test checkpoints ### Test checkpoints
@@ -493,30 +551,57 @@ After each commit, verify with a targeted probe before moving on:
|---|---| |---|---|
| #1 `mcp.lua` | `luajit -e "local m=require('mcp'); ..."` connects + lists tools against boltzmann lmcp | | #1 `mcp.lua` | `luajit -e "local m=require('mcp'); ..."` connects + lists tools against boltzmann lmcp |
| #2 `safety.lua` | unit-test policy lookup with mock rl: exact match → true; `*` glob → true; miss → prompt invoked | | #2 `safety.lua` | unit-test policy lookup with mock rl: exact match → true; `*` glob → true; miss → prompt invoked |
| #3 `context.lua` | round-trip a context with tool turns through `to_messages()`, eyeball JSON shape | | #3 `context.lua` | (i) round-trip a context with tool turns through `to_messages()`, eyeball JSON shape; (ii) day-one fallback test with `use_tool_role = false` emits the `[tool: …]` prefix shape (N8) |
| #4 `renderer.lua` | one-liner emits frame around fake tool result | | #4 `renderer.lua` | one-liner emits frame around fake tool result |
| #5 `broker.lua` | curl-compare: hand-built request body with tools matches `broker.chat_stream(cfg, msgs, on_delta)` body | | #5 `broker.lua` | curl-compare: hand-built request body with tools matches `broker.chat_stream(cfg, msgs, on_delta)` body |
| #6 `repl.lua` | full REPL: `:mcp list` shows boltzmann; question that triggers `list_dir` round-trips through confirm + execution + model continuation | | #6 `repl.lua` | full REPL: `:mcp list` shows boltzmann; question that triggers `list_dir` round-trips through confirm + execution + model continuation |
| #7 `config.lua` | aish starts with example mcp section present; no MCP servers connected means no `tools` field sent | | #7 `config.lua` | aish starts with example mcp section present; no MCP servers connected means no `tools` field sent |
### Commits expected: 7 (plus 1 manifest amendment if scope creeps) ### Commits expected: 7 (commit #1 carries the PHASE0.md §4 amendment)
Per Phase 1's cadence (10 commits + 1 BLOCKER fix), Phase 2 is smaller Per Phase 1's cadence (10 commits + 1 BLOCKER fix), Phase 2 is smaller
in surface — single new file plus targeted extensions. Tracked to land in surface — single new file plus targeted extensions. Tracked to land
in one working session if the boltzmann proxy bugs don't intrude. in one working session if the boltzmann proxy bugs don't intrude.
### Open at plan; resolve at review ### Resolved at review (2026-05-12)
- **Q18 plumbing only or default-flip?** Default `use_tool_role = true` - **Q18 default** — `use_tool_role = true` defaulted, fallback exercised
and ship the fallback as dead code? Or default `false` (prefix day-one in commit #3 test (ii) so it's not dead code. Phase 7 flips if
injection) for safety until verify confirms? Recommend `true` — mistral-nemo (once #23 is fixed) rejects.
protocol-layer probe in analyze said it worked; Phase 7 can flip if - **`:mcp connect` re-fetch policy** — v1 trusts the `listChanged: false`
reality differs. capability; manual disconnect+reconnect is the workaround if a server's
tools change. No automatic re-fetch.
- **Should `:mcp connect` re-fetch tools on subsequent calls** (handles ### Review fold-in (2026-05-12, all BLOCKERs + relevant CONCERNs/NITs)
servers that misreport `listChanged:false`)? v1: no — trust the
capability. Manual `:mcp disconnect` + `:mcp connect` is the Independent review surfaced 5 BLOCKERs / 7 CONCERNs / 8 NITs against
workaround if a server's tools change. the formulate+analyze+plan draft. Resolutions applied in this revision:
- **B1** context.lua impact widened — assert loosening + field
preservation + `to_messages` emit are now explicit in §3.
- **B2** `ffi/curl.M.post` extended to return `(body, status_code)` so
`mcp.lua` distinguishes HTTP `401` from JSON-RPC envelope errors.
- **B3** `inputSchema` typo fixed in §3 mcp.lua row.
- **B4** `pending_exec_output` × tool-call sub-loop interaction
specified (persists across; flushes on next user turn).
- **B5** §3/§12 dependency contradiction resolved — broker takes
`opts.tools` from the caller; no layering inversion.
- **C1** `M.chat` polymorphic return dropped.
- **C2** Index-absent fallback specified (default to 0).
- **C3** Re-injection example now stores accumulated text in the
assistant turn, not hard-coded empty string.
- **C4** `:mcp connect` failure semantics specified (no auto-retry).
- **C5/C7** Both orphan-tool_calls scenarios now synthesize a
`role:"tool"` turn with `[aish] tool dispatch failed: ...` content
to preserve alternation.
- **C6** §9 explicitly notes the §4 amendment is additive.
- **N3** protocolVersion fallback specified (lmcp doesn't negotiate).
- **N4** alternation assert added to context.lua row.
- **N7** model-routing bug filed as [aish#23](https://git.reauktion.de/marfrit/aish/issues/23).
- **N8** day-one fallback test added to commit #3 checkpoints.
CONCERNs / NITs not folded (defended as wording-only, not load-bearing):
N1, N2, N5, N6 — left as-is.
--- ---