# aish — Phase 2 Manifest
**Project:** aish — AI-augmented conversational shell
**Document:** Phase 2 Requirements, Architecture & Design Decisions
**Status:** Analyze (formulate complete; live-probed against lmcp v0.5.4 + hossenfelder proxy)
**Date:** 2026-05-12
PHASE0.md is the locked substrate; PHASE1.md is layered on top. This
manifest specifies what Phase 2 adds. Section numbers reference back to
PHASE0.md / PHASE1.md where relevant.
---
## 1. Scope of Phase 2
Three pillars per PHASE0.md §11 row 2:
1. **MCP client** (`mcp.lua`) — JSON-RPC 2.0 over HTTP+SSE transport.
Target reference implementation: `lmcp`. Operations needed for v1:
`initialize`, `tools/list`, `tools/call`. Multiple servers may be
connected concurrently; tools are namespaced `<server>.<tool>`.
2. **Tool-calling protocol bridge** — the broker sends OpenAI-compatible
`tools` in the request body; the model emits `tool_calls` in the
response; `mcp.lua` dispatches each call to the right server; the
tool result is fed back as a `role:"tool"` turn in `context.lua` and
the chat continues.
3. **Authorization gate** (`safety.lua`, the PHASE0.md §4 stub) finally gets
implemented. Every tool call is confirmed by the user by default,
with per-tool and per-server `auto_approve` policies in `config.lua`.
**Phase 2 is done when:**
- aish can connect to at least one local `lmcp` server declared in
`config.lua` and one connected via `:mcp connect <url>` at runtime.
- `:mcp list` shows connected servers; `:mcp tools` shows discovered
tools across all servers.
- A model conversation can invoke a tool: the broker request carries
the live tools schema; the response's `tool_calls` are confirmed by
the user; each call dispatches to the right MCP server; the result
re-enters the chat; the model continues with the result available.
- `CMD:` extraction (PHASE0.md §6 substrate invariant) still works
unchanged — Phase 2 is additive, not replacing.
- A tool with `auto_approve = true` (in config) executes without the
confirm prompt; a non-approved tool still prompts.
---
## 2. Technology Decisions (delta from Phase 1)
| Decision | Choice | Rationale |
|---|---|---|
| MCP transport | HTTP POST per RPC, `Connection: close` per response, **no long-lived SSE GET channel** in v1 | Analyze finding (2026-05-12): lmcp v0.5.4 only implements the trivial POST-and-respond flavor of the spec's streamable-HTTP transport. Its GET /mcp endpoint announces the POST endpoint then closes — there's no server→client notification channel to listen on. Combined with lmcp's `capabilities.tools.listChanged = false`, aish doesn't need an SSE GET listener at all for lmcp. Stdio transport is left for a possible Phase 2.1 if a stdio-only MCP server becomes necessary. |
| MCP protocol version | `2025-03-26` (confirmed by live probe of boltzmann:8080/mcp) | lmcp pins this in `MCP_VERSION`. aish sends the same in `initialize`; future version bumps are negotiated at connect time. |
| MCP auth | Bearer token via `Authorization: Bearer <token>` header, per-server | Analyze finding: every lmcp deployment in mfritsche's fleet (boltzmann/hertz/pve*/nc/etc.) requires Bearer auth. Phase 2 config supports `auth_token` literal and `auth_env` env-var indirection per server (mirrors `key_env` in the models registry). lmcp servers without auth (broglie/higgs LAN-only) just leave the field nil. |
| Tool-call wire format | OpenAI `tools` field on `/v1/chat/completions` body; `tool_calls` on assistant deltas; `role:"tool"` turn with `tool_call_id` for results | Standard, supported by llama.cpp and OpenRouter. Aligns with the existing `/v1/chat/completions` substrate invariant. |
| Tool namespacing | `<server-alias>.<tool-name>` for both the wire-level tool name and `:mcp tools` listing | Avoids name collisions across servers. The alias comes from the config key or, for `:mcp connect`, from the URL's hostname or an explicit alias argument (see §4 Lifecycle). |
| `CMD:` coexistence with tool-calls | Both stay live, no policy preference. Substrate invariant (PHASE0.md §3) unchanged. | Resolves Q6 (see §11). `CMD:` is the local-shell route; MCP tools are structured-API routes; they serve different purposes. Future phases (Norris, Phase 3) may prefer tools when both are available, but Phase 2 doesn't enforce. |
| Authorization default | Per-call confirm (mirrors PHASE0.md §10 `confirm_cmd` for shell) | Conservative default; user can opt into auto-approval per tool or per server via config. Resolves Q8. |
| System prompt augmentation | Hybrid: static frame in `broker.lua` system prompt + dynamic `tools` array in the request body | Tool list goes in the API field where it belongs; the system prompt only mentions that tools exist and how to use them. Per-request body cost is bounded (tools change rarely; small schemas). Resolves Q9. |
| Tool-call streaming | Streaming-from-day-one — `broker.chat_stream`'s on_delta callback widens to handle `tool_calls` deltas in addition to text deltas | Resolves Q10. Phase 1 SSE landed first, so we're not retrofitting; we just extend the parser. **Wire shape confirmed at analyze** (2026-05-12 probe vs hossenfelder): `delta.tool_calls[]` arrives indexed; id+type+function.name appear on the opening delta; `function.arguments` is a JSON-string that arrives in character-fragment chunks; finish_reason "tool_calls" closes the call. Accumulator strategy matches §5. |
| Tool-call concurrency | Sequential dispatch in Phase 2 v1 — process `tool_calls[0]` to completion, then `[1]`, etc. | Simpler error handling; tool effects often order-dependent (e.g. write-then-read). Parallel dispatch deferred (see Q20). |
| MCP server lifecycle | aish does not manage MCP server processes (parallel to PHASE0.md §12 llama.cpp rule) | Declared in config or connected by URL; aish is a client only. |
---
## 3. Module Changes
| File | State after Phase 1 | Phase 2 changes |
|---|---|---|
| `mcp.lua` | **New file** (not in PHASE0 §4 layout; this Phase amends the layout to add it) | Implement: `M.connect(url, opts) -> session` (opts: `alias`, `auth_token`, `auth_env`), `session:initialize()`, `session:list_tools() -> [{name, description, schema}]`, `session:call_tool(name, args) -> result \| tool_error`, `session:close()`. JSON-RPC 2.0 over HTTP POST (`Content-Type: application/json`, `Accept: application/json`, `Authorization: Bearer <token>`). Per-session state: alias, base-url, auth, tools-cache, request-ID counter. No persistent SSE channel; POST is one-shot per RPC. |
| `safety.lua` | Stub | Implement Phase 2 surface only: `M.confirm_tool_call(tool_name, args, cfg) -> bool`. Reads `cfg.mcp.auto_approve` (per-tool and per-server) before prompting. Norris destructive-op heuristic and HALT gate stay Phase 3. |
| `broker.lua` | Streaming `chat_stream(cfg, msgs, on_delta)` | Request body grows `tools = mcp.tools_schema()` (assembled from all connected sessions). on_delta callback widens to `on_delta(kind, payload)` where `kind ∈ {"text", "tool_call"}`; tool_call payload includes id+name+arguments-delta. `M.chat` wrapper updates to buffer both. |
| `context.lua` | turns = {{role, content}, ...} + `pending_exec_output` | New role: `"tool"`. Assistant turns may carry `tool_calls = [{id, name, arguments}]`. `to_messages()` flattens these into OpenAI-shape messages. Alternation rules: assistant-with-tool_calls is followed by N tool turns (one per call), then assistant text. |
| `repl.lua` | meta cmds + ask_ai stream loop | After ask_ai sees `tool_calls`, enter a tool-execution sub-loop: confirm-gate each call via `safety.confirm_tool_call`, dispatch via `mcp.session:call_tool`, append tool turn to context, re-issue the broker request. Loop until assistant emits text without tool_calls. New meta: `:mcp connect <url> [alias]`, `:mcp list`, `:mcp tools`, `:mcp disconnect <alias>`. |
| `renderer.lua` | streaming text + exec frame | Add `tool_call_begin(name, args)`, `tool_call_end(result, ok)`. Visual style: indented, dim, parallel to the exec frame. |
| `config.lua` | example with models/shell/context/history | Schema additions: `mcp = { servers = { alias = { url = "..." } }, auto_approve = { ["alias.tool"] = true } }`. Documented in §6 below. |
| `ffi/curl.lua` | post + post_sse | **No additions in v1** — analyze finding ruled out the long-lived SSE GET channel for lmcp. Phase 1 `M.post` (already does sync POST with response capture) is sufficient for MCP JSON-RPC. |
| `history.lua` | JSONL session log | Tool turns are logged like any other turn — `{role:"tool", tool_call_id:"...", content:"..."}`. Resume reconstructs them via `ctx:append` like user/assistant turns. |
PHASE0.md §4 module-layout amendment: `mcp.lua` slots between `broker.lua` and
`router.lua` in that table. The same commit lands the manifest amendment.
---
## 4. MCP Transport (analyze findings — lmcp v0.5.4)
lmcp implements only the **synchronous POST** flavor of the MCP
streamable-HTTP spec. Each RPC is one HTTP transaction:
```
client → server:  POST /mcp
                  Content-Type: application/json
                  Accept: application/json
                  Authorization: Bearer <token>
                  Body: { jsonrpc:"2.0", id, method, params }
Returns:          { jsonrpc, id, result | error }
                  Connection: close
```
lmcp's `GET /mcp` exists but only sends a one-shot `event: endpoint`
announcing the POST URL, then closes — there is no held-open
server→client channel. Combined with the `listChanged: false`
capability lmcp announces in `initialize`, **aish does not open a
persistent SSE channel** to lmcp servers in v1. Notifications-from-server
are out of scope here; track for v2 if a richer server appears.
### Handshake
1. `initialize` request: `{ protocolVersion: "2025-03-26", capabilities: {}, clientInfo: { name: "aish", version: "..." } }`.
2. Server response (lmcp): `{ protocolVersion: "2025-03-26", capabilities: { tools: { listChanged: false } }, serverInfo: { name, version } }`.
3. `notifications/initialized` POST (one-way; lmcp returns HTTP 202 with no body).
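
The three handshake steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the `mcp.lua` implementation: `post_json` is a hypothetical injected transport callback (it takes a request table and returns the decoded response table, or nil for the one-way notification), and `new_session`, `rpc`, and `handshake` are illustrative names.

```lua
-- Hypothetical sketch of the initialize handshake over one-shot POSTs.
-- `post_json(req)` is an assumed callback supplied by the caller.
local PROTOCOL_VERSION = "2025-03-26"

local function new_session(post_json)
  return { post_json = post_json, next_id = 0 }
end

-- Build and send a JSON-RPC 2.0 message; notifications carry no id.
local function rpc(session, method, params, is_notification)
  local req = { jsonrpc = "2.0", method = method, params = params }
  if not is_notification then
    session.next_id = session.next_id + 1
    req.id = session.next_id
  end
  return session.post_json(req)
end

local function handshake(session, client_version)
  local resp = rpc(session, "initialize", {
    protocolVersion = PROTOCOL_VERSION,
    -- Assuming dkjson is the encoder: an empty Lua table would encode as
    -- [] unless it is tagged as an object.
    capabilities = setmetatable({}, { __jsontype = "object" }),
    clientInfo = { name = "aish", version = client_version },
  })
  if not resp or resp.error then
    return nil, resp and resp.error and resp.error.message or "no response"
  end
  session.server = resp.result
  -- One-way notification; lmcp answers HTTP 202 with no body.
  rpc(session, "notifications/initialized", nil, true)
  return resp.result
end
```

Injecting the transport keeps the sketch testable without a live lmcp server; in aish the callback would presumably wrap the Phase 1 `ffi/curl` POST helper.
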
### Tool discovery
1. `tools/list` RPC → `{ tools: [{ name, description, inputSchema }] }`.
2. Cache per-session **for the session lifetime** — lmcp announces
`listChanged: false`, so there's no need to refetch or listen for
change notifications.
### Tool invocation
`tools/call` with `{ name, arguments }`. lmcp distinguishes two failure
modes:
- **Tool-handler exception** → JSON-RPC `result` with `isError: true`
and `content: [{ type:"text", text: "Error: ..." }]`. **Model-recoverable**:
feed it back to the model as the next `role:"tool"` turn content and
let it react.
- **Unknown method or unknown tool** → JSON-RPC `error` with
`code: -32601` ("Method not found" / "Tool not found"). **Transport
level**: surface as an aish status, do not feed to model.
This split resolves Q21.
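
The split can be sketched as a small classifier over the decoded `tools/call` response. The function name and return convention are assumptions made for illustration; only the two response shapes come from the probe findings above.

```lua
-- Hypothetical sketch: classify a decoded tools/call JSON-RPC response.
-- Returns "status", message        -> show to the user only, no tool turn
--      or "tool_result", content, is_error -> feed back as a role:"tool" turn
local function classify_call_response(resp)
  if type(resp) ~= "table" then
    return "status", "no response"
  end
  if resp.error then
    -- Transport-level failure, e.g. code -32601 for an unknown tool.
    return "status", tostring(resp.error.message or resp.error.code)
  end
  local result = resp.result or {}
  -- Handler exceptions arrive here with isError = true; the content is
  -- still passed on so the model can read the error text.
  return "tool_result", result.content or {}, result.isError == true
end
```
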
### Lifecycle
- Connect on startup (from `config.mcp.servers`) — best effort; failures
are status-logged, don't abort aish. "Connect" here means: do the
`initialize` round-trip + cache `tools/list` results.
- `:mcp connect <url>` adds a session at runtime; alias auto-derived
from hostname or supplied as second arg.
- `:mcp disconnect <alias>` drops cached state. There's no long-lived
HTTP connection to close (every RPC was already `Connection: close`).
- On aish quit, sessions are just forgotten — nothing to clean up
server-side.
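
Alias derivation for `:mcp connect <url>` might look like the sketch below. The manifest only says the alias is derived from the hostname; taking the first DNS label is an assumption of this sketch, and `alias_from_url` is a hypothetical helper name.

```lua
-- Hypothetical sketch: derive a default alias from a connect URL.
-- Assumption: the alias is the first label of the hostname.
local function alias_from_url(url)
  -- scheme://host[:port][/path]; userinfo and IPv6 literals are not handled.
  local host = url:match("^%a[%w+.%-]*://([^/:?#]+)")
  if not host then return nil end
  return host:match("^([^.]+)")
end
```

With the servers from this document, `alias_from_url("http://boltzmann.fritz.box:8080/mcp")` yields `boltzmann`. A bare IP address would yield only its first octet, so a real implementation would likely require an explicit alias in that case.
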
---
## 5. Tool-Call Bridge
### Broker request body (delta from Phase 1)
```json
{
  "model": "...",
  "messages": [...],
  "stream": true,
  "temperature": 0.2,
  "tools": [
    { "type": "function",
      "function": { "name": "<alias>.<tool>",
                    "description": "...",
                    "parameters": <inputSchema> } },
    ...
  ]
}
```
The `tools` array is assembled by `mcp.tools_schema()` — flattens
`tools/list` results from every connected session, namespacing each tool
as `<alias>.<name>`.
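
A minimal sketch of that flattening step follows. The `sessions` shape (alias mapped to a table with a cached `tools` list) is an assumption; the output shape is the OpenAI `tools` array shown above. Returning nil for an empty list is a choice made here so that the caller can leave the field out of the request body.

```lua
-- Hypothetical sketch: flatten cached tools/list results into the OpenAI
-- `tools` array. `sessions` maps alias -> { tools = {...} } (assumed shape).
local function tools_schema(sessions)
  local aliases = {}
  for alias in pairs(sessions) do aliases[#aliases + 1] = alias end
  table.sort(aliases)  -- deterministic order across requests

  local out = {}
  for _, alias in ipairs(aliases) do
    for _, tool in ipairs(sessions[alias].tools or {}) do
      out[#out + 1] = {
        type = "function",
        -- `function` is a reserved word in Lua, hence the bracket syntax.
        ["function"] = {
          name = alias .. "." .. tool.name,
          description = tool.description,
          parameters = tool.inputSchema,
        },
      }
    end
  end
  if #out == 0 then return nil end  -- caller omits `tools` entirely
  return out
end
```
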
### Response handling (streaming)
llama.cpp / OpenAI deltas may include:
```json
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_…",
"function":{"name":"alias.tool","arguments":"{\"a\":"}}]}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,
"function":{"arguments":"1}"}}]}}]}
data: {"choices":[{"finish_reason":"tool_calls",...}]}
```
`broker.chat_stream` accumulates tool-call deltas keyed by `index`; the
`arguments` field is a JSON-string that arrives chunked and is concatenated.
On `finish_reason: tool_calls`, the accumulated calls are emitted to
on_delta as `kind="tool_call"` with full payloads.
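
The accumulation described above can be sketched like this. It is an illustration only: `new_accumulator`, `feed`, and `finish` are hypothetical names, and the input is assumed to be the already-decoded `delta.tool_calls` array from each SSE chunk.

```lua
-- Hypothetical sketch of the tool-call delta accumulator.
local function new_accumulator()
  local calls = {}  -- keyed by the delta's `index` (0-based on the wire)
  local acc = {}

  -- Feed one decoded delta.tool_calls array.
  function acc.feed(tool_call_deltas)
    for _, d in ipairs(tool_call_deltas) do
      local c = calls[d.index]
      if not c then
        c = { arguments = "" }
        calls[d.index] = c
      end
      if d.id then c.id = d.id end
      local fn = d["function"]
      if fn then
        if fn.name then c.name = fn.name end
        -- arguments is a JSON string that arrives in fragments.
        if fn.arguments then c.arguments = c.arguments .. fn.arguments end
      end
    end
  end

  -- Call on finish_reason == "tool_calls"; returns calls in index order
  -- and resets the accumulator.
  function acc.finish()
    local idxs = {}
    for i in pairs(calls) do idxs[#idxs + 1] = i end
    table.sort(idxs)
    local out = {}
    for _, i in ipairs(idxs) do out[#out + 1] = calls[i] end
    calls = {}
    return out
  end

  return acc
end
```

Each entry returned by `finish()` would then be passed to `on_delta("tool_call", entry)`; the `arguments` string is decoded only after the call is complete.
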
### Re-injection into context
```lua
-- After tool execution. `args_json` is the accumulated arguments JSON
-- string; `result_text` is the flattened tool-result text.
ctx:append({
  role = "assistant",
  content = "",  -- or any model-emitted text
  tool_calls = { { id = "call_…", name = "alias.tool", arguments = args_json } },
})
ctx:append({
  role = "tool",
  tool_call_id = "call_…",
  content = result_text,
})
```
`to_messages()` renders both shapes for the next broker request. The
strict-alternation issue from PHASE0.md §6 (mistral-nemo Jinja) is
handled differently here — tool turns ARE expected to follow assistant
tool_calls per the OpenAI chat-template convention. If a model's
template still rejects this shape, fall back to the `[tool: X]` prefix
strategy used for exec output (Q18 below).
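
For the default `role:"tool"` path, the rendering step might look like the following sketch. It covers only the two new turn shapes and ignores `pending_exec_output` and the prefix fallback; the standalone `to_messages(turns)` signature is an assumption made so the example is self-contained.

```lua
-- Hypothetical sketch: render turns into OpenAI-shape messages.
local function to_messages(turns)
  local msgs = {}
  for _, t in ipairs(turns) do
    local m = { role = t.role, content = t.content }
    if t.role == "assistant" and t.tool_calls then
      -- Expand the compact {id, name, arguments} form into the wire shape.
      m.tool_calls = {}
      for i, c in ipairs(t.tool_calls) do
        m.tool_calls[i] = {
          id = c.id,
          type = "function",
          ["function"] = { name = c.name, arguments = c.arguments },
        }
      end
    elseif t.role == "tool" then
      m.tool_call_id = t.tool_call_id
    end
    msgs[#msgs + 1] = m
  end
  return msgs
end
```
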
### Re-issuing the broker request
After tool turns are appended, the broker is called again with the
extended messages array. The model may emit more `tool_calls`, more
text, or both. Loop until the response has no `tool_calls` (i.e. a
plain text assistant turn).
Budget: a max-tool-call-depth setting (default 8) prevents runaway loops.
Hit-cap surfaces as a status: `[aish] tool-call depth limit reached`.
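
The bounded loop can be sketched as below. Both callbacks are assumptions: `ask(ctx)` stands for one broker round-trip returning `text, tool_calls`, and `run_calls(ctx, text, tool_calls)` stands for the confirm, dispatch, and append steps. The sketch counts depth as the number of rounds that executed tools, which is one possible reading of the setting.

```lua
-- Hypothetical sketch of the bounded tool-call loop.
local function tool_loop(ctx, ask, run_calls, max_depth)
  max_depth = max_depth or 8
  local text
  for _ = 1, max_depth do
    local tool_calls
    text, tool_calls = ask(ctx)
    if not tool_calls or #tool_calls == 0 then
      return text, "done"  -- plain text assistant turn ends the loop
    end
    run_calls(ctx, text, tool_calls)
  end
  -- Caller prints "[aish] tool-call depth limit reached".
  return text, "depth_limit"
end
```
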
---
## 6. Authorization (safety.lua Phase 2 surface)
```lua
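-- safety.confirm_tool_call(tool_name, args_table, config) -> bool
-- `rl` is assumed to be the readline binding already required by this module.
function M.confirm_tool_call(name, args, cfg)
  local policy = cfg.mcp and cfg.mcp.auto_approve or {}
  -- Exact match: "alias.tool" entries
  if policy[name] then return true end
  -- Per-server prefix check: "alias.*" entries
  local alias = name:match("^([^.]+)%.")
  if alias and policy[alias .. ".*"] then return true end
  -- Otherwise prompt. args is a key/value table, so `#args` would always
  -- be 0; next() detects whether any argument is present.
  local has_args = type(args) == "table" and next(args) ~= nil
  local pretty = name .. "(" .. (has_args and "..." or "") .. ")"
  local ans = rl.readline(("call '%s'? [y/N] "):format(pretty)) or ""
  return ans:lower():sub(1,1) == "y"
end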
```
Config schema (analyze-revised — Bearer auth fields added):
```lua
mcp = {
  servers = {
    boltzmann = {
      url = "http://boltzmann.fritz.box:8080/mcp",
      auth_env = "BOLTZMANN_MCP_TOKEN",  -- read from env at startup
    },
    broglie = {
      url = "http://broglie.fritz.box:8080/mcp",
      -- no auth (LAN-only deployment)
    },
    nc = {
      url = "https://nc.reauktion.de:8080/mcp",
      auth_token = "literal-token-if-not-using-env",  -- alternative
    },
  },
  auto_approve = {
    ["boltzmann.read_file"] = true,  -- specific tool
    ["broglie.*"] = true,            -- whole server
  },
  max_tool_depth = 8,
}
```
Auth precedence per server: the `auth_token` literal wins over `auth_env`
indirection; if neither is set, no Authorization header is sent. This mirrors
PHASE0 §10's `key_env` convention for cloud model API keys.
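
A sketch of that precedence rule, with the environment lookup injected so it can be exercised without real tokens. `auth_header` is a hypothetical helper name, and treating an empty token as absent is an assumption of this sketch.

```lua
-- Hypothetical sketch: resolve the Authorization header for one server
-- entry. Returns nil when no header should be sent.
local function auth_header(server, getenv)
  getenv = getenv or os.getenv
  local token = server.auth_token          -- literal wins
  if not token and server.auth_env then
    token = getenv(server.auth_env)        -- env-var indirection
  end
  if not token or token == "" then return nil end
  return "Authorization: Bearer " .. token
end
```
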
Norris mode (Phase 3) will extend this: in autonomous mode the destructive-op
heuristic decides, and non-destructive tools are auto-approved. That is out of scope here.
---
## 7. Meta Commands (Phase 2 additions)
| Command | Action |
|---|---|
| `:mcp connect <url> [<alias>]` | Open a session; perform initialize + tools/list; add to active set |
| `:mcp disconnect <alias>` | Close one session |
| `:mcp list` | Show connected sessions (alias, url, tool count, status) |
| `:mcp tools` | List tools across all sessions (`alias.name` — short description) |
| `:mcp tool <alias.name>` | Show one tool's full inputSchema (debug aid) |
Existing `:help` updated to list these.
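
Parsing these commands in the REPL could be sketched as follows. The function name, the returned table shape, and the usage string are assumptions for illustration; only the command grammar comes from the table above.

```lua
-- Hypothetical sketch: parse a ":mcp ..." meta-command line.
-- Returns a command table, or nil plus a usage message on bad input.
local function parse_mcp_command(line)
  local rest = line:match("^:mcp%s+(.*)$")
  if not rest then return nil end
  local words = {}
  for w in rest:gmatch("%S+") do words[#words + 1] = w end
  local sub = words[1]
  if sub == "connect" and words[2] then
    return { cmd = "connect", url = words[2], alias = words[3] }
  elseif sub == "disconnect" and words[2] then
    return { cmd = "disconnect", alias = words[2] }
  elseif sub == "tool" and words[2] then
    return { cmd = "tool", name = words[2] }
  elseif (sub == "list" or sub == "tools") and #words == 1 then
    return { cmd = sub }
  end
  return nil, "usage: :mcp connect <url> [alias] | disconnect <alias> | list | tools | tool <alias.name>"
end
```
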
---
## 8. System Prompt Augmentation
`broker.lua`'s default system prompt grows by ~4 lines:
```
You may have access to MCP tools — they appear in this request's `tools`
field. Call a tool by emitting a tool_call; the result will be supplied
in the next turn. Use tools for structured operations (file reads,
queries, etc.) and `CMD:` lines for local shell commands. Prefer tools
when available; fall back to `CMD:` for anything not exposed as a tool.
```
The actual tool list is in the `tools` request-body field, not the
prompt. This avoids per-turn token bloat for the full schema.
The PHASE0.md §3 substrate invariants are unchanged. The `CMD:` extraction marker stays
the local-shell route; tools are the additive structured route.
---
## 9. Migration from Phase 1
User-visible changes:
- New `:mcp …` meta commands when MCP servers are configured or
connected at runtime.
- Assistant responses may now invoke tools — user sees a confirm prompt
(similar to `CMD:` execution gate) followed by an indented tool-call
frame with the result.
- `CMD:` lines still work exactly as before for shell.
Substrate (PHASE0.md §3) invariants: unchanged. Module layout (§4)
amended to add `mcp.lua`; that amendment ships in the manifest commit.
`config.lua`: existing configs without an `mcp` section continue to work
— no MCP servers means no tools sent in the broker request body, no
auth checks, no behavior change.
---
## 10. Out of Scope (Phase 2)
Per PHASE0.md §11, these belong elsewhere:
- Chuck Norris autonomous mode (Phase 3) — even though tool-calls
enable richer autonomy, the *autonomous policy* is Phase 3's.
- Destructive-op heuristic in safety.lua (Phase 3) — Phase 2 only
implements the per-call confirm-prompt surface.
- `memory.jsonl` summarization across sessions (Phase 4).
- Multi-model routing / cloud fallback (Phase 5).
- Tree-sitter syntax highlighting (Phase 6).
Specifically out of Phase 2 scope despite proximity:
- Stdio-transport MCP servers (Q17 below).
- Parallel tool-call dispatch (Q20).
- MCP `resources/list` and `prompts/list` capabilities — Phase 2
v1 only implements `tools/*`. Resources/prompts deferred (probably
Phase 4 alongside memory).
- Server-sent `notifications/progress` for long-running tool calls —
ignored in v1; status surface comes later.
---
## 11. Open Questions
| # | Question | Impact | Resolve by |
|---|---|---|---|
| Q17 | ~~MCP transport abstraction: stdio vs HTTP+SSE~~ | mcp.lua API shape | **Resolved at analyze.** Hard-code POST-only HTTP for v1. lmcp doesn't use the long-lived SSE channel and `listChanged: false` removes any v1 need for it. Stdio transport tracked as Phase 2.1 / out-of-scope here. |
| Q18 | Tool-result re-injection: standard OpenAI `role:"tool"` turn, or `[tool: X]` prefix to next user turn (matching the §6 exec-output pattern)? | context.lua + broker.lua | **Partly resolved.** Live probe (2026-05-12, hossenfelder) shows `role:"tool"` accepted by the proxy + the loaded model (qwen2.5-coder-1.5b). Mistral-nemo-specific template testing is **blocked** by the hossenfelder proxy routing all `model` field values to the loaded fast model — see open-end below. Default v1 path: `role:"tool"` (standard); fallback to `[tool: X]` prefix is plumbed but unused unless a strict template rejects it during Phase 7 verify. |
| Q19 | Large tool-result payloads: pass-through, truncate at N chars, or summarize via fast model? | context.lua + executor of tool-result | Phase 2 (plan); Phase 4 may refine with memory.jsonl |
| Q20 | Parallel `tool_calls`: sequential v1 is safe; spec allows parallel. Move to parallel when both calls are read-only? | mcp.lua dispatch | Phase 2 (verify) — track for v2 |
| Q21 | ~~MCP error mapping~~ | mcp.lua + broker.lua | **Resolved at analyze.** lmcp distinguishes: `result.isError=true` (handler exception, model-recoverable, feed back as tool turn content) vs JSON-RPC `error` (unknown method/tool, transport-level, surface as aish status). See §4. |
| Q22 | aish's own command surface as an MCP server | scope expansion | **Out of Phase 2.** Parked for Phase 4+ if interest stays. |
Open-end carried forward to Phase 7 (verify):
- **Hossenfelder proxy `model`-field bug** (separate from aish): the proxy at `:8082` routes all requests to the loaded fast model regardless of the request's `model` field — chunks return `"model":"qwen2.5-coder-1.5b-q4_k_m.gguf"` even when `mistral-nemo-12b-instruct` was asked for. This **blocks live-verification of mistral-nemo's chat-template tool-role behavior**. Fix lives in boltzmann (parallel to the SSE-buffering bug tracked at [aish#15](https://git.reauktion.de/marfrit/aish/issues/15)). Phase 7 needs the proxy fix to fully close Q18.
Resolved at formulate (above in §2 table):
- Q6 (CMD: vs tools coexistence) — both, no policy preference, substrate unchanged.
- Q7 (MCP discovery) — both, config-declared default + runtime `:mcp connect`.
- Q8 (authorization) — per-call confirm default, per-tool/per-server `auto_approve` policy.
- Q9 (system-prompt augmentation) — hybrid: static frame + dynamic `tools` body field.
- Q10 (tool-call streaming) — streaming-from-day-one on top of Phase 1 SSE.
Resolved at analyze (2026-05-12, live probes vs lmcp v0.5.4 + hossenfelder):
- Q17 (transport abstraction) — POST-only, no SSE channel needed for lmcp.
- Q21 (error mapping) — isError vs JSON-RPC error split per §4.
---
## 12. Implementation Plan (commit-by-commit)
Bottom-up — start with modules with the fewest dependencies, end with the
REPL wiring that exercises everything together. Same shape as Phase 0
and Phase 1 implementation cadence.
### Order
1. **`mcp.lua` (new file) — JSON-RPC client.** `M.connect(url, opts)`,
`session:initialize()` + `:list_tools()` + `:call_tool(name, args)` +
`:close()`. Uses Phase 1's `ffi/curl.M.post` for transport. Per-server
Bearer auth (`auth_token` literal or `auth_env` indirection). Returns
the raw `result` table for tool calls (caller distinguishes
`isError`/content vs JSON-RPC `error`). **Test in isolation** via
`luajit -e 'local mcp=require("mcp"); local s=mcp.connect("http://boltzmann.fritz.box:8080/mcp",{auth_env="BOLTZMANN_MCP_TOKEN"}); s:initialize(); print(#s:list_tools())'`.
Also amends PHASE0.md §4 to list `mcp.lua` between `broker.lua` and
`router.lua` in the same commit.
2. **`safety.lua` — confirm-gate surface.** Implement just
`M.confirm_tool_call(name, args, cfg)` per §6. Reads
`cfg.mcp.auto_approve` for exact-match and `alias.*` glob. Falls back
to `rl.readline` prompt. Norris-mode hooks stay out (Phase 3). **Test
in isolation** with mocked rl + various policy shapes.
3. **`context.lua` extensions.** Add `role:"tool"` support: turns can
carry `tool_call_id`; assistant turns can carry `tool_calls = [...]`.
`to_messages()` rendering: an assistant turn with `tool_calls` keeps
its `content` plus the `tool_calls` field; tool turns render with
`role:"tool"` + `tool_call_id` + `content`. Alternation policy
widens: assistant-with-tool_calls is followed by N tool turns. No
change to `pending_exec_output` behavior. **Test in isolation** by
building a context and round-tripping `to_messages()`.
4. **`renderer.lua` extensions.** Add `M.tool_call_begin(name, args)`
(top rule + `name(json-snippet)` indented dim) and
`M.tool_call_end(content, is_error)` (bottom rule with dim/red status).
Visual parity with the exec frame. **Test visually** with a one-liner.
5. **`broker.lua` extensions.** `chat_stream` request body grows
`tools = caller-supplied`; we don't reach into mcp from here, the
caller passes the assembled schema. The on_delta callback widens to
`on_delta(kind, payload)` where `kind ∈ {"text","tool_call"}`. Text
path unchanged. Tool-call path: accumulator keyed by `index`,
concatenates `function.arguments` until `finish_reason: "tool_calls"`,
then emits one `on_delta("tool_call", {id,name,arguments})` per
completed call. `M.chat` (the buffering wrapper) keeps its current
string-return shape for non-tool callers but optionally returns a
`{text, tool_calls}` table when tools were involved. **Test against
hossenfelder** with `tools` declared + streaming.
6. **`repl.lua` wiring.** New module-local `mcp_sessions = {alias=session,...}`,
populated from `config.mcp.servers` at startup. Helpers:
- `tools_schema()` → flatten `tool` lists across sessions, namespace `alias.name`
- `dispatch_tool_call(call)` → split `alias.tool`, look up session, call, return content
- `ask_ai` loop now: stream response → if any tool_calls completed,
for each call: `safety.confirm_tool_call` → `dispatch_tool_call` →
append assistant-with-tool_calls + tool turn → re-call `broker.chat_stream`
→ repeat until pure-text response or `max_tool_depth` reached
- New meta cmds: `:mcp list`, `:mcp tools`, `:mcp tool <name>`,
`:mcp connect <url> [alias]`, `:mcp disconnect <alias>`
**End-to-end test** via the REPL against a real boltzmann lmcp +
hossenfelder broker.
7. **`config.lua` example block.** Add a commented-out `mcp = { servers
= { boltzmann = {...} }, auto_approve = {...} }` example so users can
see the shape. Not behavior-impacting; documentation only. Bundled
with commit #6 if small or split if substantial.
### Risk / non-obvious
- **Empty tools array.** If `config.mcp.servers` is absent or all
connects fail, the broker request body must **omit** `tools`
entirely (some servers reject `"tools": []`). Don't send the field
when empty.
- **Connect-at-startup blocking.** N servers × ~30 ms init+list, run
sequentially. For N ≤ 3 (typical) the resulting ~90 ms is acceptable.
Failures are status-logged per server and don't abort aish. Parallel
connects via coroutines are out of scope here; sequential is fine for v1.
- **Content blocks beyond text.** lmcp returns `[{type:"text", text:...}]`.
The spec allows `type:"image" | "resource"`. Phase 2 v1 flattens by
concatenating all `text` blocks and ignoring non-text. Log a status
warning if non-text blocks are seen. Adequate for boltzmann/hertz
tools (all text); image/resource tools deferred.
- **`isError: false` on actual failure** (baseline finding §3 of
PHASE2-baseline.md). Pass content through unchanged; let the model
read the error text. Do NOT short-circuit on the flag.
- **JSON-RPC `error` from `tools/call`.** Surface as aish status
(`[aish] mcp: alias.tool: <error.message>`); do not feed to model.
Do NOT append a tool turn — there was no result. The model will
re-plan on the next user turn.
- **Tool-call sub-loop bounds.** `max_tool_depth` (default 8) per ask_ai
invocation. When hit, surface as status and break — append the
assistant's last text (if any) and let the user reply.
- **Argument JSON might be invalid.** A model can stream malformed JSON
in `function.arguments`. `dkjson.decode` failure → status warning +
skip that call; do NOT execute on partial parse.
- **Q18 fallback path** (strict templates rejecting `role:"tool"`).
Plumb a `context.use_tool_role` flag (default true). If a real-world
rejection appears at Phase 7, flip the flag and convert tool turns to
`[tool: alias.name]\n<content>` prefix on the next user turn (same
pattern as `pending_exec_output`). v1 doesn't need to test the
fallback — just have the switch.
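
Two of the risks above (non-text content blocks and malformed argument JSON) can be sketched as small guards. Both helper names are hypothetical. The newline separator between text blocks is an assumption, and the JSON decoder is injected: `dkjson.decode` would be the natural choice, since it returns nil plus an error on malformed input.

```lua
-- Hypothetical sketch: flatten tools/call content blocks to plain text.
-- Returns the joined text and the number of non-text blocks skipped, so
-- the caller can log a status warning when skipped > 0.
local function flatten_content(blocks)
  local parts, skipped = {}, 0
  for _, b in ipairs(blocks or {}) do
    if b.type == "text" and type(b.text) == "string" then
      parts[#parts + 1] = b.text
    else
      skipped = skipped + 1
    end
  end
  return table.concat(parts, "\n"), skipped
end

-- Hypothetical sketch: decode streamed arguments before dispatch.
-- `decode` is an injected JSON decoder (for example dkjson.decode).
local function parse_arguments(json_str, decode)
  local ok, obj = pcall(decode, json_str)
  if not ok or type(obj) ~= "table" then
    return nil, "malformed tool-call arguments"  -- warn and skip the call
  end
  return obj
end
```
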
### Test checkpoints
After each commit, verify with a targeted probe before moving on:
| Commit | Verify |
|---|---|
| #1 `mcp.lua` | `luajit -e "local m=require('mcp'); ..."` connects + lists tools against boltzmann lmcp |
| #2 `safety.lua` | unit-test policy lookup with mock rl: exact match → true; `*` glob → true; miss → prompt invoked |
| #3 `context.lua` | round-trip a context with tool turns through `to_messages()`, eyeball JSON shape |
| #4 `renderer.lua` | one-liner emits frame around fake tool result |
| #5 `broker.lua` | curl-compare: hand-built request body with tools matches `broker.chat_stream(cfg, msgs, on_delta)` body |
| #6 `repl.lua` | full REPL: `:mcp list` shows boltzmann; question that triggers `list_dir` round-trips through confirm + execution + model continuation |
| #7 `config.lua` | aish starts with example mcp section present; no MCP servers connected means no `tools` field sent |
### Commits expected: 7 (plus 1 manifest amendment if scope creeps)
Compared with Phase 1 (10 commits + 1 BLOCKER fix), Phase 2 is smaller
in surface: a single new file plus targeted extensions. It is expected to land
in one working session if the boltzmann proxy bugs don't intrude.
### Open at plan; resolve at review
- **Q18 plumbing only or default-flip?** Default `use_tool_role = true`
and ship the fallback as dead code? Or default `false` (prefix
injection) for safety until verify confirms? Recommend `true` —
protocol-layer probe in analyze said it worked; Phase 7 can flip if
reality differs.
- **Should `:mcp connect` re-fetch tools on subsequent calls** (handles
servers that misreport `listChanged:false`)? v1: no — trust the
capability. Manual `:mcp disconnect` + `:mcp connect` is the
workaround if a server's tools change.
---
*End of Phase 2 Manifest — aish*