docs/PHASE2 §12: implementation plan — 7-commit roadmap

Bottom-up: mcp.lua → safety.lua → context.lua → renderer.lua → broker.lua → repl.lua → config.lua. Same cadence as Phase 0/1. Risks called out explicitly: - Empty tools array → omit field entirely (some servers reject []) - isError:false on actual failure (baseline §3 finding) → pass content through regardless; let model read error text - JSON-RPC error from tools/call → aish status only, no tool turn appended, no model recovery - max_tool_depth=8 cap on tool-call sub-loop - Argument JSON streaming may yield malformed JSON → status warn + skip - Q18 fallback (use_tool_role=true default; prefix-injection plumbed but dead-coded; verify can flip) - Connect-at-startup is sequential (~30ms × N); fine for N≤3 Two items left open for review: Q18 default flip vs ship-true-flip-on-fail, and whether :mcp connect should re-fetch tools after the initial cache. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 12:37:27 +00:00
parent c5116bf129
commit 447e430254
1 changed files with 145 additions and 0 deletions
@@ -375,4 +375,149 @@ Resolved at analyze (2026-05-12, live probes vs lmcp v0.5.4 + hossenfelder):

 ---

+## 12. Implementation Plan (commit-by-commit)
+
+Bottom-up — start with modules with the fewest dependencies, end with the
+REPL wiring that exercises everything together. Same shape as Phase 0
+and Phase 1 implementation cadence.
+
+### Order
+
+1. **`mcp.lua` (new file) — JSON-RPC client.** `M.connect(url, opts)`,
+   `session:initialize()` + `:list_tools()` + `:call_tool(name, args)` +
+   `:close()`. Uses Phase 1's `ffi/curl.M.post` for transport. Per-server
+   Bearer auth (`auth_token` literal or `auth_env` indirection). Returns
+   the raw `result` table for tool calls (caller distinguishes
+   `isError`/content vs JSON-RPC `error`). **Test in isolation** via
+   `luajit -e 'local mcp=require("mcp"); local s=mcp.connect("http://boltzmann.fritz.box:8080/mcp",{auth_env="BOLTZMANN_MCP_TOKEN"}); s:initialize(); print(#s:list_tools())'`.
+   Also amends PHASE0.md §4 to list `mcp.lua` between `broker.lua` and
+   `router.lua` in the same commit.
+
+2. **`safety.lua` — confirm-gate surface.** Implement just
+   `M.confirm_tool_call(name, args, cfg)` per §6. Reads
+   `cfg.mcp.auto_approve` for exact-match and `alias.*` glob. Falls back
+   to `rl.readline` prompt. Norris-mode hooks stay out (Phase 3). **Test
+   in isolation** with mocked rl + various policy shapes.
+
+3. **`context.lua` extensions.** Add `role:"tool"` support: turns can
+   carry `tool_call_id`; assistant turns can carry `tool_calls = [...]`.
+   `to_messages()` rendering: an assistant turn with `tool_calls` keeps
+   its `content` plus the `tool_calls` field; tool turns render with
+   `role:"tool"` + `tool_call_id` + `content`. Alternation policy
+   widens: assistant-with-tool_calls is followed by N tool turns. No
+   change to `pending_exec_output` behavior. **Test in isolation** by
+   building a context and round-tripping `to_messages()`.
+
+4. **`renderer.lua` extensions.** Add `M.tool_call_begin(name, args)`
+   (top rule + `name(json-snippet)` indented dim) and
+   `M.tool_call_end(content, is_error)` (bottom rule with dim/red status).
+   Visual parity with the exec frame. **Test visually** with a one-liner.
+
+5. **`broker.lua` extensions.** `chat_stream` request body grows
+   `tools = caller-supplied`; we don't reach into mcp from here, the
+   caller passes the assembled schema. The on_delta callback widens to
+   `on_delta(kind, payload)` where `kind ∈ {"text","tool_call"}`. Text
+   path unchanged. Tool-call path: accumulator keyed by `index`,
+   concatenates `function.arguments` until `finish_reason: "tool_calls"`,
+   then emits one `on_delta("tool_call", {id,name,arguments})` per
+   completed call. `M.chat` (the buffering wrapper) keeps its current
+   string-return shape for non-tool callers but optionally returns a
+   `{text, tool_calls}` table when tools were involved. **Test against
+   hossenfelder** with `tools` declared + streaming.
+
+6. **`repl.lua` wiring.** New module-local `mcp_sessions = {alias=session,...}`,
+   populated from `config.mcp.servers` at startup. Helpers:
+   - `tools_schema()` → flatten `tool` lists across sessions, namespace `alias.name`
+   - `dispatch_tool_call(call)` → split `alias.tool`, look up session, call, return content
+   - `ask_ai` loop now: stream response → if any tool_calls completed,
+     for each call: `safety.confirm_tool_call` → `dispatch_tool_call` →
+     append assistant-with-tool_calls + tool turn → re-call `broker.chat_stream`
+     → repeat until pure-text response or `max_tool_depth` reached
+   - New meta cmds: `:mcp list`, `:mcp tools`, `:mcp tool <name>`,
+     `:mcp connect <url> [alias]`, `:mcp disconnect <alias>`
+   **End-to-end test** via the REPL against a real boltzmann lmcp +
+   hossenfelder broker.
+
+7. **`config.lua` example block.** Add a commented-out `mcp = { servers
+   = { boltzmann = {...} }, auto_approve = {...} }` example so users can
+   see the shape. Not behavior-impacting; documentation only. Bundled
+   with commit #6 if small or split if substantial.
+
+### Risk / non-obvious
+
+- **Empty tools array.** If `config.mcp.servers` is absent or all
+  connects fail, the broker request body must **omit** `tools`
+  entirely (some servers reject `"tools": []`). Don't send the field
+  when empty.
+
+- **Connect-at-startup blocking.** N servers × ~30 ms init+list. For
+  N ≤ 3 (typical) the 90 ms is acceptable. Failures are status-logged
+  per server, don't abort aish. Parallel via coroutines is out of scope
+  here — sequential is fine for v1.
+
+- **Content blocks beyond text.** lmcp returns `[{type:"text", text:...}]`.
+  The spec allows `type:"image" | "resource"`. Phase 2 v1 flattens by
+  concatenating all `text` blocks and ignoring non-text. Log a status
+  warning if non-text blocks are seen. Adequate for boltzmann/hertz
+  tools (all text); image/resource tools deferred.
+
+- **`isError: false` on actual failure** (baseline finding §3 of
+  PHASE2-baseline.md). Pass content through unchanged; let the model
+  read the error text. Do NOT short-circuit on the flag.
+
+- **JSON-RPC `error` from `tools/call`.** Surface as aish status
+  (`[aish] mcp: alias.tool: <error.message>`); do not feed to model.
+  Do NOT append a tool turn — there was no result. The model will
+  re-plan on the next user turn.
+
+- **Tool-call sub-loop bounds.** `max_tool_depth` (default 8) per ask_ai
+  invocation. When hit, surface as status and break — append the
+  assistant's last text (if any) and let the user reply.
+
+- **Argument JSON might be invalid.** A model can stream malformed JSON
+  in `function.arguments`. `dkjson.decode` failure → status warning +
+  skip that call; do NOT execute on partial parse.
+
+- **Q18 fallback path** (strict templates rejecting `role:"tool"`).
+  Plumb a `context.use_tool_role` flag (default true). If a real-world
+  rejection appears at Phase 7, flip the flag and convert tool turns to
+  `[tool: alias.name]\n<content>` prefix on the next user turn (same
+  pattern as `pending_exec_output`). v1 doesn't need to test the
+  fallback — just have the switch.
+
+### Test checkpoints
+
+After each commit, verify with a targeted probe before moving on:
+
+| Commit | Verify |
+|---|---|
+| #1 `mcp.lua` | `luajit -e "local m=require('mcp'); ..."` connects + lists tools against boltzmann lmcp |
+| #2 `safety.lua` | unit-test policy lookup with mock rl: exact match → true; `*` glob → true; miss → prompt invoked |
+| #3 `context.lua` | round-trip a context with tool turns through `to_messages()`, eyeball JSON shape |
+| #4 `renderer.lua` | one-liner emits frame around fake tool result |
+| #5 `broker.lua` | curl-compare: hand-built request body with tools matches `broker.chat_stream(cfg, msgs, on_delta)` body |
+| #6 `repl.lua` | full REPL: `:mcp list` shows boltzmann; question that triggers `list_dir` round-trips through confirm + execution + model continuation |
+| #7 `config.lua` | aish starts with example mcp section present; no MCP servers connected means no `tools` field sent |
+
+### Commits expected: 7 (plus 1 manifest amendment if scope creeps)
+
+Per Phase 1's cadence (10 commits + 1 BLOCKER fix), Phase 2 is smaller
+in surface — single new file plus targeted extensions. Tracked to land
+in one working session if the boltzmann proxy bugs don't intrude.
+
+### Open at plan; resolve at review
+
+- **Q18 plumbing only or default-flip?** Default `use_tool_role = true`
+  and ship the fallback as dead code? Or default `false` (prefix
+  injection) for safety until verify confirms? Recommend `true` —
+  protocol-layer probe in analyze said it worked; Phase 7 can flip if
+  reality differs.
+
+- **Should `:mcp connect` re-fetch tools on subsequent calls** (handles
+  servers that misreport `listChanged:false`)? v1: no — trust the
+  capability. Manual `:mcp disconnect` + `:mcp connect` is the
+  workaround if a server's tools change.
+
+---
+
 *End of Phase 2 Manifest — aish*