diff --git a/docs/PHASE2.md b/docs/PHASE2.md index fab60ea..88e86c4 100644 --- a/docs/PHASE2.md +++ b/docs/PHASE2.md @@ -375,4 +375,149 @@ Resolved at analyze (2026-05-12, live probes vs lmcp v0.5.4 + hossenfelder): --- +## 12. Implementation Plan (commit-by-commit) + +Bottom-up — start with modules with the fewest dependencies, end with the +REPL wiring that exercises everything together. Same shape as Phase 0 +and Phase 1 implementation cadence. + +### Order + +1. **`mcp.lua` (new file) — JSON-RPC client.** `M.connect(url, opts)`, + `session:initialize()` + `:list_tools()` + `:call_tool(name, args)` + + `:close()`. Uses Phase 1's `ffi/curl.M.post` for transport. Per-server + Bearer auth (`auth_token` literal or `auth_env` indirection). Returns + the raw `result` table for tool calls (caller distinguishes + `isError`/content vs JSON-RPC `error`). **Test in isolation** via + `luajit -e 'local mcp=require("mcp"); local s=mcp.connect("http://boltzmann.fritz.box:8080/mcp",{auth_env="BOLTZMANN_MCP_TOKEN"}); s:initialize(); print(#s:list_tools())'`. + Also amends PHASE0.md §4 to list `mcp.lua` between `broker.lua` and + `router.lua` in the same commit. + +2. **`safety.lua` — confirm-gate surface.** Implement just + `M.confirm_tool_call(name, args, cfg)` per §6. Reads + `cfg.mcp.auto_approve` for exact-match and `alias.*` glob. Falls back + to `rl.readline` prompt. Norris-mode hooks stay out (Phase 3). **Test + in isolation** with mocked rl + various policy shapes. + +3. **`context.lua` extensions.** Add `role:"tool"` support: turns can + carry `tool_call_id`; assistant turns can carry `tool_calls = [...]`. + `to_messages()` rendering: an assistant turn with `tool_calls` keeps + its `content` plus the `tool_calls` field; tool turns render with + `role:"tool"` + `tool_call_id` + `content`. Alternation policy + widens: assistant-with-tool_calls is followed by N tool turns. No + change to `pending_exec_output` behavior. **Test in isolation** by + building a context and round-tripping `to_messages()`. + +4. **`renderer.lua` extensions.** Add `M.tool_call_begin(name, args)` + (top rule + `name(json-snippet)` indented dim) and + `M.tool_call_end(content, is_error)` (bottom rule with dim/red status). + Visual parity with the exec frame. **Test visually** with a one-liner. + +5. **`broker.lua` extensions.** `chat_stream` request body grows + `tools = caller-supplied`; we don't reach into mcp from here, the + caller passes the assembled schema. The on_delta callback widens to + `on_delta(kind, payload)` where `kind ∈ {"text","tool_call"}`. Text + path unchanged. Tool-call path: accumulator keyed by `index`, + concatenates `function.arguments` until `finish_reason: "tool_calls"`, + then emits one `on_delta("tool_call", {id,name,arguments})` per + completed call. `M.chat` (the buffering wrapper) keeps its current + string-return shape for non-tool callers but optionally returns a + `{text, tool_calls}` table when tools were involved. **Test against + hossenfelder** with `tools` declared + streaming. + +6. **`repl.lua` wiring.** New module-local `mcp_sessions = {alias=session,...}`, + populated from `config.mcp.servers` at startup. Helpers: + - `tools_schema()` → flatten `tool` lists across sessions, namespace `alias.name` + - `dispatch_tool_call(call)` → split `alias.tool`, look up session, call, return content + - `ask_ai` loop now: stream response → if any tool_calls completed, + for each call: `safety.confirm_tool_call` → `dispatch_tool_call` → + append assistant-with-tool_calls + tool turn → re-call `broker.chat_stream` + → repeat until pure-text response or `max_tool_depth` reached + - New meta cmds: `:mcp list`, `:mcp tools`, `:mcp tool `, + `:mcp connect [alias]`, `:mcp disconnect ` + **End-to-end test** via the REPL against a real boltzmann lmcp + + hossenfelder broker. + +7. **`config.lua` example block.** Add a commented-out `mcp = { servers + = { boltzmann = {...} }, auto_approve = {...} }` example so users can + see the shape. Not behavior-impacting; documentation only. Bundled + with commit #6 if small or split if substantial. + +### Risk / non-obvious + +- **Empty tools array.** If `config.mcp.servers` is absent or all + connects fail, the broker request body must **omit** `tools` + entirely (some servers reject `"tools": []`). Don't send the field + when empty. + +- **Connect-at-startup blocking.** N servers × ~30 ms init+list. For + N ≤ 3 (typical) the 90 ms is acceptable. Failures are status-logged + per server, don't abort aish. Parallel via coroutines is out of scope + here — sequential is fine for v1. + +- **Content blocks beyond text.** lmcp returns `[{type:"text", text:...}]`. + The spec allows `type:"image" | "resource"`. Phase 2 v1 flattens by + concatenating all `text` blocks and ignoring non-text. Log a status + warning if non-text blocks are seen. Adequate for boltzmann/hertz + tools (all text); image/resource tools deferred. + +- **`isError: false` on actual failure** (baseline finding §3 of + PHASE2-baseline.md). Pass content through unchanged; let the model + read the error text. Do NOT short-circuit on the flag. + +- **JSON-RPC `error` from `tools/call`.** Surface as aish status + (`[aish] mcp: alias.tool: `); do not feed to model. + Do NOT append a tool turn — there was no result. The model will + re-plan on the next user turn. + +- **Tool-call sub-loop bounds.** `max_tool_depth` (default 8) per ask_ai + invocation. When hit, surface as status and break — append the + assistant's last text (if any) and let the user reply. + +- **Argument JSON might be invalid.** A model can stream malformed JSON + in `function.arguments`. `dkjson.decode` failure → status warning + + skip that call; do NOT execute on partial parse. + +- **Q18 fallback path** (strict templates rejecting `role:"tool"`). + Plumb a `context.use_tool_role` flag (default true). If a real-world + rejection appears at Phase 7, flip the flag and convert tool turns to + `[tool: alias.name]\n` prefix on the next user turn (same + pattern as `pending_exec_output`). v1 doesn't need to test the + fallback — just have the switch. + +### Test checkpoints + +After each commit, verify with a targeted probe before moving on: + +| Commit | Verify | +|---|---| +| #1 `mcp.lua` | `luajit -e "local m=require('mcp'); ..."` connects + lists tools against boltzmann lmcp | +| #2 `safety.lua` | unit-test policy lookup with mock rl: exact match → true; `*` glob → true; miss → prompt invoked | +| #3 `context.lua` | round-trip a context with tool turns through `to_messages()`, eyeball JSON shape | +| #4 `renderer.lua` | one-liner emits frame around fake tool result | +| #5 `broker.lua` | curl-compare: hand-built request body with tools matches `broker.chat_stream(cfg, msgs, on_delta)` body | +| #6 `repl.lua` | full REPL: `:mcp list` shows boltzmann; question that triggers `list_dir` round-trips through confirm + execution + model continuation | +| #7 `config.lua` | aish starts with example mcp section present; no MCP servers connected means no `tools` field sent | + +### Commits expected: 7 (plus 1 manifest amendment if scope creeps) + +Per Phase 1's cadence (10 commits + 1 BLOCKER fix), Phase 2 is smaller +in surface — single new file plus targeted extensions. Tracked to land +in one working session if the boltzmann proxy bugs don't intrude. + +### Open at plan; resolve at review + +- **Q18 plumbing only or default-flip?** Default `use_tool_role = true` + and ship the fallback as dead code? Or default `false` (prefix + injection) for safety until verify confirms? Recommend `true` — + protocol-layer probe in analyze said it worked; Phase 7 can flip if + reality differs. + +- **Should `:mcp connect` re-fetch tools on subsequent calls** (handles + servers that misreport `listChanged:false`)? v1: no — trust the + capability. Manual `:mcp disconnect` + `:mcp connect` is the + workaround if a server's tools change. + +--- + *End of Phase 2 Manifest — aish*