docs/PHASE2 §12: implementation plan — 7-commit roadmap

Bottom-up: mcp.lua → safety.lua → context.lua → renderer.lua → broker.lua
→ repl.lua → config.lua. Same cadence as Phase 0/1.

Risks called out explicitly:
- Empty tools array → omit field entirely (some servers reject [])
- isError:false on actual failure (baseline §3 finding) → pass content
  through regardless; let model read error text
- JSON-RPC error from tools/call → aish status only, no tool turn
  appended, no model recovery
- max_tool_depth=8 cap on tool-call sub-loop
- Argument JSON streaming may yield malformed JSON → status warn + skip
- Q18 fallback (use_tool_role=true default; prefix-injection plumbed
  but dead-coded; verify can flip)
- Connect-at-startup is sequential (~30ms × N); fine for N≤3

Two items left open for review: Q18 default flip vs ship-true-flip-on-fail,
and whether :mcp connect should re-fetch tools after the initial cache.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-12 12:37:27 +00:00
parent c5116bf129
commit 447e430254
+145
View File
@@ -375,4 +375,149 @@ Resolved at analyze (2026-05-12, live probes vs lmcp v0.5.4 + hossenfelder):
---
## 12. Implementation Plan (commit-by-commit)
Bottom-up — start with modules with the fewest dependencies, end with the
REPL wiring that exercises everything together. Same shape as Phase 0
and Phase 1 implementation cadence.
### Order
1. **`mcp.lua` (new file) — JSON-RPC client.** `M.connect(url, opts)`,
`session:initialize()` + `:list_tools()` + `:call_tool(name, args)` +
`:close()`. Uses Phase 1's `ffi/curl.M.post` for transport. Per-server
Bearer auth (`auth_token` literal or `auth_env` indirection). Returns
the raw `result` table for tool calls (caller distinguishes
`isError`/content vs JSON-RPC `error`). **Test in isolation** via
`luajit -e 'local mcp=require("mcp"); local s=mcp.connect("http://boltzmann.fritz.box:8080/mcp",{auth_env="BOLTZMANN_MCP_TOKEN"}); s:initialize(); print(#s:list_tools())'`.
Also amends PHASE0.md §4 to list `mcp.lua` between `broker.lua` and
`router.lua` in the same commit.
2. **`safety.lua` — confirm-gate surface.** Implement just
`M.confirm_tool_call(name, args, cfg)` per §6. Reads
`cfg.mcp.auto_approve` for exact-match and `alias.*` glob. Falls back
to `rl.readline` prompt. Norris-mode hooks stay out (Phase 3). **Test
in isolation** with mocked rl + various policy shapes.
3. **`context.lua` extensions.** Add `role:"tool"` support: turns can
carry `tool_call_id`; assistant turns can carry `tool_calls = [...]`.
`to_messages()` rendering: an assistant turn with `tool_calls` keeps
its `content` plus the `tool_calls` field; tool turns render with
`role:"tool"` + `tool_call_id` + `content`. Alternation policy
widens: assistant-with-tool_calls is followed by N tool turns. No
change to `pending_exec_output` behavior. **Test in isolation** by
building a context and round-tripping `to_messages()`.
4. **`renderer.lua` extensions.** Add `M.tool_call_begin(name, args)`
(top rule + `name(json-snippet)` indented dim) and
`M.tool_call_end(content, is_error)` (bottom rule with dim/red status).
Visual parity with the exec frame. **Test visually** with a one-liner.
5. **`broker.lua` extensions.** `chat_stream` request body grows
`tools = caller-supplied`; we don't reach into mcp from here, the
caller passes the assembled schema. The on_delta callback widens to
`on_delta(kind, payload)` where `kind ∈ {"text","tool_call"}`. Text
path unchanged. Tool-call path: accumulator keyed by `index`,
concatenates `function.arguments` until `finish_reason: "tool_calls"`,
then emits one `on_delta("tool_call", {id,name,arguments})` per
completed call. `M.chat` (the buffering wrapper) keeps its current
string-return shape for non-tool callers but optionally returns a
`{text, tool_calls}` table when tools were involved. **Test against
hossenfelder** with `tools` declared + streaming.
6. **`repl.lua` wiring.** New module-local `mcp_sessions = {alias=session,...}`,
populated from `config.mcp.servers` at startup. Helpers:
- `tools_schema()` → flatten `tool` lists across sessions, namespace `alias.name`
- `dispatch_tool_call(call)` → split `alias.tool`, look up session, call, return content
- `ask_ai` loop now: stream response → if any tool_calls completed,
for each call: `safety.confirm_tool_call``dispatch_tool_call`
append assistant-with-tool_calls + tool turn → re-call `broker.chat_stream`
→ repeat until pure-text response or `max_tool_depth` reached
- New meta cmds: `:mcp list`, `:mcp tools`, `:mcp tool <name>`,
`:mcp connect <url> [alias]`, `:mcp disconnect <alias>`
**End-to-end test** via the REPL against a real boltzmann lmcp +
hossenfelder broker.
7. **`config.lua` example block.** Add a commented-out `mcp = { servers
= { boltzmann = {...} }, auto_approve = {...} }` example so users can
see the shape. Not behavior-impacting; documentation only. Bundled
with commit #6 if small or split if substantial.
### Risk / non-obvious
- **Empty tools array.** If `config.mcp.servers` is absent or all
connects fail, the broker request body must **omit** `tools`
entirely (some servers reject `"tools": []`). Don't send the field
when empty.
- **Connect-at-startup blocking.** N servers × ~30 ms init+list. For
N ≤ 3 (typical) the 90 ms is acceptable. Failures are status-logged
per server, don't abort aish. Parallel via coroutines is out of scope
here — sequential is fine for v1.
- **Content blocks beyond text.** lmcp returns `[{type:"text", text:...}]`.
The spec allows `type:"image" | "resource"`. Phase 2 v1 flattens by
concatenating all `text` blocks and ignoring non-text. Log a status
warning if non-text blocks are seen. Adequate for boltzmann/hertz
tools (all text); image/resource tools deferred.
- **`isError: false` on actual failure** (baseline finding §3 of
PHASE2-baseline.md). Pass content through unchanged; let the model
read the error text. Do NOT short-circuit on the flag.
- **JSON-RPC `error` from `tools/call`.** Surface as aish status
(`[aish] mcp: alias.tool: <error.message>`); do not feed to model.
Do NOT append a tool turn — there was no result. The model will
re-plan on the next user turn.
- **Tool-call sub-loop bounds.** `max_tool_depth` (default 8) per ask_ai
invocation. When hit, surface as status and break — append the
assistant's last text (if any) and let the user reply.
- **Argument JSON might be invalid.** A model can stream malformed JSON
in `function.arguments`. `dkjson.decode` failure → status warning +
skip that call; do NOT execute on partial parse.
- **Q18 fallback path** (strict templates rejecting `role:"tool"`).
Plumb a `context.use_tool_role` flag (default true). If a real-world
rejection appears at Phase 7, flip the flag and convert tool turns to
`[tool: alias.name]\n<content>` prefix on the next user turn (same
pattern as `pending_exec_output`). v1 doesn't need to test the
fallback — just have the switch.
### Test checkpoints
After each commit, verify with a targeted probe before moving on:
| Commit | Verify |
|---|---|
| #1 `mcp.lua` | `luajit -e "local m=require('mcp'); ..."` connects + lists tools against boltzmann lmcp |
| #2 `safety.lua` | unit-test policy lookup with mock rl: exact match → true; `*` glob → true; miss → prompt invoked |
| #3 `context.lua` | round-trip a context with tool turns through `to_messages()`, eyeball JSON shape |
| #4 `renderer.lua` | one-liner emits frame around fake tool result |
| #5 `broker.lua` | curl-compare: hand-built request body with tools matches `broker.chat_stream(cfg, msgs, on_delta)` body |
| #6 `repl.lua` | full REPL: `:mcp list` shows boltzmann; question that triggers `list_dir` round-trips through confirm + execution + model continuation |
| #7 `config.lua` | aish starts with example mcp section present; no MCP servers connected means no `tools` field sent |
### Commits expected: 7 (plus 1 manifest amendment if scope creeps)
Per Phase 1's cadence (10 commits + 1 BLOCKER fix), Phase 2 is smaller
in surface — single new file plus targeted extensions. Tracked to land
in one working session if the boltzmann proxy bugs don't intrude.
### Open at plan; resolve at review
- **Q18 plumbing only or default-flip?** Default `use_tool_role = true`
and ship the fallback as dead code? Or default `false` (prefix
injection) for safety until verify confirms? Recommend `true` —
protocol-layer probe in analyze said it worked; Phase 7 can flip if
reality differs.
- **Should `:mcp connect` re-fetch tools on subsequent calls** (handles
servers that misreport `listChanged:false`)? v1: no — trust the
capability. Manual `:mcp disconnect` + `:mcp connect` is the
workaround if a server's tools change.
---
*End of Phase 2 Manifest — aish*