docs/PHASE2 §12: implementation plan — 7-commit roadmap
Bottom-up: mcp.lua → safety.lua → context.lua → renderer.lua → broker.lua → repl.lua → config.lua. Same cadence as Phase 0/1. Risks called out explicitly: - Empty tools array → omit field entirely (some servers reject []) - isError:false on actual failure (baseline §3 finding) → pass content through regardless; let model read error text - JSON-RPC error from tools/call → aish status only, no tool turn appended, no model recovery - max_tool_depth=8 cap on tool-call sub-loop - Argument JSON streaming may yield malformed JSON → status warn + skip - Q18 fallback (use_tool_role=true default; prefix-injection plumbed but dead-coded; verify can flip) - Connect-at-startup is sequential (~30ms × N); fine for N≤3 Two items left open for review: Q18 default flip vs ship-true-flip-on-fail, and whether :mcp connect should re-fetch tools after the initial cache. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
+145
@@ -375,4 +375,149 @@ Resolved at analyze (2026-05-12, live probes vs lmcp v0.5.4 + hossenfelder):
|
||||
|
||||
---
|
||||
|
||||
## 12. Implementation Plan (commit-by-commit)
|
||||
|
||||
Bottom-up — start with modules with the fewest dependencies, end with the
|
||||
REPL wiring that exercises everything together. Same shape as Phase 0
|
||||
and Phase 1 implementation cadence.
|
||||
|
||||
### Order
|
||||
|
||||
1. **`mcp.lua` (new file) — JSON-RPC client.** `M.connect(url, opts)`,
|
||||
`session:initialize()` + `:list_tools()` + `:call_tool(name, args)` +
|
||||
`:close()`. Uses Phase 1's `ffi/curl.M.post` for transport. Per-server
|
||||
Bearer auth (`auth_token` literal or `auth_env` indirection). Returns
|
||||
the raw `result` table for tool calls (caller distinguishes
|
||||
`isError`/content vs JSON-RPC `error`). **Test in isolation** via
|
||||
`luajit -e 'local mcp=require("mcp"); local s=mcp.connect("http://boltzmann.fritz.box:8080/mcp",{auth_env="BOLTZMANN_MCP_TOKEN"}); s:initialize(); print(#s:list_tools())'`.
|
||||
Also amends PHASE0.md §4 to list `mcp.lua` between `broker.lua` and
|
||||
`router.lua` in the same commit.
|
||||
|
||||
2. **`safety.lua` — confirm-gate surface.** Implement just
|
||||
`M.confirm_tool_call(name, args, cfg)` per §6. Reads
|
||||
`cfg.mcp.auto_approve` for exact-match and `alias.*` glob. Falls back
|
||||
to `rl.readline` prompt. Norris-mode hooks stay out (Phase 3). **Test
|
||||
in isolation** with mocked rl + various policy shapes.
|
||||
|
||||
3. **`context.lua` extensions.** Add `role:"tool"` support: turns can
|
||||
carry `tool_call_id`; assistant turns can carry `tool_calls = [...]`.
|
||||
`to_messages()` rendering: an assistant turn with `tool_calls` keeps
|
||||
its `content` plus the `tool_calls` field; tool turns render with
|
||||
`role:"tool"` + `tool_call_id` + `content`. Alternation policy
|
||||
widens: assistant-with-tool_calls is followed by N tool turns. No
|
||||
change to `pending_exec_output` behavior. **Test in isolation** by
|
||||
building a context and round-tripping `to_messages()`.
|
||||
|
||||
4. **`renderer.lua` extensions.** Add `M.tool_call_begin(name, args)`
|
||||
(top rule + `name(json-snippet)` indented dim) and
|
||||
`M.tool_call_end(content, is_error)` (bottom rule with dim/red status).
|
||||
Visual parity with the exec frame. **Test visually** with a one-liner.
|
||||
|
||||
5. **`broker.lua` extensions.** `chat_stream` request body grows
|
||||
`tools = caller-supplied`; we don't reach into mcp from here, the
|
||||
caller passes the assembled schema. The on_delta callback widens to
|
||||
`on_delta(kind, payload)` where `kind ∈ {"text","tool_call"}`. Text
|
||||
path unchanged. Tool-call path: accumulator keyed by `index`,
|
||||
concatenates `function.arguments` until `finish_reason: "tool_calls"`,
|
||||
then emits one `on_delta("tool_call", {id,name,arguments})` per
|
||||
completed call. `M.chat` (the buffering wrapper) keeps its current
|
||||
string-return shape for non-tool callers but optionally returns a
|
||||
`{text, tool_calls}` table when tools were involved. **Test against
|
||||
hossenfelder** with `tools` declared + streaming.
|
||||
|
||||
6. **`repl.lua` wiring.** New module-local `mcp_sessions = {alias=session,...}`,
|
||||
populated from `config.mcp.servers` at startup. Helpers:
|
||||
- `tools_schema()` → flatten `tool` lists across sessions, namespace `alias.name`
|
||||
- `dispatch_tool_call(call)` → split `alias.tool`, look up session, call, return content
|
||||
- `ask_ai` loop now: stream response → if any tool_calls completed,
|
||||
for each call: `safety.confirm_tool_call` → `dispatch_tool_call` →
|
||||
append assistant-with-tool_calls + tool turn → re-call `broker.chat_stream`
|
||||
→ repeat until pure-text response or `max_tool_depth` reached
|
||||
- New meta cmds: `:mcp list`, `:mcp tools`, `:mcp tool <name>`,
|
||||
`:mcp connect <url> [alias]`, `:mcp disconnect <alias>`
|
||||
**End-to-end test** via the REPL against a real boltzmann lmcp +
|
||||
hossenfelder broker.
|
||||
|
||||
7. **`config.lua` example block.** Add a commented-out `mcp = { servers
|
||||
= { boltzmann = {...} }, auto_approve = {...} }` example so users can
|
||||
see the shape. Not behavior-impacting; documentation only. Bundled
|
||||
with commit #6 if small or split if substantial.
|
||||
|
||||
### Risk / non-obvious
|
||||
|
||||
- **Empty tools array.** If `config.mcp.servers` is absent or all
|
||||
connects fail, the broker request body must **omit** `tools`
|
||||
entirely (some servers reject `"tools": []`). Don't send the field
|
||||
when empty.
|
||||
|
||||
- **Connect-at-startup blocking.** N servers × ~30 ms init+list. For
|
||||
N ≤ 3 (typical) the 90 ms is acceptable. Failures are status-logged
|
||||
per server, don't abort aish. Parallel via coroutines is out of scope
|
||||
here — sequential is fine for v1.
|
||||
|
||||
- **Content blocks beyond text.** lmcp returns `[{type:"text", text:...}]`.
|
||||
The spec allows `type:"image" | "resource"`. Phase 2 v1 flattens by
|
||||
concatenating all `text` blocks and ignoring non-text. Log a status
|
||||
warning if non-text blocks are seen. Adequate for boltzmann/hertz
|
||||
tools (all text); image/resource tools deferred.
|
||||
|
||||
- **`isError: false` on actual failure** (baseline finding §3 of
|
||||
PHASE2-baseline.md). Pass content through unchanged; let the model
|
||||
read the error text. Do NOT short-circuit on the flag.
|
||||
|
||||
- **JSON-RPC `error` from `tools/call`.** Surface as aish status
|
||||
(`[aish] mcp: alias.tool: <error.message>`); do not feed to model.
|
||||
Do NOT append a tool turn — there was no result. The model will
|
||||
re-plan on the next user turn.
|
||||
|
||||
- **Tool-call sub-loop bounds.** `max_tool_depth` (default 8) per ask_ai
|
||||
invocation. When hit, surface as status and break — append the
|
||||
assistant's last text (if any) and let the user reply.
|
||||
|
||||
- **Argument JSON might be invalid.** A model can stream malformed JSON
|
||||
in `function.arguments`. `dkjson.decode` failure → status warning +
|
||||
skip that call; do NOT execute on partial parse.
|
||||
|
||||
- **Q18 fallback path** (strict templates rejecting `role:"tool"`).
|
||||
Plumb a `context.use_tool_role` flag (default true). If a real-world
|
||||
rejection appears at Phase 7, flip the flag and convert tool turns to
|
||||
`[tool: alias.name]\n<content>` prefix on the next user turn (same
|
||||
pattern as `pending_exec_output`). v1 doesn't need to test the
|
||||
fallback — just have the switch.
|
||||
|
||||
### Test checkpoints
|
||||
|
||||
After each commit, verify with a targeted probe before moving on:
|
||||
|
||||
| Commit | Verify |
|
||||
|---|---|
|
||||
| #1 `mcp.lua` | `luajit -e "local m=require('mcp'); ..."` connects + lists tools against boltzmann lmcp |
|
||||
| #2 `safety.lua` | unit-test policy lookup with mock rl: exact match → true; `*` glob → true; miss → prompt invoked |
|
||||
| #3 `context.lua` | round-trip a context with tool turns through `to_messages()`, eyeball JSON shape |
|
||||
| #4 `renderer.lua` | one-liner emits frame around fake tool result |
|
||||
| #5 `broker.lua` | curl-compare: hand-built request body with tools matches `broker.chat_stream(cfg, msgs, on_delta)` body |
|
||||
| #6 `repl.lua` | full REPL: `:mcp list` shows boltzmann; question that triggers `list_dir` round-trips through confirm + execution + model continuation |
|
||||
| #7 `config.lua` | aish starts with example mcp section present; no MCP servers connected means no `tools` field sent |
|
||||
|
||||
### Commits expected: 7 (plus 1 manifest amendment if scope creeps)
|
||||
|
||||
Per Phase 1's cadence (10 commits + 1 BLOCKER fix), Phase 2 is smaller
|
||||
in surface — single new file plus targeted extensions. Tracked to land
|
||||
in one working session if the boltzmann proxy bugs don't intrude.
|
||||
|
||||
### Open at plan; resolve at review
|
||||
|
||||
- **Q18 plumbing only or default-flip?** Default `use_tool_role = true`
|
||||
and ship the fallback as dead code? Or default `false` (prefix
|
||||
injection) for safety until verify confirms? Recommend `true` —
|
||||
protocol-layer probe in analyze said it worked; Phase 7 can flip if
|
||||
reality differs.
|
||||
|
||||
- **Should `:mcp connect` re-fetch tools on subsequent calls** (handles
|
||||
servers that misreport `listChanged:false`)? v1: no — trust the
|
||||
capability. Manual `:mcp disconnect` + `:mcp connect` is the
|
||||
workaround if a server's tools change.
|
||||
|
||||
---
|
||||
|
||||
*End of Phase 2 Manifest — aish*
|
||||
|
||||
Reference in New Issue
Block a user