docs/PHASE2: formulate — MCP client + tool-calling bridge

Phase 2 formulate manifest. Three pillars per PHASE0.md §11 row 2: mcp.lua (JSON-RPC 2.0 over HTTP+SSE, target: lmcp), tool-calling bridge (OpenAI tools field <-> MCP tools/call), and the safety.lua authorization gate (per-call confirm + auto_approve policy).

Resolves PHASE0.md §13 Q6–Q10:
  Q6  CMD: + tool-calls coexist; substrate §3 unchanged
  Q7  config-declared servers + runtime :mcp connect
  Q8  per-call confirm default, auto_approve policy in config
  Q9  hybrid system prompt: static frame + dynamic tools body field
  Q10 streaming-from-day-one on Phase 1 SSE; on_delta widens to (kind, payload)

New questions tracked in §11 (Q17–Q22): transport abstraction, role:tool vs prefix injection (mistral-nemo template verification needed), large tool-result handling, parallel dispatch, error mapping, aish-as-MCP-server (parked).

§4 module layout amended: mcp.lua slots between broker.lua and router.lua. The amendment is documented in this manifest; the actual §4 table edit lands when implementation starts (Phase 2 implement phase).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 09:23:53 +00:00
parent f7c3c32aa2
commit ec6793c93c
1 changed files with 347 additions and 0 deletions
@@ -0,0 +1,347 @@
+# aish — Phase 2 Manifest
+
+**Project:** aish — AI-augmented conversational shell
+**Document:** Phase 2 Requirements, Architecture & Design Decisions
+**Status:** Formulate (pre-analysis)
+**Date:** 2026-05-12
+
+PHASE0.md is the locked substrate; PHASE1.md is layered on top. This
+manifest specifies what Phase 2 adds. Section numbers reference back to
+PHASE0.md / PHASE1.md where relevant.
+
+---
+
+## 1. Scope of Phase 2
+
+Three pillars per PHASE0.md §11 row 2:
+
+1. **MCP client** (`mcp.lua`) — JSON-RPC 2.0 over HTTP+SSE transport.
+   Target reference implementation: `lmcp`. Operations needed for v1:
+   `initialize`, `tools/list`, `tools/call`. Multiple servers may be
+   connected concurrently; tools are namespaced `<server>.<tool>`.
+2. **Tool-calling protocol bridge** — the broker sends OpenAI-compatible
+   `tools` in the request body; the model emits `tool_calls` in the
+   response; `mcp.lua` dispatches each call to the right server; the
+   tool result is fed back as a `role:"tool"` turn in `context.lua` and
+   the chat continues.
+3. **Authorization gate** — `safety.lua` (PHASE0.md §4 stub) finally gets
+   implemented. Every tool call is confirmed by the user by default,
+   with per-tool and per-server `auto_approve` policies in `config.lua`.
+
+**Phase 2 is done when:**
+
+- aish can connect to at least one local `lmcp` server declared in
+  `config.lua` and one connected via `:mcp connect <url>` at runtime.
+- `:mcp list` shows connected servers; `:mcp tools` shows discovered
+  tools across all servers.
+- A model conversation can invoke a tool: the broker request carries
+  the live tools schema; the response's `tool_calls` are confirmed by
+  the user; each call dispatches to the right MCP server; the result
+  re-enters the chat; the model continues with the result available.
+- `CMD:` extraction (PHASE0.md §6 substrate invariant) still works
+  unchanged — Phase 2 is additive, not replacing.
+- A tool with `auto_approve = true` (in config) executes without the
+  confirm prompt; a non-approved tool still prompts.
+
+---
+
+## 2. Technology Decisions (delta from Phase 1)
+
+| Decision | Choice | Rationale |
+|---|---|---|
+| MCP transport | HTTP+SSE per the MCP spec's HTTP transport flavor | Matches `lmcp`'s native transport. Reuses the libcurl easy interface + the Phase 1 SSE parser in `ffi/curl.lua`. Avoids spawning child processes (stdio transport requires Phase 1-style PTY plumbing per server, more moving parts). Stdio is left for a possible Phase 2.1 if a stdio-only MCP server becomes necessary. |
+| MCP protocol version | `2025-03-26` (or whatever `lmcp` announces in its `initialize` response) | Track lmcp's spec target. Negotiated at connect time; aish caps at the spec version it knows. |
+| Tool-call wire format | OpenAI `tools` field on `/v1/chat/completions` body; `tool_calls` on assistant deltas; `role:"tool"` turn with `tool_call_id` for results | Standard, supported by llama.cpp and OpenRouter. Aligns with the existing `/v1/chat/completions` substrate invariant. |
+| Tool namespacing | `<server-alias>.<tool-name>` for both the wire-level tool name and `:mcp tools` listing | Avoids name collisions across servers. The alias comes from the config key or the connect URL hash. |
+| `CMD:` coexistence with tool-calls | Both stay live, no policy preference. Substrate invariant §3 unchanged. | Resolves Q6 (see §11). `CMD:` is the local-shell route; MCP tools are structured-API routes; they serve different purposes. Future phases (Norris, Phase 3) may prefer tools when both are available, but Phase 2 doesn't enforce a preference. |
+| Authorization default | Per-call confirm (mirrors PHASE0.md §10 `confirm_cmd` for shell) | Conservative default; user can opt into auto-approval per tool or per server via config. Resolves Q8. |
+| System prompt augmentation | Hybrid: static frame in `broker.lua` system prompt + dynamic `tools` array in the request body | Tool list goes in the API field where it belongs; the system prompt only mentions that tools exist and how to use them. Per-request body cost is bounded (tools change rarely; small schemas). Resolves Q9. |
+| Tool-call streaming | Streaming-from-day-one — `broker.chat_stream`'s on_delta callback widens to handle `tool_calls` deltas in addition to text deltas | Resolves Q10. Phase 1 SSE landed first, so we're not retrofitting; we just extend the parser. |
+| Tool-call concurrency | Sequential dispatch in Phase 2 v1: process `tool_calls[0]` to completion, then `[1]`, etc. | Simpler error handling; tool effects are often order-dependent (e.g. write-then-read). Parallel dispatch is deferred (see Q20). |
+| MCP server lifecycle | aish does not manage MCP server processes (parallel to PHASE0.md §12 llama.cpp rule) | Declared in config or connected by URL; aish is a client only. |
+
+---
+
+## 3. Module Changes
+
+| File | State after Phase 1 | Phase 2 changes |
+|---|---|---|
+| `mcp.lua` | **New file** (not in PHASE0 §4 layout; this Phase amends the layout to add it) | Implement: `M.connect(url, alias) -> session`, `session:initialize()`, `session:list_tools() -> [{name, description, schema}]`, `session:call_tool(name, args) -> result`, `session:close()`. JSON-RPC 2.0 over HTTP POST for client→server, SSE GET for server→client notifications. Per-session state: connected, tools-cache, pending request ID counter. |
+| `safety.lua` | Stub | Implement Phase 2 surface only: `M.confirm_tool_call(tool_name, args, cfg) -> bool`. Reads `cfg.mcp.auto_approve` (per-tool and per-server) before prompting. Norris destructive-op heuristic and HALT gate stay Phase 3. |
+| `broker.lua` | Streaming `chat_stream(cfg, msgs, on_delta)` | Request body grows `tools = mcp.tools_schema()` (assembled from all connected sessions). on_delta callback widens to `on_delta(kind, payload)` where `kind ∈ {"text", "tool_call"}`; tool_call payload includes id+name+arguments-delta. `M.chat` wrapper updates to buffer both. |
+| `context.lua` | turns = {{role, content}, ...} + `pending_exec_output` | New role: `"tool"`. Assistant turns may carry `tool_calls = [{id, name, arguments}]`. `to_messages()` flattens these into OpenAI-shape messages. Alternation rules: assistant-with-tool_calls is followed by N tool turns (one per call), then assistant text. |
+| `repl.lua` | meta cmds + ask_ai stream loop | After ask_ai sees `tool_calls`, enter a tool-execution sub-loop: confirm-gate each call via `safety.confirm_tool_call`, dispatch via `mcp.session:call_tool`, append tool turn to context, re-issue the broker request. Loop until assistant emits text without tool_calls. New meta: `:mcp connect <url> [alias]`, `:mcp list`, `:mcp tools`, `:mcp disconnect <alias>`. |
+| `renderer.lua` | streaming text + exec frame | Add `tool_call_begin(name, args)`, `tool_call_end(result, ok)`. Visual style: indented, dim, parallel to the exec frame. |
+| `config.lua` | example with models/shell/context/history | Schema additions: `mcp = { servers = { alias = { url = "..." } }, auto_approve = { ["alias.tool"] = true } }`. Documented in §6 below. |
+| `ffi/curl.lua` | post + post_sse | One probable addition: GET request helper (for the server→client SSE channel of MCP transport). Or wrap the SSE-GET inside `mcp.lua` directly if it's tight enough. Decided at analyze. |
+| `history.lua` | JSONL session log | Tool turns are logged like any other turn — `{role:"tool", tool_call_id:"...", content:"..."}`. Resume reconstructs them via `ctx:append` like user/assistant turns. |
+
+§4 module-layout amendment: `mcp.lua` slots between `broker.lua` and
+`router.lua` in the §4 table. This manifest commit documents the
+amendment; the §4 table itself is edited when implementation starts.
+
+---
+
+## 4. MCP Transport
+
+`lmcp` exposes an HTTP+SSE transport. From a client perspective there are
+two channels per session:
+
+```
+client → server:   POST <base-url>     Content-Type: application/json
+                                       Body: { jsonrpc:"2.0", id, method, params }
+                                       Returns: { jsonrpc, id, result | error }
+
+server → client:   GET  <base-url>     Accept: text/event-stream
+                                       (held open; receives notifications/...)
+```
+
+### Handshake (per MCP spec)
+
+1. `initialize` request: `{ protocolVersion, capabilities, clientInfo }`.
+2. Server response: `{ protocolVersion, capabilities, serverInfo }`. Cache
+   the announced server capabilities — only invoke RPCs the server says
+   it supports (e.g. `tools` capability).
+3. `notifications/initialized` (one-way notification) signals end of
+   handshake.
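+
+The handshake messages above can be sketched as plain Lua tables before
+they are JSON-encoded. This is a minimal sketch, not the `mcp.lua` API:
+`new_session`, `rpc_request`, and `rpc_notification` are hypothetical
+names, and the `clientInfo.version` value is a placeholder. The point it
+illustrates is that requests carry an `id` drawn from the per-session
+counter, while notifications carry none.
+
+```lua
+-- Hypothetical helpers; none of these names come from the manifest.
+local function new_session()
+  return { next_id = 0 }            -- pending request ID counter
+end
+
+-- JSON-RPC 2.0 request: carries an id, so a response is expected.
+local function rpc_request(session, method, params)
+  session.next_id = session.next_id + 1
+  return { jsonrpc = "2.0", id = session.next_id,
+           method = method, params = params }
+end
+
+-- JSON-RPC 2.0 notification: no id, so no response is expected.
+local function rpc_notification(method, params)
+  return { jsonrpc = "2.0", method = method, params = params }
+end
+
+local s = new_session()
+local init = rpc_request(s, "initialize", {
+  protocolVersion = "2025-03-26",
+  capabilities    = {},
+  clientInfo      = { name = "aish", version = "0.0.0" },  -- placeholder
+})
+local done = rpc_notification("notifications/initialized")
+```
+
+Encoding these tables to JSON is left to whichever JSON module aish
+already uses; an empty Lua table may need care there so it is emitted as
+an object rather than an array.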
+
+### Tool discovery
+
+1. `tools/list` RPC → `{ tools: [{ name, description, inputSchema }] }`.
+2. Cache per-session. Re-fetch on `notifications/tools/list_changed`
+   from the SSE channel.
+
+### Tool invocation
+
+1. `tools/call` with `{ name, arguments }` → `{ content: [{type, text}], isError }`.
+2. The full text payload flows back to the model as the next `role:"tool"`
+   turn's content.
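+
+One way to turn a decoded `tools/call` result into the text of a
+`role:"tool"` turn is sketched below. `tool_result_text` is a
+hypothetical helper, and joining text items with newlines is an
+assumption; how `isError` is ultimately surfaced is still open (Q21), so
+the sketch only passes the flag back to the caller.
+
+```lua
+-- result: decoded tools/call result,
+--   { content = { { type = "text", text = "..." }, ... }, isError = bool }
+-- Returns the concatenated text items and a boolean error flag.
+local function tool_result_text(result)
+  local parts = {}
+  for _, item in ipairs(result.content or {}) do
+    -- Only text items are kept; other content types are skipped here.
+    if item.type == "text" and type(item.text) == "string" then
+      parts[#parts + 1] = item.text
+    end
+  end
+  return table.concat(parts, "\n"), result.isError == true
+end
+
+local text, is_error = tool_result_text({
+  content = {
+    { type = "text", text = "line one" },
+    { type = "text", text = "line two" },
+  },
+  isError = false,
+})
+```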
+
+### SSE channel
+
+The channel is held open in the same way as `post_sse` in `ffi/curl.lua`,
+but with a GET request (Phase 2 likely needs an `M.get_sse` mirroring the
+POST variant). Each event is a JSON-RPC notification. Phase 2 v1 only
+listens for `notifications/tools/list_changed`; other notification types
+(progress, log) are ignored and tracked for later phases.
+
+### Lifecycle
+
+- Connect on startup (from `config.mcp.servers`) on a best-effort basis;
+  failures are status-logged and do not abort aish.
+- `:mcp connect <url>` adds a session at runtime; alias auto-derived from
+  hostname or supplied as second arg.
+- `:mcp disconnect <alias>` closes both channels.
+- On aish quit, all sessions are closed cleanly (best effort).
+
+---
+
+## 5. Tool-Call Bridge
+
+### Broker request body (delta from Phase 1)
+
+```json
+{
+  "model": "...",
+  "messages": [...],
+  "stream": true,
+  "temperature": 0.2,
+  "tools": [
+    { "type":"function",
+      "function": { "name":"<alias>.<tool>",
+                    "description":"...",
+                    "parameters": <inputSchema> } },
+    ...
+  ]
+}
+```
+
+The `tools` array is assembled by `mcp.tools_schema()` — flattens
+`tools/list` results from every connected session, namespacing each tool
+as `<alias>.<name>`.
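+
+A minimal sketch of that flattening step follows, together with the
+reverse split needed at dispatch time. It assumes each session keeps its
+cached `tools/list` result in a `tools` field and that aliases contain no
+dots; `split_tool_name` and the sample data are hypothetical. Aliases are
+sorted only to make the output order deterministic.
+
+```lua
+-- sessions: map alias -> { tools = { { name, description, inputSchema }, ... } }
+local function tools_schema(sessions)
+  local aliases = {}
+  for alias in pairs(sessions) do aliases[#aliases + 1] = alias end
+  table.sort(aliases)
+
+  local out = {}
+  for _, alias in ipairs(aliases) do
+    for _, tool in ipairs(sessions[alias].tools or {}) do
+      out[#out + 1] = {
+        type = "function",
+        -- "function" is a Lua keyword, so the key needs bracket syntax.
+        ["function"] = {
+          name        = alias .. "." .. tool.name,
+          description = tool.description,
+          parameters  = tool.inputSchema,
+        },
+      }
+    end
+  end
+  return out
+end
+
+-- Split "<alias>.<tool>" at the first dot; returns alias, tool name.
+local function split_tool_name(full)
+  return full:match("^([^.]+)%.(.+)$")
+end
+
+local schema = tools_schema({
+  local_fs = { tools = { { name = "read_file", description = "Read a file",
+                           inputSchema = { type = "object" } } } },
+  gitea    = { tools = { { name = "list_repos", description = "List repos",
+                           inputSchema = { type = "object" } } } },
+})
+local alias, tool = split_tool_name("local_fs.read_file")
+```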
+
+### Response handling (streaming)
+
+llama.cpp / OpenAI deltas may include:
+
+```json
+data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_…",
+                "function":{"name":"alias.tool","arguments":"{\"a\":"}}]}}]}
+data: {"choices":[{"delta":{"tool_calls":[{"index":0,
+                "function":{"arguments":"1}"}}]}}]}
+data: {"choices":[{"finish_reason":"tool_calls",...}]}
+```
+
+`broker.chat_stream` accumulates tool-call deltas keyed by `index`; the
+`arguments` field is a JSON-string that arrives chunked and is concatenated.
+On `finish_reason: tool_calls`, the accumulated calls are emitted to
+on_delta as `kind="tool_call"` with full payloads.
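+
+The accumulation described above might look like the following sketch.
+It operates on already-decoded delta tables, so no JSON library is
+needed; `new_accumulator`, `feed`, and `finish` are hypothetical names,
+and `call_1` stands in for a real call id.
+
+```lua
+local function new_accumulator()
+  return { calls = {} }              -- keyed by the wire-level index (0-based)
+end
+
+-- Merge one delta's tool_calls array into the accumulator.
+local function feed(acc, tool_call_deltas)
+  for _, d in ipairs(tool_call_deltas) do
+    local slot = acc.calls[d.index]
+    if not slot then
+      slot = { id = nil, name = nil, arguments = "" }
+      acc.calls[d.index] = slot
+    end
+    if d.id then slot.id = d.id end
+    local fn = d["function"]         -- keyword key, bracket syntax required
+    if fn then
+      if fn.name then slot.name = fn.name end
+      -- arguments is a JSON string arriving in chunks: concatenate.
+      if fn.arguments then slot.arguments = slot.arguments .. fn.arguments end
+    end
+  end
+end
+
+-- Called on finish_reason == "tool_calls": return calls in index order.
+local function finish(acc)
+  local indices = {}
+  for i in pairs(acc.calls) do indices[#indices + 1] = i end
+  table.sort(indices)
+  local out = {}
+  for _, i in ipairs(indices) do out[#out + 1] = acc.calls[i] end
+  return out
+end
+
+local acc = new_accumulator()
+feed(acc, { { index = 0, id = "call_1",
+              ["function"] = { name = "alias.tool", arguments = "{\"a\":" } } })
+feed(acc, { { index = 0, ["function"] = { arguments = "1}" } } })
+local calls = finish(acc)
+```
+
+The `arguments` string should only be JSON-decoded after `finish`, since
+any intermediate chunk may be invalid JSON on its own.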
+
+### Re-injection into context
+
+```lua
+-- After tool execution
+ctx:append({
+  role = "assistant",
+  content = "",                -- or any model-emitted text
+  tool_calls = { {id="call_…", name="alias.tool", arguments=<json-string>} },
+})
+ctx:append({
+  role = "tool",
+  tool_call_id = "call_…",
+  content = <tool-result-text>,
+})
+```
+
+`to_messages()` renders both shapes for the next broker request. The
+strict-alternation issue from PHASE0.md §6 (mistral-nemo Jinja) is
+handled differently here — tool turns ARE expected to follow assistant
+tool_calls per the OpenAI chat-template convention. If a model's
+template still rejects this shape, fall back to the `[tool: X]` prefix
+strategy used for exec output (Q18 below).
+
+### Re-issuing the broker request
+
+After tool turns are appended, the broker is called again with the
+extended messages array. The model may emit more `tool_calls`, more
+text, or both. Loop until the response has no `tool_calls` (i.e. a
+plain text assistant turn).
+
+Budget: the `max_tool_depth` setting under `mcp` in `config.lua`
+(default 8) prevents runaway loops. Hitting the cap surfaces as a status
+line: `[aish] tool-call depth limit reached`.
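+
+The re-issue loop with its depth cap can be sketched as below. This is an
+illustration under assumptions: `ask` and `run_call` are hypothetical
+callbacks standing in for the broker round-trip and for the
+confirm/dispatch/append step, and "depth" is interpreted here as the
+number of broker responses that contained `tool_calls`, which the
+manifest does not pin down.
+
+```lua
+-- ask():          one broker round; returns { text = ..., tool_calls = { ... } }
+-- run_call(call): confirm, dispatch, and append the tool turn for one call
+-- Returns final_text, nil on success or nil, status_message on hitting the cap.
+local function tool_loop(ask, run_call, max_depth)
+  max_depth = max_depth or 8
+  local depth = 0
+  while true do
+    local reply = ask()
+    local calls = reply.tool_calls or {}
+    if #calls == 0 then
+      return reply.text, nil         -- plain text turn ends the loop
+    end
+    depth = depth + 1
+    if depth > max_depth then
+      return nil, "[aish] tool-call depth limit reached"
+    end
+    -- Sequential dispatch, in the order the model emitted the calls.
+    for _, call in ipairs(calls) do run_call(call) end
+  end
+end
+
+-- Usage with stub callbacks: one tool round, then a text answer.
+local round = 0
+local executed = {}
+local text, err = tool_loop(
+  function()
+    round = round + 1
+    if round == 1 then
+      return { tool_calls = { { id = "call_1", name = "alias.tool" } } }
+    end
+    return { text = "done" }
+  end,
+  function(call) executed[#executed + 1] = call.name end
+)
+```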
+
+---
+
+## 6. Authorization (safety.lua Phase 2 surface)
+
+```lua
+-- safety.confirm_tool_call(tool_name, args_table, config) -> bool
+function M.confirm_tool_call(name, args, cfg)
+    local policy = cfg.mcp and cfg.mcp.auto_approve or {}
+    if policy[name] then return true end
+    -- Per-server prefix check: "alias.*" entries
+    local alias = name:match("^([^.]+)%.")
+    if alias and policy[alias .. ".*"] then return true end
+    -- Otherwise prompt
+    -- args is a keyed table, so #args is always 0; test emptiness with next()
+    local pretty = name .. "(" .. (next(args or {}) ~= nil and "..." or "") .. ")"
+    local ans = rl.readline(("call '%s'? [y/N] "):format(pretty)) or ""
+    return ans:lower():sub(1,1) == "y"
+end
+```
+
+Config schema:
+
+```lua
+mcp = {
+    servers = {
+        local_fs = { url = "http://localhost:9700/" },
+        gitea    = { url = "http://localhost:9701/" },
+    },
+    auto_approve = {
+        ["local_fs.read_file"] = true,    -- specific tool
+        ["gitea.*"]            = true,    -- whole server
+    },
+    max_tool_depth = 8,
+}
+```
+
+Norris mode (Phase 3) will extend this: when running autonomously, the
+destructive-op heuristic decides whether to prompt, and non-destructive
+tools are auto-approved. That is out of scope here.
+
+---
+
+## 7. Meta Commands (Phase 2 additions)
+
+| Command | Action |
+|---|---|
+| `:mcp connect <url> [<alias>]` | Open a session; perform initialize + tools/list; add to active set |
+| `:mcp disconnect <alias>` | Close one session |
+| `:mcp list` | Show connected sessions (alias, url, tool count, status) |
+| `:mcp tools` | List tools across all sessions (`alias.name` — short description) |
+| `:mcp tool <alias.name>` | Show one tool's full inputSchema (debug aid) |
+
+Existing `:help` updated to list these.
+
+---
+
+## 8. System Prompt Augmentation
+
+`broker.lua`'s default system prompt grows by ~5 lines:
+
+```
+You may have access to MCP tools — they appear in this request's `tools`
+field. Call a tool by emitting a tool_call; the result will be supplied
+in the next turn. Use tools for structured operations (file reads,
+queries, etc.) and `CMD:` lines for local shell commands. Prefer tools
+when available; fall back to `CMD:` for anything not exposed as a tool.
+```
+
+The actual tool list is in the `tools` request-body field, not the
+prompt. This avoids per-turn token bloat for the full schema.
+
+§3 substrate invariants are unchanged. The `CMD:` extraction marker stays
+the local-shell route; tools are the additive structured route.
+
+---
+
+## 9. Migration from Phase 1
+
+User-visible changes:
+- New `:mcp …` meta commands when MCP servers are configured or
+  connected at runtime.
+- Assistant responses may now invoke tools — user sees a confirm prompt
+  (similar to `CMD:` execution gate) followed by an indented tool-call
+  frame with the result.
+- `CMD:` lines still work exactly as before for shell.
+
+Substrate (PHASE0.md §3) invariants: unchanged. Module layout (§4)
+amended to add `mcp.lua`; that amendment ships in the manifest commit.
+
+`config.lua`: existing configs without an `mcp` section continue to work
+— no MCP servers means no tools sent in the broker request body, no
+auth checks, no behavior change.
+
+---
+
+## 10. Out of Scope (Phase 2)
+
+Per PHASE0.md §11, these belong elsewhere:
+- Chuck Norris autonomous mode (Phase 3) — even though tool-calls
+  enable richer autonomy, the *autonomous policy* is Phase 3's.
+- Destructive-op heuristic in safety.lua (Phase 3) — Phase 2 only
+  implements the per-call confirm-prompt surface.
+- `memory.jsonl` summarization across sessions (Phase 4).
+- Multi-model routing / cloud fallback (Phase 5).
+- Tree-sitter syntax highlighting (Phase 6).
+
+Specifically out of Phase 2 scope despite proximity:
+- Stdio-transport MCP servers (Q17 below).
+- Parallel tool-call dispatch (Q20).
+- MCP `resources/list` and `prompts/list` capabilities — Phase 2
+  v1 only implements `tools/*`. Resources/prompts deferred (probably
+  Phase 4 alongside memory).
+- Server-sent `notifications/progress` for long-running tool calls —
+  ignored in v1; status surface comes later.
+
+---
+
+## 11. Open Questions
+
+| # | Question | Impact | Resolve by |
+|---|---|---|---|
+| Q17 | MCP transport abstraction: design `mcp.lua` from day one for both HTTP+SSE and stdio transports (transport_t interface), or hard-code HTTP+SSE and refit if a stdio-only server appears? | mcp.lua API shape | Phase 2 (plan) |
+| Q18 | Tool result re-injection: standard OpenAI `role:"tool"` turn, or `[tool: X]` prefix to next user turn (matching the §6 exec-output pattern)? Strict chat templates may reject `tool` role — needs verification against mistral-nemo specifically. | context.lua + broker.lua | Phase 2 (analyze) |
+| Q19 | Large tool-result payloads (file reads, query dumps): pass-through, truncate at N chars, or summarize via fast model? Token-budget pressure scales with tool use. | context.lua + executor of tool-result | Phase 2 (plan); Phase 4 may refine with memory.jsonl |
+| Q20 | Parallel `tool_calls`: sequential v1 is safe; spec allows parallel. Move to parallel when both calls are read-only (declared in tool metadata)? | mcp.lua dispatch | Phase 2 (verify) — track for v2 |
+| Q21 | MCP server error mapping: JSON-RPC `error` response → tool-result content with `isError=true` (model can react), or aish-level transport error (broker aborts)? | mcp.lua + broker.lua | Phase 2 (plan) |
+| Q22 | aish's own command surface as an MCP server (eat your own dog food: expose `aish.exec`, `aish.read_file`, etc. via MCP so other clients can drive aish)? | scope expansion / new module | **Out of Phase 2.** Tracked for "maybe Phase 4 or later"; flagging here so it's not silently lost. |
+
+Resolved at formulate (above in §2 table):
+- Q6 (CMD: vs tools coexistence) — both, no policy preference, substrate unchanged.
+- Q7 (MCP discovery) — both, config-declared default + runtime `:mcp connect`.
+- Q8 (authorization) — per-call confirm default, per-tool/per-server `auto_approve` policy.
+- Q9 (system-prompt augmentation) — hybrid: static frame + dynamic `tools` body field.
+- Q10 (tool-call streaming) — streaming-from-day-one on top of Phase 1 SSE.
+
+---
+
+*End of Phase 2 Manifest — aish*