Files
aish/docs/PHASE2.md
T
marfrit d3570ccea4 docs/PHASE2: review fold-in — 5 BLOCKERs + 7 CONCERNs + key NITs
Independent review of the formulate+analyze+plan draft surfaced design
gaps that would have shipped as silent bugs. Resolutions applied:

BLOCKERs:
  B1 context.lua impact widened — Phase 1 :append asserts content and
     discards extra fields. Need (a) shape-per-role assert, (b) preserve
     tool_calls/tool_call_id on store, (c) emit from to_messages().
  B2 ffi/curl.M.post extended to return (body, status_code). lmcp's
     401 returns a non-JSON-RPC body that would have been mis-decoded.
  B3 §3 typo schema -> inputSchema.
  B4 pending_exec_output × tool-call sub-loop interaction specified.
  B5 §3/§12 broker dependency contradiction — broker takes opts.tools
     from caller; no layering inversion.

CONCERNs:
  C1 M.chat return polymorphism dropped (no consumer).
  C2 tool_calls[].index absent fallback: default to 0.
  C3 Re-injection stores accumulated text, not hard-coded empty.
  C4 :mcp connect failure: no auto-retry, status-log once.
  C5/C7 JSON-RPC error AND argument-parse failure both synthesize a
     role:"tool" turn — keeps strict-template alternation legal
     exactly the way PHASE0 §6 demanded for exec output.
  C6 §9 confirms §4 amendment is additive (preserves §3 invariant).

NITs:
  N3 protocolVersion fallback (lmcp doesn't negotiate).
  N4 Alternation assert in Context:append.
  N7 Model-routing bug filed as aish#23.
  N8 Day-one fallback test for use_tool_role=false in commit #3.

Manifest status: Plan (review folded). Status line and Resolutions
sections updated; commit-by-commit roadmap reflects revised specs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 13:00:07 +00:00

36 KiB
Raw Blame History

aish — Phase 2 Manifest

Project: aish — AI-augmented conversational shell Document: Phase 2 Requirements, Architecture & Design Decisions Status: Plan (review pass folded in 2026-05-12) Date: 2026-05-12

PHASE0.md is the locked substrate; PHASE1.md is layered on top. This manifest specifies what Phase 2 adds. Section numbers reference back to PHASE0.md / PHASE1.md where relevant.


1. Scope of Phase 2

Three pillars per PHASE0.md §11 row 2:

  1. MCP client (mcp.lua) — JSON-RPC 2.0 over HTTP+SSE transport. Target reference implementation: lmcp. Operations needed for v1: initialize, tools/list, tools/call. Multiple servers may be connected concurrently; tools are namespaced <server>.<tool>.
  2. Tool-calling protocol bridge — the broker sends OpenAI-compatible tools in the request body; the model emits tool_calls in the response; mcp.lua dispatches each call to the right server; the tool result is fed back as a role:"tool" turn in context.lua and the chat continues.
  3. Authorization gatesafety.lua (PHASE0.md §4 stub) finally gets implemented. Every tool call is confirmed by the user by default, with per-tool and per-server auto_approve policies in config.lua.

Phase 2 is done when:

  • aish can connect to at least one local lmcp server declared in config.lua and one connected via :mcp connect <url> at runtime.
  • :mcp list shows connected servers; :mcp tools shows discovered tools across all servers.
  • A model conversation can invoke a tool: the broker request carries the live tools schema; the response's tool_calls are confirmed by the user; each call dispatches to the right MCP server; the result re-enters the chat; the model continues with the result available.
  • CMD: extraction (PHASE0.md §6 substrate invariant) still works unchanged — Phase 2 is additive, not replacing.
  • A tool with auto_approve = true (in config) executes without the confirm prompt; a non-approved tool still prompts.

2. Technology Decisions (delta from Phase 1)

Decision Choice Rationale
MCP transport HTTP POST per RPC, Connection: close per response, no long-lived SSE GET channel in v1 Analyze finding (2026-05-12): lmcp v0.5.4 only implements the trivial POST-and-respond flavor of the spec's streamable-HTTP transport. Its GET /mcp endpoint announces the POST endpoint then closes — there's no server→client notification channel to listen on. Combined with lmcp's capabilities.tools.listChanged = false, aish doesn't need an SSE GET listener at all for lmcp. Stdio transport is left for a possible Phase 2.1 if a stdio-only MCP server becomes necessary.
MCP protocol version 2025-03-26 (confirmed by live probe of boltzmann:8080/mcp) lmcp pins this in MCP_VERSION and does not negotiate — it returns its compiled-in version regardless of what the client sends (lmcp.lua:80-91). aish sends 2025-03-26 in initialize and accepts whatever the server returns; on mismatch it logs [aish] mcp <alias>: protocol version mismatch (sent X, got Y); proceeding and continues. v1 has no version-gated behavior to abort on.
MCP auth Bearer token via Authorization: Bearer <token> header, per-server Analyze finding: every lmcp deployment in mfritsche's fleet (boltzmann/hertz/pve*/nc/etc.) requires Bearer auth. Phase 2 config supports auth_token literal and auth_env env-var indirection per server (mirrors key_env in the models registry). lmcp servers without auth (broglie/higgs LAN-only) just leave the field nil.
Tool-call wire format OpenAI tools field on /v1/chat/completions body; tool_calls on assistant deltas; role:"tool" turn with tool_call_id for results Standard, supported by llama.cpp and OpenRouter. Aligns with the existing /v1/chat/completions substrate invariant.
Tool namespacing <server-alias>.<tool-name> for both the wire-level tool name and :mcp tools listing Avoids name collisions across servers. The alias comes from the config key or the connect URL hash.
CMD: coexistence with tool-calls Both stay live, no policy preference. Substrate invariant §3 unchanged. Resolves Q6 (see §10). CMD: is the local-shell route; MCP tools are structured-API routes; they serve different purposes. Future phases (Norris, Phase 3) may prefer tools when both are available, but Phase 2 doesn't enforce.
Authorization default Per-call confirm (mirrors PHASE0.md §10 confirm_cmd for shell) Conservative default; user can opt into auto-approval per tool or per server via config. Resolves Q8.
System prompt augmentation Hybrid: static frame in broker.lua system prompt + dynamic tools array in the request body Tool list goes in the API field where it belongs; the system prompt only mentions that tools exist and how to use them. Per-request body cost is bounded (tools change rarely; small schemas). Resolves Q9.
Tool-call streaming Streaming-from-day-one — broker.chat_stream's on_delta callback widens to handle tool_calls deltas in addition to text deltas Resolves Q10. Phase 1 SSE landed first, so we're not retrofitting; we just extend the parser. Wire shape confirmed at analyze (2026-05-12 probe vs hossenfelder): delta.tool_calls[] arrives indexed; id+type+function.name appear on the opening delta; function.arguments is a JSON-string that arrives in character-fragment chunks; finish_reason "tool_calls" closes the call. Accumulator strategy matches §5.
Tool-call concurrency Sequential dispatch in Phase 2 v1 — process tool_calls[0] to completion, then [1], etc. Simpler error handling; tool effects often order-dependent (e.g. write-then-read). Parallel dispatch deferred (see Q20).
MCP server lifecycle aish does not manage MCP server processes (parallel to PHASE0.md §12 llama.cpp rule) Declared in config or connected by URL; aish is a client only.

3. Module Changes

File State after Phase 1 Phase 2 changes
mcp.lua New file (not in PHASE0 §4 layout; this Phase amends the layout to add it) Implement: M.connect(url, opts) -> session (opts: alias, auth_token, auth_env), session:initialize(), session:list_tools() -> [{name, description, inputSchema}], session:call_tool(name, args) -> (result_table, kind) where kind ∈ {"ok","handler_error","rpc_error"} so callers can route the response per §4's error split, session:close(). JSON-RPC 2.0 over HTTP POST (Content-Type: application/json, Accept: application/json, Authorization: Bearer <token>). Per-session state: alias, base-url, auth, tools-cache, request-ID counter. No persistent SSE channel — POST is one-shot per RPC. Distinguishes HTTP-level failure (e.g. lmcp's 401 {"error":"unauthorized"} body, which is NOT JSON-RPC-shaped — has no jsonrpc/id fields) from JSON-RPC envelope errors; needs ffi/curl.M.post extended to return status code (see ffi/curl.lua row).
safety.lua Stub Implement Phase 2 surface only: M.confirm_tool_call(tool_name, args, policy) -> bool. Reads config.mcp.auto_approve (per-tool and per-server) before prompting. Norris destructive-op heuristic and HALT gate stay Phase 3.
broker.lua Streaming chat_stream(cfg, msgs, on_delta) Signature widens to chat_stream(cfg, msgs, on_delta, opts). opts.tools (optional array of {type, function:{name, description, parameters}}) is passed through to the request body; omitted entirely if absent or empty (some servers reject "tools": []). The on_delta callback widens to on_delta(kind, payload) where kind ∈ {"text", "tool_call"}. broker.lua does NOT depend on mcp.lua — repl assembles the tools array and passes it in; broker stays a transport layer. M.chat (non-streaming wrapper) is unchanged in this phase (no tool consumers go through it).
context.lua turns = {{role, content}, ...} + pending_exec_output; Context:append asserts turn.content and rebuilds the entry as {role, content} only — extra fields are dropped Three concrete edits: (a) loosen :append so role == "assistant" can carry tool_calls = [{id, name, arguments}] with content allowed empty, and role == "tool" requires tool_call_id + content (the assert moves from "content required" to "shape per role"); (b) preserve tool_calls and tool_call_id in the stored turn (not just role+content); (c) to_messages() emits tool_calls on assistant turns and tool_call_id on tool turns. Add a debug assertion that role == "tool" follows an assistant turn with non-empty tool_calls (catches design bugs early; N4 in review). pending_exec_output interaction: the buffer persists across the tool-call sub-loop (the loop is internal — no user input happens — so there's no append_user to flush against). It flushes on the next genuine user turn, regardless of how many tool-call iterations preceded.
repl.lua meta cmds + ask_ai stream loop After ask_ai sees tool_calls, enter a tool-execution sub-loop: confirm-gate each call via safety.confirm_tool_call, dispatch via mcp.session:call_tool, append tool turn to context, re-issue the broker request. Loop until assistant emits text without tool_calls. New meta: :mcp connect <url> [alias], :mcp list, :mcp tools, :mcp disconnect <alias>.
renderer.lua streaming text + exec frame Add tool_call_begin(name, args), tool_call_end(result, ok). Visual style: indented, dim, parallel to the exec frame.
config.lua example with models/shell/context/history Schema additions: mcp = { servers = { alias = { url = "..." } }, auto_approve = { ["alias.tool"] = true } }. Documented in §10 below.
ffi/curl.lua post + post_sse; M.post does not set FAILONERROR, so non-2xx responses return the body as a normal string One small extension: M.post returns (body, status_code) instead of just body (or surfaces status via a second slot). MCP auth failures from lmcp arrive as HTTP 401 with a non-JSON-RPC body ({"error":"unauthorized"}); mcp.lua must distinguish HTTP-level failure from JSON-RPC envelope errors. Phase 1 callers (broker.chat in its current shape) are unaffected — they ignore the second return value. No SSE GET channel is added (analyze finding ruled it out for lmcp).
history.lua JSONL session log Tool turns are logged like any other turn — {role:"tool", tool_call_id:"...", content:"..."}. Resume reconstructs them via ctx:append like user/assistant turns.

§4 module-layout amendment: mcp.lua slots between broker.lua and router.lua in the §4 table. Same commit lands the manifest amendment.


4. MCP Transport (analyze findings — lmcp v0.5.4)

lmcp implements only the synchronous POST flavor of the MCP streamable-HTTP spec. Each RPC is one HTTP transaction:

client → server:   POST /mcp           Content-Type: application/json
                                       Accept: application/json
                                       Authorization: Bearer <token>
                                       Body: { jsonrpc:"2.0", id, method, params }
                                       Returns: { jsonrpc, id, result | error }
                                       Connection: close

lmcp's GET /mcp exists but only sends a one-shot event: endpoint announcing the POST URL, then closes — there is no held-open server→client channel. Combined with the listChanged: false capability lmcp announces in initialize, aish does not open a persistent SSE channel to lmcp servers in v1. Notifications-from-server are out of scope here; track for v2 if a richer server appears.

Handshake

  1. initialize request: { protocolVersion: "2025-03-26", capabilities: {}, clientInfo: { name: "aish", version: "..." } }.
  2. Server response (lmcp): { protocolVersion: "2025-03-26", capabilities: { tools: { listChanged: false } }, serverInfo: { name, version } }.
  3. Version mismatch: lmcp ignores client's protocolVersion and always returns its compiled-in MCP_VERSION (lmcp.lua:80-91). aish accepts whatever lmcp returns; on mismatch it logs a status ([aish] mcp <alias>: protocol version mismatch (sent X, got Y); proceeding) and continues. v1 has no version-gated behavior.
  4. notifications/initialized POST (one-way; lmcp returns HTTP 202 with no body).

Tool discovery

  1. tools/list RPC → { tools: [{ name, description, inputSchema }] }.
  2. Cache per-session for the session lifetime — lmcp announces listChanged: false, so there's no need to refetch or listen for change notifications.

Tool invocation

tools/call with { name, arguments }. Failure has three flavors and all of them result in a role:"tool" turn being appended so the assistant's tool_calls is never left orphaned in context (strict templates reject assistant.tool_calls without a matching tool reply — same gotcha PHASE0.md §6 warned about):

  • Tool-handler exception → JSON-RPC result with isError: true and content: [{ type:"text", text: "Error: ..." }]. Feed content straight back as the role:"tool" turn body. Model-recoverable.
  • Baseline isError: false on actual failure (PHASE2-baseline.md §3 found this — boltzmann's read_file returns content text containing "Error: ..." but isError: false). Pass content through unchanged — let the model read the text. isError is advisory, not authoritative.
  • JSON-RPC envelope error (e.g. {code: -32601, message: "Tool not found"}) → synthesize a role:"tool" turn with content = "[aish] tool dispatch failed: <error.message>" and the matching tool_call_id. Also surface a status line for the user. This both keeps alternation legal and tells the model what happened so its next plan is informed.
  • HTTP-level failure (auth, unreachable, timeout) → same shape: synthesize a role:"tool" turn with content = "[aish] tool transport error: <reason>". Same alternation rationale.

This split resolves Q21 (with the C5/C7 review fix folded in).

Lifecycle

  • Connect on startup (from config.mcp.servers) — best effort; failures are status-logged once, don't abort aish, and the session is absent from mcp_sessions until manually reconnected via :mcp connect. No automatic retry. "Connect" here means: do the initialize round-trip + cache tools/list results.
  • :mcp connect <url> adds a session at runtime; alias auto-derived from hostname or supplied as second arg.
  • :mcp disconnect <alias> drops cached state. There's no long-lived HTTP connection to close (every RPC was already Connection: close).
  • On aish quit, sessions are just forgotten — nothing to clean up server-side.
  • An unreachable server simply contributes no tools to the broker request body — the model is not told that tools were "meant" to be available. If tools_schema() returns empty across all sessions, the broker omits the tools field entirely.

5. Tool-Call Bridge

Broker request body (delta from Phase 1)

{
  "model": "...",
  "messages": [...],
  "stream": true,
  "temperature": 0.2,
  "tools": [
    { "type":"function",
      "function": { "name":"<alias>.<tool>",
                    "description":"...",
                    "parameters": <inputSchema> } },
    ...
  ]
}

The tools array is assembled by mcp.tools_schema() — flattens tools/list results from every connected session, namespacing each tool as <alias>.<name>.

Response handling (streaming)

llama.cpp / OpenAI deltas may include:

data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_…",
                "function":{"name":"alias.tool","arguments":"{\"a\":"}}]}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,
                "function":{"arguments":"1}"}}]}}]}
data: {"choices":[{"finish_reason":"tool_calls",...}]}

broker.chat_stream accumulates tool-call deltas keyed by index; the arguments field is a JSON-string that arrives chunked and is concatenated. On finish_reason: tool_calls, the accumulated calls are emitted to on_delta as kind="tool_call" with full payloads.

Index-absent fallback: per the OpenAI spec, index is REQUIRED on streaming tool_calls[] deltas — but some local llama.cpp builds have been reported to omit it for single-call streams. If a delta has tool_calls but no index, treat it as index = 0 and accumulate into the slot-0 buffer. Log a one-shot debug status the first time this is observed per stream.

Re-injection into context

The assistant turn carries whatever text was streamed before finish_reason: tool_calls (which may be non-empty — models often say "Sure, let me look that up" before calling). The renderer flushes that text first, then renders the tool-call frame around dispatch.

-- After tool execution
ctx:append({
  role = "assistant",
  content = accumulated_text,    -- may be "" if model emitted no prose
  tool_calls = { {id="call_…", name="alias.tool", arguments=<json-string>} },
})
ctx:append({
  role = "tool",
  tool_call_id = "call_…",
  content = <tool-result-text-or-synthesized-error>,
})

to_messages() renders both shapes for the next broker request. The strict-alternation issue from PHASE0.md §6 (mistral-nemo Jinja) is handled differently here — tool turns ARE expected to follow assistant tool_calls per the OpenAI chat-template convention. If a model's template still rejects this shape, fall back to the [tool: X] prefix strategy used for exec output (Q18 below — fallback is plumbed via the context.use_tool_role flag; default true).

Re-issuing the broker request

After tool turns are appended, the broker is called again with the extended messages array. The model may emit more tool_calls, more text, or both. Loop until the response has no tool_calls (i.e. a plain text assistant turn).

Budget: a max-tool-call-depth setting (default 8) prevents runaway loops. Hit-cap surfaces as a status: [aish] tool-call depth limit reached.


6. Authorization (safety.lua Phase 2 surface)

-- safety.confirm_tool_call(tool_name, args_table, config) -> bool
function M.confirm_tool_call(name, args, cfg)
    local policy = cfg.mcp and cfg.mcp.auto_approve or {}
    if policy[name] then return true end
    -- Per-server prefix check: "alias.*" entries
    local alias = name:match("^([^.]+)%.")
    if alias and policy[alias .. ".*"] then return true end
    -- Otherwise prompt
    local pretty = name .. "(" .. (#args > 0 and "..." or "") .. ")"
    local ans = rl.readline(("call '%s'? [y/N] "):format(pretty)) or ""
    return ans:lower():sub(1,1) == "y"
end

Config schema (analyze-revised — Bearer auth fields added):

mcp = {
    servers = {
        boltzmann = {
            url       = "http://boltzmann.fritz.box:8080/mcp",
            auth_env  = "BOLTZMANN_MCP_TOKEN",  -- read from env at startup
        },
        broglie = {
            url = "http://broglie.fritz.box:8080/mcp",
            -- no auth (LAN-only deployment)
        },
        nc = {
            url        = "https://nc.reauktion.de:8080/mcp",
            auth_token = "literal-token-if-not-using-env",  -- alternative
        },
    },
    auto_approve = {
        ["boltzmann.read_file"] = true,    -- specific tool
        ["broglie.*"]           = true,    -- whole server
    },
    max_tool_depth = 8,
}

Auth precedence per server: auth_token literal > auth_env indirection

nil (no Authorization header sent). Mirrors PHASE0 §10's key_env convention for cloud model API keys.

Norris mode (Phase 3) will extend this: when autonomous, the destructive-op heuristic decides; for non-destructive tools, auto_approve. Outside scope here.


7. Meta Commands (Phase 2 additions)

Command Action
:mcp connect <url> [<alias>] Open a session; perform initialize + tools/list; add to active set
:mcp disconnect <alias> Close one session
:mcp list Show connected sessions (alias, url, tool count, status)
:mcp tools List tools across all sessions (alias.name — short description)
:mcp tool <alias.name> Show one tool's full inputSchema (debug aid)

Existing :help updated to list these.


8. System Prompt Augmentation

broker.lua's default system prompt grows by ~4 lines:

You may have access to MCP tools — they appear in this request's `tools`
field. Call a tool by emitting a tool_call; the result will be supplied
in the next turn. Use tools for structured operations (file reads,
queries, etc.) and `CMD:` lines for local shell commands. Prefer tools
when available; fall back to `CMD:` for anything not exposed as a tool.

The actual tool list is in the tools request-body field, not the prompt. This avoids per-turn token bloat for the full schema.

§3 substrate invariants are unchanged. The CMD: extraction marker stays the local-shell route; tools are the additive structured route.


9. Migration from Phase 1

User-visible changes:

  • New :mcp … meta commands when MCP servers are configured or connected at runtime.
  • Assistant responses may now invoke tools — user sees a confirm prompt (similar to CMD: execution gate) followed by an indented tool-call frame with the result.
  • CMD: lines still work exactly as before for shell.

Substrate (PHASE0.md §3) invariants: unchanged. Module layout (§4) amended to add mcp.lua (no rename of any existing file). Adding a new file is additive and preserves the §3 module-stability invariant ("File names are stable across phases — later phases fill in bodies, not rename files"). The amendment ships in commit #1 of the §12 plan (C6 in the review).

config.lua: existing configs without an mcp section continue to work — no MCP servers means no tools sent in the broker request body, no auth checks, no behavior change.


10. Out of Scope (Phase 2)

Per PHASE0.md §11, these belong elsewhere:

  • Chuck Norris autonomous mode (Phase 3) — even though tool-calls enable richer autonomy, the autonomous policy is Phase 3's.
  • Destructive-op heuristic in safety.lua (Phase 3) — Phase 2 only implements the per-call confirm-prompt surface.
  • memory.jsonl summarization across sessions (Phase 4).
  • Multi-model routing / cloud fallback (Phase 5).
  • Tree-sitter syntax highlighting (Phase 6).

Specifically out of Phase 2 scope despite proximity:

  • Stdio-transport MCP servers (Q17 below).
  • Parallel tool-call dispatch (Q20).
  • MCP resources/list and prompts/list capabilities — Phase 2 v1 only implements tools/*. Resources/prompts deferred (probably Phase 4 alongside memory).
  • Server-sent notifications/progress for long-running tool calls — ignored in v1; status surface comes later.

11. Open Questions

# Question Impact Resolve by
Q17 MCP transport abstraction: stdio vs HTTP+SSE mcp.lua API shape Resolved at analyze. Hard-code POST-only HTTP for v1. lmcp doesn't use the long-lived SSE channel and listChanged: false removes any v1 need for it. Stdio transport tracked as Phase 2.1 / out-of-scope here.
Q18 Tool-result re-injection: standard OpenAI role:"tool" turn, or [tool: X] prefix to next user turn (matching the §6 exec-output pattern)? context.lua + broker.lua Partly resolved. Live probe (2026-05-12, hossenfelder) shows role:"tool" accepted by the proxy + the loaded model (qwen2.5-coder-1.5b). Mistral-nemo-specific template testing is blocked by the hossenfelder proxy routing all model field values to the loaded fast model — see open-end below. Default v1 path: role:"tool" (standard); fallback to [tool: X] prefix is plumbed but unused unless a strict template rejects it during Phase 7 verify.
Q19 Large tool-result payloads: pass-through, truncate at N chars, or summarize via fast model? context.lua + executor of tool-result Phase 2 (plan); Phase 4 may refine with memory.jsonl
Q20 Parallel tool_calls: sequential v1 is safe; spec allows parallel. Move to parallel when both calls are read-only? mcp.lua dispatch Phase 2 (verify) — track for v2
Q21 MCP error mapping mcp.lua + broker.lua Resolved at analyze. lmcp distinguishes: result.isError=true (handler exception, model-recoverable, feed back as tool turn content) vs JSON-RPC error (unknown method/tool, transport-level, surface as aish status). See §4.
Q22 aish's own command surface as an MCP server scope expansion Out of Phase 2. Parked for Phase 4+ if interest stays.

Open-end carried forward to Phase 7 (verify):

  • Hossenfelder proxy model-field bug (separate from aish): the proxy at :8082 routes all requests to the loaded fast model regardless of the request's model field — chunks return "model":"qwen2.5-coder-1.5b-q4_k_m.gguf" even when mistral-nemo-12b-instruct was asked for. This blocks live-verification of mistral-nemo's chat-template tool-role behavior. Tracked as aish#23 (filed 2026-05-12 at review). Sibling to the SSE-buffering bug at aish#15 — both live in the boltzmann proxy code. Phase 7 needs at least #23 fixed to fully close Q18.

Resolved at formulate (above in §2 table):

  • Q6 (CMD: vs tools coexistence) — both, no policy preference, substrate unchanged.
  • Q7 (MCP discovery) — both, config-declared default + runtime :mcp connect.
  • Q8 (authorization) — per-call confirm default, per-tool/per-server auto_approve policy.
  • Q9 (system-prompt augmentation) — hybrid: static frame + dynamic tools body field.
  • Q10 (tool-call streaming) — streaming-from-day-one on top of Phase 1 SSE.

Resolved at analyze (2026-05-12, live probes vs lmcp v0.5.4 + hossenfelder):

  • Q17 (transport abstraction) — POST-only, no SSE channel needed for lmcp.
  • Q21 (error mapping) — isError vs JSON-RPC error split per §4.

12. Implementation Plan (commit-by-commit)

Bottom-up — start with modules with the fewest dependencies, end with the REPL wiring that exercises everything together. Same shape as Phase 0 and Phase 1 implementation cadence.

Order

  1. mcp.lua (new file) — JSON-RPC client. M.connect(url, opts), session:initialize() + :list_tools() + :call_tool(name, args) + :close(). Uses Phase 1's ffi/curl.M.post for transport — same commit lands the M.post extension to return (body, status_code) per §3 row so mcp.lua can distinguish HTTP 401 (non-JSON-RPC body {"error":"unauthorized"}) from JSON-RPC envelope errors. Per-server Bearer auth (auth_token literal or auth_env indirection). :call_tool returns (result_table, kind) where kind ∈ {"ok","handler_error","rpc_error"} so callers route per §4. Test in isolation via luajit -e 'local mcp=require("mcp"); local s=mcp.connect("http://boltzmann.fritz.box:8080/mcp",{auth_env="BOLTZMANN_MCP_TOKEN"}); s:initialize(); print(#s:list_tools())'. Also amends PHASE0.md §4 to list mcp.lua between broker.lua and router.lua in the same commit (additive — preserves §3 module-stability invariant per §9).

  2. safety.lua — confirm-gate surface. Implement just M.confirm_tool_call(name, args, cfg) per §6. Reads cfg.mcp.auto_approve for exact-match and alias.* glob. Falls back to rl.readline prompt. Norris-mode hooks stay out (Phase 3). Test in isolation with mocked rl + various policy shapes.

  3. context.lua extensions. Three concrete edits per §3 row: (a) loosen Context:append's assert from "content required" to shape-per-role (assistant may have empty content if tool_calls present; tool requires tool_call_id + content); (b) preserve tool_calls / tool_call_id in stored turns (not just role+content); (c) extend to_messages() to emit those fields. Add alternation assert (N4 in review). pending_exec_output is unchanged: buffer persists across tool-call sub-loops; flushes on next genuine user turn (§3 row). Tests in isolation: (i) build a context with assistant+tool_calls + tool turns, round-trip through to_messages(), eyeball JSON shape; (ii) day-one fallback test (N8) — same context with use_tool_role = false must emit the [tool: alias.name]\n… prefix shape instead of a role:"tool" message.

  4. renderer.lua extensions. Add M.tool_call_begin(name, args) (top rule + name(json-snippet) indented dim) and M.tool_call_end(content, is_error) (bottom rule with dim/red status). Visual parity with the exec frame. Test visually with a one-liner.

  5. broker.lua extensions. Signature widens: chat_stream(cfg, msgs, on_delta, opts). opts.tools (optional array) is passed through to the request body; omitted entirely when nil or empty. The on_delta callback widens to on_delta(kind, payload) where kind ∈ {"text","tool_call"}. Text path unchanged. Tool-call path: accumulator keyed by index (default 0 if absent — C2), concatenates function.arguments until finish_reason: "tool_calls", then emits one on_delta("tool_call", {id,name,arguments}) per completed call. M.chat shape unchanged in this phase (C1 in review — no caller for a polymorphic return). Test against hossenfelder with tools declared + streaming.

  6. repl.lua wiring. New module-local mcp_sessions = {alias=session,...}, populated from config.mcp.servers at startup. Helpers:

    • tools_schema() → flatten tool lists across sessions, namespace alias.name
    • dispatch_tool_call(call) → split alias.tool, look up session, call, return content
    • ask_ai loop now: stream response → if any tool_calls completed, for each call: safety.confirm_tool_calldispatch_tool_call → append assistant-with-tool_calls + tool turn → re-call broker.chat_stream → repeat until pure-text response or max_tool_depth reached
    • New meta cmds: :mcp list, :mcp tools, :mcp tool <name>, :mcp connect <url> [alias], :mcp disconnect <alias> End-to-end test via the REPL against a real boltzmann lmcp + hossenfelder broker.
  7. config.lua example block. Add a commented-out mcp = { servers = { boltzmann = {...} }, auto_approve = {...} } example so users can see the shape. Not behavior-impacting; documentation only. Bundled with commit #6 if small or split if substantial.

Risk / non-obvious

  • Empty tools array. If config.mcp.servers is absent or all connects fail, the broker request body must omit tools entirely (some servers reject "tools": []). Don't send the field when empty.

  • Connect-at-startup blocking. N servers × ~30 ms init+list. For N ≤ 3 (typical) the 90 ms is acceptable. Failures are status-logged per server, don't abort aish. Parallel via coroutines is out of scope here — sequential is fine for v1.

  • Content blocks beyond text. lmcp returns [{type:"text", text:...}]. The spec allows type:"image" | "resource". Phase 2 v1 flattens by concatenating all text blocks and ignoring non-text. Log a status warning if non-text blocks are seen. Adequate for boltzmann/hertz tools (all text); image/resource tools deferred.

  • isError: false on actual failure (baseline finding §3 of PHASE2-baseline.md). Pass content through unchanged; let the model read the error text. Do NOT short-circuit on the flag.

  • JSON-RPC error from tools/call. Surface as aish status AND synthesize a role:"tool" turn with content = "[aish] tool dispatch failed: <error.message>" and the matching tool_call_id. The alternation rationale (§4) requires this — leaving the assistant's tool_calls orphaned breaks strict chat templates exactly the way PHASE0.md §6 warned about. The model receives the error and can re-plan within the same turn.

  • Tool-call sub-loop bounds. max_tool_depth (default 8) per ask_ai invocation. When hit, surface as status and break — append the assistant's last text (if any) and let the user reply.

  • Argument JSON might be invalid. A model can stream malformed JSON in function.arguments. dkjson.decode failure → DO NOT execute on partial parse. Synthesize a role:"tool" turn with content = "[aish] tool arguments not parseable as JSON: <decode-err>" and the matching tool_call_id (same alternation rationale as JSON-RPC error above; C7 in review).

  • Q18 fallback path (strict templates rejecting role:"tool"). Plumb a context.use_tool_role flag (default true). If a real-world rejection appears at Phase 7, flip the flag and convert tool turns to [tool: alias.name]\n<content> prefix on the next user turn (same pattern as pending_exec_output). Day-one verification (N8 in review): commit #3 includes a small in-isolation test that builds a context with use_tool_role = false, appends an assistant+tool_calls turn followed by a tool result, and confirms to_messages() emits the prefix shape instead of a role:"tool" turn. Keeps the fallback alive rather than dead-coded until Phase 7 first runs it under stress.

Test checkpoints

After each commit, verify with a targeted probe before moving on:

Commit Verify
#1 mcp.lua luajit -e "local m=require('mcp'); ..." connects + lists tools against boltzmann lmcp
#2 safety.lua unit-test policy lookup with mock rl: exact match → true; * glob → true; miss → prompt invoked
#3 context.lua (i) round-trip a context with tool turns through to_messages(), eyeball JSON shape; (ii) day-one fallback test with use_tool_role = false emits the [tool: …] prefix shape (N8)
#4 renderer.lua one-liner emits frame around fake tool result
#5 broker.lua curl-compare: hand-built request body with tools matches broker.chat_stream(cfg, msgs, on_delta) body
#6 repl.lua full REPL: :mcp list shows boltzmann; question that triggers list_dir round-trips through confirm + execution + model continuation
#7 config.lua aish starts with example mcp section present; no MCP servers connected means no tools field sent

Commits expected: 7 (commit #1 carries the PHASE0.md §4 amendment)

Per Phase 1's cadence (10 commits + 1 BLOCKER fix), Phase 2 is smaller in surface — single new file plus targeted extensions. Tracked to land in one working session if the boltzmann proxy bugs don't intrude.

Resolved at review (2026-05-12)

  • Q18 defaultuse_tool_role = true defaulted, fallback exercised day-one in commit #3 test (ii) so it's not dead code. Phase 7 flips if mistral-nemo (once #23 is fixed) rejects.
  • :mcp connect re-fetch policy — v1 trusts the listChanged: false capability; manual disconnect+reconnect is the workaround if a server's tools change. No automatic re-fetch.

Review fold-in (2026-05-12, all BLOCKERs + relevant CONCERNs/NITs)

Independent review surfaced 5 BLOCKERs / 7 CONCERNs / 8 NITs against the formulate+analyze+plan draft. Resolutions applied in this revision:

  • B1 context.lua impact widened — assert loosening + field preservation + to_messages emit are now explicit in §3.
  • B2 ffi/curl.M.post extended to return (body, status_code) so mcp.lua distinguishes HTTP 401 from JSON-RPC envelope errors.
  • B3 inputSchema typo fixed in §3 mcp.lua row.
  • B4 pending_exec_output × tool-call sub-loop interaction specified (persists across; flushes on next user turn).
  • B5 §3/§12 dependency contradiction resolved — broker takes opts.tools from the caller; no layering inversion.
  • C1 M.chat polymorphic return dropped.
  • C2 Index-absent fallback specified (default to 0).
  • C3 Re-injection example now stores accumulated text in the assistant turn, not hard-coded empty string.
  • C4 :mcp connect failure semantics specified (no auto-retry).
  • C5/C7 Both orphan-tool_calls scenarios now synthesize a role:"tool" turn with [aish] tool dispatch failed: ... content to preserve alternation.
  • C6 §9 explicitly notes the §4 amendment is additive.
  • N3 protocolVersion fallback specified (lmcp doesn't negotiate).
  • N4 alternation assert added to context.lua row.
  • N7 model-routing bug filed as aish#23.
  • N8 day-one fallback test added to commit #3 checkpoints.

CONCERNs / NITs not folded (defended as wording-only, not load-bearing): N1, N2, N5, N6 — left as-is.


End of Phase 2 Manifest — aish