marfrit d3570ccea4 docs/PHASE2: review fold-in — 5 BLOCKERs + 7 CONCERNs + key NITs

Independent review of the formulate+analyze+plan draft surfaced design
gaps that would have shipped as silent bugs. Resolutions applied:

BLOCKERs:
  B1 context.lua impact widened — Phase 1 :append asserts content and
     discards extra fields. Need (a) shape-per-role assert, (b) preserve
     tool_calls/tool_call_id on store, (c) emit from to_messages().
  B2 ffi/curl.M.post extended to return (body, status_code). lmcp's
     401 returns a non-JSON-RPC body that would have been mis-decoded.
  B3 §3 typo schema -> inputSchema.
  B4 pending_exec_output × tool-call sub-loop interaction specified.
  B5 §3/§12 broker dependency contradiction — broker takes opts.tools
     from caller; no layering inversion.

CONCERNs:
  C1 M.chat return polymorphism dropped (no consumer).
  C2 tool_calls[].index absent fallback: default to 0.
  C3 Re-injection stores accumulated text, not hard-coded empty.
  C4 :mcp connect failure: no auto-retry, status-log once.
  C5/C7 JSON-RPC error AND argument-parse failure both synthesize a
     role:"tool" turn — keeps strict-template alternation legal
     exactly the way PHASE0 §6 demanded for exec output.
  C6 §9 confirms §4 amendment is additive (preserves §3 invariant).

NITs:
  N3 protocolVersion fallback (lmcp doesn't negotiate).
  N4 Alternation assert in Context:append.
  N7 Model-routing bug filed as aish#23.
  N8 Day-one fallback test for use_tool_role=false in commit #3.

Manifest status: Plan (review folded). Status line and Resolutions
sections updated; commit-by-commit roadmap reflects revised specs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-12 13:00:07 +00:00


aish — Phase 2 Manifest

Project: aish — AI-augmented conversational shell
Document: Phase 2 Requirements, Architecture & Design Decisions
Status: Plan (review pass folded in 2026-05-12)
Date: 2026-05-12

PHASE0.md is the locked substrate; PHASE1.md is layered on top. This manifest specifies what Phase 2 adds. Section numbers reference back to PHASE0.md / PHASE1.md where relevant.

1. Scope of Phase 2

Three pillars per PHASE0.md §11 row 2:

MCP client (mcp.lua) — JSON-RPC 2.0 over HTTP+SSE transport. Target reference implementation: lmcp. Operations needed for v1: initialize, tools/list, tools/call. Multiple servers may be connected concurrently; tools are namespaced <server>.<tool>.
Tool-calling protocol bridge — the broker sends OpenAI-compatible tools in the request body; the model emits tool_calls in the response; mcp.lua dispatches each call to the right server; the tool result is fed back as a role:"tool" turn in context.lua and the chat continues.
Authorization gate — safety.lua (PHASE0.md §4 stub) finally gets implemented. Every tool call is confirmed by the user by default, with per-tool and per-server auto_approve policies in config.lua.

Phase 2 is done when:

aish can connect to at least one local lmcp server declared in config.lua and one connected via :mcp connect <url> at runtime.
:mcp list shows connected servers; :mcp tools shows discovered tools across all servers.
A model conversation can invoke a tool: the broker request carries the live tools schema; the response's tool_calls are confirmed by the user; each call dispatches to the right MCP server; the result re-enters the chat; the model continues with the result available.
CMD: extraction (PHASE0.md §6 substrate invariant) still works unchanged — Phase 2 is additive, not replacing.
A tool with auto_approve = true (in config) executes without the confirm prompt; a non-approved tool still prompts.

2. Technology Decisions (delta from Phase 1)

Decision	Choice	Rationale
MCP transport	HTTP POST per RPC, `Connection: close` per response, no long-lived SSE GET channel in v1	Analyze finding (2026-05-12): lmcp v0.5.4 only implements the trivial POST-and-respond flavor of the spec's streamable-HTTP transport. Its GET /mcp endpoint announces the POST endpoint then closes — there's no server→client notification channel to listen on. Combined with lmcp's `capabilities.tools.listChanged = false`, aish doesn't need an SSE GET listener at all for lmcp. Stdio transport is left for a possible Phase 2.1 if a stdio-only MCP server becomes necessary.
MCP protocol version	`2025-03-26` (confirmed by live probe of boltzmann:8080/mcp)	lmcp pins this in `MCP_VERSION` and does not negotiate — it returns its compiled-in version regardless of what the client sends (lmcp.lua:80-91). aish sends `2025-03-26` in `initialize` and accepts whatever the server returns; on mismatch it logs `[aish] mcp <alias>: protocol version mismatch (sent X, got Y); proceeding` and continues. v1 has no version-gated behavior to abort on.
MCP auth	Bearer token via `Authorization: Bearer <token>` header, per-server	Analyze finding: every lmcp deployment in mfritsche's fleet (boltzmann/hertz/pve*/nc/etc.) requires Bearer auth. Phase 2 config supports `auth_token` literal and `auth_env` env-var indirection per server (mirrors `key_env` in the models registry). lmcp servers without auth (broglie/higgs LAN-only) just leave the field nil.
Tool-call wire format	OpenAI `tools` field on `/v1/chat/completions` body; `tool_calls` on assistant deltas; `role:"tool"` turn with `tool_call_id` for results	Standard, supported by llama.cpp and OpenRouter. Aligns with the existing `/v1/chat/completions` substrate invariant.
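Tool namespacing	`<server-alias>.<tool-name>` for both the wire-level tool name and `:mcp tools` listing	Avoids name collisions across servers. The alias comes from the config key for declared servers; for `:mcp connect` it is the optional alias argument or, failing that, is derived from the URL's hostname (see §4 Lifecycle).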
`CMD:` coexistence with tool-calls	Both stay live, no policy preference. Substrate invariant §3 unchanged.	Resolves Q6 (see §11). `CMD:` is the local-shell route; MCP tools are structured-API routes; they serve different purposes. Future phases (Norris, Phase 3) may prefer tools when both are available, but Phase 2 doesn't enforce.
Authorization default	Per-call confirm (mirrors PHASE0.md §10 `confirm_cmd` for shell)	Conservative default; user can opt into auto-approval per tool or per server via config. Resolves Q8.
System prompt augmentation	Hybrid: static frame in `broker.lua` system prompt + dynamic `tools` array in the request body	Tool list goes in the API field where it belongs; the system prompt only mentions that tools exist and how to use them. Per-request body cost is bounded (tools change rarely; small schemas). Resolves Q9.
Tool-call streaming	Streaming-from-day-one — `broker.chat_stream`'s on_delta callback widens to handle `tool_calls` deltas in addition to text deltas	Resolves Q10. Phase 1 SSE landed first, so we're not retrofitting; we just extend the parser. Wire shape confirmed at analyze (2026-05-12 probe vs hossenfelder): `delta.tool_calls[]` arrives indexed; id+type+function.name appear on the opening delta; `function.arguments` is a JSON-string that arrives in character-fragment chunks; finish_reason "tool_calls" closes the call. Accumulator strategy matches §5.
Tool-call concurrency	Sequential dispatch in Phase 2 v1 — process `tool_calls[0]` to completion, then `[1]`, etc.	Simpler error handling; tool effects often order-dependent (e.g. write-then-read). Parallel dispatch deferred (see Q20).
MCP server lifecycle	aish does not manage MCP server processes (parallel to PHASE0.md §12 llama.cpp rule)	Declared in config or connected by URL; aish is a client only.

3. Module Changes

File	State after Phase 1	Phase 2 changes
`mcp.lua`	New file (not in PHASE0 §4 layout; this Phase amends the layout to add it)	Implement: `M.connect(url, opts) -> session` (opts: `alias`, `auth_token`, `auth_env`), `session:initialize()`, `session:list_tools() -> [{name, description, inputSchema}]`, `session:call_tool(name, args) -> (result_table, kind)` where `kind ∈ {"ok","handler_error","rpc_error"}` so callers can route the response per §4's error split, `session:close()`. JSON-RPC 2.0 over HTTP POST (`Content-Type: application/json`, `Accept: application/json`, `Authorization: Bearer <token>`). Per-session state: alias, base-url, auth, tools-cache, request-ID counter. No persistent SSE channel — POST is one-shot per RPC. Distinguishes HTTP-level failure (e.g. lmcp's `401 {"error":"unauthorized"}` body, which is NOT JSON-RPC-shaped — has no `jsonrpc`/`id` fields) from JSON-RPC envelope errors; needs `ffi/curl.M.post` extended to return status code (see ffi/curl.lua row).
`safety.lua`	Stub	Implement Phase 2 surface only: `M.confirm_tool_call(tool_name, args, policy) -> bool`. Reads `config.mcp.auto_approve` (per-tool and per-server) before prompting. Norris destructive-op heuristic and HALT gate stay Phase 3.
`broker.lua`	Streaming `chat_stream(cfg, msgs, on_delta)`	Signature widens to `chat_stream(cfg, msgs, on_delta, opts)`. `opts.tools` (optional array of `{type, function:{name, description, parameters}}`) is passed through to the request body; omitted entirely if absent or empty (some servers reject `"tools": []`). The on_delta callback widens to `on_delta(kind, payload)` where `kind ∈ {"text", "tool_call"}`. `broker.lua` does NOT depend on `mcp.lua` — repl assembles the tools array and passes it in; broker stays a transport layer. `M.chat` (non-streaming wrapper) is unchanged in this phase (no tool consumers go through it).
`context.lua`	turns = {{role, content}, ...} + `pending_exec_output`; `Context:append` asserts `turn.content` and rebuilds the entry as `{role, content}` only — extra fields are dropped	Three concrete edits: (a) loosen `:append` so `role == "assistant"` can carry `tool_calls = [{id, name, arguments}]` with `content` allowed empty, and `role == "tool"` requires `tool_call_id` + `content` (the assert moves from "content required" to "shape per role"); (b) preserve `tool_calls` and `tool_call_id` in the stored turn (not just role+content); (c) `to_messages()` emits `tool_calls` on assistant turns and `tool_call_id` on tool turns. Add a debug assertion that `role == "tool"` follows an assistant turn with non-empty `tool_calls` (catches design bugs early; N4 in review). `pending_exec_output` interaction: the buffer persists across the tool-call sub-loop (the loop is internal — no user input happens — so there's no append_user to flush against). It flushes on the next genuine user turn, regardless of how many tool-call iterations preceded.
`repl.lua`	meta cmds + ask_ai stream loop	After ask_ai sees `tool_calls`, enter a tool-execution sub-loop: confirm-gate each call via `safety.confirm_tool_call`, dispatch via `mcp.session:call_tool`, append tool turn to context, re-issue the broker request. Loop until assistant emits text without tool_calls. New meta: `:mcp connect <url> [alias]`, `:mcp list`, `:mcp tools`, `:mcp disconnect <alias>`.
`renderer.lua`	streaming text + exec frame	Add `tool_call_begin(name, args)`, `tool_call_end(result, ok)`. Visual style: indented, dim, parallel to the exec frame.
`config.lua`	example with models/shell/context/history	Schema additions: `mcp = { servers = { alias = { url = "..." } }, auto_approve = { ["alias.tool"] = true } }`. Documented in §6 below.
`ffi/curl.lua`	post + post_sse; `M.post` does not set `FAILONERROR`, so non-2xx responses return the body as a normal string	One small extension: `M.post` returns `(body, status_code)` instead of just `body` (or surfaces status via a second slot). MCP auth failures from lmcp arrive as HTTP `401` with a non-JSON-RPC body (`{"error":"unauthorized"}`); `mcp.lua` must distinguish HTTP-level failure from JSON-RPC envelope errors. Phase 1 callers (broker.chat in its current shape) are unaffected — they ignore the second return value. No SSE GET channel is added (analyze finding ruled it out for lmcp).
`history.lua`	JSONL session log	Tool turns are logged like any other turn — `{role:"tool", tool_call_id:"...", content:"..."}`. Resume reconstructs them via `ctx:append` like user/assistant turns.

§4 module-layout amendment: mcp.lua slots between broker.lua and router.lua in the §4 table. Same commit lands the manifest amendment.
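
The shape-per-role assert described in the `context.lua` row can be sketched as a standalone function. This is only an illustration: `shape_turn` is a hypothetical name, and allowing a tool turn to follow another tool turn (so several results can answer one assistant turn) is an assumption that goes slightly beyond the N4 wording.

```lua
-- Hypothetical helper: validate a turn by role and return the stored shape.
-- `prev` is the previously stored turn (nil for the first turn).
local function shape_turn(turn, prev)
  assert(type(turn) == "table" and type(turn.role) == "string", "turn needs a role")
  local has_calls = type(turn.tool_calls) == "table" and #turn.tool_calls > 0

  if turn.role == "assistant" and has_calls then
    -- content may be empty or absent when tool_calls are present
  elseif turn.role == "tool" then
    assert(type(turn.tool_call_id) == "string", "tool turn needs tool_call_id")
    assert(type(turn.content) == "string", "tool turn needs content")
    -- Alternation check (N4). Accepting a preceding tool turn is an assumption.
    local follows_calls = prev ~= nil and (prev.role == "tool"
      or (prev.role == "assistant"
          and type(prev.tool_calls) == "table" and #prev.tool_calls > 0))
    assert(follows_calls, "tool turn must follow an assistant turn with tool_calls")
  else
    assert(type(turn.content) == "string", "turn needs content")
  end

  -- Preserve the extra fields instead of rebuilding as {role, content} only.
  return {
    role = turn.role,
    content = turn.content or "",
    tool_calls = has_calls and turn.tool_calls or nil,
    tool_call_id = turn.role == "tool" and turn.tool_call_id or nil,
  }
end
```

Returning a fresh table keeps the Phase 1 habit of normalizing stored turns while no longer dropping `tool_calls` and `tool_call_id`.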

4. MCP Transport (analyze findings — lmcp v0.5.4)

lmcp implements only the synchronous POST flavor of the MCP streamable-HTTP spec. Each RPC is one HTTP transaction:

client → server:   POST /mcp           Content-Type: application/json
                                       Accept: application/json
                                       Authorization: Bearer <token>
                                       Body: { jsonrpc:"2.0", id, method, params }
server → client:   HTTP response       Connection: close
                                       Body: { jsonrpc, id, result | error }

lmcp's GET /mcp exists but only sends a one-shot event: endpoint announcing the POST URL, then closes — there is no held-open server→client channel. Combined with the listChanged: false capability lmcp announces in initialize, aish does not open a persistent SSE channel to lmcp servers in v1. Notifications-from-server are out of scope here; track for v2 if a richer server appears.
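
A minimal sketch of how the per-RPC envelope and headers described above might be assembled. The names `new_session` and `build_request` and the session field names are hypothetical; JSON encoding and the actual POST are left to the caller.

```lua
-- Hypothetical per-session state: alias, URL, optional Bearer token, request-ID counter.
local function new_session(alias, url, token)
  return { alias = alias, url = url, token = token, next_id = 0 }
end

-- Build the JSON-RPC 2.0 envelope (as a Lua table) plus the HTTP header list.
-- A notification is a request without an `id`, so it does not consume a counter value.
local function build_request(session, method, params, is_notification)
  local headers = {
    "Content-Type: application/json",
    "Accept: application/json",
  }
  if session.token then
    headers[#headers + 1] = "Authorization: Bearer " .. session.token
  end

  local envelope = { jsonrpc = "2.0", method = method, params = params }
  if not is_notification then
    session.next_id = session.next_id + 1
    envelope.id = session.next_id
  end
  return envelope, headers
end
```

Leaving `params` as nil when absent sidesteps the question of how the JSON encoder serializes an empty Lua table.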

Handshake

initialize request: { protocolVersion: "2025-03-26", capabilities: {}, clientInfo: { name: "aish", version: "..." } }.
Server response (lmcp): { protocolVersion: "2025-03-26", capabilities: { tools: { listChanged: false } }, serverInfo: { name, version } }.
Version mismatch: lmcp ignores client's protocolVersion and always returns its compiled-in MCP_VERSION (lmcp.lua:80-91). aish accepts whatever lmcp returns; on mismatch it logs a status ([aish] mcp <alias>: protocol version mismatch (sent X, got Y); proceeding) and continues. v1 has no version-gated behavior.
notifications/initialized POST (one-way; lmcp returns HTTP 202 with no body).
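
The handshake parameters and the mismatch status line can be sketched as follows. `initialize_params` and `version_status` are hypothetical helper names; the log format is the one quoted in the handshake notes.

```lua
local CLIENT_PROTOCOL = "2025-03-26"

-- Params for the `initialize` request. An empty Lua table is ambiguous
-- between {} and [] for some JSON encoders, so check how yours treats
-- `capabilities` before relying on this.
local function initialize_params(client_version)
  return {
    protocolVersion = CLIENT_PROTOCOL,
    capabilities = {},
    clientInfo = { name = "aish", version = client_version },
  }
end

-- Returns the status line to log on mismatch, or nil when the versions agree.
-- The caller proceeds either way; v1 has no version-gated behavior.
local function version_status(alias, sent, result)
  local got = result and result.protocolVersion
  if got == sent then return nil end
  return ("[aish] mcp %s: protocol version mismatch (sent %s, got %s); proceeding"):format(alias, sent, tostring(got))
end
```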

Tool discovery

tools/list RPC → { tools: [{ name, description, inputSchema }] }.
Cache per-session for the session lifetime — lmcp announces listChanged: false, so there's no need to refetch or listen for change notifications.

Tool invocation

tools/call with { name, arguments }. Failure has four flavors, and all of them result in a role:"tool" turn being appended so the assistant's tool_calls is never left orphaned in context (strict templates reject assistant.tool_calls without a matching tool reply, the same gotcha PHASE0.md §6 warned about):

Tool-handler exception → JSON-RPC result with isError: true and content: [{ type:"text", text: "Error: ..." }]. Feed content straight back as the role:"tool" turn body. Model-recoverable.
Baseline isError: false on actual failure (PHASE2-baseline.md §3 found this — boltzmann's read_file returns content text containing "Error: ..." but isError: false). Pass content through unchanged — let the model read the text. isError is advisory, not authoritative.
JSON-RPC envelope error (e.g. {code: -32601, message: "Tool not found"}) → synthesize a role:"tool" turn with content = "[aish] tool dispatch failed: <error.message>" and the matching tool_call_id. Also surface a status line for the user. This both keeps alternation legal and tells the model what happened so its next plan is informed.
HTTP-level failure (auth, unreachable, timeout) → same shape: synthesize a role:"tool" turn with content = "[aish] tool transport error: <reason>". Same alternation rationale.

This split resolves Q21 (with the C5/C7 review fix folded in).
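
One way to express the split above is a single function that always yields a role:"tool" turn. The `kind` values "ok", "handler_error" and "rpc_error" come from the §3 `mcp.lua` row; "transport_error" is a hypothetical fourth value for HTTP-level failures, and `tool_turn` itself is an illustrative name.

```lua
-- payload: flattened result text for "ok"/"handler_error",
--          error.message for "rpc_error", a reason string otherwise.
local function tool_turn(call_id, kind, payload)
  local content
  if kind == "ok" or kind == "handler_error" then
    -- isError is advisory: both cases pass the tool's text through unchanged.
    content = payload
  elseif kind == "rpc_error" then
    content = "[aish] tool dispatch failed: " .. tostring(payload)
  else
    -- Assumed "transport_error": auth failure, unreachable server, timeout.
    content = "[aish] tool transport error: " .. tostring(payload)
  end
  return { role = "tool", tool_call_id = call_id, content = content }
end
```

Because every branch returns a turn with the matching `tool_call_id`, the caller never has to decide whether the assistant's tool_calls were answered.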

Lifecycle

Connect on startup (from config.mcp.servers) — best effort; failures are status-logged once, don't abort aish, and the session is absent from mcp_sessions until manually reconnected via :mcp connect. No automatic retry. "Connect" here means: do the initialize round-trip + cache tools/list results.
:mcp connect <url> adds a session at runtime; alias auto-derived from hostname or supplied as second arg.
:mcp disconnect <alias> drops cached state. There's no long-lived HTTP connection to close (every RPC was already Connection: close).
On aish quit, sessions are just forgotten — nothing to clean up server-side.
An unreachable server simply contributes no tools to the broker request body — the model is not told that tools were "meant" to be available. If tools_schema() returns empty across all sessions, the broker omits the tools field entirely.

5. Tool-Call Bridge

Broker request body (delta from Phase 1)

{
  "model": "...",
  "messages": [...],
  "stream": true,
  "temperature": 0.2,
  "tools": [
    { "type":"function",
      "function": { "name":"<alias>.<tool>",
                    "description":"...",
                    "parameters": <inputSchema> } },
    ...
  ]
}

The tools array is assembled by mcp.tools_schema() — flattens tools/list results from every connected session, namespacing each tool as <alias>.<name>.
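
A sketch of that flattening, assuming each session caches its tools/list result in a `tools` field (a hypothetical field name). Returning nil for an empty list is also an assumption, chosen so the caller can omit the `tools` field entirely.

```lua
-- sessions: { alias = { tools = { {name, description, inputSchema}, ... } }, ... }
local function tools_schema(sessions)
  -- Sort aliases so the request body is stable across runs.
  local aliases = {}
  for alias in pairs(sessions) do aliases[#aliases + 1] = alias end
  table.sort(aliases)

  local tools = {}
  for _, alias in ipairs(aliases) do
    for _, t in ipairs(sessions[alias].tools or {}) do
      tools[#tools + 1] = {
        type = "function",
        -- `function` is a reserved word in Lua, hence the bracketed key.
        ["function"] = {
          name = alias .. "." .. t.name,
          description = t.description,
          parameters = t.inputSchema,
        },
      }
    end
  end

  if #tools == 0 then return nil end  -- caller omits "tools" from the body
  return tools
end
```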

Response handling (streaming)

llama.cpp / OpenAI deltas may include:

data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_…",
                "function":{"name":"alias.tool","arguments":"{\"a\":"}}]}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,
                "function":{"arguments":"1}"}}]}}]}
data: {"choices":[{"finish_reason":"tool_calls",...}]}

broker.chat_stream accumulates tool-call deltas keyed by index; the arguments field is a JSON-string that arrives chunked and is concatenated. On finish_reason: tool_calls, the accumulated calls are emitted to on_delta as kind="tool_call" with full payloads.

Index-absent fallback: per the OpenAI spec, index is REQUIRED on streaming tool_calls[] deltas — but some local llama.cpp builds have been reported to omit it for single-call streams. If a delta has tool_calls but no index, treat it as index = 0 and accumulate into the slot-0 buffer. Log a one-shot debug status the first time this is observed per stream.
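
The accumulator, including the index-absent fallback, might look like the sketch below. It works on already-decoded `delta.tool_calls` arrays; the function names and the `index_missing` flag (which the caller could use for the one-shot debug status) are hypothetical.

```lua
local function new_accumulator()
  return { slots = {}, order = {}, index_missing = false }
end

-- Feed one decoded delta.tool_calls array into the accumulator.
local function feed(acc, delta_tool_calls)
  for _, d in ipairs(delta_tool_calls) do
    local idx = d.index
    if idx == nil then
      idx = 0                   -- fallback: treat as slot 0
      acc.index_missing = true  -- caller logs once per stream
    end
    local slot = acc.slots[idx]
    if not slot then
      slot = { arguments = {} }
      acc.slots[idx] = slot
      acc.order[#acc.order + 1] = idx
    end
    if d.id then slot.id = d.id end
    local fn = d["function"]
    if fn then
      if fn.name then slot.name = fn.name end
      -- arguments is a JSON string arriving in fragments; collect, join later
      if fn.arguments then slot.arguments[#slot.arguments + 1] = fn.arguments end
    end
  end
end

-- Call on finish_reason == "tool_calls": returns completed calls in index order.
local function finish(acc)
  table.sort(acc.order)
  local calls = {}
  for _, idx in ipairs(acc.order) do
    local s = acc.slots[idx]
    calls[#calls + 1] = { id = s.id, name = s.name, arguments = table.concat(s.arguments) }
  end
  return calls
end
```

The joined `arguments` string is still undecoded JSON at this point; parsing it (and handling a parse failure) belongs to the dispatch step.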

Re-injection into context

The assistant turn carries whatever text was streamed before finish_reason: tool_calls (which may be non-empty — models often say "Sure, let me look that up" before calling). The renderer flushes that text first, then renders the tool-call frame around dispatch.

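-- After tool execution. `call` is one completed tool call from the broker;
-- `result_text` is the tool result text or a synthesized error string.
ctx:append({
  role = "assistant",
  content = accumulated_text,    -- may be "" if model emitted no prose
  tool_calls = { { id = call.id, name = call.name, arguments = call.arguments } },  -- arguments stays a JSON string
})
ctx:append({
  role = "tool",
  tool_call_id = call.id,        -- must match the id on the assistant turn
  content = result_text,
})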

to_messages() renders both shapes for the next broker request. The strict-alternation issue from PHASE0.md §6 (mistral-nemo Jinja) is handled differently here — tool turns ARE expected to follow assistant tool_calls per the OpenAI chat-template convention. If a model's template still rejects this shape, fall back to the [tool: X] prefix strategy used for exec output (Q18 below — fallback is plumbed via the context.use_tool_role flag; default true).

Re-issuing the broker request

After tool turns are appended, the broker is called again with the extended messages array. The model may emit more tool_calls, more text, or both. Loop until the response has no tool_calls (i.e. a plain text assistant turn).

Budget: a max-tool-call-depth setting (default 8) prevents runaway loops. Hit-cap surfaces as a status: [aish] tool-call depth limit reached.
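
The re-issue loop with its depth budget can be sketched independently of the real broker. Here `ask` stands in for one `broker.chat_stream` round and `run_call` for confirm plus dispatch; both are hypothetical callbacks. Counting depth as "rounds of tool execution per ask_ai invocation" is an assumed reading of the setting.

```lua
-- ask(messages)  -> { text = string, tool_calls = { {id, name, arguments}, ... } or nil }
-- run_call(call) -> string (content for the role:"tool" turn)
-- Returns final_text, status (status is nil unless the cap was hit).
local function tool_loop(messages, ask, run_call, max_depth)
  max_depth = max_depth or 8
  local depth = 0
  while true do
    local reply = ask(messages)
    local text = reply.text or ""
    local calls = reply.tool_calls or {}

    if #calls == 0 then
      -- Plain text assistant turn: the loop is done.
      messages[#messages + 1] = { role = "assistant", content = text }
      return text, nil
    end

    if depth >= max_depth then
      -- Cap hit: keep any prose, do not store the unanswered tool_calls.
      if text ~= "" then
        messages[#messages + 1] = { role = "assistant", content = text }
      end
      return text, "[aish] tool-call depth limit reached"
    end
    depth = depth + 1

    messages[#messages + 1] = { role = "assistant", content = text, tool_calls = calls }
    for _, call in ipairs(calls) do  -- sequential dispatch, in order
      messages[#messages + 1] = { role = "tool", tool_call_id = call.id, content = run_call(call) }
    end
  end
end
```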

6. Authorization (safety.lua Phase 2 surface)

-- safety.confirm_tool_call(tool_name, args_table, config) -> bool
function M.confirm_tool_call(name, args, cfg)
    local policy = cfg.mcp and cfg.mcp.auto_approve or {}
    if policy[name] then return true end
    -- Per-server prefix check: "alias.*" entries
    local alias = name:match("^([^.]+)%.")
    if alias and policy[alias .. ".*"] then return true end
    -- Otherwise prompt
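    -- args is a string-keyed table, so #args is always 0; use next() to test for emptiness
    local pretty = name .. "(" .. (next(args or {}) ~= nil and "..." or "") .. ")"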
    local ans = rl.readline(("call '%s'? [y/N] "):format(pretty)) or ""
    return ans:lower():sub(1,1) == "y"
end
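
For testing the policy lookup in isolation, the same logic can take the prompt function as a parameter. The extra `prompt` argument is an assumption made for this sketch and is not part of the signature above.

```lua
-- Variant of the confirm gate with an injectable prompt (hypothetical parameter).
local function confirm_tool_call(name, args, cfg, prompt)
  local policy = cfg.mcp and cfg.mcp.auto_approve or {}
  if policy[name] then return true end                      -- exact "alias.tool" match
  local alias = name:match("^([^.]+)%.")
  if alias and policy[alias .. ".*"] then return true end   -- whole-server "alias.*"
  local ans = prompt(("call '%s'? [y/N] "):format(name)) or ""
  return ans:lower():sub(1, 1) == "y"
end

-- Usage with a mock prompt that records whether it was invoked.
local asked = 0
local function deny(_) asked = asked + 1; return "n" end
local cfg = { mcp = { auto_approve = { ["boltzmann.read_file"] = true, ["broglie.*"] = true } } }

print(confirm_tool_call("boltzmann.read_file", {}, cfg, deny))   -- exact match, no prompt
print(confirm_tool_call("broglie.list_dir", {}, cfg, deny))      -- server glob, no prompt
print(confirm_tool_call("boltzmann.write_file", {}, cfg, deny))  -- miss, prompt invoked
```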

Config schema (analyze-revised — Bearer auth fields added):

mcp = {
    servers = {
        boltzmann = {
            url       = "http://boltzmann.fritz.box:8080/mcp",
            auth_env  = "BOLTZMANN_MCP_TOKEN",  -- read from env at startup
        },
        broglie = {
            url = "http://broglie.fritz.box:8080/mcp",
            -- no auth (LAN-only deployment)
        },
        nc = {
            url        = "https://nc.reauktion.de:8080/mcp",
            auth_token = "literal-token-if-not-using-env",  -- alternative
        },
    },
    auto_approve = {
        ["boltzmann.read_file"] = true,    -- specific tool
        ["broglie.*"]           = true,    -- whole server
    },
    max_tool_depth = 8,
}

Auth precedence per server: auth_token literal > auth_env indirection > nil (no Authorization header sent). Mirrors PHASE0 §10's key_env convention for cloud model API keys.
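
A minimal sketch of that precedence. `resolve_token` and `auth_header` are hypothetical names, and the `getenv` parameter exists only so the lookup can be exercised without touching the real environment. Treating an unset `auth_env` variable as "no header" is an assumption; a real implementation might prefer to warn.

```lua
-- server: one entry from config.mcp.servers
local function resolve_token(server, getenv)
  getenv = getenv or os.getenv
  if server.auth_token then return server.auth_token end         -- literal wins
  if server.auth_env then return getenv(server.auth_env) end     -- env indirection
  return nil                                                     -- no auth configured
end

-- Returns the header line, or nil when no Authorization header should be sent.
local function auth_header(server, getenv)
  local token = resolve_token(server, getenv)
  if token == nil or token == "" then return nil end
  return "Authorization: Bearer " .. token
end
```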

Norris mode (Phase 3) will extend this: when autonomous, the destructive-op heuristic decides; for non-destructive tools, auto_approve. Outside scope here.

7. Meta Commands (Phase 2 additions)

Command	Action
`:mcp connect <url> [<alias>]`	Open a session; perform initialize + tools/list; add to active set
`:mcp disconnect <alias>`	Close one session
`:mcp list`	Show connected sessions (alias, url, tool count, status)
`:mcp tools`	List tools across all sessions (`alias.name` — short description)
`:mcp tool <alias.name>`	Show one tool's full inputSchema (debug aid)

Existing :help updated to list these.

8. System Prompt Augmentation

broker.lua's default system prompt grows by ~5 lines:

You may have access to MCP tools — they appear in this request's `tools`
field. Call a tool by emitting a tool_call; the result will be supplied
in the next turn. Use tools for structured operations (file reads,
queries, etc.) and `CMD:` lines for local shell commands. Prefer tools
when available; fall back to `CMD:` for anything not exposed as a tool.

The actual tool list is in the tools request-body field, not the prompt. This avoids per-turn token bloat for the full schema.

§3 substrate invariants are unchanged. The CMD: extraction marker stays the local-shell route; tools are the additive structured route.

9. Migration from Phase 1

User-visible changes:

New :mcp … meta commands when MCP servers are configured or connected at runtime.
Assistant responses may now invoke tools — user sees a confirm prompt (similar to CMD: execution gate) followed by an indented tool-call frame with the result.
CMD: lines still work exactly as before for shell.

Substrate (PHASE0.md §3) invariants: unchanged. Module layout (§4) amended to add mcp.lua (no rename of any existing file). Adding a new file is additive and preserves the §3 module-stability invariant ("File names are stable across phases — later phases fill in bodies, not rename files"). The amendment ships in commit #1 of the §12 plan (C6 in the review).

config.lua: existing configs without an mcp section continue to work — no MCP servers means no tools sent in the broker request body, no auth checks, no behavior change.

10. Out of Scope (Phase 2)

Per PHASE0.md §11, these belong elsewhere:

Chuck Norris autonomous mode (Phase 3) — even though tool-calls enable richer autonomy, the autonomous policy is Phase 3's.
Destructive-op heuristic in safety.lua (Phase 3) — Phase 2 only implements the per-call confirm-prompt surface.
memory.jsonl summarization across sessions (Phase 4).
Multi-model routing / cloud fallback (Phase 5).
Tree-sitter syntax highlighting (Phase 6).

Specifically out of Phase 2 scope despite proximity:

Stdio-transport MCP servers (Q17 below).
Parallel tool-call dispatch (Q20).
MCP resources/list and prompts/list capabilities — Phase 2 v1 only implements tools/*. Resources/prompts deferred (probably Phase 4 alongside memory).
Server-sent notifications/progress for long-running tool calls — ignored in v1; status surface comes later.

11. Open Questions

#	Question	Impact	Resolve by
Q17	~~MCP transport abstraction: stdio vs HTTP+SSE~~	mcp.lua API shape	Resolved at analyze. Hard-code POST-only HTTP for v1. lmcp doesn't use the long-lived SSE channel and `listChanged: false` removes any v1 need for it. Stdio transport tracked as Phase 2.1 / out-of-scope here.
Q18	Tool-result re-injection: standard OpenAI `role:"tool"` turn, or `[tool: X]` prefix to next user turn (matching the §6 exec-output pattern)?	context.lua + broker.lua	Partly resolved. Live probe (2026-05-12, hossenfelder) shows `role:"tool"` accepted by the proxy + the loaded model (qwen2.5-coder-1.5b). Mistral-nemo-specific template testing is blocked by the hossenfelder proxy routing all `model` field values to the loaded fast model — see open-end below. Default v1 path: `role:"tool"` (standard); fallback to `[tool: X]` prefix is plumbed but unused unless a strict template rejects it during Phase 7 verify.
Q19	Large tool-result payloads: pass-through, truncate at N chars, or summarize via fast model?	context.lua + executor of tool-result	Phase 2 (plan); Phase 4 may refine with memory.jsonl
Q20	Parallel `tool_calls`: sequential v1 is safe; spec allows parallel. Move to parallel when both calls are read-only?	mcp.lua dispatch	Phase 2 (verify) — track for v2
Q21	~~MCP error mapping~~	mcp.lua + broker.lua	Resolved at analyze. lmcp distinguishes: `result.isError=true` (handler exception, model-recoverable, feed back as tool turn content) vs JSON-RPC `error` (unknown method/tool; surface as aish status and also synthesize a `role:"tool"` turn, per the C5/C7 review fix). See §4.
Q22	aish's own command surface as an MCP server	scope expansion	Out of Phase 2. Parked for Phase 4+ if interest stays.

Open-end carried forward to Phase 7 (verify):

Hossenfelder proxy model-field bug (separate from aish): the proxy at :8082 routes all requests to the loaded fast model regardless of the request's model field — chunks return "model":"qwen2.5-coder-1.5b-q4_k_m.gguf" even when mistral-nemo-12b-instruct was asked for. This blocks live-verification of mistral-nemo's chat-template tool-role behavior. Tracked as aish#23 (filed 2026-05-12 at review). Sibling to the SSE-buffering bug at aish#15 — both live in the boltzmann proxy code. Phase 7 needs at least #23 fixed to fully close Q18.

Resolved at formulate (above in §2 table):

Q6 (CMD: vs tools coexistence) — both, no policy preference, substrate unchanged.
Q7 (MCP discovery) — both, config-declared default + runtime :mcp connect.
Q8 (authorization) — per-call confirm default, per-tool/per-server auto_approve policy.
Q9 (system-prompt augmentation) — hybrid: static frame + dynamic tools body field.
Q10 (tool-call streaming) — streaming-from-day-one on top of Phase 1 SSE.

Resolved at analyze (2026-05-12, live probes vs lmcp v0.5.4 + hossenfelder):

Q17 (transport abstraction) — POST-only, no SSE channel needed for lmcp.
Q21 (error mapping) — isError vs JSON-RPC error split per §4.

12. Implementation Plan (commit-by-commit)

Bottom-up — start with modules with the fewest dependencies, end with the REPL wiring that exercises everything together. Same shape as Phase 0 and Phase 1 implementation cadence.

Order

mcp.lua (new file) — JSON-RPC client. M.connect(url, opts), session:initialize() + :list_tools() + :call_tool(name, args) + :close(). Uses Phase 1's ffi/curl.M.post for transport — same commit lands the M.post extension to return (body, status_code) per §3 row so mcp.lua can distinguish HTTP 401 (non-JSON-RPC body {"error":"unauthorized"}) from JSON-RPC envelope errors. Per-server Bearer auth (auth_token literal or auth_env indirection). :call_tool returns (result_table, kind) where kind ∈ {"ok","handler_error","rpc_error"} so callers route per §4. Test in isolation via luajit -e 'local mcp=require("mcp"); local s=mcp.connect("http://boltzmann.fritz.box:8080/mcp",{auth_env="BOLTZMANN_MCP_TOKEN"}); s:initialize(); print(#s:list_tools())'. Also amends PHASE0.md §4 to list mcp.lua between broker.lua and router.lua in the same commit (additive — preserves §3 module-stability invariant per §9).
safety.lua — confirm-gate surface. Implement just M.confirm_tool_call(name, args, cfg) per §6. Reads cfg.mcp.auto_approve for exact-match and alias.* glob. Falls back to rl.readline prompt. Norris-mode hooks stay out (Phase 3). Test in isolation with mocked rl + various policy shapes.
context.lua extensions. Three concrete edits per §3 row: (a) loosen Context:append's assert from "content required" to shape-per-role (assistant may have empty content if tool_calls present; tool requires tool_call_id + content); (b) preserve tool_calls / tool_call_id in stored turns (not just role+content); (c) extend to_messages() to emit those fields. Add alternation assert (N4 in review). pending_exec_output is unchanged: buffer persists across tool-call sub-loops; flushes on next genuine user turn (§3 row). Tests in isolation: (i) build a context with assistant+tool_calls + tool turns, round-trip through to_messages(), eyeball JSON shape; (ii) day-one fallback test (N8) — same context with use_tool_role = false must emit the [tool: alias.name]\n… prefix shape instead of a role:"tool" message.
renderer.lua extensions. Add M.tool_call_begin(name, args) (top rule + name(json-snippet) indented dim) and M.tool_call_end(content, is_error) (bottom rule with dim/red status). Visual parity with the exec frame. Test visually with a one-liner.
broker.lua extensions. Signature widens: chat_stream(cfg, msgs, on_delta, opts). opts.tools (optional array) is passed through to the request body; omitted entirely when nil or empty. The on_delta callback widens to on_delta(kind, payload) where kind ∈ {"text","tool_call"}. Text path unchanged. Tool-call path: accumulator keyed by index (default 0 if absent — C2), concatenates function.arguments until finish_reason: "tool_calls", then emits one on_delta("tool_call", {id,name,arguments}) per completed call. M.chat shape unchanged in this phase (C1 in review — no caller for a polymorphic return). Test against hossenfelder with tools declared + streaming.
repl.lua wiring. New module-local mcp_sessions = {alias=session,...}, populated from config.mcp.servers at startup. Helpers:
- tools_schema() → flatten tool lists across sessions, namespace alias.name
- dispatch_tool_call(call) → split alias.tool, look up session, call, return content
- ask_ai loop now: stream response → if any tool_calls completed, for each call: safety.confirm_tool_call → dispatch_tool_call → append assistant-with-tool_calls + tool turn → re-call broker.chat_stream → repeat until pure-text response or max_tool_depth reached
- New meta cmds: :mcp list, :mcp tools, :mcp tool <name>, :mcp connect <url> [alias], :mcp disconnect <alias>
End-to-end test via the REPL against a real boltzmann lmcp + hossenfelder broker.
config.lua example block. Add a commented-out mcp = { servers = { boltzmann = {...} }, auto_approve = {...} } example so users can see the shape. Not behavior-impacting; documentation only. Bundled with commit #6 if small or split if substantial.
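
The alias/tool split that `dispatch_tool_call` needs in step 6 can be sketched as below. `split_tool_name` and `lookup` are hypothetical names. Splitting on the first dot assumes aliases never contain dots while tool names might.

```lua
-- "boltzmann.read_file" -> "boltzmann", "read_file"
-- Returns nil, nil when the name carries no alias prefix.
local function split_tool_name(full)
  local alias, tool = full:match("^([^.]+)%.(.+)$")
  return alias, tool
end

-- Look up the session for a namespaced tool call.
-- Returns session, tool_name on success, or nil, reason on failure so the
-- caller can synthesize a role:"tool" error turn instead of raising.
local function lookup(sessions, full_name)
  local alias, tool = split_tool_name(full_name)
  if not alias then
    return nil, "tool name has no server alias: " .. tostring(full_name)
  end
  local session = sessions[alias]
  if not session then
    return nil, "no connected server named " .. alias
  end
  return session, tool
end
```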

Risk / non-obvious

Empty tools array. If config.mcp.servers is absent or all connects fail, the broker request body must omit tools entirely (some servers reject "tools": []). Don't send the field when empty.
Connect-at-startup blocking. N servers × ~30 ms init+list. For N ≤ 3 (typical) the 90 ms is acceptable. Failures are status-logged per server, don't abort aish. Parallel via coroutines is out of scope here — sequential is fine for v1.
Content blocks beyond text. lmcp returns [{type:"text", text:...}]. The spec allows type:"image" | "resource". Phase 2 v1 flattens by concatenating all text blocks and ignoring non-text. Log a status warning if non-text blocks are seen. Adequate for boltzmann/hertz tools (all text); image/resource tools deferred.
isError: false on actual failure (baseline finding §3 of PHASE2-baseline.md). Pass content through unchanged; let the model read the error text. Do NOT short-circuit on the flag.
JSON-RPC error from tools/call. Surface as aish status AND synthesize a role:"tool" turn with content = "[aish] tool dispatch failed: <error.message>" and the matching tool_call_id. The alternation rationale (§4) requires this — leaving the assistant's tool_calls orphaned breaks strict chat templates exactly the way PHASE0.md §6 warned about. The model receives the error and can re-plan within the same turn.
Tool-call sub-loop bounds. max_tool_depth (default 8) per ask_ai invocation. When hit, surface as status and break — append the assistant's last text (if any) and let the user reply.
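
One way to read the depth bound, as a sketch under stated assumptions: `run_tool_loop`, `step`, and `dispatch` are hypothetical stand-ins for the real broker call and the confirm/execute/append sequence, not the actual repl.lua structure.

```lua
-- Sketch only: all names here are hypothetical.
-- `step()` stands in for one broker round and returns
--   { text = <string or nil>, tool_calls = <array or nil> }.
-- `dispatch(calls)` stands in for confirm + execute + append to context.
local function run_tool_loop(step, dispatch, max_tool_depth)
  max_tool_depth = max_tool_depth or 8
  local depth = 0
  while true do
    local reply = step()
    local calls = reply.tool_calls or {}
    if #calls == 0 then
      return reply.text, "done"            -- pure-text response ends the loop
    end
    if depth >= max_tool_depth then
      -- Bound hit: the pending calls are NOT dispatched, so only the text
      -- (if any) is kept and no orphaned tool_calls reach the context.
      return reply.text, "max_tool_depth"
    end
    depth = depth + 1
    dispatch(calls)
  end
end
```

The second return value lets the caller decide how to surface the status message; in this reading, a bound of 8 permits eight dispatch rounds and stops on the ninth reply that still asks for tools.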
Argument JSON might be invalid. A model can stream malformed JSON in function.arguments. dkjson.decode failure → DO NOT execute on partial parse. Synthesize a role:"tool" turn with content = "[aish] tool arguments not parseable as JSON: <decode-err>" and the matching tool_call_id (same alternation rationale as JSON-RPC error above; C7 in review).
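
Both failure paths above (JSON-RPC error and unparseable arguments) end in the same shape of synthesized turn. A minimal sketch, assuming hypothetical helper names and an injected `decode` function that follows dkjson's convention of returning `nil, position, message` on failure:

```lua
-- Sketch only: helper names are hypothetical. `decode` is injected so the
-- example stays self-contained; it is assumed to behave like dkjson.decode.
local function tool_error_turn(tool_call_id, message)
  return { role = "tool", tool_call_id = tool_call_id, content = "[aish] " .. message }
end

-- Returns (args, nil) on success, or (nil, synthesized_turn) on decode failure.
-- The tool is never executed on a partial parse.
local function parse_arguments(call, decode)
  local args, _, err = decode(call["function"].arguments)
  if args == nil then
    return nil, tool_error_turn(call.id,
      "tool arguments not parseable as JSON: " .. tostring(err))
  end
  return args, nil
end

-- Builds the turn for a JSON-RPC `error` object returned by tools/call.
local function rpc_error_turn(call, rpc_error)
  return tool_error_turn(call.id,
    "tool dispatch failed: " .. tostring(rpc_error.message))
end
```

Routing both cases through one constructor keeps the `tool_call_id` pairing in a single place, which is the property the alternation rationale depends on.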
Q18 fallback path (strict templates rejecting role:"tool"). Plumb a context.use_tool_role flag (default true). If a real-world rejection appears at Phase 7, flip the flag and convert tool turns to [tool: alias.name]\n<content> prefix on the next user turn (same pattern as pending_exec_output). Day-one verification (N8 in review): commit #3 includes a small in-isolation test that builds a context with use_tool_role = false, appends an assistant+tool_calls turn followed by a tool result, and confirms to_messages() emits the prefix shape instead of a role:"tool" turn. Keeps the fallback alive rather than dead-coded until Phase 7 first runs it under stress.

Test checkpoints

After each commit, verify with a targeted probe before moving on:

| Commit | Verify |
|---|---|
| #1 `mcp.lua` | `luajit -e "local m=require('mcp'); ..."` connects + lists tools against boltzmann lmcp |
| #2 `safety.lua` | unit-test policy lookup with mock rl: exact match → true; `*` glob → true; miss → prompt invoked |
| #3 `context.lua` | (i) round-trip a context with tool turns through `to_messages()`, eyeball JSON shape; (ii) day-one fallback test with `use_tool_role = false` emits the `[tool: …]` prefix shape (N8) |
| #4 `renderer.lua` | one-liner emits frame around fake tool result |
| #5 `broker.lua` | curl-compare: hand-built request body with tools matches `broker.chat_stream(cfg, msgs, on_delta)` body |
| #6 `repl.lua` | full REPL: `:mcp list` shows boltzmann; question that triggers `list_dir` round-trips through confirm + execution + model continuation |
| #7 `config.lua` | aish starts with example mcp section present; no MCP servers connected means no `tools` field sent |
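
The commit #2 checkpoint (exact match, `*` glob, miss) can be illustrated with a small sketch. The shape of `auto_approve` is not spelled out in this section, so a flat list of `alias.name` patterns is assumed here, and `glob_to_pattern`, `is_approved`, and the `prompt` callback are hypothetical names:

```lua
-- Sketch only: the list-of-patterns shape of `auto_approve` and every
-- function name below are assumptions, not the safety.lua API.
local function glob_to_pattern(glob)
  -- Escape Lua pattern magic characters, then let `*` match any run of characters.
  local escaped = glob:gsub("[%^%$%(%)%%%.%[%]%+%-%?]", "%%%0")
  local pattern = escaped:gsub("%*", ".*")
  return "^" .. pattern .. "$"
end

local function is_approved(auto_approve, tool_name, prompt)
  for _, glob in ipairs(auto_approve or {}) do
    if tool_name:find(glob_to_pattern(glob)) then
      return true            -- exact or glob match: no prompt
    end
  end
  return prompt(tool_name)   -- miss: defer to the user
end

-- Usage with a mock prompt that always declines.
local asked = {}
local function mock_prompt(name) asked[#asked + 1] = name; return false end
print(is_approved({ "boltzmann.*" }, "boltzmann.list_dir", mock_prompt))
print(is_approved({ "boltzmann.list_dir" }, "hertz.list_dir", mock_prompt), #asked)
```

Anchoring the pattern with `^` and `$` and escaping `.` means an entry only approves the names it literally spells out, apart from explicit `*` wildcards.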

Commits expected: 7 (commit #1 carries the PHASE0.md §4 amendment)

Per Phase 1's cadence (10 commits + 1 BLOCKER fix), Phase 2 is smaller in surface — single new file plus targeted extensions. Tracked to land in one working session if the boltzmann proxy bugs don't intrude.

Resolved at review (2026-05-12)

Q18 default — use_tool_role = true defaulted, fallback exercised day-one in commit #3 test (ii) so it's not dead code. Phase 7 flips if mistral-nemo (once #23 is fixed) rejects.
:mcp connect re-fetch policy — v1 trusts the listChanged: false capability; manual disconnect+reconnect is the workaround if a server's tools change. No automatic re-fetch.

Review fold-in (2026-05-12, all BLOCKERs + relevant CONCERNs/NITs)

Independent review surfaced 5 BLOCKERs / 7 CONCERNs / 8 NITs against the formulate+analyze+plan draft. Resolutions applied in this revision:

B1 context.lua impact widened — assert loosening + field preservation + to_messages emit are now explicit in §3.
B2 ffi/curl.M.post extended to return (body, status_code) so mcp.lua distinguishes HTTP 401 from JSON-RPC envelope errors.
B3 inputSchema typo fixed in §3 mcp.lua row.
B4 pending_exec_output × tool-call sub-loop interaction specified (persists across; flushes on next user turn).
B5 §3/§12 dependency contradiction resolved — broker takes opts.tools from the caller; no layering inversion.
C1 M.chat polymorphic return dropped.
C2 Index-absent fallback specified (default to 0).
C3 Re-injection example now stores accumulated text in the assistant turn, not hard-coded empty string.
C4 :mcp connect failure semantics specified (no auto-retry).
C5/C7 Both orphan-tool_calls scenarios (JSON-RPC error and argument-parse failure) now synthesize a role:"tool" turn whose content is an [aish]-prefixed error message, preserving alternation.
C6 §9 explicitly notes the §4 amendment is additive.
N3 protocolVersion fallback specified (lmcp doesn't negotiate).
N4 alternation assert added to context.lua row.
N7 model-routing bug filed as aish#23.
N8 day-one fallback test added to commit #3 checkpoints.

CONCERNs / NITs not folded (defended as wording-only, not load-bearing): N1, N2, N5, N6 — left as-is.

End of Phase 2 Manifest — aish
