Files
aish/docs/PHASE2.md
T
marfrit f5daa6afc0 docs/PHASE2: re-review NITs — M.post shape, getinfo cdef, content flattening normative
Three follow-up NITs from the post-fold-in review:

  (1) Disambiguate M.post return shape: (body, status_code) on transport
      success regardless of status; (nil, errmsg) on libcurl failure
      stays unchanged. Phase 1 callers reading only the first slot are
      unaffected.

  (2) Note that the M.post extension requires extending ffi.cdef to
      include curl_easy_getinfo + CURLINFO_RESPONSE_CODE (decimal
      2097154, CURLINFOTYPE_LONG | 2) and a long[1] out-param shim.
      Implementation detail the commit #1 author will need.

  (3) Move the tool-result content-flattening rule from §12 risk note
      into §4 normative spec (forward-referenced both ways) — §4 is
      where a future reader looking for the tool-invocation contract
      will scan.

No design changes; clarifications only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 13:02:35 +00:00

617 lines
37 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# aish — Phase 2 Manifest
**Project:** aish — AI-augmented conversational shell
**Document:** Phase 2 Requirements, Architecture & Design Decisions
**Status:** Plan (review pass folded in 2026-05-12)
**Date:** 2026-05-12
PHASE0.md is the locked substrate; PHASE1.md is layered on top. This
manifest specifies what Phase 2 adds. Section numbers reference back to
PHASE0.md / PHASE1.md where relevant.
---
## 1. Scope of Phase 2
Three pillars per PHASE0.md §11 row 2:
1. **MCP client** (`mcp.lua`) — JSON-RPC 2.0 over HTTP+SSE transport.
Target reference implementation: `lmcp`. Operations needed for v1:
`initialize`, `tools/list`, `tools/call`. Multiple servers may be
connected concurrently; tools are namespaced `<server>.<tool>`.
2. **Tool-calling protocol bridge** — the broker sends OpenAI-compatible
`tools` in the request body; the model emits `tool_calls` in the
response; `mcp.lua` dispatches each call to the right server; the
tool result is fed back as a `role:"tool"` turn in `context.lua` and
the chat continues.
3. **Authorization gate**`safety.lua` (PHASE0.md §4 stub) finally gets
implemented. Every tool call is confirmed by the user by default,
with per-tool and per-server `auto_approve` policies in `config.lua`.
**Phase 2 is done when:**
- aish can connect to at least one local `lmcp` server declared in
`config.lua` and one connected via `:mcp connect <url>` at runtime.
- `:mcp list` shows connected servers; `:mcp tools` shows discovered
tools across all servers.
- A model conversation can invoke a tool: the broker request carries
the live tools schema; the response's `tool_calls` are confirmed by
the user; each call dispatches to the right MCP server; the result
re-enters the chat; the model continues with the result available.
- `CMD:` extraction (PHASE0.md §6 substrate invariant) still works
unchanged — Phase 2 is additive, not replacing.
- A tool with `auto_approve = true` (in config) executes without the
confirm prompt; a non-approved tool still prompts.
---
## 2. Technology Decisions (delta from Phase 1)
| Decision | Choice | Rationale |
|---|---|---|
| MCP transport | HTTP POST per RPC, `Connection: close` per response, **no long-lived SSE GET channel** in v1 | Analyze finding (2026-05-12): lmcp v0.5.4 only implements the trivial POST-and-respond flavor of the spec's streamable-HTTP transport. Its GET /mcp endpoint announces the POST endpoint then closes — there's no server→client notification channel to listen on. Combined with lmcp's `capabilities.tools.listChanged = false`, aish doesn't need an SSE GET listener at all for lmcp. Stdio transport is left for a possible Phase 2.1 if a stdio-only MCP server becomes necessary. |
| MCP protocol version | `2025-03-26` (confirmed by live probe of boltzmann:8080/mcp) | lmcp pins this in `MCP_VERSION` and **does not negotiate** — it returns its compiled-in version regardless of what the client sends (lmcp.lua:80-91). aish sends `2025-03-26` in `initialize` and accepts whatever the server returns; on mismatch it logs `[aish] mcp <alias>: protocol version mismatch (sent X, got Y); proceeding` and continues. v1 has no version-gated behavior to abort on. |
| MCP auth | Bearer token via `Authorization: Bearer <token>` header, per-server | Analyze finding: every lmcp deployment in mfritsche's fleet (boltzmann/hertz/pve*/nc/etc.) requires Bearer auth. Phase 2 config supports `auth_token` literal and `auth_env` env-var indirection per server (mirrors `key_env` in the models registry). lmcp servers without auth (broglie/higgs LAN-only) just leave the field nil. |
| Tool-call wire format | OpenAI `tools` field on `/v1/chat/completions` body; `tool_calls` on assistant deltas; `role:"tool"` turn with `tool_call_id` for results | Standard, supported by llama.cpp and OpenRouter. Aligns with the existing `/v1/chat/completions` substrate invariant. |
| Tool namespacing | `<server-alias>.<tool-name>` for both the wire-level tool name and `:mcp tools` listing | Avoids name collisions across servers. The alias comes from the config key or the connect URL hash. |
| `CMD:` coexistence with tool-calls | Both stay live, no policy preference. Substrate invariant §3 unchanged. | Resolves Q6 (see §10). `CMD:` is the local-shell route; MCP tools are structured-API routes; they serve different purposes. Future phases (Norris, Phase 3) may prefer tools when both are available, but Phase 2 doesn't enforce. |
| Authorization default | Per-call confirm (mirrors PHASE0.md §10 `confirm_cmd` for shell) | Conservative default; user can opt into auto-approval per tool or per server via config. Resolves Q8. |
| System prompt augmentation | Hybrid: static frame in `broker.lua` system prompt + dynamic `tools` array in the request body | Tool list goes in the API field where it belongs; the system prompt only mentions that tools exist and how to use them. Per-request body cost is bounded (tools change rarely; small schemas). Resolves Q9. |
| Tool-call streaming | Streaming-from-day-one — `broker.chat_stream`'s on_delta callback widens to handle `tool_calls` deltas in addition to text deltas | Resolves Q10. Phase 1 SSE landed first, so we're not retrofitting; we just extend the parser. **Wire shape confirmed at analyze** (2026-05-12 probe vs hossenfelder): `delta.tool_calls[]` arrives indexed; id+type+function.name appear on the opening delta; `function.arguments` is a JSON-string that arrives in character-fragment chunks; finish_reason "tool_calls" closes the call. Accumulator strategy matches §5. |
| Tool-call concurrency | Sequential dispatch in Phase 2 v1 — process `tool_calls[0]` to completion, then `[1]`, etc. | Simpler error handling; tool effects often order-dependent (e.g. write-then-read). Parallel dispatch deferred (see Q20). |
| MCP server lifecycle | aish does not manage MCP server processes (parallel to PHASE0.md §12 llama.cpp rule) | Declared in config or connected by URL; aish is a client only. |
---
## 3. Module Changes
| File | State after Phase 1 | Phase 2 changes |
|---|---|---|
| `mcp.lua` | **New file** (not in PHASE0 §4 layout; this Phase amends the layout to add it) | Implement: `M.connect(url, opts) -> session` (opts: `alias`, `auth_token`, `auth_env`), `session:initialize()`, `session:list_tools() -> [{name, description, inputSchema}]`, `session:call_tool(name, args) -> (result_table, kind)` where `kind ∈ {"ok","handler_error","rpc_error"}` so callers can route the response per §4's error split, `session:close()`. JSON-RPC 2.0 over HTTP POST (`Content-Type: application/json`, `Accept: application/json`, `Authorization: Bearer <token>`). Per-session state: alias, base-url, auth, tools-cache, request-ID counter. No persistent SSE channel — POST is one-shot per RPC. Distinguishes HTTP-level failure (e.g. lmcp's `401 {"error":"unauthorized"}` body, which is NOT JSON-RPC-shaped — has no `jsonrpc`/`id` fields) from JSON-RPC envelope errors; needs `ffi/curl.M.post` extended to return status code (see ffi/curl.lua row). |
| `safety.lua` | Stub | Implement Phase 2 surface only: `M.confirm_tool_call(tool_name, args, policy) -> bool`. Reads `config.mcp.auto_approve` (per-tool and per-server) before prompting. Norris destructive-op heuristic and HALT gate stay Phase 3. |
| `broker.lua` | Streaming `chat_stream(cfg, msgs, on_delta)` | Signature widens to `chat_stream(cfg, msgs, on_delta, opts)`. `opts.tools` (optional array of `{type, function:{name, description, parameters}}`) is passed through to the request body; **omitted entirely if absent or empty** (some servers reject `"tools": []`). The on_delta callback widens to `on_delta(kind, payload)` where `kind ∈ {"text", "tool_call"}`. **`broker.lua` does NOT depend on `mcp.lua`** — repl assembles the tools array and passes it in; broker stays a transport layer. `M.chat` (non-streaming wrapper) is unchanged in this phase (no tool consumers go through it). |
| `context.lua` | turns = {{role, content}, ...} + `pending_exec_output`; `Context:append` asserts `turn.content` and rebuilds the entry as `{role, content}` only — extra fields are dropped | Three concrete edits: (a) **loosen `:append`** so `role == "assistant"` can carry `tool_calls = [{id, name, arguments}]` with `content` allowed empty, and `role == "tool"` requires `tool_call_id` + `content` (the assert moves from "content required" to "shape per role"); (b) **preserve `tool_calls` and `tool_call_id`** in the stored turn (not just role+content); (c) `to_messages()` emits `tool_calls` on assistant turns and `tool_call_id` on tool turns. Add a debug assertion that `role == "tool"` follows an assistant turn with non-empty `tool_calls` (catches design bugs early; N4 in review). **`pending_exec_output` interaction**: the buffer **persists across the tool-call sub-loop** (the loop is internal — no user input happens — so there's no append_user to flush against). It flushes on the next genuine user turn, regardless of how many tool-call iterations preceded. |
| `repl.lua` | meta cmds + ask_ai stream loop | After ask_ai sees `tool_calls`, enter a tool-execution sub-loop: confirm-gate each call via `safety.confirm_tool_call`, dispatch via `mcp.session:call_tool`, append tool turn to context, re-issue the broker request. Loop until assistant emits text without tool_calls. New meta: `:mcp connect <url> [alias]`, `:mcp list`, `:mcp tools`, `:mcp disconnect <alias>`. |
| `renderer.lua` | streaming text + exec frame | Add `tool_call_begin(name, args)`, `tool_call_end(result, ok)`. Visual style: indented, dim, parallel to the exec frame. |
| `config.lua` | example with models/shell/context/history | Schema additions: `mcp = { servers = { alias = { url = "..." } }, auto_approve = { ["alias.tool"] = true } }`. Documented in §10 below. |
| `ffi/curl.lua` | post + post_sse; `M.post` does not set `FAILONERROR`, so non-2xx responses return the body as a normal string. `ffi.cdef` exposes only `curl_easy_setopt` — no `curl_easy_getinfo` (cdef block at curl.lua:11-28). | **One small extension**: `M.post` returns **`(body, status_code)` on transport success** (status_code may be non-2xx — caller decides what to do; mcp.lua treats `>= 400` as transport failure). `(nil, errmsg)` on libcurl-level failure is **unchanged** — Phase 1 callers that read only the first slot stay correct. Requires adding `curl_easy_getinfo` + `CURLINFO_RESPONSE_CODE` (decimal 2097154, `CURLINFOTYPE_LONG | 2`) to the `ffi.cdef` block, plus a `long[1]` out-param shim. MCP auth failures from lmcp arrive as HTTP `401` with a non-JSON-RPC body (`{"error":"unauthorized"}`); `mcp.lua` must distinguish HTTP-level failure from JSON-RPC envelope errors. No SSE GET channel is added (analyze finding ruled it out for lmcp). |
| `history.lua` | JSONL session log | Tool turns are logged like any other turn — `{role:"tool", tool_call_id:"...", content:"..."}`. Resume reconstructs them via `ctx:append` like user/assistant turns. |
§4 module-layout amendment: `mcp.lua` slots between `broker.lua` and
`router.lua` in the §4 table. Same commit lands the manifest amendment.
---
## 4. MCP Transport (analyze findings — lmcp v0.5.4)
lmcp implements only the **synchronous POST** flavor of the MCP
streamable-HTTP spec. Each RPC is one HTTP transaction:
```
client → server: POST /mcp Content-Type: application/json
Accept: application/json
Authorization: Bearer <token>
Body: { jsonrpc:"2.0", id, method, params }
Returns: { jsonrpc, id, result | error }
Connection: close
```
lmcp's `GET /mcp` exists but only sends a one-shot `event: endpoint`
announcing the POST URL, then closes — there is no held-open
server→client channel. Combined with the `listChanged: false`
capability lmcp announces in `initialize`, **aish does not open a
persistent SSE channel** to lmcp servers in v1. Notifications-from-server
are out of scope here; track for v2 if a richer server appears.
### Handshake
1. `initialize` request: `{ protocolVersion: "2025-03-26", capabilities: {}, clientInfo: { name: "aish", version: "..." } }`.
2. Server response (lmcp): `{ protocolVersion: "2025-03-26", capabilities: { tools: { listChanged: false } }, serverInfo: { name, version } }`.
3. **Version mismatch**: lmcp ignores client's `protocolVersion` and always returns its compiled-in `MCP_VERSION` (lmcp.lua:80-91). aish accepts whatever lmcp returns; on mismatch it logs a status (`[aish] mcp <alias>: protocol version mismatch (sent X, got Y); proceeding`) and continues. v1 has no version-gated behavior.
4. `notifications/initialized` POST (one-way; lmcp returns HTTP 202 with no body).
### Tool discovery
1. `tools/list` RPC → `{ tools: [{ name, description, inputSchema }] }`.
2. Cache per-session **for the session lifetime** — lmcp announces
`listChanged: false`, so there's no need to refetch or listen for
change notifications.
### Tool invocation
**Content flattening**: tool results return `content: [{type, ...}, ...]`.
lmcp v0.5.4 only emits `type: "text"`, but the spec also allows
`"image"` and `"resource"`. Phase 2 v1 **concatenates all `text` blocks**
into a single string for the `role:"tool"` turn body and **ignores
non-text blocks**, logging a one-shot status warning when a non-text
block is observed. Image/resource handling is deferred. See §12
"Content blocks beyond text" for the corresponding risk note.
`tools/call` with `{ name, arguments }`. Failure has three flavors and
all of them result in **a `role:"tool"` turn being appended** so the
assistant's `tool_calls` is never left orphaned in context (strict
templates reject `assistant.tool_calls` without a matching `tool`
reply — same gotcha PHASE0.md §6 warned about):
- **Tool-handler exception** → JSON-RPC `result` with `isError: true`
and `content: [{ type:"text", text: "Error: ..." }]`. Feed
`content` straight back as the `role:"tool"` turn body. Model-recoverable.
- **Baseline `isError: false` on actual failure** (PHASE2-baseline.md §3
found this — boltzmann's `read_file` returns content text containing
"Error: ..." but `isError: false`). Pass content through unchanged —
let the model read the text. `isError` is advisory, not authoritative.
- **JSON-RPC envelope error** (e.g. `{code: -32601, message: "Tool not
found"}`) → synthesize a `role:"tool"` turn with
`content = "[aish] tool dispatch failed: <error.message>"` and the
matching `tool_call_id`. Also surface a status line for the user.
This both keeps alternation legal and tells the model what happened
so its next plan is informed.
- **HTTP-level failure** (auth, unreachable, timeout) → same shape:
synthesize a `role:"tool"` turn with
`content = "[aish] tool transport error: <reason>"`. Same alternation
rationale.
This split resolves Q21 (with the C5/C7 review fix folded in).
### Lifecycle
- Connect on startup (from `config.mcp.servers`) — best effort; failures
are status-logged once, don't abort aish, and the session is **absent
from `mcp_sessions` until manually reconnected via `:mcp connect`**.
No automatic retry. "Connect" here means: do the `initialize`
round-trip + cache `tools/list` results.
- `:mcp connect <url>` adds a session at runtime; alias auto-derived
from hostname or supplied as second arg.
- `:mcp disconnect <alias>` drops cached state. There's no long-lived
HTTP connection to close (every RPC was already `Connection: close`).
- On aish quit, sessions are just forgotten — nothing to clean up
server-side.
- An unreachable server simply contributes no tools to the broker
request body — the model is not told that tools were "meant" to be
available. If `tools_schema()` returns empty across all sessions, the
broker omits the `tools` field entirely.
---
## 5. Tool-Call Bridge
### Broker request body (delta from Phase 1)
```json
{
"model": "...",
"messages": [...],
"stream": true,
"temperature": 0.2,
"tools": [
{ "type":"function",
"function": { "name":"<alias>.<tool>",
"description":"...",
"parameters": <inputSchema> } },
...
]
}
```
The `tools` array is assembled by `mcp.tools_schema()` — flattens
`tools/list` results from every connected session, namespacing each tool
as `<alias>.<name>`.
### Response handling (streaming)
llama.cpp / OpenAI deltas may include:
```json
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_…",
"function":{"name":"alias.tool","arguments":"{\"a\":"}}]}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,
"function":{"arguments":"1}"}}]}}]}
data: {"choices":[{"finish_reason":"tool_calls",...}]}
```
`broker.chat_stream` accumulates tool-call deltas keyed by `index`; the
`arguments` field is a JSON-string that arrives chunked and is concatenated.
On `finish_reason: tool_calls`, the accumulated calls are emitted to
on_delta as `kind="tool_call"` with full payloads.
**Index-absent fallback**: per the OpenAI spec, `index` is REQUIRED on
streaming `tool_calls[]` deltas — but some local llama.cpp builds have
been reported to omit it for single-call streams. If a delta has
`tool_calls` but no `index`, treat it as `index = 0` and accumulate
into the slot-0 buffer. Log a one-shot debug status the first time
this is observed per stream.
### Re-injection into context
The assistant turn carries **whatever text was streamed before
`finish_reason: tool_calls`** (which may be non-empty — models often
say "Sure, let me look that up" before calling). The renderer flushes
that text first, then renders the tool-call frame around dispatch.
```lua
-- After tool execution
ctx:append({
role = "assistant",
content = accumulated_text, -- may be "" if model emitted no prose
tool_calls = { {id="call_…", name="alias.tool", arguments=<json-string>} },
})
ctx:append({
role = "tool",
tool_call_id = "call_…",
content = <tool-result-text-or-synthesized-error>,
})
```
`to_messages()` renders both shapes for the next broker request. The
strict-alternation issue from PHASE0.md §6 (mistral-nemo Jinja) is
handled differently here — tool turns ARE expected to follow assistant
tool_calls per the OpenAI chat-template convention. If a model's
template still rejects this shape, fall back to the `[tool: X]` prefix
strategy used for exec output (Q18 below — fallback is plumbed via the
`context.use_tool_role` flag; default `true`).
### Re-issuing the broker request
After tool turns are appended, the broker is called again with the
extended messages array. The model may emit more `tool_calls`, more
text, or both. Loop until the response has no `tool_calls` (i.e. a
plain text assistant turn).
Budget: a max-tool-call-depth setting (default 8) prevents runaway loops.
Hit-cap surfaces as a status: `[aish] tool-call depth limit reached`.
---
## 6. Authorization (safety.lua Phase 2 surface)
```lua
-- safety.confirm_tool_call(tool_name, args_table, config) -> bool
function M.confirm_tool_call(name, args, cfg)
local policy = cfg.mcp and cfg.mcp.auto_approve or {}
if policy[name] then return true end
-- Per-server prefix check: "alias.*" entries
local alias = name:match("^([^.]+)%.")
if alias and policy[alias .. ".*"] then return true end
-- Otherwise prompt
local pretty = name .. "(" .. (#args > 0 and "..." or "") .. ")"
local ans = rl.readline(("call '%s'? [y/N] "):format(pretty)) or ""
return ans:lower():sub(1,1) == "y"
end
```
Config schema (analyze-revised — Bearer auth fields added):
```lua
mcp = {
servers = {
boltzmann = {
url = "http://boltzmann.fritz.box:8080/mcp",
auth_env = "BOLTZMANN_MCP_TOKEN", -- read from env at startup
},
broglie = {
url = "http://broglie.fritz.box:8080/mcp",
-- no auth (LAN-only deployment)
},
nc = {
url = "https://nc.reauktion.de:8080/mcp",
auth_token = "literal-token-if-not-using-env", -- alternative
},
},
auto_approve = {
["boltzmann.read_file"] = true, -- specific tool
["broglie.*"] = true, -- whole server
},
max_tool_depth = 8,
}
```
Auth precedence per server: `auth_token` literal > `auth_env` indirection
> nil (no Authorization header sent). Mirrors PHASE0 §10's `key_env`
convention for cloud model API keys.
Norris mode (Phase 3) will extend this: when autonomous, the destructive-op
heuristic decides; for non-destructive tools, auto_approve. Outside scope here.
---
## 7. Meta Commands (Phase 2 additions)
| Command | Action |
|---|---|
| `:mcp connect <url> [<alias>]` | Open a session; perform initialize + tools/list; add to active set |
| `:mcp disconnect <alias>` | Close one session |
| `:mcp list` | Show connected sessions (alias, url, tool count, status) |
| `:mcp tools` | List tools across all sessions (`alias.name` — short description) |
| `:mcp tool <alias.name>` | Show one tool's full inputSchema (debug aid) |
Existing `:help` updated to list these.
---
## 8. System Prompt Augmentation
`broker.lua`'s default system prompt grows by ~4 lines:
```
You may have access to MCP tools — they appear in this request's `tools`
field. Call a tool by emitting a tool_call; the result will be supplied
in the next turn. Use tools for structured operations (file reads,
queries, etc.) and `CMD:` lines for local shell commands. Prefer tools
when available; fall back to `CMD:` for anything not exposed as a tool.
```
The actual tool list is in the `tools` request-body field, not the
prompt. This avoids per-turn token bloat for the full schema.
§3 substrate invariants are unchanged. The `CMD:` extraction marker stays
the local-shell route; tools are the additive structured route.
---
## 9. Migration from Phase 1
User-visible changes:
- New `:mcp …` meta commands when MCP servers are configured or
connected at runtime.
- Assistant responses may now invoke tools — user sees a confirm prompt
(similar to `CMD:` execution gate) followed by an indented tool-call
frame with the result.
- `CMD:` lines still work exactly as before for shell.
Substrate (PHASE0.md §3) invariants: unchanged. Module layout (§4)
amended to **add** `mcp.lua` (no rename of any existing file). Adding
a new file is additive and preserves the §3 module-stability invariant
("File names are stable across phases — later phases fill in bodies,
not rename files"). The amendment ships in commit #1 of the §12 plan
(C6 in the review).
`config.lua`: existing configs without an `mcp` section continue to work
— no MCP servers means no tools sent in the broker request body, no
auth checks, no behavior change.
---
## 10. Out of Scope (Phase 2)
Per PHASE0.md §11, these belong elsewhere:
- Chuck Norris autonomous mode (Phase 3) — even though tool-calls
enable richer autonomy, the *autonomous policy* is Phase 3's.
- Destructive-op heuristic in safety.lua (Phase 3) — Phase 2 only
implements the per-call confirm-prompt surface.
- `memory.jsonl` summarization across sessions (Phase 4).
- Multi-model routing / cloud fallback (Phase 5).
- Tree-sitter syntax highlighting (Phase 6).
Specifically out of Phase 2 scope despite proximity:
- Stdio-transport MCP servers (Q17 below).
- Parallel tool-call dispatch (Q20).
- MCP `resources/list` and `prompts/list` capabilities — Phase 2
v1 only implements `tools/*`. Resources/prompts deferred (probably
Phase 4 alongside memory).
- Server-sent `notifications/progress` for long-running tool calls —
ignored in v1; status surface comes later.
---
## 11. Open Questions
| # | Question | Impact | Resolve by |
|---|---|---|---|
| Q17 | ~~MCP transport abstraction: stdio vs HTTP+SSE~~ | mcp.lua API shape | **Resolved at analyze.** Hard-code POST-only HTTP for v1. lmcp doesn't use the long-lived SSE channel and `listChanged: false` removes any v1 need for it. Stdio transport tracked as Phase 2.1 / out-of-scope here. |
| Q18 | Tool-result re-injection: standard OpenAI `role:"tool"` turn, or `[tool: X]` prefix to next user turn (matching the §6 exec-output pattern)? | context.lua + broker.lua | **Partly resolved.** Live probe (2026-05-12, hossenfelder) shows `role:"tool"` accepted by the proxy + the loaded model (qwen2.5-coder-1.5b). Mistral-nemo-specific template testing is **blocked** by the hossenfelder proxy routing all `model` field values to the loaded fast model — see open-end below. Default v1 path: `role:"tool"` (standard); fallback to `[tool: X]` prefix is plumbed but unused unless a strict template rejects it during Phase 7 verify. |
| Q19 | Large tool-result payloads: pass-through, truncate at N chars, or summarize via fast model? | context.lua + executor of tool-result | Phase 2 (plan); Phase 4 may refine with memory.jsonl |
| Q20 | Parallel `tool_calls`: sequential v1 is safe; spec allows parallel. Move to parallel when both calls are read-only? | mcp.lua dispatch | Phase 2 (verify) — track for v2 |
| Q21 | ~~MCP error mapping~~ | mcp.lua + broker.lua | **Resolved at analyze.** lmcp distinguishes: `result.isError=true` (handler exception, model-recoverable, feed back as tool turn content) vs JSON-RPC `error` (unknown method/tool, transport-level, surface as aish status). See §4. |
| Q22 | aish's own command surface as an MCP server | scope expansion | **Out of Phase 2.** Parked for Phase 4+ if interest stays. |
Open-end carried forward to Phase 7 (verify):
- **Hossenfelder proxy `model`-field bug** (separate from aish): the proxy at `:8082` routes all requests to the loaded fast model regardless of the request's `model` field — chunks return `"model":"qwen2.5-coder-1.5b-q4_k_m.gguf"` even when `mistral-nemo-12b-instruct` was asked for. This **blocks live-verification of mistral-nemo's chat-template tool-role behavior**. Tracked as [aish#23](https://git.reauktion.de/marfrit/aish/issues/23) (filed 2026-05-12 at review). Sibling to the SSE-buffering bug at [aish#15](https://git.reauktion.de/marfrit/aish/issues/15) — both live in the boltzmann proxy code. Phase 7 needs at least #23 fixed to fully close Q18.
Resolved at formulate (above in §2 table):
- Q6 (CMD: vs tools coexistence) — both, no policy preference, substrate unchanged.
- Q7 (MCP discovery) — both, config-declared default + runtime `:mcp connect`.
- Q8 (authorization) — per-call confirm default, per-tool/per-server `auto_approve` policy.
- Q9 (system-prompt augmentation) — hybrid: static frame + dynamic `tools` body field.
- Q10 (tool-call streaming) — streaming-from-day-one on top of Phase 1 SSE.
Resolved at analyze (2026-05-12, live probes vs lmcp v0.5.4 + hossenfelder):
- Q17 (transport abstraction) — POST-only, no SSE channel needed for lmcp.
- Q21 (error mapping) — isError vs JSON-RPC error split per §4.
---
## 12. Implementation Plan (commit-by-commit)
Bottom-up — start with modules with the fewest dependencies, end with the
REPL wiring that exercises everything together. Same shape as Phase 0
and Phase 1 implementation cadence.
### Order
1. **`mcp.lua` (new file) — JSON-RPC client.** `M.connect(url, opts)`,
`session:initialize()` + `:list_tools()` + `:call_tool(name, args)` +
`:close()`. Uses Phase 1's `ffi/curl.M.post` for transport — **same
commit lands the `M.post` extension to return `(body, status_code)`
per §3 row** so `mcp.lua` can distinguish HTTP `401` (non-JSON-RPC
body `{"error":"unauthorized"}`) from JSON-RPC envelope errors.
Per-server Bearer auth (`auth_token` literal or `auth_env`
indirection). `:call_tool` returns `(result_table, kind)` where
`kind ∈ {"ok","handler_error","rpc_error"}` so callers route per
§4. **Test in isolation** via
`luajit -e 'local mcp=require("mcp"); local s=mcp.connect("http://boltzmann.fritz.box:8080/mcp",{auth_env="BOLTZMANN_MCP_TOKEN"}); s:initialize(); print(#s:list_tools())'`.
Also amends PHASE0.md §4 to list `mcp.lua` between `broker.lua` and
`router.lua` in the same commit (additive — preserves §3
module-stability invariant per §9).
2. **`safety.lua` — confirm-gate surface.** Implement just
`M.confirm_tool_call(name, args, cfg)` per §6. Reads
`cfg.mcp.auto_approve` for exact-match and `alias.*` glob. Falls back
to `rl.readline` prompt. Norris-mode hooks stay out (Phase 3). **Test
in isolation** with mocked rl + various policy shapes.
3. **`context.lua` extensions.** Three concrete edits per §3 row:
(a) loosen `Context:append`'s assert from "content required" to
shape-per-role (assistant may have empty content if `tool_calls`
present; `tool` requires `tool_call_id` + `content`); (b) preserve
`tool_calls` / `tool_call_id` in stored turns (not just role+content);
(c) extend `to_messages()` to emit those fields. Add alternation
assert (N4 in review). `pending_exec_output` is **unchanged**:
buffer persists across tool-call sub-loops; flushes on next genuine
user turn (§3 row). **Tests in isolation**: (i) build a context with
assistant+tool_calls + tool turns, round-trip through `to_messages()`,
eyeball JSON shape; (ii) day-one fallback test (N8) — same context
with `use_tool_role = false` must emit the `[tool: alias.name]\n…`
prefix shape instead of a `role:"tool"` message.
4. **`renderer.lua` extensions.** Add `M.tool_call_begin(name, args)`
(top rule + `name(json-snippet)` indented dim) and
`M.tool_call_end(content, is_error)` (bottom rule with dim/red status).
Visual parity with the exec frame. **Test visually** with a one-liner.
5. **`broker.lua` extensions.** Signature widens:
`chat_stream(cfg, msgs, on_delta, opts)`. `opts.tools` (optional
array) is passed through to the request body; **omitted entirely
when nil or empty**. The on_delta callback widens to
`on_delta(kind, payload)` where `kind ∈ {"text","tool_call"}`.
Text path unchanged. Tool-call path: accumulator keyed by `index`
(default 0 if absent — C2), concatenates `function.arguments` until
`finish_reason: "tool_calls"`, then emits one
`on_delta("tool_call", {id,name,arguments})` per completed call.
**`M.chat` shape unchanged** in this phase (C1 in review — no
caller for a polymorphic return). **Test against hossenfelder**
with `tools` declared + streaming.
6. **`repl.lua` wiring.** New module-local `mcp_sessions = {alias=session,...}`,
populated from `config.mcp.servers` at startup. Helpers:
- `tools_schema()` → flatten `tool` lists across sessions, namespace `alias.name`
- `dispatch_tool_call(call)` → split `alias.tool`, look up session, call, return content
- `ask_ai` loop now: stream response → if any tool_calls completed,
for each call: `safety.confirm_tool_call` → `dispatch_tool_call` →
append assistant-with-tool_calls + tool turn → re-call `broker.chat_stream`
→ repeat until pure-text response or `max_tool_depth` reached
- New meta cmds: `:mcp list`, `:mcp tools`, `:mcp tool <name>`,
`:mcp connect <url> [alias]`, `:mcp disconnect <alias>`
**End-to-end test** via the REPL against a real boltzmann lmcp +
hossenfelder broker.
7. **`config.lua` example block.** Add a commented-out `mcp = { servers
= { boltzmann = {...} }, auto_approve = {...} }` example so users can
see the shape. Not behavior-impacting; documentation only. Bundled
with commit #6 if small or split if substantial.
### Risk / non-obvious
- **Empty tools array.** If `config.mcp.servers` is absent or all
connects fail, the broker request body must **omit** `tools`
entirely (some servers reject `"tools": []`). Don't send the field
when empty.
- **Connect-at-startup blocking.** N servers × ~30 ms init+list. For
N ≤ 3 (typical) the 90 ms is acceptable. Failures are status-logged
per server, don't abort aish. Parallel via coroutines is out of scope
here — sequential is fine for v1.
- **Content blocks beyond text.** lmcp returns `[{type:"text", text:...}]`.
The spec allows `type:"image" | "resource"`. Phase 2 v1 flattens by
concatenating all `text` blocks and ignoring non-text. Log a status
warning if non-text blocks are seen. Adequate for boltzmann/hertz
tools (all text); image/resource tools deferred.
- **`isError: false` on actual failure** (baseline finding §3 of
PHASE2-baseline.md). Pass content through unchanged; let the model
read the error text. Do NOT short-circuit on the flag.
- **JSON-RPC `error` from `tools/call`.** Surface as aish status
AND synthesize a `role:"tool"` turn with
`content = "[aish] tool dispatch failed: <error.message>"` and the
matching `tool_call_id`. The alternation rationale (§4) requires
this — leaving the assistant's `tool_calls` orphaned breaks strict
chat templates exactly the way PHASE0.md §6 warned about. The model
receives the error and can re-plan within the same turn.
- **Tool-call sub-loop bounds.** `max_tool_depth` (default 8) per ask_ai
invocation. When hit, surface as status and break — append the
assistant's last text (if any) and let the user reply.
- **Argument JSON might be invalid.** A model can stream malformed JSON
in `function.arguments`. `dkjson.decode` failure → DO NOT execute on
partial parse. Synthesize a `role:"tool"` turn with
`content = "[aish] tool arguments not parseable as JSON: <decode-err>"`
and the matching `tool_call_id` (same alternation rationale as
JSON-RPC error above; C7 in review).
- **Q18 fallback path** (strict templates rejecting `role:"tool"`).
Plumb a `context.use_tool_role` flag (default true). If a real-world
rejection appears at Phase 7, flip the flag and convert tool turns to
`[tool: alias.name]\n<content>` prefix on the next user turn (same
pattern as `pending_exec_output`). **Day-one verification** (N8 in
review): commit #3 includes a small in-isolation test that builds a
context with `use_tool_role = false`, appends an assistant+tool_calls
turn followed by a tool result, and confirms `to_messages()` emits
the prefix shape instead of a `role:"tool"` turn. Keeps the fallback
alive rather than dead-coded until Phase 7 first runs it under stress.
### Test checkpoints
After each commit, verify with a targeted probe before moving on:
| Commit | Verify |
|---|---|
| #1 `mcp.lua` | `luajit -e "local m=require('mcp'); ..."` connects + lists tools against boltzmann lmcp |
| #2 `safety.lua` | unit-test policy lookup with mock rl: exact match → true; `*` glob → true; miss → prompt invoked |
| #3 `context.lua` | (i) round-trip a context with tool turns through `to_messages()`, eyeball JSON shape; (ii) day-one fallback test with `use_tool_role = false` emits the `[tool: …]` prefix shape (N8) |
| #4 `renderer.lua` | one-liner emits frame around fake tool result |
| #5 `broker.lua` | curl-compare: hand-built request body with tools matches `broker.chat_stream(cfg, msgs, on_delta)` body |
| #6 `repl.lua` | full REPL: `:mcp list` shows boltzmann; question that triggers `list_dir` round-trips through confirm + execution + model continuation |
| #7 `config.lua` | aish starts with example mcp section present; no MCP servers connected means no `tools` field sent |
### Commits expected: 7 (commit #1 carries the PHASE0.md §4 amendment)
Per Phase 1's cadence (10 commits + 1 BLOCKER fix), Phase 2 is smaller
in surface — single new file plus targeted extensions. Tracked to land
in one working session if the boltzmann proxy bugs don't intrude.
### Resolved at review (2026-05-12)
- **Q18 default** — `use_tool_role = true` defaulted, fallback exercised
day-one in commit #3 test (ii) so it's not dead code. Phase 7 flips if
mistral-nemo (once #23 is fixed) rejects.
- **`:mcp connect` re-fetch policy** — v1 trusts the `listChanged: false`
capability; manual disconnect+reconnect is the workaround if a server's
tools change. No automatic re-fetch.
### Review fold-in (2026-05-12, all BLOCKERs + relevant CONCERNs/NITs)
Independent review surfaced 5 BLOCKERs / 7 CONCERNs / 8 NITs against
the formulate+analyze+plan draft. Resolutions applied in this revision:
- **B1** context.lua impact widened — assert loosening + field
preservation + `to_messages` emit are now explicit in §3.
- **B2** `ffi/curl.M.post` extended to return `(body, status_code)` so
`mcp.lua` distinguishes HTTP `401` from JSON-RPC envelope errors.
- **B3** `inputSchema` typo fixed in §3 mcp.lua row.
- **B4** `pending_exec_output` × tool-call sub-loop interaction
specified (persists across; flushes on next user turn).
- **B5** §3/§12 dependency contradiction resolved — broker takes
`opts.tools` from the caller; no layering inversion.
- **C1** `M.chat` polymorphic return dropped.
- **C2** Index-absent fallback specified (default to 0).
- **C3** Re-injection example now stores accumulated text in the
assistant turn, not hard-coded empty string.
- **C4** `:mcp connect` failure semantics specified (no auto-retry).
- **C5/C7** Both orphan-tool_calls scenarios now synthesize a
`role:"tool"` turn with `[aish] tool dispatch failed: ...` content
to preserve alternation.
- **C6** §9 explicitly notes the §4 amendment is additive.
- **N3** protocolVersion fallback specified (lmcp doesn't negotiate).
- **N4** alternation assert added to context.lua row.
- **N7** model-routing bug filed as [aish#23](https://git.reauktion.de/marfrit/aish/issues/23).
- **N8** day-one fallback test added to commit #3 checkpoints.
CONCERNs / NITs not folded (defended as wording-only, not load-bearing):
N1, N2, N5, N6 — left as-is.
---
*End of Phase 2 Manifest — aish*