Phase 7 verify finding from TC #26 against :model cloud: HTTP 400 from openrouter→Amazon Bedrock: "tools.0.custom.name: String should match pattern '^[a-zA-Z0-9_-]{1,128}$'" Anthropic via Bedrock validates tool names against that regex and rejects dots. PHASE2 originally chose "." as the namespace separator ("boltzmann.list_dir"); OpenAI tolerated it, Bedrock does not. Separator switched to "__" (two underscores) everywhere — internal API matches on-wire shape, no transformation layer: - repl.lua: - tools_schema builds "alias__name" - dispatch_tool_call splits via "^(.-)__(.+)$" (non-greedy → leftmost __) - :mcp tool parser uses same split - :mcp tools formatter prints "alias__name" - HELP block shows <alias__name> - safety.lua confirm_tool_call: alias.* glob → alias__* glob - config.lua example block: keys rewritten - docs/PHASE2.md: amendment header added; §1, §2 row, §3 config.lua row, §5 wire-shape JSON examples, §6 auto_approve schema, §7 meta-cmd table, §12 plan all updated. Original "." references preserved in commit history. Constraint: aliases must not themselves contain "__" so the parse stays unambiguous. Tool names from MCP servers may have underscores freely. Second fix bundled — uninformative broker error: Previously "broker error: transport: HTTP response code said error" Now "broker error: transport: HTTP 400: {full body snippet}" ffi/curl.lua M.post_sse changes: - FAILONERROR no longer set (was hiding the response body). - raw_body accumulator added alongside the SSE buffer; captures every byte regardless of SSE shape. - After perform, check status_code via curl_easy_getinfo. On >=400, return (nil, "HTTP <code>: <body[:400]>"). 2xx unchanged. - End-of-stream SSE flush only runs on 2xx (no false event on error bodies that aren't SSE-shaped). - Phase 1 callers reading just first return slot stay correct. End-to-end verified: - :model cloud + tools=[boltzmann__read_file ...] + "Use boltzmann__read_file with path=/etc/hostname" → Claude emits tool_call with name="boltzmann__read_file", args='{"path": "/etc/hostname"}'. ok=true, transport clean. - Force-bad tool name "bad.name.with.dots" → err string carries the full bedrock 400 with the regex-pattern message visible. TC #26 (sub-loop end-to-end) is now testable against cloud — the error that blocked it is resolved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
38 KiB
aish — Phase 2 Manifest
Project: aish — AI-augmented conversational shell Document: Phase 2 Requirements, Architecture & Design Decisions Status: Verify (Phase 7) — implementation complete; live testing in progress Date: 2026-05-12
Amendments since formulate:
- 2026-05-12 (review fold-in): see §12 "Review fold-in" subsection.
- 2026-05-12 (Phase 7 verify, separator switch): tool-name namespace
delimiter changed from
.to__because Anthropic via Bedrock validates tool names against^[a-zA-Z0-9_-]{1,128}$— dots are rejected withHTTP 400 tools.0.custom.name: String should match pattern '...'. Discovered when:model cloudexercised TC #26 against the real cloud path. Internal API matches on-wire shape so there's no transformation layer. Constraint: aliases must not themselves contain__so the parse stays unambiguous (leftmost__is the split point). Tool names from MCP servers may contain underscores freely. All §3/§5/§6/§7/§12 references updated below.
PHASE0.md is the locked substrate; PHASE1.md is layered on top. This manifest specifies what Phase 2 adds. Section numbers reference back to PHASE0.md / PHASE1.md where relevant.
1. Scope of Phase 2
Three pillars per PHASE0.md §11 row 2:
- MCP client (
mcp.lua) — JSON-RPC 2.0 over HTTP+SSE transport. Target reference implementation:lmcp. Operations needed for v1:initialize,tools/list,tools/call. Multiple servers may be connected concurrently; tools are namespaced<server>__<tool>. - Tool-calling protocol bridge — the broker sends OpenAI-compatible
toolsin the request body; the model emitstool_callsin the response;mcp.luadispatches each call to the right server; the tool result is fed back as arole:"tool"turn incontext.luaand the chat continues. - Authorization gate —
safety.lua(PHASE0.md §4 stub) finally gets implemented. Every tool call is confirmed by the user by default, with per-tool and per-serverauto_approvepolicies inconfig.lua.
Phase 2 is done when:
- aish can connect to at least one local
lmcpserver declared inconfig.luaand one connected via:mcp connect <url>at runtime. :mcp listshows connected servers;:mcp toolsshows discovered tools across all servers.- A model conversation can invoke a tool: the broker request carries
the live tools schema; the response's
tool_callsare confirmed by the user; each call dispatches to the right MCP server; the result re-enters the chat; the model continues with the result available. CMD:extraction (PHASE0.md §6 substrate invariant) still works unchanged — Phase 2 is additive, not replacing.- A tool with
auto_approve = true(in config) executes without the confirm prompt; a non-approved tool still prompts.
2. Technology Decisions (delta from Phase 1)
| Decision | Choice | Rationale |
|---|---|---|
| MCP transport | HTTP POST per RPC, Connection: close per response, no long-lived SSE GET channel in v1 |
Analyze finding (2026-05-12): lmcp v0.5.4 only implements the trivial POST-and-respond flavor of the spec's streamable-HTTP transport. Its GET /mcp endpoint announces the POST endpoint then closes — there's no server→client notification channel to listen on. Combined with lmcp's capabilities.tools.listChanged = false, aish doesn't need an SSE GET listener at all for lmcp. Stdio transport is left for a possible Phase 2.1 if a stdio-only MCP server becomes necessary. |
| MCP protocol version | 2025-03-26 (confirmed by live probe of boltzmann:8080/mcp) |
lmcp pins this in MCP_VERSION and does not negotiate — it returns its compiled-in version regardless of what the client sends (lmcp.lua:80-91). aish sends 2025-03-26 in initialize and accepts whatever the server returns; on mismatch it logs [aish] mcp <alias>: protocol version mismatch (sent X, got Y); proceeding and continues. v1 has no version-gated behavior to abort on. |
| MCP auth | Bearer token via Authorization: Bearer <token> header, per-server |
Analyze finding: every lmcp deployment in mfritsche's fleet (boltzmann/hertz/pve*/nc/etc.) requires Bearer auth. Phase 2 config supports auth_token literal and auth_env env-var indirection per server (mirrors key_env in the models registry). lmcp servers without auth (broglie/higgs LAN-only) just leave the field nil. |
| Tool-call wire format | OpenAI tools field on /v1/chat/completions body; tool_calls on assistant deltas; role:"tool" turn with tool_call_id for results |
Standard, supported by llama.cpp and OpenRouter. Aligns with the existing /v1/chat/completions substrate invariant. |
| Tool namespacing | <server-alias>__<tool-name> for both the wire-level tool name and :mcp tools listing (was . at formulate; switched 2026-05-12 — see Amendments above) |
Avoids name collisions across servers. The alias comes from the config key or the connect URL hash. __ (two underscores) is within Bedrock's tool-name regex ^[a-zA-Z0-9_-]{1,128}$ whereas . is not. Aliases must not themselves contain __. |
CMD: coexistence with tool-calls |
Both stay live, no policy preference. Substrate invariant §3 unchanged. | Resolves Q6 (see §10). CMD: is the local-shell route; MCP tools are structured-API routes; they serve different purposes. Future phases (Norris, Phase 3) may prefer tools when both are available, but Phase 2 doesn't enforce. |
| Authorization default | Per-call confirm (mirrors PHASE0.md §10 confirm_cmd for shell) |
Conservative default; user can opt into auto-approval per tool or per server via config. Resolves Q8. |
| System prompt augmentation | Hybrid: static frame in broker.lua system prompt + dynamic tools array in the request body |
Tool list goes in the API field where it belongs; the system prompt only mentions that tools exist and how to use them. Per-request body cost is bounded (tools change rarely; small schemas). Resolves Q9. |
| Tool-call streaming | Streaming-from-day-one — broker.chat_stream's on_delta callback widens to handle tool_calls deltas in addition to text deltas |
Resolves Q10. Phase 1 SSE landed first, so we're not retrofitting; we just extend the parser. Wire shape confirmed at analyze (2026-05-12 probe vs hossenfelder): delta.tool_calls[] arrives indexed; id+type+function.name appear on the opening delta; function.arguments is a JSON-string that arrives in character-fragment chunks; finish_reason "tool_calls" closes the call. Accumulator strategy matches §5. |
| Tool-call concurrency | Sequential dispatch in Phase 2 v1 — process tool_calls[0] to completion, then [1], etc. |
Simpler error handling; tool effects often order-dependent (e.g. write-then-read). Parallel dispatch deferred (see Q20). |
| MCP server lifecycle | aish does not manage MCP server processes (parallel to PHASE0.md §12 llama.cpp rule) | Declared in config or connected by URL; aish is a client only. |
3. Module Changes
| File | State after Phase 1 | Phase 2 changes |
|---|---|---|
mcp.lua |
New file (not in PHASE0 §4 layout; this Phase amends the layout to add it) | Implement: M.connect(url, opts) -> session (opts: alias, auth_token, auth_env), session:initialize(), session:list_tools() -> [{name, description, inputSchema}], session:call_tool(name, args) -> (result_table, kind) where kind ∈ {"ok","handler_error","rpc_error"} so callers can route the response per §4's error split, session:close(). JSON-RPC 2.0 over HTTP POST (Content-Type: application/json, Accept: application/json, Authorization: Bearer <token>). Per-session state: alias, base-url, auth, tools-cache, request-ID counter. No persistent SSE channel — POST is one-shot per RPC. Distinguishes HTTP-level failure (e.g. lmcp's 401 {"error":"unauthorized"} body, which is NOT JSON-RPC-shaped — has no jsonrpc/id fields) from JSON-RPC envelope errors; needs ffi/curl.M.post extended to return status code (see ffi/curl.lua row). |
safety.lua |
Stub | Implement Phase 2 surface only: M.confirm_tool_call(tool_name, args, policy) -> bool. Reads config.mcp.auto_approve (per-tool and per-server) before prompting. Norris destructive-op heuristic and HALT gate stay Phase 3. |
broker.lua |
Streaming chat_stream(cfg, msgs, on_delta) |
Signature widens to chat_stream(cfg, msgs, on_delta, opts). opts.tools (optional array of {type, function:{name, description, parameters}}) is passed through to the request body; omitted entirely if absent or empty (some servers reject "tools": []). The on_delta callback widens to on_delta(kind, payload) where kind ∈ {"text", "tool_call"}. broker.lua does NOT depend on mcp.lua — repl assembles the tools array and passes it in; broker stays a transport layer. M.chat (non-streaming wrapper) is unchanged in this phase (no tool consumers go through it). |
context.lua |
turns = {{role, content}, ...} + pending_exec_output; Context:append asserts turn.content and rebuilds the entry as {role, content} only — extra fields are dropped |
Three concrete edits: (a) loosen :append so role == "assistant" can carry tool_calls = [{id, name, arguments}] with content allowed empty, and role == "tool" requires tool_call_id + content (the assert moves from "content required" to "shape per role"); (b) preserve tool_calls and tool_call_id in the stored turn (not just role+content); (c) to_messages() emits tool_calls on assistant turns and tool_call_id on tool turns. Add a debug assertion that role == "tool" follows an assistant turn with non-empty tool_calls (catches design bugs early; N4 in review). pending_exec_output interaction: the buffer persists across the tool-call sub-loop (the loop is internal — no user input happens — so there's no append_user to flush against). It flushes on the next genuine user turn, regardless of how many tool-call iterations preceded. |
repl.lua |
meta cmds + ask_ai stream loop | After ask_ai sees tool_calls, enter a tool-execution sub-loop: confirm-gate each call via safety.confirm_tool_call, dispatch via mcp.session:call_tool, append tool turn to context, re-issue the broker request. Loop until assistant emits text without tool_calls. New meta: :mcp connect <url> [alias], :mcp list, :mcp tools, :mcp disconnect <alias>. |
renderer.lua |
streaming text + exec frame | Add tool_call_begin(name, args), tool_call_end(result, ok). Visual style: indented, dim, parallel to the exec frame. |
config.lua |
example with models/shell/context/history | Schema additions: mcp = { servers = { alias = { url = "..." } }, auto_approve = { ["alias__tool"] = true } }. Documented in §6 below. |
ffi/curl.lua |
post + post_sse; M.post does not set FAILONERROR, so non-2xx responses return the body as a normal string. ffi.cdef exposes only curl_easy_setopt — no curl_easy_getinfo (cdef block at curl.lua:11-28). |
One small extension: M.post returns (body, status_code) on transport success (status_code may be non-2xx — caller decides what to do; mcp.lua treats >= 400 as transport failure). (nil, errmsg) on libcurl-level failure is unchanged — Phase 1 callers that read only the first slot stay correct. Requires adding curl_easy_getinfo + CURLINFO_RESPONSE_CODE (decimal 2097154, `CURLINFOTYPE_LONG |
history.lua |
JSONL session log | Tool turns are logged like any other turn — {role:"tool", tool_call_id:"...", content:"..."}. Resume reconstructs them via ctx:append like user/assistant turns. |
§4 module-layout amendment: mcp.lua slots between broker.lua and
router.lua in the §4 table. Same commit lands the manifest amendment.
4. MCP Transport (analyze findings — lmcp v0.5.4)
lmcp implements only the synchronous POST flavor of the MCP streamable-HTTP spec. Each RPC is one HTTP transaction:
client → server: POST /mcp Content-Type: application/json
Accept: application/json
Authorization: Bearer <token>
Body: { jsonrpc:"2.0", id, method, params }
Returns: { jsonrpc, id, result | error }
Connection: close
lmcp's GET /mcp exists but only sends a one-shot event: endpoint
announcing the POST URL, then closes — there is no held-open
server→client channel. Combined with the listChanged: false
capability lmcp announces in initialize, aish does not open a
persistent SSE channel to lmcp servers in v1. Notifications-from-server
are out of scope here; track for v2 if a richer server appears.
Handshake
initializerequest:{ protocolVersion: "2025-03-26", capabilities: {}, clientInfo: { name: "aish", version: "..." } }.- Server response (lmcp):
{ protocolVersion: "2025-03-26", capabilities: { tools: { listChanged: false } }, serverInfo: { name, version } }. - Version mismatch: lmcp ignores client's
protocolVersionand always returns its compiled-inMCP_VERSION(lmcp.lua:80-91). aish accepts whatever lmcp returns; on mismatch it logs a status ([aish] mcp <alias>: protocol version mismatch (sent X, got Y); proceeding) and continues. v1 has no version-gated behavior. notifications/initializedPOST (one-way; lmcp returns HTTP 202 with no body).
Tool discovery
tools/listRPC →{ tools: [{ name, description, inputSchema }] }.- Cache per-session for the session lifetime — lmcp announces
listChanged: false, so there's no need to refetch or listen for change notifications.
Tool invocation
Content flattening: tool results return content: [{type, ...}, ...].
lmcp v0.5.4 only emits type: "text", but the spec also allows
"image" and "resource". Phase 2 v1 concatenates all text blocks
into a single string for the role:"tool" turn body and ignores
non-text blocks, logging a one-shot status warning when a non-text
block is observed. Image/resource handling is deferred. See §12
"Content blocks beyond text" for the corresponding risk note.
tools/call with { name, arguments }. Failure has three flavors and
all of them result in a role:"tool" turn being appended so the
assistant's tool_calls is never left orphaned in context (strict
templates reject assistant.tool_calls without a matching tool
reply — same gotcha PHASE0.md §6 warned about):
- Tool-handler exception → JSON-RPC
resultwithisError: trueandcontent: [{ type:"text", text: "Error: ..." }]. Feedcontentstraight back as therole:"tool"turn body. Model-recoverable. - Baseline
isError: falseon actual failure (PHASE2-baseline.md §3 found this — boltzmann'sread_filereturns content text containing "Error: ..." butisError: false). Pass content through unchanged — let the model read the text.isErroris advisory, not authoritative. - JSON-RPC envelope error (e.g.
{code: -32601, message: "Tool not found"}) → synthesize arole:"tool"turn withcontent = "[aish] tool dispatch failed: <error.message>"and the matchingtool_call_id. Also surface a status line for the user. This both keeps alternation legal and tells the model what happened so its next plan is informed. - HTTP-level failure (auth, unreachable, timeout) → same shape:
synthesize a
role:"tool"turn withcontent = "[aish] tool transport error: <reason>". Same alternation rationale.
This split resolves Q21 (with the C5/C7 review fix folded in).
Lifecycle
- Connect on startup (from
config.mcp.servers) — best effort; failures are status-logged once, don't abort aish, and the session is absent frommcp_sessionsuntil manually reconnected via:mcp connect. No automatic retry. "Connect" here means: do theinitializeround-trip + cachetools/listresults. :mcp connect <url>adds a session at runtime; alias auto-derived from hostname or supplied as second arg.:mcp disconnect <alias>drops cached state. There's no long-lived HTTP connection to close (every RPC was alreadyConnection: close).- On aish quit, sessions are just forgotten — nothing to clean up server-side.
- An unreachable server simply contributes no tools to the broker
request body — the model is not told that tools were "meant" to be
available. If
tools_schema()returns empty across all sessions, the broker omits thetoolsfield entirely.
5. Tool-Call Bridge
Broker request body (delta from Phase 1)
{
"model": "...",
"messages": [...],
"stream": true,
"temperature": 0.2,
"tools": [
{ "type":"function",
"function": { "name":"<alias>__<tool>",
"description":"...",
"parameters": <inputSchema> } },
...
]
}
The tools array is assembled by mcp.tools_schema() — flattens
tools/list results from every connected session, namespacing each tool
as <alias>__<name>.
Response handling (streaming)
llama.cpp / OpenAI deltas may include:
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_…",
"function":{"name":"alias__tool","arguments":"{\"a\":"}}]}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,
"function":{"arguments":"1}"}}]}}]}
data: {"choices":[{"finish_reason":"tool_calls",...}]}
broker.chat_stream accumulates tool-call deltas keyed by index; the
arguments field is a JSON-string that arrives chunked and is concatenated.
On finish_reason: tool_calls, the accumulated calls are emitted to
on_delta as kind="tool_call" with full payloads.
Index-absent fallback: per the OpenAI spec, index is REQUIRED on
streaming tool_calls[] deltas — but some local llama.cpp builds have
been reported to omit it for single-call streams. If a delta has
tool_calls but no index, treat it as index = 0 and accumulate
into the slot-0 buffer. Log a one-shot debug status the first time
this is observed per stream.
Re-injection into context
The assistant turn carries whatever text was streamed before
finish_reason: tool_calls (which may be non-empty — models often
say "Sure, let me look that up" before calling). The renderer flushes
that text first, then renders the tool-call frame around dispatch.
-- After tool execution
ctx:append({
role = "assistant",
content = accumulated_text, -- may be "" if model emitted no prose
tool_calls = { {id="call_…", name="alias__tool", arguments=<json-string>} },
})
ctx:append({
role = "tool",
tool_call_id = "call_…",
content = <tool-result-text-or-synthesized-error>,
})
to_messages() renders both shapes for the next broker request. The
strict-alternation issue from PHASE0.md §6 (mistral-nemo Jinja) is
handled differently here — tool turns ARE expected to follow assistant
tool_calls per the OpenAI chat-template convention. If a model's
template still rejects this shape, fall back to the [tool: X] prefix
strategy used for exec output (Q18 below — fallback is plumbed via the
context.use_tool_role flag; default true).
Re-issuing the broker request
After tool turns are appended, the broker is called again with the
extended messages array. The model may emit more tool_calls, more
text, or both. Loop until the response has no tool_calls (i.e. a
plain text assistant turn).
Budget: a max-tool-call-depth setting (default 8) prevents runaway loops.
Hit-cap surfaces as a status: [aish] tool-call depth limit reached.
6. Authorization (safety.lua Phase 2 surface)
-- safety.confirm_tool_call(tool_name, args_table, config) -> bool
function M.confirm_tool_call(name, args, cfg)
local policy = cfg.mcp and cfg.mcp.auto_approve or {}
if policy[name] then return true end
-- Per-server prefix check: "alias__*" entries
local alias = name:match("^([^.]+)%.")
if alias and policy[alias .. ".*"] then return true end
-- Otherwise prompt
local pretty = name .. "(" .. (#args > 0 and "..." or "") .. ")"
local ans = rl.readline(("call '%s'? [y/N] "):format(pretty)) or ""
return ans:lower():sub(1,1) == "y"
end
Config schema (analyze-revised — Bearer auth fields added):
mcp = {
servers = {
boltzmann = {
url = "http://boltzmann.fritz.box:8080/mcp",
auth_env = "BOLTZMANN_MCP_TOKEN", -- read from env at startup
},
broglie = {
url = "http://broglie.fritz.box:8080/mcp",
-- no auth (LAN-only deployment)
},
nc = {
url = "https://nc.reauktion.de:8080/mcp",
auth_token = "literal-token-if-not-using-env", -- alternative
},
},
auto_approve = {
["boltzmann__read_file"] = true, -- specific tool
["broglie__*"] = true, -- whole server
},
max_tool_depth = 8,
}
Auth precedence per server: auth_token literal > auth_env indirection
nil (no Authorization header sent). Mirrors PHASE0 §10's
key_envconvention for cloud model API keys.
Norris mode (Phase 3) will extend this: when autonomous, the destructive-op heuristic decides; for non-destructive tools, auto_approve. Outside scope here.
7. Meta Commands (Phase 2 additions)
| Command | Action |
|---|---|
:mcp connect <url> [<alias>] |
Open a session; perform initialize + tools/list; add to active set |
:mcp disconnect <alias> |
Close one session |
:mcp list |
Show connected sessions (alias, url, tool count, status) |
:mcp tools |
List tools across all sessions (alias__name — short description) |
:mcp tool <alias__name> |
Show one tool's full inputSchema (debug aid) |
Existing :help updated to list these.
8. System Prompt Augmentation
broker.lua's default system prompt grows by ~4 lines:
You may have access to MCP tools — they appear in this request's `tools`
field. Call a tool by emitting a tool_call; the result will be supplied
in the next turn. Use tools for structured operations (file reads,
queries, etc.) and `CMD:` lines for local shell commands. Prefer tools
when available; fall back to `CMD:` for anything not exposed as a tool.
The actual tool list is in the tools request-body field, not the
prompt. This avoids per-turn token bloat for the full schema.
§3 substrate invariants are unchanged. The CMD: extraction marker stays
the local-shell route; tools are the additive structured route.
9. Migration from Phase 1
User-visible changes:
- New
:mcp …meta commands when MCP servers are configured or connected at runtime. - Assistant responses may now invoke tools — user sees a confirm prompt
(similar to
CMD:execution gate) followed by an indented tool-call frame with the result. CMD:lines still work exactly as before for shell.
Substrate (PHASE0.md §3) invariants: unchanged. Module layout (§4)
amended to add mcp.lua (no rename of any existing file). Adding
a new file is additive and preserves the §3 module-stability invariant
("File names are stable across phases — later phases fill in bodies,
not rename files"). The amendment ships in commit #1 of the §12 plan
(C6 in the review).
config.lua: existing configs without an mcp section continue to work
— no MCP servers means no tools sent in the broker request body, no
auth checks, no behavior change.
10. Out of Scope (Phase 2)
Per PHASE0.md §11, these belong elsewhere:
- Chuck Norris autonomous mode (Phase 3) — even though tool-calls enable richer autonomy, the autonomous policy is Phase 3's.
- Destructive-op heuristic in safety.lua (Phase 3) — Phase 2 only implements the per-call confirm-prompt surface.
memory.jsonlsummarization across sessions (Phase 4).- Multi-model routing / cloud fallback (Phase 5).
- Tree-sitter syntax highlighting (Phase 6).
Specifically out of Phase 2 scope despite proximity:
- Stdio-transport MCP servers (Q17 below).
- Parallel tool-call dispatch (Q20).
- MCP
resources/listandprompts/listcapabilities — Phase 2 v1 only implementstools/*. Resources/prompts deferred (probably Phase 4 alongside memory). - Server-sent
notifications/progressfor long-running tool calls — ignored in v1; status surface comes later.
11. Open Questions
| # | Question | Impact | Resolve by |
|---|---|---|---|
| Q17 | mcp.lua API shape | Resolved at analyze. Hard-code POST-only HTTP for v1. lmcp doesn't use the long-lived SSE channel and listChanged: false removes any v1 need for it. Stdio transport tracked as Phase 2.1 / out-of-scope here. |
|
| Q18 | Tool-result re-injection: standard OpenAI role:"tool" turn, or [tool: X] prefix to next user turn (matching the §6 exec-output pattern)? |
context.lua + broker.lua | Partly resolved. Live probe (2026-05-12, hossenfelder) shows role:"tool" accepted by the proxy + the loaded model (qwen2.5-coder-1.5b). Mistral-nemo-specific template testing is blocked by the hossenfelder proxy routing all model field values to the loaded fast model — see open-end below. Default v1 path: role:"tool" (standard); fallback to [tool: X] prefix is plumbed but unused unless a strict template rejects it during Phase 7 verify. |
| Q19 | Large tool-result payloads: pass-through, truncate at N chars, or summarize via fast model? | context.lua + executor of tool-result | Phase 2 (plan); Phase 4 may refine with memory.jsonl |
| Q20 | Parallel tool_calls: sequential v1 is safe; spec allows parallel. Move to parallel when both calls are read-only? |
mcp.lua dispatch | Phase 2 (verify) — track for v2 |
| Q21 | mcp.lua + broker.lua | Resolved at analyze. lmcp distinguishes: result.isError=true (handler exception, model-recoverable, feed back as tool turn content) vs JSON-RPC error (unknown method/tool, transport-level, surface as aish status). See §4. |
|
| Q22 | aish's own command surface as an MCP server | scope expansion | Out of Phase 2. Parked for Phase 4+ if interest stays. |
Open-end carried forward to Phase 7 (verify):
- Hossenfelder proxy
model-field bug (separate from aish): the proxy at:8082routes all requests to the loaded fast model regardless of the request'smodelfield — chunks return"model":"qwen2.5-coder-1.5b-q4_k_m.gguf"even whenmistral-nemo-12b-instructwas asked for. This blocks live-verification of mistral-nemo's chat-template tool-role behavior. Tracked as aish#23 (filed 2026-05-12 at review). Sibling to the SSE-buffering bug at aish#15 — both live in the boltzmann proxy code. Phase 7 needs at least #23 fixed to fully close Q18.
Resolved at formulate (above in §2 table):
- Q6 (CMD: vs tools coexistence) — both, no policy preference, substrate unchanged.
- Q7 (MCP discovery) — both, config-declared default + runtime
:mcp connect. - Q8 (authorization) — per-call confirm default, per-tool/per-server
auto_approvepolicy. - Q9 (system-prompt augmentation) — hybrid: static frame + dynamic
toolsbody field. - Q10 (tool-call streaming) — streaming-from-day-one on top of Phase 1 SSE.
Resolved at analyze (2026-05-12, live probes vs lmcp v0.5.4 + hossenfelder):
- Q17 (transport abstraction) — POST-only, no SSE channel needed for lmcp.
- Q21 (error mapping) — isError vs JSON-RPC error split per §4.
12. Implementation Plan (commit-by-commit)
Bottom-up — start with modules with the fewest dependencies, end with the REPL wiring that exercises everything together. Same shape as Phase 0 and Phase 1 implementation cadence.
Order
-
mcp.lua(new file) — JSON-RPC client.M.connect(url, opts),session:initialize()+:list_tools()+:call_tool(name, args)+:close(). Uses Phase 1'sffi/curl.M.postfor transport — same commit lands theM.postextension to return(body, status_code)per §3 row somcp.luacan distinguish HTTP401(non-JSON-RPC body{"error":"unauthorized"}) from JSON-RPC envelope errors. Per-server Bearer auth (auth_tokenliteral orauth_envindirection).:call_toolreturns(result_table, kind)wherekind ∈ {"ok","handler_error","rpc_error"}so callers route per §4. Test in isolation vialuajit -e 'local mcp=require("mcp"); local s=mcp.connect("http://boltzmann.fritz.box:8080/mcp",{auth_env="BOLTZMANN_MCP_TOKEN"}); s:initialize(); print(#s:list_tools())'. Also amends PHASE0.md §4 to listmcp.luabetweenbroker.luaandrouter.luain the same commit (additive — preserves §3 module-stability invariant per §9). -
safety.lua— confirm-gate surface. Implement justM.confirm_tool_call(name, args, cfg)per §6. Readscfg.mcp.auto_approvefor exact-match andalias__*glob. Falls back torl.readlineprompt. Norris-mode hooks stay out (Phase 3). Test in isolation with mocked rl + various policy shapes. -
context.luaextensions. Three concrete edits per §3 row: (a) loosenContext:append's assert from "content required" to shape-per-role (assistant may have empty content iftool_callspresent;toolrequirestool_call_id+content); (b) preservetool_calls/tool_call_idin stored turns (not just role+content); (c) extendto_messages()to emit those fields. Add alternation assert (N4 in review).pending_exec_outputis unchanged: buffer persists across tool-call sub-loops; flushes on next genuine user turn (§3 row). Tests in isolation: (i) build a context with assistant+tool_calls + tool turns, round-trip throughto_messages(), eyeball JSON shape; (ii) day-one fallback test (N8) — same context withuse_tool_role = falsemust emit the[tool: alias__name]\n…prefix shape instead of arole:"tool"message. -
renderer.luaextensions. AddM.tool_call_begin(name, args)(top rule +name(json-snippet)indented dim) andM.tool_call_end(content, is_error)(bottom rule with dim/red status). Visual parity with the exec frame. Test visually with a one-liner. -
broker.luaextensions. Signature widens:chat_stream(cfg, msgs, on_delta, opts).opts.tools(optional array) is passed through to the request body; omitted entirely when nil or empty. The on_delta callback widens toon_delta(kind, payload)wherekind ∈ {"text","tool_call"}. Text path unchanged. Tool-call path: accumulator keyed byindex(default 0 if absent — C2), concatenatesfunction.argumentsuntilfinish_reason: "tool_calls", then emits oneon_delta("tool_call", {id,name,arguments})per completed call.M.chatshape unchanged in this phase (C1 in review — no caller for a polymorphic return). Test against hossenfelder withtoolsdeclared + streaming. -
repl.luawiring. New module-localmcp_sessions = {alias=session,...}, populated fromconfig.mcp.serversat startup. Helpers:tools_schema()→ flattentoollists across sessions, namespacealias__namedispatch_tool_call(call)→ splitalias__tool, look up session, call, return contentask_ailoop now: stream response → if any tool_calls completed, for each call:safety.confirm_tool_call→dispatch_tool_call→ append assistant-with-tool_calls + tool turn → re-callbroker.chat_stream→ repeat until pure-text response ormax_tool_depthreached- New meta cmds:
:mcp list,:mcp tools,:mcp tool <name>,:mcp connect <url> [alias],:mcp disconnect <alias>End-to-end test via the REPL against a real boltzmann lmcp + hossenfelder broker.
-
config.luaexample block. Add a commented-outmcp = { servers = { boltzmann = {...} }, auto_approve = {...} }example so users can see the shape. Not behavior-impacting; documentation only. Bundled with commit #6 if small or split if substantial.
Risk / non-obvious
-
Empty tools array. If
config.mcp.serversis absent or all connects fail, the broker request body must omittoolsentirely (some servers reject"tools": []). Don't send the field when empty. -
Connect-at-startup blocking. N servers × ~30 ms init+list. For N ≤ 3 (typical) the 90 ms is acceptable. Failures are status-logged per server, don't abort aish. Parallel via coroutines is out of scope here — sequential is fine for v1.
-
Content blocks beyond text. lmcp returns
[{type:"text", text:...}]. The spec allowstype:"image" | "resource". Phase 2 v1 flattens by concatenating alltextblocks and ignoring non-text. Log a status warning if non-text blocks are seen. Adequate for boltzmann/hertz tools (all text); image/resource tools deferred. -
isError: falseon actual failure (baseline finding §3 of PHASE2-baseline.md). Pass content through unchanged; let the model read the error text. Do NOT short-circuit on the flag. -
JSON-RPC
errorfromtools/call. Surface as aish status AND synthesize arole:"tool"turn withcontent = "[aish] tool dispatch failed: <error.message>"and the matchingtool_call_id. The alternation rationale (§4) requires this — leaving the assistant'stool_callsorphaned breaks strict chat templates exactly the way PHASE0.md §6 warned about. The model receives the error and can re-plan within the same turn. -
Tool-call sub-loop bounds.
max_tool_depth(default 8) per ask_ai invocation. When hit, surface as status and break — append the assistant's last text (if any) and let the user reply. -
Argument JSON might be invalid. A model can stream malformed JSON in
function.arguments.dkjson.decodefailure → DO NOT execute on partial parse. Synthesize arole:"tool"turn withcontent = "[aish] tool arguments not parseable as JSON: <decode-err>"and the matchingtool_call_id(same alternation rationale as JSON-RPC error above; C7 in review). -
Q18 fallback path (strict templates rejecting
role:"tool"). Plumb acontext.use_tool_roleflag (default true). If a real-world rejection appears at Phase 7, flip the flag and convert tool turns to[tool: alias__name]\n<content>prefix on the next user turn (same pattern aspending_exec_output). Day-one verification (N8 in review): commit #3 includes a small in-isolation test that builds a context withuse_tool_role = false, appends an assistant+tool_calls turn followed by a tool result, and confirmsto_messages()emits the prefix shape instead of arole:"tool"turn. Keeps the fallback alive rather than dead-coded until Phase 7 first runs it under stress.
Test checkpoints
After each commit, verify with a targeted probe before moving on:
| Commit | Verify |
|---|---|
#1 mcp.lua |
luajit -e "local m=require('mcp'); ..." connects + lists tools against boltzmann lmcp |
#2 safety.lua |
unit-test policy lookup with mock rl: exact match → true; * glob → true; miss → prompt invoked |
#3 context.lua |
(i) round-trip a context with tool turns through to_messages(), eyeball JSON shape; (ii) day-one fallback test with use_tool_role = false emits the [tool: …] prefix shape (N8) |
#4 renderer.lua |
one-liner emits frame around fake tool result |
#5 broker.lua |
curl-compare: hand-built request body with tools matches broker.chat_stream(cfg, msgs, on_delta) body |
#6 repl.lua |
full REPL: :mcp list shows boltzmann; question that triggers list_dir round-trips through confirm + execution + model continuation |
#7 config.lua |
aish starts with example mcp section present; no MCP servers connected means no tools field sent |
Commits expected: 7 (commit #1 carries the PHASE0.md §4 amendment)
Per Phase 1's cadence (10 commits + 1 BLOCKER fix), Phase 2 is smaller in surface — single new file plus targeted extensions. Tracked to land in one working session if the boltzmann proxy bugs don't intrude.
Resolved at review (2026-05-12)
- Q18 default —
use_tool_role = truedefaulted, fallback exercised day-one in commit #3 test (ii) so it's not dead code. Phase 7 flips if mistral-nemo (once #23 is fixed) rejects. :mcp connectre-fetch policy — v1 trusts thelistChanged: falsecapability; manual disconnect+reconnect is the workaround if a server's tools change. No automatic re-fetch.
Review fold-in (2026-05-12, all BLOCKERs + relevant CONCERNs/NITs)
Independent review surfaced 5 BLOCKERs / 7 CONCERNs / 8 NITs against the formulate+analyze+plan draft. Resolutions applied in this revision:
- B1 context.lua impact widened — assert loosening + field
preservation +
to_messagesemit are now explicit in §3. - B2
ffi/curl.M.postextended to return(body, status_code)somcp.luadistinguishes HTTP401from JSON-RPC envelope errors. - B3
inputSchematypo fixed in §3 mcp.lua row. - B4
pending_exec_output× tool-call sub-loop interaction specified (persists across; flushes on next user turn). - B5 §3/§12 dependency contradiction resolved — broker takes
opts.toolsfrom the caller; no layering inversion. - C1
M.chatpolymorphic return dropped. - C2 Index-absent fallback specified (default to 0).
- C3 Re-injection example now stores accumulated text in the assistant turn, not hard-coded empty string.
- C4
:mcp connectfailure semantics specified (no auto-retry). - C5/C7 Both orphan-tool_calls scenarios now synthesize a
role:"tool"turn with[aish] tool dispatch failed: ...content to preserve alternation. - C6 §9 explicitly notes the §4 amendment is additive.
- N3 protocolVersion fallback specified (lmcp doesn't negotiate).
- N4 alternation assert added to context.lua row.
- N7 model-routing bug filed as aish#23.
- N8 day-one fallback test added to commit #3 checkpoints.
CONCERNs / NITs not folded (defended as wording-only, not load-bearing): N1, N2, N5, N6 — left as-is.
End of Phase 2 Manifest — aish