f26cbd9a3a
Phase 7 verify finding from TC #26 against :model cloud: HTTP 400 from openrouter→Amazon Bedrock: "tools.0.custom.name: String should match pattern '^[a-zA-Z0-9_-]{1,128}$'" Anthropic via Bedrock validates tool names against that regex and rejects dots. PHASE2 originally chose "." as the namespace separator ("boltzmann.list_dir"); OpenAI tolerated it, Bedrock does not. Separator switched to "__" (two underscores) everywhere — internal API matches on-wire shape, no transformation layer: - repl.lua: - tools_schema builds "alias__name" - dispatch_tool_call splits via "^(.-)__(.+)$" (non-greedy → leftmost __) - :mcp tool parser uses same split - :mcp tools formatter prints "alias__name" - HELP block shows <alias__name> - safety.lua confirm_tool_call: alias.* glob → alias__* glob - config.lua example block: keys rewritten - docs/PHASE2.md: amendment header added; §1, §2 row, §3 config.lua row, §5 wire-shape JSON examples, §6 auto_approve schema, §7 meta-cmd table, §12 plan all updated. Original "." references preserved in commit history. Constraint: aliases must not themselves contain "__" so the parse stays unambiguous. Tool names from MCP servers may have underscores freely. Second fix bundled — uninformative broker error: Previously "broker error: transport: HTTP response code said error" Now "broker error: transport: HTTP 400: {full body snippet}" ffi/curl.lua M.post_sse changes: - FAILONERROR no longer set (was hiding the response body). - raw_body accumulator added alongside the SSE buffer; captures every byte regardless of SSE shape. - After perform, check status_code via curl_easy_getinfo. On >=400, return (nil, "HTTP <code>: <body[:400]>"). 2xx unchanged. - End-of-stream SSE flush only runs on 2xx (no false event on error bodies that aren't SSE-shaped). - Phase 1 callers reading just first return slot stay correct. End-to-end verified: - :model cloud + tools=[boltzmann__read_file ...] + "Use boltzmann__read_file with path=/etc/hostname" → Claude emits tool_call with name="boltzmann__read_file", args='{"path": "/etc/hostname"}'. ok=true, transport clean. - Force-bad tool name "bad.name.with.dots" → err string carries the full bedrock 400 with the regex-pattern message visible. TC #26 (sub-loop end-to-end) is now testable against cloud — the error that blocked it is resolved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
630 lines
38 KiB
Markdown
630 lines
38 KiB
Markdown
# aish — Phase 2 Manifest
|
||
|
||
**Project:** aish — AI-augmented conversational shell
|
||
**Document:** Phase 2 Requirements, Architecture & Design Decisions
|
||
**Status:** Verify (Phase 7) — implementation complete; live testing in progress
|
||
**Date:** 2026-05-12
|
||
|
||
**Amendments since formulate:**
|
||
- 2026-05-12 (review fold-in): see §12 "Review fold-in" subsection.
|
||
- 2026-05-12 (Phase 7 verify, separator switch): tool-name namespace
|
||
delimiter changed from `.` to `__` because Anthropic via Bedrock
|
||
validates tool names against `^[a-zA-Z0-9_-]{1,128}$` — dots are
|
||
rejected with `HTTP 400 tools.0.custom.name: String should match
|
||
pattern '...'`. Discovered when `:model cloud` exercised TC #26
|
||
against the real cloud path. Internal API matches on-wire shape so
|
||
there's no transformation layer. Constraint: aliases must not
|
||
themselves contain `__` so the parse stays unambiguous (leftmost
|
||
`__` is the split point). Tool names from MCP servers may contain
|
||
underscores freely. All §3/§5/§6/§7/§12 references updated below.
|
||
|
||
PHASE0.md is the locked substrate; PHASE1.md is layered on top. This
|
||
manifest specifies what Phase 2 adds. Section numbers reference back to
|
||
PHASE0.md / PHASE1.md where relevant.
|
||
|
||
---
|
||
|
||
## 1. Scope of Phase 2
|
||
|
||
Three pillars per PHASE0.md §11 row 2:
|
||
|
||
1. **MCP client** (`mcp.lua`) — JSON-RPC 2.0 over HTTP+SSE transport.
|
||
Target reference implementation: `lmcp`. Operations needed for v1:
|
||
`initialize`, `tools/list`, `tools/call`. Multiple servers may be
|
||
connected concurrently; tools are namespaced `<server>__<tool>`.
|
||
2. **Tool-calling protocol bridge** — the broker sends OpenAI-compatible
|
||
`tools` in the request body; the model emits `tool_calls` in the
|
||
response; `mcp.lua` dispatches each call to the right server; the
|
||
tool result is fed back as a `role:"tool"` turn in `context.lua` and
|
||
the chat continues.
|
||
3. **Authorization gate** — `safety.lua` (PHASE0.md §4 stub) finally gets
|
||
implemented. Every tool call is confirmed by the user by default,
|
||
with per-tool and per-server `auto_approve` policies in `config.lua`.
|
||
|
||
**Phase 2 is done when:**
|
||
|
||
- aish can connect to at least one local `lmcp` server declared in
|
||
`config.lua` and one connected via `:mcp connect <url>` at runtime.
|
||
- `:mcp list` shows connected servers; `:mcp tools` shows discovered
|
||
tools across all servers.
|
||
- A model conversation can invoke a tool: the broker request carries
|
||
the live tools schema; the response's `tool_calls` are confirmed by
|
||
the user; each call dispatches to the right MCP server; the result
|
||
re-enters the chat; the model continues with the result available.
|
||
- `CMD:` extraction (PHASE0.md §6 substrate invariant) still works
|
||
unchanged — Phase 2 is additive, not replacing.
|
||
- A tool with `auto_approve = true` (in config) executes without the
|
||
confirm prompt; a non-approved tool still prompts.
|
||
|
||
---
|
||
|
||
## 2. Technology Decisions (delta from Phase 1)
|
||
|
||
| Decision | Choice | Rationale |
|
||
|---|---|---|
|
||
| MCP transport | HTTP POST per RPC, `Connection: close` per response, **no long-lived SSE GET channel** in v1 | Analyze finding (2026-05-12): lmcp v0.5.4 only implements the trivial POST-and-respond flavor of the spec's streamable-HTTP transport. Its GET /mcp endpoint announces the POST endpoint then closes — there's no server→client notification channel to listen on. Combined with lmcp's `capabilities.tools.listChanged = false`, aish doesn't need an SSE GET listener at all for lmcp. Stdio transport is left for a possible Phase 2.1 if a stdio-only MCP server becomes necessary. |
|
||
| MCP protocol version | `2025-03-26` (confirmed by live probe of boltzmann:8080/mcp) | lmcp pins this in `MCP_VERSION` and **does not negotiate** — it returns its compiled-in version regardless of what the client sends (lmcp.lua:80-91). aish sends `2025-03-26` in `initialize` and accepts whatever the server returns; on mismatch it logs `[aish] mcp <alias>: protocol version mismatch (sent X, got Y); proceeding` and continues. v1 has no version-gated behavior to abort on. |
|
||
| MCP auth | Bearer token via `Authorization: Bearer <token>` header, per-server | Analyze finding: every lmcp deployment in mfritsche's fleet (boltzmann/hertz/pve*/nc/etc.) requires Bearer auth. Phase 2 config supports `auth_token` literal and `auth_env` env-var indirection per server (mirrors `key_env` in the models registry). lmcp servers without auth (broglie/higgs LAN-only) just leave the field nil. |
|
||
| Tool-call wire format | OpenAI `tools` field on `/v1/chat/completions` body; `tool_calls` on assistant deltas; `role:"tool"` turn with `tool_call_id` for results | Standard, supported by llama.cpp and OpenRouter. Aligns with the existing `/v1/chat/completions` substrate invariant. |
|
||
| Tool namespacing | `<server-alias>__<tool-name>` for both the wire-level tool name and `:mcp tools` listing (was `.` at formulate; switched 2026-05-12 — see Amendments above) | Avoids name collisions across servers. The alias comes from the config key or the connect URL hash. `__` (two underscores) is within Bedrock's tool-name regex `^[a-zA-Z0-9_-]{1,128}$` whereas `.` is not. Aliases must not themselves contain `__`. |
|
||
| `CMD:` coexistence with tool-calls | Both stay live, no policy preference. Substrate invariant §3 unchanged. | Resolves Q6 (see §10). `CMD:` is the local-shell route; MCP tools are structured-API routes; they serve different purposes. Future phases (Norris, Phase 3) may prefer tools when both are available, but Phase 2 doesn't enforce. |
|
||
| Authorization default | Per-call confirm (mirrors PHASE0.md §10 `confirm_cmd` for shell) | Conservative default; user can opt into auto-approval per tool or per server via config. Resolves Q8. |
|
||
| System prompt augmentation | Hybrid: static frame in `broker.lua` system prompt + dynamic `tools` array in the request body | Tool list goes in the API field where it belongs; the system prompt only mentions that tools exist and how to use them. Per-request body cost is bounded (tools change rarely; small schemas). Resolves Q9. |
|
||
| Tool-call streaming | Streaming-from-day-one — `broker.chat_stream`'s on_delta callback widens to handle `tool_calls` deltas in addition to text deltas | Resolves Q10. Phase 1 SSE landed first, so we're not retrofitting; we just extend the parser. **Wire shape confirmed at analyze** (2026-05-12 probe vs hossenfelder): `delta.tool_calls[]` arrives indexed; id+type+function.name appear on the opening delta; `function.arguments` is a JSON-string that arrives in character-fragment chunks; finish_reason "tool_calls" closes the call. Accumulator strategy matches §5. |
|
||
| Tool-call concurrency | Sequential dispatch in Phase 2 v1 — process `tool_calls[0]` to completion, then `[1]`, etc. | Simpler error handling; tool effects often order-dependent (e.g. write-then-read). Parallel dispatch deferred (see Q20). |
|
||
| MCP server lifecycle | aish does not manage MCP server processes (parallel to PHASE0.md §12 llama.cpp rule) | Declared in config or connected by URL; aish is a client only. |
|
||
|
||
---
|
||
|
||
## 3. Module Changes
|
||
|
||
| File | State after Phase 1 | Phase 2 changes |
|
||
|---|---|---|
|
||
| `mcp.lua` | **New file** (not in PHASE0 §4 layout; this Phase amends the layout to add it) | Implement: `M.connect(url, opts) -> session` (opts: `alias`, `auth_token`, `auth_env`), `session:initialize()`, `session:list_tools() -> [{name, description, inputSchema}]`, `session:call_tool(name, args) -> (result_table, kind)` where `kind ∈ {"ok","handler_error","rpc_error"}` so callers can route the response per §4's error split, `session:close()`. JSON-RPC 2.0 over HTTP POST (`Content-Type: application/json`, `Accept: application/json`, `Authorization: Bearer <token>`). Per-session state: alias, base-url, auth, tools-cache, request-ID counter. No persistent SSE channel — POST is one-shot per RPC. Distinguishes HTTP-level failure (e.g. lmcp's `401 {"error":"unauthorized"}` body, which is NOT JSON-RPC-shaped — has no `jsonrpc`/`id` fields) from JSON-RPC envelope errors; needs `ffi/curl.M.post` extended to return status code (see ffi/curl.lua row). |
|
||
| `safety.lua` | Stub | Implement Phase 2 surface only: `M.confirm_tool_call(tool_name, args, policy) -> bool`. Reads `config.mcp.auto_approve` (per-tool and per-server) before prompting. Norris destructive-op heuristic and HALT gate stay Phase 3. |
|
||
| `broker.lua` | Streaming `chat_stream(cfg, msgs, on_delta)` | Signature widens to `chat_stream(cfg, msgs, on_delta, opts)`. `opts.tools` (optional array of `{type, function:{name, description, parameters}}`) is passed through to the request body; **omitted entirely if absent or empty** (some servers reject `"tools": []`). The on_delta callback widens to `on_delta(kind, payload)` where `kind ∈ {"text", "tool_call"}`. **`broker.lua` does NOT depend on `mcp.lua`** — repl assembles the tools array and passes it in; broker stays a transport layer. `M.chat` (non-streaming wrapper) is unchanged in this phase (no tool consumers go through it). |
|
||
| `context.lua` | turns = {{role, content}, ...} + `pending_exec_output`; `Context:append` asserts `turn.content` and rebuilds the entry as `{role, content}` only — extra fields are dropped | Three concrete edits: (a) **loosen `:append`** so `role == "assistant"` can carry `tool_calls = [{id, name, arguments}]` with `content` allowed empty, and `role == "tool"` requires `tool_call_id` + `content` (the assert moves from "content required" to "shape per role"); (b) **preserve `tool_calls` and `tool_call_id`** in the stored turn (not just role+content); (c) `to_messages()` emits `tool_calls` on assistant turns and `tool_call_id` on tool turns. Add a debug assertion that `role == "tool"` follows an assistant turn with non-empty `tool_calls` (catches design bugs early; N4 in review). **`pending_exec_output` interaction**: the buffer **persists across the tool-call sub-loop** (the loop is internal — no user input happens — so there's no append_user to flush against). It flushes on the next genuine user turn, regardless of how many tool-call iterations preceded. |
|
||
| `repl.lua` | meta cmds + ask_ai stream loop | After ask_ai sees `tool_calls`, enter a tool-execution sub-loop: confirm-gate each call via `safety.confirm_tool_call`, dispatch via `mcp.session:call_tool`, append tool turn to context, re-issue the broker request. Loop until assistant emits text without tool_calls. New meta: `:mcp connect <url> [alias]`, `:mcp list`, `:mcp tools`, `:mcp disconnect <alias>`. |
|
||
| `renderer.lua` | streaming text + exec frame | Add `tool_call_begin(name, args)`, `tool_call_end(result, ok)`. Visual style: indented, dim, parallel to the exec frame. |
|
||
| `config.lua` | example with models/shell/context/history | Schema additions: `mcp = { servers = { alias = { url = "..." } }, auto_approve = { ["alias__tool"] = true } }`. Documented in §6 below. |
|
||
| `ffi/curl.lua` | post + post_sse; `M.post` does not set `FAILONERROR`, so non-2xx responses return the body as a normal string. `ffi.cdef` exposes only `curl_easy_setopt` — no `curl_easy_getinfo` (cdef block at curl.lua:11-28). | **One small extension**: `M.post` returns **`(body, status_code)` on transport success** (status_code may be non-2xx — caller decides what to do; mcp.lua treats `>= 400` as transport failure). `(nil, errmsg)` on libcurl-level failure is **unchanged** — Phase 1 callers that read only the first slot stay correct. Requires adding `curl_easy_getinfo` + `CURLINFO_RESPONSE_CODE` (decimal 2097154, `CURLINFOTYPE_LONG | 2`) to the `ffi.cdef` block, plus a `long[1]` out-param shim. MCP auth failures from lmcp arrive as HTTP `401` with a non-JSON-RPC body (`{"error":"unauthorized"}`); `mcp.lua` must distinguish HTTP-level failure from JSON-RPC envelope errors. No SSE GET channel is added (analyze finding ruled it out for lmcp). |
|
||
| `history.lua` | JSONL session log | Tool turns are logged like any other turn — `{role:"tool", tool_call_id:"...", content:"..."}`. Resume reconstructs them via `ctx:append` like user/assistant turns. |
|
||
|
||
§4 module-layout amendment: `mcp.lua` slots between `broker.lua` and
|
||
`router.lua` in the §4 table. Same commit lands the manifest amendment.
|
||
|
||
---
|
||
|
||
## 4. MCP Transport (analyze findings — lmcp v0.5.4)
|
||
|
||
lmcp implements only the **synchronous POST** flavor of the MCP
|
||
streamable-HTTP spec. Each RPC is one HTTP transaction:
|
||
|
||
```
|
||
client → server: POST /mcp Content-Type: application/json
|
||
Accept: application/json
|
||
Authorization: Bearer <token>
|
||
Body: { jsonrpc:"2.0", id, method, params }
|
||
Returns: { jsonrpc, id, result | error }
|
||
Connection: close
|
||
```
|
||
|
||
lmcp's `GET /mcp` exists but only sends a one-shot `event: endpoint`
|
||
announcing the POST URL, then closes — there is no held-open
|
||
server→client channel. Combined with the `listChanged: false`
|
||
capability lmcp announces in `initialize`, **aish does not open a
|
||
persistent SSE channel** to lmcp servers in v1. Notifications-from-server
|
||
are out of scope here; track for v2 if a richer server appears.
|
||
|
||
### Handshake
|
||
|
||
1. `initialize` request: `{ protocolVersion: "2025-03-26", capabilities: {}, clientInfo: { name: "aish", version: "..." } }`.
|
||
2. Server response (lmcp): `{ protocolVersion: "2025-03-26", capabilities: { tools: { listChanged: false } }, serverInfo: { name, version } }`.
|
||
3. **Version mismatch**: lmcp ignores client's `protocolVersion` and always returns its compiled-in `MCP_VERSION` (lmcp.lua:80-91). aish accepts whatever lmcp returns; on mismatch it logs a status (`[aish] mcp <alias>: protocol version mismatch (sent X, got Y); proceeding`) and continues. v1 has no version-gated behavior.
|
||
4. `notifications/initialized` POST (one-way; lmcp returns HTTP 202 with no body).
|
||
|
||
### Tool discovery
|
||
|
||
1. `tools/list` RPC → `{ tools: [{ name, description, inputSchema }] }`.
|
||
2. Cache per-session **for the session lifetime** — lmcp announces
|
||
`listChanged: false`, so there's no need to refetch or listen for
|
||
change notifications.
|
||
|
||
### Tool invocation
|
||
|
||
**Content flattening**: tool results return `content: [{type, ...}, ...]`.
|
||
lmcp v0.5.4 only emits `type: "text"`, but the spec also allows
|
||
`"image"` and `"resource"`. Phase 2 v1 **concatenates all `text` blocks**
|
||
into a single string for the `role:"tool"` turn body and **ignores
|
||
non-text blocks**, logging a one-shot status warning when a non-text
|
||
block is observed. Image/resource handling is deferred. See §12
|
||
"Content blocks beyond text" for the corresponding risk note.
|
||
|
||
`tools/call` with `{ name, arguments }`. Failure has three flavors and
|
||
all of them result in **a `role:"tool"` turn being appended** so the
|
||
assistant's `tool_calls` is never left orphaned in context (strict
|
||
templates reject `assistant.tool_calls` without a matching `tool`
|
||
reply — same gotcha PHASE0.md §6 warned about):
|
||
|
||
- **Tool-handler exception** → JSON-RPC `result` with `isError: true`
|
||
and `content: [{ type:"text", text: "Error: ..." }]`. Feed
|
||
`content` straight back as the `role:"tool"` turn body. Model-recoverable.
|
||
- **Baseline `isError: false` on actual failure** (PHASE2-baseline.md §3
|
||
found this — boltzmann's `read_file` returns content text containing
|
||
"Error: ..." but `isError: false`). Pass content through unchanged —
|
||
let the model read the text. `isError` is advisory, not authoritative.
|
||
- **JSON-RPC envelope error** (e.g. `{code: -32601, message: "Tool not
|
||
found"}`) → synthesize a `role:"tool"` turn with
|
||
`content = "[aish] tool dispatch failed: <error.message>"` and the
|
||
matching `tool_call_id`. Also surface a status line for the user.
|
||
This both keeps alternation legal and tells the model what happened
|
||
so its next plan is informed.
|
||
- **HTTP-level failure** (auth, unreachable, timeout) → same shape:
|
||
synthesize a `role:"tool"` turn with
|
||
`content = "[aish] tool transport error: <reason>"`. Same alternation
|
||
rationale.
|
||
|
||
This split resolves Q21 (with the C5/C7 review fix folded in).
|
||
|
||
### Lifecycle
|
||
|
||
- Connect on startup (from `config.mcp.servers`) — best effort; failures
|
||
are status-logged once, don't abort aish, and the session is **absent
|
||
from `mcp_sessions` until manually reconnected via `:mcp connect`**.
|
||
No automatic retry. "Connect" here means: do the `initialize`
|
||
round-trip + cache `tools/list` results.
|
||
- `:mcp connect <url>` adds a session at runtime; alias auto-derived
|
||
from hostname or supplied as second arg.
|
||
- `:mcp disconnect <alias>` drops cached state. There's no long-lived
|
||
HTTP connection to close (every RPC was already `Connection: close`).
|
||
- On aish quit, sessions are just forgotten — nothing to clean up
|
||
server-side.
|
||
- An unreachable server simply contributes no tools to the broker
|
||
request body — the model is not told that tools were "meant" to be
|
||
available. If `tools_schema()` returns empty across all sessions, the
|
||
broker omits the `tools` field entirely.
|
||
|
||
---
|
||
|
||
## 5. Tool-Call Bridge
|
||
|
||
### Broker request body (delta from Phase 1)
|
||
|
||
```json
|
||
{
|
||
"model": "...",
|
||
"messages": [...],
|
||
"stream": true,
|
||
"temperature": 0.2,
|
||
"tools": [
|
||
{ "type":"function",
|
||
"function": { "name":"<alias>__<tool>",
|
||
"description":"...",
|
||
"parameters": <inputSchema> } },
|
||
...
|
||
]
|
||
}
|
||
```
|
||
|
||
The `tools` array is assembled by `mcp.tools_schema()` — flattens
|
||
`tools/list` results from every connected session, namespacing each tool
|
||
as `<alias>__<name>`.
|
||
|
||
### Response handling (streaming)
|
||
|
||
llama.cpp / OpenAI deltas may include:
|
||
|
||
```json
|
||
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_…",
|
||
"function":{"name":"alias__tool","arguments":"{\"a\":"}}]}}]}
|
||
data: {"choices":[{"delta":{"tool_calls":[{"index":0,
|
||
"function":{"arguments":"1}"}}]}}]}
|
||
data: {"choices":[{"finish_reason":"tool_calls",...}]}
|
||
```
|
||
|
||
`broker.chat_stream` accumulates tool-call deltas keyed by `index`; the
|
||
`arguments` field is a JSON-string that arrives chunked and is concatenated.
|
||
On `finish_reason: tool_calls`, the accumulated calls are emitted to
|
||
on_delta as `kind="tool_call"` with full payloads.
|
||
|
||
**Index-absent fallback**: per the OpenAI spec, `index` is REQUIRED on
|
||
streaming `tool_calls[]` deltas — but some local llama.cpp builds have
|
||
been reported to omit it for single-call streams. If a delta has
|
||
`tool_calls` but no `index`, treat it as `index = 0` and accumulate
|
||
into the slot-0 buffer. Log a one-shot debug status the first time
|
||
this is observed per stream.
|
||
|
||
### Re-injection into context
|
||
|
||
The assistant turn carries **whatever text was streamed before
|
||
`finish_reason: tool_calls`** (which may be non-empty — models often
|
||
say "Sure, let me look that up" before calling). The renderer flushes
|
||
that text first, then renders the tool-call frame around dispatch.
|
||
|
||
```lua
|
||
-- After tool execution
|
||
ctx:append({
|
||
role = "assistant",
|
||
content = accumulated_text, -- may be "" if model emitted no prose
|
||
tool_calls = { {id="call_…", name="alias__tool", arguments=<json-string>} },
|
||
})
|
||
ctx:append({
|
||
role = "tool",
|
||
tool_call_id = "call_…",
|
||
content = <tool-result-text-or-synthesized-error>,
|
||
})
|
||
```
|
||
|
||
`to_messages()` renders both shapes for the next broker request. The
|
||
strict-alternation issue from PHASE0.md §6 (mistral-nemo Jinja) is
|
||
handled differently here — tool turns ARE expected to follow assistant
|
||
tool_calls per the OpenAI chat-template convention. If a model's
|
||
template still rejects this shape, fall back to the `[tool: X]` prefix
|
||
strategy used for exec output (Q18 below — fallback is plumbed via the
|
||
`context.use_tool_role` flag; default `true`).
|
||
|
||
### Re-issuing the broker request
|
||
|
||
After tool turns are appended, the broker is called again with the
|
||
extended messages array. The model may emit more `tool_calls`, more
|
||
text, or both. Loop until the response has no `tool_calls` (i.e. a
|
||
plain text assistant turn).
|
||
|
||
Budget: a max-tool-call-depth setting (default 8) prevents runaway loops.
|
||
Hit-cap surfaces as a status: `[aish] tool-call depth limit reached`.
|
||
|
||
---
|
||
|
||
## 6. Authorization (safety.lua Phase 2 surface)
|
||
|
||
```lua
|
||
-- safety.confirm_tool_call(tool_name, args_table, config) -> bool
|
||
function M.confirm_tool_call(name, args, cfg)
|
||
local policy = cfg.mcp and cfg.mcp.auto_approve or {}
|
||
if policy[name] then return true end
|
||
-- Per-server prefix check: "alias__*" entries
|
||
local alias = name:match("^([^.]+)%.")
|
||
if alias and policy[alias .. ".*"] then return true end
|
||
-- Otherwise prompt
|
||
local pretty = name .. "(" .. (#args > 0 and "..." or "") .. ")"
|
||
local ans = rl.readline(("call '%s'? [y/N] "):format(pretty)) or ""
|
||
return ans:lower():sub(1,1) == "y"
|
||
end
|
||
```
|
||
|
||
Config schema (analyze-revised — Bearer auth fields added):
|
||
|
||
```lua
|
||
mcp = {
|
||
servers = {
|
||
boltzmann = {
|
||
url = "http://boltzmann.fritz.box:8080/mcp",
|
||
auth_env = "BOLTZMANN_MCP_TOKEN", -- read from env at startup
|
||
},
|
||
broglie = {
|
||
url = "http://broglie.fritz.box:8080/mcp",
|
||
-- no auth (LAN-only deployment)
|
||
},
|
||
nc = {
|
||
url = "https://nc.reauktion.de:8080/mcp",
|
||
auth_token = "literal-token-if-not-using-env", -- alternative
|
||
},
|
||
},
|
||
auto_approve = {
|
||
["boltzmann__read_file"] = true, -- specific tool
|
||
["broglie__*"] = true, -- whole server
|
||
},
|
||
max_tool_depth = 8,
|
||
}
|
||
```
|
||
|
||
Auth precedence per server: `auth_token` literal > `auth_env` indirection
|
||
> nil (no Authorization header sent). Mirrors PHASE0 §10's `key_env`
|
||
convention for cloud model API keys.
|
||
|
||
Norris mode (Phase 3) will extend this: when autonomous, the destructive-op
|
||
heuristic decides; for non-destructive tools, auto_approve. Outside scope here.
|
||
|
||
---
|
||
|
||
## 7. Meta Commands (Phase 2 additions)
|
||
|
||
| Command | Action |
|
||
|---|---|
|
||
| `:mcp connect <url> [<alias>]` | Open a session; perform initialize + tools/list; add to active set |
|
||
| `:mcp disconnect <alias>` | Close one session |
|
||
| `:mcp list` | Show connected sessions (alias, url, tool count, status) |
|
||
| `:mcp tools` | List tools across all sessions (`alias__name` — short description) |
|
||
| `:mcp tool <alias__name>` | Show one tool's full inputSchema (debug aid) |
|
||
|
||
Existing `:help` updated to list these.
|
||
|
||
---
|
||
|
||
## 8. System Prompt Augmentation
|
||
|
||
`broker.lua`'s default system prompt grows by ~4 lines:
|
||
|
||
```
|
||
You may have access to MCP tools — they appear in this request's `tools`
|
||
field. Call a tool by emitting a tool_call; the result will be supplied
|
||
in the next turn. Use tools for structured operations (file reads,
|
||
queries, etc.) and `CMD:` lines for local shell commands. Prefer tools
|
||
when available; fall back to `CMD:` for anything not exposed as a tool.
|
||
```
|
||
|
||
The actual tool list is in the `tools` request-body field, not the
|
||
prompt. This avoids per-turn token bloat for the full schema.
|
||
|
||
§3 substrate invariants are unchanged. The `CMD:` extraction marker stays
|
||
the local-shell route; tools are the additive structured route.
|
||
|
||
---
|
||
|
||
## 9. Migration from Phase 1
|
||
|
||
User-visible changes:
|
||
- New `:mcp …` meta commands when MCP servers are configured or
|
||
connected at runtime.
|
||
- Assistant responses may now invoke tools — user sees a confirm prompt
|
||
(similar to `CMD:` execution gate) followed by an indented tool-call
|
||
frame with the result.
|
||
- `CMD:` lines still work exactly as before for shell.
|
||
|
||
Substrate (PHASE0.md §3) invariants: unchanged. Module layout (§4)
|
||
amended to **add** `mcp.lua` (no rename of any existing file). Adding
|
||
a new file is additive and preserves the §3 module-stability invariant
|
||
("File names are stable across phases — later phases fill in bodies,
|
||
not rename files"). The amendment ships in commit #1 of the §12 plan
|
||
(C6 in the review).
|
||
|
||
`config.lua`: existing configs without an `mcp` section continue to work
|
||
— no MCP servers means no tools sent in the broker request body, no
|
||
auth checks, no behavior change.
|
||
|
||
---
|
||
|
||
## 10. Out of Scope (Phase 2)
|
||
|
||
Per PHASE0.md §11, these belong elsewhere:
|
||
- Chuck Norris autonomous mode (Phase 3) — even though tool-calls
|
||
enable richer autonomy, the *autonomous policy* is Phase 3's.
|
||
- Destructive-op heuristic in safety.lua (Phase 3) — Phase 2 only
|
||
implements the per-call confirm-prompt surface.
|
||
- `memory.jsonl` summarization across sessions (Phase 4).
|
||
- Multi-model routing / cloud fallback (Phase 5).
|
||
- Tree-sitter syntax highlighting (Phase 6).
|
||
|
||
Specifically out of Phase 2 scope despite proximity:
|
||
- Stdio-transport MCP servers (Q17 below).
|
||
- Parallel tool-call dispatch (Q20).
|
||
- MCP `resources/list` and `prompts/list` capabilities — Phase 2
|
||
v1 only implements `tools/*`. Resources/prompts deferred (probably
|
||
Phase 4 alongside memory).
|
||
- Server-sent `notifications/progress` for long-running tool calls —
|
||
ignored in v1; status surface comes later.
|
||
|
||
---
|
||
|
||
## 11. Open Questions
|
||
|
||
| # | Question | Impact | Resolve by |
|
||
|---|---|---|---|
|
||
| Q17 | ~~MCP transport abstraction: stdio vs HTTP+SSE~~ | mcp.lua API shape | **Resolved at analyze.** Hard-code POST-only HTTP for v1. lmcp doesn't use the long-lived SSE channel and `listChanged: false` removes any v1 need for it. Stdio transport tracked as Phase 2.1 / out-of-scope here. |
|
||
| Q18 | Tool-result re-injection: standard OpenAI `role:"tool"` turn, or `[tool: X]` prefix to next user turn (matching the §6 exec-output pattern)? | context.lua + broker.lua | **Partly resolved.** Live probe (2026-05-12, hossenfelder) shows `role:"tool"` accepted by the proxy + the loaded model (qwen2.5-coder-1.5b). Mistral-nemo-specific template testing is **blocked** by the hossenfelder proxy routing all `model` field values to the loaded fast model — see open-end below. Default v1 path: `role:"tool"` (standard); fallback to `[tool: X]` prefix is plumbed but unused unless a strict template rejects it during Phase 7 verify. |
|
||
| Q19 | Large tool-result payloads: pass-through, truncate at N chars, or summarize via fast model? | context.lua + executor of tool-result | Phase 2 (plan); Phase 4 may refine with memory.jsonl |
|
||
| Q20 | Parallel `tool_calls`: sequential v1 is safe; spec allows parallel. Move to parallel when both calls are read-only? | mcp.lua dispatch | Phase 2 (verify) — track for v2 |
|
||
| Q21 | ~~MCP error mapping~~ | mcp.lua + broker.lua | **Resolved at analyze.** lmcp distinguishes: `result.isError=true` (handler exception, model-recoverable, feed back as tool turn content) vs JSON-RPC `error` (unknown method/tool, transport-level, surface as aish status). See §4. |
|
||
| Q22 | aish's own command surface as an MCP server | scope expansion | **Out of Phase 2.** Parked for Phase 4+ if interest stays. |
|
||
|
||
Open-end carried forward to Phase 7 (verify):
|
||
- **Hossenfelder proxy `model`-field bug** (separate from aish): the proxy at `:8082` routes all requests to the loaded fast model regardless of the request's `model` field — chunks return `"model":"qwen2.5-coder-1.5b-q4_k_m.gguf"` even when `mistral-nemo-12b-instruct` was asked for. This **blocks live-verification of mistral-nemo's chat-template tool-role behavior**. Tracked as [aish#23](https://git.reauktion.de/marfrit/aish/issues/23) (filed 2026-05-12 at review). Sibling to the SSE-buffering bug at [aish#15](https://git.reauktion.de/marfrit/aish/issues/15) — both live in the boltzmann proxy code. Phase 7 needs at least #23 fixed to fully close Q18.
|
||
|
||
Resolved at formulate (above in §2 table):
|
||
- Q6 (CMD: vs tools coexistence) — both, no policy preference, substrate unchanged.
|
||
- Q7 (MCP discovery) — both, config-declared default + runtime `:mcp connect`.
|
||
- Q8 (authorization) — per-call confirm default, per-tool/per-server `auto_approve` policy.
|
||
- Q9 (system-prompt augmentation) — hybrid: static frame + dynamic `tools` body field.
|
||
- Q10 (tool-call streaming) — streaming-from-day-one on top of Phase 1 SSE.
|
||
|
||
Resolved at analyze (2026-05-12, live probes vs lmcp v0.5.4 + hossenfelder):
|
||
- Q17 (transport abstraction) — POST-only, no SSE channel needed for lmcp.
|
||
- Q21 (error mapping) — isError vs JSON-RPC error split per §4.
|
||
|
||
---
|
||
|
||
## 12. Implementation Plan (commit-by-commit)
|
||
|
||
Bottom-up — start with modules with the fewest dependencies, end with the
|
||
REPL wiring that exercises everything together. Same shape as Phase 0
|
||
and Phase 1 implementation cadence.
|
||
|
||
### Order
|
||
|
||
1. **`mcp.lua` (new file) — JSON-RPC client.** `M.connect(url, opts)`,
|
||
`session:initialize()` + `:list_tools()` + `:call_tool(name, args)` +
|
||
`:close()`. Uses Phase 1's `ffi/curl.M.post` for transport — **same
|
||
commit lands the `M.post` extension to return `(body, status_code)`
|
||
per §3 row** so `mcp.lua` can distinguish HTTP `401` (non-JSON-RPC
|
||
body `{"error":"unauthorized"}`) from JSON-RPC envelope errors.
|
||
Per-server Bearer auth (`auth_token` literal or `auth_env`
|
||
indirection). `:call_tool` returns `(result_table, kind)` where
|
||
`kind ∈ {"ok","handler_error","rpc_error"}` so callers route per
|
||
§4. **Test in isolation** via
|
||
`luajit -e 'local mcp=require("mcp"); local s=mcp.connect("http://boltzmann.fritz.box:8080/mcp",{auth_env="BOLTZMANN_MCP_TOKEN"}); s:initialize(); print(#s:list_tools())'`.
|
||
Also amends PHASE0.md §4 to list `mcp.lua` between `broker.lua` and
|
||
`router.lua` in the same commit (additive — preserves §3
|
||
module-stability invariant per §9).
|
||
|
||
2. **`safety.lua` — confirm-gate surface.** Implement just
|
||
`M.confirm_tool_call(name, args, cfg)` per §6. Reads
|
||
`cfg.mcp.auto_approve` for exact-match and `alias__*` glob. Falls back
|
||
to `rl.readline` prompt. Norris-mode hooks stay out (Phase 3). **Test
|
||
in isolation** with mocked rl + various policy shapes.
|
||
|
||
3. **`context.lua` extensions.** Three concrete edits per §3 row:
|
||
(a) loosen `Context:append`'s assert from "content required" to
|
||
shape-per-role (assistant may have empty content if `tool_calls`
|
||
present; `tool` requires `tool_call_id` + `content`); (b) preserve
|
||
`tool_calls` / `tool_call_id` in stored turns (not just role+content);
|
||
(c) extend `to_messages()` to emit those fields. Add alternation
|
||
assert (N4 in review). `pending_exec_output` is **unchanged**:
|
||
buffer persists across tool-call sub-loops; flushes on next genuine
|
||
user turn (§3 row). **Tests in isolation**: (i) build a context with
|
||
assistant+tool_calls + tool turns, round-trip through `to_messages()`,
|
||
eyeball JSON shape; (ii) day-one fallback test (N8) — same context
|
||
with `use_tool_role = false` must emit the `[tool: alias__name]\n…`
|
||
prefix shape instead of a `role:"tool"` message.
|
||
|
||
4. **`renderer.lua` extensions.** Add `M.tool_call_begin(name, args)`
|
||
(top rule + `name(json-snippet)` indented dim) and
|
||
`M.tool_call_end(content, is_error)` (bottom rule with dim/red status).
|
||
Visual parity with the exec frame. **Test visually** with a one-liner.
|
||
|
||
5. **`broker.lua` extensions.** Signature widens:
|
||
`chat_stream(cfg, msgs, on_delta, opts)`. `opts.tools` (optional
|
||
array) is passed through to the request body; **omitted entirely
|
||
when nil or empty**. The on_delta callback widens to
|
||
`on_delta(kind, payload)` where `kind ∈ {"text","tool_call"}`.
|
||
Text path unchanged. Tool-call path: accumulator keyed by `index`
|
||
(default 0 if absent — C2), concatenates `function.arguments` until
|
||
`finish_reason: "tool_calls"`, then emits one
|
||
`on_delta("tool_call", {id,name,arguments})` per completed call.
|
||
**`M.chat` shape unchanged** in this phase (C1 in review — no
|
||
caller for a polymorphic return). **Test against hossenfelder**
|
||
with `tools` declared + streaming.
|
||
|
||
6. **`repl.lua` wiring.** New module-local `mcp_sessions = {alias=session,...}`,
|
||
populated from `config.mcp.servers` at startup. Helpers:
|
||
- `tools_schema()` → flatten `tool` lists across sessions, namespace `alias__name`
|
||
- `dispatch_tool_call(call)` → split `alias__tool`, look up session, call, return content
|
||
- `ask_ai` loop now: stream response → if any tool_calls completed,
|
||
for each call: `safety.confirm_tool_call` → `dispatch_tool_call` →
|
||
append assistant-with-tool_calls + tool turn → re-call `broker.chat_stream`
|
||
→ repeat until pure-text response or `max_tool_depth` reached
|
||
- New meta cmds: `:mcp list`, `:mcp tools`, `:mcp tool <name>`,
|
||
`:mcp connect <url> [alias]`, `:mcp disconnect <alias>`
|
||
**End-to-end test** via the REPL against a real boltzmann lmcp +
|
||
hossenfelder broker.
|
||
|
||
7. **`config.lua` example block.** Add a commented-out `mcp = { servers
|
||
= { boltzmann = {...} }, auto_approve = {...} }` example so users can
|
||
see the shape. Not behavior-impacting; documentation only. Bundled
|
||
with commit #6 if small or split if substantial.
|
||
|
||
### Risk / non-obvious
|
||
|
||
- **Empty tools array.** If `config.mcp.servers` is absent or all
|
||
connects fail, the broker request body must **omit** `tools`
|
||
entirely (some servers reject `"tools": []`). Don't send the field
|
||
when empty.
|
||
|
||
- **Connect-at-startup blocking.** N servers × ~30 ms init+list. For
|
||
N ≤ 3 (typical) the 90 ms is acceptable. Failures are status-logged
|
||
per server, don't abort aish. Parallel via coroutines is out of scope
|
||
here — sequential is fine for v1.
|
||
|
||
- **Content blocks beyond text.** lmcp returns `[{type:"text", text:...}]`.
|
||
The spec allows `type:"image" | "resource"`. Phase 2 v1 flattens by
|
||
concatenating all `text` blocks and ignoring non-text. Log a status
|
||
warning if non-text blocks are seen. Adequate for boltzmann/hertz
|
||
tools (all text); image/resource tools deferred.
|
||
|
||
- **`isError: false` on actual failure** (baseline finding §3 of
|
||
PHASE2-baseline.md). Pass content through unchanged; let the model
|
||
read the error text. Do NOT short-circuit on the flag.
|
||
|
||
- **JSON-RPC `error` from `tools/call`.** Surface as aish status
|
||
AND synthesize a `role:"tool"` turn with
|
||
`content = "[aish] tool dispatch failed: <error.message>"` and the
|
||
matching `tool_call_id`. The alternation rationale (§4) requires
|
||
this — leaving the assistant's `tool_calls` orphaned breaks strict
|
||
chat templates exactly the way PHASE0.md §6 warned about. The model
|
||
receives the error and can re-plan within the same turn.
|
||
|
||
- **Tool-call sub-loop bounds.** `max_tool_depth` (default 8) per ask_ai
|
||
invocation. When hit, surface as status and break — append the
|
||
assistant's last text (if any) and let the user reply.
|
||
|
||
- **Argument JSON might be invalid.** A model can stream malformed JSON
|
||
in `function.arguments`. `dkjson.decode` failure → DO NOT execute on
|
||
partial parse. Synthesize a `role:"tool"` turn with
|
||
`content = "[aish] tool arguments not parseable as JSON: <decode-err>"`
|
||
and the matching `tool_call_id` (same alternation rationale as
|
||
JSON-RPC error above; C7 in review).
|
||
|
||
- **Q18 fallback path** (strict templates rejecting `role:"tool"`).
|
||
Plumb a `context.use_tool_role` flag (default true). If a real-world
|
||
rejection appears at Phase 7, flip the flag and convert tool turns to
|
||
`[tool: alias__name]\n<content>` prefix on the next user turn (same
|
||
pattern as `pending_exec_output`). **Day-one verification** (N8 in
|
||
review): commit #3 includes a small in-isolation test that builds a
|
||
context with `use_tool_role = false`, appends an assistant+tool_calls
|
||
turn followed by a tool result, and confirms `to_messages()` emits
|
||
the prefix shape instead of a `role:"tool"` turn. Keeps the fallback
|
||
alive rather than dead-coded until Phase 7 first runs it under stress.
|
||
|
||
### Test checkpoints
|
||
|
||
After each commit, verify with a targeted probe before moving on:
|
||
|
||
| Commit | Verify |
|
||
|---|---|
|
||
| #1 `mcp.lua` | `luajit -e "local m=require('mcp'); ..."` connects + lists tools against boltzmann lmcp |
|
||
| #2 `safety.lua` | unit-test policy lookup with mock rl: exact match → true; `*` glob → true; miss → prompt invoked |
|
||
| #3 `context.lua` | (i) round-trip a context with tool turns through `to_messages()`, eyeball JSON shape; (ii) day-one fallback test with `use_tool_role = false` emits the `[tool: …]` prefix shape (N8) |
|
||
| #4 `renderer.lua` | one-liner emits frame around fake tool result |
|
||
| #5 `broker.lua` | curl-compare: hand-built request body with tools matches `broker.chat_stream(cfg, msgs, on_delta)` body |
|
||
| #6 `repl.lua` | full REPL: `:mcp list` shows boltzmann; question that triggers `list_dir` round-trips through confirm + execution + model continuation |
|
||
| #7 `config.lua` | aish starts with example mcp section present; no MCP servers connected means no `tools` field sent |
|
||
|
||
### Commits expected: 7 (commit #1 carries the PHASE0.md §4 amendment)
|
||
|
||
Per Phase 1's cadence (10 commits + 1 BLOCKER fix), Phase 2 is smaller
|
||
in surface — single new file plus targeted extensions. Tracked to land
|
||
in one working session if the boltzmann proxy bugs don't intrude.
|
||
|
||
### Resolved at review (2026-05-12)
|
||
|
||
- **Q18 default** — `use_tool_role = true` defaulted, fallback exercised
|
||
day-one in commit #3 test (ii) so it's not dead code. Phase 7 flips if
|
||
mistral-nemo (once #23 is fixed) rejects.
|
||
- **`:mcp connect` re-fetch policy** — v1 trusts the `listChanged: false`
|
||
capability; manual disconnect+reconnect is the workaround if a server's
|
||
tools change. No automatic re-fetch.
|
||
|
||
### Review fold-in (2026-05-12, all BLOCKERs + relevant CONCERNs/NITs)
|
||
|
||
Independent review surfaced 5 BLOCKERs / 7 CONCERNs / 8 NITs against
|
||
the formulate+analyze+plan draft. Resolutions applied in this revision:
|
||
|
||
- **B1** context.lua impact widened — assert loosening + field
|
||
preservation + `to_messages` emit are now explicit in §3.
|
||
- **B2** `ffi/curl.M.post` extended to return `(body, status_code)` so
|
||
`mcp.lua` distinguishes HTTP `401` from JSON-RPC envelope errors.
|
||
- **B3** `inputSchema` typo fixed in §3 mcp.lua row.
|
||
- **B4** `pending_exec_output` × tool-call sub-loop interaction
|
||
specified (persists across; flushes on next user turn).
|
||
- **B5** §3/§12 dependency contradiction resolved — broker takes
|
||
`opts.tools` from the caller; no layering inversion.
|
||
- **C1** `M.chat` polymorphic return dropped.
|
||
- **C2** Index-absent fallback specified (default to 0).
|
||
- **C3** Re-injection example now stores accumulated text in the
|
||
assistant turn, not hard-coded empty string.
|
||
- **C4** `:mcp connect` failure semantics specified (no auto-retry).
|
||
- **C5/C7** Both orphan-tool_calls scenarios now synthesize a
|
||
`role:"tool"` turn with `[aish] tool dispatch failed: ...` content
|
||
to preserve alternation.
|
||
- **C6** §9 explicitly notes the §4 amendment is additive.
|
||
- **N3** protocolVersion fallback specified (lmcp doesn't negotiate).
|
||
- **N4** alternation assert added to context.lua row.
|
||
- **N7** model-routing bug filed as [aish#23](https://git.reauktion.de/marfrit/aish/issues/23).
|
||
- **N8** day-one fallback test added to commit #3 checkpoints.
|
||
|
||
CONCERNs / NITs not folded (defended as wording-only, not load-bearing):
|
||
N1, N2, N5, N6 — left as-is.
|
||
|
||
---
|
||
|
||
*End of Phase 2 Manifest — aish*
|