docs/PHASE2: review fold-in — 5 BLOCKERs + 7 CONCERNs + key NITs
Independent review of the formulate+analyze+plan draft surfaced design
gaps that would have shipped as silent bugs. Resolutions applied:
BLOCKERs:
B1 context.lua impact widened — Phase 1 :append asserts content and
discards extra fields. Need (a) shape-per-role assert, (b) preserve
tool_calls/tool_call_id on store, (c) emit from to_messages().
B2 ffi/curl.M.post extended to return (body, status_code). lmcp's
401 returns a non-JSON-RPC body that would have been mis-decoded.
B3 §3 typo schema -> inputSchema.
B4 pending_exec_output × tool-call sub-loop interaction specified.
B5 §3/§12 broker dependency contradiction — broker takes opts.tools
from caller; no layering inversion.
CONCERNs:
C1 M.chat return polymorphism dropped (no consumer).
C2 tool_calls[].index absent fallback: default to 0.
C3 Re-injection stores accumulated text, not hard-coded empty.
C4 :mcp connect failure: no auto-retry, status-log once.
C5/C7 JSON-RPC error AND argument-parse failure both synthesize a
role:"tool" turn — keeps strict-template alternation legal
exactly the way PHASE0 §6 demanded for exec output.
C6 §9 confirms §4 amendment is additive (preserves §3 invariant).
NITs:
N3 protocolVersion fallback (lmcp doesn't negotiate).
N4 Alternation assert in Context:append.
N7 Model-routing bug filed as aish#23.
N8 Day-one fallback test for use_tool_role=false in commit #3.
Manifest status: Plan (review folded). Status line and Resolutions
sections updated; commit-by-commit roadmap reflects revised specs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
+151
-66
@@ -2,7 +2,7 @@
|
||||
|
||||
**Project:** aish — AI-augmented conversational shell
|
||||
**Document:** Phase 2 Requirements, Architecture & Design Decisions
|
||||
**Status:** Analyze (formulate complete; live-probed against lmcp v0.5.4 + hossenfelder proxy)
|
||||
**Status:** Plan (review pass folded in 2026-05-12)
|
||||
**Date:** 2026-05-12
|
||||
|
||||
PHASE0.md is the locked substrate; PHASE1.md is layered on top. This
|
||||
@@ -50,7 +50,7 @@ Three pillars per PHASE0.md §11 row 2:
|
||||
| Decision | Choice | Rationale |
|
||||
|---|---|---|
|
||||
| MCP transport | HTTP POST per RPC, `Connection: close` per response, **no long-lived SSE GET channel** in v1 | Analyze finding (2026-05-12): lmcp v0.5.4 only implements the trivial POST-and-respond flavor of the spec's streamable-HTTP transport. Its GET /mcp endpoint announces the POST endpoint then closes — there's no server→client notification channel to listen on. Combined with lmcp's `capabilities.tools.listChanged = false`, aish doesn't need an SSE GET listener at all for lmcp. Stdio transport is left for a possible Phase 2.1 if a stdio-only MCP server becomes necessary. |
|
||||
| MCP protocol version | `2025-03-26` (confirmed by live probe of boltzmann:8080/mcp) | lmcp pins this in `MCP_VERSION`. aish sends the same in `initialize`; future model bumps are negotiated at connect time. |
|
||||
| MCP protocol version | `2025-03-26` (confirmed by live probe of boltzmann:8080/mcp) | lmcp pins this in `MCP_VERSION` and **does not negotiate** — it returns its compiled-in version regardless of what the client sends (lmcp.lua:80-91). aish sends `2025-03-26` in `initialize` and accepts whatever the server returns; on mismatch it logs `[aish] mcp <alias>: protocol version mismatch (sent X, got Y); proceeding` and continues. v1 has no version-gated behavior to abort on. |
|
||||
| MCP auth | Bearer token via `Authorization: Bearer <token>` header, per-server | Analyze finding: every lmcp deployment in mfritsche's fleet (boltzmann/hertz/pve*/nc/etc.) requires Bearer auth. Phase 2 config supports `auth_token` literal and `auth_env` env-var indirection per server (mirrors `key_env` in the models registry). lmcp servers without auth (broglie/higgs LAN-only) just leave the field nil. |
|
||||
| Tool-call wire format | OpenAI `tools` field on `/v1/chat/completions` body; `tool_calls` on assistant deltas; `role:"tool"` turn with `tool_call_id` for results | Standard, supported by llama.cpp and OpenRouter. Aligns with the existing `/v1/chat/completions` substrate invariant. |
|
||||
| Tool namespacing | `<server-alias>.<tool-name>` for both the wire-level tool name and `:mcp tools` listing | Avoids name collisions across servers. The alias comes from the config key or the connect URL hash. |
|
||||
@@ -67,14 +67,14 @@ Three pillars per PHASE0.md §11 row 2:
|
||||
|
||||
| File | State after Phase 1 | Phase 2 changes |
|
||||
|---|---|---|
|
||||
| `mcp.lua` | **New file** (not in PHASE0 §4 layout; this Phase amends the layout to add it) | Implement: `M.connect(url, opts) -> session` (opts: `alias`, `auth_token`, `auth_env`), `session:initialize()`, `session:list_tools() -> [{name, description, schema}]`, `session:call_tool(name, args) -> result | tool_error`, `session:close()`. JSON-RPC 2.0 over HTTP POST (`Content-Type: application/json`, `Accept: application/json`, `Authorization: Bearer <token>`). Per-session state: alias, base-url, auth, tools-cache, request-ID counter. No persistent SSE channel — POST is one-shot per RPC. |
|
||||
| `mcp.lua` | **New file** (not in PHASE0 §4 layout; this Phase amends the layout to add it) | Implement: `M.connect(url, opts) -> session` (opts: `alias`, `auth_token`, `auth_env`), `session:initialize()`, `session:list_tools() -> [{name, description, inputSchema}]`, `session:call_tool(name, args) -> (result_table, kind)` where `kind ∈ {"ok","handler_error","rpc_error"}` so callers can route the response per §4's error split, `session:close()`. JSON-RPC 2.0 over HTTP POST (`Content-Type: application/json`, `Accept: application/json`, `Authorization: Bearer <token>`). Per-session state: alias, base-url, auth, tools-cache, request-ID counter. No persistent SSE channel — POST is one-shot per RPC. Distinguishes HTTP-level failure (e.g. lmcp's `401 {"error":"unauthorized"}` body, which is NOT JSON-RPC-shaped — has no `jsonrpc`/`id` fields) from JSON-RPC envelope errors; needs `ffi/curl.M.post` extended to return status code (see ffi/curl.lua row). |
|
||||
| `safety.lua` | Stub | Implement Phase 2 surface only: `M.confirm_tool_call(tool_name, args, policy) -> bool`. Reads `config.mcp.auto_approve` (per-tool and per-server) before prompting. Norris destructive-op heuristic and HALT gate stay Phase 3. |
|
||||
| `broker.lua` | Streaming `chat_stream(cfg, msgs, on_delta)` | Request body grows `tools = mcp.tools_schema()` (assembled from all connected sessions). on_delta callback widens to `on_delta(kind, payload)` where `kind ∈ {"text", "tool_call"}`; tool_call payload includes id+name+arguments-delta. `M.chat` wrapper updates to buffer both. |
|
||||
| `context.lua` | turns = {{role, content}, ...} + `pending_exec_output` | New role: `"tool"`. Assistant turns may carry `tool_calls = [{id, name, arguments}]`. `to_messages()` flattens these into OpenAI-shape messages. Alternation rules: assistant-with-tool_calls is followed by N tool turns (one per call), then assistant text. |
|
||||
| `broker.lua` | Streaming `chat_stream(cfg, msgs, on_delta)` | Signature widens to `chat_stream(cfg, msgs, on_delta, opts)`. `opts.tools` (optional array of `{type, function:{name, description, parameters}}`) is passed through to the request body; **omitted entirely if absent or empty** (some servers reject `"tools": []`). The on_delta callback widens to `on_delta(kind, payload)` where `kind ∈ {"text", "tool_call"}`. **`broker.lua` does NOT depend on `mcp.lua`** — repl assembles the tools array and passes it in; broker stays a transport layer. `M.chat` (non-streaming wrapper) is unchanged in this phase (no tool consumers go through it). |
|
||||
| `context.lua` | turns = {{role, content}, ...} + `pending_exec_output`; `Context:append` asserts `turn.content` and rebuilds the entry as `{role, content}` only — extra fields are dropped | Three concrete edits: (a) **loosen `:append`** so `role == "assistant"` can carry `tool_calls = [{id, name, arguments}]` with `content` allowed empty, and `role == "tool"` requires `tool_call_id` + `content` (the assert moves from "content required" to "shape per role"); (b) **preserve `tool_calls` and `tool_call_id`** in the stored turn (not just role+content); (c) `to_messages()` emits `tool_calls` on assistant turns and `tool_call_id` on tool turns. Add a debug assertion that `role == "tool"` follows an assistant turn with non-empty `tool_calls` (catches design bugs early; N4 in review). **`pending_exec_output` interaction**: the buffer **persists across the tool-call sub-loop** (the loop is internal — no user input happens — so there's no append_user to flush against). It flushes on the next genuine user turn, regardless of how many tool-call iterations preceded. |
|
||||
| `repl.lua` | meta cmds + ask_ai stream loop | After ask_ai sees `tool_calls`, enter a tool-execution sub-loop: confirm-gate each call via `safety.confirm_tool_call`, dispatch via `mcp.session:call_tool`, append tool turn to context, re-issue the broker request. Loop until assistant emits text without tool_calls. New meta: `:mcp connect <url> [alias]`, `:mcp list`, `:mcp tools`, `:mcp disconnect <alias>`. |
|
||||
| `renderer.lua` | streaming text + exec frame | Add `tool_call_begin(name, args)`, `tool_call_end(result, ok)`. Visual style: indented, dim, parallel to the exec frame. |
|
||||
| `config.lua` | example with models/shell/context/history | Schema additions: `mcp = { servers = { alias = { url = "..." } }, auto_approve = { ["alias.tool"] = true } }`. Documented in §10 below. |
|
||||
| `ffi/curl.lua` | post + post_sse | **No additions in v1** — analyze finding ruled out the long-lived SSE GET channel for lmcp. Phase 1 `M.post` (already does sync POST with response capture) is sufficient for MCP JSON-RPC. |
|
||||
| `ffi/curl.lua` | post + post_sse; `M.post` does not set `FAILONERROR`, so non-2xx responses return the body as a normal string | **One small extension**: `M.post` returns `(body, status_code)` instead of just `body` (or surfaces status via a second slot). MCP auth failures from lmcp arrive as HTTP `401` with a non-JSON-RPC body (`{"error":"unauthorized"}`); `mcp.lua` must distinguish HTTP-level failure from JSON-RPC envelope errors. Phase 1 callers (broker.chat in its current shape) are unaffected — they ignore the second return value. No SSE GET channel is added (analyze finding ruled it out for lmcp). |
|
||||
| `history.lua` | JSONL session log | Tool turns are logged like any other turn — `{role:"tool", tool_call_id:"...", content:"..."}`. Resume reconstructs them via `ctx:append` like user/assistant turns. |
|
||||
|
||||
§4 module-layout amendment: `mcp.lua` slots between `broker.lua` and
|
||||
@@ -107,7 +107,8 @@ are out of scope here; track for v2 if a richer server appears.
|
||||
|
||||
1. `initialize` request: `{ protocolVersion: "2025-03-26", capabilities: {}, clientInfo: { name: "aish", version: "..." } }`.
|
||||
2. Server response (lmcp): `{ protocolVersion: "2025-03-26", capabilities: { tools: { listChanged: false } }, serverInfo: { name, version } }`.
|
||||
3. `notifications/initialized` POST (one-way; lmcp returns HTTP 202 with no body).
|
||||
3. **Version mismatch**: lmcp ignores client's `protocolVersion` and always returns its compiled-in `MCP_VERSION` (lmcp.lua:80-91). aish accepts whatever lmcp returns; on mismatch it logs a status (`[aish] mcp <alias>: protocol version mismatch (sent X, got Y); proceeding`) and continues. v1 has no version-gated behavior.
|
||||
4. `notifications/initialized` POST (one-way; lmcp returns HTTP 202 with no body).
|
||||
|
||||
### Tool discovery
|
||||
|
||||
@@ -118,30 +119,49 @@ are out of scope here; track for v2 if a richer server appears.
|
||||
|
||||
### Tool invocation
|
||||
|
||||
`tools/call` with `{ name, arguments }`. lmcp distinguishes two failure
|
||||
modes:
|
||||
`tools/call` with `{ name, arguments }`. Failure has three flavors and
|
||||
all of them result in **a `role:"tool"` turn being appended** so the
|
||||
assistant's `tool_calls` is never left orphaned in context (strict
|
||||
templates reject `assistant.tool_calls` without a matching `tool`
|
||||
reply — same gotcha PHASE0.md §6 warned about):
|
||||
|
||||
- **Tool-handler exception** → JSON-RPC `result` with `isError: true`
|
||||
and `content: [{ type:"text", text: "Error: ..." }]`. **Model-recoverable**:
|
||||
feed it back to the model as the next `role:"tool"` turn content and
|
||||
let it react.
|
||||
- **Unknown method or unknown tool** → JSON-RPC `error` with
|
||||
`code: -32601` ("Method not found" / "Tool not found"). **Transport
|
||||
level**: surface as an aish status, do not feed to model.
|
||||
and `content: [{ type:"text", text: "Error: ..." }]`. Feed
|
||||
`content` straight back as the `role:"tool"` turn body. Model-recoverable.
|
||||
- **Baseline `isError: false` on actual failure** (PHASE2-baseline.md §3
|
||||
found this — boltzmann's `read_file` returns content text containing
|
||||
"Error: ..." but `isError: false`). Pass content through unchanged —
|
||||
let the model read the text. `isError` is advisory, not authoritative.
|
||||
- **JSON-RPC envelope error** (e.g. `{code: -32601, message: "Tool not
|
||||
found"}`) → synthesize a `role:"tool"` turn with
|
||||
`content = "[aish] tool dispatch failed: <error.message>"` and the
|
||||
matching `tool_call_id`. Also surface a status line for the user.
|
||||
This both keeps alternation legal and tells the model what happened
|
||||
so its next plan is informed.
|
||||
- **HTTP-level failure** (auth, unreachable, timeout) → same shape:
|
||||
synthesize a `role:"tool"` turn with
|
||||
`content = "[aish] tool transport error: <reason>"`. Same alternation
|
||||
rationale.
|
||||
|
||||
This split resolves Q21.
|
||||
This split resolves Q21 (with the C5/C7 review fix folded in).
|
||||
|
||||
### Lifecycle
|
||||
|
||||
- Connect on startup (from `config.mcp.servers`) — best effort; failures
|
||||
are status-logged, don't abort aish. "Connect" here means: do the
|
||||
`initialize` round-trip + cache `tools/list` results.
|
||||
are status-logged once, don't abort aish, and the session is **absent
|
||||
from `mcp_sessions` until manually reconnected via `:mcp connect`**.
|
||||
No automatic retry. "Connect" here means: do the `initialize`
|
||||
round-trip + cache `tools/list` results.
|
||||
- `:mcp connect <url>` adds a session at runtime; alias auto-derived
|
||||
from hostname or supplied as second arg.
|
||||
- `:mcp disconnect <alias>` drops cached state. There's no long-lived
|
||||
HTTP connection to close (every RPC was already `Connection: close`).
|
||||
- On aish quit, sessions are just forgotten — nothing to clean up
|
||||
server-side.
|
||||
- An unreachable server simply contributes no tools to the broker
|
||||
request body — the model is not told that tools were "meant" to be
|
||||
available. If `tools_schema()` returns empty across all sessions, the
|
||||
broker omits the `tools` field entirely.
|
||||
|
||||
---
|
||||
|
||||
@@ -186,19 +206,31 @@ data: {"choices":[{"finish_reason":"tool_calls",...}]}
|
||||
On `finish_reason: tool_calls`, the accumulated calls are emitted to
|
||||
on_delta as `kind="tool_call"` with full payloads.
|
||||
|
||||
**Index-absent fallback**: per the OpenAI spec, `index` is REQUIRED on
|
||||
streaming `tool_calls[]` deltas — but some local llama.cpp builds have
|
||||
been reported to omit it for single-call streams. If a delta has
|
||||
`tool_calls` but no `index`, treat it as `index = 0` and accumulate
|
||||
into the slot-0 buffer. Log a one-shot debug status the first time
|
||||
this is observed per stream.
|
||||
|
||||
### Re-injection into context
|
||||
|
||||
The assistant turn carries **whatever text was streamed before
|
||||
`finish_reason: tool_calls`** (which may be non-empty — models often
|
||||
say "Sure, let me look that up" before calling). The renderer flushes
|
||||
that text first, then renders the tool-call frame around dispatch.
|
||||
|
||||
```lua
|
||||
-- After tool execution
|
||||
ctx:append({
|
||||
role = "assistant",
|
||||
content = "", -- or any model-emitted text
|
||||
content = accumulated_text, -- may be "" if model emitted no prose
|
||||
tool_calls = { {id="call_…", name="alias.tool", arguments=<json-string>} },
|
||||
})
|
||||
ctx:append({
|
||||
role = "tool",
|
||||
tool_call_id = "call_…",
|
||||
content = <tool-result-text>,
|
||||
content = <tool-result-text-or-synthesized-error>,
|
||||
})
|
||||
```
|
||||
|
||||
@@ -207,7 +239,8 @@ strict-alternation issue from PHASE0.md §6 (mistral-nemo Jinja) is
|
||||
handled differently here — tool turns ARE expected to follow assistant
|
||||
tool_calls per the OpenAI chat-template convention. If a model's
|
||||
template still rejects this shape, fall back to the `[tool: X]` prefix
|
||||
strategy used for exec output (Q18 below).
|
||||
strategy used for exec output (Q18 below — fallback is plumbed via the
|
||||
`context.use_tool_role` flag; default `true`).
|
||||
|
||||
### Re-issuing the broker request
|
||||
|
||||
@@ -318,7 +351,11 @@ User-visible changes:
|
||||
- `CMD:` lines still work exactly as before for shell.
|
||||
|
||||
Substrate (PHASE0.md §3) invariants: unchanged. Module layout (§4)
|
||||
amended to add `mcp.lua`; that amendment ships in the manifest commit.
|
||||
amended to **add** `mcp.lua` (no rename of any existing file). Adding
|
||||
a new file is additive and preserves the §3 module-stability invariant
|
||||
("File names are stable across phases — later phases fill in bodies,
|
||||
not rename files"). The amendment ships in commit #1 of the §12 plan
|
||||
(C6 in the review).
|
||||
|
||||
`config.lua`: existing configs without an `mcp` section continue to work
|
||||
— no MCP servers means no tools sent in the broker request body, no
|
||||
@@ -360,7 +397,7 @@ Specifically out of Phase 2 scope despite proximity:
|
||||
| Q22 | aish's own command surface as an MCP server | scope expansion | **Out of Phase 2.** Parked for Phase 4+ if interest stays. |
|
||||
|
||||
Open-end carried forward to Phase 7 (verify):
|
||||
- **Hossenfelder proxy `model`-field bug** (separate from aish): the proxy at `:8082` routes all requests to the loaded fast model regardless of the request's `model` field — chunks return `"model":"qwen2.5-coder-1.5b-q4_k_m.gguf"` even when `mistral-nemo-12b-instruct` was asked for. This **blocks live-verification of mistral-nemo's chat-template tool-role behavior**. Fix lives in boltzmann (parallel to the SSE-buffering bug tracked at [aish#15](https://git.reauktion.de/marfrit/aish/issues/15)). Phase 7 needs the proxy fix to fully close Q18.
|
||||
- **Hossenfelder proxy `model`-field bug** (separate from aish): the proxy at `:8082` routes all requests to the loaded fast model regardless of the request's `model` field — chunks return `"model":"qwen2.5-coder-1.5b-q4_k_m.gguf"` even when `mistral-nemo-12b-instruct` was asked for. This **blocks live-verification of mistral-nemo's chat-template tool-role behavior**. Tracked as [aish#23](https://git.reauktion.de/marfrit/aish/issues/23) (filed 2026-05-12 at review). Sibling to the SSE-buffering bug at [aish#15](https://git.reauktion.de/marfrit/aish/issues/15) — both live in the boltzmann proxy code. Phase 7 needs at least #23 fixed to fully close Q18.
|
||||
|
||||
Resolved at formulate (above in §2 table):
|
||||
- Q6 (CMD: vs tools coexistence) — both, no policy preference, substrate unchanged.
|
||||
@@ -385,13 +422,18 @@ and Phase 1 implementation cadence.
|
||||
|
||||
1. **`mcp.lua` (new file) — JSON-RPC client.** `M.connect(url, opts)`,
|
||||
`session:initialize()` + `:list_tools()` + `:call_tool(name, args)` +
|
||||
`:close()`. Uses Phase 1's `ffi/curl.M.post` for transport. Per-server
|
||||
Bearer auth (`auth_token` literal or `auth_env` indirection). Returns
|
||||
the raw `result` table for tool calls (caller distinguishes
|
||||
`isError`/content vs JSON-RPC `error`). **Test in isolation** via
|
||||
`:close()`. Uses Phase 1's `ffi/curl.M.post` for transport — **same
|
||||
commit lands the `M.post` extension to return `(body, status_code)`
|
||||
per §3 row** so `mcp.lua` can distinguish HTTP `401` (non-JSON-RPC
|
||||
body `{"error":"unauthorized"}`) from JSON-RPC envelope errors.
|
||||
Per-server Bearer auth (`auth_token` literal or `auth_env`
|
||||
indirection). `:call_tool` returns `(result_table, kind)` where
|
||||
`kind ∈ {"ok","handler_error","rpc_error"}` so callers route per
|
||||
§4. **Test in isolation** via
|
||||
`luajit -e 'local mcp=require("mcp"); local s=mcp.connect("http://boltzmann.fritz.box:8080/mcp",{auth_env="BOLTZMANN_MCP_TOKEN"}); s:initialize(); print(#s:list_tools())'`.
|
||||
Also amends PHASE0.md §4 to list `mcp.lua` between `broker.lua` and
|
||||
`router.lua` in the same commit.
|
||||
`router.lua` in the same commit (additive — preserves §3
|
||||
module-stability invariant per §9).
|
||||
|
||||
2. **`safety.lua` — confirm-gate surface.** Implement just
|
||||
`M.confirm_tool_call(name, args, cfg)` per §6. Reads
|
||||
@@ -399,31 +441,37 @@ and Phase 1 implementation cadence.
|
||||
to `rl.readline` prompt. Norris-mode hooks stay out (Phase 3). **Test
|
||||
in isolation** with mocked rl + various policy shapes.
|
||||
|
||||
3. **`context.lua` extensions.** Add `role:"tool"` support: turns can
|
||||
carry `tool_call_id`; assistant turns can carry `tool_calls = [...]`.
|
||||
`to_messages()` rendering: an assistant turn with `tool_calls` keeps
|
||||
its `content` plus the `tool_calls` field; tool turns render with
|
||||
`role:"tool"` + `tool_call_id` + `content`. Alternation policy
|
||||
widens: assistant-with-tool_calls is followed by N tool turns. No
|
||||
change to `pending_exec_output` behavior. **Test in isolation** by
|
||||
building a context and round-tripping `to_messages()`.
|
||||
3. **`context.lua` extensions.** Three concrete edits per §3 row:
|
||||
(a) loosen `Context:append`'s assert from "content required" to
|
||||
shape-per-role (assistant may have empty content if `tool_calls`
|
||||
present; `tool` requires `tool_call_id` + `content`); (b) preserve
|
||||
`tool_calls` / `tool_call_id` in stored turns (not just role+content);
|
||||
(c) extend `to_messages()` to emit those fields. Add alternation
|
||||
assert (N4 in review). `pending_exec_output` is **unchanged**:
|
||||
buffer persists across tool-call sub-loops; flushes on next genuine
|
||||
user turn (§3 row). **Tests in isolation**: (i) build a context with
|
||||
assistant+tool_calls + tool turns, round-trip through `to_messages()`,
|
||||
eyeball JSON shape; (ii) day-one fallback test (N8) — same context
|
||||
with `use_tool_role = false` must emit the `[tool: alias.name]\n…`
|
||||
prefix shape instead of a `role:"tool"` message.
|
||||
|
||||
4. **`renderer.lua` extensions.** Add `M.tool_call_begin(name, args)`
|
||||
(top rule + `name(json-snippet)` indented dim) and
|
||||
`M.tool_call_end(content, is_error)` (bottom rule with dim/red status).
|
||||
Visual parity with the exec frame. **Test visually** with a one-liner.
|
||||
|
||||
5. **`broker.lua` extensions.** `chat_stream` request body grows
|
||||
`tools = caller-supplied`; we don't reach into mcp from here, the
|
||||
caller passes the assembled schema. The on_delta callback widens to
|
||||
`on_delta(kind, payload)` where `kind ∈ {"text","tool_call"}`. Text
|
||||
path unchanged. Tool-call path: accumulator keyed by `index`,
|
||||
concatenates `function.arguments` until `finish_reason: "tool_calls"`,
|
||||
then emits one `on_delta("tool_call", {id,name,arguments})` per
|
||||
completed call. `M.chat` (the buffering wrapper) keeps its current
|
||||
string-return shape for non-tool callers but optionally returns a
|
||||
`{text, tool_calls}` table when tools were involved. **Test against
|
||||
hossenfelder** with `tools` declared + streaming.
|
||||
5. **`broker.lua` extensions.** Signature widens:
|
||||
`chat_stream(cfg, msgs, on_delta, opts)`. `opts.tools` (optional
|
||||
array) is passed through to the request body; **omitted entirely
|
||||
when nil or empty**. The on_delta callback widens to
|
||||
`on_delta(kind, payload)` where `kind ∈ {"text","tool_call"}`.
|
||||
Text path unchanged. Tool-call path: accumulator keyed by `index`
|
||||
(default 0 if absent — C2), concatenates `function.arguments` until
|
||||
`finish_reason: "tool_calls"`, then emits one
|
||||
`on_delta("tool_call", {id,name,arguments})` per completed call.
|
||||
**`M.chat` shape unchanged** in this phase (C1 in review — no
|
||||
caller for a polymorphic return). **Test against hossenfelder**
|
||||
with `tools` declared + streaming.
|
||||
|
||||
6. **`repl.lua` wiring.** New module-local `mcp_sessions = {alias=session,...}`,
|
||||
populated from `config.mcp.servers` at startup. Helpers:
|
||||
@@ -466,24 +514,34 @@ and Phase 1 implementation cadence.
|
||||
read the error text. Do NOT short-circuit on the flag.
|
||||
|
||||
- **JSON-RPC `error` from `tools/call`.** Surface as aish status
|
||||
(`[aish] mcp: alias.tool: <error.message>`); do not feed to model.
|
||||
Do NOT append a tool turn — there was no result. The model will
|
||||
re-plan on the next user turn.
|
||||
AND synthesize a `role:"tool"` turn with
|
||||
`content = "[aish] tool dispatch failed: <error.message>"` and the
|
||||
matching `tool_call_id`. The alternation rationale (§4) requires
|
||||
this — leaving the assistant's `tool_calls` orphaned breaks strict
|
||||
chat templates exactly the way PHASE0.md §6 warned about. The model
|
||||
receives the error and can re-plan within the same turn.
|
||||
|
||||
- **Tool-call sub-loop bounds.** `max_tool_depth` (default 8) per ask_ai
|
||||
invocation. When hit, surface as status and break — append the
|
||||
assistant's last text (if any) and let the user reply.
|
||||
|
||||
- **Argument JSON might be invalid.** A model can stream malformed JSON
|
||||
in `function.arguments`. `dkjson.decode` failure → status warning +
|
||||
skip that call; do NOT execute on partial parse.
|
||||
in `function.arguments`. `dkjson.decode` failure → DO NOT execute on
|
||||
partial parse. Synthesize a `role:"tool"` turn with
|
||||
`content = "[aish] tool arguments not parseable as JSON: <decode-err>"`
|
||||
and the matching `tool_call_id` (same alternation rationale as
|
||||
JSON-RPC error above; C7 in review).
|
||||
|
||||
- **Q18 fallback path** (strict templates rejecting `role:"tool"`).
|
||||
Plumb a `context.use_tool_role` flag (default true). If a real-world
|
||||
rejection appears at Phase 7, flip the flag and convert tool turns to
|
||||
`[tool: alias.name]\n<content>` prefix on the next user turn (same
|
||||
pattern as `pending_exec_output`). v1 doesn't need to test the
|
||||
fallback — just have the switch.
|
||||
pattern as `pending_exec_output`). **Day-one verification** (N8 in
|
||||
review): commit #3 includes a small in-isolation test that builds a
|
||||
context with `use_tool_role = false`, appends an assistant+tool_calls
|
||||
turn followed by a tool result, and confirms `to_messages()` emits
|
||||
the prefix shape instead of a `role:"tool"` turn. Keeps the fallback
|
||||
alive rather than dead-coded until Phase 7 first runs it under stress.
|
||||
|
||||
### Test checkpoints
|
||||
|
||||
@@ -493,30 +551,57 @@ After each commit, verify with a targeted probe before moving on:
|
||||
|---|---|
|
||||
| #1 `mcp.lua` | `luajit -e "local m=require('mcp'); ..."` connects + lists tools against boltzmann lmcp |
|
||||
| #2 `safety.lua` | unit-test policy lookup with mock rl: exact match → true; `*` glob → true; miss → prompt invoked |
|
||||
| #3 `context.lua` | round-trip a context with tool turns through `to_messages()`, eyeball JSON shape |
|
||||
| #3 `context.lua` | (i) round-trip a context with tool turns through `to_messages()`, eyeball JSON shape; (ii) day-one fallback test with `use_tool_role = false` emits the `[tool: …]` prefix shape (N8) |
|
||||
| #4 `renderer.lua` | one-liner emits frame around fake tool result |
|
||||
| #5 `broker.lua` | curl-compare: hand-built request body with tools matches `broker.chat_stream(cfg, msgs, on_delta)` body |
|
||||
| #6 `repl.lua` | full REPL: `:mcp list` shows boltzmann; question that triggers `list_dir` round-trips through confirm + execution + model continuation |
|
||||
| #7 `config.lua` | aish starts with example mcp section present; no MCP servers connected means no `tools` field sent |
|
||||
|
||||
### Commits expected: 7 (plus 1 manifest amendment if scope creeps)
|
||||
### Commits expected: 7 (commit #1 carries the PHASE0.md §4 amendment)
|
||||
|
||||
Per Phase 1's cadence (10 commits + 1 BLOCKER fix), Phase 2 is smaller
|
||||
in surface — single new file plus targeted extensions. Tracked to land
|
||||
in one working session if the boltzmann proxy bugs don't intrude.
|
||||
|
||||
### Open at plan; resolve at review
|
||||
### Resolved at review (2026-05-12)
|
||||
|
||||
- **Q18 plumbing only or default-flip?** Default `use_tool_role = true`
|
||||
and ship the fallback as dead code? Or default `false` (prefix
|
||||
injection) for safety until verify confirms? Recommend `true` —
|
||||
protocol-layer probe in analyze said it worked; Phase 7 can flip if
|
||||
reality differs.
|
||||
- **Q18 default** — `use_tool_role = true` defaulted, fallback exercised
|
||||
day-one in commit #3 test (ii) so it's not dead code. Phase 7 flips if
|
||||
mistral-nemo (once #23 is fixed) rejects.
|
||||
- **`:mcp connect` re-fetch policy** — v1 trusts the `listChanged: false`
|
||||
capability; manual disconnect+reconnect is the workaround if a server's
|
||||
tools change. No automatic re-fetch.
|
||||
|
||||
- **Should `:mcp connect` re-fetch tools on subsequent calls** (handles
|
||||
servers that misreport `listChanged:false`)? v1: no — trust the
|
||||
capability. Manual `:mcp disconnect` + `:mcp connect` is the
|
||||
workaround if a server's tools change.
|
||||
### Review fold-in (2026-05-12, all BLOCKERs + relevant CONCERNs/NITs)
|
||||
|
||||
Independent review surfaced 5 BLOCKERs / 7 CONCERNs / 8 NITs against
|
||||
the formulate+analyze+plan draft. Resolutions applied in this revision:
|
||||
|
||||
- **B1** context.lua impact widened — assert loosening + field
|
||||
preservation + `to_messages` emit are now explicit in §3.
|
||||
- **B2** `ffi/curl.M.post` extended to return `(body, status_code)` so
|
||||
`mcp.lua` distinguishes HTTP `401` from JSON-RPC envelope errors.
|
||||
- **B3** `inputSchema` typo fixed in §3 mcp.lua row.
|
||||
- **B4** `pending_exec_output` × tool-call sub-loop interaction
|
||||
specified (persists across; flushes on next user turn).
|
||||
- **B5** §3/§12 dependency contradiction resolved — broker takes
|
||||
`opts.tools` from the caller; no layering inversion.
|
||||
- **C1** `M.chat` polymorphic return dropped.
|
||||
- **C2** Index-absent fallback specified (default to 0).
|
||||
- **C3** Re-injection example now stores accumulated text in the
|
||||
assistant turn, not hard-coded empty string.
|
||||
- **C4** `:mcp connect` failure semantics specified (no auto-retry).
|
||||
- **C5/C7** Both orphan-tool_calls scenarios now synthesize a
|
||||
`role:"tool"` turn with `[aish] tool dispatch failed: ...` content
|
||||
to preserve alternation.
|
||||
- **C6** §9 explicitly notes the §4 amendment is additive.
|
||||
- **N3** protocolVersion fallback specified (lmcp doesn't negotiate).
|
||||
- **N4** alternation assert added to context.lua row.
|
||||
- **N7** model-routing bug filed as [aish#23](https://git.reauktion.de/marfrit/aish/issues/23).
|
||||
- **N8** day-one fallback test added to commit #3 checkpoints.
|
||||
|
||||
CONCERNs / NITs not folded (defended as wording-only, not load-bearing):
|
||||
N1, N2, N5, N6 — left as-is.
|
||||
|
||||
---
|
||||
|
||||
|
||||
Reference in New Issue
Block a user