phase2 amend: __ separator (Bedrock-safe) + post_sse error diagnostics

Phase 7 verify finding from TC #26 against :model cloud:
  HTTP 400 from openrouter→Amazon Bedrock:
  "tools.0.custom.name: String should match pattern
   '^[a-zA-Z0-9_-]{1,128}$'"

Anthropic via Bedrock validates tool names against that regex and
rejects dots. PHASE2 originally chose "." as the namespace separator
("boltzmann.list_dir"); OpenAI tolerated it, Bedrock does not.

Separator switched to "__" (two underscores) everywhere — internal
API matches on-wire shape, no transformation layer:

  - repl.lua:
    - tools_schema builds "alias__name"
    - dispatch_tool_call splits via "^(.-)__(.+)$" (non-greedy → leftmost __)
    - :mcp tool parser uses same split
    - :mcp tools formatter prints "alias__name"
    - HELP block shows <alias__name>
  - safety.lua confirm_tool_call: alias.* glob → alias__* glob
  - config.lua example block: keys rewritten
  - docs/PHASE2.md: amendment header added; §1, §2 row, §3 config.lua
    row, §5 wire-shape JSON examples, §6 auto_approve schema, §7
    meta-cmd table, §12 plan all updated. Original "." references
    preserved in commit history.

Constraint: aliases must not themselves contain "__" so the parse
stays unambiguous. Tool names from MCP servers may have underscores
freely.

Second fix bundled — uninformative broker error:
  Previously "broker error: transport: HTTP response code said error"
  Now      "broker error: transport: HTTP 400: {full body snippet}"

ffi/curl.lua M.post_sse changes:
  - FAILONERROR no longer set (was hiding the response body).
  - raw_body accumulator added alongside the SSE buffer; captures
    every byte regardless of SSE shape.
  - After perform, check status_code via curl_easy_getinfo. On >=400,
    return (nil, "HTTP <code>: <body[:400]>"). 2xx unchanged.
  - End-of-stream SSE flush only runs on 2xx (no false event on
    error bodies that aren't SSE-shaped).
  - Phase 1 callers reading just first return slot stay correct.

End-to-end verified:
  - :model cloud + tools=[boltzmann__read_file ...] +
    "Use boltzmann__read_file with path=/etc/hostname" →
    Claude emits tool_call with name="boltzmann__read_file",
    args='{"path": "/etc/hostname"}'. ok=true, transport clean.
  - Force-bad tool name "bad.name.with.dots" → err string carries
    the full bedrock 400 with the regex-pattern message visible.

TC #26 (sub-loop end-to-end) is now testable against cloud — the
error that blocked it is resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-12 20:04:57 +00:00
parent 3fa6279f5b
commit f26cbd9a3a
5 changed files with 89 additions and 49 deletions
+31 -18
View File
@@ -2,9 +2,22 @@
**Project:** aish — AI-augmented conversational shell
**Document:** Phase 2 Requirements, Architecture & Design Decisions
**Status:** Plan (review pass folded in 2026-05-12)
**Status:** Verify (Phase 7) — implementation complete; live testing in progress
**Date:** 2026-05-12
**Amendments since formulate:**
- 2026-05-12 (review fold-in): see §12 "Review fold-in" subsection.
- 2026-05-12 (Phase 7 verify, separator switch): tool-name namespace
delimiter changed from `.` to `__` because Anthropic via Bedrock
validates tool names against `^[a-zA-Z0-9_-]{1,128}$` — dots are
rejected with `HTTP 400 tools.0.custom.name: String should match
pattern '...'`. Discovered when `:model cloud` exercised TC #26
against the real cloud path. Internal API matches on-wire shape so
there's no transformation layer. Constraint: aliases must not
themselves contain `__` so the parse stays unambiguous (leftmost
`__` is the split point). Tool names from MCP servers may contain
underscores freely. All §3/§5/§6/§7/§12 references updated below.
PHASE0.md is the locked substrate; PHASE1.md is layered on top. This
manifest specifies what Phase 2 adds. Section numbers reference back to
PHASE0.md / PHASE1.md where relevant.
@@ -18,7 +31,7 @@ Three pillars per PHASE0.md §11 row 2:
1. **MCP client** (`mcp.lua`) — JSON-RPC 2.0 over HTTP+SSE transport.
Target reference implementation: `lmcp`. Operations needed for v1:
`initialize`, `tools/list`, `tools/call`. Multiple servers may be
connected concurrently; tools are namespaced `<server>.<tool>`.
connected concurrently; tools are namespaced `<server>__<tool>`.
2. **Tool-calling protocol bridge** — the broker sends OpenAI-compatible
`tools` in the request body; the model emits `tool_calls` in the
response; `mcp.lua` dispatches each call to the right server; the
@@ -53,7 +66,7 @@ Three pillars per PHASE0.md §11 row 2:
| MCP protocol version | `2025-03-26` (confirmed by live probe of boltzmann:8080/mcp) | lmcp pins this in `MCP_VERSION` and **does not negotiate** — it returns its compiled-in version regardless of what the client sends (lmcp.lua:80-91). aish sends `2025-03-26` in `initialize` and accepts whatever the server returns; on mismatch it logs `[aish] mcp <alias>: protocol version mismatch (sent X, got Y); proceeding` and continues. v1 has no version-gated behavior to abort on. |
| MCP auth | Bearer token via `Authorization: Bearer <token>` header, per-server | Analyze finding: every lmcp deployment in mfritsche's fleet (boltzmann/hertz/pve*/nc/etc.) requires Bearer auth. Phase 2 config supports `auth_token` literal and `auth_env` env-var indirection per server (mirrors `key_env` in the models registry). lmcp servers without auth (broglie/higgs LAN-only) just leave the field nil. |
| Tool-call wire format | OpenAI `tools` field on `/v1/chat/completions` body; `tool_calls` on assistant deltas; `role:"tool"` turn with `tool_call_id` for results | Standard, supported by llama.cpp and OpenRouter. Aligns with the existing `/v1/chat/completions` substrate invariant. |
| Tool namespacing | `<server-alias>.<tool-name>` for both the wire-level tool name and `:mcp tools` listing | Avoids name collisions across servers. The alias comes from the config key or the connect URL hash. |
| Tool namespacing | `<server-alias>__<tool-name>` for both the wire-level tool name and `:mcp tools` listing (was `.` at formulate; switched 2026-05-12 — see Amendments above) | Avoids name collisions across servers. The alias comes from the config key or the connect URL hash. `__` (two underscores) is within Bedrock's tool-name regex `^[a-zA-Z0-9_-]{1,128}$` whereas `.` is not. Aliases must not themselves contain `__`. |
| `CMD:` coexistence with tool-calls | Both stay live, no policy preference. Substrate invariant §3 unchanged. | Resolves Q6 (see §10). `CMD:` is the local-shell route; MCP tools are structured-API routes; they serve different purposes. Future phases (Norris, Phase 3) may prefer tools when both are available, but Phase 2 doesn't enforce. |
| Authorization default | Per-call confirm (mirrors PHASE0.md §10 `confirm_cmd` for shell) | Conservative default; user can opt into auto-approval per tool or per server via config. Resolves Q8. |
| System prompt augmentation | Hybrid: static frame in `broker.lua` system prompt + dynamic `tools` array in the request body | Tool list goes in the API field where it belongs; the system prompt only mentions that tools exist and how to use them. Per-request body cost is bounded (tools change rarely; small schemas). Resolves Q9. |
@@ -73,7 +86,7 @@ Three pillars per PHASE0.md §11 row 2:
| `context.lua` | turns = {{role, content}, ...} + `pending_exec_output`; `Context:append` asserts `turn.content` and rebuilds the entry as `{role, content}` only — extra fields are dropped | Three concrete edits: (a) **loosen `:append`** so `role == "assistant"` can carry `tool_calls = [{id, name, arguments}]` with `content` allowed empty, and `role == "tool"` requires `tool_call_id` + `content` (the assert moves from "content required" to "shape per role"); (b) **preserve `tool_calls` and `tool_call_id`** in the stored turn (not just role+content); (c) `to_messages()` emits `tool_calls` on assistant turns and `tool_call_id` on tool turns. Add a debug assertion that `role == "tool"` follows an assistant turn with non-empty `tool_calls` (catches design bugs early; N4 in review). **`pending_exec_output` interaction**: the buffer **persists across the tool-call sub-loop** (the loop is internal — no user input happens — so there's no append_user to flush against). It flushes on the next genuine user turn, regardless of how many tool-call iterations preceded. |
| `repl.lua` | meta cmds + ask_ai stream loop | After ask_ai sees `tool_calls`, enter a tool-execution sub-loop: confirm-gate each call via `safety.confirm_tool_call`, dispatch via `mcp.session:call_tool`, append tool turn to context, re-issue the broker request. Loop until assistant emits text without tool_calls. New meta: `:mcp connect <url> [alias]`, `:mcp list`, `:mcp tools`, `:mcp disconnect <alias>`. |
| `renderer.lua` | streaming text + exec frame | Add `tool_call_begin(name, args)`, `tool_call_end(result, ok)`. Visual style: indented, dim, parallel to the exec frame. |
| `config.lua` | example with models/shell/context/history | Schema additions: `mcp = { servers = { alias = { url = "..." } }, auto_approve = { ["alias.tool"] = true } }`. Documented in §10 below. |
| `config.lua` | example with models/shell/context/history | Schema additions: `mcp = { servers = { alias = { url = "..." } }, auto_approve = { ["alias__tool"] = true } }`. Documented in §6 below. |
| `ffi/curl.lua` | post + post_sse; `M.post` does not set `FAILONERROR`, so non-2xx responses return the body as a normal string. `ffi.cdef` exposes only `curl_easy_setopt` — no `curl_easy_getinfo` (cdef block at curl.lua:11-28). | **One small extension**: `M.post` returns **`(body, status_code)` on transport success** (status_code may be non-2xx — caller decides what to do; mcp.lua treats `>= 400` as transport failure). `(nil, errmsg)` on libcurl-level failure is **unchanged** — Phase 1 callers that read only the first slot stay correct. Requires adding `curl_easy_getinfo` + `CURLINFO_RESPONSE_CODE` (decimal 2097154, `CURLINFOTYPE_LONG | 2`) to the `ffi.cdef` block, plus a `long[1]` out-param shim. MCP auth failures from lmcp arrive as HTTP `401` with a non-JSON-RPC body (`{"error":"unauthorized"}`); `mcp.lua` must distinguish HTTP-level failure from JSON-RPC envelope errors. No SSE GET channel is added (analyze finding ruled it out for lmcp). |
| `history.lua` | JSONL session log | Tool turns are logged like any other turn — `{role:"tool", tool_call_id:"...", content:"..."}`. Resume reconstructs them via `ctx:append` like user/assistant turns. |
@@ -185,7 +198,7 @@ This split resolves Q21 (with the C5/C7 review fix folded in).
"temperature": 0.2,
"tools": [
{ "type":"function",
"function": { "name":"<alias>.<tool>",
"function": { "name":"<alias>__<tool>",
"description":"...",
"parameters": <inputSchema> } },
...
@@ -195,7 +208,7 @@ This split resolves Q21 (with the C5/C7 review fix folded in).
The `tools` array is assembled by `mcp.tools_schema()` — flattens
`tools/list` results from every connected session, namespacing each tool
as `<alias>.<name>`.
as `<alias>__<name>`.
### Response handling (streaming)
@@ -203,7 +216,7 @@ llama.cpp / OpenAI deltas may include:
```json
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_…",
"function":{"name":"alias.tool","arguments":"{\"a\":"}}]}}]}
"function":{"name":"alias__tool","arguments":"{\"a\":"}}]}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,
"function":{"arguments":"1}"}}]}}]}
data: {"choices":[{"finish_reason":"tool_calls",...}]}
@@ -233,7 +246,7 @@ that text first, then renders the tool-call frame around dispatch.
ctx:append({
role = "assistant",
content = accumulated_text, -- may be "" if model emitted no prose
tool_calls = { {id="call_…", name="alias.tool", arguments=<json-string>} },
tool_calls = { {id="call_…", name="alias__tool", arguments=<json-string>} },
})
ctx:append({
role = "tool",
@@ -269,7 +282,7 @@ Hit-cap surfaces as a status: `[aish] tool-call depth limit reached`.
function M.confirm_tool_call(name, args, cfg)
local policy = cfg.mcp and cfg.mcp.auto_approve or {}
if policy[name] then return true end
-- Per-server prefix check: "alias.*" entries
-- Per-server prefix check: "alias__*" entries
local alias = name:match("^([^.]+)%.")
if alias and policy[alias .. ".*"] then return true end
-- Otherwise prompt
@@ -298,8 +311,8 @@ mcp = {
},
},
auto_approve = {
["boltzmann.read_file"] = true, -- specific tool
["broglie.*"] = true, -- whole server
["boltzmann__read_file"] = true, -- specific tool
["broglie__*"] = true, -- whole server
},
max_tool_depth = 8,
}
@@ -321,8 +334,8 @@ heuristic decides; for non-destructive tools, auto_approve. Outside scope here.
| `:mcp connect <url> [<alias>]` | Open a session; perform initialize + tools/list; add to active set |
| `:mcp disconnect <alias>` | Close one session |
| `:mcp list` | Show connected sessions (alias, url, tool count, status) |
| `:mcp tools` | List tools across all sessions (`alias.name` — short description) |
| `:mcp tool <alias.name>` | Show one tool's full inputSchema (debug aid) |
| `:mcp tools` | List tools across all sessions (`alias__name` — short description) |
| `:mcp tool <alias__name>` | Show one tool's full inputSchema (debug aid) |
Existing `:help` updated to list these.
@@ -445,7 +458,7 @@ and Phase 1 implementation cadence.
2. **`safety.lua` — confirm-gate surface.** Implement just
`M.confirm_tool_call(name, args, cfg)` per §6. Reads
`cfg.mcp.auto_approve` for exact-match and `alias.*` glob. Falls back
`cfg.mcp.auto_approve` for exact-match and `alias__*` glob. Falls back
to `rl.readline` prompt. Norris-mode hooks stay out (Phase 3). **Test
in isolation** with mocked rl + various policy shapes.
@@ -460,7 +473,7 @@ and Phase 1 implementation cadence.
user turn (§3 row). **Tests in isolation**: (i) build a context with
assistant+tool_calls + tool turns, round-trip through `to_messages()`,
eyeball JSON shape; (ii) day-one fallback test (N8) — same context
with `use_tool_role = false` must emit the `[tool: alias.name]\n…`
with `use_tool_role = false` must emit the `[tool: alias__name]\n…`
prefix shape instead of a `role:"tool"` message.
4. **`renderer.lua` extensions.** Add `M.tool_call_begin(name, args)`
@@ -483,8 +496,8 @@ and Phase 1 implementation cadence.
6. **`repl.lua` wiring.** New module-local `mcp_sessions = {alias=session,...}`,
populated from `config.mcp.servers` at startup. Helpers:
- `tools_schema()` → flatten `tool` lists across sessions, namespace `alias.name`
- `dispatch_tool_call(call)` → split `alias.tool`, look up session, call, return content
- `tools_schema()` → flatten `tool` lists across sessions, namespace `alias__name`
- `dispatch_tool_call(call)` → split `alias__tool`, look up session, call, return content
- `ask_ai` loop now: stream response → if any tool_calls completed,
for each call: `safety.confirm_tool_call` → `dispatch_tool_call` →
append assistant-with-tool_calls + tool turn → re-call `broker.chat_stream`
@@ -543,7 +556,7 @@ and Phase 1 implementation cadence.
- **Q18 fallback path** (strict templates rejecting `role:"tool"`).
Plumb a `context.use_tool_role` flag (default true). If a real-world
rejection appears at Phase 7, flip the flag and convert tool turns to
`[tool: alias.name]\n<content>` prefix on the next user turn (same
`[tool: alias__name]\n<content>` prefix on the next user turn (same
pattern as `pending_exec_output`). **Day-one verification** (N8 in
review): commit #3 includes a small in-isolation test that builds a
context with `use_tool_role = false`, appends an assistant+tool_calls