phase2 amend: __ separator (Bedrock-safe) + post_sse error diagnostics

Phase 7 verify finding from TC #26 against :model cloud: HTTP 400 from openrouter→Amazon Bedrock: "tools.0.custom.name: String should match pattern '^[a-zA-Z0-9_-]{1,128}$'" Anthropic via Bedrock validates tool names against that regex and rejects dots. PHASE2 originally chose "." as the namespace separator ("boltzmann.list_dir"); OpenAI tolerated it, Bedrock does not. Separator switched to "__" (two underscores) everywhere — internal API matches on-wire shape, no transformation layer: - repl.lua: - tools_schema builds "alias__name" - dispatch_tool_call splits via "^(.-)__(.+)$" (non-greedy → leftmost __) - :mcp tool parser uses same split - :mcp tools formatter prints "alias__name" - HELP block shows <alias__name> - safety.lua confirm_tool_call: alias.* glob → alias__* glob - config.lua example block: keys rewritten - docs/PHASE2.md: amendment header added; §1, §2 row, §3 config.lua row, §5 wire-shape JSON examples, §6 auto_approve schema, §7 meta-cmd table, §12 plan all updated. Original "." references preserved in commit history. Constraint: aliases must not themselves contain "__" so the parse stays unambiguous. Tool names from MCP servers may have underscores freely. Second fix bundled — uninformative broker error: Previously "broker error: transport: HTTP response code said error" Now "broker error: transport: HTTP 400: {full body snippet}" ffi/curl.lua M.post_sse changes: - FAILONERROR no longer set (was hiding the response body). - raw_body accumulator added alongside the SSE buffer; captures every byte regardless of SSE shape. - After perform, check status_code via curl_easy_getinfo. On >=400, return (nil, "HTTP <code>: <body[:400]>"). 2xx unchanged. - End-of-stream SSE flush only runs on 2xx (no false event on error bodies that aren't SSE-shaped). - Phase 1 callers reading just first return slot stay correct. End-to-end verified: - :model cloud + tools=[boltzmann__read_file ...] + "Use boltzmann__read_file with path=/etc/hostname" → Claude emits tool_call with name="boltzmann__read_file", args='{"path": "/etc/hostname"}'. ok=true, transport clean. - Force-bad tool name "bad.name.with.dots" → err string carries the full bedrock 400 with the regex-pattern message visible. TC #26 (sub-loop end-to-end) is now testable against cloud — the error that blocked it is resolved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 20:04:57 +00:00
parent 3fa6279f5b
commit f26cbd9a3a
5 changed files with 89 additions and 49 deletions
@@ -2,9 +2,22 @@

 **Project:** aish — AI-augmented conversational shell
 **Document:** Phase 2 Requirements, Architecture & Design Decisions
-**Status:** Plan (review pass folded in 2026-05-12)
+**Status:** Verify (Phase 7) — implementation complete; live testing in progress
 **Date:** 2026-05-12

+**Amendments since formulate:**
+- 2026-05-12 (review fold-in): see §12 "Review fold-in" subsection.
+- 2026-05-12 (Phase 7 verify, separator switch): tool-name namespace
+  delimiter changed from `.` to `__` because Anthropic via Bedrock
+  validates tool names against `^[a-zA-Z0-9_-]{1,128}$` — dots are
+  rejected with `HTTP 400 tools.0.custom.name: String should match
+  pattern '...'`. Discovered when `:model cloud` exercised TC #26
+  against the real cloud path. Internal API matches on-wire shape so
+  there's no transformation layer. Constraint: aliases must not
+  themselves contain `__` so the parse stays unambiguous (leftmost
+  `__` is the split point). Tool names from MCP servers may contain
+  underscores freely. All §3/§5/§6/§7/§12 references updated below.
+
 PHASE0.md is the locked substrate; PHASE1.md is layered on top. This
 manifest specifies what Phase 2 adds. Section numbers reference back to
 PHASE0.md / PHASE1.md where relevant.
@@ -18,7 +31,7 @@ Three pillars per PHASE0.md §11 row 2:
 1. **MCP client** (`mcp.lua`) — JSON-RPC 2.0 over HTTP+SSE transport.
   Target reference implementation: `lmcp`. Operations needed for v1:
   `initialize`, `tools/list`, `tools/call`. Multiple servers may be
-   connected concurrently; tools are namespaced `<server>.<tool>`.
+   connected concurrently; tools are namespaced `<server>__<tool>`.
 2. **Tool-calling protocol bridge** — the broker sends OpenAI-compatible
   `tools` in the request body; the model emits `tool_calls` in the
   response; `mcp.lua` dispatches each call to the right server; the
@@ -53,7 +66,7 @@ Three pillars per PHASE0.md §11 row 2:
 | MCP protocol version | `2025-03-26` (confirmed by live probe of boltzmann:8080/mcp) | lmcp pins this in `MCP_VERSION` and **does not negotiate** — it returns its compiled-in version regardless of what the client sends (lmcp.lua:80-91). aish sends `2025-03-26` in `initialize` and accepts whatever the server returns; on mismatch it logs `[aish] mcp <alias>: protocol version mismatch (sent X, got Y); proceeding` and continues. v1 has no version-gated behavior to abort on. |
 | MCP auth | Bearer token via `Authorization: Bearer <token>` header, per-server | Analyze finding: every lmcp deployment in mfritsche's fleet (boltzmann/hertz/pve*/nc/etc.) requires Bearer auth. Phase 2 config supports `auth_token` literal and `auth_env` env-var indirection per server (mirrors `key_env` in the models registry). lmcp servers without auth (broglie/higgs LAN-only) just leave the field nil. |
 | Tool-call wire format | OpenAI `tools` field on `/v1/chat/completions` body; `tool_calls` on assistant deltas; `role:"tool"` turn with `tool_call_id` for results | Standard, supported by llama.cpp and OpenRouter. Aligns with the existing `/v1/chat/completions` substrate invariant. |
-| Tool namespacing | `<server-alias>.<tool-name>` for both the wire-level tool name and `:mcp tools` listing | Avoids name collisions across servers. The alias comes from the config key or the connect URL hash. |
+| Tool namespacing | `<server-alias>__<tool-name>` for both the wire-level tool name and `:mcp tools` listing (was `.` at formulate; switched 2026-05-12 — see Amendments above) | Avoids name collisions across servers. The alias comes from the config key or the connect URL hash. `__` (two underscores) is within Bedrock's tool-name regex `^[a-zA-Z0-9_-]{1,128}$` whereas `.` is not. Aliases must not themselves contain `__`. |
 | `CMD:` coexistence with tool-calls | Both stay live, no policy preference. Substrate invariant §3 unchanged. | Resolves Q6 (see §10). `CMD:` is the local-shell route; MCP tools are structured-API routes; they serve different purposes. Future phases (Norris, Phase 3) may prefer tools when both are available, but Phase 2 doesn't enforce. |
 | Authorization default | Per-call confirm (mirrors PHASE0.md §10 `confirm_cmd` for shell) | Conservative default; user can opt into auto-approval per tool or per server via config. Resolves Q8. |
 | System prompt augmentation | Hybrid: static frame in `broker.lua` system prompt + dynamic `tools` array in the request body | Tool list goes in the API field where it belongs; the system prompt only mentions that tools exist and how to use them. Per-request body cost is bounded (tools change rarely; small schemas). Resolves Q9. |
@@ -73,7 +86,7 @@ Three pillars per PHASE0.md §11 row 2:
 | `context.lua` | turns = {{role, content}, ...} + `pending_exec_output`; `Context:append` asserts `turn.content` and rebuilds the entry as `{role, content}` only — extra fields are dropped | Three concrete edits: (a) **loosen `:append`** so `role == "assistant"` can carry `tool_calls = [{id, name, arguments}]` with `content` allowed empty, and `role == "tool"` requires `tool_call_id` + `content` (the assert moves from "content required" to "shape per role"); (b) **preserve `tool_calls` and `tool_call_id`** in the stored turn (not just role+content); (c) `to_messages()` emits `tool_calls` on assistant turns and `tool_call_id` on tool turns. Add a debug assertion that `role == "tool"` follows an assistant turn with non-empty `tool_calls` (catches design bugs early; N4 in review). **`pending_exec_output` interaction**: the buffer **persists across the tool-call sub-loop** (the loop is internal — no user input happens — so there's no append_user to flush against). It flushes on the next genuine user turn, regardless of how many tool-call iterations preceded. |
 | `repl.lua` | meta cmds + ask_ai stream loop | After ask_ai sees `tool_calls`, enter a tool-execution sub-loop: confirm-gate each call via `safety.confirm_tool_call`, dispatch via `mcp.session:call_tool`, append tool turn to context, re-issue the broker request. Loop until assistant emits text without tool_calls. New meta: `:mcp connect <url> [alias]`, `:mcp list`, `:mcp tools`, `:mcp disconnect <alias>`. |
 | `renderer.lua` | streaming text + exec frame | Add `tool_call_begin(name, args)`, `tool_call_end(result, ok)`. Visual style: indented, dim, parallel to the exec frame. |
-| `config.lua` | example with models/shell/context/history | Schema additions: `mcp = { servers = { alias = { url = "..." } }, auto_approve = { ["alias.tool"] = true } }`. Documented in §10 below. |
+| `config.lua` | example with models/shell/context/history | Schema additions: `mcp = { servers = { alias = { url = "..." } }, auto_approve = { ["alias__tool"] = true } }`. Documented in §6 below. |
 | `ffi/curl.lua` | post + post_sse; `M.post` does not set `FAILONERROR`, so non-2xx responses return the body as a normal string. `ffi.cdef` exposes only `curl_easy_setopt` — no `curl_easy_getinfo` (cdef block at curl.lua:11-28). | **One small extension**: `M.post` returns **`(body, status_code)` on transport success** (status_code may be non-2xx — caller decides what to do; mcp.lua treats `>= 400` as transport failure). `(nil, errmsg)` on libcurl-level failure is **unchanged** — Phase 1 callers that read only the first slot stay correct. Requires adding `curl_easy_getinfo` + `CURLINFO_RESPONSE_CODE` (decimal 2097154, `CURLINFOTYPE_LONG | 2`) to the `ffi.cdef` block, plus a `long[1]` out-param shim. MCP auth failures from lmcp arrive as HTTP `401` with a non-JSON-RPC body (`{"error":"unauthorized"}`); `mcp.lua` must distinguish HTTP-level failure from JSON-RPC envelope errors. No SSE GET channel is added (analyze finding ruled it out for lmcp). |
 | `history.lua` | JSONL session log | Tool turns are logged like any other turn — `{role:"tool", tool_call_id:"...", content:"..."}`. Resume reconstructs them via `ctx:append` like user/assistant turns. |

@@ -185,7 +198,7 @@ This split resolves Q21 (with the C5/C7 review fix folded in).
  "temperature": 0.2,
  "tools": [
    { "type":"function",
-      "function": { "name":"<alias>.<tool>",
+      "function": { "name":"<alias>__<tool>",
                    "description":"...",
                    "parameters": <inputSchema> } },
    ...
@@ -195,7 +208,7 @@ This split resolves Q21 (with the C5/C7 review fix folded in).

 The `tools` array is assembled by `mcp.tools_schema()` — flattens
 `tools/list` results from every connected session, namespacing each tool
-as `<alias>.<name>`.
+as `<alias>__<name>`.

 ### Response handling (streaming)

@@ -203,7 +216,7 @@ llama.cpp / OpenAI deltas may include:

 ```json
 data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_…",
-                "function":{"name":"alias.tool","arguments":"{\"a\":"}}]}}]}
+                "function":{"name":"alias__tool","arguments":"{\"a\":"}}]}}]}
 data: {"choices":[{"delta":{"tool_calls":[{"index":0,
                "function":{"arguments":"1}"}}]}}]}
 data: {"choices":[{"finish_reason":"tool_calls",...}]}
@@ -233,7 +246,7 @@ that text first, then renders the tool-call frame around dispatch.
 ctx:append({
  role = "assistant",
  content = accumulated_text,    -- may be "" if model emitted no prose
-  tool_calls = { {id="call_…", name="alias.tool", arguments=<json-string>} },
+  tool_calls = { {id="call_…", name="alias__tool", arguments=<json-string>} },
 })
 ctx:append({
  role = "tool",
@@ -269,7 +282,7 @@ Hit-cap surfaces as a status: `[aish] tool-call depth limit reached`.
 function M.confirm_tool_call(name, args, cfg)
    local policy = cfg.mcp and cfg.mcp.auto_approve or {}
    if policy[name] then return true end
-    -- Per-server prefix check: "alias.*" entries
+    -- Per-server prefix check: "alias__*" entries
    local alias = name:match("^([^.]+)%.")
    if alias and policy[alias .. ".*"] then return true end
    -- Otherwise prompt
@@ -298,8 +311,8 @@ mcp = {
        },
    },
    auto_approve = {
-        ["boltzmann.read_file"] = true,    -- specific tool
-        ["broglie.*"]           = true,    -- whole server
+        ["boltzmann__read_file"] = true,    -- specific tool
+        ["broglie__*"]           = true,    -- whole server
    },
    max_tool_depth = 8,
 }
@@ -321,8 +334,8 @@ heuristic decides; for non-destructive tools, auto_approve. Outside scope here.
 | `:mcp connect <url> [<alias>]` | Open a session; perform initialize + tools/list; add to active set |
 | `:mcp disconnect <alias>` | Close one session |
 | `:mcp list` | Show connected sessions (alias, url, tool count, status) |
-| `:mcp tools` | List tools across all sessions (`alias.name` — short description) |
-| `:mcp tool <alias.name>` | Show one tool's full inputSchema (debug aid) |
+| `:mcp tools` | List tools across all sessions (`alias__name` — short description) |
+| `:mcp tool <alias__name>` | Show one tool's full inputSchema (debug aid) |

 Existing `:help` updated to list these.

@@ -445,7 +458,7 @@ and Phase 1 implementation cadence.

 2. **`safety.lua` — confirm-gate surface.** Implement just
   `M.confirm_tool_call(name, args, cfg)` per §6. Reads
-   `cfg.mcp.auto_approve` for exact-match and `alias.*` glob. Falls back
+   `cfg.mcp.auto_approve` for exact-match and `alias__*` glob. Falls back
   to `rl.readline` prompt. Norris-mode hooks stay out (Phase 3). **Test
   in isolation** with mocked rl + various policy shapes.

@@ -460,7 +473,7 @@ and Phase 1 implementation cadence.
   user turn (§3 row). **Tests in isolation**: (i) build a context with
   assistant+tool_calls + tool turns, round-trip through `to_messages()`,
   eyeball JSON shape; (ii) day-one fallback test (N8) — same context
-   with `use_tool_role = false` must emit the `[tool: alias.name]\n…`
+   with `use_tool_role = false` must emit the `[tool: alias__name]\n…`
   prefix shape instead of a `role:"tool"` message.

 4. **`renderer.lua` extensions.** Add `M.tool_call_begin(name, args)`
@@ -483,8 +496,8 @@ and Phase 1 implementation cadence.

 6. **`repl.lua` wiring.** New module-local `mcp_sessions = {alias=session,...}`,
   populated from `config.mcp.servers` at startup. Helpers:
-   - `tools_schema()` → flatten `tool` lists across sessions, namespace `alias.name`
-   - `dispatch_tool_call(call)` → split `alias.tool`, look up session, call, return content
+   - `tools_schema()` → flatten `tool` lists across sessions, namespace `alias__name`
+   - `dispatch_tool_call(call)` → split `alias__tool`, look up session, call, return content
   - `ask_ai` loop now: stream response → if any tool_calls completed,
     for each call: `safety.confirm_tool_call` → `dispatch_tool_call` →
     append assistant-with-tool_calls + tool turn → re-call `broker.chat_stream`
@@ -543,7 +556,7 @@ and Phase 1 implementation cadence.
 - **Q18 fallback path** (strict templates rejecting `role:"tool"`).
  Plumb a `context.use_tool_role` flag (default true). If a real-world
  rejection appears at Phase 7, flip the flag and convert tool turns to
-  `[tool: alias.name]\n<content>` prefix on the next user turn (same
+  `[tool: alias__name]\n<content>` prefix on the next user turn (same
  pattern as `pending_exec_output`). **Day-one verification** (N8 in
  review): commit #3 includes a small in-isolation test that builds a
  context with `use_tool_role = false`, appends an assistant+tool_calls