From c5116bf1293b9763e25fa9837ab2af05694c2511 Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Tue, 12 May 2026 12:34:32 +0000
Subject: [PATCH] docs/PHASE2-baseline: pre-implementation measurements
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase 7 (verify) anchor. Captures:
- MCP RPC round-trip timings against boltzmann lmcp v0.5.4 (all
  sub-100ms on LAN; LLM is the latency floor, not the transport).
- 6 fixture responses saved to /tmp/aish-baseline/ covering initialize,
  notifications/initialized, tools/list, tools/call success, isError,
  and JSON-RPC unknown-tool error.
- Baseline design finding: boltzmann's read_file returns isError:false
  even on failure (error text in content). aish should treat content as
  authoritative, isError as advisory; feed both to the model. PHASE2.md
  §4's "pass-through" stance already accommodates; no manifest
  amendment needed.
- Streaming tool_calls delta shape verified against hossenfelder;
  matches PHASE2.md §5.
- Pre-MCP aish behavior snapshot: loaded model emits markdown code-fence
  ignoring the CMD: contract — once MCP tools exist the model gets a
  structured path that doesn't depend on prose-formatting compliance.
- Module pre-state at Phase 1 head 5878f73: LOC + capability snapshot
  per module so Phase 2 diff has a reference frame.
- Two boltzmann-proxy blockers (SSE buffering, model-field routing)
  carried explicitly into Phase 7.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/PHASE2-baseline.md | 154 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 154 insertions(+)
 create mode 100644 docs/PHASE2-baseline.md

diff --git a/docs/PHASE2-baseline.md b/docs/PHASE2-baseline.md
new file mode 100644
index 0000000..2678c34
--- /dev/null
+++ b/docs/PHASE2-baseline.md
@@ -0,0 +1,154 @@
+# Phase 2 Baseline — pre-implementation measurements
+
+**Date:** 2026-05-12
+**Targets probed:** lmcp v0.5.4 on `boltzmann.fritz.box:8080/mcp`; OpenAI-compat broker on `hossenfelder.fritz.box:8082`.
+
+This is the Phase 7 (verify) anchor — captures what the world looked like just *before* Phase 2 implementation lands, so post-implementation behavior can be compared against it. Companion to PHASE2.md (manifest).
+
+---
+
+## 1. MCP RPC round-trip timings (cold path, single warm-up)
+
+| RPC | Latency |
+|---|---|
+| `initialize` | 19 ms |
+| `notifications/initialized` (HTTP 202, no body) | 11 ms |
+| `tools/list` | 17 ms |
+| `tools/call` `list_dir({path:"/tmp"})` (success, ~1 KB result) | 72 ms |
+| `tools/call` `read_file({path:"/nonexistent/..."})` (handler-caught failure) | 12 ms |
+| `tools/call` `nope_tool` (JSON-RPC -32601 unknown tool) | 12 ms |
+
+LAN-local; every RPC is sub-100 ms, and all but the file-listing payload are under 20 ms.
+Phase 2's sequential tool-call dispatch won't be the bottleneck; the LLM is.
+
+---
+
+## 2. Fixtures (saved to `/tmp/aish-baseline/`)
+
+| File | Shape |
+|---|---|
+| `01_initialize.json` | `{result:{protocolVersion, serverInfo:{name,version}, capabilities:{tools:{listChanged:false}}}}` |
+| `02_notif_init.body` | empty (HTTP 202) |
+| `03_tools_list.json` | `{result:{tools:[{name, description, inputSchema}...]}}` — 7 tools on boltzmann |
+| `04_tools_call_ok.json` | `{result:{isError:false, content:[{type:"text", text:""}]}}` |
+| `05_tools_call_iserror.json` | **see §3 finding** |
+| `06_tools_call_unknown.json` | `{error:{code:-32601, message:"Tool not found: nope_tool"}}` |
+
+### Initialize response (compact)
+
+```json
+{"id":1,"jsonrpc":"2.0","result":{
+  "serverInfo":{"version":"0.1.0","name":"boltzmann-tools"},
+  "protocolVersion":"2025-03-26",
+  "capabilities":{"tools":{"listChanged":false}}}}
+```
+
+### Unknown-tool error (JSON-RPC protocol-level failure)
+
+```json
+{"id":5,"jsonrpc":"2.0","error":{
+  "message":"Tool not found: nope_tool","code":-32601}}
+```
+
+---
+
+## 3. Baseline finding: `isError` is not a complete failure signal
+
+`read_file({path:"/nonexistent/baseline-probe"})` returned:
+
+```json
+{"id":4,"jsonrpc":"2.0","result":{
+  "isError":false,
+  "content":[{"type":"text","text":"Error: could not read /nonexistent/baseline-probe"}]}}
+```
+
+`isError: false` despite an obvious failure. The handler caught the error and put it in `content` text but didn't set the flag.
+
+**Implication for Phase 2 design:** aish cannot rely solely on `result.isError` to decide success/failure of a tool call. The model must read the text content. This actually simplifies Phase 2: just feed `content` straight back as the `role:"tool"` turn body regardless of `isError`. The flag is advisory; the model is the discriminator. (No PHASE2.md amendment needed — §4's "pass-through to the model" stance already accommodates this.)
+
+This is a per-tool boltzmann-lmcp implementation quirk, not a spec issue. Other lmcp deployments may set `isError: true` correctly; aish should still pass content through and not crash on either shape.
+
+---
+
+## 4. Streaming `tool_calls` delta shape (verified against hossenfelder)
+
+For `stream: true` requests with `tools` declared, observed deltas:
+
+```
+data: {"choices":[{"delta":{"role":"assistant","content":null}}]}
+data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"...","type":"function",
+    "function":{"name":"get_weather","arguments":""}}]}}]}
+data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{"}}]}}]}
+data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\""}}]}}]}
+data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"city"}}]}}]}
+...
+data: {"choices":[{"finish_reason":"tool_calls","delta":{}}]}
+data: [DONE]
+```
+
+Accumulator rules confirmed:
+1. On the first delta containing `tool_calls[i]`: capture `id`, `type`, `function.name`. `arguments` may be empty `""`.
+2. On subsequent deltas matching same `index`: concatenate `function.arguments` into the running buffer.
+3. `finish_reason: "tool_calls"` closes the set; arguments buffer is parsed as JSON at that point.
+
+Matches PHASE2.md §5 design.
+
+---
+
+## 5. Baseline aish behavior (pre-MCP, what Phase 1 does today)
+
+Sent to hossenfelder with the standard system prompt and **no `tools` field**:
+
+```
+user: List the files in /tmp
+```
+
+Response (qwen2.5-coder-1.5b via hossenfelder, sans tools):
+
+````
+```cmd
+dir /tmp
+```
+````
+
+`finish_reason: stop`, `tool_calls: null`, 9 completion tokens.
+
+The loaded model emits Windows shell syntax in a markdown code-fence, ignoring the system prompt's `CMD:` extraction contract. **No tool_calls path is exercised today** because no tools are declared. This is the empirical "before" of Phase 2 — once MCP servers are wired and a real tool exists (`list_dir({path:"/tmp"})`), the model has a structured path that doesn't depend on getting `CMD:` formatting right.
+
+---
+
+## 6. Known blockers carried into Phase 7 (verify)
+
+Both live in the **boltzmann proxy** (`hossenfelder.fritz.box:8082`), not in aish:
+
+| # | Bug | Affects | Tracking |
+|---|---|---|---|
+| 1 | SSE buffering — proxy sets `Content-Length` on `text/event-stream` and flushes the whole response at once | streaming visibility (Phase 1) AND streaming tool_calls deltas (Phase 2) | [aish#15](https://git.reauktion.de/marfrit/aish/issues/15) + [[reference-hossenfelder-sse-buffering]] |
+| 2 | `model` field routing — every request returns chunks tagged `qwen2.5-coder-1.5b-q4_k_m.gguf` regardless of requested `model`, suggesting the proxy ignores the field | Phase 2 testing against mistral-nemo specifically (the strict-chat-template canary for Q18); also any `:model deep` / `:model cloud` switch | side-finding in #15 triage; needs its own issue when Phase 7 hits it |
+
+Phase 2 implement/verify will proceed against whatever model is loaded.
+Full template-strictness verification of Q18 (`role:"tool"` acceptance on
+mistral-nemo) waits for bug #2 to be fixed in the boltzmann proxy code.
+
+---
+
+## 7. Module pre-state (Phase 1 head: `5878f73`)
+
+| Module | LOC (incl. comments) | State |
+|---|---|---|
+| `broker.lua` | 92 | chat + chat_stream, no `tools` field |
+| `context.lua` | (per Phase 1) | `pending_exec_output` buffer; no `role:"tool"`; no `tool_calls` on assistant turns |
+| `executor.lua` | (per Phase 1) | PTY-backed, `CMD:` extract, no tool dispatch |
+| `repl.lua` | 287 | meta cmds, ask_ai stream loop, no `:mcp …`, no tool-call sub-loop |
+| `renderer.lua` | 79 | exec frame, streaming text; no tool-call frame |
+| `safety.lua` | (per PHASE0 §4) | stub — only the file exists |
+| `mcp.lua` | — | does not exist yet |
+| `config.lua` | (per user's edits) | models registry; no `mcp = { servers = {...} }` section |
+
+After Phase 2 lands, `git diff main..post-phase-2 --stat` should show:
+new `mcp.lua` (substantial), modest growth in `broker.lua` / `context.lua` /
+`repl.lua` / `renderer.lua`, and a finally non-stub `safety.lua`.
+
+---
+
+*End of Phase 2 Baseline — aish*
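
The accumulator rules confirmed in §4 can be sketched as below. This is a minimal illustration in Python rather than aish's Lua, and `accumulate_tool_calls` is a hypothetical helper, not a function from the codebase. Returning an empty list when `finish_reason: "tool_calls"` never arrives is also an assumption, since the baseline only observed complete streams.

```python
import json


def accumulate_tool_calls(sse_lines):
    """Fold streamed chat-completion deltas into complete tool calls.

    Hypothetical helper: takes an iterable of raw SSE lines ("data: {...}")
    and returns a list of calls ordered by their `index`.
    """
    calls = {}        # index -> {"id", "type", "name", "arguments" (str buffer)}
    finished = False  # set once finish_reason == "tool_calls" is seen

    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)

        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for tc in delta.get("tool_calls") or []:
                slot = calls.setdefault(
                    tc["index"],
                    {"id": None, "type": None, "name": None, "arguments": ""},
                )
                fn = tc.get("function") or {}
                # Rule 1: the first delta for an index carries id/type/name.
                if tc.get("id"):
                    slot["id"] = tc["id"]
                if tc.get("type"):
                    slot["type"] = tc["type"]
                if fn.get("name"):
                    slot["name"] = fn["name"]
                # Rule 2: every delta may append an arguments fragment.
                slot["arguments"] += fn.get("arguments") or ""
            if choice.get("finish_reason") == "tool_calls":
                finished = True

    if not finished:
        return []  # assumption: an unterminated stream yields no calls

    # Rule 3: parse each buffer as JSON only after the set is closed.
    result = []
    for index in sorted(calls):
        slot = calls[index]
        result.append({
            "id": slot["id"],
            "type": slot["type"],
            "name": slot["name"],
            "arguments": json.loads(slot["arguments"] or "{}"),
        })
    return result
```

Keying the buffers by `index` keeps fragments for several concurrent calls apart, and deferring `json.loads` until the close matters because intermediate buffers such as `{"` are not valid JSON on their own.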