docs/PHASE3: analyze + baseline — \C-n mechanics, LLM latency, module pre-state

Analyze findings folded into the manifest:

A1. The \C-n binding can't toggle mid-prompt without rl_insert_text /
    rl_redisplay. Solution: bind those (one cdef + 2 wrappers in
    ffi/readline.lua) so \C-n inserts ":norris " at the cursor; the user
    types the goal and presses Enter. This routes through the existing
    meta dispatch.
A2. The broker has no max_tokens passthrough. Add opts.max_tokens for
    the LLM second-opinion path (generation terminates at ~2 tokens;
    verified that the proxy honors it).
A3. The Phase 2 tool-sub-loop pattern IS the planner shape.
    safety.norris_step is the per-iteration extraction; the driver loop
    lives in repl.lua.

Module-changes table (§3) updated with the rl_insert_text and
max_tokens rows.

Baseline doc (PHASE3-baseline.md, 90 lines) captures:
- LLM second-opinion latency: 425-1162 ms per probe, all 5 test cases
  correct. Worst-case 16-step Norris = ~20 s overhead; with the
  static-pattern fast path + session cache, ~5 s realistic.
- Module pre-state at commit f26cbd9 (Phase 2 tip): LOC + state per
  file before Phase 3 edits.
- Six static-pattern Lua-match sanity checks (all correct).
- Carries: aish#15 (still open), aish#14, aish#32/#33.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:37:58 +00:00
parent b58a842e49
commit cf4d79dd9d
2 changed files with 131 additions and 3 deletions
@@ -0,0 +1,90 @@
+# Phase 3 Baseline — pre-implementation measurements
+
+**Date:** 2026-05-12
+**Target probed:** `hossenfelder.fritz.box:8082` (OpenAI-compatible broker serving the local model `qwen2.5-coder-1.5b-q4_k_m.gguf`).
+
+This is the Phase 7 (verify) anchor for Phase 3. It captures the state of the
+system just before the Norris / destructive-command heuristic implementation lands.
+
+---
+
+## 1. LLM second-opinion latency (Q23 budget check)
+
+`fast` preset, `temperature=0`, `max_tokens=4`, system prompt "Reply YES or NO only":
+
+| Command | Reply | Latency |
+|---|---|---|
+| `rm -rf /tmp/foo` | YES | 1162 ms |
+| `ls /tmp` | NO | 666 ms |
+| `truncate -s 0 important.log` | YES | 475 ms |
+| `git push --force origin main` | YES | 451 ms |
+| `cat /etc/hostname` | NO | 425 ms |
+
+All five answers were correct. Median latency is 475 ms. With only five samples, the 95th percentile is effectively the maximum (1162 ms, call it ~1200 ms). The first request was the slowest (likely a cold cache); subsequent ones settled below 700 ms.
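The summary figures can be recomputed from the table. The following is a minimal Lua sketch. It assumes the nearest-rank definition of a percentile, because the baseline does not say which definition it used. `median` and `percentile` are illustrative helpers, not aish functions.

```lua
-- Probe latencies from the table above, in milliseconds.
local latencies_ms = { 1162, 666, 475, 451, 425 }

-- Return a sorted copy so the caller's table keeps its original order.
local function sorted_copy(xs)
  local s = {}
  for i, v in ipairs(xs) do s[i] = v end
  table.sort(s)
  return s
end

-- Median: middle element for odd n, mean of the two middle ones for even n.
local function median(xs)
  local s = sorted_copy(xs)
  local n = #s
  local mid = math.floor(n / 2)
  if n % 2 == 1 then
    return s[mid + 1]
  end
  return (s[mid] + s[mid + 1]) / 2
end

-- Nearest-rank percentile: the smallest sample such that at least p% of
-- the samples are at or below it. With n = 5 and p = 95 the rank is
-- ceil(4.75) = 5, i.e. the maximum.
local function percentile(xs, p)
  local s = sorted_copy(xs)
  local rank = math.max(1, math.ceil(p / 100 * #s))
  return s[rank]
end

print("median ms:", median(latencies_ms))
print("p95 ms:", percentile(latencies_ms, 95))
```

With five samples, any percentile above 80 lands on the largest value. That is why the p95 figure should be read as "the worst probe" rather than as a stable tail estimate.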
+
+### Budget implication for a 16-step Norris session
+
+Worst case (no static-pattern hits, every query goes to the LLM, no cache):
+16 × 1200 ms ≈ 19 s of additional latency over the Norris run.
+
+With a realistic mix (static patterns catch the obvious cases without the
+LLM, and repeated commands hit the session cache):
+~5 s typical (estimated), dominated by genuinely novel commands.
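Both estimates follow from a simple per-step cost model. The sketch below is hypothetical. It assumes that each Norris step issues one second-opinion query unless a static pattern or the cache answers it, and that the two hit rates are independent. The baseline does not measure the hit rates behind the ~5 s figure, so the rates used here are placeholders.

```lua
-- Hypothetical cost model: a step costs one LLM round trip unless a static
-- pattern or the session cache answers it first.
-- static_hit and cache_hit are fractions in [0, 1].
local function norris_overhead_ms(steps, per_call_ms, static_hit, cache_hit)
  local llm_calls = steps * (1 - static_hit) * (1 - cache_hit)
  return llm_calls * per_call_ms
end

-- Worst case from the text: 16 steps, every query to the LLM at ~p95 latency.
local worst = norris_overhead_ms(16, 1200, 0, 0)
print(string.format("worst case: %.1f s", worst / 1000))
```

With the median latency (475 ms) and no hits at all, the same model gives 16 × 475 ms = 7.6 s. Under this model, the ~5 s figure therefore implies that the fast path and the cache together absorb roughly a third of the queries.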
+
+Conclusion: the LLM second opinion is workable as a default-on feature.
+The session-scoped cache (§12 commit #2) is the right mitigation, and an
+additional async pre-check that runs the static patterns first means most
+calls never reach the LLM.
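The ordering described above (static patterns, then the session cache, then the LLM) can be sketched as a small closure. This is illustrative only:

- `make_checker` and the `ask_llm` callback are hypothetical names.
- The check is synchronous rather than async, for brevity.
- The fake `ask_llm` stands in for a broker round trip.
- The patterns are the three from section 3.

```lua
-- Build a checker that consults, in order: static patterns, a
-- session-scoped cache, and finally the (expensive) LLM callback.
-- Returns the verdict and the name of the layer that produced it.
local function make_checker(patterns, ask_llm)
  local cache = {}  -- command string -> boolean, lives as long as the checker
  return function(cmd)
    for _, pat in ipairs(patterns) do
      if string.match(cmd, pat) then
        return true, "static"  -- obvious cases never reach the LLM
      end
    end
    local cached = cache[cmd]
    if cached ~= nil then
      return cached, "cache"
    end
    local verdict = ask_llm(cmd)
    cache[cmd] = verdict  -- cache both YES and NO answers
    return verdict, "llm"
  end
end

-- Demo with a fake LLM that counts how often it is consulted.
local llm_calls = 0
local check = make_checker(
  { "rm%s+.-%-rf?", "git%s+push%s+.-%-%-force", "find%s+.-%-delete" },
  function(cmd)
    llm_calls = llm_calls + 1
    return cmd:match("^truncate") ~= nil
  end
)

print(check("rm -rf /tmp/foo"))              -- answered by a static pattern
print(check("truncate -s 0 important.log"))  -- first time: asks the LLM
print(check("truncate -s 0 important.log"))  -- repeat: served from the cache
print("LLM calls:", llm_calls)
```

Static patterns here only short-circuit positive (destructive) verdicts. A command that matches no pattern still goes through the cache and then the LLM, so safe-looking commands are never waved through on pattern evidence alone.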
+
+---
+
+## 2. Module pre-state (Phase 2 head `f26cbd9` + cosmetic fix `3fa6279`)
+
+| Module | LOC | State |
+|---|---|---|
+| `safety.lua` | 55 | `confirm_tool_call` only; `is_destructive` and `norris_step` are placeholders that raise `error()` |
+| `renderer.lua` | 110 | exec frame + tool-call frame + assistant streaming + status; no Norris frames |
+| `repl.lua` | (post-Phase 2) | tool-sub-loop + `:mcp` meta + `\C-n` no-op placeholder |
+| `context.lua` | (post-Phase 2) | static `system_prompt` (Phase 0 + Phase 2 MCP block); no Norris suffix wiring |
+| `broker.lua` | 96 | `chat_stream(cfg, msgs, on_delta, opts)` with `opts.tools`; no `opts.max_tokens` |
+| `ffi/readline.lua` | (Phase 1) | `rl_bind_keyseq` + `M.bind` wrapper; no `rl_insert_text` or `rl_redisplay` |
+| `config.lua` | (Phase 2) | mcp example block; no safety example block |
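The missing `opts.max_tokens` passthrough (A2 in the commit message) amounts to copying one more optional field into the request body. The following is a minimal sketch. It assumes the broker posts an OpenAI-style chat-completions body. `build_body` is a hypothetical helper, not the current `broker.lua` code. The model name is a placeholder. Passing the bare command as the user message is also an assumption, because the probe's exact user-message wording is not recorded.

```lua
-- Hypothetical: assemble an OpenAI-style chat-completions request body,
-- forwarding optional fields only when the caller supplies them.
-- stream = true is assumed because the broker entry point is chat_stream.
local function build_body(model, messages, opts)
  opts = opts or {}
  local body = { model = model, messages = messages, stream = true }
  if opts.tools ~= nil then body.tools = opts.tools end
  if opts.temperature ~= nil then body.temperature = opts.temperature end
  if opts.max_tokens ~= nil then body.max_tokens = opts.max_tokens end  -- the Phase 3 addition
  return body
end

-- The second-opinion probe from section 1: temperature 0, max_tokens 4.
local probe = build_body("placeholder-model", {
  { role = "system", content = "Reply YES or NO only" },
  { role = "user",   content = "rm -rf /tmp/foo" },  -- assumed message shape
}, { temperature = 0, max_tokens = 4 })

print(probe.max_tokens, probe.temperature, probe.tools)
```

The checks use `~= nil` rather than plain truthiness. This is not strictly required, since `0` is truthy in Lua, but it makes explicit that a zero `temperature` is a real value to forward.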
+
+After Phase 3 lands, `git diff main..post-phase-3 --stat` should show:
+- `safety.lua`: substantial growth (~150 LOC for `is_destructive` + `norris_step`)
+- `renderer.lua`: modest growth (~30 LOC for Norris frames)
+- `repl.lua`: modest growth (Norris driver + `:norris` meta)
+- `context.lua`: one-line addition (system prompt suffix builder)
+- `broker.lua`: 4-line addition (`opts.max_tokens`)
+- `ffi/readline.lua`: 6-line addition (`rl_insert_text` + `rl_redisplay`)
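The `ffi/readline.lua` addition backs the `\C-n` mechanic from the commit message (A1): insert `:norris ` at the cursor, then redraw. The sketch below assumes GNU Readline's `rl_insert_text` and `rl_redisplay` signatures. `make_norris_handler` and the injected `rl` table are hypothetical, chosen so the logic runs without libreadline. How aish's `M.bind` attaches a handler to a key sequence is not shown in this baseline.

```lua
-- Handler factory: `rl` is any table exposing rl_insert_text and
-- rl_redisplay (the FFI library in real use, a fake in the demo).
-- Readline command functions receive (count, key) and return an int.
local function make_norris_handler(rl)
  return function(count, key)
    rl.rl_insert_text(":norris ")  -- user then types the goal and presses Enter
    rl.rl_redisplay()              -- repaint so the inserted text is visible
    return 0
  end
end

-- Under LuaJIT only: declare the two symbols and try to load libreadline.
-- Plain Lua has no "ffi" module, so this block is skipped there.
-- In real use the handler would be built as make_norris_handler(readline_lib).
local readline_lib
local ok_ffi, ffi = pcall(require, "ffi")
if ok_ffi then
  pcall(ffi.cdef, [[
    int  rl_insert_text(const char *text);
    void rl_redisplay(void);
  ]])
  local ok_lib, lib = pcall(ffi.load, "readline")
  if ok_lib then readline_lib = lib end
end

-- Demo with a fake readline that records calls.
local log = {}
local fake_rl = {
  rl_insert_text = function(text) log[#log + 1] = text; return 0 end,
  rl_redisplay   = function() log[#log + 1] = "<redisplay>" end,
}
local on_ctrl_n = make_norris_handler(fake_rl)
on_ctrl_n(1, 14)  -- 14 is the key code for Ctrl-N
print(table.concat(log, " | "))
```

Inserting text rather than switching modes keeps the `\C-n` path on the existing `:norris` meta dispatch, so there is only one code path to test.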
+
+---
+
+## 3. Static-pattern hit-rate sanity check
+
+Three patterns from §5 of the manifest, each exercised against one destructive and one safe command (six checks in total):
+
+| Pattern | Test command | Expected | Result |
+|---|---|---|---|
+| `rm%s+.-%-rf?` | `rm -rf /tmp/x` | YES | HIT (pre-implementation Lua check) |
+| `rm%s+.-%-rf?` | `rm /tmp/x.log` | NO  | MISS (correct: no `-r`/`-rf` flag) |
+| `git%s+push%s+.-%-%-force` | `git push --force origin main` | YES | HIT |
+| `git%s+push%s+.-%-%-force` | `git push origin main` | NO  | MISS |
+| `find%s+.-%-delete` | `find . -name '*.log' -delete` | YES | HIT |
+| `find%s+.-%-delete` | `find . -name '*.log'` | NO  | MISS |
+
+All six results match the intent, checked with Lua's `string.match` on each
+test string. The implementation in `safety.is_destructive` will use the same
+pattern syntax.
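The six checks can be replayed with a few lines of Lua. The patterns and commands are copied verbatim from the table. The positional `cases` layout is illustrative only.

```lua
-- Replay of the section 3 sanity checks. Each case is
-- { pattern, command, expected }, where expected = true means YES.
local cases = {
  { "rm%s+.-%-rf?",             "rm -rf /tmp/x",                true  },
  { "rm%s+.-%-rf?",             "rm /tmp/x.log",                false },
  { "git%s+push%s+.-%-%-force", "git push --force origin main", true  },
  { "git%s+push%s+.-%-%-force", "git push origin main",         false },
  { "find%s+.-%-delete",        "find . -name '*.log' -delete", true  },
  { "find%s+.-%-delete",        "find . -name '*.log'",         false },
}

local passed = 0
for _, c in ipairs(cases) do
  local pat, cmd, expect = c[1], c[2], c[3]
  -- string.match returns the matched substring, or nil when there is no match
  local hit = string.match(cmd, pat) ~= nil
  print(string.format("%-26s %-30s %s", pat, cmd, hit and "HIT" or "MISS"))
  if hit == expect then passed = passed + 1 end
end
print(string.format("%d/%d as expected", passed, #cases))
```

The same patterns also behave in ways the table does not exercise:

- `rm%s+.-%-rf?` needs a dash immediately followed by `r`, so `rm -fr x` and `rm -f x` do not match, while `rm my-report.txt` does.
- `git%s+push%s+.-%-%-force` also matches `--force-with-lease`.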
+
+---
+
+## 4. Known carries from earlier phases
+
+- **Issue [#15](https://git.reauktion.de/marfrit/aish/issues/15)**: hossenfelder SSE buffering bug. Open. Affects Norris streaming visibility (the model's plan/explanation arrives in a single batch instead of streaming). Workaround: none on the aish side; the fix is upstream.
+- **Issue [#14](https://git.reauktion.de/marfrit/aish/issues/14)**: `:model` swap should re-render `Context.system_prompt`. Phase 3 makes this MORE relevant because the Norris suffix is composed dynamically: if the user runs `:model deep` and then `:norris <goal>`, the new system prompt must take effect on the next broker call.
+- **Issues [#32](https://git.reauktion.de/marfrit/aish/issues/32) / [#33](https://git.reauktion.de/marfrit/aish/issues/33)**: Phase 2 follow-ups (tool-name validation, auto_approve typo warning). Not blocking Phase 3.
+
+---
+
+*End of Phase 3 Baseline — aish*