aish/docs/PHASE3-baseline.md
marfrit cf4d79dd9d docs/PHASE3: analyze + baseline — \C-n mechanics, LLM latency, module pre-state
Analyze findings folded into the manifest:

  A1. \C-n binding can't toggle mid-prompt without rl_insert_text /
      rl_redisplay. Solution: bind those (one cdef + 2 wrappers in
      ffi/readline.lua) so \C-n inserts ":norris " at the cursor; user
      types goal + Enter. Routes through existing meta dispatch.

  A2. broker has no max_tokens passthrough. Add opts.max_tokens for
      the LLM second-opinion path (terminates at ~2 tokens; verified
      proxy honors it).

  A3. Phase 2 tool-sub-loop pattern IS the planner shape. safety.norris_step
      is the per-iteration extraction; driver loop in repl.lua.

Module-changes table (§3) updated with the rl_insert_text and
max_tokens rows.

Baseline doc (PHASE3-baseline.md, 80 lines) captures:
  - LLM second-opinion latency: 425-1162ms per probe, all 5 test
    cases correct. Worst-case 16-step Norris = ~20s overhead; with
    static-pattern fast-path + session cache, ~5s realistic.
  - Module pre-state at commit f26cbd9 (Phase 2 tip): LOC + state
    per file before Phase 3 edits.
  - Six static-pattern Lua-match sanity checks (all correct).
  - Carries: aish#15 (still open), aish#14, aish#32/#33.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:37:58 +00:00

# Phase 3 Baseline — pre-implementation measurements
**Date:** 2026-05-12

**Target probed:** `hossenfelder.fritz.box:8082` (OpenAI-compat broker → `qwen2.5-coder-1.5b-q4_k_m.gguf` local).

This is the Phase 7 (verify) anchor for Phase 3. It captures the state of the world just before the Norris/destructive-heuristic implementation lands.
---
## 1. LLM second-opinion latency (Q23 budget check)
Probes used the `fast` preset with `temperature=0`, `max_tokens=4`, and the system prompt "Reply YES or NO only". A YES reply means the model judged the command destructive:
| Command | Reply | Latency |
|---|---|---|
| `rm -rf /tmp/foo` | YES | 1162 ms |
| `ls /tmp` | NO | 666 ms |
| `truncate -s 0 important.log` | YES | 475 ms |
| `git push --force origin main` | YES | 451 ms |
| `cat /etc/hostname` | NO | 425 ms |
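
All five replies were correct. Median latency was 475 ms; the slowest probe took 1162 ms, which is rounded up to ~1200 ms as a stand-in for the 95th percentile of this small sample. The first request was the slowest (likely a cold cache); subsequent ones settled below 700 ms.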
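
The summary figures can be reproduced from the table with a few lines of Lua. This is only a sketch: the helper names `median` and `percentile` are illustrative, and the nearest-rank percentile method is an assumption, since the text does not say how the 95th percentile was estimated.

```lua
-- Latency samples (ms) copied from the table above, in probe order.
local samples = { 1162, 666, 475, 451, 425 }

-- Return a sorted copy so the caller's table is left untouched.
local function sorted_copy(t)
  local out = {}
  for i, v in ipairs(t) do out[i] = v end
  table.sort(out)
  return out
end

-- Median: the middle element, or the mean of the two middle elements.
local function median(t)
  local s = sorted_copy(t)
  local n = #s
  local mid = math.floor(n / 2)
  if n % 2 == 1 then return s[mid + 1] end
  return (s[mid] + s[mid + 1]) / 2
end

-- Nearest-rank percentile (assumed method): the value at rank ceil(p/100 * n).
local function percentile(t, p)
  local s = sorted_copy(t)
  local rank = math.max(1, math.ceil(p / 100 * #s))
  return s[rank]
end

print("median ms:", median(samples))
print("p95 ms:", percentile(samples, 95))
```

With only five samples the nearest-rank 95th percentile is simply the maximum, which is why the text rounds 1162 ms up to ~1200 ms.
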
### Budget implication for a 16-step Norris session
Worst case (no static-pattern hits, every query goes to the LLM, no cache): 16 × 1200 ms ≈ 19 s of additional latency over the Norris run.

With a realistic mix (static patterns catch the obvious cases without the LLM, and repeated commands hit the session cache): ~5 s typical, dominated by genuinely novel command tokens. This figure is an estimate, not a measurement.

Conclusion: the LLM second opinion is workable as a default-on feature. The session-scoped cache (§12 commit #2) is the right mitigation, and an additional async pre-check against the static patterns, run first, means most calls never reach the LLM.

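
The budget reasoning above can be made concrete with a small sketch of the check order: static patterns first, then the session cache, then the LLM. Everything here is an assumption for illustration: `make_checker`, `stub_llm`, and the sample session are hypothetical, the check is synchronous for simplicity, and the real `safety.is_destructive` may be structured differently. The patterns are the ones listed in §3.

```lua
-- Static patterns taken from section 3 of this document.
local STATIC_PATTERNS = {
  "rm%s+.-%-rf?",
  "git%s+push%s+.-%-%-force",
  "find%s+.-%-delete",
}

-- Hypothetical factory. ask_llm(cmd) -> boolean stands in for the real
-- broker round trip (roughly 1200 ms at the measured worst case).
local function make_checker(ask_llm)
  local cache = {}     -- session-scoped verdicts keyed by command string
  local llm_calls = 0

  local function is_destructive(cmd)
    -- 1. Static patterns: no latency, catches the obvious cases.
    for _, pattern in ipairs(STATIC_PATTERNS) do
      if cmd:find(pattern) then return true, "static" end
    end
    -- 2. Session cache: repeated commands cost nothing.
    if cache[cmd] ~= nil then return cache[cmd], "cache" end
    -- 3. LLM second opinion: the only step that adds latency.
    llm_calls = llm_calls + 1
    local verdict = ask_llm(cmd)
    cache[cmd] = verdict
    return verdict, "llm"
  end

  return is_destructive, function() return llm_calls end
end

-- Stub LLM for the demo only: flags commands that start with `truncate`.
local function stub_llm(cmd) return cmd:find("^truncate") ~= nil end

local check, count_llm_calls = make_checker(stub_llm)
local session = {
  "rm -rf /tmp/foo", "ls /tmp", "ls /tmp",
  "truncate -s 0 important.log", "cat /etc/hostname", "ls /tmp",
}
for _, cmd in ipairs(session) do
  local verdict, source = check(cmd)
  print(string.format("%-6s %-5s %s", source, tostring(verdict), cmd))
end

local PER_CALL_MS = 1200  -- rounded worst-case latency from section 1
print("LLM calls:", count_llm_calls())
print("estimated overhead ms:", count_llm_calls() * PER_CALL_MS)
print("worst case for 16 steps ms:", 16 * PER_CALL_MS)
```
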
---
## 2. Module pre-state (Phase 2 head `f26cbd9` + cosmetic fix `3fa6279`)
| Module | LOC | State |
|---|---|---|
| `safety.lua` | 55 | confirm_tool_call only; `is_destructive` and `norris_step` raise error() |
| `renderer.lua` | 110 | exec frame + tool-call frame + assistant streaming + status; no norris frames |
| `repl.lua` | (post-Phase 2) | tool-sub-loop + :mcp meta + `\C-n` no-op placeholder |
| `context.lua` | (post-Phase 2) | static system_prompt (Phase 0+Phase 2 MCP block); no norris suffix wiring |
| `broker.lua` | 96 | chat_stream(cfg, msgs, on_delta, opts) with opts.tools; no opts.max_tokens |
| `ffi/readline.lua` | (Phase 1) | rl_bind_keyseq + M.bind wrapper; no rl_insert_text or rl_redisplay |
| `config.lua` | (Phase 2) | mcp example block; no safety example block |
After Phase 3 lands, `git diff main..post-phase-3 --stat` should show:
- substantial `safety.lua` growth (~150 LOC for is_destructive + norris_step)
- modest `renderer.lua` growth (~30 LOC for norris frames)
- modest `repl.lua` growth (Norris driver + :norris meta)
- one-line `context.lua` addition (system prompt suffix builder)
- 4-line `broker.lua` addition (opts.max_tokens)
- 6-line `ffi/readline.lua` addition (rl_insert_text + rl_redisplay)
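
As an illustration of how small the `broker.lua` change could be, here is a sketch of an `opts.max_tokens` passthrough. The helper `build_request` and the `cfg.model` field are hypothetical and not taken from the actual module; only `opts.tools`, `opts.max_tokens`, and the OpenAI-compatible `max_tokens` request field come from this document and the public API.

```lua
-- Hypothetical helper: assemble the request table for an OpenAI-compatible
-- chat completions call. The real chat_stream() may build its body differently.
local function build_request(cfg, msgs, opts)
  opts = opts or {}
  local body = {
    model = cfg.model,   -- assumed config field
    messages = msgs,
    stream = true,
  }
  -- Existing passthrough (Phase 2): tool definitions.
  if opts.tools then body.tools = opts.tools end
  -- New passthrough (Phase 3): cap the reply length for YES/NO probes.
  if opts.max_tokens then body.max_tokens = opts.max_tokens end
  return body
end

local probe = build_request(
  { model = "fast" },
  { { role = "system", content = "Reply YES or NO only" },
    { role = "user", content = "rm -rf /tmp/foo" } },
  { max_tokens = 4 }
)
print("max_tokens:", probe.max_tokens)
```
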
---
## 3. Static-pattern hit-rate sanity check
Three patterns from §5 of the manifest, each exercised against one destructive and one safe command (six checks in total):
| Pattern | Test command | Expected | Result |
|---|---|---|---|
| `rm%s+.-%-rf?` | `rm -rf /tmp/x` | YES | HIT (pre-implementation Lua check) |
| `rm%s+.-%-rf?` | `rm /tmp/x.log` | NO | MISS (correct — no -r/-f flags) |
| `git%s+push%s+.-%-%-force` | `git push --force origin main` | YES | HIT |
| `git%s+push%s+.-%-%-force` | `git push origin main` | NO | MISS |
| `find%s+.-%-delete` | `find . -name '*.log' -delete` | YES | HIT |
| `find%s+.-%-delete` | `find . -name '*.log'` | NO | MISS |

All six checks match the intent. Each result was verified by running Lua's `string.match` on the test string. The implementation in `safety.is_destructive` will use the same pattern syntax.

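
The six checks in the table can be rerun with a short script such as the one below. The table layout and variable names are illustrative; the patterns and test commands are copied verbatim from the table. Note that, as written, the first pattern requires `r` directly after the dash, so a command such as `rm -f x` or `rm -fr x` would not hit.

```lua
-- Each row: { pattern, test command, expect_hit }, copied from the table above.
local cases = {
  { "rm%s+.-%-rf?",             "rm -rf /tmp/x",                true  },
  { "rm%s+.-%-rf?",             "rm /tmp/x.log",                false },
  { "git%s+push%s+.-%-%-force", "git push --force origin main", true  },
  { "git%s+push%s+.-%-%-force", "git push origin main",         false },
  { "find%s+.-%-delete",        "find . -name '*.log' -delete", true  },
  { "find%s+.-%-delete",        "find . -name '*.log'",         false },
}

local failures = 0
for _, case in ipairs(cases) do
  local pattern, command, expect_hit = case[1], case[2], case[3]
  -- string.match returns nil when the pattern is not found anywhere.
  local hit = string.match(command, pattern) ~= nil
  if hit ~= expect_hit then failures = failures + 1 end
  print(string.format("%-4s %-28s %s", hit and "HIT" or "MISS", pattern, command))
end

print(failures == 0 and "all checks pass" or (failures .. " mismatches"))
```
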
---
## 4. Known carries from earlier phases
- **Issue [#15](https://git.reauktion.de/marfrit/aish/issues/15)** — hossenfelder SSE buffering bug. Open. Affects Norris streaming visibility (the model's plan/explanation streams in one batch). Workaround: nothing aish-side; fix is upstream.
- **Issue [#14](https://git.reauktion.de/marfrit/aish/issues/14)** — `:model` swap should re-render Context.system_prompt. Phase 3 makes this MORE relevant since the Norris suffix is dynamically composed; if the user `:model deep` then `:norris <goal>`, the new system prompt must take effect on the next broker call.
- **Issues [#32](https://git.reauktion.de/marfrit/aish/issues/32) / [#33](https://git.reauktion.de/marfrit/aish/issues/33)** — Phase 2 follow-ups (tool-name validation, auto_approve typo warning). Not blocking Phase 3.
---
*End of Phase 3 Baseline — aish*