Analyze findings folded into the manifest:
A1. \C-n binding can't toggle mid-prompt without rl_insert_text /
rl_redisplay. Solution: bind those (one cdef + 2 wrappers in
ffi/readline.lua) so \C-n inserts ":norris " at the cursor; user
types goal + Enter. Routes through existing meta dispatch.
A2. broker has no max_tokens passthrough. Add opts.max_tokens for
the LLM second-opinion path (terminates at ~2 tokens; verified
proxy honors it).
A3. Phase 2 tool-sub-loop pattern IS the planner shape. safety.norris_step
is the per-iteration extraction; driver loop in repl.lua.
Module-changes table (§3) updated with the rl_insert_text and
max_tokens rows.
Baseline doc (PHASE3-baseline.md, 80 lines) captures:
- LLM second-opinion latency: 425-1162ms per probe, all 5 test
cases correct. Worst-case 16-step Norris = ~20s overhead; with
static-pattern fast-path + session cache, ~5s realistic.
- Module pre-state at commit f26cbd9 (Phase 2 tip): LOC + state
per file before Phase 3 edits.
- Six static-pattern Lua-match sanity checks (all correct).
- Carries: aish#15 (still open), aish#14, aish#32/#33.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 Baseline — pre-implementation measurements
Date: 2026-05-12
Target probed: hossenfelder.fritz.box:8082 (OpenAI-compat broker → qwen2.5-coder-1.5b-q4_k_m.gguf local).
This is the Phase 7 (verify) anchor for Phase 3. Captures the world just before Norris/destructive-heuristic implementation lands.
1. LLM second-opinion latency (Q23 budget check)
fast preset, temperature=0, max_tokens=4, system prompt "Reply YES or NO only":
| Command | Reply | Latency |
|---|---|---|
| `rm -rf /tmp/foo` | YES | 1162 ms |
| `ls /tmp` | NO | 666 ms |
| `truncate -s 0 important.log` | YES | 475 ms |
| `git push --force origin main` | YES | 451 ms |
| `cat /etc/hostname` | NO | 425 ms |
Five-for-five correct answers; median 475 ms. With only five samples the 95th percentile is effectively the maximum, ~1200 ms. The first request was the slowest (likely cold-cache); subsequent ones settled below 700 ms.
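The summary figures can be rechecked from the five measurements with a few lines of Lua. This is only a sketch for the arithmetic: the `median` and `slowest` helpers are illustrative names, not part of aish.

```lua
-- Latencies in ms, copied from the table above in request order.
local latencies = { 1162, 666, 475, 451, 425 }

-- Median of a list of numbers; works on a sorted copy so the input is untouched.
local function median(values)
  local sorted = {}
  for i, v in ipairs(values) do sorted[i] = v end
  table.sort(sorted)
  local n = #sorted
  local mid = math.floor(n / 2)
  if n % 2 == 1 then
    return sorted[mid + 1]
  end
  return (sorted[mid] + sorted[mid + 1]) / 2
end

-- Largest value in the list (the worst observed latency).
local function slowest(values)
  local max = values[1]
  for _, v in ipairs(values) do
    if v > max then max = v end
  end
  return max
end

print(median(latencies), slowest(latencies))
```

The median lands on the `truncate` probe, and the maximum is the first, presumably cold, request.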
Budget implication for a 16-step Norris session
Worst case (no static-pattern hits, every query sent to the LLM, no cache): 16 × 1200 ms ≈ 19 s of additional latency over the Norris run.
With realistic mix (static patterns catch the obvious cases without LLM, repeated commands hit the session cache): ~5s typical, dominated by genuinely-novel command tokens.
Conclusion: LLM second-opinion is workable as a default-on feature. The session-scoped cache (§12 commit #2) is the right mitigation; an additional async pre-check on the static patterns first means most calls never reach the LLM.
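The check order described above (static patterns, then session cache, then LLM) might be sketched as follows. Everything here is an assumption for illustration: `new_checker`, the `ask_llm` callback, and the returned source tag are hypothetical names, not the actual safety.lua API, and the sketch is synchronous rather than async.

```lua
-- Patterns taken from the static-pattern table in this document.
local STATIC_PATTERNS = {
  "rm%s+.-%-rf?",
  "git%s+push%s+.-%-%-force",
  "find%s+.-%-delete",
}

-- new_checker(ask_llm) returns check(cmd) -> verdict (boolean), source (string).
-- ask_llm(cmd) is an injected callback standing in for the broker round-trip.
local function new_checker(ask_llm)
  local cache = {} -- session-scoped: command string -> boolean verdict
  return function(cmd)
    -- 1. Static patterns: no LLM call needed for the obvious cases.
    for _, pattern in ipairs(STATIC_PATTERNS) do
      if cmd:match(pattern) then return true, "static" end
    end
    -- 2. Session cache: a repeated command reuses the earlier verdict.
    --    Compare against nil so cached "false" verdicts also count as hits.
    if cache[cmd] ~= nil then return cache[cmd], "cache" end
    -- 3. LLM second opinion, stored for the rest of the session.
    local verdict = ask_llm(cmd) and true or false
    cache[cmd] = verdict
    return verdict, "llm"
  end
end

-- Usage with a stub in place of the real broker call.
local llm_calls = 0
local check = new_checker(function(cmd)
  llm_calls = llm_calls + 1
  return cmd:find("truncate", 1, true) ~= nil
end)

print(check("rm -rf /tmp/foo"))              -- decided by a static pattern
print(check("truncate -s 0 important.log"))  -- first sighting, goes to the stub
print(check("truncate -s 0 important.log"))  -- served from the cache
```

Only the second call reaches the stub, which is the property the budget estimate relies on.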
2. Module pre-state (Phase 2 head f26cbd9 + cosmetic fix 3fa6279)
| Module | LOC | State |
|---|---|---|
| `safety.lua` | 55 | confirm_tool_call only; is_destructive and norris_step raise error() |
| `renderer.lua` | 110 | exec frame + tool-call frame + assistant streaming + status; no norris frames |
| `repl.lua` | (post-Phase 2) | tool-sub-loop + :mcp meta + \C-n no-op placeholder |
| `context.lua` | (post-Phase 2) | static system_prompt (Phase 0+Phase 2 MCP block); no norris suffix wiring |
| `broker.lua` | 96 | chat_stream(cfg, msgs, on_delta, opts) with opts.tools; no opts.max_tokens |
| `ffi/readline.lua` | (Phase 1) | rl_bind_keyseq + M.bind wrapper; no rl_insert_text or rl_redisplay |
| `config.lua` | (Phase 2) | mcp example block; no safety example block |
After Phase 3 lands, `git diff main..post-phase-3 --stat` should show:
- `safety.lua` substantial growth (~150 LOC for is_destructive + norris_step)
- modest `renderer.lua` growth (~30 LOC for norris frames)
- modest `repl.lua` growth (Norris driver + :norris meta)
- one-line `context.lua` addition (system prompt suffix builder)
- 4-line `broker.lua` addition (opts.max_tokens)
- 6-line `ffi/readline.lua` addition (rl_insert_text + rl_redisplay)
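As an illustration of the broker.lua item, a `max_tokens` passthrough could be as small as the sketch below. The `build_payload` helper, the `cfg.model` field, and the `stream = true` default are assumptions made for this example; the real chat_stream internals are not shown in this document. `max_tokens` itself is a standard field of OpenAI-compatible chat completion requests.

```lua
-- Hypothetical helper: build the request-body table for an OpenAI-compatible
-- chat completion call. Optional fields are only set when the caller passes them,
-- so existing callers that omit opts keep their current request shape.
local function build_payload(cfg, msgs, opts)
  opts = opts or {}
  local payload = {
    model = cfg.model, -- assumed config field
    messages = msgs,
    stream = true,     -- assumed default for a streaming call
  }
  if opts.tools then payload.tools = opts.tools end
  if opts.max_tokens then payload.max_tokens = opts.max_tokens end
  return payload
end

-- Second-opinion probe: a YES/NO reply needs only a handful of tokens.
local probe = build_payload(
  { model = "fast" },
  {
    { role = "system", content = "Reply YES or NO only" },
    { role = "user", content = "rm -rf /tmp/foo" },
  },
  { max_tokens = 4 }
)
print(probe.max_tokens)
```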
3. Static-pattern hit-rate sanity check
Six checks using three patterns from §5 of the manifest, each exercised against one destructive and one safe command:
| Pattern | Test command | Expected | Result |
|---|---|---|---|
| `rm%s+.-%-rf?` | `rm -rf /tmp/x` | YES | HIT (pre-implementation Lua check) |
| `rm%s+.-%-rf?` | `rm /tmp/x.log` | NO | MISS (correct: no -r/-f flags) |
| `git%s+push%s+.-%-%-force` | `git push --force origin main` | YES | HIT |
| `git%s+push%s+.-%-%-force` | `git push origin main` | NO | MISS |
| `find%s+.-%-delete` | `find . -name '*.log' -delete` | YES | HIT |
| `find%s+.-%-delete` | `find . -name '*.log'` | NO | MISS |
All six match the intent. Pattern soundness verified via Lua's string.match
on each test string. Implementation in safety.is_destructive will use the
same syntax.
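The six checks can be reproduced with a short standalone script. This is a sketch of the verification only; the `cases` table and `run_cases` helper are illustrative and not part of safety.lua.

```lua
-- Each case: { pattern, command, expected_hit }, copied from the table above.
local cases = {
  { "rm%s+.-%-rf?",             "rm -rf /tmp/x",                true  },
  { "rm%s+.-%-rf?",             "rm /tmp/x.log",                false },
  { "git%s+push%s+.-%-%-force", "git push --force origin main", true  },
  { "git%s+push%s+.-%-%-force", "git push origin main",         false },
  { "find%s+.-%-delete",        "find . -name '*.log' -delete", true  },
  { "find%s+.-%-delete",        "find . -name '*.log'",         false },
}

-- Run every case through string.match and return the number of mismatches.
local function run_cases(list)
  local failures = 0
  for _, case in ipairs(list) do
    local pattern, cmd, expected = case[1], case[2], case[3]
    local hit = string.match(cmd, pattern) ~= nil
    if hit ~= expected then failures = failures + 1 end
    print(string.format("%-28s %-32s %s", pattern, cmd, hit and "HIT" or "MISS"))
  end
  return failures
end

local failures = run_cases(cases)
print("mismatches: " .. failures)
```

The patterns are unanchored, so `rm%s+.-%-rf?` also matches a `-r` inside a path, for example `rm /tmp/x-ray`. In the design described here such a false positive would trigger an unnecessary confirmation rather than let a destructive command through.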
4. Known carries from earlier phases
- Issue #15: hossenfelder SSE buffering bug. Open. Affects Norris streaming visibility (the model's plan/explanation streams in one batch). Workaround: nothing aish-side; fix is upstream.
- Issue #14: `:model` swap should re-render Context.system_prompt. Phase 3 makes this MORE relevant since the Norris suffix is dynamically composed; if the user runs `:model deep` then `:norris <goal>`, the new system prompt must take effect on the next broker call.
- Issues #32 / #33: Phase 2 follow-ups (tool-name validation, auto_approve typo warning). Not blocking Phase 3.
End of Phase 3 Baseline — aish