From cf4d79dd9d94ece569c8b4222b0d29d4db5f0576 Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Tue, 12 May 2026 22:37:58 +0000
Subject: [PATCH] =?UTF-8?q?docs/PHASE3:=20analyze=20+=20baseline=20?=
 =?UTF-8?q?=E2=80=94=20\C-n=20mechanics,=20LLM=20latency,=20module=20pre-s?=
 =?UTF-8?q?tate?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Analyze findings folded into the manifest:

A1. \C-n binding can't toggle mid-prompt without rl_insert_text /
    rl_redisplay. Solution: bind those (one cdef + 2 wrappers in
    ffi/readline.lua) so \C-n inserts ":norris " at the cursor; user
    types goal + Enter. Routes through existing meta dispatch.
A2. broker has no max_tokens passthrough. Add opts.max_tokens for
    the LLM second-opinion path (terminates at ~2 tokens; verified
    proxy honors it).
A3. Phase 2 tool-sub-loop pattern IS the planner shape.
    safety.norris_step is the per-iteration extraction; driver loop
    in repl.lua.

Module-changes table (§3) updated with the rl_insert_text and
max_tokens rows.

Baseline doc (PHASE3-baseline.md, 90 lines) captures:
- LLM second-opinion latency: 425-1162ms per probe, all 5 test
  cases correct. Worst-case 16-step Norris = ~20s overhead; with
  static-pattern fast-path + session cache, ~5s realistic.
- Module pre-state at commit f26cbd9 (Phase 2 tip): LOC + state per
  file before Phase 3 edits.
- Six static-pattern Lua-match sanity checks (all correct).
- Carries: aish#15 (still open), aish#14, aish#32/#33.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/PHASE3-baseline.md | 90 +++++++++++++++++++++++++++++++++++++++++
 docs/PHASE3.md          | 44 ++++++++++++++++++--
 2 files changed, 131 insertions(+), 3 deletions(-)
 create mode 100644 docs/PHASE3-baseline.md

diff --git a/docs/PHASE3-baseline.md b/docs/PHASE3-baseline.md
new file mode 100644
index 0000000..af1794f
--- /dev/null
+++ b/docs/PHASE3-baseline.md
@@ -0,0 +1,90 @@
+# Phase 3 Baseline — pre-implementation measurements
+
+**Date:** 2026-05-12
+**Target probed:** `hossenfelder.fritz.box:8082` (OpenAI-compat broker → `qwen2.5-coder-1.5b-q4_k_m.gguf` local).
+
+This is the Phase 7 (verify) anchor for Phase 3. Captures the world just
+before Norris/destructive-heuristic implementation lands.
+
+---
+
+## 1. LLM second-opinion latency (Q23 budget check)
+
+`fast` preset, `temperature=0`, `max_tokens=4`, system prompt "Reply YES or NO only":
+
+| Command | Reply | Latency |
+|---|---|---|
+| `rm -rf /tmp/foo` | YES | 1162 ms |
+| `ls /tmp` | NO | 666 ms |
+| `truncate -s 0 important.log` | YES | 475 ms |
+| `git push --force origin main` | YES | 451 ms |
+| `cat /etc/hostname` | NO | 425 ms |
+
+Five-for-five correct answers; median ~475 ms; 95th percentile (small sample) ~1200 ms. The first request was slowest (likely cold-cache), subsequent ones settled below 700 ms.
+
+### Budget implication for a 16-step Norris session
+
+Worst-case (no static-pattern hits, all queries to LLM, no cache):
+16 × 1200 ms = ~19s of additional latency over the Norris run.
+
+With realistic mix (static patterns catch the obvious cases without
+LLM, repeated commands hit the session cache):
+~5s typical, dominated by genuinely-novel command tokens.
+
+Conclusion: LLM second-opinion is workable as a default-on feature.
+The session-scoped cache (§12 commit #2) is the right mitigation; an
+additional async pre-check on the static patterns first means most
+calls never reach the LLM.
+
+---
+
+## 2. Module pre-state (Phase 2 head `f26cbd9` + cosmetic fix `3fa6279`)
+
+| Module | LOC | State |
+|---|---|---|
+| `safety.lua` | 55 | confirm_tool_call only; `is_destructive` and `norris_step` raise error() |
+| `renderer.lua` | 110 | exec frame + tool-call frame + assistant streaming + status; no norris frames |
+| `repl.lua` | (post-Phase 2) | tool-sub-loop + :mcp meta + `\C-n` no-op placeholder |
+| `context.lua` | (post-Phase 2) | static system_prompt (Phase 0+Phase 2 MCP block); no norris suffix wiring |
+| `broker.lua` | 96 | chat_stream(cfg, msgs, on_delta, opts) with opts.tools; no opts.max_tokens |
+| `ffi/readline.lua` | (Phase 1) | rl_bind_keyseq + M.bind wrapper; no rl_insert_text or rl_redisplay |
+| `config.lua` | (Phase 2) | mcp example block; no safety example block |
+
+After Phase 3 lands, `git diff main..post-phase-3 --stat` should show:
+- `safety.lua` substantial growth (~150 LOC for is_destructive + norris_step)
+- modest `renderer.lua` growth (~30 LOC for norris frames)
+- modest `repl.lua` growth (Norris driver + :norris meta)
+- one-line `context.lua` addition (system prompt suffix builder)
+- 4-line `broker.lua` addition (opts.max_tokens)
+- 6-line `ffi/readline.lua` addition (rl_insert_text + rl_redisplay)
+
+---
+
+## 3. Static-pattern hit-rate sanity check
+
+Three patterns from §5 of the manifest, each exercised against one destructive and one safe command (six checks):
+
+| Pattern | Test command | Expected | Result |
+|---|---|---|---|
+| `rm%s+.-%-rf?` | `rm -rf /tmp/x` | YES | HIT (pre-implementation Lua check) |
+| `rm%s+.-%-rf?` | `rm /tmp/x.log` | NO | MISS (correct — no -r/-f flags) |
+| `git%s+push%s+.-%-%-force` | `git push --force origin main` | YES | HIT |
+| `git%s+push%s+.-%-%-force` | `git push origin main` | NO | MISS |
+| `find%s+.-%-delete` | `find . -name '*.log' -delete` | YES | HIT |
+| `find%s+.-%-delete` | `find . -name '*.log'` | NO | MISS |
+
+All six checks match the intent. Pattern soundness verified via Lua's `string.match`
+on each test string. Implementation in `safety.is_destructive` will use the
+same syntax.
+
+---
+
+## 4. Known carries from earlier phases
+
+- **Issue [#15](https://git.reauktion.de/marfrit/aish/issues/15)** — hossenfelder SSE buffering bug. Open. Affects Norris streaming visibility (the model's plan/explanation streams in one batch). Workaround: nothing aish-side; fix is upstream.
+- **Issue [#14](https://git.reauktion.de/marfrit/aish/issues/14)** — `:model` swap should re-render Context.system_prompt. Phase 3 makes this MORE relevant since the Norris suffix is dynamically composed; if the user `:model deep` then `:norris `, the new system prompt must take effect on the next broker call.
+- **Issues [#32](https://git.reauktion.de/marfrit/aish/issues/32) / [#33](https://git.reauktion.de/marfrit/aish/issues/33)** — Phase 2 follow-ups (tool-name validation, auto_approve typo warning). Not blocking Phase 3.
+
+---
+
+*End of Phase 3 Baseline — aish*
diff --git a/docs/PHASE3.md b/docs/PHASE3.md
index ec6ff51..ebc2d46 100644
--- a/docs/PHASE3.md
+++ b/docs/PHASE3.md
@@ -2,9 +2,46 @@
 
 **Project:** aish — AI-augmented conversational shell
 **Document:** Phase 3 Requirements, Architecture & Design Decisions
-**Status:** Formulate (pre-analyze)
+**Status:** Analyze (formulate complete; live-probed against current tree at `b58a842`)
 **Date:** 2026-05-12
 
+**Analyze findings (2026-05-12):**
+
+A1. **`\C-n` mid-readline limitation.** Phase 1's `\C-n` handler fires
+    synchronously from inside the readline keystroke callback (via
+    `rl_bind_keyseq` → ffi-cast Lua closure). The current binding API
+    only exposes `rl_bind_keyseq` — no `rl_insert_text`,
+    `rl_replace_line`, or `rl_redisplay`. So a `\C-n` callback cannot
+    cleanly mutate the in-progress prompt buffer or end the
+    readline call early to "transition into Norris mode".
+    **Resolution**: bind `rl_insert_text` + `rl_redisplay` (single cdef +
+    2 wrapper lines in `ffi/readline.lua`) so the `\C-n` handler
+    inserts `:norris ` at the cursor and refreshes the display. User
+    then types the goal + Enter, routing through the existing meta
+    dispatch normally. `\C-n` becomes a typing shortcut, not a state
+    toggle.
+
+A2. **`broker.chat` lacks `max_tokens`.** The LLM second-opinion path
+    in `safety.is_destructive` needs a tight YES/NO completion (2
+    tokens max). The proxy + small models honor `max_tokens`
+    correctly (verified vs hossenfelder: `max_tokens=4` returned a
+    clean "YES" in 2 completion tokens). Phase 2's broker doesn't
+    surface this option. **Resolution**: add `opts.max_tokens` to
+    `M.chat_stream`'s opts table (Phase 2 already widened opts);
+    `M.chat` passes through. Defaults nil → field omitted from the
+    request body — Phase 1/2 callers unaffected.
+
+A3. **Tool-sub-loop is structurally reusable.** Phase 2's `ask_ai`
+    sub-loop (stream → collect text + tool_calls → dispatch → append → loop
+    until pure-text response or cap) IS the planner shape Phase 3 wants.
+    `safety.norris_step` per §4 is essentially this iteration extracted
+    behind a function call, plus the `GOAL: complete` sentinel check.
+    No structural refactor of Phase 2 needed — Norris is additive.
+
+These findings tighten §3's module-changes table and §12's commit #1
+scope (adds a small `ffi/readline.lua` extension to commit #5) — see
+inline notes below where the change matters.
+
 PHASE0.md is the locked substrate; PHASE1.md and PHASE2.md are layered
 on top. This manifest specifies what Phase 3 adds — **Chuck Norris
 autonomous mode**, the **destructive-op safety heuristic** that gates
@@ -78,9 +115,10 @@ Three pillars per PHASE0.md §11 row 3:
 | `safety.lua` | `confirm_tool_call` (Phase 2 surface only) + Phase 3 stubs `is_destructive` / `norris_step` raising error() | Implement the stubs: (a) `is_destructive(cmd_or_tool_call) -> (bool, reason)` with static pattern matching + optional LLM second-opinion (controlled by `cfg.safety.llm_second_opinion`, default true); (b) `norris_step(ctx, broker_cfg, executor_fn, tools_fn, halt_fn, opts) -> {status, reason}` — single iteration of the Norris loop. Pattern list is module-local; LLM second-opinion uses `broker.chat` (non-streaming, no tools, single-shot). |
 | `repl.lua` | tool-sub-loop + `:mcp` meta + Phase 1 `\C-n` no-op binding | Replace `\C-n` body with a Norris toggle. Add `:norris ` meta cmd as the explicit-launch variant. New module-local `norris_active` flag. Implement the Norris driver loop: while active, call `safety.norris_step`; handle HALT decisions; exit on `GOAL: complete`, `abort`, or step budget exceeded. Auto_approve policy from `confirm_tool_call` is consulted in-line. |
 | `renderer.lua` | exec frame + tool-call frame + assistant streaming | Add `M.norris_begin(goal)`, `M.norris_step(n, action_desc)`, `M.norris_halt(reason, action)`, `M.norris_end(status, reason)`. Visual: bold cyan banner on enter, indented step counter per iteration, red HALT banner on intercept, dim summary on exit. Phase 0 prompt becomes `[aish:fast ⚡]>` when Norris is active per PHASE0.md §9. |
-| `broker.lua` | `chat_stream` with opts.tools, `chat` non-streaming | No structural change. Norris re-uses `chat_stream` for planning rounds (same as interactive). `chat` is used by `safety.is_destructive` for LLM second-opinion. |
+| `broker.lua` | `chat_stream` with opts.tools, `chat` non-streaming | Re-used as-is for planning rounds (Norris just calls chat_stream like interactive). See row below for the small `max_tokens` opts extension needed by the LLM second-opinion path. |
 | `context.lua` | system_prompt + turns + pending_exec_output + use_tool_role | When Norris is active, `to_messages()` appends the Norris suffix (§2 row "Norris prompt suffix") to the system message. The suffix is computed dynamically — when Norris exits, subsequent broker calls revert to plain system prompt. No additional storage. |
-| `ffi/readline.lua` | `bind(seq, fn)` (Phase 1) | No additions — `\C-n` binding mechanism already in place. The Phase 1 placeholder handler is just replaced with a real one in repl.lua. |
+| `ffi/readline.lua` | `bind(seq, fn)` (Phase 1) | **Small extension per A1**: add `rl_insert_text` + `rl_redisplay` to the `ffi.cdef` block and expose `M.insert_text(s)` / `M.redisplay()` wrappers. Needed so the `\C-n` handler can stuff `:norris ` into the in-progress buffer cleanly rather than just printing a status that disappears. |
+| `broker.lua` | `chat_stream(cfg, msgs, on_delta, opts)` with opts.tools | **Small extension per A2**: `opts.max_tokens` (integer) is passed through to the request body as `max_tokens`. Omitted when nil. `M.chat` accepts the same opt. Needed so `safety.is_destructive`'s YES/NO probe terminates in ~2 tokens. |
 | `config.lua` | mcp example block | New optional `safety = { llm_second_opinion = true, llm_model = "fast", destructive_patterns = {...} }` block, also commented-out example. Defaults are sane when absent. |
 
 No new module files beyond what already exists. The `\C-x\C-c` abort keybinding (PHASE1.md §7 reserved) gets wired here.
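
A minimal sketch of how the `safety.lua` row in the table above might combine the static patterns with the LLM second opinion. Only `is_destructive`, the `(bool, reason)` return shape, the three patterns, and `max_tokens=4` come from the documents in this patch. The factory `make_is_destructive`, the `PATTERNS` layout, the reason strings, and the injected `ask_llm(cmd, opts)` callback are hypothetical names chosen for illustration; a real implementation would presumably call `broker.chat` where the stub is called here.

```lua
-- Static patterns taken from the baseline's sanity-check table.
-- The table layout and reason strings are assumptions for this sketch.
local PATTERNS = {
  { pat = "rm%s+.-%-rf?",             reason = "rm with -r or -rf" },
  { pat = "git%s+push%s+.-%-%-force", reason = "git push --force" },
  { pat = "find%s+.-%-delete",        reason = "find with -delete" },
}

-- ask_llm is an injected callback (hypothetical): (cmd, opts) -> reply string.
-- Pass nil to disable the second opinion entirely.
local function make_is_destructive(ask_llm)
  local cache = {} -- session-scoped: cmd -> { verdict, reason }

  return function(cmd)
    local hit = cache[cmd]
    if hit then return hit[1], hit[2] end

    -- Fast path: static patterns, no LLM round-trip.
    local verdict, reason = false, "no static pattern matched"
    for _, p in ipairs(PATTERNS) do
      if cmd:match(p.pat) then
        verdict, reason = true, "static: " .. p.reason
        break
      end
    end

    -- Slow path: only novel, unmatched commands reach the LLM.
    if not verdict and ask_llm then
      local reply = ask_llm(cmd, { max_tokens = 4 })
      if type(reply) == "string" and reply:match("^%s*YES") then
        verdict, reason = true, "llm second opinion"
      end
    end

    cache[cmd] = { verdict, reason }
    return verdict, reason
  end
end

-- Demo with a stub standing in for the broker call.
local function stub_llm(cmd, opts)
  return cmd:find("truncate", 1, true) and "YES" or "NO"
end

local is_destructive = make_is_destructive(stub_llm)
print(is_destructive("rm -rf /tmp/x"))
print(is_destructive("git push origin main"))
print(is_destructive("truncate -s 0 important.log"))
```

Injecting the LLM call keeps the static path testable without a broker, and the per-session cache mirrors the mitigation named in the baseline: a repeated command costs one lookup instead of another 400 to 1200 ms probe.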