docs/PHASE3: formulate — Norris autonomous mode + destructive-op gate

Phase 3 formulate manifest. Three pillars per PHASE0.md §11 row 3:
Chuck Norris autonomous mode (planning loop), destructive-op heuristic
(static patterns + LLM second-opinion), and HALT/confirm protocol.

Resolutions baked in via §2:
  Q2                         iterative re-plan after each action (not top-down tree)
  Action sources             CMD: lines AND MCP tool_calls (Phase 2 contract honored)
  HALT trigger               static-pattern hit OR LLM-second-opinion flag
  HALT shape                 3-way: proceed / skip / abort
  Auto-approve under Norris  honors Phase 2 auto_approve policy EXCEPT
                             destructive-op heuristic always wins
  LLM second-opinion model   the `fast` preset (cheapest)
  Norris prompt suffix       appended to system prompt while active;
                             "GOAL: complete" sentinel for done

Key extensions:
- safety.is_destructive: ~20 static shell-idiom patterns + LLM probe;
  runs on interactive CMD: extraction too (§9: replaces bare confirm_cmd
  for known-destructive cases). Q24 worth challenging at analyze.
- safety.norris_step: single-iteration of the planner. Driver loop in
  repl.lua. \C-n toggle (real binding, replaces Phase 1 placeholder);
  :norris <goal> explicit launch.
- renderer.norris_begin/step/halt/end: visual parity with exec and
  tool_call frames. Prompt becomes [aish:fast ⚡]> per PHASE0.md §9.
- context.to_messages dynamically appends NORRIS MODE suffix when
  norris_active.

New open questions (Q23–Q30) tracked in §11:
  Q23  LLM second-opinion latency budget (caching mitigation)
  Q24  interactive CMD: also subject to is_destructive? (proposal: yes)
  Q25  GOAL: complete + pending actions in same response: dispatch first
  Q26  context preservation on abort/done/budget: all preserve
  Q27  :norris continue (resume after abort): deferred to v2
  Q28  side-effect MCP tools not in *__shell/*__write_file patterns
  Q29  goal-implies-authorization for destructive ops: no, always confirm
  Q30  :norris no-arg vs \C-n share goal-prompt path: yes, trivial

Module-layout (PHASE0 §4) untouched; all changes are growth of existing
files.

6 commits expected at implement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 20:45:03 +00:00
parent f26cbd9a3a
commit b58a842e49
1 changed files with 404 additions and 0 deletions
@@ -0,0 +1,404 @@
+# aish — Phase 3 Manifest
+
+**Project:** aish — AI-augmented conversational shell
+**Document:** Phase 3 Requirements, Architecture & Design Decisions
+**Status:** Formulate (pre-analyze)
+**Date:** 2026-05-12
+
+PHASE0.md is the locked substrate; PHASE1.md and PHASE2.md are layered
+on top. This manifest specifies what Phase 3 adds — **Chuck Norris
+autonomous mode**, the **destructive-op safety heuristic** that gates
+it, and the **HALT/confirm protocol** for human-in-the-loop control.
+Section numbers reference back to earlier phases where relevant.
+
+---
+
+## 1. Scope of Phase 3
+
+Three pillars per PHASE0.md §11 row 3:
+
+1. **Norris autonomous mode** (`safety.norris_step` + `repl.lua`
+   integration) — a planning-and-execution loop where the model
+   pursues a user-stated goal across multiple shell-exec and
+   tool-call turns without per-turn user prompting. Triggered by
+   `\C-n` (Phase 1 reserved key) or `:norris <goal>`. Iterative
+   re-plan after each action.
+
+2. **Destructive-op heuristic** (`safety.is_destructive`) — hybrid
+   gate that combines (a) a static pattern denylist of obviously
+   destructive shell idioms (`rm -rf`, `dd of=`, `mkfs`, `git push
+   --force`, etc.) with (b) an LLM second-opinion via the `fast`
+   model for ambiguous cases. Any positive hit forces HALT before
+   execution, regardless of Norris-mode policy.
+
+3. **HALT/confirm protocol** — a uniform way for the Norris loop to
+   surface decisions to the user. HALT means: stop generation, drop
+   to a `[Norris] proceed / skip / abort?` prompt with the proposed
+   action displayed. User decides on each gate; abort returns control
+   to the interactive REPL with the conversation intact.
+
+**Phase 3 is done when:**
+
+- `\C-n` toggles Norris mode (replacing the Phase 1 status no-op).
+- `:norris <goal>` launches an autonomous task explicitly.
+- The model can plan + execute a multi-step task (e.g. "find all
+  Python files modified in the last week and count them") through
+  iterative CMD:/tool_call cycles without per-step user confirms
+  for safe operations.
+- `rm -rf /tmp/foo`, `dd of=/dev/sda`, and equivalent destructive
+  operations HALT and require explicit user approval.
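+- The LLM second-opinion catches at least one realistic ambiguous
+  case the static patterns miss (e.g. `rsync -a --delete src/ dst/`,
+  `cp /dev/null important.log`). `find . -delete` and
+  `truncate -s 0 important.log` do not qualify as examples here,
+  because both already match the §5 static list.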
+- HALT-abort returns to interactive mode without context loss.
+
+---
+
+## 2. Technology Decisions (delta from Phase 2)
+
+| Decision | Choice | Rationale |
+|---|---|---|
+| Planning model | **Iterative re-plan after each action** | Resolves PHASE0.md §13 Q2. Top-down task trees are brittle to dynamic environments — a shell command's output frequently changes what the next step should be. Iterative re-plan piggybacks the existing Phase 2 tool-sub-loop pattern: model emits next action, gets result, decides next. Depth-bounded by `max_norris_steps` (default 16, configurable). |
+| Action sources | **`CMD:` lines + MCP `tool_calls`** | Per PHASE0.md §11 row 3 ("now able to use MCP tools as well as CMD: lines"). Norris consumes both kinds equally. The Phase 2 system prompt already biases toward tools when available; that bias carries into Norris mode unchanged. |
+| HALT trigger | **Static-pattern hit OR LLM-second-opinion flag** | Either gate fires HALT independently. Static for speed and predictability on known footguns; LLM for novel/ambiguous patterns. Cost of an LLM second-opinion call: one fast-model round-trip (≤3s on local Q4). Only invoked when static doesn't already HALT. |
+| HALT response shape | **3-way prompt**: `proceed` / `skip` / `abort` | `proceed` runs the action and continues. `skip` reports "user skipped" to the model and lets it re-plan. `abort` ends the Norris session, drops back to interactive mode. (`abort` is also bound to `\C-x\C-c` per PHASE1.md §7 reserved keys.) |
+| Auto-approve under Norris | **Trust the Phase 2 `auto_approve` policy** | A tool already in `auto_approve` runs without HALT even in Norris mode, as long as the destructive-op heuristic doesn't flag it. The user opted in once; Norris doesn't unilaterally re-prompt. CMD: lines never auto-approve under Norris — they always pass through `is_destructive` first. |
+| Destructive-op static rules | **Patterned shell-idiom list** in `safety.lua` (hardcoded; configurable later via `config.safety.destructive_patterns`) | Phase 3 v1 ships a fixed list (~20 patterns) inline. v2 may make it user-extendable. Patterns target the command string after expansion; conservative — false positives mean a confirm prompt the user dismisses, false negatives mean unsupervised destructive action. Bias to false positives. |
+| LLM second-opinion model | **The `fast` preset** (whichever model maps to the user's small/cheap local) | Cheapest available; destructive-detection doesn't need a smart model. Prompt: "Is this shell command destructive (could delete or overwrite data)? Answer YES or NO." Single-token-ish response, no streaming. Falls back to YES (safe default) on broker failure. |
+| Norris prompt suffix | **Suffix appended to the system prompt** when Norris is active: `[NORRIS MODE] You are operating autonomously toward a stated goal. Plan and execute step by step. Use CMD: lines or tool_calls. When done, emit "GOAL: complete" on its own line.` | The `GOAL: complete` sentinel is how the model signals task completion; the Norris loop exits the planning sub-loop on seeing it. The full suffix text is given in §8. |
+| Interrupt handling | **`\C-c` during a Norris step sends abort** | Standard SIGINT semantics for the user. Mid-stream, this means: stop the broker request, stop any running shell command, drop to interactive mode. The current context is preserved (incl. partial assistant turn). |
+| Context budgeting under Norris | **Same `max_turns` and `token_budget` as interactive** | Sliding window evicts oldest non-system turns when budget exceeded — including mid-Norris-session if the loop runs long. Phase 4's `memory.jsonl` summarization is the proper fix; Phase 3 just gets the eviction status as before. |
+
+---
+
+## 3. Module Changes
+
+| File | State after Phase 2 | Phase 3 changes |
+|---|---|---|
+| `safety.lua` | `confirm_tool_call` (Phase 2 surface only) + Phase 3 stubs `is_destructive` / `norris_step` raising error() | Implement the stubs: (a) `is_destructive(cmd_or_tool_call) -> (bool, reason)` with static pattern matching + optional LLM second-opinion (controlled by `cfg.safety.llm_second_opinion`, default true); (b) `norris_step(ctx, broker_cfg, executor_fn, tools_fn, halt_fn, opts) -> {status, reason}` — single iteration of the Norris loop. Pattern list is module-local; LLM second-opinion uses `broker.chat` (non-streaming, no tools, single-shot). |
+| `repl.lua` | tool-sub-loop + `:mcp` meta + Phase 1 `\C-n` no-op binding | Replace `\C-n` body with a Norris toggle. Add `:norris <goal>` meta cmd as the explicit-launch variant. New module-local `norris_active` flag. Implement the Norris driver loop: while active, call `safety.norris_step`; handle HALT decisions; exit on `GOAL: complete`, `abort`, or step budget exceeded. Auto_approve policy from `confirm_tool_call` is consulted in-line. |
+| `renderer.lua` | exec frame + tool-call frame + assistant streaming | Add `M.norris_begin(goal)`, `M.norris_step(n, action_desc)`, `M.norris_halt(reason, action)`, `M.norris_end(status, reason)`. Visual: bold cyan banner on enter, indented step counter per iteration, red HALT banner on intercept, dim summary on exit. Phase 0 prompt becomes `[aish:fast ⚡]>` when Norris is active per PHASE0.md §9. |
+| `broker.lua` | `chat_stream` with opts.tools, `chat` non-streaming | No structural change. Norris re-uses `chat_stream` for planning rounds (same as interactive). `chat` is used by `safety.is_destructive` for LLM second-opinion. |
+| `context.lua` | system_prompt + turns + pending_exec_output + use_tool_role | When Norris is active, `to_messages()` appends the Norris suffix (§2 row "Norris prompt suffix") to the system message. The suffix is computed dynamically — when Norris exits, subsequent broker calls revert to plain system prompt. No additional storage. |
+| `ffi/readline.lua` | `bind(seq, fn)` (Phase 1) | No additions — `\C-n` binding mechanism already in place. The Phase 1 placeholder handler is just replaced with a real one in repl.lua. |
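+| `config.lua` | mcp example block | New optional `safety = { llm_second_opinion = true, llm_model = "fast", max_norris_steps = 16, destructive_patterns = {...} }` block, shipped as a commented-out example like the mcp block. Defaults apply when the block is absent. `destructive_patterns` is listed for forward compatibility; the v1 pattern list is hardcoded (§10). |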
+
+No new module files beyond what already exists. The `\C-x\C-c` abort keybinding (PHASE1.md §7 reserved) gets wired here.
+
+---
+
+## 4. The Planning Loop (`safety.norris_step`)
+
+One iteration of Norris is exactly one round-trip with the model — same
+shape as Phase 2's tool-sub-loop iteration, with the model deciding what
+to do next based on accumulated context:
+
+```
+norris_step(ctx, broker_cfg, executor_fn, tools_fn, halt_fn, opts):
+    # opts.step_n, opts.max_steps, opts.cfg
+
+    1. Call broker.chat_stream(broker_cfg, ctx:to_messages(), on_delta, {tools=tools_fn()})
+       — collect (text, tool_calls).
+    2. If text contains "GOAL: complete" line → return {status="done"}.
+    3. If no actions emitted (no tool_calls, no CMD: in text):
+        → return {status="stalled", reason="no action"} (user-visible).
+    4. For each action (tool_call OR CMD: line):
+         a. Pass through safety.is_destructive(action).
+         b. If destructive: invoke halt_fn(action, reason) → user verdict.
+              "proceed"  → run action.
+              "skip"     → append a synthesized turn telling the model
+                           "[aish] action skipped by user: <reason>".
+              "abort"    → return {status="aborted"}.
+         c. If non-destructive: CMD: lines run directly; tool_calls run
+            only if covered by auto_approve, otherwise halt_fn (§6).
+         d. Append result turn to ctx (role:"tool" for tool calls,
+            exec-output buffer for CMD:).
+    5. step_n += 1. If step_n >= max_steps:
+       return {status="budget_exhausted"}.
+    6. Continue loop (driver in repl.lua re-calls norris_step).
+```
+
+The driver in repl.lua is the simple while loop; norris_step is one
+iteration so testing is granular.
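+
+A minimal sketch of how such a driver could wrap a single-iteration
+step function. Everything here is illustrative: `run_norris`, the stub
+step function, and the `"continue"` status name are assumptions (the
+manifest does not name the status returned when another iteration is
+needed), and the budget check is placed in the driver only to keep the
+example short.
+
+```lua
+-- Hypothetical driver sketch; the real loop is planned for repl.lua.
+-- Assumes step_fn returns { status = ..., reason = ... } and uses an
+-- assumed "continue" status when another iteration is wanted.
+local function run_norris(step_fn, max_steps)
+    local step_n = 0
+    while true do
+        local res = step_fn({ step_n = step_n, max_steps = max_steps })
+        if res.status ~= "continue" then
+            return res, step_n          -- done / aborted / stalled
+        end
+        step_n = step_n + 1
+        if step_n >= max_steps then
+            return { status = "budget_exhausted" }, step_n
+        end
+    end
+end
+
+-- Usage with a stub step function that finishes on its third call.
+local calls = 0
+local function stub_step(_opts)
+    calls = calls + 1
+    if calls == 3 then return { status = "done" } end
+    return { status = "continue" }
+end
+
+local result, steps = run_norris(stub_step, 16)
+print(result.status, steps)
+```
+
+Because the step function is injected, the same driver can be exercised
+against a mocked broker, which is the testing granularity the section
+argues for.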
+
+---
+
+## 5. Destructive-Op Heuristic (`safety.is_destructive`)
+
+### Static pattern list (v1, ~20 entries)
+
+```lua
+local DESTRUCTIVE_PATTERNS = {
+    -- Filesystem
+    { pat = "rm%s+.-%-rf?",                    reason = "rm -rf" },
+    { pat = "rm%s+.-%-fr?",                    reason = "rm -fr" },
+    { pat = "find%s+.-%-delete",               reason = "find -delete" },
+    { pat = "find%s+.-%-exec%s+rm",            reason = "find -exec rm" },
+    { pat = ">%s*/dev/sd[a-z]",                reason = "write to raw disk" },
+    { pat = "dd%s+.-of=/dev/",                 reason = "dd to device" },
+    { pat = "mkfs[%.%s]",                      reason = "mkfs (format)" },
+    { pat = "shred%s",                         reason = "shred" },
+    { pat = "wipefs%s",                        reason = "wipefs" },
+    { pat = "truncate%s+.-%-s%s*0",            reason = "truncate to zero" },
+
+    -- Version control destructive
+    { pat = "git%s+push%s+.-%-%-force",        reason = "git push --force" },
+    { pat = "git%s+push%s+.-%-f%f[^%w]",       reason = "git push -f" },
+    { pat = "git%s+reset%s+.-%-%-hard",        reason = "git reset --hard" },
+    { pat = "git%s+clean%s+.-%-fd?",           reason = "git clean -fd" },
+    { pat = "git%s+branch%s+.-%-D",            reason = "git branch -D" },
+
+    -- Database / process
+    { pat = "DROP%s+TABLE",                    reason = "DROP TABLE", ci = true },
+    { pat = "DROP%s+DATABASE",                 reason = "DROP DATABASE", ci = true },
+    { pat = "TRUNCATE%s+TABLE",                reason = "TRUNCATE TABLE", ci = true },
+    { pat = "kill%s+%-9",                      reason = "kill -9" },
+    { pat = "pkill%s+%-9",                     reason = "pkill -9" },
+
+    -- Network/permission
+    { pat = "chmod%s+.-777",                   reason = "chmod 777" },
+    { pat = "chown%s+.-/$",                    reason = "chown on root path" },
+}
+```
+
+Entries are Lua patterns, not regular expressions. `ci = true` requests a
+case-insensitive match, which the matcher has to implement itself because
+Lua patterns have no such flag.
+
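+One way the table could be consumed, as a hedged sketch rather than the
+planned implementation: `match_static` and the three-entry `PATTERNS`
+excerpt are hypothetical names, and upper-casing the command for `ci`
+entries is an assumption that only works because those patterns are
+written in upper case.
+
+```lua
+-- Sketch only: a three-entry excerpt of the list above, not the full table.
+local PATTERNS = {
+    { pat = "rm%s+.-%-rf?",      reason = "rm -rf" },
+    { pat = "find%s+.-%-delete", reason = "find -delete" },
+    { pat = "DROP%s+TABLE",      reason = "DROP TABLE", ci = true },
+}
+
+-- Returns (true, reason) on the first hit, (false, nil) otherwise.
+-- For ci entries the command is upper-cased before matching; the
+-- pattern itself is left alone so classes such as %s keep working.
+local function match_static(cmd, patterns)
+    for _, p in ipairs(patterns) do
+        local subject = p.ci and cmd:upper() or cmd
+        if subject:find(p.pat) then
+            return true, p.reason
+        end
+    end
+    return false, nil
+end
+
+print(match_static("rm -rf /tmp/foo", PATTERNS))
+print(match_static("psql -c 'drop table users'", PATTERNS))
+print(match_static("ls -la", PATTERNS))
+```
+
+The match is unanchored, so a pattern also fires when the idiom is
+embedded in a longer command line. That is in keeping with the stated
+bias toward false positives.
+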
+### LLM second-opinion (when static doesn't HALT)
+
+If `cfg.safety.llm_second_opinion ~= false`, call `broker.chat` (non-streaming,
+no tools) on the `fast` model with:
+
+```
+System: Detect destructive shell commands. Reply YES if running this would
+delete, overwrite, or irreversibly modify data, processes, or system state.
+Otherwise reply NO. Respond with only YES or NO.
+
+User: <the command text or tool-call summary>
+```
+
+Response parse: case-insensitive match on `YES` → destructive; everything
+else → not. Broker failure → YES (safe default).
+
+Returns: `(is_destructive, reason)`. Reason is the matched pattern name
+for static hits, "LLM flagged as destructive" for LLM hits.
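+
+The parse and fallback rules above could look roughly like this. It is
+a sketch under stated assumptions: `chat_fn` stands in for
+`broker.chat`, whose real signature is not shown in this manifest, and
+the reason string used on broker failure is invented for illustration.
+
+```lua
+-- Hypothetical wrapper around the second-opinion probe.
+-- chat_fn(cmd) is assumed to return the reply text, and to raise an
+-- error or return a non-string on broker failure.
+local function llm_flags_destructive(cmd, chat_fn)
+    local ok, reply = pcall(chat_fn, cmd)
+    if not ok or type(reply) ~= "string" then
+        -- Broker failure: fall back to YES (safe default).
+        return true, "LLM probe failed (safe default)"
+    end
+    -- Case-insensitive plain-text search for YES.
+    if reply:upper():find("YES", 1, true) then
+        return true, "LLM flagged as destructive"
+    end
+    return false, nil
+end
+
+-- Usage with stubbed brokers.
+print(llm_flags_destructive("cp /dev/null notes.txt", function() return "Yes." end))
+print(llm_flags_destructive("ls", function() return "NO" end))
+print(llm_flags_destructive("ls", function() error("broker down") end))
+```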
+
+### Tool-call destructive check
+
+For MCP tool_calls, `is_destructive` checks:
+1. Tool name against an "always destructive" set (configurable; v1 includes
+   `*__shell` / `*__write_file` / `*__edit_file` / `*__shell_bg` patterns).
+2. Arguments serialized as JSON against the static shell patterns (in case
+   a `shell` tool's command argument is destructive).
+3. LLM second-opinion on the JSON-serialized call.
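+
+Step 1 needs some way to match tool names against `*__shell`-style
+entries. A minimal sketch, assuming the entries are simple globs where
+`*` is the only wildcard; `glob_to_pattern`, `ALWAYS_DESTRUCTIVE`, and
+the tool names in the usage lines are hypothetical.
+
+```lua
+-- Convert a glob such as "*__shell" into an anchored Lua pattern.
+local function glob_to_pattern(glob)
+    -- Escape every Lua magic character, then turn the escaped "*" into ".*".
+    local escaped = glob:gsub("[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0")
+    return "^" .. escaped:gsub("%%%*", ".*") .. "$"
+end
+
+-- The v1 "always destructive" set named in the list above.
+local ALWAYS_DESTRUCTIVE = {
+    "*__shell", "*__write_file", "*__edit_file", "*__shell_bg",
+}
+
+-- Returns (true, matched_glob) or (false, nil).
+local function tool_is_always_destructive(name)
+    for _, glob in ipairs(ALWAYS_DESTRUCTIVE) do
+        if name:find(glob_to_pattern(glob)) then
+            return true, glob
+        end
+    end
+    return false, nil
+end
+
+print(tool_is_always_destructive("hertz__shell"))
+print(tool_is_always_destructive("hertz__wol_machine"))
+```
+
+A tool that falls through this check is not cleared; it still goes
+through steps 2 and 3.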
+
+---
+
+## 6. HALT Protocol
+
+HALT fires when `is_destructive` returns true, or when a tool_call
+that is not covered by `auto_approve` is attempted under Norris
+(`auto_approve` is the user's explicit consent and does apply here):
+
+```
+─── NORRIS HALT ───────────────────────────────
+  step 7/16
+  reason: rm -rf
+  action: rm -rf /var/log/old
+[N] proceed / skip / abort? p
+```
+
+User types `p` (proceed) / `s` (skip) / `a` (abort).
+
+- **proceed**: run the action, append result to context, continue loop.
+- **skip**: append a synthesized turn explaining the user skipped this
+  step (gives the model a chance to re-plan); continue loop.
+- **abort**: exit Norris mode; the conversation context is preserved.
+  Drop back to the interactive prompt.
+
+`\C-x\C-c` at any prompt also aborts.
+
+Auto-approved tools (per `cfg.mcp.auto_approve`) skip the HALT entirely
+IF AND ONLY IF the destructive-op heuristic doesn't flag them. The
+heuristic is the final word — auto_approve is a confirmation bypass,
+not a destructive bypass.
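+
+Mapping the typed reply to a verdict might be sketched as follows.
+`parse_verdict` is a hypothetical helper, and accepting full words as
+well as single letters is an assumption; the manifest only specifies
+`p` / `s` / `a`.
+
+```lua
+local VERDICTS = { p = "proceed", s = "skip", a = "abort" }
+
+-- Returns "proceed", "skip", or "abort"; nil means the input was not
+-- understood and the caller should re-prompt rather than guess.
+local function parse_verdict(line)
+    local key = line:lower():match("^%s*(%a)")
+    return key and VERDICTS[key] or nil
+end
+
+print(parse_verdict("p"))
+print(parse_verdict("  Skip"))
+print(parse_verdict("x"))
+```
+
+Returning nil on unrecognized input keeps the gate conservative: an
+accidental keystroke never counts as approval.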
+
+---
+
+## 7. Meta Commands (Phase 3 additions)
+
+| Command | Action |
+|---|---|
+| `:norris <goal>` | Launch Norris mode with an explicit goal text (same as `\C-n` after typing a goal but works on previously-issued goals too) |
+| `:norris off` | Exit Norris mode mid-loop (alternative to abort prompt) |
+| `:safety patterns` | Show the active destructive-op pattern list |
+| `:safety check <cmd>` | Probe `is_destructive` against a hypothetical command without running it (debug aid) |
+
+`\C-n` toggles Norris on/off in-place. Toggling on prompts for a goal if
+none is pending; toggling off while a goal is in progress asks the user
+to confirm the abort.
+
+---
+
+## 8. System Prompt Augmentation (active only in Norris)
+
+Appended to the default Phase 2 system prompt while `norris_active == true`:
+
+```
+[NORRIS MODE] You are operating autonomously toward a stated goal. Plan
+and execute step by step using CMD: lines (for shell) or tool_calls
+(when MCP tools are available). After each action, you will see its
+result in the next turn. Re-plan based on what you observe.
+
+When the goal is achieved, emit a single line:
+    GOAL: complete
+on its own line, optionally followed by a brief summary.
+
+If the goal is unreachable or you need user input, emit:
+    GOAL: blocked
+with a one-line reason.
+
+Avoid destructive operations unless the goal explicitly requires them.
+The user will be prompted to confirm destructive actions; expect their
+verdict in the next turn as "[aish] action skipped by user" or
+"[aish] action approved".
+```
+
+This block is composed dynamically by `context.to_messages()` when
+`ctx.norris_active` is set. No state stored beyond the boolean.
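+
+A rough sketch of that dynamic composition, assuming a context table
+with `system_prompt`, `turns`, and `norris_active` fields and
+`{ role, content }` message tables. The standalone `to_messages`
+function and the abbreviated suffix are illustrative only.
+
+```lua
+-- Abbreviated; the full text is the block quoted above.
+local NORRIS_SUFFIX =
+    "[NORRIS MODE] You are operating autonomously toward a stated goal."
+
+-- Builds the message list on every call, so nothing is stored beyond
+-- the boolean and the suffix disappears as soon as the flag is cleared.
+local function to_messages(ctx)
+    local system = ctx.system_prompt
+    if ctx.norris_active then
+        system = system .. "\n\n" .. NORRIS_SUFFIX
+    end
+    local messages = { { role = "system", content = system } }
+    for _, turn in ipairs(ctx.turns) do
+        messages[#messages + 1] = turn
+    end
+    return messages
+end
+
+local ctx = {
+    system_prompt = "You are aish.",
+    turns = { { role = "user", content = "count the python files" } },
+    norris_active = true,
+}
+print(to_messages(ctx)[1].content)
+ctx.norris_active = false
+print(to_messages(ctx)[1].content)
+```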
+
+---
+
+## 9. Migration from Phase 2
+
+User-visible:
+- `\C-n` now does something (was a Phase 1 placeholder).
+- `:norris <goal>` is a new meta command.
+- Destructive-looking commands now stop and ask for confirmation
+  even outside Norris mode (the `is_destructive` check is also applied
+  to interactive CMD: extraction, replacing the current bare
+  `confirm_cmd` for known-destructive cases). This is a behavior change
+  to interactive mode.
+
+Substrate (PHASE0.md §3) invariants: unchanged. The `CMD:` extraction
+marker is still the only shell-suggestion contract.
+
+`config.lua`: configs without a `safety` block work unchanged — defaults
+kick in (LLM second-opinion enabled, default pattern list, default step
+budget).
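+
+Default resolution could be sketched like this. `resolve_safety` and
+`SAFETY_DEFAULTS` are hypothetical names; the default values are the
+ones stated in §2, and placing `max_norris_steps` inside the `safety`
+block follows the example block described in §12.
+
+```lua
+local SAFETY_DEFAULTS = {
+    llm_second_opinion = true,
+    llm_model = "fast",
+    max_norris_steps = 16,
+}
+
+-- Returns a complete safety table whether or not cfg.safety exists.
+local function resolve_safety(cfg)
+    local user = (cfg and cfg.safety) or {}
+    local out = {}
+    for key, default in pairs(SAFETY_DEFAULTS) do
+        -- Compare against nil so an explicit `false` is respected.
+        if user[key] ~= nil then
+            out[key] = user[key]
+        else
+            out[key] = default
+        end
+    end
+    return out
+end
+
+print(resolve_safety({}).max_norris_steps)
+print(resolve_safety({ safety = { llm_second_opinion = false } }).llm_second_opinion)
+```
+
+The nil comparison matters here: a plain `user[key] or default` would
+silently turn an opt-out of the second opinion back on.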
+
+---
+
+## 10. Out of Scope (Phase 3)
+
+Per PHASE0.md §11, these belong to later phases:
+- `memory.jsonl` summarization across sessions (Phase 4).
+- Multi-model routing / cloud fallback (Phase 5) — but Norris's
+  LLM second-opinion uses the `fast` model regardless of active model.
+- Tree-sitter syntax highlighting (Phase 6).
+
+Specifically out of Phase 3 scope despite proximity:
+- Per-session destructive-pattern learning (user-corrects-LLM feedback
+  loop). v2.
+- Parallel exploration / branching Norris sessions. v3+.
+- User-extendable pattern list via config. v2 — Phase 3 ships hardcoded.
+- Goal-decomposition for very long-running tasks (multi-day, persistent
+  state). Out of aish's scope entirely; that's a different tool.
+
+---
+
+## 11. Open Questions
+
+| # | Question | Impact | Resolve by |
+|---|---|---|---|
+| Q23 | LLM second-opinion latency budget: 3s per check on the fast model means a 16-step Norris session adds ~48s of overhead. Acceptable for autonomous mode? Or cache by command-hash within a session? | safety.lua | Phase 3 (analyze) |
+| Q24 | `is_destructive` also runs on **interactive** `CMD:` extraction (per §9)? Or only under Norris? §9 says yes; the manifest implicitly broadens the destructive gate. The alternative is to keep `confirm_cmd` as the interactive surface and Norris uses its own stricter check. Mixing both is the proposed default but worth challenging. | safety.lua + repl.lua | Phase 3 (analyze) |
+| Q25 | If the model emits BOTH pending actions AND a `GOAL: complete` line in the same response, is the goal done immediately, or are the pending actions in that response still dispatched first? Default proposal: dispatch pending actions first; the `GOAL:` marker then takes effect at the point where the next round-trip would otherwise start (so the model effectively pre-announces completion). Less surprising. Note that §4 step 2 as written checks the sentinel before dispatching, so adopting the proposal means reordering those steps. | repl.lua norris driver | Phase 3 (analyze) |
+| Q26 | Context preservation when Norris ends with `abort` vs `done` vs `budget_exhausted`. Proposal: all three keep ctx intact (user sees the conversation in `:history`). The only difference is the renderer summary. | repl.lua + renderer.lua | Phase 3 (plan) |
+| Q27 | Resume mode after abort: should the user be able to type `:norris continue` to pick up where the model left off? v1 says no — too many edge cases with stale plans. v2 maybe. | scope | Phase 3 — defer to v2 |
+| Q28 | `tool_calls` from MCP servers that have side effects but aren't in `*__shell` / `*__write_file` patterns (e.g. a custom `hertz__wol_machine` tool that wakes a server). The static set in §5 won't catch this; the LLM second-opinion might. Reasonable default given the LLM's role here. | safety.lua | Phase 3 (verify) |
+| Q29 | Norris response when `is_destructive` returns YES but the user-stated goal explicitly authorizes destruction (e.g. "clean up old logs in /var/log"). Currently the HALT still fires. Should the model be allowed to convey "user authorized this implicitly" in the goal? v1: no — explicit per-action confirm always. v2 could relax. | UX + safety.lua | Phase 3 (verify) |
+| Q30 | `:norris` without a goal arg vs `\C-n`: should they share a single "ask for goal" code path? Yes; trivial. | repl.lua | Phase 3 (plan) |
+
+Resolved at formulate (in §2 table):
+- Q2 (planner shape) — iterative re-plan after each action.
+- Q8 inheritance — auto_approve from Phase 2 applies under Norris IF destructive heuristic clears.
+
+Carried forward (not in §13 originally):
+- Norris's interaction with Phase 4's memory.jsonl — captured tasks could pre-populate context. Phase 4 concern.
+
+---
+
+## 12. Implementation Plan (commit-by-commit)
+
+Bottom-up, same cadence as Phase 0/1/2. Six commits expected:
+
+1. **`safety.is_destructive` — static pattern list only.** Implement the
+   ~20-pattern matcher + the tool-call shell-arg extraction. No LLM
+   second-opinion yet. Returns `(bool, reason)`. **Test**: unit-table of
+   ~30 commands (mix of destructive + safe) → assertEqual on each.
+
+2. **`safety.is_destructive` — LLM second-opinion + cache.** Add the
+   fast-model probe path with a session-scoped cache keyed by the
+   normalized command string (mitigates Q23 latency). Broker-failure
+   falls back to YES. **Test**: mock broker; verify cache hits don't
+   re-call; verify failure-fallback is YES.
+
+3. **`renderer.lua` — Norris frames.** Add `norris_begin/step/halt/end`
+   per §3. Visual parity with exec/tool frames. Update prompt to
+   include `⚡` when active. **Test**: one-liner script renders each
+   frame visually.
+
+4. **`safety.norris_step` — single-iteration planner.** The
+   `norris_step` function per §4. Caller provides ctx + dispatch
+   helpers; returns `{status, reason}`. No driver loop yet — that's
+   the next commit. **Test**: mock broker emitting various model
+   responses (text+actions, GOAL:complete, stalled, destructive
+   action requiring HALT) and verify each return shape.
+
+5. **`repl.lua` — Norris driver + `\C-n` real binding + `:norris` meta.**
+   The while-loop driver consuming `safety.norris_step`, the rebound
+   `\C-n` (replacing Phase 1 placeholder), the `:norris <goal>` /
+   `:norris off` meta cmds, and `\C-x\C-c` abort handler. Also extends
+   the interactive `CMD:` confirm path to consult `is_destructive`
+   first (per the Q24 proposal, open until analyze). **Test**: mocked-broker end-to-end:
+   submit a multi-step goal, verify driver loops correctly, hits
+   GOAL:complete, returns to interactive.
+
+6. **`config.lua` — `safety` example block.** Commented-out example
+   showing `llm_second_opinion`, `llm_model`, `destructive_patterns`,
+   `max_norris_steps`. Documentation only.
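+
+The session-scoped cache from commit 2 might be as small as the sketch
+below. `normalize` and `make_cached_probe` are hypothetical names, and
+"normalized" is assumed to mean trimmed with whitespace runs collapsed;
+the manifest does not define the normalization.
+
+```lua
+-- Assumed normalization: trim, then collapse runs of whitespace.
+local function normalize(cmd)
+    return (cmd:gsub("^%s+", ""):gsub("%s+$", ""):gsub("%s+", " "))
+end
+
+-- Wraps a probe function (command -> truthy/falsy) with a cache that
+-- lives as long as the returned closure, i.e. one session.
+local function make_cached_probe(probe_fn)
+    local cache = {}
+    return function(cmd)
+        local key = normalize(cmd)
+        local hit = cache[key]
+        if hit == nil then              -- false is a valid cached value
+            hit = probe_fn(key) and true or false
+            cache[key] = hit
+        end
+        return hit
+    end
+end
+
+-- Usage: the stub probe counts how often it is really called.
+local calls = 0
+local probe = make_cached_probe(function(cmd)
+    calls = calls + 1
+    return cmd:find("%-%-delete") ~= nil
+end)
+
+print(probe("rsync -a  --delete src/ dst/"), calls)
+print(probe(" rsync -a --delete src/ dst/ "), calls)
+```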
+
+### Risk / non-obvious
+
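+- **Catastrophic false-negative in `is_destructive`**: the static list
+  is patterned; a creative model could write `r"m" -rf /tmp` or
+  `x=rm; $x -rf /tmp`. (A plain wrapper such as `bash -c "rm -rf /tmp"`
+  is still caught, since the `rm` pattern matches anywhere in the
+  string.) Static is the floor, LLM second-opinion is the net. Both
+  check.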
+- **LLM second-opinion model itself being autonomous** in a Norris run
+  would be circular. Mitigation: the second-opinion call uses
+  `broker.chat` (no tools, no streaming, dedicated prompt) — distinct
+  call path from the Norris planning stream. No tool-call recursion
+  possible.
+- **Norris loop runs the LLM N times**: each step is a full broker
+  round-trip plus optionally an LLM second-opinion. A 16-step Norris
+  goal could be ~32 LLM calls (16 planning round-trips plus up to 16
+  second-opinion probes on the fast model, assuming one action per
+  step). Visible as latency but no economic surprise on local models.
+- **Destructive check on interactive CMD: extraction (Q24)** is a
+  behavior change to Phase 0/1 (`confirm_cmd` users will see the
+  prompt automatically for destructive commands even with
+  `confirm_cmd=false`). Documented in §9. Defensible: the worst case
+  is a confirm prompt the user dismisses.
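+- **`GOAL: complete` extraction** matches `^GOAL: complete$` against
+  each line of the emitted text. Lua patterns have no multiline mode,
+  so the anchors only work when applied line by line.
+  Substrate-aligned with CMD: extraction.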
+
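+For the `GOAL: complete` extraction noted above, a per-line scan is one
+way to get line-anchored matching out of Lua patterns. This is a sketch
+only; `has_goal_complete` is a hypothetical helper, not part of the
+planned module surface.
+
+```lua
+-- Returns true if any line of text is exactly "GOAL: complete".
+-- ^ and $ anchor the whole subject in Lua patterns, so the text is
+-- split into lines first and each line is matched on its own.
+local function has_goal_complete(text)
+    for line in (text .. "\n"):gmatch("(.-)\r?\n") do
+        if line:match("^GOAL: complete$") then
+            return true
+        end
+    end
+    return false
+end
+
+print(has_goal_complete("Counted 12 files.\nGOAL: complete\nAll done."))
+print(has_goal_complete("I will emit GOAL: complete when finished."))
+```
+
+Requiring the sentinel to stand alone on its line avoids treating a
+sentence that merely mentions it as a completion signal.
+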
+### Open at plan; resolve at review
+
+- Whether to ship the LLM second-opinion **on by default** or
+  **off by default with a config opt-in**. Default on is safer; off
+  saves latency. Recommend on; Phase 7 verify will quantify the
+  overhead.
+- Whether `:safety check <cmd>` should also be reachable by `\C-x`
+  keybinding for fast probing during interactive sessions. v2.
+
+---
+
+*End of Phase 3 Manifest — aish*