aish/docs/PHASE3.md
marfrit 91ddcb005d docs/PHASE3: review fold-in — security-layer BLOCKERs resolved
Independent review surfaced 3 BLOCKERs + 6 CONCERNs + 7 NITs against
the analyze-tier draft. Resolutions applied:

BLOCKERs:
  B1 Shell-wrapper bypass — static patterns leaked on bash -c, sh -c,
     eval, pipe-to-shell, python -c, xargs|rm. Added 9 wrapper
     patterns to §5. Norris HALTs on any wrapper invocation; user
     reads the inner before proceed. The patterns are the
     conservative floor against the wrapper bypass class.
  B2 LLM second-opinion was self-policing — same model class
     generating actions then judging them. Switched probe model
     from `fast` to `deep` (qwen3-30b). Added re-roll inversion:
     if first probe says NO, ask "is this SAFE?". Disagreement
     between two probes → HALT. Cheap independent-class insurance.
  B3 `is_destructive` would have run on interactive CMD: extraction
     — a PHASE0 §6/§10 substrate amendment in disguise. Resolved
     Q24: heuristic runs ONLY when norris_active == true. No
     substrate change; interactive `confirm_cmd` semantics unchanged.

CONCERNs:
  C1 Skip-budget: consecutive_user_skips counter; 3+ similar skips
     escalate to abort/force-proceed prompt.
  C2 Algorithm-vs-Q25-resolution contradiction: §4 reordered to
     dispatch ALL pending actions before checking GOAL: complete.
  C3 Norris-goal eviction: goal embedded directly in the dynamic
     system-prompt suffix; survives sliding-window eviction.
  C4 Readline use-after-free window: M.bind no longer frees old
     callbacks; pin for process lifetime (bounded memory cost).
  C5 GOAL: complete matcher: line-level scan, exact match after
     trim — substrate-aligned with CMD: rigor.
  C6 §4 step 4 tightened: auto_approve does NOT bypass destructive
     heuristic; tool_call without auto_approve still HALTs even
     when destructive-clear (Norris conservative).

NITs deferred or rolled into pattern table:
  - chown root-path pattern tightened (NIT 2 in-line)
  - Test corpus expansion noted in §12 commit #1 risk
  - Other NITs are wording-level

Status: Plan (review folded). Ready for commit #1 (safety static
patterns) once another review pass clears.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:42:58 +00:00


aish — Phase 3 Manifest

Project: aish — AI-augmented conversational shell
Document: Phase 3 Requirements, Architecture & Design Decisions
Status: Plan (review fold-in 2026-05-12 — security-layer BLOCKERs resolved)
Date: 2026-05-12

Review fold-in (2026-05-12, security layer):

R-B1. Shell-wrapper bypass coverage. Static patterns missed bash -c, sh -c, eval, xargs | rm, | sh, python -c. Added to the pattern list in §5 as a "wrapper requires manual review" class — in Norris mode, any wrapper invocation HALTs regardless of the inner command. The wrapper itself is the trigger.

R-B2. LLM second-opinion model class. Switched from fast to deep for the destructive-detection probe. fast co-emits the action AND judges it (circular). deep is a different model class (qwen3-30b currently mapped to deep per config.lua) — adds ~1-3s per probe but breaks the self-policing loop. Added a YES/inversion re-roll: if the deep model says NO, re-ask "Is this safe?" — disagreement → HALT. Cheap insurance for the edge cases. §5 reflects.

R-B3. is_destructive scope narrowed to Norris mode. The formulate-time §9 said the heuristic would also gate interactive CMD: extraction. That's a PHASE0 §6/§10 substrate amendment that's bigger than Phase 3 should be making implicitly. Q24 resolved: is_destructive runs ONLY when norris_active == true. Interactive CMD: extraction continues to honor confirm_cmd exactly as Phase 0 specified — no behavior change.

CONCERN folds (2026-05-12):

R-C1. Skip-budget added: consecutive_user_skips counter; ≥3 triggers escalation HALT "model has proposed similar destructive action 3+ times — abort, force-proceed, or change goal?". §4 + §6 reflect.

R-C2. §4 algorithm reorder — dispatch all pending actions FIRST, then check GOAL: complete. Q25 resolution + §4 algorithm now consistent (was contradictory).

R-C3. Norris goal pinned in system-prompt suffix: ctx.norris_goal field; the dynamic system suffix from §8 carries it. Eviction can no longer drop the anchor.

R-C4. Readline rebind safety: M.bind will NOT free old callbacks (pin for process lifetime). Avoids a use-after-free window between :free() and the new rl_bind_keyseq call. Memory cost is bounded (one closure per bound key, negligible).

R-C5. GOAL: complete matcher — line-level scan, exact match after trim. Aligned with CMD: extraction rigor.

R-C6. §4 step 4 algorithm tightened — auto_approve only short-circuits the user-prompt, NEVER the destructive-heuristic. Tool-call without auto_approve entry AND no destructive flag → still HALTs in Norris mode (Norris is conservative by design).
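The R-C1 skip-budget can be sketched as a small counter helper. This is an illustration only: record_verdict and SKIP_LIMIT are hypothetical names, and the threshold of 3 is taken from the "consecutive_skips >= 3" check in §4.

```lua
-- Hypothetical helper (not from the codebase): tracks consecutive user skips.
local SKIP_LIMIT = 3  -- assumed from "consecutive_skips >= 3" in §4

-- Updates state.consecutive_skips for one HALT verdict and returns true
-- when the skip budget is exhausted and the HALT should escalate.
local function record_verdict(state, verdict)
  if verdict == "skip" then
    state.consecutive_skips = state.consecutive_skips + 1
  elseif verdict == "proceed" then
    state.consecutive_skips = 0  -- a successful proceed resets the budget
  end
  return state.consecutive_skips >= SKIP_LIMIT
end
```

An "abort" verdict leaves the counter untouched here, since the session ends anyway.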

Analyze findings (2026-05-12):

A1. \C-n mid-readline limitation. Phase 1's \C-n handler fires synchronously from inside the readline keystroke callback (via rl_bind_keyseq → ffi-cast Lua closure). The current binding API only exposes rl_bind_keyseq — no rl_insert_text, rl_replace_line, or rl_redisplay. So a \C-n callback cannot cleanly mutate the in-progress prompt buffer or end the readline call early to "transition into Norris mode". Resolution: bind rl_insert_text + rl_redisplay (single cdef + 2 wrapper lines in ffi/readline.lua) so the \C-n handler inserts :norris at the cursor and refreshes the display. User then types the goal + Enter, routing through the existing meta dispatch normally. \C-n becomes a typing shortcut, not a state toggle.

A2. broker.chat lacks max_tokens. The LLM second-opinion path in safety.is_destructive needs a tight YES/NO completion (2 tokens max). The proxy + small models honor max_tokens correctly (verified vs hossenfelder: max_tokens=4 returned a clean "YES" in 2 completion tokens). Phase 2's broker doesn't surface this option. Resolution: add opts.max_tokens to M.chat_stream's opts table (Phase 2 already widened opts); M.chat passes through. Defaults nil → field omitted from the request body — Phase 1/2 callers unaffected.

A3. Tool-sub-loop is structurally reusable. Phase 2's ask_ai sub-loop (stream → collect text + tool_calls → dispatch → append → loop until pure-text response or cap) IS the planner shape Phase 3 wants. safety.norris_step per §4 is essentially this iteration extracted behind a function call, plus the GOAL: complete sentinel check. No structural refactor of Phase 2 needed — Norris is additive.

These findings tighten §3's module-changes table and §12's commit scope (A1 adds a small ffi/readline.lua extension to commit #5); see inline notes below where the change matters.
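To illustrate the A2 pass-through, here is a minimal sketch of how an optional max_tokens could be carried into the request body. build_body is a hypothetical helper, and the model/messages field names are assumptions about the request shape; only opts.tools and opts.max_tokens come from this document.

```lua
-- Hypothetical request-body builder; not the actual broker.lua code.
local function build_body(model, messages, opts)
  opts = opts or {}
  local body = { model = model, messages = messages }
  if opts.tools ~= nil then
    body.tools = opts.tools
  end
  -- Omit the field entirely when nil so Phase 1/2 callers are unaffected.
  if opts.max_tokens ~= nil then
    body.max_tokens = opts.max_tokens
  end
  return body
end
```

A YES/NO probe would then call it with opts = { max_tokens = 4 }, while existing callers pass no opts and get the same body as before.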

PHASE0.md is the locked substrate; PHASE1.md and PHASE2.md are layered on top. This manifest specifies what Phase 3 adds — Chuck Norris autonomous mode, the destructive-op safety heuristic that gates it, and the HALT/confirm protocol for human-in-the-loop control. Section numbers reference back to earlier phases where relevant.


1. Scope of Phase 3

Three pillars per PHASE0.md §11 row 3:

  1. Norris autonomous mode (safety.norris_step + repl.lua integration) — a planning-and-execution loop where the model pursues a user-stated goal across multiple shell-exec and tool-call turns without per-turn user prompting. Triggered by \C-n (Phase 1 reserved key) or :norris <goal>. Iterative re-plan after each action.

  2. Destructive-op heuristic (safety.is_destructive) — hybrid gate that combines (a) a static pattern denylist of obviously destructive shell idioms (rm -rf, dd of=, mkfs, git push --force, etc.) with (b) an LLM second-opinion via the deep model (R-B2) for ambiguous cases. Any positive hit forces HALT before execution, regardless of Norris-mode policy.

  3. HALT/confirm protocol — a uniform way for the Norris loop to surface decisions to the user. HALT means: stop generation, drop to a [Norris] proceed / skip / abort? prompt with the proposed action displayed. User decides on each gate; abort returns control to the interactive REPL with the conversation intact.

Phase 3 is done when:

  • \C-n inserts :norris at the cursor as a typing shortcut (A1), replacing the Phase 1 status no-op.
  • :norris <goal> launches an autonomous task explicitly.
  • The model can plan + execute a multi-step task (e.g. "find all Python files modified in the last week and count them") through iterative CMD:/tool_call cycles without per-step user confirms for safe operations.
  • rm -rf /tmp/foo, dd of=/dev/sda, and equivalent destructive operations HALT and require explicit user approval.
  • The LLM second-opinion catches at least one realistic ambiguous case the static patterns miss (e.g. find . -delete, truncate -s 0 important.log).
  • HALT-abort returns to interactive mode without context loss.

2. Technology Decisions (delta from Phase 2)

| Decision | Choice | Rationale |
|---|---|---|
| Planning model | Iterative re-plan after each action | Resolves PHASE0.md §13 Q2. Top-down task trees are brittle to dynamic environments — a shell command's output frequently changes what the next step should be. Iterative re-plan piggybacks the existing Phase 2 tool-sub-loop pattern: model emits next action, gets result, decides next. Depth-bounded by max_norris_steps (default 16, configurable). |
| Action sources | CMD: lines + MCP tool_calls | Per PHASE0.md §11 row 3 ("now able to use MCP tools as well as CMD: lines"). Norris consumes both kinds equally. The Phase 2 system prompt already biases toward tools when available; that bias carries into Norris mode unchanged. |
| HALT trigger | Static-pattern hit OR LLM-second-opinion flag | Either gate fires HALT independently. Static for speed and predictability on known footguns; LLM for novel/ambiguous patterns. Cost of an LLM second-opinion call: one deep-model round-trip (~1-3s per probe, per R-B2). Only invoked when static doesn't already HALT. |
| HALT response shape | 3-way prompt: proceed / skip / abort | proceed runs the action and continues. skip reports "user skipped" to the model and lets it re-plan. abort ends the Norris session, drops back to interactive mode. (abort is also bound to \C-x\C-c per PHASE1.md §7 reserved keys.) |
| Auto-approve under Norris | Trust the Phase 2 auto_approve policy | A tool already in auto_approve runs without HALT even in Norris mode, as long as the destructive-op heuristic doesn't flag it. The user opted in once; Norris doesn't unilaterally re-prompt. CMD: lines never auto-approve under Norris — they always pass through is_destructive first. |
| Destructive-op static rules | Patterned shell-idiom list in safety.lua (hardcoded; configurable later via config.safety.destructive_patterns) | Phase 3 v1 ships a fixed list (~30 patterns, including the 9 wrapper patterns from R-B1) inline. v2 may make it user-extendable. Patterns target the command string after expansion; conservative — false positives mean a confirm prompt the user dismisses, false negatives mean unsupervised destructive action. Bias to false positives. |
| LLM second-opinion model | The deep preset (independent model class, not the one emitting actions) | R-B2 resolution. Same model class self-policing is circular — deep (qwen3-30b currently) judges actions emitted by the active model (often fast qwen-1.5b under Norris). Adds ~1-3s per probe; broker failure → YES (safe default). Re-roll inversion: if first probe says NO, ask the inverted "Is this safe?" — disagreement → HALT. |
| Norris prompt suffix | Status appended to the system prompt when Norris is active: [NORRIS MODE] You are operating autonomously toward a stated goal. Plan and execute step by step. Use CMD: lines or tool_calls. When done, emit "GOAL: complete" on its own line. | The GOAL: complete sentinel is how the model signals task completion; Norris loop exits the planning sub-loop on seeing it. |
| Interrupt handling | \C-c during a Norris step sends abort | Standard SIGINT semantics for the user. Mid-stream, this means: stop the broker request, stop any running shell command, drop to interactive mode. The current context is preserved (incl. partial assistant turn). |
| Context budgeting under Norris | Same max_turns and token_budget as interactive | Sliding window evicts oldest non-system turns when budget exceeded — including mid-Norris-session if the loop runs long. Phase 4's memory.jsonl summarization is the proper fix; Phase 3 just gets the eviction status as before. |

3. Module Changes

| File | State after Phase 2 | Phase 3 changes |
|---|---|---|
| safety.lua | confirm_tool_call (Phase 2 surface only) + Phase 3 stubs is_destructive / norris_step raising error() | Implement the stubs: (a) is_destructive(cmd_or_tool_call) -> (bool, reason) with static pattern matching + optional LLM second-opinion (controlled by cfg.safety.llm_second_opinion, default true); (b) norris_step(ctx, broker_cfg, executor_fn, tools_fn, halt_fn, opts) -> {status, reason} — single iteration of the Norris loop. Pattern list is module-local; LLM second-opinion uses broker.chat (non-streaming, no tools, single-shot). |
| repl.lua | tool-sub-loop + :mcp meta + Phase 1 \C-n no-op binding | Replace the \C-n no-op body with a handler that inserts :norris at the cursor (A1: a typing shortcut, not a state toggle). Add :norris <goal> meta cmd as the explicit-launch variant. New module-local norris_active flag. Implement the Norris driver loop: while active, call safety.norris_step; handle HALT decisions; exit on GOAL: complete, abort, or step budget exceeded. Auto_approve policy from confirm_tool_call is consulted in-line. |
| renderer.lua | exec frame + tool-call frame + assistant streaming | Add M.norris_begin(goal), M.norris_step(n, action_desc), M.norris_halt(reason, action), M.norris_end(status, reason). Visual: bold cyan banner on enter, indented step counter per iteration, red HALT banner on intercept, dim summary on exit. Phase 0 prompt becomes [aish:fast ⚡]> when Norris is active per PHASE0.md §9. |
| broker.lua | chat_stream with opts.tools, chat non-streaming | Re-used as-is for planning rounds (Norris just calls chat_stream like interactive). See the second broker.lua row below for the small max_tokens opts extension needed by the LLM second-opinion path. |
| context.lua | system_prompt + turns + pending_exec_output + use_tool_role | When Norris is active, to_messages() appends the Norris suffix (§2 row "Norris prompt suffix") to the system message. The suffix is computed dynamically — when Norris exits, subsequent broker calls revert to plain system prompt. No additional storage. |
| ffi/readline.lua | bind(seq, fn) (Phase 1) — frees old callback before rebinding | Small extension per A1 + R-C4 fix: (a) add rl_insert_text + rl_redisplay to the ffi.cdef block and expose M.insert_text(s) / M.redisplay() wrappers — needed so \C-n can stuff :norris into the buffer; (b) drop the _bound[seq]:free() call from M.bind — readline retains the function pointer in its keymap; freeing before re-bind opens a use-after-free window if the user presses the key in that gap. Pin all bound callbacks for process lifetime; memory cost is bounded (one closure per key, ~O(N) where N = number of bound keys ≤ ~10). |
| broker.lua | chat_stream(cfg, msgs, on_delta, opts) with opts.tools | Small extension per A2: opts.max_tokens (integer) is passed through to the request body as max_tokens. Omitted when nil. M.chat accepts the same opt. Needed so safety.is_destructive's YES/NO probe terminates in ~2 tokens. |
| config.lua | mcp example block | New optional safety = { llm_second_opinion = true, llm_model = "deep", destructive_patterns = {...} } block, also commented-out example. Defaults are sane when absent. |

No new module files beyond what already exists. The \C-x\C-c abort keybinding (PHASE1.md §7 reserved) gets wired here.


4. The Planning Loop (safety.norris_step)

One iteration of Norris is exactly one round-trip with the model — same shape as Phase 2's tool-sub-loop iteration, with the model deciding what to do next based on accumulated context:

norris_step(ctx, broker_cfg, executor_fn, tools_fn, halt_fn, opts):
    # opts.step_n, opts.max_steps, opts.cfg, opts.consecutive_skips

    1. Call broker.chat_stream(broker_cfg, ctx:to_messages(), on_delta, {tools=tools_fn()})
       — collect (text, tool_calls).

    2. Extract actions from response:
         - tool_calls   (already collected by broker accumulator)
         - cmd_lines    via executor.extract_cmd_lines(text) — line-anchored
         - goal_done    line-level scan for exact "GOAL: complete" (R-C5)

    3. If actions are empty AND goal_done is false:
         → return {status="stalled", reason="no action"}.

    4. Dispatch ALL pending actions BEFORE checking goal_done (R-C2):
       tool_calls first (structured route), CMD: lines second (legacy).
       For each action:
         a. Pass through safety.is_destructive(action).
            - tool_calls: check tool-name set + serialized args.
            - CMD: lines: pattern match + LLM probe.
         b. If destructive: invoke halt_fn(action, reason, opts.cfg).
            "proceed" → run action.
            "skip"    → opts.consecutive_skips += 1.
                        If consecutive_skips >= 3 (R-C1):
                          escalate halt with reason "repeated similar skips"
                          → user verdict abort / force-proceed.
                        Append synthesized "[aish] action skipped by user: <reason>"
                        as a role:"tool" turn (for tool_calls) or as exec-output
                        prefix (for CMD: lines) — alternation invariant.
            "abort"   → return {status="aborted"}.
         c. If non-destructive (cleared by static + LLM):
            - tool_call: check auto_approve. If in policy, run silently;
              otherwise (R-C6) halt_fn STILL fires for the consent prompt
              (Norris is conservative; auto_approve is the *only* way to
              skip consent in autonomous mode).
            - CMD: line: run (destructive-check is the gate; confirm_cmd
              is interactive-mode-only — R-B3 narrows scope).
         d. On successful proceed: opts.consecutive_skips = 0.
         e. Append result turn to ctx (role:"tool" for tool calls,
            exec-output buffer for CMD: — same as Phase 0/2 paths).

    5. After all actions dispatched: if goal_done → return {status="done"}.

    6. step_n += 1. If step_n >= max_steps:
       return {status="budget_exhausted"}.

    7. Continue loop (driver in repl.lua re-calls norris_step).

The driver in repl.lua is the simple while loop; norris_step is one iteration so testing is granular.
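A minimal sketch of that driver shape follows. It assumes norris_step returns a table whose status is "continue" while the loop should keep going; that status name is an assumption, since §4 only names the terminal statuses. run_norris and fake_step are hypothetical stand-ins for the repl.lua driver and safety.norris_step.

```lua
-- Hypothetical driver: re-calls the step function until a terminal status.
local function run_norris(step_fn, max_steps)
  local opts = { step_n = 0, max_steps = max_steps or 16, consecutive_skips = 0 }
  while true do
    local result = step_fn(opts)
    if result.status ~= "continue" then
      return result  -- "done", "aborted", "stalled" or "budget_exhausted"
    end
  end
end

-- Stand-in for safety.norris_step: reports "done" on the third step.
local function fake_step(opts)
  opts.step_n = opts.step_n + 1
  if opts.step_n >= 3 then
    return { status = "done" }
  end
  if opts.step_n >= opts.max_steps then
    return { status = "budget_exhausted" }
  end
  return { status = "continue" }
end
```

Because the step function owns step_n and the budget check, the driver stays free of policy and a test can swap in any stub that returns the same table shape.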


5. Destructive-Op Heuristic (safety.is_destructive)

Static pattern list (v1, ~30 entries)

local DESTRUCTIVE_PATTERNS = {
    -- ── Shell wrappers (R-B1) — flag the wrapper itself; can't inspect content
    --    safely without parsing the inner shell. Norris HALTs on these
    --    unconditionally; the user can proceed/abort with the full context.
    { pat = "^%s*bash%s+%-l?c%s",              reason = "bash -c (wrapped shell)" },
    { pat = "^%s*sh%s+%-l?c%s",                reason = "sh -c (wrapped shell)" },
    { pat = "^%s*zsh%s+%-l?c%s",               reason = "zsh -c (wrapped shell)" },
    { pat = "^%s*eval%s",                      reason = "eval (dynamic shell)" },
    { pat = "^%s*python3?%s+%-c%s",            reason = "python -c (inline script)" },
    { pat = "^%s*perl%s+%-e%s",                reason = "perl -e (inline script)" },
    -- %f[%W] rather than %f[%s]: the frontier must also fire at end of string ("curl ... | sh").
    { pat = "|%s*sh%f[%W]",                    reason = "pipe-to-sh" },
    { pat = "|%s*bash%f[%W]",                  reason = "pipe-to-bash" },
    { pat = "xargs%s+.-rm",                    reason = "xargs ... rm" },

    -- ── Filesystem destructive
    { pat = "rm%s+.-%-rf?",                    reason = "rm -rf" },
    { pat = "rm%s+.-%-fr?",                    reason = "rm -fr" },
    { pat = "find%s+.-%-delete",               reason = "find -delete" },
    { pat = "find%s+.-%-exec%s+rm",            reason = "find -exec rm" },
    { pat = ">%s*/dev/sd[a-z]",                reason = "write to raw disk" },
    { pat = "dd%s+.-of=/dev/",                 reason = "dd to device" },
    { pat = "mkfs[%.%s]",                      reason = "mkfs (format)" },  -- mkfs.ext4 and plain "mkfs -t ..."
    { pat = "shred%s",                         reason = "shred" },
    { pat = "wipefs%s",                        reason = "wipefs" },
    { pat = "truncate%s+.-%-s%s*0",            reason = "truncate to zero" },

    -- ── Version control destructive
    { pat = "git%s+push%s+.-%-%-force",        reason = "git push --force" },
    { pat = "git%s+push%s+.-%-f%f[%W]",        reason = "git push -f" },  -- also matches a trailing -f
    { pat = "git%s+reset%s+.-%-%-hard",        reason = "git reset --hard" },
    { pat = "git%s+clean%s+.-%-fd?",           reason = "git clean -fd" },
    { pat = "git%s+branch%s+.-%-D",            reason = "git branch -D" },

    -- ── Database / process
    { pat = "DROP%s+TABLE",                    reason = "DROP TABLE", ci = true },
    { pat = "DROP%s+DATABASE",                 reason = "DROP DATABASE", ci = true },
    { pat = "TRUNCATE%s+TABLE",                reason = "TRUNCATE TABLE", ci = true },
    { pat = "kill%s+%-9",                      reason = "kill -9" },
    { pat = "pkill%s+%-9",                     reason = "pkill -9" },

    -- ── Network/permission (chown tightened per NIT 2)
    { pat = "chmod%s+.-777",                   reason = "chmod 777" },
    { pat = "chown%s+.-%s+/%s*$",              reason = "chown on root path" },
}

The 9 wrapper patterns are the conservative floor against R-B1 bypass classes. Norris emits bash -c '...' → wrapper hit → HALT (user can proceed if they read the inner). LLM second-opinion still runs as a backup but the static net catches the obvious cases first.

Patterns are Lua patterns (not regex); ci = true enables a case-insensitive match.
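A sketch of how a matcher could walk such a table, using a four-entry subset of the list above. Lua patterns have no case-insensitive flag, so this sketch assumes ci is implemented by lowercasing both the command and the pattern; match_static is a hypothetical name.

```lua
-- Subset of the static list, for illustration only.
local PATTERNS = {
  { pat = "^%s*bash%s+%-l?c%s",       reason = "bash -c (wrapped shell)" },
  { pat = "find%s+.-%-delete",        reason = "find -delete" },
  { pat = "git%s+reset%s+.-%-%-hard", reason = "git reset --hard" },
  { pat = "DROP%s+TABLE",             reason = "DROP TABLE", ci = true },
}

-- Returns (true, reason) on the first matching pattern, else (false, nil).
local function match_static(cmd)
  for _, entry in ipairs(PATTERNS) do
    local subject, pat = cmd, entry.pat
    if entry.ci then
      -- Lowercasing the pattern is only safe when it contains no uppercase
      -- character classes such as %S or %W (they would flip meaning).
      subject, pat = cmd:lower(), entry.pat:lower()
    end
    if subject:find(pat) then
      return true, entry.reason
    end
  end
  return false, nil
end
```

The first hit wins, so the returned reason is always that of the earliest matching entry in table order.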

LLM second-opinion (when static doesn't HALT)

If cfg.safety.llm_second_opinion ~= false, call broker.chat (non-streaming, no tools, opts.max_tokens=4) on the deep model preset (independent class from the action-emitting model — R-B2):

System: Detect destructive shell commands. Reply YES if running this would
delete, overwrite, or irreversibly modify data, processes, or system state.
Otherwise reply NO. Respond with only YES or NO.

User: <the command text or tool-call summary>

Response parse: case-insensitive match on YES → destructive.

Re-roll on NO (R-B2 cheap insurance): if the first probe returns NO, run a second probe with inverted phrasing:

System: Reply YES or NO only. Is the following shell command SAFE to
run autonomously without user review?

User: <same command>

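If the re-roll answers NO (not safe), the command is treated as dangerous → HALT. The two probes ask opposite questions, so a literal NO/NO pair is a disagreement in substance (first probe: not destructive; second probe: not safe) and is reported as "LLM probe disagreement". The command clears only when the first probe answers NO and the re-roll answers YES, i.e. both probes agree it is safe.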

Broker failure → YES (safe default).

Session-scoped cache keyed by the normalized command string mitigates the latency cost (~1-3s per probe on deep model — see PHASE3-baseline §1). Repeated patterns within a single session probe once.

Returns: (is_destructive, reason). Reason is the matched pattern name for static hits, "LLM flagged as destructive" / "LLM probe disagreement" for the two LLM failure modes.
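The probe, re-roll and cache rules above can be sketched as one function. Everything here is illustrative: probe_fn is a hypothetical wrapper around broker.chat that returns the reply text or nil on broker failure, the two prompt constants are shortened stand-ins for the prompts shown above, and skipping the cache on failure is this sketch's own choice.

```lua
local DESTRUCTIVE_PROMPT = "Reply YES if running this would delete, overwrite, or irreversibly modify state. Otherwise reply NO."
local SAFE_PROMPT = "Reply YES or NO only. Is the following shell command SAFE to run autonomously?"

-- Case-insensitive YES detection; nil (broker failure) is not a YES.
local function is_yes(reply)
  return type(reply) == "string" and reply:upper():find("YES", 1, true) ~= nil
end

-- Returns (is_destructive, reason). cache is a plain table scoped to the session.
local function llm_second_opinion(cmd, probe_fn, cache)
  -- Normalize whitespace so trivially different spellings share a cache entry.
  local key = cmd:gsub("%s+", " "):match("^ ?(.-) ?$")
  local hit = cache[key]
  if hit then return hit.flag, hit.reason end

  local flag, reason, second = false, nil, nil
  local first = probe_fn(DESTRUCTIVE_PROMPT, cmd)
  if first == nil or is_yes(first) then
    -- Broker failure counts as YES (safe default).
    flag, reason = true, "LLM flagged as destructive"
  else
    second = probe_fn(SAFE_PROMPT, cmd)
    if not is_yes(second) then
      flag, reason = true, "LLM probe disagreement"
    end
  end

  -- Do not cache broker failures, so a later retry can still succeed.
  local failed = first == nil or (not is_yes(first) and second == nil)
  if not failed then
    cache[key] = { flag = flag, reason = reason }
  end
  return flag, reason
end
```

Injecting probe_fn keeps the decision logic testable with a mock broker, which is what the commit #2 test in §12 calls for.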

Tool-call destructive check

For MCP tool_calls, is_destructive checks:

  1. Tool name against an "always destructive" set (configurable; v1 includes *__shell / *__write_file / *__edit_file / *__shell_bg patterns).
  2. Arguments serialized as JSON against the static shell patterns (in case a shell tool's command argument is destructive).
  3. LLM second-opinion on the JSON-serialized call.
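Step 1 of the list above can be sketched as a suffix check, reading the *__shell style entries as "tool name ends with __shell". The function name and the reason string are hypothetical; the four suffixes are the v1 set named above.

```lua
local ALWAYS_DESTRUCTIVE_SUFFIXES = { "__shell", "__write_file", "__edit_file", "__shell_bg" }

-- Returns (true, reason) when the tool name ends with a flagged suffix.
local function tool_name_is_destructive(name)
  for _, suffix in ipairs(ALWAYS_DESTRUCTIVE_SUFFIXES) do
    if #name >= #suffix and name:sub(-#suffix) == suffix then
      return true, "tool matches *" .. suffix
    end
  end
  return false, nil
end
```

A tool such as the hertz__wol_machine example from Q28 is not caught by this check and falls through to the argument scan and the LLM second-opinion.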

6. HALT Protocol

When is_destructive returns true OR a non-auto_approve tool_call is attempted under Norris (auto_approve is the user's explicit consent that DOES apply):

─── NORRIS HALT ───────────────────────────────
  step 7/16
  reason: rm -rf
  action: rm -rf /var/log/old
[N] proceed / skip / abort? p

User types p (proceed) / s (skip) / a (abort).

  • proceed: run the action, append result to context, continue loop.
  • skip: append a synthesized turn explaining the user skipped this step (gives the model a chance to re-plan); continue loop.
  • abort: exit Norris mode; the conversation context is preserved. Drop back to the interactive prompt.

\C-x\C-c at any prompt also aborts.

Auto-approved tools (per cfg.mcp.auto_approve) skip the HALT entirely IF AND ONLY IF the destructive-op heuristic doesn't flag them. The heuristic is the final word — auto_approve is a confirmation bypass, not a destructive bypass.
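The gating rule reduces to a small truth table, sketched here. needs_halt is a hypothetical name, the action table shape (kind, name) is an assumption, and auto_approve is modeled as a set keyed by tool name, which may differ from the real cfg.mcp.auto_approve layout.

```lua
-- Decides whether an action must stop at the HALT prompt under Norris.
local function needs_halt(action, destructive, auto_approve)
  if destructive then
    return true  -- the heuristic is the final word, even for approved tools
  end
  if action.kind == "tool_call" then
    -- Non-destructive tool calls still need consent unless opted in.
    return not auto_approve[action.name]
  end
  return false  -- CMD: line cleared by the heuristic runs without a prompt
end
```

The tool names used to exercise it (for example fs__read_file) would be placeholders; any key present in the set behaves the same way.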


7. Meta Commands (Phase 3 additions)

| Command | Action |
|---|---|
| :norris <goal> | Launch Norris mode with an explicit goal text (same as \C-n after typing a goal but works on previously-issued goals too) |
| :norris off | Exit Norris mode mid-loop (alternative to abort prompt) |
| :safety patterns | Show the active destructive-op pattern list |
| :safety check <cmd> | Probe is_destructive against a hypothetical command without running it (debug aid) |

\C-n is a typing shortcut, not a state toggle (A1): it inserts :norris at the cursor and refreshes the display. The user then types the goal and presses Enter, which routes through the normal :norris meta dispatch.


8. System Prompt Augmentation (active only in Norris)

Appended to the default Phase 2 system prompt while norris_active == true. The current goal is embedded in the suffix so eviction can't drop the anchor (R-C3):

[NORRIS MODE] You are operating autonomously toward the following goal:

    <ctx.norris_goal>

Plan and execute step by step using CMD: lines (for shell) or tool_calls
(when MCP tools are available). After each action, you will see its
result in the next turn. Re-plan based on what you observe.

When the goal is achieved, emit a single line:
    GOAL: complete
on its own line, optionally followed by a brief summary.

If the goal is unreachable or you need user input, emit:
    GOAL: blocked
with a one-line reason.

Avoid destructive operations unless the goal explicitly requires them.
The user will be prompted to confirm destructive actions; expect their
verdict in the next turn as "[aish] action skipped by user" or
"[aish] action approved".

This block is composed dynamically by context.to_messages() when ctx.norris_active is set. State stored:

  • ctx.norris_active = true|false
  • ctx.norris_goal = "<goal text>" (cleared on exit)

The user-emitted "[norris] " turn ALSO lives in the turn list as a regular user turn for the model's reading benefit. If the sliding window evicts it later, the system-prompt suffix still carries the goal — alignment with the eviction policy without special-case pinning.
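A sketch of the dynamic composition, assuming the ctx fields named above plus a ctx.system_prompt string. system_message is a hypothetical helper, not the actual context.to_messages(), and the suffix is truncated to its first lines for brevity.

```lua
local NORRIS_HEADER = "[NORRIS MODE] You are operating autonomously toward the following goal:"

-- Returns the system message text for the next broker call.
local function system_message(ctx)
  if not ctx.norris_active then
    return ctx.system_prompt  -- plain prompt once Norris exits
  end
  -- The goal is read from ctx on every call, so turn eviction cannot drop it.
  return table.concat({
    ctx.system_prompt,
    "",
    NORRIS_HEADER,
    "",
    "    " .. (ctx.norris_goal or ""),
  }, "\n")
end
```

Since nothing is cached, clearing ctx.norris_active is enough to revert to the plain prompt on the next call.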


9. Migration from Phase 2

User-visible:

  • \C-n now does something (was a Phase 1 placeholder) — inserts :norris at the cursor.
  • :norris <goal> is a new meta command.
  • Interactive mode is UNCHANGED (R-B3 resolution of Q24): the is_destructive heuristic runs ONLY when norris_active == true. Interactive CMD: extraction continues to honor confirm_cmd exactly as Phase 0 specified. No surprises for existing users.

Substrate (PHASE0.md §3) invariants: unchanged. The CMD: extraction marker is still the only shell-suggestion contract. confirm_cmd semantics are preserved as-defined in PHASE0 §10.

config.lua: configs without a safety block work unchanged — defaults kick in (LLM second-opinion enabled, default pattern list, default step budget).
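The defaulting behavior can be sketched as below. safety_config and DEFAULTS are hypothetical; the default of "deep" for llm_model follows the R-B2 resolution, and 16 for max_norris_steps follows §2. The explicit nil check matters because llm_second_opinion = false is a valid user choice.

```lua
local DEFAULTS = {
  llm_second_opinion = true,
  llm_model = "deep",       -- assumed default per R-B2
  max_norris_steps = 16,    -- default from §2
}

-- Merges an optional cfg.safety block over the defaults.
local function safety_config(cfg)
  local user = (cfg and cfg.safety) or {}
  local out = {}
  for key, default in pairs(DEFAULTS) do
    -- Compare against nil so an explicit false is preserved.
    if user[key] ~= nil then
      out[key] = user[key]
    else
      out[key] = default
    end
  end
  return out
end
```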


10. Out of Scope (Phase 3)

Per PHASE0.md §11, these belong to later phases:

  • memory.jsonl summarization across sessions (Phase 4).
  • Multi-model routing / cloud fallback (Phase 5) — but Norris's LLM second-opinion uses the deep model (R-B2) regardless of active model.
  • Tree-sitter syntax highlighting (Phase 6).

Specifically out of Phase 3 scope despite proximity:

  • Per-session destructive-pattern learning (user-corrects-LLM feedback loop). v2.
  • Parallel exploration / branching Norris sessions. v3+.
  • User-extendable pattern list via config. v2 — Phase 3 ships hardcoded.
  • Goal-decomposition for very long-running tasks (multi-day, persistent state). Out of aish's scope entirely; that's a different tool.

11. Open Questions

| # | Question | Impact | Resolve by |
|---|---|---|---|
| Q23 | LLM second-opinion latency budget | safety.lua | Resolved at baseline — 425-1162ms per probe on the fast model (baseline §1); switched to deep at review (R-B2) at the cost of ~1-3s per probe, paid back by independent model class. Session cache mitigates repeated patterns. |
| Q24 | is_destructive also runs on interactive CMD: extraction? | safety.lua + repl.lua | Resolved at review (R-B3) — NO. is_destructive runs ONLY when norris_active == true. Interactive CMD: extraction honors confirm_cmd exactly as Phase 0 specified. No substrate amendment. |
| Q25 | GOAL: complete AND pending actions in same response? | repl.lua norris driver | Resolved at review (R-C2) — dispatch all pending actions FIRST (tool_calls then CMD:), THEN check for GOAL: complete. Algorithm in §4 reflects. |
| Q26 | Context preservation when Norris ends with abort vs done vs budget_exhausted. Proposal: all three keep ctx intact (user sees the conversation in :history). The only difference is the renderer summary. | repl.lua + renderer.lua | Phase 3 (plan) |
| Q27 | Resume mode after abort: should the user be able to type :norris continue to pick up where the model left off? v1 says no — too many edge cases with stale plans. v2 maybe. | scope | Phase 3 — defer to v2 |
| Q28 | tool_calls from MCP servers that have side effects but aren't in *__shell / *__write_file patterns (e.g. a custom hertz__wol_machine tool that wakes a server). The static set in §5 won't catch this; the LLM second-opinion might. Reasonable default given the LLM's role here. | safety.lua | Phase 3 (verify) |
| Q29 | Norris response when is_destructive returns YES but the user-stated goal explicitly authorizes destruction (e.g. "clean up old logs in /var/log"). Currently the HALT still fires. Should the model be allowed to convey "user authorized this implicitly" in the goal? v1: no — explicit per-action confirm always. v2 could relax. | UX + safety.lua | Phase 3 (verify) |
| Q30 | :norris without a goal arg vs \C-n: should they share a single "ask for goal" code path? Yes; trivial. | repl.lua | Phase 3 (plan) |

Resolved at formulate (in §2 table):

  • Q2 (planner shape) — iterative re-plan after each action.
  • Q8 inheritance — auto_approve from Phase 2 applies under Norris IF destructive heuristic clears.

Carried forward (not in §13 originally):

  • Norris's interaction with Phase 4's memory.jsonl — captured tasks could pre-populate context. Phase 4 concern.

12. Implementation Plan (commit-by-commit)

Bottom-up, same cadence as Phase 0/1/2. Six commits expected:

  1. safety.is_destructive — static pattern list only. Implement the ~30-pattern matcher (including the 9 wrapper patterns from R-B1) + the tool-call shell-arg extraction. No LLM second-opinion yet. Returns (bool, reason). Test: unit-table of ~30 commands (mix of destructive + safe) → assertEqual on each.

  2. safety.is_destructive — LLM second-opinion + cache. Add the deep-model probe path (R-B2), including the re-roll inversion, with a session-scoped cache keyed by the normalized command string (mitigates Q23 latency). The probe relies on the opts.max_tokens broker extension (A2). Broker-failure falls back to YES. Test: mock broker; verify cache hits don't re-call; verify failure-fallback is YES.

  3. renderer.lua — Norris frames. Add norris_begin/step/halt/end per §3. Visual parity with exec/tool frames. Update prompt to include ⚡ when active. Test: one-liner script renders each frame visually.

  4. safety.norris_step — single-iteration planner. The norris_step function per §4. Caller provides ctx + dispatch helpers; returns {status, reason}. No driver loop yet — that's the next commit. Test: mock broker emitting various model responses (text+actions, GOAL:complete, stalled, destructive action requiring HALT) and verify each return shape.

  5. repl.lua — Norris driver + \C-n real binding + :norris meta. The while-loop driver consuming safety.norris_step, the rebound \C-n (replacing Phase 1 placeholder), the :norris <goal> / :norris off meta cmds, and \C-x\C-c abort handler. Includes the small ffi/readline.lua extension (A1, R-C4). The interactive CMD: confirm path is left unchanged: per the Q24 resolution (R-B3), is_destructive runs only when norris_active == true. Test: mocked-broker end-to-end — submit a multi-step goal, verify driver loops correctly, hits GOAL:complete, returns to interactive.

  6. config.lua: safety example block. Commented-out example showing llm_second_opinion, llm_model, destructive_patterns, max_norris_steps. Documentation only.

Risk / non-obvious

  • Catastrophic false-negative in is_destructive: the static list is patterned; a creative model could write bash -c "rm -rf /tmp" or r"m" -rf etc. Static is the floor, LLM second-opinion is the net. Both check.
  • LLM second-opinion model itself being autonomous in a Norris run would be circular. Mitigation: the second-opinion call uses broker.chat (no tools, no streaming, dedicated prompt) — distinct call path from the Norris planning stream. No tool-call recursion possible.
  • Norris loop runs the LLM N times: each step is a full broker round-trip plus, for every uncached action the static patterns do not flag, up to two deep-model probes (first probe + re-roll). With one action per step, a 16-step Norris goal could be ~48 LLM calls in the worst case. Visible as latency but no economic surprise on local models.
  • Destructive check on interactive CMD: extraction (Q24) was dropped at review (R-B3): is_destructive runs only when norris_active == true, so Phase 0/1 confirm_cmd behavior is unchanged. Documented in §9.
  • GOAL: complete extraction is a line-level scan with exact match after trim (R-C5), not a regex. Substrate-aligned with CMD: extraction.
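A minimal sketch of the line-level GOAL: complete matcher described in R-C5 (scan each line, exact match after trimming whitespace). The function name has_goal_complete is hypothetical.

```lua
-- Returns true when some line of the text, after trimming, is exactly
-- "GOAL: complete". Handles both \n and \r\n line endings.
local function has_goal_complete(text)
  for line in (text .. "\n"):gmatch("(.-)\r?\n") do
    if line:match("^%s*(.-)%s*$") == "GOAL: complete" then
      return true
    end
  end
  return false
end
```

Because the comparison is exact, a sentence that merely mentions the sentinel, or a near miss such as "GOAL: completed", does not end the loop.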

Open at plan; resolve at review

  • Whether to ship the LLM second-opinion on by default or off by default with a config opt-in. Default on is safer; off saves latency. Recommend on; Phase 7 verify will quantify the overhead.
  • Whether :safety check <cmd> should also be reachable by \C-x keybinding for fast probing during interactive sessions. v2.

End of Phase 3 Manifest — aish