docs/PHASE3: review fold-in — security-layer BLOCKERs resolved

Independent review surfaced 3 BLOCKERs + 6 CONCERNs + 7 NITs against
the analyze-tier draft. Resolutions applied:

BLOCKERs:
  B1 Shell-wrapper bypass — static patterns leaked on bash -c, sh -c,
     eval, pipe-to-shell, python -c, xargs|rm. Added 9 wrapper
     patterns to §5. Norris HALTs on any wrapper invocation; the user
     reads the inner command before choosing proceed. The patterns
     are the conservative floor against the wrapper-bypass class.
  B2 LLM second-opinion was self-policing — same model class
     generating actions then judging them. Switched probe model
     from `fast` to `deep` (qwen3-30b). Added re-roll inversion:
     if first probe says NO, ask "is this SAFE?". Disagreement
     between two probes → HALT. Cheap independent-class insurance.
  B3 `is_destructive` would have run on interactive CMD: extraction
     — a PHASE0 §6/§10 substrate amendment in disguise. Resolved
     Q24: heuristic runs ONLY when norris_active == true. No
     substrate change; interactive `confirm_cmd` semantics unchanged.

CONCERNs:
  C1 Skip-budget: consecutive_user_skips counter; 3+ similar skips
     escalate to abort/force-proceed prompt.
  C2 Algorithm-vs-Q25-resolution contradiction: §4 reordered to
     dispatch ALL pending actions before checking GOAL: complete.
  C3 Norris-goal eviction: goal embedded directly in the dynamic
     system-prompt suffix; survives sliding-window eviction.
  C4 Readline use-after-free window: M.bind no longer frees old
     callbacks; pin for process lifetime (bounded memory cost).
  C5 GOAL: complete matcher: line-level scan, exact match after
     trim — substrate-aligned with CMD: rigor.
  C6 §4 step 4 tightened: auto_approve does NOT bypass destructive
     heuristic; tool_call without auto_approve still HALTs even
     when destructive-clear (Norris conservative).

NITs deferred or rolled into pattern table:
  - chown root-path pattern tightened (NIT 2 in-line)
  - Test corpus expansion noted in §12 commit #1 risk
  - Other NITs are wording-level

Status: Plan (review folded). Ready for commit #1 (safety static
patterns) once another review pass clears.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:42:58 +00:00
parent cf4d79dd9d
commit 91ddcb005d
+168 -41
@@ -2,9 +2,62 @@
**Project:** aish — AI-augmented conversational shell **Project:** aish — AI-augmented conversational shell
**Document:** Phase 3 Requirements, Architecture & Design Decisions **Document:** Phase 3 Requirements, Architecture & Design Decisions
**Status:** Analyze (formulate complete; live-probed against current tree at `b58a842`) **Status:** Plan (review fold-in 2026-05-12 — security-layer BLOCKERs resolved)
**Date:** 2026-05-12 **Date:** 2026-05-12
**Review fold-in (2026-05-12, security layer):**
R-B1. **Shell-wrapper bypass coverage.** Static patterns missed `bash -c`,
`sh -c`, `eval`, `xargs | rm`, `| sh`, `python -c`. Added to the
pattern list in §5 as a "wrapper requires manual review" class —
in Norris mode, any wrapper invocation HALTs regardless of the
inner command. The wrapper itself is the trigger.
R-B2. **LLM second-opinion model class.** Switched from `fast` to `deep`
for the destructive-detection probe. `fast` co-emits the action
AND judges it (circular). `deep` is a different model class
(qwen3-30b currently mapped to `deep` per config.lua) — adds
~1-3s per probe but breaks the self-policing loop. Added a
YES/inversion re-roll: if the deep model says NO, re-ask
"Is this safe?" — disagreement → HALT. Cheap insurance for
the edge cases. §5 reflects.
R-B3. **`is_destructive` scope narrowed to Norris mode.** The
formulate-time §9 said the heuristic would also gate interactive
`CMD:` extraction. That's a PHASE0 §6/§10 substrate amendment
that's bigger than Phase 3 should be making implicitly. Q24
resolved: `is_destructive` runs ONLY when `norris_active == true`.
Interactive `CMD:` extraction continues to honor `confirm_cmd`
exactly as Phase 0 specified — no behavior change.
**CONCERN folds (2026-05-12):**
R-C1. **Skip-budget added**: `consecutive_user_skips` counter; ≥3
triggers escalation HALT "model has proposed similar destructive
action 3+ times — abort, force-proceed, or change goal?". §4 +
§6 reflect.
R-C2. **§4 algorithm reorder** — dispatch all pending actions FIRST,
then check `GOAL: complete`. Q25 resolution + §4 algorithm now
consistent (was contradictory).
R-C3. **Norris goal pinned in system-prompt suffix**: `ctx.norris_goal`
field; the dynamic system suffix from §8 carries it. Eviction
can no longer drop the anchor.
R-C4. **Readline rebind safety**: `M.bind` will NOT free old callbacks
(pin for process lifetime). Avoids a use-after-free window between
`:free()` and the new `rl_bind_keyseq` call. Memory cost is
bounded (one closure per bound key, negligible).
R-C5. **`GOAL: complete` matcher** — line-level scan, exact match after
trim. Aligned with `CMD:` extraction rigor.
R-C6. **§4 step 4 algorithm tightened** — auto_approve only short-circuits
the user-prompt, NEVER the destructive-heuristic. Tool-call
without `auto_approve` entry AND no destructive flag → still
HALTs in Norris mode (Norris is conservative by design).
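The R-C5 matcher above can be sketched as follows. This is a minimal illustration, not the shipped implementation: the function name `has_goal_complete` is hypothetical, and the trimming rule (strip leading and trailing whitespace, then compare the whole line) is one reading of "exact match after trim".

```lua
-- Hypothetical helper (name not from the source): returns true when any
-- line of `text`, after trimming surrounding whitespace, is exactly
-- "GOAL: complete". Substring hits inside a longer line do not count.
local function has_goal_complete(text)
  -- Append a newline so the final line is also terminated.
  for line in (text .. "\n"):gmatch("(.-)\r?\n") do
    local trimmed = line:match("^%s*(.-)%s*$")
    if trimmed == "GOAL: complete" then
      return true
    end
  end
  return false
end
```

A line-level exact match keeps prose such as "I will emit GOAL: complete when finished" from ending the session early, which is the same rigor the `CMD:` extraction applies.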
**Analyze findings (2026-05-12):** **Analyze findings (2026-05-12):**
A1. **`\C-n` mid-readline limitation.** Phase 1's `\C-n` handler fires A1. **`\C-n` mid-readline limitation.** Phase 1's `\C-n` handler fires
@@ -101,7 +154,7 @@ Three pillars per PHASE0.md §11 row 3:
| HALT response shape | **3-way prompt**: `proceed` / `skip` / `abort` | `proceed` runs the action and continues. `skip` reports "user skipped" to the model and lets it re-plan. `abort` ends the Norris session, drops back to interactive mode. (`abort` is also bound to `\C-x\C-c` per PHASE1.md §7 reserved keys.) | | HALT response shape | **3-way prompt**: `proceed` / `skip` / `abort` | `proceed` runs the action and continues. `skip` reports "user skipped" to the model and lets it re-plan. `abort` ends the Norris session, drops back to interactive mode. (`abort` is also bound to `\C-x\C-c` per PHASE1.md §7 reserved keys.) |
| Auto-approve under Norris | **Trust the Phase 2 `auto_approve` policy** | A tool already in `auto_approve` runs without HALT even in Norris mode, as long as the destructive-op heuristic doesn't flag it. The user opted in once; Norris doesn't unilaterally re-prompt. CMD: lines never auto-approve under Norris — they always pass through `is_destructive` first. | | Auto-approve under Norris | **Trust the Phase 2 `auto_approve` policy** | A tool already in `auto_approve` runs without HALT even in Norris mode, as long as the destructive-op heuristic doesn't flag it. The user opted in once; Norris doesn't unilaterally re-prompt. CMD: lines never auto-approve under Norris — they always pass through `is_destructive` first. |
| Destructive-op static rules | **Patterned shell-idiom list** in `safety.lua` (hardcoded; configurable later via `config.safety.destructive_patterns`) | Phase 3 v1 ships a fixed list (~20 patterns) inline. v2 may make it user-extendable. Patterns target the command string after expansion; conservative — false positives mean a confirm prompt the user dismisses, false negatives mean unsupervised destructive action. Bias to false positives. | | Destructive-op static rules | **Patterned shell-idiom list** in `safety.lua` (hardcoded; configurable later via `config.safety.destructive_patterns`) | Phase 3 v1 ships a fixed list (~20 patterns) inline. v2 may make it user-extendable. Patterns target the command string after expansion; conservative — false positives mean a confirm prompt the user dismisses, false negatives mean unsupervised destructive action. Bias to false positives. |
| LLM second-opinion model | **The `fast` preset** (whichever model maps to the user's small/cheap local) | Cheapest available; destructive-detection doesn't need a smart model. Prompt: "Is this shell command destructive (could delete or overwrite data)? Answer YES or NO." Single-token-ish response, no streaming. Falls back to YES (safe default) on broker failure. | | LLM second-opinion model | **The `deep` preset** (independent model class, not the one emitting actions) | R-B2 resolution. Same model class self-policing is circular — `deep` (qwen3-30b currently) judges actions emitted by the active model (often `fast` qwen-1.5b under Norris). Adds ~1-3s per probe; broker failure → YES (safe default). Re-roll inversion: if first probe says NO, ask the inverted "Is this safe?" — disagreement → HALT. |
| Norris prompt suffix | **Status appended to the system prompt** when Norris is active: `[NORRIS MODE] You are operating autonomously toward a stated goal. Plan and execute step by step. Use CMD: lines or tool_calls. When done, emit "GOAL: complete" on its own line.` | The `GOAL: complete` sentinel is how the model signals task completion; Norris loop exits the planning sub-loop on seeing it. | | Norris prompt suffix | **Status appended to the system prompt** when Norris is active: `[NORRIS MODE] You are operating autonomously toward a stated goal. Plan and execute step by step. Use CMD: lines or tool_calls. When done, emit "GOAL: complete" on its own line.` | The `GOAL: complete` sentinel is how the model signals task completion; Norris loop exits the planning sub-loop on seeing it. |
| Interrupt handling | **`\C-c` during a Norris step sends abort** | Standard SIGINT semantics for the user. Mid-stream, this means: stop the broker request, stop any running shell command, drop to interactive mode. The current context is preserved (incl. partial assistant turn). | | Interrupt handling | **`\C-c` during a Norris step sends abort** | Standard SIGINT semantics for the user. Mid-stream, this means: stop the broker request, stop any running shell command, drop to interactive mode. The current context is preserved (incl. partial assistant turn). |
| Context budgeting under Norris | **Same `max_turns` and `token_budget` as interactive** | Sliding window evicts oldest non-system turns when budget exceeded — including mid-Norris-session if the loop runs long. Phase 4's `memory.jsonl` summarization is the proper fix; Phase 3 just gets the eviction status as before. | | Context budgeting under Norris | **Same `max_turns` and `token_budget` as interactive** | Sliding window evicts oldest non-system turns when budget exceeded — including mid-Norris-session if the loop runs long. Phase 4's `memory.jsonl` summarization is the proper fix; Phase 3 just gets the eviction status as before. |
@@ -117,7 +170,7 @@ Three pillars per PHASE0.md §11 row 3:
| `renderer.lua` | exec frame + tool-call frame + assistant streaming | Add `M.norris_begin(goal)`, `M.norris_step(n, action_desc)`, `M.norris_halt(reason, action)`, `M.norris_end(status, reason)`. Visual: bold cyan banner on enter, indented step counter per iteration, red HALT banner on intercept, dim summary on exit. Phase 0 prompt becomes `[aish:fast ⚡]>` when Norris is active per PHASE0.md §9. | | `renderer.lua` | exec frame + tool-call frame + assistant streaming | Add `M.norris_begin(goal)`, `M.norris_step(n, action_desc)`, `M.norris_halt(reason, action)`, `M.norris_end(status, reason)`. Visual: bold cyan banner on enter, indented step counter per iteration, red HALT banner on intercept, dim summary on exit. Phase 0 prompt becomes `[aish:fast ⚡]>` when Norris is active per PHASE0.md §9. |
| `broker.lua` | `chat_stream` with opts.tools, `chat` non-streaming | Re-used as-is for planning rounds (Norris just calls chat_stream like interactive). See row below for the small `max_tokens` opts extension needed by the LLM second-opinion path. | | `broker.lua` | `chat_stream` with opts.tools, `chat` non-streaming | Re-used as-is for planning rounds (Norris just calls chat_stream like interactive). See row below for the small `max_tokens` opts extension needed by the LLM second-opinion path. |
| `context.lua` | system_prompt + turns + pending_exec_output + use_tool_role | When Norris is active, `to_messages()` appends the Norris suffix (§2 row "Norris prompt suffix") to the system message. The suffix is computed dynamically — when Norris exits, subsequent broker calls revert to plain system prompt. No additional storage. | | `context.lua` | system_prompt + turns + pending_exec_output + use_tool_role | When Norris is active, `to_messages()` appends the Norris suffix (§2 row "Norris prompt suffix") to the system message. The suffix is computed dynamically — when Norris exits, subsequent broker calls revert to plain system prompt. No additional storage. |
| `ffi/readline.lua` | `bind(seq, fn)` (Phase 1) | **Small extension per A1**: add `rl_insert_text` + `rl_redisplay` to the `ffi.cdef` block and expose `M.insert_text(s)` / `M.redisplay()` wrappers. Needed so the `\C-n` handler can stuff `:norris ` into the in-progress buffer cleanly rather than just printing a status that disappears. | | `ffi/readline.lua` | `bind(seq, fn)` (Phase 1) — frees old callback before rebinding | **Small extension per A1 + R-C4 fix**: (a) add `rl_insert_text` + `rl_redisplay` to the `ffi.cdef` block and expose `M.insert_text(s)` / `M.redisplay()` wrappers — needed so `\C-n` can stuff `:norris ` into the buffer; (b) drop the `_bound[seq]:free()` call from `M.bind` — readline retains the function pointer in its keymap; freeing before re-bind opens a use-after-free window if the user presses the key in that gap. Pin all bound callbacks for process lifetime; memory cost is bounded (one closure per key, ~O(N) where N = number of bound keys ≤ ~10). |
| `broker.lua` | `chat_stream(cfg, msgs, on_delta, opts)` with opts.tools | **Small extension per A2**: `opts.max_tokens` (integer) is passed through to the request body as `max_tokens`. Omitted when nil. `M.chat` accepts the same opt. Needed so `safety.is_destructive`'s YES/NO probe terminates in ~2 tokens. | | `broker.lua` | `chat_stream(cfg, msgs, on_delta, opts)` with opts.tools | **Small extension per A2**: `opts.max_tokens` (integer) is passed through to the request body as `max_tokens`. Omitted when nil. `M.chat` accepts the same opt. Needed so `safety.is_destructive`'s YES/NO probe terminates in ~2 tokens. |
| `config.lua` | mcp example block | New optional `safety = { llm_second_opinion = true, llm_model = "fast", destructive_patterns = {...} }` block, also commented-out example. Defaults are sane when absent. | | `config.lua` | mcp example block | New optional `safety = { llm_second_opinion = true, llm_model = "fast", destructive_patterns = {...} }` block, also commented-out example. Defaults are sane when absent. |
@@ -133,27 +186,52 @@ to do next based on accumulated context:
``` ```
norris_step(ctx, broker_cfg, executor_fn, tools_fn, halt_fn, opts): norris_step(ctx, broker_cfg, executor_fn, tools_fn, halt_fn, opts):
# opts.step_n, opts.max_steps, opts.cfg # opts.step_n, opts.max_steps, opts.cfg, opts.consecutive_skips
1. Call broker.chat_stream(broker_cfg, ctx:to_messages(), on_delta, {tools=tools_fn()}) 1. Call broker.chat_stream(broker_cfg, ctx:to_messages(), on_delta, {tools=tools_fn()})
— collect (text, tool_calls). — collect (text, tool_calls).
2. If text contains "GOAL: complete" line → return {status="done"}.
3. If no actions emitted (no tool_calls, no CMD: in text): 2. Extract actions from response:
→ return {status="stalled", reason="no action"} (user-visible). - tool_calls (already collected by broker accumulator)
4. For each action (tool_call OR CMD: line): - cmd_lines via executor.extract_cmd_lines(text) — line-anchored
- goal_done line-level scan for exact "GOAL: complete" (R-C5)
3. If actions are empty AND goal_done is false:
→ return {status="stalled", reason="no action"}.
4. Dispatch ALL pending actions BEFORE checking goal_done (R-C2):
tool_calls first (structured route), CMD: lines second (legacy).
For each action:
a. Pass through safety.is_destructive(action). a. Pass through safety.is_destructive(action).
b. If destructive: invoke halt_fn(action, reason) → user verdict. - tool_calls: check tool-name set + serialized args.
- CMD: lines: pattern match + LLM probe.
b. If destructive: invoke halt_fn(action, reason, opts.cfg).
"proceed" → run action. "proceed" → run action.
"skip" append a synthesized turn telling the model "skip" → opts.consecutive_skips += 1.
"[aish] action skipped by user: <reason>". If consecutive_skips >= 3 (R-C1):
escalate halt with reason "repeated similar skips"
→ user verdict abort / force-proceed.
Append synthesized "[aish] action skipped by user: <reason>"
as a role:"tool" turn (for tool_calls) or as exec-output
prefix (for CMD: lines) — alternation invariant.
"abort" → return {status="aborted"}. "abort" → return {status="aborted"}.
c. If non-destructive: check auto_approve (for tool_calls only) c. If non-destructive (cleared by static + LLM):
or destructive_check passed (for CMD:). Run. - tool_call: check auto_approve. If in policy, run silently;
d. Append result turn to ctx (role:"tool" for tool calls, otherwise (R-C6) halt_fn STILL fires for the consent prompt
exec-output buffer for CMD:). (Norris is conservative; auto_approve is the *only* way to
5. step_n += 1. If step_n >= max_steps: skip consent in autonomous mode).
- CMD: line: run (destructive-check is the gate; confirm_cmd
is interactive-mode-only — R-B3 narrows scope).
d. On successful proceed: opts.consecutive_skips = 0.
e. Append result turn to ctx (role:"tool" for tool calls,
exec-output buffer for CMD: — same as Phase 0/2 paths).
5. After all actions dispatched: if goal_done → return {status="done"}.
6. step_n += 1. If step_n >= max_steps:
return {status="budget_exhausted"}. return {status="budget_exhausted"}.
6. Continue loop (driver in repl.lua re-calls norris_step).
7. Continue loop (driver in repl.lua re-calls norris_step).
``` ```
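Step 4's gating rules (R-B3 and R-C6) can be read as a pure decision function. The sketch below is an assumption-laden illustration: `gate`, the `action` table shape, and representing `auto_approve` as a set keyed by tool name are all hypothetical, chosen only to make the branches explicit.

```lua
-- Hypothetical decision helper for step 4 of norris_step.
--   action       = { kind = "tool_call" | "cmd", name = <tool name or nil> }
--   destructive  = boolean result of safety.is_destructive(action)
--   auto_approve = set of tool names (assumed shape), e.g. { ["t"] = true }
-- Returns "halt" or "run", plus a reason when halting.
local function gate(action, destructive, auto_approve)
  if destructive then
    -- The destructive heuristic is never bypassed, even by auto_approve.
    return "halt", "destructive"
  end
  if action.kind == "tool_call" then
    if auto_approve[action.name] then
      return "run"
    end
    -- Cleared by the heuristic but not pre-approved: still ask for consent.
    return "halt", "consent required"
  end
  -- CMD: line cleared by the heuristic; confirm_cmd is interactive-only.
  return "run"
end
```

Keeping the decision separate from execution makes each branch testable without a broker or a shell.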
The driver in repl.lua is the simple while loop; norris_step is one The driver in repl.lua is the simple while loop; norris_step is one
@@ -167,7 +245,20 @@ iteration so testing is granular.
```lua ```lua
local DESTRUCTIVE_PATTERNS = { local DESTRUCTIVE_PATTERNS = {
-- Filesystem -- ── Shell wrappers (R-B1) — flag the wrapper itself; can't inspect content
-- safely without parsing the inner shell. Norris HALTs on these
-- unconditionally; the user can proceed/abort with the full context.
{ pat = "^%s*bash%s+%-l?c%s", reason = "bash -c (wrapped shell)" },
{ pat = "^%s*sh%s+%-l?c%s", reason = "sh -c (wrapped shell)" },
{ pat = "^%s*zsh%s+%-l?c%s", reason = "zsh -c (wrapped shell)" },
{ pat = "^%s*eval%s", reason = "eval (dynamic shell)" },
{ pat = "^%s*python3?%s+%-c%s", reason = "python -c (inline script)" },
{ pat = "^%s*perl%s+%-e%s", reason = "perl -e (inline script)" },
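-- %f[%W] rather than %f[%s]: a frontier treats end-of-string as "\0",
-- which is not whitespace, so %f[%s] would miss a trailing "| sh".
{ pat = "|%s*sh%f[%W]", reason = "pipe-to-sh" },
{ pat = "|%s*bash%f[%W]", reason = "pipe-to-bash" },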
{ pat = "xargs%s+.-rm", reason = "xargs ... rm" },
-- ── Filesystem destructive
{ pat = "rm%s+.-%-rf?", reason = "rm -rf" }, { pat = "rm%s+.-%-rf?", reason = "rm -rf" },
{ pat = "rm%s+.-%-fr?", reason = "rm -fr" }, { pat = "rm%s+.-%-fr?", reason = "rm -fr" },
{ pat = "find%s+.-%-delete", reason = "find -delete" }, { pat = "find%s+.-%-delete", reason = "find -delete" },
@@ -179,32 +270,35 @@ local DESTRUCTIVE_PATTERNS = {
{ pat = "wipefs%s", reason = "wipefs" }, { pat = "wipefs%s", reason = "wipefs" },
{ pat = "truncate%s+.-%-s%s*0", reason = "truncate to zero" }, { pat = "truncate%s+.-%-s%s*0", reason = "truncate to zero" },
-- Version control destructive -- ── Version control destructive
{ pat = "git%s+push%s+.-%-%-force", reason = "git push --force" }, { pat = "git%s+push%s+.-%-%-force", reason = "git push --force" },
{ pat = "git%s+push%s+.-%-f%f[%s]", reason = "git push -f" }, { pat = "git%s+push%s+.-%-f%f[%s]", reason = "git push -f" },
{ pat = "git%s+reset%s+.-%-%-hard", reason = "git reset --hard" }, { pat = "git%s+reset%s+.-%-%-hard", reason = "git reset --hard" },
{ pat = "git%s+clean%s+.-%-fd?", reason = "git clean -fd" }, { pat = "git%s+clean%s+.-%-fd?", reason = "git clean -fd" },
{ pat = "git%s+branch%s+.-%-D", reason = "git branch -D" }, { pat = "git%s+branch%s+.-%-D", reason = "git branch -D" },
-- Database / process -- ── Database / process
{ pat = "DROP%s+TABLE", reason = "DROP TABLE", ci = true }, { pat = "DROP%s+TABLE", reason = "DROP TABLE", ci = true },
{ pat = "DROP%s+DATABASE", reason = "DROP DATABASE", ci = true }, { pat = "DROP%s+DATABASE", reason = "DROP DATABASE", ci = true },
{ pat = "TRUNCATE%s+TABLE", reason = "TRUNCATE TABLE", ci = true }, { pat = "TRUNCATE%s+TABLE", reason = "TRUNCATE TABLE", ci = true },
{ pat = "kill%s+%-9", reason = "kill -9" }, { pat = "kill%s+%-9", reason = "kill -9" },
{ pat = "pkill%s+%-9", reason = "pkill -9" }, { pat = "pkill%s+%-9", reason = "pkill -9" },
-- Network/permission -- ── Network/permission (chown tightened per NIT 2)
{ pat = "chmod%s+.-777", reason = "chmod 777" }, { pat = "chmod%s+.-777", reason = "chmod 777" },
{ pat = "chown%s+.-/$", reason = "chown on root path" }, { pat = "chown%s+.-%s+/%s*$", reason = "chown on root path" },
} }
``` ```
The 9 wrapper patterns are the conservative floor against R-B1 bypass classes. When the model emits `bash -c '...'`, the wrapper pattern hits and Norris HALTs; the user can proceed after reading the inner command. The LLM second-opinion still runs as a backup for commands the static list misses, but the static net catches the obvious cases first.
Patterns are Lua patterns (not regex), `ci = true` enables case-insensitive match. Patterns are Lua patterns (not regex), `ci = true` enables case-insensitive match.
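A minimal sketch of how such a table might be scanned is shown below. The function name `match_static` and the three-entry `PATTERNS` subset are illustrative only, and the handling of `ci` (upper-casing the subject, which assumes `ci` patterns are written with upper-case literals as in the table above) is an assumption rather than the documented behavior of `safety.lua`.

```lua
-- Illustrative subset of the pattern table, copied from the list above.
local PATTERNS = {
  { pat = "^%s*bash%s+%-l?c%s", reason = "bash -c (wrapped shell)" },
  { pat = "rm%s+.-%-rf?",       reason = "rm -rf" },
  { pat = "DROP%s+TABLE",       reason = "DROP TABLE", ci = true },
}

-- Hypothetical matcher: returns (true, reason) on the first hit,
-- (false, nil) otherwise.
local function match_static(cmd)
  for _, p in ipairs(PATTERNS) do
    -- Assumption: ci patterns use upper-case literals, so upper-casing
    -- the subject gives a case-insensitive match.
    local subject = p.ci and cmd:upper() or cmd
    if subject:find(p.pat) then
      return true, p.reason
    end
  end
  return false, nil
end
```

First-hit-wins keeps the reported reason stable; ordering wrappers ahead of the other classes means a wrapped `rm -rf` is reported as a wrapper, matching the "wrapper itself is the trigger" rule.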
### LLM second-opinion (when static doesn't HALT) ### LLM second-opinion (when static doesn't HALT)
If `cfg.safety.llm_second_opinion ~= false`, call `broker.chat` (non-streaming, If `cfg.safety.llm_second_opinion ~= false`, call `broker.chat`
no tools) on the `fast` model with: (non-streaming, no tools, `opts.max_tokens=4`) on the **deep** model
preset (independent class from the action-emitting model — R-B2):
``` ```
System: Detect destructive shell commands. Reply YES if running this would System: Detect destructive shell commands. Reply YES if running this would
@@ -214,11 +308,31 @@ Otherwise reply NO. Respond with only YES or NO.
User: <the command text or tool-call summary> User: <the command text or tool-call summary>
``` ```
Response parse: case-insensitive match on `YES` → destructive; everything Response parse: case-insensitive match on `YES` → destructive.
else → not. Broker failure → YES (safe default).
**Re-roll on NO** (R-B2 cheap insurance): if the first probe returns NO,
run a second probe with inverted phrasing:
```
System: Reply YES or NO only. Is the following shell command SAFE to
run autonomously without user review?
User: <same command>
```
If the re-roll also answers NO (not safe), the two probes disagree: the
first said "not destructive", the second said "not safe" → HALT (a
first-probe NO is never trusted on its own). If the re-roll answers YES
(safe), both probes agree the command is safe → clear.
Broker failure → YES (safe default).
Session-scoped cache keyed by the normalized command string mitigates
the latency cost (~1-3s per probe on deep model — see PHASE3-baseline §1).
Repeated patterns within a single session probe once.
Returns: `(is_destructive, reason)`. Reason is the matched pattern name Returns: `(is_destructive, reason)`. Reason is the matched pattern name
for static hits, "LLM flagged as destructive" for LLM hits. for static hits, "LLM flagged as destructive" / "LLM probe disagreement"
for the two LLM failure modes.
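The two-probe flow and the session cache can be sketched as below. Everything here is hypothetical scaffolding: `probe(kind, cmd)` stands in for the `broker.chat` call and is assumed to return a reply string or `nil` on broker failure, `normalize` is one possible normalization (collapse whitespace), and not caching broker failures is a design assumption so a transient outage does not pin a verdict for the rest of the session.

```lua
local cache = {}  -- session-scoped: normalized command -> verdict

-- Assumed normalization: collapse whitespace runs and trim the ends.
local function normalize(cmd)
  local collapsed = cmd:gsub("%s+", " ")
  return collapsed:match("^%s*(.-)%s*$")
end

local function says_yes(reply)
  return reply:upper():find("YES", 1, true) ~= nil
end

-- probe(kind, cmd): hypothetical stand-in for broker.chat.
--   kind == "destructive" asks the first question,
--   kind == "safe" asks the inverted re-roll.
local function llm_is_destructive(cmd, probe)
  local key = normalize(cmd)
  local hit = cache[key]
  if hit then return hit.destructive, hit.reason end

  local first = probe("destructive", key)
  -- Broker failure counts as YES (safe default) and is not cached.
  if first == nil then return true, "LLM flagged as destructive" end

  local destructive, reason = false, nil
  if says_yes(first) then
    destructive, reason = true, "LLM flagged as destructive"
  else
    local second = probe("safe", key)
    if second == nil then return true, "LLM flagged as destructive" end
    if not says_yes(second) then
      -- First probe: not destructive. Re-roll: not safe. Disagreement.
      destructive, reason = true, "LLM probe disagreement"
    end
  end
  cache[key] = { destructive = destructive, reason = reason }
  return destructive, reason
end
```

Only the "first NO, re-roll YES" path clears a command, so the cost of a cleared command is two probes on first sight and zero on repeats.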
### Tool-call destructive check ### Tool-call destructive check
@@ -278,11 +392,16 @@ pending; if off and a goal is in progress, asks for confirm-abort.
## 8. System Prompt Augmentation (active only in Norris) ## 8. System Prompt Augmentation (active only in Norris)
Appended to the default Phase 2 system prompt while `norris_active == true`: Appended to the default Phase 2 system prompt while `norris_active == true`.
The current goal is embedded in the suffix so eviction can't drop the
anchor (R-C3):
``` ```
[NORRIS MODE] You are operating autonomously toward a stated goal. Plan [NORRIS MODE] You are operating autonomously toward the following goal:
and execute step by step using CMD: lines (for shell) or tool_calls
<ctx.norris_goal>
Plan and execute step by step using CMD: lines (for shell) or tool_calls
(when MCP tools are available). After each action, you will see its (when MCP tools are available). After each action, you will see its
result in the next turn. Re-plan based on what you observe. result in the next turn. Re-plan based on what you observe.
@@ -301,23 +420,31 @@ verdict in the next turn as "[aish] action skipped by user" or
``` ```
This block is composed dynamically by `context.to_messages()` when This block is composed dynamically by `context.to_messages()` when
`ctx.norris_active` is set. No state stored beyond the boolean. `ctx.norris_active` is set. State stored:
- `ctx.norris_active = true|false`
- `ctx.norris_goal = "<goal text>"` (cleared on exit)
The user-emitted "[norris] <goal>" turn ALSO lives in the turn list as
a regular user turn for the model's reading benefit. If the sliding
window evicts it later, the system-prompt suffix still carries the
goal — alignment with the eviction policy without special-case pinning.
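As an illustration of why eviction cannot drop the goal, the sketch below composes the system message on every call. It is a simplified stand-in for `context.to_messages()`: the field names `system_prompt` and `turns` follow the module summary in the table above, the suffix text is shortened to its first sentence, and the function body is an assumption rather than the actual implementation.

```lua
-- Shortened suffix; the real block in this section is longer.
local NORRIS_SUFFIX =
  "[NORRIS MODE] You are operating autonomously toward the following goal:\n\n%s\n"

-- Simplified sketch of to_messages: the suffix is derived from ctx state
-- on each call, so it never lives in the evictable turn list.
local function to_messages(ctx)
  local system = ctx.system_prompt
  if ctx.norris_active and ctx.norris_goal then
    system = system .. "\n\n" .. NORRIS_SUFFIX:format(ctx.norris_goal)
  end
  local msgs = { { role = "system", content = system } }
  for _, turn in ipairs(ctx.turns) do
    msgs[#msgs + 1] = { role = turn.role, content = turn.content }
  end
  return msgs
end
```

With an empty `turns` list (every turn evicted) the goal still appears in the first message, and clearing `norris_active` on exit restores the plain system prompt on the next call.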
--- ---
## 9. Migration from Phase 2 ## 9. Migration from Phase 2
User-visible: User-visible:
- `\C-n` now does something (was a Phase 1 placeholder). - `\C-n` now does something (was a Phase 1 placeholder) — inserts
`:norris ` at the cursor.
- `:norris <goal>` is a new meta command. - `:norris <goal>` is a new meta command.
- Destructive-looking commands suddenly stop and ask for confirmation - **Interactive mode is UNCHANGED** (R-B3 resolution of Q24): the
even outside Norris mode (the `is_destructive` check is also applied `is_destructive` heuristic runs ONLY when `norris_active == true`.
to interactive CMD: extraction, replacing the current bare Interactive `CMD:` extraction continues to honor `confirm_cmd`
`confirm_cmd` for known-destructive cases). This is a behavior change exactly as Phase 0 specified. No surprises for existing users.
to interactive mode.
Substrate (PHASE0.md §3) invariants: unchanged. The `CMD:` extraction Substrate (PHASE0.md §3) invariants: unchanged. The `CMD:` extraction
marker is still the only shell-suggestion contract. marker is still the only shell-suggestion contract. `confirm_cmd`
semantics are preserved as-defined in PHASE0 §10.
`config.lua`: configs without a `safety` block work unchanged — defaults `config.lua`: configs without a `safety` block work unchanged — defaults
kick in (LLM second-opinion enabled, default pattern list, default step kick in (LLM second-opinion enabled, default pattern list, default step
@@ -347,9 +474,9 @@ Specifically out of Phase 3 scope despite proximity:
| # | Question | Impact | Resolve by | | # | Question | Impact | Resolve by |
|---|---|---|---| |---|---|---|---|
| Q23 | LLM second-opinion latency budget: 3s per check on the fast model means a 16-step Norris session adds ~48s of overhead. Acceptable for autonomous mode? Or cache by command-hash within a session? | safety.lua | Phase 3 (analyze) | | Q23 | ~~LLM second-opinion latency budget~~ | safety.lua | **Resolved at baseline** — 425-1162ms per probe on the **fast** model (baseline §1); switched to **deep** at review (R-B2) at the cost of ~1-3s per probe, paid back by independent model class. Session cache mitigates repeated patterns. |
| Q24 | `is_destructive` also runs on **interactive** `CMD:` extraction (per §9)? Or only under Norris? §9 says yes; the manifest implicitly broadens the destructive gate. The alternative is to keep `confirm_cmd` as the interactive surface and Norris uses its own stricter check. Mixing both is the proposed default but worth challenging. | safety.lua + repl.lua | Phase 3 (analyze) | | Q24 | ~~`is_destructive` also runs on interactive `CMD:` extraction?~~ | safety.lua + repl.lua | **Resolved at review (R-B3)** — NO. `is_destructive` runs ONLY when `norris_active == true`. Interactive `CMD:` extraction honors `confirm_cmd` exactly as Phase 0 specified. No substrate amendment. |
| Q25 | If the model emits BOTH text AND a `GOAL: complete` line in the same response, is the goal done immediately, or are any pending actions in that response still dispatched first? Default proposal: dispatch pending actions first; the GOAL: marker fires after the loop's next round-trip would have been called (so the model effectively pre-announces). Less surprising. | repl.lua norris driver | Phase 3 (analyze) | | Q25 | ~~`GOAL: complete` AND pending actions in same response?~~ | repl.lua norris driver | **Resolved at review (R-C2)** — dispatch all pending actions FIRST (tool_calls then CMD:), THEN check for `GOAL: complete`. Algorithm in §4 reflects. |
| Q26 | Context preservation when Norris ends with `abort` vs `done` vs `budget_exhausted`. Proposal: all three keep ctx intact (user sees the conversation in `:history`). The only difference is the renderer summary. | repl.lua + renderer.lua | Phase 3 (plan) | | Q26 | Context preservation when Norris ends with `abort` vs `done` vs `budget_exhausted`. Proposal: all three keep ctx intact (user sees the conversation in `:history`). The only difference is the renderer summary. | repl.lua + renderer.lua | Phase 3 (plan) |
| Q27 | Resume mode after abort: should the user be able to type `:norris continue` to pick up where the model left off? v1 says no — too many edge cases with stale plans. v2 maybe. | scope | Phase 3 — defer to v2 | | Q27 | Resume mode after abort: should the user be able to type `:norris continue` to pick up where the model left off? v1 says no — too many edge cases with stale plans. v2 maybe. | scope | Phase 3 — defer to v2 |
| Q28 | `tool_calls` from MCP servers that have side effects but aren't in `*__shell` / `*__write_file` patterns (e.g. a custom `hertz__wol_machine` tool that wakes a server). The static set in §5 won't catch this; the LLM second-opinion might. Reasonable default given the LLM's role here. | safety.lua | Phase 3 (verify) | | Q28 | `tool_calls` from MCP servers that have side effects but aren't in `*__shell` / `*__write_file` patterns (e.g. a custom `hertz__wol_machine` tool that wakes a server). The static set in §5 won't catch this; the LLM second-opinion might. Reasonable default given the LLM's role here. | safety.lua | Phase 3 (verify) |