125f800513
Re-review surfaced one new BLOCKER + two CONCERNs + four NITs. Folded: N1 BLOCKER: `|%s*sh%f[%s]` missed `curl x | sh` (end-of-string canonical wrapper-bypass — Lua's `%f[%s]` requires transition INTO whitespace, which doesn't happen at EOL). Replaced with two patterns each for sh and bash: `|%s*sh%s` (followed by whitespace/args) and `|%s*sh%s*$` (end-of-string). Same for bash. Verified against 18 wrapper-bypass test cases — all canonical idioms now HALT. N2 CONCERN: `ci=true` rule flag had no implementation note. Added one sentence to §5 explaining the matcher lowercases the input string when ci is set. N3 CONCERN: §12 commit #5 description was stale — still said "extends interactive CMD: extraction to consult is_destructive" which contradicts the R-B3 resolution (Norris-only). Rewrote commit #5 description to match R-B3, and bundled the ffi/readline.lua `_bound[seq]:free()` removal into commit #5's scope with explicit "Phase 1 amendment" callout. Same for the §12 risk note that still referenced the dropped behavior change. Other NITs (N4 skip threshold, N5 approved-turn mention, N6 :model swap interaction, N7 commit-attribution wording) are cosmetic and will fold in-flight during implement if material. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# aish — Phase 3 Manifest
|
|
|
|
**Project:** aish — AI-augmented conversational shell
|
|
**Document:** Phase 3 Requirements, Architecture & Design Decisions
|
|
**Status:** Plan (review fold-in 2026-05-12 — security-layer BLOCKERs resolved)
|
|
**Date:** 2026-05-12
|
|
|
|
**Review fold-in (2026-05-12, security layer):**
|
|
|
|
R-B1. **Shell-wrapper bypass coverage.** Static patterns missed `bash -c`,
|
|
`sh -c`, `eval`, `xargs ... rm`, `| sh`, `python -c`. Added to the
|
|
pattern list in §5 as a "wrapper requires manual review" class —
|
|
in Norris mode, any wrapper invocation HALTs regardless of the
|
|
inner command. The wrapper itself is the trigger.
|
|
|
|
R-B2. **LLM second-opinion model class.** Switched from `fast` to `deep`
|
|
for the destructive-detection probe. `fast` co-emits the action
|
|
AND judges it (circular). `deep` is a different model class
|
|
(qwen3-30b currently mapped to `deep` per config.lua) — adds
|
|
~1-3s per probe but breaks the self-policing loop. Added a
|
|
YES/inversion re-roll: if the deep model says NO, re-ask
|
|
"Is this safe?" — disagreement → HALT. Cheap insurance for
|
|
the edge cases. §5 reflects.
|
|
|
|
R-B3. **`is_destructive` scope narrowed to Norris mode.** The
|
|
formulate-time §9 said the heuristic would also gate interactive
|
|
`CMD:` extraction. That's a PHASE0 §6/§10 substrate amendment
|
|
that's bigger than Phase 3 should be making implicitly. Q24
|
|
resolved: `is_destructive` runs ONLY when `norris_active == true`.
|
|
Interactive `CMD:` extraction continues to honor `confirm_cmd`
|
|
exactly as Phase 0 specified — no behavior change.
|
|
|
|
**CONCERN folds (2026-05-12):**
|
|
|
|
R-C1. **Skip-budget added**: `consecutive_skips` counter; ≥3
triggers escalation HALT "model has proposed similar destructive
action 3+ times: abort, force-proceed, or change goal?". §4 +
§6 reflect.
|
|
|
|
R-C2. **§4 algorithm reorder** — dispatch all pending actions FIRST,
|
|
then check `GOAL: complete`. Q25 resolution + §4 algorithm now
|
|
consistent (was contradictory).
|
|
|
|
R-C3. **Norris goal pinned in system-prompt suffix** — `ctx.norris_goal`
|
|
field; the dynamic system suffix from §8 carries it. Eviction
|
|
can no longer drop the anchor.
|
|
|
|
R-C4. **Readline rebind safety** — `M.bind` will NOT free old callbacks
|
|
(pin for process lifetime). Avoids a use-after-free window between
|
|
`:free()` and the new `rl_bind_keyseq` call. Memory cost is
|
|
bounded (one closure per bound key, negligible).
|
|
|
|
R-C5. **`GOAL: complete` matcher** — line-level scan, exact match after
|
|
trim. Aligned with `CMD:` extraction rigor.
|
|
|
|
R-C6. **§4 step 4 algorithm tightened** — auto_approve only short-circuits
|
|
the user-prompt, NEVER the destructive-heuristic. Tool-call
|
|
without `auto_approve` entry AND no destructive flag → still
|
|
HALTs in Norris mode (Norris is conservative by design).
|
|
|
|
**Analyze findings (2026-05-12):**
|
|
|
|
A1. **`\C-n` mid-readline limitation.** Phase 1's `\C-n` handler fires
|
|
synchronously from inside the readline keystroke callback (via
|
|
`rl_bind_keyseq` → ffi-cast Lua closure). The current binding API
|
|
only exposes `rl_bind_keyseq` — no `rl_insert_text`,
|
|
`rl_replace_line`, or `rl_redisplay`. So a `\C-n` callback cannot
|
|
cleanly mutate the in-progress prompt buffer or end the
|
|
readline call early to "transition into Norris mode".
|
|
**Resolution**: bind `rl_insert_text` + `rl_redisplay` (single cdef
|
|
+ 2 wrapper lines in `ffi/readline.lua`) so the `\C-n` handler
|
|
inserts `:norris ` at the cursor and refreshes the display. User
|
|
then types the goal + Enter, routing through the existing meta
|
|
dispatch normally. `\C-n` becomes a typing shortcut, not a state
|
|
toggle.
|
|
|
|
A2. **`broker.chat` lacks `max_tokens`.** The LLM second-opinion path
|
|
in `safety.is_destructive` needs a tight YES/NO completion (2
|
|
tokens max). The proxy + small models honor `max_tokens`
|
|
correctly (verified vs hossenfelder: `max_tokens=4` returned a
|
|
clean "YES" in 2 completion tokens). Phase 2's broker doesn't
|
|
surface this option. **Resolution**: add `opts.max_tokens` to
|
|
`M.chat_stream`'s opts table (Phase 2 already widened opts);
|
|
`M.chat` passes through. Defaults nil → field omitted from the
|
|
request body — Phase 1/2 callers unaffected.
|
|
|
|
A3. **Tool-sub-loop is structurally reusable.** Phase 2's `ask_ai` sub-
|
|
loop (stream → collect text + tool_calls → dispatch → append → loop
|
|
until pure-text response or cap) IS the planner shape Phase 3 wants.
|
|
`safety.norris_step` per §4 is essentially this iteration extracted
|
|
behind a function call, plus the `GOAL: complete` sentinel check.
|
|
No structural refactor of Phase 2 needed — Norris is additive.
|
|
|
|
These findings tighten §3's module-changes table and §12's commit #5
scope (which gains a small `ffi/readline.lua` extension); see the
inline notes below where the change matters.
|
|
|
|
PHASE0.md is the locked substrate; PHASE1.md and PHASE2.md are layered
|
|
on top. This manifest specifies what Phase 3 adds — **Chuck Norris
|
|
autonomous mode**, the **destructive-op safety heuristic** that gates
|
|
it, and the **HALT/confirm protocol** for human-in-the-loop control.
|
|
Section numbers reference back to earlier phases where relevant.
|
|
|
|
---
|
|
|
|
## 1. Scope of Phase 3
|
|
|
|
Three pillars per PHASE0.md §11 row 3:
|
|
|
|
1. **Norris autonomous mode** (`safety.norris_step` + `repl.lua`
|
|
integration) — a planning-and-execution loop where the model
|
|
pursues a user-stated goal across multiple shell-exec and
|
|
tool-call turns without per-turn user prompting. Triggered by
|
|
`\C-n` (Phase 1 reserved key) or `:norris <goal>`. Iterative
|
|
re-plan after each action.
|
|
|
|
2. **Destructive-op heuristic** (`safety.is_destructive`): a hybrid
   gate that combines (a) a static pattern list of obviously
   destructive shell idioms (`rm -rf`, `dd of=`, `mkfs`,
   `git push --force`, etc.) with (b) an LLM second-opinion via the
   `deep` model (R-B2) for ambiguous cases. Any positive hit forces
   HALT before execution, regardless of Norris-mode policy.
|
|
|
|
3. **HALT/confirm protocol** — a uniform way for the Norris loop to
|
|
surface decisions to the user. HALT means: stop generation, drop
|
|
to a `[Norris] proceed / skip / abort?` prompt with the proposed
|
|
action displayed. User decides on each gate; abort returns control
|
|
to the interactive REPL with the conversation intact.
|
|
|
|
**Phase 3 is done when:**
|
|
|
|
- `\C-n` inserts `:norris ` at the cursor as a typing shortcut (replacing the Phase 1 status no-op; see A1).
|
|
- `:norris <goal>` launches an autonomous task explicitly.
|
|
- The model can plan + execute a multi-step task (e.g. "find all
|
|
Python files modified in the last week and count them") through
|
|
iterative CMD:/tool_call cycles without per-step user confirms
|
|
for safe operations.
|
|
- `rm -rf /tmp/foo`, `dd of=/dev/sda`, and equivalent destructive
|
|
operations HALT and require explicit user approval.
|
|
- The LLM second-opinion catches at least one realistic ambiguous
|
|
case the static patterns miss (e.g. `find . -delete`,
|
|
`truncate -s 0 important.log`).
|
|
- HALT-abort returns to interactive mode without context loss.
|
|
|
|
---
|
|
|
|
## 2. Technology Decisions (delta from Phase 2)
|
|
|
|
| Decision | Choice | Rationale |
|---|---|---|
| Planning model | **Iterative re-plan after each action** | Resolves PHASE0.md §13 Q2. Top-down task trees are brittle in dynamic environments: a shell command's output frequently changes what the next step should be. Iterative re-plan piggybacks on the existing Phase 2 tool-sub-loop pattern: the model emits the next action, gets the result, and decides the next one. Depth-bounded by `max_norris_steps` (default 16, configurable). |
| Action sources | **`CMD:` lines + MCP `tool_calls`** | Per PHASE0.md §11 row 3 ("now able to use MCP tools as well as CMD: lines"). Norris consumes both kinds equally. The Phase 2 system prompt already biases toward tools when available; that bias carries into Norris mode unchanged. |
| HALT trigger | **Static-pattern hit OR LLM-second-opinion flag** | Either gate fires HALT independently. Static for speed and predictability on known footguns; LLM for novel/ambiguous patterns. Cost of an LLM second-opinion call: one `deep`-model round-trip (~1-3s per probe, per R-B2). Only invoked when static doesn't already HALT. |
| HALT response shape | **3-way prompt**: `proceed` / `skip` / `abort` | `proceed` runs the action and continues. `skip` reports "user skipped" to the model and lets it re-plan. `abort` ends the Norris session and drops back to interactive mode. (`abort` is also bound to `\C-x\C-c` per PHASE1.md §7 reserved keys.) |
| Auto-approve under Norris | **Trust the Phase 2 `auto_approve` policy** | A tool already in `auto_approve` runs without HALT even in Norris mode, as long as the destructive-op heuristic doesn't flag it. The user opted in once; Norris doesn't unilaterally re-prompt. CMD: lines never auto-approve under Norris: they always pass through `is_destructive` first. |
| Destructive-op static rules | **Patterned shell-idiom list** in `safety.lua` (hardcoded; configurable later via `config.safety.destructive_patterns`) | Phase 3 v1 ships a fixed list (~30 patterns) inline. v2 may make it user-extendable. Patterns target the command string after expansion. The list is deliberately conservative: a false positive costs a confirm prompt the user dismisses, while a false negative means an unsupervised destructive action. Bias to false positives. |
| LLM second-opinion model | **The `deep` preset** (independent model class, not the one emitting actions) | R-B2 resolution. Same-model-class self-policing is circular, so `deep` (qwen3-30b currently) judges actions emitted by the active model (often `fast` qwen-1.5b under Norris). Adds ~1-3s per probe; broker failure → YES (safe default). Re-roll inversion: if the first probe says NO, ask the inverted "Is this safe?"; disagreement → HALT. |
| Norris prompt suffix | **Status appended to the system prompt** when Norris is active: `[NORRIS MODE] You are operating autonomously toward a stated goal. Plan and execute step by step. Use CMD: lines or tool_calls. When done, emit "GOAL: complete" on its own line.` | The `GOAL: complete` sentinel is how the model signals task completion; the Norris loop exits the planning sub-loop on seeing it. |
| Interrupt handling | **`\C-c` during a Norris step sends abort** | Standard SIGINT semantics for the user. Mid-stream, this means: stop the broker request, stop any running shell command, drop to interactive mode. The current context is preserved (incl. partial assistant turn). |
| Context budgeting under Norris | **Same `max_turns` and `token_budget` as interactive** | The sliding window evicts the oldest non-system turns when the budget is exceeded, including mid-Norris-session if the loop runs long. Phase 4's `memory.jsonl` summarization is the proper fix; Phase 3 just gets the eviction status as before. |
|
|
|
|
---
|
|
|
|
## 3. Module Changes
|
|
|
|
| File | State after Phase 2 | Phase 3 changes |
|---|---|---|
| `safety.lua` | `confirm_tool_call` (Phase 2 surface only) + Phase 3 stubs `is_destructive` / `norris_step` raising error() | Implement the stubs: (a) `is_destructive(cmd_or_tool_call) -> (bool, reason)` with static pattern matching + optional LLM second-opinion (controlled by `cfg.safety.llm_second_opinion`, default true); (b) `norris_step(ctx, broker_cfg, executor_fn, tools_fn, halt_fn, opts) -> {status, reason}`, a single iteration of the Norris loop. Pattern list is module-local; LLM second-opinion uses `broker.chat` (non-streaming, no tools, single-shot). |
| `repl.lua` | tool-sub-loop + `:mcp` meta + Phase 1 `\C-n` no-op binding | Replace the `\C-n` body with a handler that inserts `:norris ` at the cursor (A1). Add `:norris <goal>` meta cmd as the explicit launch path. New module-local `norris_active` flag. Implement the Norris driver loop: while active, call `safety.norris_step`; handle HALT decisions; exit on `GOAL: complete`, `abort`, or step budget exceeded. The `auto_approve` policy from `confirm_tool_call` is consulted in-line. |
| `renderer.lua` | exec frame + tool-call frame + assistant streaming | Add `M.norris_begin(goal)`, `M.norris_step(n, action_desc)`, `M.norris_halt(reason, action)`, `M.norris_end(status, reason)`. Visual: bold cyan banner on enter, indented step counter per iteration, red HALT banner on intercept, dim summary on exit. Phase 0 prompt becomes `[aish:fast ⚡]>` when Norris is active per PHASE0.md §9. |
| `broker.lua` | `chat_stream` with opts.tools, `chat` non-streaming | Re-used as-is for planning rounds (Norris just calls chat_stream like interactive). See the second `broker.lua` row below for the small `max_tokens` opts extension needed by the LLM second-opinion path. |
| `context.lua` | system_prompt + turns + pending_exec_output + use_tool_role | When Norris is active, `to_messages()` appends the Norris suffix (§2 row "Norris prompt suffix") to the system message. The suffix is computed dynamically: when Norris exits, subsequent broker calls revert to the plain system prompt. No additional storage beyond the `ctx.norris_active` / `ctx.norris_goal` fields (§8). |
| `ffi/readline.lua` | `bind(seq, fn)` (Phase 1), which frees the old callback before rebinding | **Small extension per A1 + R-C4 fix**: (a) add `rl_insert_text` + `rl_redisplay` to the `ffi.cdef` block and expose `M.insert_text(s)` / `M.redisplay()` wrappers, needed so `\C-n` can stuff `:norris ` into the buffer; (b) drop the `_bound[seq]:free()` call from `M.bind`. Readline retains the function pointer in its keymap, so freeing before re-bind opens a use-after-free window if the user presses the key in that gap. Pin all bound callbacks for process lifetime; memory cost is bounded (one closure per key, O(N) where N = number of bound keys ≤ ~10). |
| `broker.lua` | `chat_stream(cfg, msgs, on_delta, opts)` with opts.tools | **Small extension per A2**: `opts.max_tokens` (integer) is passed through to the request body as `max_tokens`. Omitted when nil. `M.chat` accepts the same opt. Needed so `safety.is_destructive`'s YES/NO probe terminates in ~2 tokens. |
| `config.lua` | mcp example block | New optional `safety = { llm_second_opinion = true, llm_model = "deep", destructive_patterns = {...} }` block, also as a commented-out example. Defaults are sane when absent. |
|
|
|
|
No new module files beyond what already exists. The `\C-x\C-c` abort keybinding (PHASE1.md §7 reserved) gets wired here.
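
The `opts.max_tokens` pass-through from the second `broker.lua` row can be sketched as follows. This is a minimal illustration, not the broker's actual code: `build_request_body` is a hypothetical helper name, and the `model` / `stream` fields are assumptions about the request shape. Only the "omit the field when nil" behavior comes from this manifest.

```lua
-- Hypothetical helper (name and body shape are assumptions for illustration).
-- Builds the chat request body as a plain Lua table before JSON encoding.
local function build_request_body(cfg, msgs, opts)
  opts = opts or {}
  local body = {
    model    = cfg.model,
    messages = msgs,
    stream   = opts.stream or false,
  }
  -- Optional fields are only set when the caller supplied them, so
  -- Phase 1/2 callers produce exactly the same body as before.
  if opts.tools ~= nil then
    body.tools = opts.tools
  end
  if opts.max_tokens ~= nil then
    assert(type(opts.max_tokens) == "number", "max_tokens must be a number")
    body.max_tokens = opts.max_tokens
  end
  return body
end
```

A YES/NO probe would call this with `{ max_tokens = 4 }`; planning rounds pass no `max_tokens`, so the field never reaches the request.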
|
|
|
|
---
|
|
|
|
## 4. The Planning Loop (`safety.norris_step`)
|
|
|
|
One iteration of Norris is exactly one round-trip with the model — same
|
|
shape as Phase 2's tool-sub-loop iteration, with the model deciding what
|
|
to do next based on accumulated context:
|
|
|
|
```
norris_step(ctx, broker_cfg, executor_fn, tools_fn, halt_fn, opts):
  # opts.step_n, opts.max_steps, opts.cfg, opts.consecutive_skips

  1. Call broker.chat_stream(broker_cfg, ctx:to_messages(), on_delta, {tools=tools_fn()})
     and collect (text, tool_calls).

  2. Extract actions from response:
     - tool_calls   (already collected by broker accumulator)
     - cmd_lines    via executor.extract_cmd_lines(text), line-anchored
     - goal_done    line-level scan for exact "GOAL: complete" (R-C5)

  3. If actions are empty AND goal_done is false:
     → return {status="stalled", reason="no action"}.

  4. Dispatch ALL pending actions BEFORE checking goal_done (R-C2):
     tool_calls first (structured route), CMD: lines second (legacy).
     For each action:
     a. Pass through safety.is_destructive(action).
        - tool_calls: check tool-name set + serialized args.
        - CMD: lines: pattern match + LLM probe.
     b. If destructive: invoke halt_fn(action, reason, opts.cfg).
        "proceed" → run action.
        "skip"    → opts.consecutive_skips += 1.
                    If consecutive_skips >= 3 (R-C1):
                      escalate halt with reason "repeated similar skips"
                      → user verdict abort / force-proceed.
                    Append synthesized "[aish] action skipped by user: <reason>"
                    as a role:"tool" turn (for tool_calls) or as exec-output
                    prefix (for CMD: lines), preserving the alternation invariant.
        "abort"   → return {status="aborted"}.
     c. If non-destructive (cleared by static + LLM):
        - tool_call: check auto_approve. If in policy, run silently;
          otherwise (R-C6) halt_fn STILL fires for the consent prompt
          (Norris is conservative; auto_approve is the *only* way to
          skip consent in autonomous mode).
        - CMD: line: run (destructive-check is the gate; confirm_cmd
          is interactive-mode-only, since R-B3 narrows scope).
     d. On successful proceed: opts.consecutive_skips = 0.
     e. Append result turn to ctx (role:"tool" for tool calls,
        exec-output buffer for CMD:, same as Phase 0/2 paths).

  5. After all actions dispatched: if goal_done → return {status="done"}.

  6. step_n += 1. If step_n >= max_steps:
     return {status="budget_exhausted"}.

  7. Continue loop (driver in repl.lua re-calls norris_step).
```
|
|
|
|
The driver in repl.lua is the simple while loop; norris_step is one
|
|
iteration so testing is granular.
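
The driver loop can be sketched as below, under stated assumptions. `run_norris` and the `"continue"` status are hypothetical names (§4 does not name the non-terminal status), and `step_fn` stands in for a closure around `safety.norris_step`. The extra budget check in the driver is a defensive addition, not something §4 requires.

```lua
-- Terminal statuses taken from §4; any other status means "keep looping".
local TERMINAL = {
  done = true, aborted = true, stalled = true, budget_exhausted = true,
}

-- Hypothetical driver. `step_fn(opts)` runs one iteration, mutates
-- opts.step_n / opts.consecutive_skips, and returns {status=..., reason=...}.
local function run_norris(step_fn, max_steps)
  local opts = { step_n = 0, max_steps = max_steps or 16, consecutive_skips = 0 }
  while true do
    local result = step_fn(opts)
    if TERMINAL[result.status] then
      return result, opts.step_n
    end
    -- Defensive cap in case a step forgets to report budget exhaustion.
    if opts.step_n >= opts.max_steps then
      return { status = "budget_exhausted" }, opts.step_n
    end
  end
end
```

Because the step function is injected, the driver can be exercised with a stub that returns a scripted sequence of statuses, which matches the granular-testing intent above.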
|
|
|
|
---
|
|
|
|
## 5. Destructive-Op Heuristic (`safety.is_destructive`)
|
|
|
|
### Static pattern list (v1, ~30 entries)
|
|
|
|
```lua
local DESTRUCTIVE_PATTERNS = {
  -- ── Shell wrappers (R-B1): flag the wrapper itself; can't inspect content
  --    safely without parsing the inner shell. Norris HALTs on these
  --    unconditionally; the user can proceed/abort with the full context.
  { pat = "^%s*bash%s+%-l?c%s",       reason = "bash -c (wrapped shell)" },
  { pat = "^%s*sh%s+%-l?c%s",         reason = "sh -c (wrapped shell)" },
  { pat = "^%s*zsh%s+%-l?c%s",        reason = "zsh -c (wrapped shell)" },
  { pat = "^%s*eval%s",               reason = "eval (dynamic shell)" },
  { pat = "^%s*python3?%s+%-c%s",     reason = "python -c (inline script)" },
  { pat = "^%s*perl%s+%-e%s",         reason = "perl -e (inline script)" },
  { pat = "|%s*sh%s",                 reason = "pipe-to-sh" },
  { pat = "|%s*sh%s*$",               reason = "pipe-to-sh (eol)" },
  { pat = "|%s*bash%s",               reason = "pipe-to-bash" },
  { pat = "|%s*bash%s*$",             reason = "pipe-to-bash (eol)" },
  { pat = "xargs%s+.-rm",             reason = "xargs ... rm" },

  -- ── Filesystem destructive
  { pat = "rm%s+.-%-rf?",             reason = "rm -rf" },
  { pat = "rm%s+.-%-fr?",             reason = "rm -fr" },
  { pat = "find%s+.-%-delete",        reason = "find -delete" },
  { pat = "find%s+.-%-exec%s+rm",     reason = "find -exec rm" },
  { pat = ">%s*/dev/sd[a-z]",         reason = "write to raw disk" },
  { pat = "dd%s+.-of=/dev/",          reason = "dd to device" },
  { pat = "mkfs%.",                   reason = "mkfs (format)" },
  { pat = "shred%s",                  reason = "shred" },
  { pat = "wipefs%s",                 reason = "wipefs" },
  { pat = "truncate%s+.-%-s%s*0",     reason = "truncate to zero" },

  -- ── Version control destructive
  { pat = "git%s+push%s+.-%-%-force", reason = "git push --force" },
  -- %f[^%w] also matches at end-of-string; %f[%s] does not (same EOL
  -- gap as the old pipe-to-sh pattern), so a bare `git push -f` was missed.
  { pat = "git%s+push%s+.-%-f%f[^%w]", reason = "git push -f" },
  { pat = "git%s+reset%s+.-%-%-hard", reason = "git reset --hard" },
  { pat = "git%s+clean%s+.-%-fd?",    reason = "git clean -fd" },
  { pat = "git%s+branch%s+.-%-D",     reason = "git branch -D" },

  -- ── Database / process
  -- ci rules are matched against the lowercased input, so the pattern
  -- text itself must be lowercase.
  { pat = "drop%s+table",             reason = "DROP TABLE",     ci = true },
  { pat = "drop%s+database",          reason = "DROP DATABASE",  ci = true },
  { pat = "truncate%s+table",         reason = "TRUNCATE TABLE", ci = true },
  { pat = "kill%s+%-9",               reason = "kill -9" },
  { pat = "pkill%s+%-9",              reason = "pkill -9" },

  -- ── Network/permission (chown tightened per NIT 2)
  { pat = "chmod%s+.-777",            reason = "chmod 777" },
  { pat = "chown%s+.-%s+/%s*$",       reason = "chown on root path" },
}
```
|
|
|
|
The 11 wrapper patterns are the conservative floor against the R-B1 bypass classes. If the model emits `bash -c '...'`, the wrapper pattern hits and Norris HALTs (the user can proceed after reading the inner command). The LLM second-opinion still runs as a backup, but the static net catches the obvious cases first.
|
|
|
|
Patterns are Lua patterns (not regex). `ci = true` enables case-insensitive
|
|
match — the matcher loop lowercases the input string when `ci` is set on
|
|
the rule, so `DROP TABLE` and `drop table x` and `Drop Table` all match
|
|
the same rule. Without `ci`, patterns are case-sensitive (the default).
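
The matcher loop described above can be sketched as follows. This is an illustration, not the shipped `safety.lua`: `match_static` is a hypothetical name, and `RULES` is a three-entry subset of the list. It assumes `ci` patterns are written in lowercase, since the input is lowercased before matching.

```lua
-- Illustrative subset of the rule table (not the full v1 list).
local RULES = {
  { pat = "rm%s+.-%-rf?", reason = "rm -rf" },
  { pat = "|%s*sh%s*$",   reason = "pipe-to-sh (eol)" },
  { pat = "drop%s+table", reason = "DROP TABLE", ci = true },
}

-- Hypothetical matcher: returns (true, reason) on the first hit,
-- (false, nil) when no rule matches.
local function match_static(cmd, rules)
  local lowered  -- computed lazily, only if some rule sets ci
  for _, rule in ipairs(rules) do
    local subject = cmd
    if rule.ci then
      lowered = lowered or cmd:lower()
      subject = lowered
    end
    if subject:find(rule.pat) then
      return true, rule.reason
    end
  end
  return false, nil
end
```

Lowercasing the input rather than the pattern keeps pattern classes such as `%s` intact; lowercasing a pattern would be unsafe for any rule that used an uppercase class like `%S`.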
|
|
|
|
### LLM second-opinion (when static doesn't HALT)
|
|
|
|
If `cfg.safety.llm_second_opinion ~= false`, call `broker.chat`
|
|
(non-streaming, no tools, `opts.max_tokens=4`) on the **deep** model
|
|
preset (independent class from the action-emitting model — R-B2):
|
|
|
|
```
System: Detect destructive shell commands. Reply YES if running this would
        delete, overwrite, or irreversibly modify data, processes, or system state.
        Otherwise reply NO. Respond with only YES or NO.

User:   <the command text or tool-call summary>
```
|
|
|
|
Response parse: case-insensitive match on `YES` → destructive.
|
|
|
|
**Re-roll on NO** (R-B2 cheap insurance): if the first probe returns NO,
|
|
run a second probe with inverted phrasing:
|
|
|
|
```
System: Reply YES or NO only. Is the following shell command SAFE to
        run autonomously without user review?

User:   <same command>
```
|
|
|
|
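The re-roll asks the opposite question, so a consistent pair of answers
is NO (not destructive) followed by YES (safe). If the re-roll says YES,
the two probes agree and the command is cleared. If the re-roll says NO
(not safe), the probes contradict each other (first NO, second NO) →
HALT: identical answers to inverted questions are treated as suspicious.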
|
|
|
|
Broker failure → YES (safe default).
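
The two-probe decision can be sketched as below, assuming an injected `probe(system_prompt, cmd)` function that returns the model's reply text, or nil on broker failure. `llm_verdict`, `says_yes`, the two prompt constants, and the "LLM probe failed" reason string are hypothetical names for illustration; the other two reason strings are the ones this section specifies.

```lua
-- Placeholders for the two system prompts shown above.
local DESTRUCTIVE_PROMPT = "Reply YES if destructive, otherwise NO."
local SAFE_PROMPT        = "Reply YES or NO only. Is this command SAFE?"

-- Case-insensitive check for a YES anywhere in the reply.
local function says_yes(reply)
  return type(reply) == "string"
    and reply:upper():find("YES", 1, true) ~= nil
end

-- Returns (is_destructive, reason). A nil reply (broker failure) is
-- treated as destructive, which is the safe default.
local function llm_verdict(probe, cmd)
  local first = probe(DESTRUCTIVE_PROMPT, cmd)
  if first == nil then return true, "LLM probe failed (safe default)" end
  if says_yes(first) then return true, "LLM flagged as destructive" end

  -- First probe said NO: re-roll with the inverted question.
  local second = probe(SAFE_PROMPT, cmd)
  if second == nil then return true, "LLM probe failed (safe default)" end
  if says_yes(second) then return false, nil end
  return true, "LLM probe disagreement"
end
```

Only the NO-then-YES pair clears a command; every other outcome, including a failed re-roll, ends in HALT.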
|
|
|
|
Session-scoped cache keyed by the normalized command string mitigates
|
|
the latency cost (~1-3s per probe on deep model — see PHASE3-baseline §1).
|
|
Repeated patterns within a single session probe once.
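
A session-scoped cache along these lines might look like the sketch below. The normalization rule (trim, then collapse whitespace runs) is an assumption, since this manifest does not define "normalized", and `normalize` / `make_cached` are hypothetical names. `check` is any function with the `(bool, reason)` return shape.

```lua
-- Assumed normalization: trim both ends, collapse internal whitespace.
local function normalize(cmd)
  return (cmd:gsub("^%s+", ""):gsub("%s+$", ""):gsub("%s+", " "))
end

-- Wraps `check(cmd) -> (bool, reason)` with a per-closure cache. The
-- cache table lives as long as the returned function, i.e. one session
-- if the REPL creates it once at startup.
local function make_cached(check)
  local cache = {}
  return function(cmd)
    local key = normalize(cmd)
    local hit = cache[key]
    if hit == nil then
      local flagged, reason = check(key)
      hit = { flagged = flagged, reason = reason }
      cache[key] = hit
    end
    return hit.flagged, hit.reason
  end
end
```

One design question this sketch leaves open is whether a broker-failure verdict should be cached; caching it keeps the safe default for the rest of the session, at the cost of never retrying the probe.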
|
|
|
|
Returns: `(is_destructive, reason)`. Reason is the matched pattern name
|
|
for static hits, "LLM flagged as destructive" / "LLM probe disagreement"
|
|
for the two LLM failure modes.
|
|
|
|
### Tool-call destructive check
|
|
|
|
For MCP tool_calls, `is_destructive` checks:
|
|
1. Tool name against an "always destructive" set (configurable; v1 includes
|
|
`*__shell` / `*__write_file` / `*__edit_file` / `*__shell_bg` patterns).
|
|
2. Arguments serialized as JSON against the static shell patterns (in case
|
|
a `shell` tool's command argument is destructive).
|
|
3. LLM second-opinion on the JSON-serialized call.
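
The three checks above can be sketched as one function. The suffix table mirrors the `*__shell` style patterns listed in item 1; treating them as name suffixes is an assumption about how the globs are meant to match. `check_tool_call` and its injected `static_check` / `llm_check` parameters are hypothetical, and `args_json` is assumed to be already serialized by the caller.

```lua
local DESTRUCTIVE_TOOL_SUFFIXES = {
  "__shell", "__write_file", "__edit_file", "__shell_bg",
}

local function ends_with(s, suffix)
  return #s >= #suffix and s:sub(-#suffix) == suffix
end

-- Check 1: tool name against the "always destructive" set.
local function tool_name_is_destructive(name)
  for _, suffix in ipairs(DESTRUCTIVE_TOOL_SUFFIXES) do
    if ends_with(name, suffix) then
      return true, "tool matches *" .. suffix
    end
  end
  return false, nil
end

-- Checks 1-3 in order; the first positive result wins.
-- static_check(text) and llm_check(text) both return (bool, reason).
local function check_tool_call(name, args_json, static_check, llm_check)
  local hit, reason = tool_name_is_destructive(name)
  if hit then return true, reason end
  hit, reason = static_check(args_json)
  if hit then return true, reason end
  if llm_check then
    return llm_check(name .. " " .. args_json)
  end
  return false, nil
end
```

Running the cheap name and pattern checks first means the LLM probe is only paid for tool calls that the static layers could not classify.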
|
|
|
|
---
|
|
|
|
## 6. HALT Protocol
|
|
|
|
When `is_destructive` returns true OR a non-auto_approve tool_call is
|
|
attempted under Norris (auto_approve is the user's explicit consent
|
|
that DOES apply):
|
|
|
|
```
─── NORRIS HALT ───────────────────────────────
step 7/16
reason: rm -rf
action: rm -rf /var/log/old
[N] proceed / skip / abort? p
```
|
|
|
|
User types `p` (proceed) / `s` (skip) / `a` (abort).
|
|
|
|
- **proceed**: run the action, append result to context, continue loop.
|
|
- **skip**: append a synthesized turn explaining the user skipped this
|
|
step (gives the model a chance to re-plan); continue loop.
|
|
- **abort**: exit Norris mode; the conversation context is preserved.
|
|
Drop back to the interactive prompt.
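
Parsing the user's reply at the HALT prompt can be sketched as below. `parse_verdict` is a hypothetical helper; accepting the full words as well as the single letters, and treating end-of-input as abort, are assumptions rather than behavior this manifest specifies.

```lua
local VERDICTS = { p = "proceed", s = "skip", a = "abort" }

-- Returns "proceed" / "skip" / "abort", or nil for unrecognized input
-- (the caller would then re-prompt).
local function parse_verdict(input)
  if type(input) ~= "string" then
    return "abort"  -- assumed: EOF on the prompt ends the Norris session
  end
  local word = input:lower():match("^%s*(%a+)%s*$")
  if not word then return nil end
  if VERDICTS[word] then return VERDICTS[word] end
  for _, full in pairs(VERDICTS) do
    if word == full then return full end
  end
  return nil
end
```

Returning nil for anything unrecognized keeps a stray keystroke from being read as consent to a destructive action.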
|
|
|
|
`\C-x\C-c` at any prompt also aborts.
|
|
|
|
Auto-approved tools (per `cfg.mcp.auto_approve`) skip the HALT entirely
|
|
IF AND ONLY IF the destructive-op heuristic doesn't flag them. The
|
|
heuristic is the final word — auto_approve is a confirmation bypass,
|
|
not a destructive bypass.
|
|
|
|
---
|
|
|
|
## 7. Meta Commands (Phase 3 additions)
|
|
|
|
| Command | Action |
|---|---|
| `:norris <goal>` | Launch Norris mode with an explicit goal text. `\C-n` inserts the `:norris ` prefix at the cursor, so both entry points route through this command (A1). |
| `:norris off` | Exit Norris mode mid-loop (alternative to the abort prompt) |
| `:safety patterns` | Show the active destructive-op pattern list |
| `:safety check <cmd>` | Probe `is_destructive` against a hypothetical command without running it (debug aid) |
|
|
|
|
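`\C-n` is a typing shortcut, not a state toggle (A1): it inserts
`:norris ` at the cursor, the user types the goal and presses Enter, and
the line routes through the normal meta dispatch. Exiting mid-loop goes
through `:norris off` or the HALT abort path.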
|
|
|
|
---
|
|
|
|
## 8. System Prompt Augmentation (active only in Norris)
|
|
|
|
Appended to the default Phase 2 system prompt while `norris_active == true`.
|
|
The current goal is embedded in the suffix so eviction can't drop the
|
|
anchor (R-C3):
|
|
|
|
```
[NORRIS MODE] You are operating autonomously toward the following goal:

  <ctx.norris_goal>

Plan and execute step by step using CMD: lines (for shell) or tool_calls
(when MCP tools are available). After each action, you will see its
result in the next turn. Re-plan based on what you observe.

When the goal is achieved, emit a single line:
  GOAL: complete
on its own line, optionally followed by a brief summary.

If the goal is unreachable or you need user input, emit:
  GOAL: blocked
with a one-line reason.

Avoid destructive operations unless the goal explicitly requires them.
The user will be prompted to confirm destructive actions; expect their
verdict in the next turn as "[aish] action skipped by user" or
"[aish] action approved".
```
|
|
|
|
This block is composed dynamically by `context.to_messages()` when
|
|
`ctx.norris_active` is set. State stored:
|
|
- `ctx.norris_active = true|false`
|
|
- `ctx.norris_goal = "<goal text>"` (cleared on exit)
|
|
|
|
The user-emitted "[norris] <goal>" turn ALSO lives in the turn list as
|
|
a regular user turn for the model's reading benefit. If the sliding
|
|
window evicts it later, the system-prompt suffix still carries the
|
|
goal — alignment with the eviction policy without special-case pinning.
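
Composing the suffix dynamically might look like the sketch below. `system_content` is a hypothetical helper standing in for the relevant part of `context.to_messages()`, `ctx.system_prompt` is assumed to hold the Phase 2 prompt text, and the template is abbreviated to a few lines of the block above.

```lua
-- Abbreviated template; the real suffix is the full block shown above.
local NORRIS_TEMPLATE = table.concat({
  "[NORRIS MODE] You are operating autonomously toward the following goal:",
  "",
  "  %s",
  "",
  "When the goal is achieved, emit a single line:",
  "  GOAL: complete",
}, "\n")

-- Returns the system message content for the current ctx state.
-- Nothing is stored: the suffix is rebuilt on every call, so it
-- disappears as soon as norris_active is cleared.
local function system_content(ctx)
  if ctx.norris_active and ctx.norris_goal then
    return ctx.system_prompt .. "\n\n"
      .. NORRIS_TEMPLATE:format(ctx.norris_goal)
  end
  return ctx.system_prompt
end
```

Passing the goal as a `format` argument rather than concatenating it into the template keeps a goal containing `%` from being interpreted as a format directive.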
|
|
|
|
---
|
|
|
|
## 9. Migration from Phase 2
|
|
|
|
User-visible:
|
|
- `\C-n` now does something (was a Phase 1 placeholder) — inserts
|
|
`:norris ` at the cursor.
|
|
- `:norris <goal>` is a new meta command.
|
|
- **Interactive mode is UNCHANGED** (R-B3 resolution of Q24): the
|
|
`is_destructive` heuristic runs ONLY when `norris_active == true`.
|
|
Interactive `CMD:` extraction continues to honor `confirm_cmd`
|
|
exactly as Phase 0 specified. No surprises for existing users.
|
|
|
|
Substrate (PHASE0.md §3) invariants: unchanged. The `CMD:` extraction
|
|
marker is still the only shell-suggestion contract. `confirm_cmd`
|
|
semantics are preserved as-defined in PHASE0 §10.
|
|
|
|
`config.lua`: configs without a `safety` block work unchanged — defaults
|
|
kick in (LLM second-opinion enabled, default pattern list, default step
|
|
budget).
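
How the defaults could be resolved for a config without a `safety` block is sketched below. `resolve_safety` and `SAFETY_DEFAULTS` are hypothetical names; the `"deep"` default follows R-B2, and placing `max_norris_steps` inside the `safety` block is an assumption based on the example block described in §12.

```lua
local SAFETY_DEFAULTS = {
  llm_second_opinion = true,
  llm_model          = "deep",  -- per R-B2; assumed final default
  max_norris_steps   = 16,
}

-- Merges user settings over the defaults. Uses an explicit nil check so
-- that `llm_second_opinion = false` is respected (an `or` would drop it).
local function resolve_safety(cfg)
  local user = (cfg and cfg.safety) or {}
  local out = {}
  for key, default in pairs(SAFETY_DEFAULTS) do
    if user[key] ~= nil then
      out[key] = user[key]
    else
      out[key] = default
    end
  end
  return out
end
```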
|
|
|
|
---
|
|
|
|
## 10. Out of Scope (Phase 3)
|
|
|
|
Per PHASE0.md §11, these belong to later phases:
|
|
- `memory.jsonl` summarization across sessions (Phase 4).
|
|
- Multi-model routing / cloud fallback (Phase 5). Note that Norris's
  LLM second-opinion uses the `deep` model regardless of the active
  model (R-B2).
|
|
- Tree-sitter syntax highlighting (Phase 6).
|
|
|
|
Specifically out of Phase 3 scope despite proximity:
|
|
- Per-session destructive-pattern learning (user-corrects-LLM feedback
|
|
loop). v2.
|
|
- Parallel exploration / branching Norris sessions. v3+.
|
|
- User-extendable pattern list via config. v2 — Phase 3 ships hardcoded.
|
|
- Goal-decomposition for very long-running tasks (multi-day, persistent
|
|
state). Out of aish's scope entirely; that's a different tool.
|
|
|
|
---
|
|
|
|
## 11. Open Questions
|
|
|
|
| # | Question | Impact | Resolve by |
|---|---|---|---|
| Q23 | ~~LLM second-opinion latency budget~~ | safety.lua | **Resolved at baseline**: 425-1162ms per probe on the **fast** model (baseline §1); switched to **deep** at review (R-B2) at the cost of ~1-3s per probe, paid back by the independent model class. Session cache mitigates repeated patterns. |
| Q24 | ~~`is_destructive` also runs on interactive `CMD:` extraction?~~ | safety.lua + repl.lua | **Resolved at review (R-B3)**: NO. `is_destructive` runs ONLY when `norris_active == true`. Interactive `CMD:` extraction honors `confirm_cmd` exactly as Phase 0 specified. No substrate amendment. |
| Q25 | ~~`GOAL: complete` AND pending actions in same response?~~ | repl.lua norris driver | **Resolved at review (R-C2)**: dispatch all pending actions FIRST (tool_calls then CMD:), THEN check for `GOAL: complete`. Algorithm in §4 reflects. |
| Q26 | Context preservation when Norris ends with `abort` vs `done` vs `budget_exhausted`. Proposal: all three keep ctx intact (user sees the conversation in `:history`). The only difference is the renderer summary. | repl.lua + renderer.lua | Phase 3 (plan) |
| Q27 | Resume mode after abort: should the user be able to type `:norris continue` to pick up where the model left off? v1 says no: too many edge cases with stale plans. v2 maybe. | scope | Phase 3 (defer to v2) |
| Q28 | `tool_calls` from MCP servers that have side effects but aren't in `*__shell` / `*__write_file` patterns (e.g. a custom `hertz__wol_machine` tool that wakes a server). The static set in §5 won't catch this; the LLM second-opinion might. Reasonable default given the LLM's role here. | safety.lua | Phase 3 (verify) |
| Q29 | Norris response when `is_destructive` returns YES but the user-stated goal explicitly authorizes destruction (e.g. "clean up old logs in /var/log"). Currently the HALT still fires. Should the model be allowed to convey "user authorized this implicitly" in the goal? v1: no, explicit per-action confirm always. v2 could relax. | UX + safety.lua | Phase 3 (verify) |
| Q30 | `:norris` without a goal arg vs `\C-n`: should they share a single "ask for goal" code path? Yes; trivial. | repl.lua | Phase 3 (plan) |
|
|
|
|
Resolved at formulate (in §2 table):
|
|
- Q2 (planner shape) — iterative re-plan after each action.
|
|
- Q8 inheritance — auto_approve from Phase 2 applies under Norris IF destructive heuristic clears.
|
|
|
|
Carried forward (not in §13 originally):
|
|
- Norris's interaction with Phase 4's memory.jsonl — captured tasks could pre-populate context. Phase 4 concern.
|
|
|
|
---
|
|
|
|
## 12. Implementation Plan (commit-by-commit)
|
|
|
|
Bottom-up, same cadence as Phase 0/1/2. Six commits expected:
|
|
|
|
1. **`safety.is_destructive` — static pattern list only.** Implement the
|
|
~20-pattern matcher + the tool-call shell-arg extraction. No LLM
|
|
second-opinion yet. Returns `(bool, reason)`. **Test**: unit-table of
|
|
~30 commands (mix of destructive + safe) → assertEqual on each.
|
|
|
|
2. **`safety.is_destructive`: LLM second-opinion + cache.** Add the
   `deep`-model probe path (R-B2) with a session-scoped cache keyed by
   the normalized command string (mitigates Q23 latency). Broker failure
   falls back to YES. **Test**: mock broker; verify cache hits don't
   re-call; verify failure-fallback is YES.
|
|
|
|
3. **`renderer.lua` — Norris frames.** Add `norris_begin/step/halt/end`
|
|
per §3. Visual parity with exec/tool frames. Update prompt to
|
|
include `⚡` when active. **Test**: one-liner script renders each
|
|
frame visually.
|
|
|
|
4. **`safety.norris_step` — single-iteration planner.** The
|
|
`norris_step` function per §4. Caller provides ctx + dispatch
|
|
helpers; returns `{status, reason}`. No driver loop yet — that's
|
|
the next commit. **Test**: mock broker emitting various model
|
|
responses (text+actions, GOAL:complete, stalled, destructive
|
|
action requiring HALT) and verify each return shape.
|
|
|
|
5. **`repl.lua` — Norris driver + `\C-n` real binding + `:norris` meta.**
|
|
The while-loop driver consuming `safety.norris_step`, the rebound
|
|
`\C-n` (replacing Phase 1 placeholder), the `:norris <goal>` /
|
|
`:norris off` meta cmds, and `\C-x\C-c` abort handler. **Interactive
|
|
`CMD:` extraction is UNCHANGED** — `is_destructive` runs ONLY when
|
|
`norris_active == true` (R-B3 resolution of Q24); `confirm_cmd`
|
|
semantics from PHASE0 §10 are preserved exactly. Bundled with this
|
|
commit: `ffi/readline.lua` extension per §3 row — `rl_insert_text` +
|
|
`rl_redisplay` cdefs + `M.insert_text` / `M.redisplay` wrappers,
|
|
AND removal of the `_bound[seq]:free()` call from `M.bind` (R-C4 —
|
|
small Phase 1 amendment, called out here so the commit body cites
|
|
it). **Test**: mocked-broker end-to-end — submit a multi-step goal,
|
|
verify driver loops correctly, hits GOAL:complete, returns to
|
|
interactive.
|
|
|
|
6. **`config.lua` — `safety` example block.** Commented-out example
|
|
showing `llm_second_opinion`, `llm_model`, `destructive_patterns`,
|
|
`max_norris_steps`. Documentation only.
|
|
|
|
### Risk / non-obvious
|
|
|
|
- **Catastrophic false-negative in `is_destructive`**: the static list
|
|
is patterned; a creative model could write `bash -c "rm -rf /tmp"` or
|
|
`r"m" -rf` etc. Static is the floor, LLM second-opinion is the
|
|
net. Both check.
|
|
- **LLM second-opinion model itself being autonomous** in a Norris run
|
|
would be circular. Mitigation: the second-opinion call uses
|
|
`broker.chat` (no tools, no streaming, dedicated prompt) — distinct
|
|
call path from the Norris planning stream. No tool-call recursion
|
|
possible.
|
|
- **Norris loop runs the LLM N times**: each step is a full broker
  round-trip plus optionally an LLM second-opinion. A 16-step Norris
  goal could be ~32 LLM calls or more: 16 planning rounds on the active
  model plus second-opinion probes on the `deep` model (two per action
  when the re-roll fires). Visible as latency but no economic surprise
  on local models.
|
|
- **Q24 resolution (R-B3)**: `is_destructive` runs ONLY in Norris
|
|
mode. Interactive `CMD:` extraction continues to honor `confirm_cmd`
|
|
exactly as Phase 0 specified. No substrate amendment; no surprises
|
|
for users of `confirm_cmd=false` setups.
|
|
- **`GOAL: complete` extraction** uses a line-level scan with an exact
  match after trimming (R-C5), equivalent to anchoring
  `^GOAL: complete$` on each line. Substrate-aligned with CMD: extraction.
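
The `GOAL: complete` scan from the last bullet can be sketched as follows. `goal_done` is a hypothetical function name; the logic is the line-level, exact-match-after-trim rule from R-C5, written with plain Lua string functions.

```lua
-- Returns true if any line of `text`, after trimming surrounding
-- whitespace, is exactly "GOAL: complete".
local function goal_done(text)
  -- Append a newline so the final line is also terminated.
  for line in (text .. "\n"):gmatch("(.-)\r?\n") do
    local trimmed = line:match("^%s*(.-)%s*$")
    if trimmed == "GOAL: complete" then
      return true
    end
  end
  return false
end
```

Comparing the trimmed line for equality, instead of searching for the substring, means prose that merely mentions the sentinel does not end the loop.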
|
|
|
|
### Open at plan; resolve at review
|
|
|
|
- Whether to ship the LLM second-opinion **on by default** or
|
|
**off by default with a config opt-in**. Default on is safer; off
|
|
saves latency. Recommend on; Phase 7 verify will quantify the
|
|
overhead.
|
|
- Whether `:safety check <cmd>` should also be reachable by `\C-x`
|
|
keybinding for fast probing during interactive sessions. v2.
|
|
|
|
---
|
|
|
|
*End of Phase 3 Manifest — aish*
|