safety: permission policy DSL — allow/confirm/deny rule lists (closes #9)

The confirm_cmd boolean was too coarse: true interrupts every harmless
ls; false ungates everything. Most workflows want trust for read-only
ops while still gating writes/network/sudo.

New config:

    permissions = {
        allow   = { "^ls%s", "^cat%s", "^git status" },
        confirm = { "^rm%s", "^git push", "^docker%s", "^sudo%s" },
        deny    = { "^ssh%s+root@", "^curl%s+http[^s]" },
    }

Verdict order: deny > confirm > allow. Categories are checked in that
order; the first rule that matches in the first category with any
match wins. Unmatched commands default to "confirm". Patterns are Lua
patterns (not regex) per PHASE0.md §3 — no compiled extensions.
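
The lookup order described above can be sketched roughly as follows. This
is an illustrative reconstruction, not the code from this commit: the
function name `classify` and the `ORDER` table are hypothetical, and the
real safety.classify_command takes the whole config rather than the
permissions table.

```lua
-- Hypothetical sketch of the verdict lookup; names are assumptions.
-- Categories are scanned in priority order: deny, then confirm, then allow.
local ORDER = { "deny", "confirm", "allow" }

local function classify(cmd, permissions)
  for _, verdict in ipairs(ORDER) do
    for _, pattern in ipairs(permissions[verdict] or {}) do
      -- string.find treats `pattern` as a Lua pattern; "^" anchors at the start.
      if cmd:find(pattern) then
        return verdict, pattern
      end
    end
  end
  -- Nothing matched: fall back to the default verdict, with no rule.
  return "confirm", nil
end

local permissions = {
  allow   = { "^ls%s", "^cat%s", "^git status" },
  confirm = { "^rm%s", "^git push", "^docker%s", "^sudo%s" },
  deny    = { "^ssh%s+root@", "^curl%s+http[^s]" },
}

print(classify("ls -la", permissions))
print(classify("curl http://example.com", permissions))
print(classify("make test", permissions))
```

Under this sketch a bare `ls` with no arguments does not match "^ls%s",
because %s requires one whitespace character after the command name, so
it falls through to the default "confirm" verdict.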

Verdict behavior in the interactive CMD: loop:
  - allow   → run without prompt
  - deny    → status line, skip
  - confirm → [y/N] prompt (same UX as legacy confirm_cmd=true)

Backward compat:
  - permissions unset + confirm_cmd=true  → always confirm
  - permissions unset + confirm_cmd=false → always allow
  - permissions set                        → policy table is authoritative
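
A minimal sketch of the three compatibility cases above, assuming a
hypothetical helper named `gate_mode` (not part of this commit). It only
decides which mechanism is in charge; an unset confirm_cmd is treated like
false, matching the legacy `if config.shell and config.shell.confirm_cmd`
check that this commit removes.

```lua
-- Hypothetical helper; the name and return values are assumptions.
-- Returns "policy" when config.permissions is set, otherwise the blanket
-- verdict implied by the legacy shell.confirm_cmd boolean.
local function gate_mode(config)
  if config.permissions ~= nil then
    return "policy"   -- the policy table is authoritative
  end
  if config.shell and config.shell.confirm_cmd then
    return "confirm"  -- legacy: always prompt
  end
  return "allow"      -- legacy: never prompt
end

print(gate_mode({ shell = { confirm_cmd = true } }))
print(gate_mode({ shell = { confirm_cmd = false } }))
print(gate_mode({ permissions = { deny = { "^sudo%s" } }, shell = { confirm_cmd = false } }))
```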

Scope deliberately limited to the interactive AI-suggested CMD: gate.
Norris autonomous mode keeps its own safety.is_destructive machinery
(combining the two would double-gate or replace the LLM probe — both
non-obvious behavioral changes that belong in their own issues).
User-typed shell-routed lines (`router.classify → "shell"`) and
:exec also bypass the policy by design — those are direct user intent.

New introspection:
  :perms list           — show the configured rule lists
  :perms check <cmd>    — report verdict + matching rule (debug)

safety.classify_command is exported and unit-tested with 12 cases
covering each category, priority order (deny > allow on overlap),
and both fallback paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:20:56 +00:00
parent 518c01a9f5
commit 17e62c0326
3 changed files with 86 additions and 4 deletions
+44 -4
@@ -125,6 +125,8 @@ Meta commands:
   :plan on / :plan off    set plan mode explicitly
   :safety patterns        list active destructive-op patterns
   :safety check <cmd>     probe is_destructive against <cmd> without running
+  :perms list             show configured permission rules (allow/confirm/deny)
+  :perms check <cmd>      report which permission verdict <cmd> would receive
   :remember <text>        shortcut: :memory add fact <text>
   :memory list            show active memory items (id, ts, kind, content)
   :memory add <kind> <t>  add a memory item (kind: fact | pref | context)
@@ -755,12 +757,19 @@ function M.run(config)
         renderer.status(("PLAN: %s"):format(cmd))
         ctx:append_exec_output(("[plan] would run: %s"):format(cmd))
     else
-        local doit
-        if config.shell and config.shell.confirm_cmd then
+        -- Issue #9: permission policy DSL — verdict drives the gate.
+        -- Falls back to shell.confirm_cmd boolean when config.permissions
+        -- is unset (backward compat).
+        local verdict, rule = safety.classify_command(cmd, config)
+        local doit = false
+        if verdict == "allow" then
+            doit = true
+        elseif verdict == "deny" then
+            renderer.status(("denied by policy [%s]: %s")
+                :format(rule or "default", cmd))
+        else -- "confirm"
             local ans = rl.readline(("execute '%s'? [y/N] "):format(cmd)) or ""
             doit = (ans:lower():sub(1, 1) == "y")
-        else
-            doit = true
         end
         if doit then run_shell(cmd) end
     end
@@ -1293,6 +1302,37 @@ function M.run(config)
             renderer.status("usage: :safety {patterns|check}")
         end
     end,
+    perms = function(args)
+        local sub, sub_args = args:match("^%s*(%S*)%s*(.*)$")
+        if sub == "list" or sub == "" then
+            local p = config.permissions
+            if not p then
+                renderer.status(("(no permissions set; fallback: confirm_cmd=%s)")
+                    :format(tostring(config.shell and config.shell.confirm_cmd or false)))
+                return
+            end
+            local function dump(label, rules)
+                if not rules or #rules == 0 then return end
+                io.write((" %s:\n"):format(label))
+                for i, r in ipairs(rules) do
+                    io.write((" %2d. %s\n"):format(i, r))
+                end
+            end
+            renderer.status("permissions (deny > confirm > allow; default verdict: confirm):")
+            dump("deny", p.deny)
+            dump("confirm", p.confirm)
+            dump("allow", p.allow)
+        elseif sub == "check" then
+            local cmd = sub_args:match("^%s*(.-)%s*$")
+            if not cmd or cmd == "" then
+                renderer.status("usage: :perms check <cmd>"); return
+            end
+            local v, rule = safety.classify_command(cmd, config)
+            renderer.status(("verdict=%s rule=%s"):format(v, rule or "(default)"))
+        else
+            renderer.status("usage: :perms {list|check}")
+        end
+    end,
     route = function(args)
         local sub, sub_args = args:match("^%s*(%S*)%s*(.*)$")
         config.routing = config.routing or {}