safety: confirm_tool_call gate with auto-approve policy

Phase 2 commit #2 per docs/PHASE2.md §12. Implements just the per-call
confirm-gate surface; Phase 3 stubs (is_destructive, norris_step) stay
unimplemented with their error() bodies.

M.confirm_tool_call(name, args, cfg) checks cfg.mcp.auto_approve for:
  - exact match on "<alias>.<tool>"
  - "<alias>.*" glob covering a whole server

A miss falls back to a [y/N] readline prompt. Only an answer starting
with "y" or "Y" approves; an empty or any other answer rejects (matches
the existing confirm_cmd UX from PHASE0 §10).
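
The lookup order and prompt verdict described above can be sketched as two
standalone helpers. This is a minimal illustration, not the committed code:
the names is_auto_approved and accepts, and the "fs" / "git" aliases, are
hypothetical, and the policy table is assumed to map policy strings to true.

```lua
-- Hypothetical helper mirroring the auto-approve lookup order.
-- Assumes cfg.mcp.auto_approve is a table keyed by policy string.
local function is_auto_approved(name, cfg)
  local policy = (cfg and cfg.mcp and cfg.mcp.auto_approve) or {}
  if policy[name] then return true end        -- exact "<alias>.<tool>"
  local alias = name:match("^([^.]+)%.")      -- text before the first dot
  return (alias ~= nil and policy[alias .. ".*"]) and true or false
end

-- Hypothetical helper for the [y/N] verdict: only a leading y/Y approves.
local function accepts(answer)
  return (answer or ""):lower():sub(1, 1) == "y"
end

-- Example config; the aliases "fs" and "git" are made up for illustration.
local cfg = {
  mcp = {
    auto_approve = {
      ["fs.read_file"] = true,  -- one tool on the "fs" server
      ["git.*"] = true,         -- every tool on the "git" server
    },
  },
}

print(is_auto_approved("fs.read_file", cfg))   -- true: exact match
print(is_auto_approved("git.status", cfg))     -- true: alias glob
print(is_auto_approved("fs.write_file", cfg))  -- false: would prompt
print(is_auto_approved("fs.read_file", nil))   -- false: nil cfg, would prompt
print(accepts("Yes"), accepts(""), accepts(nil))
```

Because only the first character is checked, an answer such as "yes" also
approves, while an empty line or end-of-input rejects.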

Pretty-printing renders args as compact JSON, capped at 80 chars in
total (77 chars plus a "..." suffix) so one-line prompts stay readable.
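
The truncation rule can be sketched in isolation as follows. The helper name
truncate and the constant MAX are hypothetical, chosen for illustration; the
input strings are hand-written JSON rather than encoder output.

```lua
-- Hypothetical helper illustrating the 80-char cap on the rendered args.
local MAX = 80

local function truncate(s)
  if #s <= MAX then return s end
  -- Keep MAX - 3 bytes of payload and append the three-dot suffix.
  return s:sub(1, MAX - 3) .. "..."
end

local short = '{"path":"/tmp"}'
local long = '{"data":"' .. string.rep("x", 200) .. '"}'

print(truncate(short))    -- unchanged, already under the cap
print(#truncate(long))    -- 80
```

Note that the cut is byte-based, so a multi-byte UTF-8 character at the
boundary could be split; whether that matters depends on the terminal.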

Smoke-test passes all eight cases per §12 verify-row #2:
  exact match / alias glob → auto-approve, no prompt
  miss + y / n / empty / nil-cfg → prompt shown, expected verdict
  empty args / long args → clean rendering, truncation works

Note: PHASE0 §4 module-layout had a "lands in Phase 2" hint on the
norris_step stub; the actual landing is Phase 3 per PHASE0 §11 row 3.
Comment in safety.lua updated to clarify.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 13:07:57 +00:00
parent 6c194deea0
commit 0fde77fe35
+44 -9
@@ -1,18 +1,53 @@
--- safety.lua — destructive op heuristic + Chuck Norris autonomous gate.
--- Phase 0: stub. Lands in Phase 2.
--- See docs/PHASE0.md §11 (Phase 2), §12 (security posture is workflow-not-OS).
+-- safety.lua — workflow safeguards for tool execution.
+-- Phase 2: M.confirm_tool_call only (per-call confirm gate, with config-driven
+-- auto-approve policy). See docs/PHASE2.md §6.
+-- Phase 3 (deferred): destructive-op heuristic + Norris autonomous gate.
+local rl = require("ffi.readline")
+local json = require("dkjson")
 local M = {}
--- Returns true if cmd matches the destructive-op heuristic and should HALT
--- in Norris mode pending user confirmation.
-function M.is_destructive(cmd)
-  error("safety.is_destructive: not implemented (Phase 2)")
+-- Render the call as `name({"path":"/tmp"})` for the confirm prompt.
+-- Truncate to keep one-line prompts.
+local function pretty_call(name, args)
+  local body = ""
+  if args and next(args) then
+    local ok, encoded = pcall(json.encode, args)
+    if ok then
+      body = (#encoded <= 80) and encoded or (encoded:sub(1, 77) .. "...")
+    else
+      body = "..."
+    end
+  end
+  return name .. "(" .. body .. ")"
 end
+-- Ask the user whether tool `name` may be called with `args`, consulting
+-- `cfg.mcp.auto_approve` first. Policy keys:
+--   "<alias>.<tool>" → exact-match auto-approve
+--   "<alias>.*" → whole-server auto-approve
+-- Anything else falls back to a [y/N] prompt; empty / non-"y" answer rejects.
+function M.confirm_tool_call(name, args, cfg)
+  local policy = (cfg and cfg.mcp and cfg.mcp.auto_approve) or {}
+  if policy[name] then return true end
+  local alias = name:match("^([^.]+)%.")
+  if alias and policy[alias .. ".*"] then return true end
+  local prompt = ("call '%s'? [y/N] "):format(pretty_call(name, args))
+  local ans = rl.readline(prompt) or ""
+  return ans:lower():sub(1, 1) == "y"
+end
+-- ---------------------------------------------------------------- Phase 3 stubs
+-- Destructive-op heuristic for Norris autonomous mode. Not part of the
+-- Phase 2 surface (see docs/PHASE2.md §10 / PHASE0.md §11 row 3).
+function M.is_destructive(cmd)
+  error("safety.is_destructive: not implemented (Phase 3)")
+end
 -- Norris mode planning loop entry point.
 function M.norris_step(plan, broker, executor)
-  error("safety.norris_step: not implemented (Phase 2)")
+  error("safety.norris_step: not implemented (Phase 3)")
 end
 return M