safety: norris_step planner (Phase 3 commit #4)

Phase 3 commit #4 per docs/PHASE3.md §12. Single-iteration planner. The
driver loop in repl.lua (commit #5) calls this in a while loop, advancing
step_n on every "continue" return.

M.norris_step(ctx, model_cfg, helpers, opts):

1. One broker.chat_stream round-trip: text + tool_calls collected, text
   streamed via helpers.render_assistant_delta.
2. Parse actions from response: tool_calls (already collected), CMD: lines
   (via helpers.extract_cmd_lines), GOAL: complete sentinel (line-level
   exact match per R-C5).
3. Record the assistant turn (with tool_calls if any) and log it. If no
   actions AND no goal_done → status="stalled".
4. Dispatch tool_calls (structured route first):
   - is_destructive check on serialized call.
   - If destructive → helpers.halt (proceed/skip/abort).
   - Else → auto_approve lookup; absent → halt for consent (R-C6: Norris
     is conservative; auto_approve is the only consent bypass).
   - On skip: synthesize role:tool turn "[aish] tool call skipped by
     user"; alternation preserved per C5/C7.
   - On abort: return status="aborted".
   - On proceed: dispatch via helpers.dispatch_tool, append role:tool
     turn with result content.
   - Argument JSON parse failure also synthesizes a tool turn (same
     alternation rationale).
5. Dispatch CMD: lines (legacy route):
   - is_destructive check.
   - Destructive → helpers.halt.
   - Non-destructive → run directly (Norris user accepted autonomy for
     non-destructive shell).
   - skip → ctx:append_exec_output "[aish] CMD skipped by user".
   - proceed → exec via helpers.exec_cmd, frame via
     render_exec_begin/end.
6. Skip-budget escalation (R-C1): after dispatch, if
   ctx.norris_consecutive_skips >= 3 → escalation halt; abort exits, any
   other verdict resets the counter.
7. Goal-done check AFTER all dispatch (R-C2 / Q25 resolution).
8. Budget check: step_n >= max_steps → status="budget_exhausted".
9. Otherwise → status="continue", driver advances.

Helpers are passed in as injected functions rather than directly requiring
repl/renderer/executor. This keeps safety.lua's coupling clean and
norris_step testable with a mocked helpers table.

State carried across iterations on the ctx:
- ctx.norris_consecutive_skips (resets on any proceed verdict)
- ctx.norris_goal / ctx.norris_active (set/cleared by the driver)

Existing test_safety.lua corpus (87 cases) still passes; the norris_step
addition doesn't touch is_destructive's behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 23:37:53 +00:00
parent d2a53d2fc7
commit 11b1f566b3
1 changed files with 199 additions and 3 deletions
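
The commit message says the driver (commit #5, not part of this change) calls norris_step in a while loop and bumps step_n on every "continue". A minimal sketch of what such a loop could look like follows. `run_norris` and `fake_step` are hypothetical names used only for illustration, and toggling ctx.norris_active inside the loop is an assumption based on the "set/cleared by the driver" note; the real repl.lua may differ.

```lua
-- Hypothetical driver sketch; not the code from repl.lua.
-- step_fn stands in for safety.norris_step so the sketch runs on its own.
local function run_norris(step_fn, ctx, model_cfg, helpers, cfg, max_steps)
    max_steps = max_steps or 8
    local step_n = 1
    ctx.norris_active = true            -- assumption: the driver sets this flag
    local result
    while true do
        result = step_fn(ctx, model_cfg, helpers,
                         { step_n = step_n, max_steps = max_steps, cfg = cfg })
        -- Every status other than "continue" is terminal. The planner itself
        -- reports "budget_exhausted", so the loop relies on it to terminate.
        if result.status ~= "continue" then break end
        step_n = step_n + 1
    end
    ctx.norris_active = false           -- assumption: the driver clears it on exit
    return result, step_n
end

-- Stub planner: continues twice, then reports the goal as complete.
local function fake_step(_, _, _, opts)
    if opts.step_n >= 3 then
        return { status = "done", reason = "GOAL: complete" }
    end
    return { status = "continue" }
end

local ctx = {}
local result, steps = run_norris(fake_step, ctx, {}, {}, nil, 8)
print(result.status, steps)
```

Because the planner is passed in as a function, the same loop can be exercised with a stub, as here, or with the real M.norris_step once helpers are wired up.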
@@ -243,9 +243,205 @@ M._match_static   = match_static       -- testable in isolation
 M._reset_cache    = function() _llm_cache = {} end

 -- ---------------------------------------------------------------- norris_step
-- Phase 3 commit #4 lands the planner. Stub stays for now.
-function M.norris_step(plan, broker, executor)
-    error("safety.norris_step: not implemented yet (lands in Phase 3 commit #4)")
+-- One iteration of the Norris planning loop per PHASE3.md §4.
+-- The driver in repl.lua calls this in a while loop, advancing on every
+-- non-terminal status.
+--
+-- Inputs:
+--   ctx          aish Context (read & written: turns + pending_exec_output)
+--   model_cfg    the active broker model config (model_cfg.endpoint/.model/etc.)
+--   helpers      table of injected dispatch helpers:
+--                  .tools_schema()         → tools array for opts.tools
+--                  .exec_cmd(cmd)          → run shell cmd; returns (out, exit_code)
+--                  .dispatch_tool(name,args)→ run an MCP tool; returns (content, is_error)
+--                  .extract_cmd_lines(text)→ executor.extract_cmd_lines (passed in)
+--                  .halt(step_n, max_n, reason, action) → "proceed"|"skip"|"abort"
+--                  .render_step(n, max_n, descr)        (renderer.norris_step)
+--                  .render_tool_begin(name, args)       (renderer.tool_call_begin)
+--                  .render_tool_end(content, is_error)  (renderer.tool_call_end)
+--                  .render_exec_begin()                 (renderer.exec_begin)
+--                  .render_exec_end(code)               (renderer.exec_end)
+--                  .render_assistant_delta(chunk)       (renderer.assistant_delta)
+--                  .render_assistant_flush()            (renderer.assistant_flush)
+--                  .log_turn(turn)                      (session log append)
+--   opts:
+--                  .step_n             current step (1-based)
+--                  .max_steps          budget cap (default 8)
+--                  .cfg                full aish config (for is_destructive)
+--
+-- Returns: { status, reason } where status ∈ {
+--    "continue"          — keep looping (driver bumps step_n)
+--    "done"              — model emitted GOAL: complete
+--    "aborted"           — user typed abort at a halt prompt
+--    "stalled"           — model emitted nothing actionable
+--    "budget_exhausted"  — step_n >= max_steps after this iteration
+--    "broker_error"      — broker.chat_stream returned (nil, err)
+-- }
+function M.norris_step(ctx, model_cfg, helpers, opts)
+    local step_n    = opts.step_n or 1
+    local max_steps = opts.max_steps or 8
+    local cfg       = opts.cfg
+
+    helpers.render_step(step_n, max_steps)
+
+    -- (1) one broker round-trip — stream text + collect tool_calls
+    local text_parts      = {}
+    local tool_calls_seen = {}
+    local ok, err = broker.chat_stream(model_cfg, ctx:to_messages(),
+        function(kind, payload)
+            if kind == "text" then
+                text_parts[#text_parts + 1] = payload
+                helpers.render_assistant_delta(payload)
+            elseif kind == "tool_call" then
+                tool_calls_seen[#tool_calls_seen + 1] = payload
+            end
+        end,
+        { tools = helpers.tools_schema() })
+    helpers.render_assistant_flush()
+
+    if not ok then
+        return { status = "broker_error", reason = tostring(err) }
+    end
+
+    local resp_text = table.concat(text_parts)
+
+    -- (2) parse actions from response
+    local cmd_lines = helpers.extract_cmd_lines(resp_text) or {}
+    local goal_done = false
+    for line in (resp_text .. "\n"):gmatch("([^\n]*)\n") do
+        local trimmed = line:gsub("^%s+", ""):gsub("%s+$", "")
+        if trimmed == "GOAL: complete" then
+            goal_done = true; break
+        end
+    end
+
+    local n_actions = #tool_calls_seen + #cmd_lines
+
+    -- (3) record assistant turn (with optional tool_calls)
+    if #tool_calls_seen > 0 then
+        ctx:append({ role = "assistant", content = resp_text,
+                     tool_calls = tool_calls_seen })
+    else
+        ctx:append({ role = "assistant", content = resp_text })
+    end
+    helpers.log_turn(ctx.turns[#ctx.turns])
+
+    if n_actions == 0 and not goal_done then
+        return { status = "stalled", reason = "no action emitted" }
+    end
+
+    -- (4) dispatch tool_calls first (structured route)
+    for _, call in ipairs(tool_calls_seen) do
+        local args_table = {}
+        if call.arguments and call.arguments ~= "" then
+            local d, _, derr = json.decode(call.arguments)
+            if d then args_table = d
+            else
+                -- Argument JSON parse failure: synthesize tool turn (alternation)
+                ctx:append({ role = "tool", tool_call_id = call.id,
+                             content = "[aish] tool arguments not "
+                                       .. "parseable as JSON: " .. tostring(derr) })
+                helpers.log_turn(ctx.turns[#ctx.turns])
+                goto continue_tool
+            end
+        end
+
+        -- Probe destructive on the JSON-serialized call as a proxy.
+        local call_repr = (call.name or "?") .. " " .. (call.arguments or "")
+        local destr, reason = M.is_destructive(call_repr, cfg)
+
+        local verdict
+        if destr then
+            verdict = helpers.halt(step_n, max_steps, reason or "destructive",
+                                   call_repr)
+        else
+            -- Non-destructive tool_call: auto_approve OR halt for consent
+            local policy = cfg and cfg.mcp and cfg.mcp.auto_approve or {}
+            local alias = (call.name or ""):match("^(.-)__")
+            local auto = policy[call.name]
+                         or (alias and alias ~= "" and policy[alias .. "__*"])
+            if auto then
+                verdict = "proceed"
+            else
+                verdict = helpers.halt(step_n, max_steps, "tool consent",
+                                       call_repr)
+            end
+        end
+
+        if verdict == "abort" then
+            return { status = "aborted", reason = "user abort at halt" }
+        elseif verdict == "skip" then
+            ctx.norris_consecutive_skips = (ctx.norris_consecutive_skips or 0) + 1
+            ctx:append({ role = "tool", tool_call_id = call.id,
+                         content = "[aish] tool call skipped by user: "
+                                   .. (reason or "no reason") })
+            helpers.log_turn(ctx.turns[#ctx.turns])
+        else  -- proceed
+            ctx.norris_consecutive_skips = 0
+            helpers.render_tool_begin(call.name, call.arguments)
+            local content, is_error = helpers.dispatch_tool(call.name, args_table)
+            helpers.render_tool_end(content, is_error)
+            ctx:append({ role = "tool", tool_call_id = call.id,
+                         content = content or "" })
+            helpers.log_turn(ctx.turns[#ctx.turns])
+        end
+        ::continue_tool::
+    end
+
+    -- (5) dispatch CMD: lines (legacy route)
+    for _, cmd in ipairs(cmd_lines) do
+        local destr, reason = M.is_destructive(cmd, cfg)
+        local verdict
+        if destr then
+            verdict = helpers.halt(step_n, max_steps, reason or "destructive",
+                                   cmd)
+        else
+            verdict = "proceed"  -- non-destructive CMD: runs without consent
+                                 -- in Norris (Norris user accepted autonomy)
+        end
+
+        if verdict == "abort" then
+            return { status = "aborted", reason = "user abort at halt" }
+        elseif verdict == "skip" then
+            ctx.norris_consecutive_skips = (ctx.norris_consecutive_skips or 0) + 1
+            -- CMD: skip → synthesize exec-output line so the model sees it
+            ctx:append_exec_output("[aish] CMD skipped by user: "
+                                   .. (reason or "no reason"))
+        else  -- proceed
+            ctx.norris_consecutive_skips = 0
+            helpers.render_exec_begin()
+            local out, code = helpers.exec_cmd(cmd)
+            helpers.render_exec_end(code)
+            if cfg and cfg.shell and cfg.shell.capture_output then
+                ctx:append_exec_output(out)
+            end
+        end
+    end
+
+    -- (6) skip-budget escalation: R-C1
+    if (ctx.norris_consecutive_skips or 0) >= 3 then
+        local verdict = helpers.halt(step_n, max_steps,
+            ("%d consecutive user skips"):format(ctx.norris_consecutive_skips),
+            "(repeated similar destructive proposals)")
+        if verdict == "abort" then
+            return { status = "aborted", reason = "user abort on skip-escalation" }
+        end
+        -- Any other verdict: reset the counter and continue
+        ctx.norris_consecutive_skips = 0
+    end
+
+    -- (7) goal_done after dispatch
+    if goal_done then
+        return { status = "done", reason = "GOAL: complete" }
+    end
+
+    -- (8) budget
+    if step_n >= max_steps then
+        return { status = "budget_exhausted",
+                 reason = ("%d step limit reached"):format(max_steps) }
+    end
+
+    return { status = "continue" }
 end

 return M
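
For reference, the consent lookup in step (4) can be isolated into a small function. The sketch below mirrors the logic in the diff: an exact tool-name entry wins, otherwise a `<alias>__*` wildcard entry for the server prefix is checked. `is_auto_approved` and the tool names are hypothetical, and the policy values are assumed to be plain truthy flags.

```lua
-- Sketch of the auto_approve lookup; the function name is illustrative.
local function is_auto_approved(policy, tool_name)
    policy = policy or {}
    tool_name = tool_name or ""
    -- Exact tool-name entry takes effect first.
    if policy[tool_name] then return true end
    -- The alias is everything before the first "__" (non-greedy match).
    local alias = tool_name:match("^(.-)__")
    if alias and alias ~= "" and policy[alias .. "__*"] then
        return true
    end
    return false
end

-- Hypothetical policy table, shaped like cfg.mcp.auto_approve.
local policy = { ["fs__read_file"] = true, ["git__*"] = true }

print(is_auto_approved(policy, "fs__read_file"))   -- exact entry
print(is_auto_approved(policy, "git__status"))     -- alias wildcard
print(is_auto_approved(policy, "fs__write_file"))  -- no entry: halt for consent
```

A tool name with no `__` separator, or one that starts with `__`, never matches a wildcard entry, so it falls through to the consent halt unless it is listed by its exact name.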