safety + repl: wire secrets into safety.lua (closes #52)

Closes the last #13 gap: the Norris broker call and the is_destructive
LLM second-opinion probe were the two egress points NOT covered by the
scrub-at-egress design in commit d852aca.

Approach: option (b) per #52's fix sketch, callback-via-helpers/opts.
safety.lua does NOT gain a require("secrets") dependency (acceptance
criteria 3); integration is purely through the convention the rest of
the helpers table already uses.

safety.lua changes:
- llm_probe gains an opts table. When opts.scrub_msgs is set, the
  {system, user(cmd)} message pair is scrubbed before broker.chat.
  When opts.rehydrate is set, the YES/NO reply is rehydrated before
  parsing (defensive: the verdict shouldn't carry placeholders, but
  rehydration is a safe no-op if it doesn't).
- llm_second_opinion threads opts through to llm_probe.
- M.is_destructive(cmd, cfg, opts): opts optional; nil-opts is
  backwards-compatible (no scrub, original behavior).
- M.norris_step:
  * outbound broker.chat_stream message scrubbed via
    helpers.scrub_msgs(ctx:to_messages(), model_cfg) when provided.
  * on_delta wrapped with helpers.streaming_rehydrator():push / :flush
    so the user sees rehydrated text AND text_parts accumulates
    rehydrated chunks (parity with ask_ai in repl.lua).
  * both M.is_destructive call sites (tool_call probe + CMD: probe)
    now pass probe_opts = {scrub_msgs, rehydrate} when the helpers
    carry them.

repl.lua changes:
- Norris helpers table gains scrub_msgs / rehydrate /
  streaming_rehydrator closures, all nil-safe (return identity / nil
  when secrets_session is nil).
- :safety check meta passes probe_opts to is_destructive when
  secrets_session is configured. Without secrets, behavior unchanged.

Unit-test verified end-to-end:
- Stubbed broker.chat captures the messages it receives.
- Without opts: probe SEES `ghp_realsecretvalue_...` (control).
- With opts: probe sees `$AISH_SECRET_NNN` (correct scrub).

Regression: test_safety 87/87, test_router_model 31/31, repl loads.
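
For reviewers, a minimal standalone sketch of the opts-callback pattern and the stubbed-broker check described above. This is not the code in safety.lua: the `broker` stub, the toy `scrub_msgs` pattern, the prompt text, and the fixed placeholder `$AISH_SECRET_001` are all hypothetical stand-ins, and the real scrub_msgs also takes a mode_cfg argument that is left out here.

```lua
-- Hypothetical stand-ins for illustration only; the real broker and
-- secrets modules are not reproduced here.
local captured  -- last message list the stub broker received

local broker = {
    chat = function(msgs)
        captured = msgs
        return "YES"
    end,
}

-- Toy scrubber: swaps anything shaped like "ghp_<alnum>" for a fixed
-- placeholder. The real scrubber is assumed to be session-aware.
local function scrub_msgs(msgs)
    local out = {}
    for i, m in ipairs(msgs) do
        out[i] = {
            role    = m.role,
            content = (m.content:gsub("ghp_%w+", "$AISH_SECRET_001")),
        }
    end
    return out
end

-- Probe with optional callbacks. With nil opts it behaves as before:
-- no scrub, no rehydrate.
local function llm_probe(cmd, opts)
    opts = opts or {}
    local msgs = {
        { role = "system", content = "Answer YES or NO: is this command destructive?" },
        { role = "user",   content = cmd },
    }
    if opts.scrub_msgs then
        msgs = opts.scrub_msgs(msgs)
    end
    local reply = broker.chat(msgs)
    if opts.rehydrate then
        reply = opts.rehydrate(reply)
    end
    return reply:upper():find("YES", 1, true) ~= nil
end

local cmd = "curl -H 'Authorization: token ghp_realsecretvalue' https://example.invalid"

-- Control: without opts the stub broker sees the raw secret.
local control_hit = llm_probe(cmd)
local control_seen = captured[2].content

-- With opts: the stub broker sees only the placeholder.
local scrubbed_hit = llm_probe(cmd, {
    scrub_msgs = scrub_msgs,
    rehydrate  = function(t) return t end,  -- identity, as in the nil-safe case
})
local scrubbed_seen = captured[2].content
```

The probe never requires a secrets module itself; the caller decides whether scrubbing happens by passing or omitting the callbacks, which is the property acceptance criteria 3 asks for.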
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:40:30 +00:00
parent ac58b19da2
commit 955bd82efb
2 changed files with 86 additions and 16 deletions
@@ -1208,6 +1208,21 @@ function M.run(config)
            render_assistant_delta   = renderer.assistant_delta,
            render_assistant_flush   = renderer.assistant_flush,
            log_turn                 = log_turn,
+            -- Issue #52: pass secrets-aware callbacks so safety.lua
+            -- can scrub outbound Norris broker messages + LLM probe
+            -- inputs + rehydrate streamed replies. All three are nil-
+            -- safe; safety.lua only wires them in when present.
+            scrub_msgs = function(msgs, mode_cfg)
+                return scrub_messages(msgs, secrets_mode_for(mode_cfg or active_cfg))
+            end,
+            rehydrate = function(text)
+                return secrets_session and secrets_session:rehydrate(text) or text
+            end,
+            streaming_rehydrator = function()
+                return secrets_session
+                    and secrets.streaming_rehydrator(secrets_session)
+                    or  nil
+            end,
        }

        local step_n = 1
@@ -1641,7 +1656,19 @@ function M.run(config)
                end
                -- Pass cfg so the LLM probe runs; user can opt-out via
                -- :safety check --no-llm <cmd> if added in v2.
-                local hit, reason = safety.is_destructive(cmd, config)
+                -- Issue #52: thread secrets scrub/rehydrate so the probe
+                -- model sees placeholders for any secrets in `cmd`.
+                local probe_opts
+                if secrets_session then
+                    probe_opts = {
+                        scrub_msgs = function(msgs, mode_cfg)
+                            return scrub_messages(msgs,
+                                secrets_mode_for(mode_cfg or active_cfg))
+                        end,
+                        rehydrate = function(t) return secrets_session:rehydrate(t) end,
+                    }
+                end
+                local hit, reason = safety.is_destructive(cmd, config, probe_opts)
                if hit then
                    renderer.status(("DESTRUCTIVE — %s"):format(reason or "?"))
                else