safety + repl: wire secrets into safety.lua (closes #52)

Closes the last #13 gap: the Norris broker call and the is_destructive
LLM second-opinion probe were the two egress points NOT covered by the
scrub-at-egress design in commit d852aca.

Approach: option (b) per #52's fix sketch — callback-via-helpers/opts.
safety.lua does NOT gain a require("secrets") dependency (acceptance
criterion 3); integration is purely through the convention the rest
of the helpers table already uses.

safety.lua changes:

  - llm_probe gains an opts table. When opts.scrub_msgs is set, the
    {system, user(cmd)} message pair is scrubbed before broker.chat.
    When opts.rehydrate is set, the YES/NO reply is rehydrated before
    parsing (defensive: the verdict should not carry placeholders, and
    rehydration is a no-op when it carries none).

  - llm_second_opinion threads opts through to llm_probe.

  - M.is_destructive(cmd, cfg, opts) — opts optional; nil-opts is
    backwards-compatible (no scrub, original behavior).

  - M.norris_step:
      * outbound broker.chat_stream message scrubbed via
        helpers.scrub_msgs(ctx:to_messages(), model_cfg) when provided.
      * on_delta wrapped with helpers.streaming_rehydrator():push /
        :flush so the user sees rehydrated text AND text_parts
        accumulates rehydrated chunks (parity with ask_ai in repl.lua).
      * both M.is_destructive call sites (tool_call probe + CMD: probe)
        now pass probe_opts = {scrub_msgs, rehydrate} when the
        helpers carry them.
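
A minimal sketch of the opts threading described above, not the code
in safety.lua. The broker parameter, the prompt text and the YES/NO
parsing are assumptions made for illustration; only the opts.scrub_msgs
and opts.rehydrate names come from this commit.

```lua
-- Hypothetical probe: the broker is injected so the sketch runs without
-- the real module. Returns true when the model answers YES.
local function llm_probe(broker, cmd, cfg, opts)
  opts = opts or {}  -- nil opts keeps the pre-#52 behavior (no scrub)
  local msgs = {
    { role = "system", content = "Answer YES or NO: is this command destructive?" },
    { role = "user", content = cmd },
  }
  -- Scrub only when the caller supplied a callback; safety.lua itself
  -- never needs to require("secrets").
  if opts.scrub_msgs then
    msgs = opts.scrub_msgs(msgs, cfg)
  end
  local reply = broker.chat(msgs)
  -- Defensive: a YES/NO verdict should not contain placeholders.
  if opts.rehydrate then
    reply = opts.rehydrate(reply)
  end
  return reply:upper():match("^%s*YES") ~= nil
end
```

Passing the callbacks in a table keeps the third argument optional, so
existing two-argument callers need no change.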

repl.lua changes:

  - Norris helpers table gains scrub_msgs / rehydrate /
    streaming_rehydrator closures, all nil-safe (return identity /
    nil when secrets_session is nil).

  - :safety check meta passes probe_opts to is_destructive when
    secrets_session is configured. Without secrets, behavior unchanged.
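
The on_delta wrapping in norris_step, together with a
streaming_rehydrator closure that may return nil, could look roughly
like the sketch below. new_streaming_rehydrator and wrap_on_delta are
hypothetical names, and the hold-back rule (keep a trailing "$..." tail
until more text arrives) is an assumption about how a placeholder split
across chunks might be handled, not the behavior of the secrets module.

```lua
-- Toy rehydrator: map is { ["$AISH_SECRET_001"] = "real value", ... }.
local function new_streaming_rehydrator(map)
  local r = { buf = "" }
  local function rehydrate(text)
    -- Unknown placeholders are left as-is (the callback returns nil).
    return (text:gsub("%$AISH_SECRET_%d+", function(p) return map[p] end))
  end
  function r:push(chunk)
    self.buf = self.buf .. chunk
    -- Hold back a trailing "$word" tail: it may be an incomplete placeholder.
    local cut = self.buf:find("%$[%w_]*$")
    local ready = cut and self.buf:sub(1, cut - 1) or self.buf
    self.buf = cut and self.buf:sub(cut) or ""
    return rehydrate(ready)
  end
  function r:flush()
    local rest = rehydrate(self.buf)
    self.buf = ""
    return rest
  end
  return r
end

-- Returns (on_delta, finish). With a nil rehydrator the original
-- callback is returned untouched, mirroring the nil-safe helpers.
local function wrap_on_delta(on_delta, rehydrator)
  if not rehydrator then
    return on_delta, function() end
  end
  local function emit(text)
    if text ~= "" then on_delta(text) end
  end
  return function(chunk) emit(rehydrator:push(chunk)) end,
         function() emit(rehydrator:flush()) end
end
```

Because the wrapped callback only ever receives rehydrated text, a
caller that both renders and accumulates chunks sees the same string in
both places.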

Unit-test verified end-to-end:
  - Stubbed broker.chat captures the messages it receives.
  - Without opts: probe SEES `ghp_realsecretvalue_...` (control).
  - With opts: probe sees `$AISH_SECRET_NNN` (correct scrub).
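
The control/scrubbed comparison above can be reproduced in isolation
with a capturing stub. Everything in this sketch is a stand-in:
new_session, scrub_msgs_for and probe are hypothetical, and the secret
value is made up. Secret values are assumed to be non-empty strings.

```lua
-- Toy secrets session: numbers each secret as $AISH_SECRET_NNN.
local function new_session(secret_values)
  local by_placeholder, by_secret = {}, {}
  for i, value in ipairs(secret_values) do
    local ph = ("$AISH_SECRET_%03d"):format(i)
    by_placeholder[ph], by_secret[value] = value, ph
  end
  local s = {}
  function s:scrub(text)
    for value, ph in pairs(by_secret) do
      -- Plain-text find: secrets may contain Lua pattern magic characters.
      local start, stop = text:find(value, 1, true)
      while start do
        text = text:sub(1, start - 1) .. ph .. text:sub(stop + 1)
        start, stop = text:find(value, start + #ph, true)
      end
    end
    return text
  end
  function s:rehydrate(text)
    return (text:gsub("%$AISH_SECRET_%d+", function(ph) return by_placeholder[ph] end))
  end
  return s
end

-- Builds a scrub_msgs callback that copies messages instead of mutating them.
local function scrub_msgs_for(session)
  return function(msgs)
    local out = {}
    for i, m in ipairs(msgs) do
      out[i] = { role = m.role, content = session:scrub(m.content) }
    end
    return out
  end
end

-- Stubbed broker: records the messages it receives.
local captured
local broker = { chat = function(msgs) captured = msgs; return "NO" end }

-- Minimal probe under test; scrubs only when opts carries a callback.
local function probe(cmd, opts)
  local msgs = { { role = "user", content = cmd } }
  if opts and opts.scrub_msgs then msgs = opts.scrub_msgs(msgs) end
  return broker.chat(msgs)
end
```

Calling probe(cmd) and then probe(cmd, { scrub_msgs = ... }) and
inspecting captured after each call gives the two observations listed
above.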

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:40:30 +00:00
parent ac58b19da2
commit 955bd82efb
2 changed files with 86 additions and 16 deletions
+28 -1
@@ -1208,6 +1208,21 @@ function M.run(config)
 render_assistant_delta = renderer.assistant_delta,
 render_assistant_flush = renderer.assistant_flush,
 log_turn = log_turn,
+-- Issue #52: pass secrets-aware callbacks so safety.lua
+-- can scrub outbound Norris broker messages + LLM probe
+-- inputs + rehydrate streamed replies. All three are nil-
+-- safe; safety.lua only wires them in when present.
+scrub_msgs = function(msgs, mode_cfg)
+    return scrub_messages(msgs, secrets_mode_for(mode_cfg or active_cfg))
+end,
+rehydrate = function(text)
+    return secrets_session and secrets_session:rehydrate(text) or text
+end,
+streaming_rehydrator = function()
+    return secrets_session
+        and secrets.streaming_rehydrator(secrets_session)
+        or nil
+end,
 }
 local step_n = 1
@@ -1641,7 +1656,19 @@ function M.run(config)
 end
 -- Pass cfg so the LLM probe runs; user can opt-out via
 -- :safety check --no-llm <cmd> if added in v2.
-local hit, reason = safety.is_destructive(cmd, config)
+-- Issue #52: thread secrets scrub/rehydrate so the probe
+-- model sees placeholders for any secrets in `cmd`.
+local probe_opts
+if secrets_session then
+    probe_opts = {
+        scrub_msgs = function(msgs, mode_cfg)
+            return scrub_messages(msgs,
+                secrets_mode_for(mode_cfg or active_cfg))
+        end,
+        rehydrate = function(t) return secrets_session:rehydrate(t) end,
+    }
+end
+local hit, reason = safety.is_destructive(cmd, config, probe_opts)
 if hit then
     renderer.status(("DESTRUCTIVE — %s"):format(reason or "?"))
 else