Wire #13 secret redaction into safety.lua broker call sites #52
Context

Issue #13 (commits e4b818b + d852aca) landed secret redaction across the main egress points:

- ask_ai main broker call
- dispatch_tool_call (rehydrate args before sess:call_tool)
- DELEGATE: and :delegate
- :memory summarize

Two broker call sites in safety.lua were deliberately deferred from that PR because the scrubbing convention is owned by repl.lua's closure (the secrets_session local) and safety.lua is a separate module that doesn't see it. This issue tracks closing that gap.

What's missing
1. safety.norris_step main broker call (safety.lua:321)

The Norris autonomous-mode planner round-trips through broker.chat_stream(model_cfg, ctx:to_messages(), ...) directly. There is no scrub on outbound and no rehydrate on the stream. A user running Norris on a goal that touches secrets in context (e.g. a goal whose pre-state includes cat ~/.env) sends them in plaintext to the broker even with #13 configured.

2. safety.is_destructive LLM second-opinion probe (safety.lua:177)

The deep-model probe sends the user's literal CMD: line to broker.chat(model_cfg, ...) for YES/NO destructive classification. If the model emitted a CMD: that quotes a secret (e.g. CMD: curl -H "Authorization: Bearer sk-or-v1-..."), that whole line goes in plaintext to the probe model.

This is especially leaky because the recommended setting is safety.llm_model = "cloud" (better verdict reliability per the small-model false-positive issue documented in reference_safety_destructive_patterns.md), so the probe is the most likely paid-cloud destination on the fleet.

Why it wasn't covered in #13
safety.lua is invoked from repl.lua via the helpers table for norris_step, and via a direct require("safety") call for is_destructive. Neither path currently carries the secrets session. Adding it means either:

(a) thread helpers.secrets_session through safety.norris_step and safety.is_destructive, OR
(b) provide helpers.scrub_msgs(messages, mode) and helpers.rehydrate(text) callbacks, the way helpers.exec_cmd is already threaded.

Option (b) is the cleaner separation: safety.lua doesn't need to know about the secrets module; it just calls a callback. This matches the existing helpers convention.

Fix sketch (option b)
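A minimal, self-contained sketch of the callback approach in option (b). It is not the project's code: the secrets session, the fake broker, the model_cfg.redact field, the "off" mode value, and the on_chunk callback argument are all hypothetical stand-ins, since this issue does not show the real secrets.lua or broker APIs. Only the names scrub_msgs(messages, mode), rehydrate(text), broker.chat_stream, and broker.chat are taken from the issue text.

```lua
-- Hypothetical stand-in for repl.lua's secrets_session local.
local function new_secrets_session(known_secrets)
  local to_placeholder, to_secret = {}, {}
  for i, secret in ipairs(known_secrets) do
    local placeholder = "<SECRET_" .. i .. ">"
    to_placeholder[secret] = placeholder
    to_secret[placeholder] = secret
  end
  local session = {}
  function session.scrub(text)
    for secret, placeholder in pairs(to_placeholder) do
      -- Escape punctuation so the secret is matched literally, not as a pattern.
      local pattern = secret:gsub("%p", "%%%0")
      text = text:gsub(pattern, placeholder)
    end
    return text
  end
  function session.rehydrate(text)
    -- Table replacement: unknown placeholders are left untouched.
    return (text:gsub("<SECRET_%d+>", to_secret))
  end
  return session
end

-- repl.lua side (assumed shape): expose callbacks, not the session itself.
local function build_helpers(secrets_session)
  return {
    scrub_msgs = function(messages, mode)
      if mode == "off" then return messages end -- "off" is an assumed mode value
      local out = {}
      for i, msg in ipairs(messages) do
        out[i] = { role = msg.role, content = secrets_session.scrub(msg.content) }
      end
      return out
    end,
    rehydrate = function(text) return secrets_session.rehydrate(text) end,
  }
end

-- safety.lua side: only callbacks are used, no require("secrets").
local function norris_step(broker, model_cfg, messages, helpers)
  local outbound = helpers.scrub_msgs(messages, model_cfg.redact)
  local acc = {}
  broker.chat_stream(model_cfg, outbound, function(chunk)
    acc[#acc + 1] = chunk
  end)
  return helpers.rehydrate(table.concat(acc))
end

local function is_destructive(broker, model_cfg, cmd_line, opts)
  local msgs = { { role = "user", content = cmd_line } }
  if opts and opts.scrub_msgs then
    msgs = opts.scrub_msgs(msgs, model_cfg.redact)
  end
  local verdict = broker.chat(model_cfg, msgs)
  if opts and opts.rehydrate then verdict = opts.rehydrate(verdict) end
  return verdict:upper():find("YES", 1, true) ~= nil
end

-- Fake broker that records what it was sent, for demonstration only.
local fake_broker = { seen = {} }
function fake_broker.chat_stream(_, messages, on_chunk)
  local text = messages[#messages].content
  fake_broker.seen[#fake_broker.seen + 1] = text
  local mid = math.floor(#text / 2)
  on_chunk(text:sub(1, mid)) -- echo the prompt back in two chunks
  on_chunk(text:sub(mid + 1))
end
function fake_broker.chat(_, messages)
  fake_broker.seen[#fake_broker.seen + 1] = messages[1].content
  return "YES"
end

local helpers = build_helpers(new_secrets_session({ "sk-test-123" }))
local cfg = { redact = "on" }
local reply = norris_step(fake_broker, cfg,
  { { role = "user", content = "token is sk-test-123" } }, helpers)
local destructive = is_destructive(fake_broker, cfg,
  "CMD: rm -rf /tmp/x # sk-test-123", helpers)
```

In this sketch the reply is rehydrated once, after the stream has been accumulated, because a placeholder could be split across two chunks. Rehydrating for live display would need some per-chunk buffering, which is left out here.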
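In repl.lua, where the helpers table is built for safety.norris_step, add the scrub_msgs and rehydrate callbacks.

In safety.norris_step, scrub the outbound messages before the broker.chat_stream call and rehydrate the streamed reply.

For safety.is_destructive: the probe is called from repl.lua's CMD: gate. Add cfg.helpers access OR pass the scrub callbacks explicitly. The function is small enough that an explicit pass via opts is fine.

Acceptance criteria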
- safety.norris_step scrubs outbound messages per the active model's redact policy; the streamed reply is rehydrated for display, and the accumulator stores rehydrated text (parity with ask_ai).
- safety.is_destructive LLM probe scrubs the command string before send; the YES/NO verdict response is rehydrated before parsing (the verdict text itself shouldn't contain placeholders, but rehydration is a safe no-op there).
- safety.lua stays clean of require("secrets"); the integration goes through the helpers/opts callback convention.
- :secrets status reporting accurately covers Norris (e.g. mapping count grows when a Norris session scrubs new values).

Scope notes
This is the last gap from the #13 design. Out of scope for THIS issue:

- The helpers.exec_cmd path itself doesn't need scrubbing (it runs locally, no broker).
- safety.norris_step outbound call).
- dispatch_tool in Norris's helpers table inherits from repl.lua's dispatch_tool_call, which already rehydrates args (#13). Confirm this still works once the scrub-via-helpers wiring lands.

Estimated size: ~80 LOC across safety.lua + repl.lua helpers expansion.

References
- secrets.lua: the module to plug into
- docs/PHASE3.md §4: Norris broker contract
- reference_safety_destructive_patterns.md (project memory): why llm_model = "cloud" is recommended