Secret redaction: scrub credentials before broker, rehydrate in reply #13

Closed
opened 2026-05-10 11:57:21 +00:00 by claude-noether · 1 comment
Collaborator

Suggestion

Add a secret-redaction layer between aish and the broker. Substitute credentials/keys/tokens with stable placeholders before sending to the LLM, then rehydrate the placeholders in the reply before showing the user.

The motivation: as soon as config.lua includes free OpenRouter routes (openrouter/*-alpha, *:free), the provider explicitly logs prompts and completions for model training. Any token/key in the conversation — whether typed by the user, expanded via @filename, or surfaced by CMD: output (env dumps, cat /etc/*.conf) — gets baked into someone else's training set. A scrub layer caps that risk without forcing users to maintain a separate "safe" workflow.

Mechanism

Vault

~/.aish/secrets.lua returns a list of literal strings to scrub, each optionally named:

return {
    {name = "OPENROUTER",  value = "sk-or-v1-..."},
    {name = "GH_TOKEN",    value = "ghp_..."},
    {name = "WIFI_PSK",    value = "..."},
    -- bare strings allowed too:
    "some-other-literal-secret",
}

Pre-broker scrub (in broker.lua)

Before the POST, every outbound messages[] content gets scanned. Matches are replaced with $AISH_SECRET_001, $AISH_SECRET_002, … and the mapping is kept in a per-turn table.

Post-broker rehydrate (in renderer.lua)

Scan the reply for $AISH_SECRET_NNN and re-substitute literal values before display. User sees real values; only placeholders ever crossed the wire.

Optional auto-detect layer

Regex/entropy heuristic for common formats:

  • sk-..., sk-or-v1-... (OpenAI / OpenRouter)
  • ghp_..., gho_..., ghs_... (GitHub PATs)
  • AKIA... (AWS access key)
  • JWT-shaped strings (3-segment dot-separated base64url)
  • SSH private key headers (-----BEGIN ... PRIVATE KEY-----)
  • High-entropy strings >= 32 chars

When auto-detect fires, show the user [redacted 3 candidates: 'sk-or-…', 'AKIA…', high-entropy 47-char blob] and require :y (or auto-confirm if confirm_send=false).

Per-broker policy

Ties into the permission DSL in #9:

brokers = {
    cloud_paid = { url = ..., redact = "vault" },              -- vault only
    cloud_free = { url = ..., redact = "vault+autodetect" },   -- both
    local      = { url = ..., redact = "off" },                -- LAN, trusted
}

Default: vault if vault exists, else off.

Gotchas

These need explicit design decisions before implementation:

  1. Placeholder mangling. Model may emit $AISH_SECRET_01 instead of _001, or wrap in backticks/quotes. Unsubstitution must tolerate trailing punctuation, case variations, and short suffix typos — with logging when a placeholder appears in reply but doesn't match the vault.
  2. Metadata leak. The placeholder convention itself signals "there's a secret here." Better than leaking the value but still observable. A redact = "stealth" mode could replace with plausible decoys (xxxxxx-fake-token-xxxxxx) for true zero-information — at the cost of breaking unsubstitution if the model echoes back.
  3. Newly pasted secrets. Secrets the user types directly never reached the vault first. Autodetect must fire on input, not just stored values.
  4. Model can't reason about scrubbed values. "Fix my .env" workflows break when the model only sees $AISH_SECRET_001=$AISH_SECRET_002. That's a feature for redact=on brokers, a bug for trusted ones — hence the per-broker policy.
  5. CMD: output is the biggest leak. User asks cat ~/.config/foo/credentials; output gets piped into next-turn context; secret leaves the box. Scrubbing must run on tool/exec output before context append, not only on user input.
  6. MCP tool-call interaction (Phase 2). When MCP tools land, their arguments and return values flow through the broker too. Redaction must hook the tool-call path or it's bypassed.
  7. Vault file is itself a secret. Mode 0600, owner-only. Don't write a :secrets meta-command that prints contents. Don't include it in sessions/*.jsonl exports.

Phase target

Phase 4 or 5 — lands cleanly after safety.lua is established (Phase 2 for tool gating) and integrates with permission policy DSL (#9) and per-broker config. Skeleton (vault loader + scrub function + rehydrate function) is small and could ship earlier as Phase 1+ behind redact = "off" default.

Out of scope (this issue)

  • Encrypted vault / GPG-wrapped secrets file. Phase N+1 if anyone cares.
  • OS-level secret managers (pass, gnome-keyring). Could be a vault backend later.
  • Forwarding the redacted prompt to a second model for verification. Out of scope; if you don't trust your model, don't talk to it.
## Suggestion Add a **secret-redaction layer** between aish and the broker. Substitute credentials/keys/tokens with stable placeholders before sending to the LLM, then rehydrate the placeholders in the reply before showing the user. The motivation: as soon as `config.lua` includes free OpenRouter routes (`openrouter/*-alpha`, `*:free`), the provider explicitly logs prompts and completions for model training. Any token/key in the conversation — whether typed by the user, expanded via `@filename`, or surfaced by `CMD:` output (env dumps, `cat /etc/*.conf`) — gets baked into someone else's training set. A scrub layer caps that risk without forcing users to maintain a separate "safe" workflow. ## Mechanism ### Vault `~/.aish/secrets.lua` returns a list of literal strings to scrub, each optionally named: ```lua return { {name = "OPENROUTER", value = "sk-or-v1-..."}, {name = "GH_TOKEN", value = "ghp_..."}, {name = "WIFI_PSK", value = "..."}, -- bare strings allowed too: "some-other-literal-secret", } ``` ### Pre-broker scrub (in `broker.lua`) Before the POST, every outbound `messages[]` content gets scanned. Matches are replaced with `$AISH_SECRET_001`, `$AISH_SECRET_002`, … and the mapping is kept in a per-turn table. ### Post-broker rehydrate (in `renderer.lua`) Scan the reply for `$AISH_SECRET_NNN` and re-substitute literal values before display. User sees real values; only placeholders ever crossed the wire. ### Optional auto-detect layer Regex/entropy heuristic for common formats: - `sk-...`, `sk-or-v1-...` (OpenAI / OpenRouter) - `ghp_...`, `gho_...`, `ghs_...` (GitHub PATs) - `AKIA...` (AWS access key) - JWT-shaped strings (3-segment dot-separated base64url) - SSH private key headers (`-----BEGIN ... PRIVATE KEY-----`) - High-entropy strings >= 32 chars When auto-detect fires, show the user `[redacted 3 candidates: 'sk-or-…', 'AKIA…', high-entropy 47-char blob]` and require `:y` (or auto-confirm if `confirm_send=false`). ### Per-broker policy Ties into the permission DSL in #9: ```lua brokers = { cloud_paid = { url = ..., redact = "vault" }, -- vault only cloud_free = { url = ..., redact = "vault+autodetect" }, -- both local = { url = ..., redact = "off" }, -- LAN, trusted } ``` Default: `vault` if vault exists, else `off`. ## Gotchas These need explicit design decisions before implementation: 1. **Placeholder mangling.** Model may emit `$AISH_SECRET_01` instead of `_001`, or wrap in backticks/quotes. Unsubstitution must tolerate trailing punctuation, case variations, and short suffix typos — with logging when a placeholder appears in reply but doesn't match the vault. 2. **Metadata leak.** The placeholder convention itself signals "there's a secret here." Better than leaking the value but still observable. A `redact = "stealth"` mode could replace with plausible decoys (`xxxxxx-fake-token-xxxxxx`) for true zero-information — at the cost of breaking unsubstitution if the model echoes back. 3. **Newly pasted secrets.** Secrets the user types directly never reached the vault first. Autodetect must fire on input, not just stored values. 4. **Model can't reason about scrubbed values.** "Fix my `.env`" workflows break when the model only sees `$AISH_SECRET_001=$AISH_SECRET_002`. That's a feature for `redact=on` brokers, a bug for trusted ones — hence the per-broker policy. 5. **CMD: output is the biggest leak.** User asks `cat ~/.config/foo/credentials`; output gets piped into next-turn context; secret leaves the box. Scrubbing must run on tool/exec output before context append, not only on user input. 6. **MCP tool-call interaction (Phase 2).** When MCP tools land, their arguments and return values flow through the broker too. Redaction must hook the tool-call path or it's bypassed. 7. **Vault file is itself a secret.** Mode 0600, owner-only. Don't write a `:secrets` meta-command that prints contents. Don't include it in `sessions/*.jsonl` exports. ## Phase target Phase 4 or 5 — lands cleanly after `safety.lua` is established (Phase 2 for tool gating) and integrates with permission policy DSL (#9) and per-broker config. Skeleton (`vault loader + scrub function + rehydrate function`) is small and could ship earlier as Phase 1+ behind `redact = "off"` default. ## Out of scope (this issue) - Encrypted vault / GPG-wrapped secrets file. Phase N+1 if anyone cares. - OS-level secret managers (`pass`, `gnome-keyring`). Could be a vault backend later. - Forwarding the redacted prompt to a *second* model for verification. Out of scope; if you don't trust your model, don't talk to it.
claude-noether added the feature requestsecurity labels 2026-05-10 11:57:21 +00:00
Author
Collaborator

Closed by commits e4b818b (module) + d852aca (wiring).

Coverage vs. the issue body

Gotcha Status
1. Placeholder mangling Streaming rehydrator buffers across SSE chunks; tolerates $ mid-prose; reverses on per-delta + flush
2. Metadata leak redact = "stealth" per-broker emits opaque xxxxxx-fake-<label>-NNN-xxxxxx decoys (one-way; no rehydrate)
3. Newly pasted secrets Autodetect heuristics fire on every outbound scrub: OpenAI sk-, OpenRouter sk-or-v1-, GitHub ghp_/gho_/ghs_, AWS AKIA<16>, JWT eyJ...x.y.z, SSH -----BEGIN ... PRIVATE KEY-----
4. Per-broker policy models[*].redact field; fallback chain cfg.redact -> config.secrets.default -> "vault+autodetect" if vault else "off"
5. CMD: output leak Exec output is appended PLAIN to ctx and scrubbed at egress alongside the user turn (single scrub point keeps the invariant simple)
6. MCP tool-call leak tool_call arguments scrubbed via scrub_messages on outbound; dispatch_tool_call rehydrates args before sess:call_tool so the trusted MCP server gets real values
7. Vault file is itself a secret secrets.load refuses to load if mode != 0600 (matches ssh); :secrets meta never prints values

Hook points (egress-only scrub design)

ctx stores PLAIN values throughout. Scrub happens at the broker call site so per-broker policy is naturally honored.

Site Scrub Rehydrate
ask_ai main broker call scrub_messages(ctx:to_messages(), mode) streaming_rehydrator wraps on_delta for display + tail flush
MCP dispatch_tool_call (model already emitted placeholder-bearing args via prior scrub) rehydrate_args(args) before sess:call_tool
DELEGATE: / :delegate scrub sub_msgs rehydrate sub_text before context append
Phase 5 summarize-on-evict scrub sum_msgs rehydrate reply that becomes ctx.summary
:memory summarize scrub sum_msgs rehydrate reply before candidate parsing

Mode resolution

secrets_mode_for(model_cfg):
  model_cfg.redact                          (highest priority)
  -> config.secrets.default
  -> "vault+autodetect" if vault loaded
  -> "off"

New meta

:secrets [status] -- vault entries, placeholder count, active broker mode. Never prints values.

:secrets check <text> -- dry-run scrub against the active broker's mode; shows the transformation without sending anywhere.

Validation

Module unit tests: 20/20 (vault sub, stable mapping, autodetect across all label kinds, stealth decoys, mode=off, streaming with mid-placeholder splits, non-placeholder dollar-sign pass-through).

E2E on higgs:

  • Vault at ~/.aish/secrets.lua (chmod 600 enforced -- 644 rejected with chmod hint)
  • :secrets check against vault literal, OpenRouter pattern, GitHub PAT -- all three scrub correctly to $AISH_SECRET_001/002/003
  • Cloud broker round-trip: model replied with the rehydrated value (back-substituted from a placeholder it sent), confirming both scrub-on-egress and rehydrate-on-display work

Deferred to follow-up (clearly scoped)

  • safety.lua broker call sites (Norris main loop + is_destructive LLM second-opinion probe) -- same wiring pattern, but safety doesn't currently see secrets_session. Needs threading through the helpers table. Reasonable narrow follow-up.

Out of scope (per the issue body)

  • Encrypted vault / GPG-wrapped secrets
  • OS-level secret managers (pass, gnome-keyring) as vault backends
  • Forwarding the redacted prompt to a second model for verification
Closed by commits `e4b818b` (module) + `d852aca` (wiring). ## Coverage vs. the issue body | Gotcha | Status | |---|---| | 1. Placeholder mangling | Streaming rehydrator buffers across SSE chunks; tolerates `$` mid-prose; reverses on per-delta + flush | | 2. Metadata leak | `redact = "stealth"` per-broker emits opaque `xxxxxx-fake-<label>-NNN-xxxxxx` decoys (one-way; no rehydrate) | | 3. Newly pasted secrets | Autodetect heuristics fire on every outbound scrub: OpenAI `sk-`, OpenRouter `sk-or-v1-`, GitHub `ghp_/gho_/ghs_`, AWS `AKIA<16>`, JWT `eyJ...x.y.z`, SSH `-----BEGIN ... PRIVATE KEY-----` | | 4. Per-broker policy | `models[*].redact` field; fallback chain `cfg.redact` -> `config.secrets.default` -> `"vault+autodetect"` if vault else `"off"` | | 5. CMD: output leak | Exec output is appended PLAIN to ctx and scrubbed at egress alongside the user turn (single scrub point keeps the invariant simple) | | 6. MCP tool-call leak | tool_call arguments scrubbed via `scrub_messages` on outbound; `dispatch_tool_call` rehydrates args before `sess:call_tool` so the trusted MCP server gets real values | | 7. Vault file is itself a secret | `secrets.load` refuses to load if mode != 0600 (matches ssh); `:secrets` meta never prints values | ## Hook points (egress-only scrub design) ctx stores PLAIN values throughout. Scrub happens at the broker call site so per-broker policy is naturally honored. | Site | Scrub | Rehydrate | |---|---|---| | `ask_ai` main broker call | `scrub_messages(ctx:to_messages(), mode)` | streaming_rehydrator wraps on_delta for display + tail flush | | MCP `dispatch_tool_call` | (model already emitted placeholder-bearing args via prior scrub) | `rehydrate_args(args)` before `sess:call_tool` | | DELEGATE: / `:delegate` | scrub sub_msgs | rehydrate sub_text before context append | | Phase 5 summarize-on-evict | scrub sum_msgs | rehydrate reply that becomes ctx.summary | | `:memory summarize` | scrub sum_msgs | rehydrate reply before candidate parsing | ## Mode resolution ``` secrets_mode_for(model_cfg): model_cfg.redact (highest priority) -> config.secrets.default -> "vault+autodetect" if vault loaded -> "off" ``` ## New meta `:secrets [status]` -- vault entries, placeholder count, active broker mode. Never prints values. `:secrets check <text>` -- dry-run scrub against the active broker's mode; shows the transformation without sending anywhere. ## Validation Module unit tests: 20/20 (vault sub, stable mapping, autodetect across all label kinds, stealth decoys, mode=off, streaming with mid-placeholder splits, non-placeholder dollar-sign pass-through). E2E on higgs: - Vault at `~/.aish/secrets.lua` (chmod 600 enforced -- 644 rejected with chmod hint) - `:secrets check` against vault literal, OpenRouter pattern, GitHub PAT -- all three scrub correctly to `$AISH_SECRET_001/002/003` - Cloud broker round-trip: model replied with the rehydrated value (back-substituted from a placeholder it sent), confirming both scrub-on-egress and rehydrate-on-display work ## Deferred to follow-up (clearly scoped) - `safety.lua` broker call sites (Norris main loop + `is_destructive` LLM second-opinion probe) -- same wiring pattern, but `safety` doesn't currently see `secrets_session`. Needs threading through the helpers table. Reasonable narrow follow-up. ## Out of scope (per the issue body) - Encrypted vault / GPG-wrapped secrets - OS-level secret managers (`pass`, `gnome-keyring`) as vault backends - Forwarding the redacted prompt to a second model for verification
Sign in to join this conversation.