Secret redaction: scrub credentials before broker, rehydrate in reply #13

New Issue

2026-05-10T11:57:21Z

claude-noether commented

2026-05-10 11:57:21 +00:00

Suggestion

Add a secret-redaction layer between aish and the broker. Substitute credentials/keys/tokens with stable placeholders before sending to the LLM, then rehydrate the placeholders in the reply before showing the user.

The motivation: as soon as config.lua includes free OpenRouter routes (openrouter/*-alpha, *:free), the provider explicitly logs prompts and completions for model training. Any token/key in the conversation — whether typed by the user, expanded via @filename, or surfaced by CMD: output (env dumps, cat /etc/*.conf) — gets baked into someone else's training set. A scrub layer caps that risk without forcing users to maintain a separate "safe" workflow.

Mechanism

Vault

~/.aish/secrets.lua returns a list of literal strings to scrub, each optionally named:

return {
    {name = "OPENROUTER",  value = "sk-or-v1-..."},
    {name = "GH_TOKEN",    value = "ghp_..."},
    {name = "WIFI_PSK",    value = "..."},
    -- bare strings allowed too:
    "some-other-literal-secret",
}

Pre-broker scrub (in `broker.lua`)

Before the POST, every outbound messages[] content gets scanned. Matches are replaced with $AISH_SECRET_001, $AISH_SECRET_002, … and the mapping is kept in a per-turn table.

Post-broker rehydrate (in `renderer.lua`)

Scan the reply for $AISH_SECRET_NNN and re-substitute literal values before display. User sees real values; only placeholders ever crossed the wire.

Optional auto-detect layer

Regex/entropy heuristic for common formats:

sk-..., sk-or-v1-... (OpenAI / OpenRouter)
ghp_..., gho_..., ghs_... (GitHub PATs)
AKIA... (AWS access key)
JWT-shaped strings (3-segment dot-separated base64url)
SSH private key headers (-----BEGIN ... PRIVATE KEY-----)
High-entropy strings >= 32 chars

When auto-detect fires, show the user [redacted 3 candidates: 'sk-or-…', 'AKIA…', high-entropy 47-char blob] and require :y (or auto-confirm if confirm_send=false).

Per-broker policy

Ties into the permission DSL in #9:

brokers = {
    cloud_paid = { url = ..., redact = "vault" },              -- vault only
    cloud_free = { url = ..., redact = "vault+autodetect" },   -- both
    local      = { url = ..., redact = "off" },                -- LAN, trusted
}

Default: vault if vault exists, else off.

Gotchas

These need explicit design decisions before implementation:

Placeholder mangling. Model may emit $AISH_SECRET_01 instead of _001, or wrap in backticks/quotes. Unsubstitution must tolerate trailing punctuation, case variations, and short suffix typos — with logging when a placeholder appears in reply but doesn't match the vault.
Metadata leak. The placeholder convention itself signals "there's a secret here." Better than leaking the value but still observable. A redact = "stealth" mode could replace with plausible decoys (xxxxxx-fake-token-xxxxxx) for true zero-information — at the cost of breaking unsubstitution if the model echoes back.
Newly pasted secrets. Secrets the user types directly never reached the vault first. Autodetect must fire on input, not just stored values.
Model can't reason about scrubbed values. "Fix my .env" workflows break when the model only sees $AISH_SECRET_001=$AISH_SECRET_002. That's a feature for redact=on brokers, a bug for trusted ones — hence the per-broker policy.
CMD: output is the biggest leak. User asks cat ~/.config/foo/credentials; output gets piped into next-turn context; secret leaves the box. Scrubbing must run on tool/exec output before context append, not only on user input.
MCP tool-call interaction (Phase 2). When MCP tools land, their arguments and return values flow through the broker too. Redaction must hook the tool-call path or it's bypassed.
Vault file is itself a secret. Mode 0600, owner-only. Don't write a :secrets meta-command that prints contents. Don't include it in sessions/*.jsonl exports.

Phase target

Phase 4 or 5 — lands cleanly after safety.lua is established (Phase 2 for tool gating) and integrates with permission policy DSL (#9) and per-broker config. Skeleton (vault loader + scrub function + rehydrate function) is small and could ship earlier as Phase 1+ behind redact = "off" default.

Out of scope (this issue)

Encrypted vault / GPG-wrapped secrets file. Phase N+1 if anyone cares.
OS-level secret managers (pass, gnome-keyring). Could be a vault backend later.
Forwarding the redacted prompt to a second model for verification. Out of scope; if you don't trust your model, don't talk to it.

## Suggestion Add a **secret-redaction layer** between aish and the broker. Substitute credentials/keys/tokens with stable placeholders before sending to the LLM, then rehydrate the placeholders in the reply before showing the user. The motivation: as soon as `config.lua` includes free OpenRouter routes (`openrouter/*-alpha`, `*:free`), the provider explicitly logs prompts and completions for model training. Any token/key in the conversation — whether typed by the user, expanded via `@filename`, or surfaced by `CMD:` output (env dumps, `cat /etc/*.conf`) — gets baked into someone else's training set. A scrub layer caps that risk without forcing users to maintain a separate "safe" workflow. ## Mechanism ### Vault `~/.aish/secrets.lua` returns a list of literal strings to scrub, each optionally named: ```lua return { {name = "OPENROUTER", value = "sk-or-v1-..."}, {name = "GH_TOKEN", value = "ghp_..."}, {name = "WIFI_PSK", value = "..."}, -- bare strings allowed too: "some-other-literal-secret", } ``` ### Pre-broker scrub (in `broker.lua`) Before the POST, every outbound `messages[]` content gets scanned. Matches are replaced with `$AISH_SECRET_001`, `$AISH_SECRET_002`, … and the mapping is kept in a per-turn table. ### Post-broker rehydrate (in `renderer.lua`) Scan the reply for `$AISH_SECRET_NNN` and re-substitute literal values before display. User sees real values; only placeholders ever crossed the wire. ### Optional auto-detect layer Regex/entropy heuristic for common formats: - `sk-...`, `sk-or-v1-...` (OpenAI / OpenRouter) - `ghp_...`, `gho_...`, `ghs_...` (GitHub PATs) - `AKIA...` (AWS access key) - JWT-shaped strings (3-segment dot-separated base64url) - SSH private key headers (`-----BEGIN ... PRIVATE KEY-----`) - High-entropy strings >= 32 chars When auto-detect fires, show the user `[redacted 3 candidates: 'sk-or-…', 'AKIA…', high-entropy 47-char blob]` and require `:y` (or auto-confirm if `confirm_send=false`). ### Per-broker policy Ties into the permission DSL in #9: ```lua brokers = { cloud_paid = { url = ..., redact = "vault" }, -- vault only cloud_free = { url = ..., redact = "vault+autodetect" }, -- both local = { url = ..., redact = "off" }, -- LAN, trusted } ``` Default: `vault` if vault exists, else `off`. ## Gotchas These need explicit design decisions before implementation: 1. **Placeholder mangling.** Model may emit `$AISH_SECRET_01` instead of `_001`, or wrap in backticks/quotes. Unsubstitution must tolerate trailing punctuation, case variations, and short suffix typos — with logging when a placeholder appears in reply but doesn't match the vault. 2. **Metadata leak.** The placeholder convention itself signals "there's a secret here." Better than leaking the value but still observable. A `redact = "stealth"` mode could replace with plausible decoys (`xxxxxx-fake-token-xxxxxx`) for true zero-information — at the cost of breaking unsubstitution if the model echoes back. 3. **Newly pasted secrets.** Secrets the user types directly never reached the vault first. Autodetect must fire on input, not just stored values. 4. **Model can't reason about scrubbed values.** "Fix my `.env`" workflows break when the model only sees `$AISH_SECRET_001=$AISH_SECRET_002`. That's a feature for `redact=on` brokers, a bug for trusted ones — hence the per-broker policy. 5. **CMD: output is the biggest leak.** User asks `cat ~/.config/foo/credentials`; output gets piped into next-turn context; secret leaves the box. Scrubbing must run on tool/exec output before context append, not only on user input. 6. **MCP tool-call interaction (Phase 2).** When MCP tools land, their arguments and return values flow through the broker too. Redaction must hook the tool-call path or it's bypassed. 7. **Vault file is itself a secret.** Mode 0600, owner-only. Don't write a `:secrets` meta-command that prints contents. Don't include it in `sessions/*.jsonl` exports. ## Phase target Phase 4 or 5 — lands cleanly after `safety.lua` is established (Phase 2 for tool gating) and integrates with permission policy DSL (#9) and per-broker config. Skeleton (`vault loader + scrub function + rehydrate function`) is small and could ship earlier as Phase 1+ behind `redact = "off"` default. ## Out of scope (this issue) - Encrypted vault / GPG-wrapped secrets file. Phase N+1 if anyone cares. - OS-level secret managers (`pass`, `gnome-keyring`). Could be a vault backend later. - Forwarding the redacted prompt to a *second* model for verification. Out of scope; if you don't trust your model, don't talk to it.

claude-noether added the feature request security labels 2026-05-10 11:57:21 +00:00

claude-noether referenced this issue from a commit

2026-05-16 21:38:25 +00:00

secrets: vault loader + scrub/rehydrate + autodetect (#13 commit 1)

claude-noether referenced this issue from a commit

2026-05-16 21:38:25 +00:00

repl: wire #13 secrets — scrub outbound, rehydrate stream + tool args

claude-noether commented

2026-05-16 21:39:25 +00:00

Closed by commits e4b818b (module) + d852aca (wiring).

Coverage vs. the issue body

Gotcha	Status
1. Placeholder mangling	Streaming rehydrator buffers across SSE chunks; tolerates `$` mid-prose; reverses on per-delta + flush
2. Metadata leak	`redact = "stealth"` per-broker emits opaque `xxxxxx-fake-<label>-NNN-xxxxxx` decoys (one-way; no rehydrate)
3. Newly pasted secrets	Autodetect heuristics fire on every outbound scrub: OpenAI `sk-`, OpenRouter `sk-or-v1-`, GitHub `ghp_/gho_/ghs_`, AWS `AKIA<16>`, JWT `eyJ...x.y.z`, SSH `-----BEGIN ... PRIVATE KEY-----`
4. Per-broker policy	`models[*].redact` field; fallback chain `cfg.redact` -> `config.secrets.default` -> `"vault+autodetect"` if vault else `"off"`
5. CMD: output leak	Exec output is appended PLAIN to ctx and scrubbed at egress alongside the user turn (single scrub point keeps the invariant simple)
6. MCP tool-call leak	tool_call arguments scrubbed via `scrub_messages` on outbound; `dispatch_tool_call` rehydrates args before `sess:call_tool` so the trusted MCP server gets real values
7. Vault file is itself a secret	`secrets.load` refuses to load if mode != 0600 (matches ssh); `:secrets` meta never prints values

Hook points (egress-only scrub design)

ctx stores PLAIN values throughout. Scrub happens at the broker call site so per-broker policy is naturally honored.

Site	Scrub	Rehydrate
`ask_ai` main broker call	`scrub_messages(ctx:to_messages(), mode)`	streaming_rehydrator wraps on_delta for display + tail flush
MCP `dispatch_tool_call`	(model already emitted placeholder-bearing args via prior scrub)	`rehydrate_args(args)` before `sess:call_tool`
DELEGATE: / `:delegate`	scrub sub_msgs	rehydrate sub_text before context append
Phase 5 summarize-on-evict	scrub sum_msgs	rehydrate reply that becomes ctx.summary
`:memory summarize`	scrub sum_msgs	rehydrate reply before candidate parsing

Mode resolution

secrets_mode_for(model_cfg):
  model_cfg.redact                          (highest priority)
  -> config.secrets.default
  -> "vault+autodetect" if vault loaded
  -> "off"

New meta

:secrets [status] -- vault entries, placeholder count, active broker mode. Never prints values.

:secrets check <text> -- dry-run scrub against the active broker's mode; shows the transformation without sending anywhere.

Validation

Module unit tests: 20/20 (vault sub, stable mapping, autodetect across all label kinds, stealth decoys, mode=off, streaming with mid-placeholder splits, non-placeholder dollar-sign pass-through).

E2E on higgs:

Vault at ~/.aish/secrets.lua (chmod 600 enforced -- 644 rejected with chmod hint)
:secrets check against vault literal, OpenRouter pattern, GitHub PAT -- all three scrub correctly to $AISH_SECRET_001/002/003
Cloud broker round-trip: model replied with the rehydrated value (back-substituted from a placeholder it sent), confirming both scrub-on-egress and rehydrate-on-display work

Deferred to follow-up (clearly scoped)

safety.lua broker call sites (Norris main loop + is_destructive LLM second-opinion probe) -- same wiring pattern, but safety doesn't currently see secrets_session. Needs threading through the helpers table. Reasonable narrow follow-up.

Out of scope (per the issue body)

Encrypted vault / GPG-wrapped secrets
OS-level secret managers (pass, gnome-keyring) as vault backends
Forwarding the redacted prompt to a second model for verification

Closed by commits `e4b818b` (module) + `d852aca` (wiring). ## Coverage vs. the issue body | Gotcha | Status | |---|---| | 1. Placeholder mangling | Streaming rehydrator buffers across SSE chunks; tolerates `$` mid-prose; reverses on per-delta + flush | | 2. Metadata leak | `redact = "stealth"` per-broker emits opaque `xxxxxx-fake-<label>-NNN-xxxxxx` decoys (one-way; no rehydrate) | | 3. Newly pasted secrets | Autodetect heuristics fire on every outbound scrub: OpenAI `sk-`, OpenRouter `sk-or-v1-`, GitHub `ghp_/gho_/ghs_`, AWS `AKIA<16>`, JWT `eyJ...x.y.z`, SSH `-----BEGIN ... PRIVATE KEY-----` | | 4. Per-broker policy | `models[*].redact` field; fallback chain `cfg.redact` -> `config.secrets.default` -> `"vault+autodetect"` if vault else `"off"` | | 5. CMD: output leak | Exec output is appended PLAIN to ctx and scrubbed at egress alongside the user turn (single scrub point keeps the invariant simple) | | 6. MCP tool-call leak | tool_call arguments scrubbed via `scrub_messages` on outbound; `dispatch_tool_call` rehydrates args before `sess:call_tool` so the trusted MCP server gets real values | | 7. Vault file is itself a secret | `secrets.load` refuses to load if mode != 0600 (matches ssh); `:secrets` meta never prints values | ## Hook points (egress-only scrub design) ctx stores PLAIN values throughout. Scrub happens at the broker call site so per-broker policy is naturally honored. | Site | Scrub | Rehydrate | |---|---|---| | `ask_ai` main broker call | `scrub_messages(ctx:to_messages(), mode)` | streaming_rehydrator wraps on_delta for display + tail flush | | MCP `dispatch_tool_call` | (model already emitted placeholder-bearing args via prior scrub) | `rehydrate_args(args)` before `sess:call_tool` | | DELEGATE: / `:delegate` | scrub sub_msgs | rehydrate sub_text before context append | | Phase 5 summarize-on-evict | scrub sum_msgs | rehydrate reply that becomes ctx.summary | | `:memory summarize` | scrub sum_msgs | rehydrate reply before candidate parsing | ## Mode resolution ``` secrets_mode_for(model_cfg): model_cfg.redact (highest priority) -> config.secrets.default -> "vault+autodetect" if vault loaded -> "off" ``` ## New meta `:secrets [status]` -- vault entries, placeholder count, active broker mode. Never prints values. `:secrets check <text>` -- dry-run scrub against the active broker's mode; shows the transformation without sending anywhere. ## Validation Module unit tests: 20/20 (vault sub, stable mapping, autodetect across all label kinds, stealth decoys, mode=off, streaming with mid-placeholder splits, non-placeholder dollar-sign pass-through). E2E on higgs: - Vault at `~/.aish/secrets.lua` (chmod 600 enforced -- 644 rejected with chmod hint) - `:secrets check` against vault literal, OpenRouter pattern, GitHub PAT -- all three scrub correctly to `$AISH_SECRET_001/002/003` - Cloud broker round-trip: model replied with the rehydrated value (back-substituted from a placeholder it sent), confirming both scrub-on-egress and rehydrate-on-display work ## Deferred to follow-up (clearly scoped) - `safety.lua` broker call sites (Norris main loop + `is_destructive` LLM second-opinion probe) -- same wiring pattern, but `safety` doesn't currently see `secrets_session`. Needs threading through the helpers table. Reasonable narrow follow-up. ## Out of scope (per the issue body) - Encrypted vault / GPG-wrapped secrets - OS-level secret managers (`pass`, `gnome-keyring`) as vault backends - Forwarding the redacted prompt to a second model for verification

claude-noether closed this issue

2026-05-16 21:39:25 +00:00

claude-noether referenced this issue

2026-05-16 21:43:21 +00:00

Wire #13 secret redaction into safety.lua broker call sites #52

claude-noether referenced this issue from a commit

2026-05-16 22:40:32 +00:00

safety + repl: wire secrets into safety.lua (closes #52)

Sign in to join this conversation.

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: marfrit/aish#13

Secret redaction: scrub credentials before broker, rehydrate in reply #13

Suggestion

Mechanism

Vault

Pre-broker scrub (in broker.lua)

Post-broker rehydrate (in renderer.lua)

Optional auto-detect layer

Per-broker policy

Gotchas

Phase target

Out of scope (this issue)

Coverage vs. the issue body

Hook points (egress-only scrub design)

Mode resolution

New meta

Validation

Deferred to follow-up (clearly scoped)

Out of scope (per the issue body)

Pre-broker scrub (in `broker.lua`)

Post-broker rehydrate (in `renderer.lua`)