bot-bottle/docs/research/secret-minimization-over-dlp.md
didericis-codex cdb1870b1c
docs(agent): clarify claude oauth env
2026-05-28 18:20:09 -04:00

Secret minimization beats DLP for the encoding/splitting attack class

Why pipelock's body DLP and gitleaks's pre-receive scan cannot stop a determined exfil attempt, and what architectural shifts actually bound the damage. Companion to secret-exfil-tripwire-encodings.md (which covers the inverse — detecting a known secret across encodings) and to agent-credential-proxy-landscape.md (which establishes the gate / auth-injecting-proxy pattern, the per-service mechanics for Anthropic / Gitea / GitHub, and the existing tools in that space).

Summary

bot-bottle's v1 egress story is: pipelock allowlists hostnames, intercepts TLS, body-scans every request against 48 builtin DLP patterns, and blocks on hit. Gitleaks does the analog on git push. Both are signature-based. Against a determined compromised or prompt-injected agent, both lose: base64, hex, gzip, splitting across requests, and custom encodings all evade signature matching, and one exfil channel (the Anthropic API conversation body) is deliberately not scanned at all because pipelock's BIP-39 detector false-positives on ordinary English prose.

The fix is not "scan harder." It is the same shift that the git-gate already makes for upstream push credentials: keep the secret out of the bottle in the first place. The agent talks to a per-bottle gate; the gate holds the credential and uses it on the agent's behalf. Whatever the agent can encode and exfil, it cannot encode bytes it never had.

The encoding/splitting attack is fundamentally unbounded for any secret the bottle holds, but it stops mattering once the bottle holds fewer secrets.

Why egress scanning is a losing position

Three independent failure modes, each sufficient on its own:

1. Encoding evades signature match

The 48 builtin pipelock patterns (and gitleaks's analog set) recognize known credential shapes: ghp_…, sk_live_…, AWS access-key prefixes, PEM block markers, etc. The same bytes under any of:

  • base64 / base64url / base32 / hex / ascii-85
  • gzip + base64
  • per-char numeric encoding
  • ROT13, custom alphabet substitution
  • splitting across request boundaries (first 20 chars in request A, last 20 in request B)
  • chunking via path segments instead of body bytes
  • DNS subdomain encoding (out-of-scope for pipelock entirely; only the v1 dnsmasq / iptables layer in network-egress-guard.md would catch it)

…do not match any pipelock or gitleaks pattern, because the patterns are byte-literal on the canonical form. secret-exfil-tripwire-encodings.md documents the detection side of this gap: no off-the-shelf tool emits N encoded forms of a known secret for downstream matching, and even the multi-encoding decoders in gitleaks 8.20+ / TruffleHog only undo one layer.

A motivated attacker controls the encoding. A static signature set cannot win that race.
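
To make the gap concrete, here is a minimal sketch. The regex is a simplified stand-in for a GitHub-PAT-style signature, not pipelock's or gitleaks's actual rule, and the token is a fabricated value with the right shape. The helper names are hypothetical.

```python
import base64
import re

# Simplified stand-in for a signature rule; NOT the real pipelock/gitleaks pattern.
GHP_PATTERN = re.compile(r"ghp_[A-Za-z0-9]{36}")

# Fabricated token with the right shape, for illustration only.
FAKE_TOKEN = "ghp_" + "A1b2C3d4E5f6" * 3


def matches(payload: str) -> bool:
    """Return True if the signature fires on this payload."""
    return GHP_PATTERN.search(payload) is not None


def encoded_forms(secret: str) -> dict[str, list[str]]:
    """A few of the encodings listed above; each value is a list of request bodies."""
    raw = secret.encode()
    half = len(secret) // 2
    return {
        "base64": [base64.b64encode(raw).decode()],
        "hex": [raw.hex()],
        "per-char numeric": [",".join(str(b) for b in raw)],
        "split across requests": [secret[:half], secret[half:]],
    }
```

The signature fires on the canonical token and on none of the encoded request bodies, even though each form is trivially reversible by the receiver.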

2. The api.anthropic.com passthrough is unscanned by design

pipelock.py:DEFAULT_TLS_PASSTHROUGH exempts api.anthropic.com from TLS interception because pipelock's BIP-39 seed-phrase detector false-positives on any 12+ consecutive English words from the BIP-39 wordlist that happen to checksum cleanly — i.e., on a lot of normal prose. The recommended treatment per pipelock's own docs is passthrough_domains: the CONNECT host gate still applies, but body is not decrypted.

The agent's conversation channel is therefore wide open as an exfil path. A prompt-injected agent that has been told a secret can ship it to Anthropic as conversation text, formatted however it likes, and pipelock sees only CONNECT api.anthropic.com:443. The BOT_BOTTLE_CLAUDE_OAUTH_TOKEN itself rides this exact path.

3. Out-of-band channels exist regardless

Even with body DLP fully on, the v1 design knowingly leaves several non-HTTP egress paths unscanned (called out as non-goals in PRD 0001 §Non-goals): SSH session content via ssh-gate, raw TCP, UDP, ICMP, TLS-SNI domain fronting on allowlisted hosts. The ssh-gate is dumb L4 socat — scp, -L/-R/-D port forwards, and arbitrary remote-shell content are all opaque to it. Any one of these is sufficient to ship a secret the agent holds.

The shift: stop holding the secret

The README already states the operative principle:

The container itself adds a layer between the agent and the host, but the v1 design leans more on secret minimization and egress allowlisting than on the container as a hardened boundary.

The encoding/splitting class is the residual that egress allowlisting cannot solve and that secret minimization can. The git-gate is the proof-of-concept: the agent's effective capability is "push to upstream," not "hold an SSH key." Moving the key from the agent into the gate left the capability intact and the credential bytes unreachable to the agent. The same move applies to anything where the agent's actual need is perform operation X against service Y, not hold a credential.

agent-credential-proxy-landscape.md covers the gate pattern in depth: kernel-boundary mechanics, the per-service wiring for CLAUDE_CODE_OAUTH_TOKEN / Gitea PAT / GitHub PAT, the four proxy topologies, and the landscape of existing tools. What follows is the same idea generalized to the full secret set in a bottle.

Concrete mitigations, ordered by leverage

1. Generalize the gate pattern to API credentials (highest leverage)

For every credential the agent currently holds, ask: does the agent need the bytes, or does it need to perform an operation the credential authorizes? If the latter, the credential moves into a per-bottle gate that:

  • Holds the secret in a process the agent cannot read (kernel-enforced cross-UID /proc/<pid>/environ deny on Docker; separate VM on smolmachines).
  • Exposes a narrow surface (a localhost URL, or a service name on the internal network).
  • Optionally enforces a method/path allowlist so the credential's full scope isn't blanket-granted.
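
The method/path allowlist in the last bullet could look roughly like the following. This is a sketch under assumptions: the `Rule` type, the `ALLOWLIST` contents, and the example paths are illustrative, not an existing bot-bottle interface.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    method: str
    path_pattern: str  # regex matched against the full request path


# Hypothetical allowlist for a forge gate: read a repo, open an issue.
ALLOWLIST = (
    Rule("GET", r"/api/v1/repos/[^/]+/[^/]+"),
    Rule("POST", r"/api/v1/repos/[^/]+/[^/]+/issues"),
)


def is_allowed(method: str, path: str, rules=ALLOWLIST) -> bool:
    """Deny by default; allow only an exact method plus a fully matched path."""
    # Reject traversal segments first so "/a/../b" cannot alias an allowed path.
    if ".." in path.split("/"):
        return False
    return any(
        r.method == method.upper() and re.fullmatch(r.path_pattern, path)
        for r in rules
    )
```

Matching with `re.fullmatch` rather than a prefix check keeps a rule for one endpoint from silently covering everything beneath it.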

Two concrete instances worth implementing:

Anthropic-API gate. Holds BOT_BOTTLE_CLAUDE_OAUTH_TOKEN. Agent's ANTHROPIC_BASE_URL points at the gate; gate injects Authorization: Bearer … and forwards to api.anthropic.com. The token is no longer in the bottle's env. Once the token is out, DEFAULT_TLS_PASSTHROUGH for api.anthropic.com can be dropped and pipelock can body-scan the conversation channel with the noisy patterns (BIP-39) disabled and only high-confidence ones (long high-entropy strings, known token formats) left on. The known gotchas — SSE streaming, header passthrough for anthropic-version / anthropic-beta / X-Claude-Code-Session-Id, out-of-band telemetry to statsig.anthropic.com — are documented in agent-credential-proxy-landscape.md §Anthropic / Claude Code.

Forge-API gate (Gitea / GitHub / GitLab). Holds the PAT; exposes a narrow REST surface. Token auth on all three is stateless Authorization-header injection — no CSRF, no request signing, no per-request nonce — so one proxy generalizes by config. Per-service mechanics (Gitea uses Authorization: token <…> not Bearer; tea bypasses git's credential.helper; etc.) are in agent-credential-proxy-landscape.md §Gitea / §GitHub / GitLab.

Both compose with the existing git-gate (which addresses the push credential) rather than replacing it.
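
The header-injection step shared by both gates can be sketched as a pure function. The function name, the `AUTH_SCHEME` table, and the choice to treat forwarded headers as a strict allowlist are assumptions of this sketch. The Gitea `token` scheme and the Anthropic header names come from the text above.

```python
# Auth scheme per service. Gitea's "token" scheme is from the text above.
AUTH_SCHEME = {"anthropic": "Bearer", "gitea": "token", "github": "Bearer"}

# Headers the gate forwards unchanged (compared lowercase). Everything else,
# including any Authorization header the agent supplies, is dropped.
FORWARDED = {
    "content-type",
    "accept",
    "anthropic-version",
    "anthropic-beta",
    "x-claude-code-session-id",
}


def build_upstream_headers(
    service: str, agent_headers: dict[str, str], secret: str
) -> dict[str, str]:
    """Filter the agent's headers, then inject the credential the gate holds."""
    out = {
        name: value
        for name, value in agent_headers.items()
        if name.lower() in FORWARDED
    }
    out["Authorization"] = f"{AUTH_SCHEME[service]} {secret}"
    return out
```

Because the agent's own Authorization header is never forwarded, the bottle can carry a placeholder value (or nothing) and the real credential exists only inside the gate process.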

2. Stop passthrough'ing api.anthropic.com

A direct consequence of #1: once the token is out of the bottle, pipelock can decrypt and inspect the conversation channel. Replace the broad BIP-39 detector with a tighter ruleset — high-entropy strings ≥40 chars, known credential format prefixes, long base64 blobs — and accept the residual false-positive rate. The unscanned channel was the largest single exfil hole; closing it is worth a modest reduction in DLP precision.
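
As a rough illustration of the "high-entropy strings ≥40 chars" rule, the sketch below flags long runs from the base64/base64url alphabet whose per-character Shannon entropy exceeds a threshold. The threshold value and function names are assumptions to be tuned against real traffic, not pipelock configuration.

```python
import math
import re
from collections import Counter

# Candidate runs: 40+ characters from the base64 / base64url alphabets.
CANDIDATE = re.compile(r"[A-Za-z0-9+/=_-]{40,}")

# Bits per character; an assumed starting point, not a pipelock default.
ENTROPY_THRESHOLD = 4.0


def shannon_entropy(s: str) -> float:
    """Empirical Shannon entropy of the character distribution, in bits per char."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def suspicious_runs(body: str) -> list[str]:
    """Long candidate runs whose character entropy meets the threshold."""
    return [
        run
        for run in CANDIDATE.findall(body)
        if shannon_entropy(run) >= ENTROPY_THRESHOLD
    ]
```

Ordinary prose rarely produces a 40-character run without whitespace, so the length gate does most of the false-positive suppression. Note that hex tops out at 4 bits per character, so hex-encoded blobs would need a lower threshold or a separate rule.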

If the gate work is deferred, an interim option is to body-scan even with the token in place: the token will be in every Anthropic request's Authorization header anyway, and pipelock's header DLP is already on. The risk is false-positive blocks on legitimate conversation prose, so this should be opt-in initially.

3. Egress data budget

PRD 0001 §Resolution rules reserves data_budget as a future field under the existing egress block. Set the cap small (e.g. 1 MiB outbound per session to non-Anthropic hosts). This doesn't stop one-secret, one-time exfil (a 4 KiB token still fits), but:

  • Bulk exfil (.git history dumps, full credential stores, model weights, large datasets) becomes visibly noisy or impossible.
  • Repeated small exfil attempts trip the budget and produce a log line / alert.
  • Forces the attacker to be efficient; efficient channels are easier to fingerprint after the fact.
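
A minimal sketch of the budget accounting, assuming a per-session counter inside the proxy. The `EgressBudget` class and its parameters are hypothetical; only the 1 MiB cap and the Anthropic exemption come from the text above.

```python
from collections import defaultdict

MIB = 1024 * 1024


class EgressBudget:
    """Per-session outbound byte counter with a hard cap."""

    def __init__(
        self,
        cap_bytes: int = MIB,
        exempt_hosts: frozenset[str] = frozenset({"api.anthropic.com"}),
    ):
        self.cap_bytes = cap_bytes
        self.exempt_hosts = exempt_hosts
        self.sent = 0
        self.per_host = defaultdict(int)

    def allow(self, host: str, nbytes: int) -> bool:
        """Charge nbytes to the budget; False means the request would exceed the cap."""
        if host in self.exempt_hosts:
            return True
        if self.sent + nbytes > self.cap_bytes:
            return False  # caller should block and emit a log line / alert
        self.sent += nbytes
        self.per_host[host] += nbytes
        return True
```

With a deliberately small cap, a single 4 KiB request passes but a third one does not, which matches the point above: the budget bounds volume and repetition, not the first small leak.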

4. Pre-clean .git history when --cwd is used

provision/git.py:_provision_cwd_git copies the host cwd's .git into /home/node/workspace/.git wholesale. Git history contains every secret ever committed, even if removed from HEAD. The agent reads them with git log --all -p — a free, unscanned exfil channel via any allowed host.

Add a provision-time gitleaks scan of the host .git before copy, and refuse to launch (or quarantine the offending commits into a read-only orphan branch view) on hit. This doesn't help the legitimate case where the user wants to debug a leaked-secret commit, but it gates the common case where the user is unaware of historical secrets in the repo. It pairs with the push-side gitleaks already in the git-gate.
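
A provision-time check might be wired up as below. This is a sketch assuming the gitleaks v8 CLI, where `detect` scans the git history under `--source` and exits non-zero when it finds leaks. The function names are hypothetical, and the injectable `run` parameter exists only so the logic can be exercised without gitleaks installed.

```python
import subprocess
from pathlib import Path


def gitleaks_cmd(repo: Path) -> list[str]:
    # Assumes gitleaks v8; --redact keeps matched secrets out of the scan output.
    return ["gitleaks", "detect", "--source", str(repo), "--no-banner", "--redact"]


def history_is_clean(repo: Path, run=subprocess.run) -> bool:
    """True only if the scan exits 0.

    Fails closed: findings and scanner errors both return False, so the
    caller refuses to launch in either case.
    """
    result = run(gitleaks_cmd(repo), capture_output=True, text=True)
    return result.returncode == 0
```

Failing closed means a missing or broken scanner blocks the launch instead of silently skipping the check; a caller that wants the quarantine option would need to parse a gitleaks report instead of relying on the exit code.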

5. Manifest hygiene: explicit secret tagging

Today bottle.env accepts any entry. A ?prompt-at-runtime or ${HOST_VAR} interpolation has no special status vs. a literal — the manifest can't tell which entries are credential material and which are configuration.

Add a secret: true flag (or a secrets: sibling of env:) that:

  • Surfaces those entries prominently in the y/N preflight ("this bottle carries 3 secret values: GITEA_TOKEN, GH_PAT, AWS_SECRET_ACCESS_KEY").
  • Refuses to launch if egress.allowlist contains any host that is not source-controlled by the user (heuristic: not on a built-in KNOWN_FORGE_HOSTS list).
  • Forces an explicit acknowledgement that a credential is being placed into the bottle rather than behind a gate.

This pushes the design toward gates for any new credential type, rather than reaching for bottle.env.
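
The preflight logic could be sketched as follows. The manifest shape (a mapping whose secret entries are dicts carrying `secret: true`), the function names, and the contents of `KNOWN_FORGE_HOSTS` are all assumptions for illustration.

```python
# Illustrative contents; the real built-in list is not specified here.
KNOWN_FORGE_HOSTS = frozenset({"github.com", "gitlab.com"})


def secret_names(env: dict) -> list[str]:
    """Entries tagged `secret: true`; plain-string entries count as configuration."""
    return sorted(
        name
        for name, spec in env.items()
        if isinstance(spec, dict) and spec.get("secret") is True
    )


def preflight(env: dict, allowlist: list[str]) -> tuple[bool, str]:
    """Return (ok_to_launch, message for the y/N prompt)."""
    secrets = secret_names(env)
    if not secrets:
        return True, "this bottle carries no tagged secret values"
    unknown = sorted(h for h in allowlist if h not in KNOWN_FORGE_HOSTS)
    if unknown:
        hosts = ", ".join(unknown)
        return False, f"refusing to launch: secrets present with unrecognized egress hosts: {hosts}"
    return True, f"this bottle carries {len(secrets)} secret values: {', '.join(secrets)}"
```

A bottle with no tagged secrets launches regardless of its allowlist, so the refusal only bites when a credential and an unrecognized egress host are combined.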

6. Detection over prevention for the residual

For everything that remains after the above — and a residual will always remain — log every outbound request to allowlisted hosts in structured form: timestamp, host, path, byte count, content hash, request shape. Don't block; alert. Offline analysis surfaces per-host byte-count outliers, entropy histograms, base64-shaped path/body distributions, and repeat patterns.

This doesn't catch the first careful exfil. It does catch repeated abuse, scaled exfil, and the second-attempt-after-discovery, and it provides a forensic trail for the first incident. Pipelock's stdout-only logging today (PRD 0001 §Out of scope: "Audit logging or persistent log storage of pipelock decisions") is the precursor; this would be its persistent / structured follow-up.
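
A minimal sketch of the structured record and one offline check (per-host byte-count outliers). The field names, the function names, and the "10x the host median" rule are assumptions, not an existing pipelock log format.

```python
import hashlib
import json
import statistics
from datetime import datetime, timezone


def log_record(host: str, method: str, path: str, body: bytes) -> str:
    """One JSON line per outbound request; stores a hash, never the body itself."""
    return json.dumps(
        {
            "ts": datetime.now(timezone.utc).isoformat(),
            "host": host,
            "method": method,
            "path": path,
            "bytes": len(body),
            "sha256": hashlib.sha256(body).hexdigest(),
        },
        sort_keys=True,
    )


def byte_outliers(lines: list[str], factor: float = 10.0) -> list[dict]:
    """Offline pass: flag requests larger than `factor` times their host's median size."""
    records = [json.loads(line) for line in lines]
    by_host: dict[str, list[int]] = {}
    for r in records:
        by_host.setdefault(r["host"], []).append(r["bytes"])
    medians = {h: statistics.median(v) for h, v in by_host.items()}
    return [r for r in records if r["bytes"] > factor * max(medians[r["host"]], 1)]
```

Logging a content hash instead of the body keeps the audit trail from becoming a second copy of whatever was exfiltrated, while still letting repeated identical payloads be correlated.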

7. Shorten OAuth token lifetime

The OAuth token issued by claude setup-token is valid for ~1 year. If a leak happens, the blast radius is a year of access to the user's Pro/Max subscription. Even without the gate (#1), encouraging users to rotate monthly bounds the damage. With the gate, the gate can in principle negotiate fresh short-lived tokens per session (this depends on whether Anthropic ever ships an OAuth refresh / child-token flow, which is closed today per claude-code-token-revocation.md).

What you don't get

Even with all of the above:

  • An agent that can talk to any LLM can use the conversation as a covert channel — encode bits in word choice, response length, formatting. Pipelock can scan for known patterns but cannot stop arbitrary content over a channel whose purpose is arbitrary content.
  • An agent with shell access to any remote host via ssh-gate can exfil through the SSH session. The ssh-gate is L4 by design (PRD 0007 §Non-goals: "SSH-protocol awareness"); generalizing the gate pattern to SSH content inspection is the analog of building the git-gate for arbitrary SSH operations, which is out of scope.
  • A determined adversary always wins against a single boundary. The point of stacking the above is to make the cost of the first exfil higher than the value of the secrets in scope, and to make repeated exfil detectable.

The mitigations above, in priority order:

  1. Anthropic-API gate (agent-credential-proxy-landscape.md §Recommended). Removes the highest-value secret and closes the passthrough hole as a side effect.
  2. Forge-API gate (same doc, same section — one proxy generalizes across Gitea / GitHub / GitLab by config).
  3. Egress data budget in pipelock — small lift, large damage bound.
  4. Pre-launch .git gitleaks scan for --cwd bottles.
  5. secret: true manifest flag + preflight surfacing.
  6. Structured pipelock logs persisted somewhere durable; offline analysis as a separate follow-up.

The encoding/splitting attack stops being load-bearing once the bottle doesn't have the secret the attacker wants to encode. Every item above is a step toward that property.

References

  • agent-credential-proxy-landscape.md — gate pattern, kernel boundary mechanics, per-service wiring (Anthropic / Gitea / GitHub / GitLab), proxy topologies, landscape of existing tools (Docker AI Sandboxes, Cloudflare Sandbox Auth, Infisical Agent Vault, nono, Aembit, LiteLLM, Portkey, …) and a build-vs-adopt verdict.
  • secret-exfil-tripwire-encodings.md — inverse problem: detecting a known secret across encodings; explains why generic DLP can't enumerate the encoded forms.
  • network-egress-guard.md — v1 iptables / dnsmasq baseline; the layer below pipelock that catches DNS-subdomain exfil pipelock doesn't see.
  • pipelock-assessment.md — why pipelock was chosen, what its DLP scanners do and don't cover.
  • claude-code-token-revocation.md — Anthropic OAuth refresh story; informs the token-lifetime mitigation.
  • PRD 0001 — pipelock topology and the data_budget reservation.
  • PRD 0006 — TLS interception; defines the DEFAULT_TLS_PASSTHROUGH set this doc proposes to drop.
  • PRD 0008 — git-gate; the proof-of-concept for the gate pattern.