Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PRD prd-new: Strengthen outbound exfiltration detection
- Status: Draft
- Author: claude
- Created: 2026-06-25
- Issue: #259
Summary
The current DLP system has two detection tiers: weak (vendor token regexes)
and strong (exact matching of provisioned EGRESS_TOKEN_* values in multiple
encodings). This PRD adds four complementary hardening layers in ranked order
of impact:
- Canary tokens: inject a per-session fake secret into the agent's environment; any outbound appearance is a zero-false-positive exfil signal.
- Broadened known-value scanning: extend the strong tier beyond EGRESS_TOKEN_* via a BOT_BOTTLE_SENSITIVE_PREFIXES sidecar env var.
- Fragmentation-resistant matching: catch separator injection and long-substring leaks via alnum projection and sliding-window substring search.
- Entropy scoring: warn-only flag on high-entropy blobs as a complementary signal for encrypted exfil.
Problem
After PRD 0056 the outbound scan covers all HTTP request surfaces (host, path,
query, headers, body). But the content-based strong tier only matches
EGRESS_TOKEN_* values and their encodings. Three gaps remain:
| Gap | How an agent exploits it |
|---|---|
| No canary | There is no "can't be false-positive" signal |
| Narrow prefix filter | Sensitive values provisioned under other env prefixes (MCP keys, API keys injected via the agent provider) are invisible to scan_known_secrets |
| Fragmentation blindness | Inserting -, space, or \n between secret characters turns an exact-match into a miss |
Goals / Success Criteria
- Each launched bottle has a unique canary token in the agent's environment (BOT_BOTTLE_CANARY) and the egress sidecar's environment (EGRESS_TOKEN_CANARY). Any outbound appearance of the canary blocks the request with reason "canary token".
- scan_known_secrets accepts a sensitive_prefixes parameter (default: ("EGRESS_TOKEN_",)). scan_outbound reads BOT_BOTTLE_SENSITIVE_PREFIXES from environ and merges those prefixes in, so operators can mark additional env vars as scanned values without changing the manifest schema.
- For every secret that the exact-match pass does not find, a secondary alnum-projection pass checks for the secret with all non-alphanumeric characters stripped. This catches separator-injection evasion (MY-SECRET → body contains MY SECRET).
- A sliding-window partial-match pass checks for long-enough contiguous substrings of the secret's alnum projection in the text's alnum projection. Any match ≥ PARTIAL_MATCH_MIN_LEN (12 chars) blocks with reason "partial match".
- A new scan_entropy detector flags outbound text windows with Shannon entropy ≥ ENTROPY_BLOCK_THRESHOLD (5.5 bits/char) at warn severity only. It is registered under the new detector name "entropy" in OUTBOUND_DETECTOR_NAMES and disabled by default (routes must opt in).
- Binary request bodies that are not valid UTF-8 are decoded via latin-1 instead of utf-8 with errors="replace", preserving every byte value and allowing ASCII-range secrets to be found within binary payloads.
- All new behaviour is unit-tested; existing tests pass unchanged.
Non-goals
- Rolling per-host buffer for split-across-requests detection (state in the stateless addon is complex; deferred).
- Additional vendor regexes.
- ML / embedding-based detection.
- Entropy-based hard blocks (warn only per the issue).
Design
Canary token flow
    Egress.prepare()
        canary = secrets.token_urlsafe(32)
        EgressPlan(canary=canary, ...)

    Docker compose render:
        sidecar env: EGRESS_TOKEN_CANARY=<canary>   ← scanned by existing known-secrets detector
        agent env:   BOT_BOTTLE_CANARY=<canary>     ← visible to agent as a "secret"

    macos-container launch: same literals added to sidecar + agent env entries

EGRESS_TOKEN_CANARY matches the EGRESS_TOKEN_ prefix already scanned by
scan_known_secrets, so no detector code changes are required for canary
detection — only the injection path.
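The injection step can be sketched as follows. This is a minimal illustration, not the actual Egress.prepare() code: the helper name make_canary_env and the plain-dict return shape are hypothetical, and only secrets.token_urlsafe(32) and the two env var names come from the design above.

```python
import secrets


def make_canary_env() -> tuple[dict[str, str], dict[str, str]]:
    """Return (sidecar_env, agent_env) that share one fresh canary value.

    Hypothetical helper: the real code carries the canary on EgressPlan.
    """
    # 32 random bytes, base64url-encoded without padding -> 43 characters.
    canary = secrets.token_urlsafe(32)
    # The sidecar name starts with EGRESS_TOKEN_, so the existing
    # known-secrets detector picks it up without code changes.
    sidecar_env = {"EGRESS_TOKEN_CANARY": canary}
    # The agent sees the same literal under a name that looks like a secret.
    agent_env = {"BOT_BOTTLE_CANARY": canary}
    return sidecar_env, agent_env
```

Generating the value once and writing the same literal to both environments is what makes the signal unambiguous: the only way the string reaches an outbound request is by the agent reading it from its own environment.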
Broadened known-value scanning
scan_known_secrets gains a sensitive_prefixes parameter:
    def scan_known_secrets(
        text: str,
        *,
        location: str = "body",
        env: Mapping[str, str] | None = None,
        sensitive_prefixes: tuple[str, ...] = ("EGRESS_TOKEN_",),
    ) -> ScanResult | None:
scan_outbound reads BOT_BOTTLE_SENSITIVE_PREFIXES (comma-separated list
of additional prefixes) from environ and appends them:
    extra = tuple(
        p for p in environ.get("BOT_BOTTLE_SENSITIVE_PREFIXES", "").split(",") if p
    )
    sensitive_prefixes = ("EGRESS_TOKEN_",) + extra
redact_tokens receives the same treatment for consistent redaction.
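To make the prefix merge concrete, here is a small sketch of how the merged prefixes might select which environment values get scanned. The helper names resolve_prefixes and collect_known_values are hypothetical and are not part of the module; only the env var name and the default prefix come from the text above.

```python
from collections.abc import Mapping

DEFAULT_PREFIXES = ("EGRESS_TOKEN_",)


def resolve_prefixes(environ: Mapping[str, str]) -> tuple[str, ...]:
    """Merge the default prefix with the comma-separated operator list."""
    raw = environ.get("BOT_BOTTLE_SENSITIVE_PREFIXES", "")
    # Empty items (for example from a trailing comma) are dropped.
    extra = tuple(p for p in raw.split(",") if p)
    return DEFAULT_PREFIXES + extra


def collect_known_values(
    environ: Mapping[str, str], prefixes: tuple[str, ...]
) -> dict[str, str]:
    """Pick the non-empty env values whose names start with any prefix."""
    # str.startswith accepts a tuple of candidate prefixes.
    return {
        name: value
        for name, value in environ.items()
        if value and name.startswith(prefixes)
    }
```

With BOT_BOTTLE_SENSITIVE_PREFIXES set to "MCP_KEY_,API_", a variable named MCP_KEY_X would be scanned alongside every EGRESS_TOKEN_* value, while unrelated variables such as PATH stay out of the set.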
Fragmentation-resistant matching
A new helper _alnum_projection(text) strips all non-alphanumeric characters.
scan_known_secrets runs three passes per secret:
- Exact pass: existing encoded-variant loop (unchanged).
- Alnum-projection pass: if the secret's alnum projection has ≥ 8 chars, check whether it appears in the text's alnum projection. A match blocks with reason "fragmented match (separator injection)".
- Partial-substring pass: if the secret's alnum projection has ≥ PARTIAL_MATCH_MIN_LEN chars (12), slide a window of that length across the secret's projection and look for each window in the text's alnum projection. The first match blocks with reason "partial match".
All three passes run only for the "known_secrets" detector; the token-pattern
and entropy detectors are unchanged.
Entropy scoring
New public function:
    def scan_entropy(
        text: str,
        *,
        location: str = "body",
        window: int = ENTROPY_WINDOW,  # 64
        threshold: float = ENTROPY_BLOCK_THRESHOLD,  # 5.5
    ) -> ScanResult | None:
Slides a window of window characters across text in steps of window // 2.
If any window's Shannon entropy is at or above threshold, returns a
warn-severity ScanResult. Never blocks.
OUTBOUND_DETECTOR_NAMES gains "entropy". Routes opt in via their dlp
block; entropy scanning is off by default to avoid false-positive noise on
legitimate binary payloads.
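The windowed entropy check can be sketched as below. It assumes per-character Shannon entropy in bits and a "greater than or equal" comparison; the helper names shannon_entropy, max_window_entropy and is_high_entropy are hypothetical, and the real scan_entropy returns a ScanResult rather than a bool.

```python
import math
from collections import Counter

ENTROPY_WINDOW = 64
ENTROPY_BLOCK_THRESHOLD = 5.5  # bits per character


def shannon_entropy(chunk: str) -> float:
    """Shannon entropy of the character distribution in chunk, in bits."""
    if not chunk:
        return 0.0
    total = len(chunk)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(chunk).values()
    )


def max_window_entropy(text: str, window: int = ENTROPY_WINDOW) -> float:
    """Highest entropy over windows taken in steps of window // 2."""
    step = max(1, window // 2)
    # Text shorter than one window is scored as a single chunk. A tail
    # shorter than one step may go unscanned in this simplified sketch.
    starts = range(0, max(1, len(text) - window + 1), step)
    return max(
        (shannon_entropy(text[i:i + window]) for i in starts), default=0.0
    )


def is_high_entropy(text: str, threshold: float = ENTROPY_BLOCK_THRESHOLD) -> bool:
    """True when some window reaches the warn threshold."""
    return max_window_entropy(text) >= threshold
```

A 64-character window can reach at most log2(64) = 6 bits per character, which happens only when every character is distinct, so a 5.5 threshold leaves little headroom. Repetitive prose scores far lower, which is why the detector targets dense encoded or encrypted blobs.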
Binary body handling
In scan_outbound, the bytes → str decoding changes from:
    body.decode("utf-8", errors="replace")
to:
    body.decode("utf-8")    # strict; on UnicodeDecodeError fall back to:
    body.decode("latin-1")
latin-1 is a bijective byte↔codepoint mapping; every byte value is preserved
as its corresponding Latin-1 code point, so ASCII-range secret strings remain
intact and str.find / regex still locate them correctly. Strict UTF-8 is tried
first so that valid UTF-8 bodies are decoded faithfully; latin-1 is the
fallback only when strict decoding fails.
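A minimal sketch of the decode-with-fallback behaviour described here, assuming the body arrives as bytes. The function name decode_body is hypothetical.

```python
def decode_body(body: bytes) -> str:
    """Decode as strict UTF-8, falling back to latin-1 for binary payloads."""
    try:
        return body.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps each byte 0x00-0xFF to the code point of the same
        # value, so decoding never fails and no byte is lost or replaced.
        return body.decode("latin-1")
```

For example, a payload such as b"\xff\xfe\x00SECRET123\x80" is not valid UTF-8, so it takes the latin-1 path: the result has the same length as the input, encodes back to the identical bytes, and still contains the substring SECRET123 for the scanners to find.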
Implementation
Delivered in three commits on the same branch:
- DLP detector changes: _alnum_projection, fragmentation passes, scan_entropy, broadened scan_known_secrets, updated scan_outbound and redact_tokens; all accompanying unit tests.
- Canary injection: EgressPlan.canary, Egress.prepare(), Docker compose + macos-container backend injection.
- PRD flip: Status: Draft → Active.