# PRD 0063: Strengthen outbound exfiltration detection

- **Status:** Active
- **Author:** claude
- **Created:** 2026-06-25
- **Issue:** #259

## Summary

The current DLP system has two detection tiers: weak (vendor token regexes) and strong (exact matching of provisioned `EGRESS_TOKEN_*` values in multiple encodings). This PRD adds four complementary hardening layers, ranked by impact:

1. **Canary tokens**: inject a per-session fake secret into the agent's environment; any outbound appearance is a zero-false-positive exfil signal.
2. **Broadened known-value scanning**: extend the strong tier beyond `EGRESS_TOKEN_*` via a `BOT_BOTTLE_SENSITIVE_PREFIXES` sidecar env var.
3. **Fragmentation-resistant matching**: catch separator injection and long-substring leaks via alnum projection and sliding-window substring search.
4. **Entropy scoring**: warn-only flag on high-entropy blobs as a complementary signal for encrypted exfil.

## Problem

After PRD 0056 the outbound scan covers all HTTP request surfaces (host, path, query, headers, body). But the content-based strong tier only matches `EGRESS_TOKEN_*` values and their encodings. Three gaps remain:

| Gap | How an agent exploits it |
|-----|--------------------------|
| No canary | There is no "can't be false-positive" signal |
| Narrow prefix filter | Sensitive values provisioned under other env prefixes (MCP keys, API keys injected via the agent provider) are invisible to `scan_known_secrets` |
| Fragmentation blindness | Inserting `-`, space, or `\n` between secret characters turns an exact-match into a miss |

## Goals / Success Criteria

1. Each launched bottle has a unique canary token in the agent's environment under a randomized `WORD_WORD_SECRET` env var name. The egress sidecar gets the same env var and registers that exact name through `BOT_BOTTLE_SENSITIVE_PREFIXES`. Any outbound appearance of the canary blocks the request as a known-secret match.
2. `scan_known_secrets` accepts a `sensitive_prefixes` parameter (default: `("EGRESS_TOKEN_",)`). `scan_outbound` reads `BOT_BOTTLE_SENSITIVE_PREFIXES` from `environ` and merges those prefixes in, so operators can mark additional env vars as scanned values without changing the manifest schema.
3. For every secret that the exact-match pass does not find, a secondary alnum-projection pass compares the secret and the outbound text with all non-alphanumeric characters stripped from both. This catches separator-injection evasion (secret `MY-SECRET`, body contains `MY SECRET`).
4. A sliding-window partial-match pass checks for long-enough contiguous substrings of the secret's alnum projection in the text's alnum projection. Any match ≥ `PARTIAL_MATCH_MIN_LEN` (12 chars) blocks with reason `"partial match"`.
5. A new `scan_entropy` detector flags outbound text windows with Shannon entropy ≥ `ENTROPY_BLOCK_THRESHOLD` (5.5 bits/char) at **warn** severity only. It is registered under the new detector name `"entropy"` in `OUTBOUND_DETECTOR_NAMES` and disabled by default (routes must opt in).
6. Binary request bodies are decoded via `latin-1` instead of `utf-8 errors="replace"`, preserving every byte value and allowing ASCII-range secrets to be found within binary payloads.
7. All new behaviour is unit-tested; existing tests pass unchanged.

## Non-goals

- Rolling per-host buffer for split-across-requests detection (state in the stateless addon is complex; deferred).
- Additional vendor regexes.
- ML / embedding-based detection.
- Entropy-based hard blocks (warn only per the issue).

## Design

### Canary token flow

```
Egress.prepare()
  canary     = secrets.token_urlsafe(32)
  canary_env = <randomized WORD_WORD_SECRET name>
  EgressPlan(canary=canary, canary_env=canary_env, ...)

Docker compose render:
  sidecar env:  <canary_env>=<canary>
  sidecar env:  BOT_BOTTLE_SENSITIVE_PREFIXES=<canary_env>
  agent env:    <canary_env>=<canary>   ← visible to agent as a "secret"

macos-container launch:
  same literals added to sidecar + agent env entries
```

The sidecar uses `BOT_BOTTLE_SENSITIVE_PREFIXES` to make the random canary env name part of the existing `scan_known_secrets` detector without adding a manifest schema field.

### Broadened known-value scanning

`scan_known_secrets` gains a `sensitive_prefixes` parameter:

```python
def scan_known_secrets(
    text: str,
    *,
    location: str = "body",
    env: Mapping[str, str] | None = None,
    sensitive_prefixes: tuple[str, ...] = ("EGRESS_TOKEN_",),
) -> ScanResult | None:
    ...
```

`scan_outbound` reads `BOT_BOTTLE_SENSITIVE_PREFIXES` (comma-separated list of additional prefixes) from `environ` and appends them:

```python
extra = tuple(
    p for p in environ.get("BOT_BOTTLE_SENSITIVE_PREFIXES", "").split(",") if p
)
sensitive_prefixes = ("EGRESS_TOKEN_",) + extra
```

`redact_tokens` receives the same treatment for consistent redaction.

### Fragmentation-resistant matching

A new helper `_alnum_projection(text)` strips all non-alphanumeric characters. `scan_known_secrets` runs three passes per secret:

1. **Exact pass**: existing encoded-variant loop (unchanged).
2. **Alnum-projection pass**: if the secret's alnum projection has ≥ 8 chars, check if it appears in the text's alnum projection. Match → block with `"fragmented match (separator injection)"` reason.
3. **Partial-substring pass**: if the secret's alnum projection has ≥ `PARTIAL_MATCH_MIN_LEN` chars (12), slide a window of `PARTIAL_MATCH_MIN_LEN` characters across the secret's projection and look for each window in the text's alnum projection. First match → block with `"partial match"` reason.

All three passes run only for the `"known_secrets"` detector; the token-pattern and entropy detectors are unchanged.

### Entropy scoring

New public function:

```python
def scan_entropy(
    text: str,
    *,
    location: str = "body",
    window: int = ENTROPY_WINDOW,  # 64
    threshold: float = ENTROPY_BLOCK_THRESHOLD,  # 5.5
) -> ScanResult | None:
    ...
```

Slides a window of `window` characters across `text` in steps of `window // 2`. If any window's Shannon entropy is at least `threshold`, returns a **warn**-severity `ScanResult`. Never blocks.

`OUTBOUND_DETECTOR_NAMES` gains `"entropy"`. Routes opt in via their `dlp` block; entropy scanning is **off by default** to avoid false-positive noise on legitimate binary payloads.

### Binary body handling

In `scan_outbound`, the bytes → str decoding changes from:

```python
body.decode("utf-8", errors="replace")
```

to:

```python
try:
    text = body.decode("utf-8")
except UnicodeDecodeError:
    text = body.decode("latin-1")
```

`latin-1` is a bijective byte↔codepoint mapping; every byte value is preserved as its corresponding Latin-1 code point, so ASCII-range secret strings remain intact and `str.find` / regex still locate them correctly. Strict UTF-8 is tried first so that valid UTF-8 bodies are decoded faithfully; `latin-1` is the fallback for bodies that are not valid UTF-8.

## Implementation

Delivered in three commits on the same branch:

1. **DLP detector changes**: `_alnum_projection`, fragmentation passes, `scan_entropy`, broadened `scan_known_secrets`, updated `scan_outbound` and `redact_tokens`; all accompanying unit tests.
2. **Canary injection**: `EgressPlan.canary`, `Egress.prepare()`, Docker compose + macos-container backend injection.
3. **PRD flip**: `Status: Draft → Active`.
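
For reference, the windowed check described under "Entropy scoring" might look roughly like the sketch below. It is an illustration under stated assumptions, not the shipped detector: `shannon_entropy` and `first_high_entropy_window` are hypothetical helpers, the result is the offset of the first flagged window rather than a `ScanResult`, the comparison uses ≥ as in the Goals section, and scanning a shorter trailing window is an assumption about edge handling.

```python
from __future__ import annotations

import math
from collections import Counter

ENTROPY_WINDOW = 64            # value stated in this PRD
ENTROPY_BLOCK_THRESHOLD = 5.5  # bits per character, value stated in this PRD


def shannon_entropy(chunk: str) -> float:
    """Shannon entropy of `chunk` in bits per character."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def first_high_entropy_window(
    text: str,
    window: int = ENTROPY_WINDOW,
    threshold: float = ENTROPY_BLOCK_THRESHOLD,
) -> int | None:
    """Return the start offset of the first window at or above `threshold`."""
    step = max(1, window // 2)  # half-window overlap, as described above
    for start in range(0, len(text), step):
        # The final slices may be shorter than `window`; they are still scored.
        if shannon_entropy(text[start:start + window]) >= threshold:
            return start
    return None
```

A 64-character window can reach at most log2(64) = 6 bits per character, so a 5.5 threshold only fires when a window holds at least 46 distinct characters, which ordinary prose does not.
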