diff --git a/docs/prds/prd-new-strengthen-outbound-exfil-detection.md b/docs/prds/prd-new-strengthen-outbound-exfil-detection.md
new file mode 100644
index 0000000..1f55627
--- /dev/null
+++ b/docs/prds/prd-new-strengthen-outbound-exfil-detection.md
@@ -0,0 +1,186 @@
+# PRD prd-new: Strengthen outbound exfiltration detection
+
+- **Status:** Draft
+- **Author:** claude
+- **Created:** 2026-06-25
+- **Issue:** #259
+
+## Summary
+
+The current DLP system has two detection tiers: weak (vendor token regexes)
+and strong (exact matching of provisioned `EGRESS_TOKEN_*` values in multiple
+encodings). This PRD adds four complementary hardening layers, ranked by
+impact:
+
+1. **Canary tokens**: inject a per-session fake secret into the agent's
+   environment; any outbound appearance is a zero-false-positive exfil signal.
+2. **Broadened known-value scanning**: extend the strong tier beyond
+   `EGRESS_TOKEN_*` via a `BOT_BOTTLE_SENSITIVE_PREFIXES` sidecar env var.
+3. **Fragmentation-resistant matching**: catch separator injection and
+   long-substring leaks via alnum projection and sliding-window substring
+   search.
+4. **Entropy scoring**: warn-only flag on high-entropy blobs as a
+   complementary signal for encrypted exfil.
+
+## Problem
+
+After PRD 0056 the outbound scan covers all HTTP request surfaces (host, path,
+query, headers, body). But the content-based strong tier only matches
+`EGRESS_TOKEN_*` values and their encodings. Three gaps remain:
+
+| Gap | How an agent exploits it |
+|-----|--------------------------|
+| No canary | No signal exists that is guaranteed free of false positives |
+| Narrow prefix filter | Sensitive values provisioned under other env prefixes (MCP keys, API keys injected via the agent provider) are invisible to `scan_known_secrets` |
+| Fragmentation blindness | Inserting `-`, space, or `\n` between secret characters turns an exact match into a miss |
+
+## Goals / Success Criteria
+
+1. Each launched bottle has a unique canary token in the agent's environment
+   (`BOT_BOTTLE_CANARY`) and the egress sidecar's environment
+   (`EGRESS_TOKEN_CANARY`). Any outbound appearance of the canary blocks the
+   request with reason `"canary token"`.
+2. `scan_known_secrets` accepts a `sensitive_prefixes` parameter (default:
+   `("EGRESS_TOKEN_",)`). `scan_outbound` reads
+   `BOT_BOTTLE_SENSITIVE_PREFIXES` from `environ` and merges those prefixes
+   in, so operators can mark additional env vars as scanned values without
+   changing the manifest schema.
+3. For every secret that the exact-match pass misses, a secondary
+   alnum-projection pass strips all non-alphanumeric characters from both the
+   secret and the text before comparing. This catches separator-injection
+   evasion (secret `MY-SECRET`, body contains `MY SECRET`).
+4. A sliding-window partial-match pass checks for long-enough contiguous
+   substrings of the secret's alnum projection in the text's alnum projection.
+   Any match ≥ `PARTIAL_MATCH_MIN_LEN` (12 chars) blocks with reason
+   `"partial match"`.
+5. A new `scan_entropy` detector flags outbound text windows with Shannon
+   entropy ≥ `ENTROPY_BLOCK_THRESHOLD` (5.5 bits/char) at **warn** severity
+   only. It is registered under the new detector name `"entropy"` in
+   `OUTBOUND_DETECTOR_NAMES` and disabled by default (routes must opt in).
+6. Request bodies that are not valid UTF-8 are decoded via `latin-1` instead of
+   UTF-8 with `errors="replace"`, preserving every byte value and allowing
+   ASCII-range secrets to be found within binary payloads.
+7. All new behaviour is unit-tested; existing tests pass unchanged.
+
+## Non-goals
+
+- Rolling per-host buffer for split-across-requests detection (keeping state in
+  the otherwise stateless addon is complex; deferred).
+- Additional vendor regexes.
+- ML / embedding-based detection.
+- Entropy-based hard blocks (warn only, per the issue).
+
+## Design
+
+### Canary token flow
+
+```
+Egress.prepare()
+    canary = secrets.token_urlsafe(32)
+    EgressPlan(canary=canary, ...)
+
+Docker compose render:
+    sidecar env: EGRESS_TOKEN_CANARY= ← scanned by existing known-secrets detector
+    agent env:   BOT_BOTTLE_CANARY= ← visible to agent as a "secret"
+
+macos-container launch: same literals added to sidecar + agent env entries
+```
+
+`EGRESS_TOKEN_CANARY` matches the `EGRESS_TOKEN_` prefix already scanned by
+`scan_known_secrets`, so no detector code changes are required for canary
+detection; only the injection path is new.
+
+### Broadened known-value scanning
+
+`scan_known_secrets` gains a `sensitive_prefixes` parameter:
+
+```python
+def scan_known_secrets(
+    text: str,
+    *,
+    location: str = "body",
+    env: Mapping[str, str] | None = None,
+    sensitive_prefixes: tuple[str, ...] = ("EGRESS_TOKEN_",),
+) -> ScanResult | None:
+```
+
+`scan_outbound` reads `BOT_BOTTLE_SENSITIVE_PREFIXES` (comma-separated list
+of additional prefixes) from `environ` and appends them:
+
+```python
+extra = tuple(
+    p for p in environ.get("BOT_BOTTLE_SENSITIVE_PREFIXES", "").split(",") if p
+)
+sensitive_prefixes = ("EGRESS_TOKEN_",) + extra
+```
+
+`redact_tokens` receives the same treatment for consistent redaction.
+
+### Fragmentation-resistant matching
+
+A new helper `_alnum_projection(text)` strips all non-alphanumeric characters.
+`scan_known_secrets` runs three passes per secret:
+
+1. **Exact pass**: existing encoded-variant loop (unchanged).
+2. **Alnum-projection pass**: if the secret's alnum projection has ≥ 8 chars,
+   check if it appears in the text's alnum projection. Match → block with
+   `"fragmented match (separator injection)"` reason.
+3. **Partial-substring pass**: if the secret's alnum projection has ≥
+   `PARTIAL_MATCH_MIN_LEN` chars (12), slide a window of that length across the
+   secret's projection and look for each window in the text's alnum projection.
+   First match → block with `"partial match"` reason.
+
+All three passes run only for the `"known_secrets"` detector; the token-pattern
+and entropy detectors are unaffected.
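The three passes above can be sketched roughly as follows. This is a minimal illustration rather than the PRD's implementation: `alnum_projection` and `match_secret` are hypothetical names, the `"exact match"` reason is a placeholder, the encoded-variant loop is omitted, and `str.isalnum` (which also accepts non-ASCII letters and digits) is an assumption about what "alphanumeric" means here.

```python
from __future__ import annotations

FRAGMENT_MIN_LEN = 8        # PRD: minimum projected length for the alnum pass
PARTIAL_MATCH_MIN_LEN = 12  # PRD: window length for the partial-substring pass


def alnum_projection(text: str) -> str:
    """Drop every non-alphanumeric character (stand-in for _alnum_projection)."""
    return "".join(ch for ch in text if ch.isalnum())


def match_secret(secret: str, text: str) -> str | None:
    """Return a block reason for the first pass that matches, else None."""
    # Pass 1: exact match (the real pass also tries encoded variants).
    if secret and secret in text:
        return "exact match"  # placeholder reason, not from the PRD

    proj_secret = alnum_projection(secret)
    proj_text = alnum_projection(text)

    # Pass 2: whole secret with separators stripped from both sides.
    if len(proj_secret) >= FRAGMENT_MIN_LEN and proj_secret in proj_text:
        return "fragmented match (separator injection)"

    # Pass 3: any PARTIAL_MATCH_MIN_LEN-long window of the projected secret.
    if len(proj_secret) >= PARTIAL_MATCH_MIN_LEN:
        for start in range(len(proj_secret) - PARTIAL_MATCH_MIN_LEN + 1):
            window = proj_secret[start:start + PARTIAL_MATCH_MIN_LEN]
            if window in proj_text:
                return "partial match"
    return None
```

With the secret `sk-ABCD-1234-EFGH`, a body of `s k A B C D 1 2 3 4 E F G H` would hit the second pass, while a body that carries only `ABCD1234EFGH` would fall through to the third. Short secrets (projection under 8 characters) skip both projection passes, which keeps common short strings from matching everywhere.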
+
+### Entropy scoring
+
+New public function:
+
+```python
+def scan_entropy(
+    text: str,
+    *,
+    location: str = "body",
+    window: int = ENTROPY_WINDOW,  # 64
+    threshold: float = ENTROPY_BLOCK_THRESHOLD,  # 5.5
+) -> ScanResult | None:
+```
+
+It slides a window of `window` characters across `text` in steps of `window // 2`.
+If any window's Shannon entropy is at or above `threshold`, it returns a
+**warn**-severity `ScanResult`. It never blocks.
+
+`OUTBOUND_DETECTOR_NAMES` gains `"entropy"`. Routes opt in via their `dlp`
+block; entropy scanning is **off by default** to avoid false-positive noise on
+legitimate binary payloads.
+
+### Binary body handling
+
+In `scan_outbound`, the bytes → str decoding changes from:
+
+```python
+body.decode("utf-8", errors="replace")
+```
+
+to:
+
+```python
+try: text = body.decode("utf-8")  # strict: valid UTF-8 decodes faithfully
+except UnicodeDecodeError: text = body.decode("latin-1")  # lossless fallback
+```
+
+`latin-1` is a bijective byte↔codepoint mapping, so no byte value is lost and
+ASCII-range secrets stay findable by `str.find` and regex. Strict UTF-8 is tried
+first so valid UTF-8 bodies decode faithfully; `latin-1` is only the fallback.
+
+## Implementation
+
+Delivered in three commits on the same branch:
+
+1. **DLP detector changes**: `_alnum_projection`, fragmentation passes,
+   `scan_entropy`, broadened `scan_known_secrets`, updated `scan_outbound` and
+   `redact_tokens`; all accompanying unit tests.
+2. **Canary injection**: `EgressPlan.canary`, `Egress.prepare()`,
+   Docker compose + macos-container backend injection.
+3. **PRD flip**: `Status: Draft → Active`.
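For reviewers who want to sanity-check the entropy threshold from the Design section, the windowed Shannon entropy check could look roughly like the sketch below. It is an illustration under assumptions, not the delivered `scan_entropy`: `shannon_entropy` and `first_high_entropy_offset` are hypothetical helper names, the function returns an offset instead of a `ScanResult`, and the comparison uses `>=` to follow the success criteria.

```python
from __future__ import annotations

import math
from collections import Counter

ENTROPY_WINDOW = 64            # PRD default window size, in characters
ENTROPY_BLOCK_THRESHOLD = 5.5  # PRD default threshold, in bits per character


def shannon_entropy(chunk: str) -> float:
    """Shannon entropy of the character distribution in chunk, in bits/char."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def first_high_entropy_offset(
    text: str,
    window: int = ENTROPY_WINDOW,
    threshold: float = ENTROPY_BLOCK_THRESHOLD,
) -> int | None:
    """Return the start offset of the first window at or above threshold."""
    step = max(1, window // 2)  # half-window stride, as described in the PRD
    last_start = max(0, len(text) - window)
    for start in range(0, last_start + 1, step):
        if shannon_entropy(text[start:start + window]) >= threshold:
            return start
    return None  # nothing to warn about
```

A 64-character window can carry at most `log2(64) = 6` bits per character, so 5.5 only triggers on windows where nearly every character is distinct. English prose, which uses fewer than 32 distinct characters per window, stays below 5 bits and is not flagged by this sketch.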