docs: draft PRD prd-new for strengthen-outbound-exfil-detection

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-25 00:54:28 +00:00
parent ecaae708f7
commit ea6bc5a170
1 changed files with 186 additions and 0 deletions
@@ -0,0 +1,186 @@
+# PRD prd-new: Strengthen outbound exfiltration detection
+
+- **Status:** Draft
+- **Author:** claude
+- **Created:** 2026-06-25
+- **Issue:** #259
+
+## Summary
+
+The current DLP system has two detection tiers: weak (vendor token regexes)
+and strong (exact matching of provisioned `EGRESS_TOKEN_*` values in multiple
+encodings). This PRD adds four complementary hardening layers in ranked order
+of impact:
+
+1. **Canary tokens** — inject a per-session fake secret into the agent's
+   environment; any outbound appearance is a zero-false-positive exfil signal.
+2. **Broadened known-value scanning** — extend the strong tier beyond
+   `EGRESS_TOKEN_*` via a `BOT_BOTTLE_SENSITIVE_PREFIXES` sidecar env var.
+3. **Fragmentation-resistant matching** — catch separator injection and
+   long-substring leaks via alnum projection and sliding-window substring
+   search.
+4. **Entropy scoring** — warn-only flag on high-entropy blobs as a
+   complementary signal for encrypted exfil.
+
+## Problem
+
+After PRD 0056 the outbound scan covers all HTTP request surfaces (host, path,
+query, headers, body). But the content-based strong tier only matches
+`EGRESS_TOKEN_*` values and their encodings. Three gaps remain:
+
+| Gap | How an agent exploits it |
+|-----|--------------------------|
+| No canary | No outbound signal exists that is guaranteed not to be a false positive |
+| Narrow prefix filter | Sensitive values provisioned under other env prefixes (MCP keys, API keys injected via the agent provider) are invisible to `scan_known_secrets` |
+| Fragmentation blindness | Inserting `-`, space, or `\n` between secret characters turns an exact-match into a miss |
+
+## Goals / Success Criteria
+
+1. Each launched bottle has a unique canary token in the agent's environment
+   (`BOT_BOTTLE_CANARY`) and the egress sidecar's environment
+   (`EGRESS_TOKEN_CANARY`). Any outbound appearance of the canary blocks the
+   request with reason `"canary token"`.
+2. `scan_known_secrets` accepts a `sensitive_prefixes` parameter (default:
+   `("EGRESS_TOKEN_",)`). `scan_outbound` reads
+   `BOT_BOTTLE_SENSITIVE_PREFIXES` from `environ` and merges those prefixes
+   in, so operators can mark additional env vars as scanned values without
+   changing the manifest schema.
+3. For every secret that the exact-match pass does not find, a secondary
+   alnum-projection pass strips all non-alphanumeric characters from both the
+   secret and the text and compares the results. This catches
+   separator-injection evasion (secret `MY-SECRET`, body contains
+   `MY SECRET`).
+4. A sliding-window partial-match pass checks for long-enough contiguous
+   substrings of the secret's alnum projection in the text's alnum projection.
+   Any match ≥ `PARTIAL_MATCH_MIN_LEN` (12 chars) blocks with reason
+   `"partial match"`.
+5. A new `scan_entropy` detector flags outbound text windows with Shannon
+   entropy ≥ `ENTROPY_BLOCK_THRESHOLD` (5.5 bits/char) at **warn** severity
+   only. It is registered under the new detector name `"entropy"` in
+   `OUTBOUND_DETECTOR_NAMES` and disabled by default (routes must opt in).
+6. Request bodies that are not valid UTF-8 are decoded via `latin-1` instead
+   of `utf-8` with `errors="replace"`, preserving every byte value and
+   allowing ASCII-range secrets to be found within binary payloads.
+7. All new behaviour is unit-tested; existing tests pass unchanged.
+
+## Non-goals
+
+- Rolling per-host buffer for split-across-requests detection (state in the
+  stateless addon is complex; deferred).
+- Additional vendor regexes.
+- ML / embedding-based detection.
+- Entropy-based hard blocks (warn only per the issue).
+
+## Design
+
+### Canary token flow
+
+```
+Egress.prepare()
+  canary = secrets.token_urlsafe(32)
+  EgressPlan(canary=canary, ...)
+
+Docker compose render:
+  sidecar env: EGRESS_TOKEN_CANARY=<canary>   ← scanned by existing known-secrets detector
+  agent env:   BOT_BOTTLE_CANARY=<canary>      ← visible to agent as a "secret"
+
+macos-container launch: same literals added to sidecar + agent env entries
+```
+
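+`EGRESS_TOKEN_CANARY` matches the `EGRESS_TOKEN_` prefix already scanned by
+`scan_known_secrets`, so canary detection needs no new matching logic, only
+the injection path. Reporting the distinct `"canary token"` reason from the
+goals may still require the detector to recognise that variable name.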
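The injection step above can be sketched as follows. This is a minimal illustration, not the actual `Egress.prepare()` code: `make_canary_env` is a hypothetical helper, and only `secrets.token_urlsafe(32)` and the two variable names come from the PRD.

```python
import secrets


def make_canary_env(nbytes: int = 32) -> tuple[dict[str, str], dict[str, str]]:
    """Return (sidecar_env, agent_env) sharing one freshly generated canary.

    Hypothetical helper for illustration; the real plan object is EgressPlan.
    """
    # token_urlsafe(32) yields 43 URL-safe characters (32 random bytes, base64 without padding).
    canary = secrets.token_urlsafe(nbytes)
    sidecar_env = {"EGRESS_TOKEN_CANARY": canary}  # matched by the EGRESS_TOKEN_ prefix scan
    agent_env = {"BOT_BOTTLE_CANARY": canary}      # what the agent sees as a "secret"
    return sidecar_env, agent_env


sidecar_env, agent_env = make_canary_env()
assert sidecar_env["EGRESS_TOKEN_CANARY"] == agent_env["BOT_BOTTLE_CANARY"]
```

Both environments must receive the same value from a single call; generating the token twice would leave the sidecar scanning for a string the agent never saw.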
+
+### Broadened known-value scanning
+
+`scan_known_secrets` gains a `sensitive_prefixes` parameter:
+
+```python
+def scan_known_secrets(
+    text: str,
+    *,
+    location: str = "body",
+    env: Mapping[str, str] | None = None,
+    sensitive_prefixes: tuple[str, ...] = ("EGRESS_TOKEN_",),
+) -> ScanResult | None:
+```
+
+`scan_outbound` reads `BOT_BOTTLE_SENSITIVE_PREFIXES` (comma-separated list
+of additional prefixes) from `environ` and appends them:
+
+```python
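+# Strip whitespace so "A_, B_" yields "B_" rather than " B_"; skip empty entries.
+raw = environ.get("BOT_BOTTLE_SENSITIVE_PREFIXES", "")
+extra = tuple(p.strip() for p in raw.split(",") if p.strip())
+sensitive_prefixes = ("EGRESS_TOKEN_",) + extra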
+```
+
+`redact_tokens` receives the same treatment for consistent redaction.
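A small sketch of how the merged prefixes might select env values for scanning. The helper names `merged_prefixes` and `sensitive_values`, and the `MCP_KEY_` prefix, are assumptions made for illustration; only the env var name and the default prefix come from the PRD.

```python
from collections.abc import Mapping

DEFAULT_PREFIXES = ("EGRESS_TOKEN_",)


def merged_prefixes(environ: Mapping[str, str]) -> tuple[str, ...]:
    """Default prefix plus any comma-separated extras from the environment."""
    raw = environ.get("BOT_BOTTLE_SENSITIVE_PREFIXES", "")
    # Empty or whitespace-only entries are dropped: an empty prefix would match every variable.
    extra = tuple(p.strip() for p in raw.split(",") if p.strip())
    return DEFAULT_PREFIXES + extra


def sensitive_values(environ: Mapping[str, str], prefixes: tuple[str, ...]) -> dict[str, str]:
    """Env vars whose names start with a sensitive prefix and whose values are non-empty."""
    # str.startswith accepts a tuple of candidate prefixes.
    return {k: v for k, v in environ.items() if k.startswith(prefixes) and v}


env = {
    "EGRESS_TOKEN_GITHUB": "ghp_example",
    "MCP_KEY_SEARCH": "mcp-example",        # hypothetical variable
    "PATH": "/usr/bin",
    "BOT_BOTTLE_SENSITIVE_PREFIXES": "MCP_KEY_, ,",
}
prefixes = merged_prefixes(env)
found = sensitive_values(env, prefixes)
```

Here `prefixes` is `("EGRESS_TOKEN_", "MCP_KEY_")` and `found` holds the two secret-bearing variables but not `PATH`.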
+
+### Fragmentation-resistant matching
+
+A new helper `_alnum_projection(text)` strips all non-alphanumeric characters.
+`scan_known_secrets` runs three passes per secret:
+
+1. **Exact pass** — existing encoded-variant loop (unchanged).
+2. **Alnum-projection pass** — if the secret's alnum projection has ≥ 8 chars,
+   check if it appears in the text's alnum projection. Match → block with
+   `"fragmented match (separator injection)"` reason.
+3. **Partial-substring pass** — if the secret's alnum projection has ≥
+   `PARTIAL_MATCH_MIN_LEN` chars (12), slide a window of that length across the
+   secret's projection and look for each window in the text's alnum projection.
+   First match → block with `"partial match"` reason.
+
+All three passes run only for the `"known_secrets"` detector; the token-pattern
+and entropy detectors are unchanged.
+
+### Entropy scoring
+
+New public function:
+
+```python
+def scan_entropy(
+    text: str,
+    *,
+    location: str = "body",
+    window: int = ENTROPY_WINDOW,           # 64
+    threshold: float = ENTROPY_BLOCK_THRESHOLD,  # 5.5
+) -> ScanResult | None:
+```
+
+Slides a window of `window` characters across `text` in steps of `window // 2`.
+If any window's Shannon entropy is at least `threshold`, returns a
+**warn**-severity `ScanResult`. Never blocks.
+
+`OUTBOUND_DETECTOR_NAMES` gains `"entropy"`. Routes opt in via their `dlp`
+block; entropy scanning is **off by default** to avoid false-positive noise on
+legitimate binary payloads.
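A minimal sketch of the windowed entropy check, assuming per-character Shannon entropy in bits and the constants quoted above. `first_high_entropy_window` is a hypothetical stand-in that returns an offset instead of a `ScanResult`, and how texts shorter than one window are handled is an assumption (they are scored as a single short window).

```python
import math
from collections import Counter
from typing import Optional

ENTROPY_WINDOW = 64
ENTROPY_BLOCK_THRESHOLD = 5.5


def shannon_entropy(chunk: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not chunk:
        return 0.0
    total = len(chunk)
    return -sum((n / total) * math.log2(n / total) for n in Counter(chunk).values())


def first_high_entropy_window(
    text: str,
    window: int = ENTROPY_WINDOW,
    threshold: float = ENTROPY_BLOCK_THRESHOLD,
) -> Optional[int]:
    """Offset of the first window at or above the threshold, else None."""
    step = max(1, window // 2)
    last_start = max(0, len(text) - window)
    for start in range(0, last_start + 1, step):
        if shannon_entropy(text[start:start + window]) >= threshold:
            return start
    return None


dense = "".join(chr(33 + i) for i in range(64))  # 64 distinct printable characters
plain = "the quick brown fox " * 10               # 16 distinct characters
assert first_high_entropy_window(dense) == 0
assert first_high_entropy_window(plain) is None
```

A 64-character window can carry at most log2(64) = 6 bits per character, and reaching 5.5 requires at least 46 distinct characters in the window (2^5.5 is about 45.3), so with these defaults only very dense payloads are flagged.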
+
+### Binary body handling
+
+In `scan_outbound`, the bytes → str decoding changes from:
+
+```python
+body.decode("utf-8", errors="replace")
+```
+
+to:
+
+```python
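+try:
+    text = body.decode("utf-8")    # strict: valid UTF-8 decodes faithfully
+except UnicodeDecodeError:
+    text = body.decode("latin-1")  # never fails; maps each byte to one code point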
+```
+
+`latin-1` is a bijective byte↔codepoint mapping: each byte value 0x00 to 0xFF
+becomes the code point with the same number, so ASCII-range secret strings
+remain intact and `str.find` / regex still locate them correctly. Strict
+UTF-8 is tried first so that valid UTF-8 bodies are decoded faithfully;
+`latin-1` is only the fallback for bodies that fail to decode.
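The decoding order described above can be sketched as a small helper. `decode_body` is a hypothetical name, and the payload and token are made up for the example.

```python
def decode_body(body: bytes) -> str:
    """Strict UTF-8 first; fall back to latin-1, which never fails and keeps every byte."""
    try:
        return body.decode("utf-8")
    except UnicodeDecodeError:
        return body.decode("latin-1")


# 0x89 cannot start a UTF-8 sequence, so this payload takes the latin-1 path.
payload = b"\x89PNG\xff\xfe" + b"ghp_exampletoken" + b"\x00\x80"
text = decode_body(payload)
assert "ghp_exampletoken" in text
assert text.encode("latin-1") == payload  # round-trips byte for byte
```

The round-trip assertion is the property that `errors="replace"` cannot offer: replaced bytes all collapse to U+FFFD and the original values are lost.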
+
+## Implementation
+
+Delivered in three commits on the same branch:
+
+1. **DLP detector changes** — `_alnum_projection`, fragmentation passes,
+   `scan_entropy`, broadened `scan_known_secrets`, updated `scan_outbound` and
+   `redact_tokens`; all accompanying unit tests.
+2. **Canary injection** — `EgressPlan.canary`, `Egress.prepare()`,
+   Docker compose + macos-container backend injection.
+3. **PRD flip** — `Status: Draft → Active`.