PRD 0063: Strengthen outbound exfiltration detection

  • Status: Active
  • Author: claude
  • Created: 2026-06-25
  • Issue: #259

Summary

The current DLP system has two detection tiers: weak (vendor token regexes) and strong (exact matching of provisioned EGRESS_TOKEN_* values in multiple encodings). This PRD adds four complementary hardening layers in ranked order of impact:

  1. Canary tokens — inject a per-session fake secret into the agent's environment; any outbound appearance is a zero-false-positive exfil signal.
  2. Broadened known-value scanning — extend the strong tier beyond EGRESS_TOKEN_* via a BOT_BOTTLE_SENSITIVE_PREFIXES sidecar env var.
  3. Fragmentation-resistant matching — catch separator injection and long-substring leaks via alnum projection and sliding-window substring search.
  4. Entropy scoring — warn-only flag on high-entropy blobs as a complementary signal for encrypted exfil.

Problem

After PRD 0056 the outbound scan covers all HTTP request surfaces (host, path, query, headers, body). But the content-based strong tier only matches EGRESS_TOKEN_* values and their encodings. Three gaps remain:

| Gap | How an agent exploits it |
|---|---|
| No canary | There is no signal that cannot be a false positive |
| Narrow prefix filter | Sensitive values provisioned under other env prefixes (MCP keys, API keys injected via the agent provider) are invisible to scan_known_secrets |
| Fragmentation blindness | Inserting -, space, or \n between secret characters turns an exact match into a miss |

Goals / Success Criteria

  1. Each launched bottle has a unique canary token in the agent's environment under a randomized WORD_WORD_SECRET env var name. The egress sidecar gets the same env var and registers that exact name through BOT_BOTTLE_SENSITIVE_PREFIXES. Any outbound appearance of the canary blocks the request as a known-secret match.
  2. scan_known_secrets accepts a sensitive_prefixes parameter (default: ("EGRESS_TOKEN_",)). scan_outbound reads BOT_BOTTLE_SENSITIVE_PREFIXES from environ and merges those prefixes in, so operators can mark additional env vars as scanned values without changing the manifest schema.
  3. For every secret that the exact-match pass does not find, a secondary alnum-projection pass strips all non-alphanumeric characters from both the secret and the text and checks again. This catches separator-injection evasion (e.g. the secret MY-SECRET sent as MY SECRET or M.Y.S.E.C.R.E.T).
  4. A sliding-window partial-match pass checks for long-enough contiguous substrings of the secret's alnum projection in the text's alnum projection. Any match ≥ PARTIAL_MATCH_MIN_LEN (12 chars) blocks with reason "partial match".
  5. A new scan_entropy detector flags outbound text windows with Shannon entropy ≥ ENTROPY_BLOCK_THRESHOLD (5.5 bits/char) at warn severity only. It is registered under the new detector name "entropy" in OUTBOUND_DETECTOR_NAMES and disabled by default (routes must opt in).
  6. Request bodies are decoded as strict UTF-8 first. Bodies that are not valid UTF-8 fall back to latin-1 instead of utf-8 with errors="replace". This preserves every byte value, so ASCII-range secrets can be found within binary payloads.
  7. All new behaviour is unit-tested; existing tests pass unchanged.

Non-goals

  • Rolling per-host buffer for split-across-requests detection (deferred: it would require keeping cross-request state in an otherwise stateless addon).
  • Additional vendor regexes.
  • ML / embedding-based detection.
  • Entropy-based hard blocks (warn only per the issue).

Design

Canary token flow

```text
Egress.prepare()
  canary = secrets.token_urlsafe(32)
  canary_env = <random WORD_WORD_SECRET>
  EgressPlan(canary=canary, canary_env=canary_env, ...)

Docker compose render:
  sidecar env: <canary_env>=<canary>
  sidecar env: BOT_BOTTLE_SENSITIVE_PREFIXES=<canary_env>
  agent env:   <canary_env>=<canary>      ← visible to agent as a "secret"

macos-container launch: same literals added to sidecar + agent env entries
```

The sidecar uses BOT_BOTTLE_SENSITIVE_PREFIXES to make the random canary env name part of the existing scan_known_secrets detector without adding a manifest schema field.
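Here is a minimal Python sketch of the canary setup in the flow above. The secrets.token_urlsafe(32) call comes from the flow itself. The word list, make_canary, and render_env are hypothetical names. The PRD only fixes the WORD_WORD_SECRET shape of the env name and which env entries each container receives.

```python
import secrets

# Hypothetical word list: the PRD only specifies the WORD_WORD_SECRET shape.
_WORDS = ("AMBER", "BASALT", "CEDAR", "DELTA", "EMBER", "FJORD", "GRANITE", "HARBOR")


def make_canary() -> tuple[str, str]:
    """Return (canary_env, canary) for one bottle launch (hypothetical helper)."""
    canary = secrets.token_urlsafe(32)  # 32 random bytes -> 43 URL-safe base64 chars
    first, second = secrets.choice(_WORDS), secrets.choice(_WORDS)
    canary_env = f"{first}_{second}_SECRET"
    return canary_env, canary


def render_env(canary_env: str, canary: str) -> tuple[dict[str, str], dict[str, str]]:
    """Build the extra sidecar and agent env entries (hypothetical helper)."""
    sidecar = {
        canary_env: canary,
        # The exact env name doubles as a "prefix": a name starts with itself.
        "BOT_BOTTLE_SENSITIVE_PREFIXES": canary_env,
    }
    # The agent sees only the fake secret, not the scanner configuration.
    agent = {canary_env: canary}
    return sidecar, agent
```

Registering the exact env name as a prefix works because str.startswith is true when a name is compared with itself. The existing prefix filter therefore picks up the canary with no special case.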

Broadened known-value scanning

scan_known_secrets gains a sensitive_prefixes parameter:

```python
def scan_known_secrets(
    text: str,
    *,
    location: str = "body",
    env: Mapping[str, str] | None = None,
    sensitive_prefixes: tuple[str, ...] = ("EGRESS_TOKEN_",),
) -> ScanResult | None:
    ...
```

scan_outbound reads BOT_BOTTLE_SENSITIVE_PREFIXES (a comma-separated list of additional prefixes) from environ and appends them. Surrounding whitespace and empty entries are ignored:

```python
extra = tuple(
    p.strip()
    for p in environ.get("BOT_BOTTLE_SENSITIVE_PREFIXES", "").split(",")
    if p.strip()
)
sensitive_prefixes = ("EGRESS_TOKEN_",) + extra
```

redact_tokens receives the same treatment for consistent redaction.
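The prefix merge and the value selection it feeds could look roughly like the sketch below. sensitive_prefixes_from and known_values are hypothetical helper names. The PRD only specifies that scan_known_secrets takes a sensitive_prefixes tuple and that scan_outbound extends the default with BOT_BOTTLE_SENSITIVE_PREFIXES. Because str.startswith accepts a tuple, a single call covers every prefix.

```python
from collections.abc import Mapping

DEFAULT_PREFIXES: tuple[str, ...] = ("EGRESS_TOKEN_",)


def sensitive_prefixes_from(environ: Mapping[str, str]) -> tuple[str, ...]:
    """Merge BOT_BOTTLE_SENSITIVE_PREFIXES (comma-separated) into the defaults."""
    raw = environ.get("BOT_BOTTLE_SENSITIVE_PREFIXES", "")
    extra = tuple(p.strip() for p in raw.split(",") if p.strip())
    return DEFAULT_PREFIXES + extra


def known_values(env: Mapping[str, str], prefixes: tuple[str, ...]) -> list[str]:
    """Collect non-empty values of env vars whose names start with any prefix.

    Hypothetical helper: the PRD does not name the code that does this step.
    """
    return [
        value
        for name, value in env.items()
        if value and name.startswith(prefixes)  # startswith accepts a tuple
    ]
```

Dropping empty entries matters. An empty string is a prefix of every name, so a stray trailing comma would otherwise turn every env var, including PATH, into a scanned value.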

Fragmentation-resistant matching

A new helper _alnum_projection(text) strips all non-alphanumeric characters. scan_known_secrets runs up to three passes per secret, stopping at the first match:

  1. Exact pass — existing encoded-variant loop (unchanged).
  2. Alnum-projection pass — if the secret's alnum projection has ≥ 8 chars, check if it appears in the text's alnum projection. Match → block with "fragmented match (separator injection)" reason.
  3. Partial-substring pass — if the secret's alnum projection has ≥ PARTIAL_MATCH_MIN_LEN chars (12), slide a window of that length across the secret's projection and look for each window in the text's alnum projection. First match → block with "partial match" reason.

All three passes run only inside the "known_secrets" detector. The token-pattern detector is unchanged, and the new entropy detector does not use them.

Entropy scoring

New public function:

```python
def scan_entropy(
    text: str,
    *,
    location: str = "body",
    window: int = ENTROPY_WINDOW,           # 64
    threshold: float = ENTROPY_BLOCK_THRESHOLD,  # 5.5
) -> ScanResult | None:
    ...
```

Slides a window (window characters long) across text in steps of window // 2. If any window's Shannon entropy reaches or exceeds threshold, returns a warn-severity ScanResult. Never blocks.
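The Shannon entropy of a window is H = -Σ p_i log2 p_i, summed over the relative frequencies p_i of its distinct characters. The sketch below computes H and slides a window of ENTROPY_WINDOW characters in steps of window // 2, as specified. shannon_entropy, max_window_entropy, and entropy_warns are hypothetical names. Wrapping the result in a warn-severity ScanResult is omitted. Scoring a text shorter than one window as a single chunk is an assumption, since the PRD leaves that case open.

```python
import math
from collections import Counter

ENTROPY_WINDOW = 64
ENTROPY_BLOCK_THRESHOLD = 5.5


def shannon_entropy(chunk: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not chunk:
        return 0.0
    total = len(chunk)
    return -sum((c / total) * math.log2(c / total) for c in Counter(chunk).values())


def max_window_entropy(text: str, window: int = ENTROPY_WINDOW) -> float:
    """Highest entropy over windows of `window` chars, stepping by window // 2."""
    if len(text) <= window:
        # Assumption: short texts are scored as one chunk.
        return shannon_entropy(text)
    step = max(1, window // 2)
    return max(
        shannon_entropy(text[i:i + window])
        for i in range(0, len(text) - window + 1, step)
    )


def entropy_warns(text: str, threshold: float = ENTROPY_BLOCK_THRESHOLD) -> bool:
    """True when some window reaches the threshold (warn only, never block)."""
    return max_window_entropy(text) >= threshold
```

H can never exceed log2 of the number of distinct characters in the window. A 64-character window therefore tops out at 6 bits/char, and reaching the 5.5 threshold requires at least 46 distinct characters in one window (2^5.5 ≈ 45.3).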

OUTBOUND_DETECTOR_NAMES gains "entropy". Routes opt in via their dlp block; entropy scanning is off by default to avoid false-positive noise on legitimate binary payloads.

Binary body handling

In scan_outbound, the bytes → str decoding changes from:

```python
body.decode("utf-8", errors="replace")
```

to a strict UTF-8 decode with a latin-1 fallback:

```python
try:
    text = body.decode("utf-8")
except UnicodeDecodeError:
    text = body.decode("latin-1")
```

Strict UTF-8 is tried first, so valid UTF-8 bodies are decoded faithfully. latin-1 serves as the fallback because it is a bijective byte↔codepoint mapping: each byte value 0x00 to 0xFF becomes the code point with the same number. No byte is lost or replaced, ASCII-range secret strings remain intact, and str.find / regex still locate them correctly.
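The bijectivity claim can be checked directly with built-in bytes/str methods. The sketch below confirms that all 256 byte values survive a latin-1 round trip unchanged. It also confirms that a str.find offset in the decoded text equals the byte offset in the original body. latin1_roundtrips and secret_offset are illustrative names, not part of the codebase.

```python
def latin1_roundtrips() -> bool:
    """Check that every byte value maps to the code point with the same number."""
    all_bytes = bytes(range(256))
    text = all_bytes.decode("latin-1")  # never raises for any input
    return (
        len(text) == len(all_bytes)
        and all(ord(ch) == b for ch, b in zip(text, all_bytes))
        and text.encode("latin-1") == all_bytes
    )


def secret_offset(body: bytes, secret: str) -> int:
    """Offset of an ASCII `secret` inside a binary `body`, or -1 if absent.

    One byte decodes to one code point, so the str offset is the byte offset.
    """
    return body.decode("latin-1").find(secret)
```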

Implementation

Delivered in three commits on the same branch:

  1. DLP detector changes: _alnum_projection, fragmentation passes, scan_entropy, broadened scan_known_secrets, updated scan_outbound and redact_tokens; all accompanying unit tests.
  2. Canary injection: EgressPlan.canary, Egress.prepare(), Docker compose + macos-container backend injection.
  3. PRD flip: Status Draft → Active.