bot-bottle/docs/prds/prd-new-strengthen-outbound-exfil-detection.md

# PRD prd-new: Strengthen outbound exfiltration detection

- **Status:** Active
- **Author:** claude
- **Created:** 2026-06-25
- **Issue:** #259

## Summary

The current DLP system has two detection tiers: weak (vendor token regexes)
and strong (exact matching of provisioned `EGRESS_TOKEN_*` values in multiple
encodings). This PRD adds four complementary hardening layers in ranked order
of impact:

1. **Canary tokens** — inject a per-session fake secret into the agent's
   environment; any outbound appearance is a zero-false-positive exfil signal.
2. **Broadened known-value scanning** — extend the strong tier beyond
   `EGRESS_TOKEN_*` via a `BOT_BOTTLE_SENSITIVE_PREFIXES` sidecar env var.
3. **Fragmentation-resistant matching** — catch separator injection and
   long-substring leaks via alnum projection and sliding-window substring
   search.
4. **Entropy scoring** — warn-only flag on high-entropy blobs as a
   complementary signal for encrypted exfil.

## Problem

After PRD 0056 the outbound scan covers all HTTP request surfaces (host, path,
query, headers, body). But the content-based strong tier only matches
`EGRESS_TOKEN_*` values and their encodings. Three gaps remain:

| Gap | How an agent exploits it |
|-----|--------------------------|
| No canary | There is no "can't be false-positive" signal |
| Narrow prefix filter | Sensitive values provisioned under other env prefixes (MCP keys, API keys injected via the agent provider) are invisible to `scan_known_secrets` |
| Fragmentation blindness | Inserting `-`, space, or `\n` between secret characters turns an exact-match into a miss |

## Goals / Success Criteria

1. Each launched bottle has a unique canary token in the agent's environment
   under a randomized `WORD_WORD_SECRET` env var name. The egress sidecar gets
   the same env var and registers that exact name through
   `BOT_BOTTLE_SENSITIVE_PREFIXES`. Any outbound appearance of the canary
   blocks the request as a known-secret match.
2. `scan_known_secrets` accepts a `sensitive_prefixes` parameter (default:
   `("EGRESS_TOKEN_",)`). `scan_outbound` reads
   `BOT_BOTTLE_SENSITIVE_PREFIXES` from `environ` and merges those prefixes
   in, so operators can mark additional env vars as scanned values without
   changing the manifest schema.
3. For every secret that passes exact-match, a secondary alnum-projection pass
   checks for the secret with all non-alphanumeric characters stripped. This
   catches separator-injection evasion (`MY-SECRET` → body contains
   `MY SECRET`).
4. A sliding-window partial-match pass checks for long-enough contiguous
   substrings of the secret's alnum projection in the text's alnum projection.
   Any match ≥ `PARTIAL_MATCH_MIN_LEN` (12 chars) blocks with reason
   `"partial match"`.
5. A new `scan_entropy` detector flags outbound text windows with Shannon
   entropy ≥ `ENTROPY_BLOCK_THRESHOLD` (5.5 bits/char) at **warn** severity
   only. It is registered under the new detector name `"entropy"` in
   `OUTBOUND_DETECTOR_NAMES` and disabled by default (routes must opt in).
6. Binary request bodies are decoded via `latin-1` instead of
   `utf-8 errors="replace"`, preserving every byte value and allowing
   ASCII-range secrets to be found within binary payloads.
7. All new behaviour is unit-tested; existing tests pass unchanged.

## Non-goals

- Rolling per-host buffer for split-across-requests detection (state in the
  stateless addon is complex; deferred).
- Additional vendor regexes.
- ML / embedding-based detection.
- Entropy-based hard blocks (warn only per the issue).

## Design

### Canary token flow

```
Egress.prepare()
  canary = secrets.token_urlsafe(32)
  canary_env = <random WORD_WORD_SECRET>
  EgressPlan(canary=canary, canary_env=canary_env, ...)

Docker compose render:
  sidecar env: <canary_env>=<canary>
  sidecar env: BOT_BOTTLE_SENSITIVE_PREFIXES=<canary_env>
  agent env:   <canary_env>=<canary>      ← visible to agent as a "secret"

macos-container launch: same literals added to sidecar + agent env entries
```

The sidecar uses `BOT_BOTTLE_SENSITIVE_PREFIXES` to make the random canary env
name part of the existing `scan_known_secrets` detector without adding a
manifest schema field.

### Broadened known-value scanning

`scan_known_secrets` gains a `sensitive_prefixes` parameter:

```python
def scan_known_secrets(
    text: str,
    *,
    location: str = "body",
    env: Mapping[str, str] | None = None,
    sensitive_prefixes: tuple[str, ...] = ("EGRESS_TOKEN_",),
) -> ScanResult | None:
```

`scan_outbound` reads `BOT_BOTTLE_SENSITIVE_PREFIXES` (comma-separated list
of additional prefixes) from `environ` and appends them:

```python
extra = tuple(
    p for p in environ.get("BOT_BOTTLE_SENSITIVE_PREFIXES", "").split(",") if p
)
sensitive_prefixes = ("EGRESS_TOKEN_",) + extra
```

`redact_tokens` receives the same treatment for consistent redaction.

### Fragmentation-resistant matching

A new helper `_alnum_projection(text)` strips all non-alphanumeric characters.
`scan_known_secrets` runs two passes per secret:

1. **Exact pass** — existing encoded-variant loop (unchanged).
2. **Alnum-projection pass** — if the secret's alnum projection has ≥ 8 chars,
   check if it appears in the text's alnum projection. Match → block with
   `"fragmented match (separator injection)"` reason.
3. **Partial-substring pass** — if the secret's alnum projection has ≥
   `PARTIAL_MATCH_MIN_LEN` chars (12), slide a window of that length across the
   secret's projection and look for each window in the text's alnum projection.
   First match → block with `"partial match"` reason.

All three passes run only for the `"known_secrets"` detector; the token-pattern
and entropy detectors are unchanged.

### Entropy scoring

New public function:

```python
def scan_entropy(
    text: str,
    *,
    location: str = "body",
    window: int = ENTROPY_WINDOW,           # 64
    threshold: float = ENTROPY_BLOCK_THRESHOLD,  # 5.5
) -> ScanResult | None:
```

Slides a window of `window` characters across `text` in steps of `window // 2`.
If any window's Shannon entropy exceeds `threshold`, returns a **warn**-severity
`ScanResult`. Never blocks.

`OUTBOUND_DETECTOR_NAMES` gains `"entropy"`. Routes opt in via their `dlp`
block; entropy scanning is **off by default** to avoid false-positive noise on
legitimate binary payloads.

### Binary body handling

In `scan_outbound`, the bytes → str decoding changes from:

```python
body.decode("utf-8", errors="replace")
```

to:

```python
body.decode("utf-8") if body is str else body.decode("latin-1")
```

`latin-1` is a bijective byte↔codepoint mapping; every byte value is preserved
as its corresponding Latin-1 code point, so ASCII-range secret strings remain
intact and `str.find` / regex still locate them correctly. The fallback from
strict UTF-8 is tried first so valid UTF-8 bodies are decoded faithfully.

## Implementation

Delivered in three commits on the same branch:

1. **DLP detector changes** — `_alnum_projection`, fragmentation passes,
   `scan_entropy`, broadened `scan_known_secrets`, updated `scan_outbound` and
   `redact_tokens`; all accompanying unit tests.
2. **Canary injection** — `EgressPlan.canary`, `Egress.prepare()`,
   Docker compose + macos-container backend injection.
3. **PRD flip** — `Status: Draft → Active`.