docs: draft PRD prd-new for strengthen-outbound-exfil-detection
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -0,0 +1,186 @@
# PRD prd-new: Strengthen outbound exfiltration detection

- **Status:** Draft
- **Author:** claude
- **Created:** 2026-06-25
- **Issue:** #259

## Summary

The current DLP system has two detection tiers: weak (vendor token regexes)
and strong (exact matching of provisioned `EGRESS_TOKEN_*` values in multiple
encodings). This PRD adds four complementary hardening layers in ranked order
of impact:

1. **Canary tokens**: inject a per-session fake secret into the agent's
   environment; any outbound appearance is a zero-false-positive exfil signal.
2. **Broadened known-value scanning**: extend the strong tier beyond
   `EGRESS_TOKEN_*` via a `BOT_BOTTLE_SENSITIVE_PREFIXES` sidecar env var.
3. **Fragmentation-resistant matching**: catch separator injection and
   long-substring leaks via alnum projection and sliding-window substring
   search.
4. **Entropy scoring**: warn-only flag on high-entropy blobs as a
   complementary signal for encrypted exfil.

## Problem

After PRD 0056 the outbound scan covers all HTTP request surfaces (host, path,
query, headers, body). But the content-based strong tier only matches
`EGRESS_TOKEN_*` values and their encodings. Three gaps remain:

| Gap | How an agent exploits it |
|-----|--------------------------|
| No canary | No signal is immune to false positives, so no single match is unambiguous proof of exfiltration |
| Narrow prefix filter | Sensitive values provisioned under other env prefixes (MCP keys, API keys injected via the agent provider) are invisible to `scan_known_secrets` |
| Fragmentation blindness | Inserting `-`, space, or `\n` between secret characters turns an exact match into a miss |

## Goals / Success Criteria

1. Each launched bottle has a unique canary token in the agent's environment
   (`BOT_BOTTLE_CANARY`) and the egress sidecar's environment
   (`EGRESS_TOKEN_CANARY`). Any outbound appearance of the canary blocks the
   request with reason `"canary token"`.
2. `scan_known_secrets` accepts a `sensitive_prefixes` parameter (default:
   `("EGRESS_TOKEN_",)`). `scan_outbound` reads
   `BOT_BOTTLE_SENSITIVE_PREFIXES` from `environ` and merges those prefixes
   in, so operators can mark additional env vars as scanned values without
   changing the manifest schema.
3. For every secret that the exact-match pass does not find, a secondary
   alnum-projection pass strips all non-alphanumeric characters from both the
   secret and the text and checks for the secret again. This catches
   separator-injection evasion (for example, the secret `MY-SECRET` sent as
   `MY SECRET` in the body).
4. A sliding-window partial-match pass checks for long-enough contiguous
   substrings of the secret's alnum projection in the text's alnum projection.
   Any match of at least `PARTIAL_MATCH_MIN_LEN` (12) characters blocks with
   reason `"partial match"`.
5. A new `scan_entropy` detector flags outbound text windows with Shannon
   entropy ≥ `ENTROPY_BLOCK_THRESHOLD` (5.5 bits/char) at **warn** severity
   only. It is registered under the new detector name `"entropy"` in
   `OUTBOUND_DETECTOR_NAMES` and disabled by default (routes must opt in).
6. Binary request bodies are decoded via `latin-1` instead of `utf-8` with
   `errors="replace"`, preserving every byte value and allowing ASCII-range
   secrets to be found within binary payloads.
7. All new behaviour is unit-tested; existing tests pass unchanged.

## Non-goals

- Rolling per-host buffer for split-across-requests detection (adding state to
  the stateless addon is complex; deferred).
- Additional vendor regexes.
- ML / embedding-based detection.
- Entropy-based hard blocks (warn only per the issue).

## Design

### Canary token flow

```
Egress.prepare()
  canary = secrets.token_urlsafe(32)
  EgressPlan(canary=canary, ...)

Docker compose render:
  sidecar env:  EGRESS_TOKEN_CANARY=<canary>   ← scanned by existing known-secrets detector
  agent env:    BOT_BOTTLE_CANARY=<canary>     ← visible to agent as a "secret"

macos-container launch: same literals added to sidecar + agent env entries
```

`EGRESS_TOKEN_CANARY` matches the `EGRESS_TOKEN_` prefix already scanned by
`scan_known_secrets`, so no detector code changes are required for canary
detection; only the injection path is new.
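The flow above can be sketched in a few lines. This is a minimal illustration, not the planned `Egress.prepare()` code: `make_canary_env` and `leaks_canary` are hypothetical helpers, and the check shown is a plain substring test, whereas the existing known-secrets detector also covers encoded variants.

```python
import secrets


def make_canary_env() -> tuple[dict[str, str], dict[str, str]]:
    """Return (sidecar_env, agent_env) sharing one freshly generated canary."""
    # 32 random bytes, URL-safe base64 without padding (43 characters).
    canary = secrets.token_urlsafe(32)
    sidecar_env = {"EGRESS_TOKEN_CANARY": canary}  # value the sidecar scans for
    agent_env = {"BOT_BOTTLE_CANARY": canary}  # bait visible to the agent
    return sidecar_env, agent_env


def leaks_canary(outbound_text: str, sidecar_env: dict[str, str]) -> bool:
    """Hypothetical exact-match check; encodings are not handled here."""
    return sidecar_env["EGRESS_TOKEN_CANARY"] in outbound_text
```

Because the canary is random and never used for anything legitimate, any request that contains it can only have obtained it from the agent's environment.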

### Broadened known-value scanning

`scan_known_secrets` gains a `sensitive_prefixes` parameter:

```python
def scan_known_secrets(
    text: str,
    *,
    location: str = "body",
    env: Mapping[str, str] | None = None,
    sensitive_prefixes: tuple[str, ...] = ("EGRESS_TOKEN_",),
) -> ScanResult | None:
```

`scan_outbound` reads `BOT_BOTTLE_SENSITIVE_PREFIXES` (comma-separated list
of additional prefixes) from `environ` and appends them:

```python
extra = tuple(
    p for p in environ.get("BOT_BOTTLE_SENSITIVE_PREFIXES", "").split(",") if p
)
sensitive_prefixes = ("EGRESS_TOKEN_",) + extra
```

`redact_tokens` receives the same treatment for consistent redaction.
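As a rough sketch of how the merged prefixes might select which environment values to scan, assuming the scanner simply filters variable names by prefix: `merged_prefixes` and `sensitive_values` are hypothetical helper names, and trimming whitespace around entries is an assumption that goes beyond the snippet above, which keeps entries exactly as written.

```python
from collections.abc import Mapping

DEFAULT_PREFIXES = ("EGRESS_TOKEN_",)


def merged_prefixes(environ: Mapping[str, str]) -> tuple[str, ...]:
    """Default prefix plus any listed in BOT_BOTTLE_SENSITIVE_PREFIXES."""
    raw = environ.get("BOT_BOTTLE_SENSITIVE_PREFIXES", "")
    # Assumption: surrounding whitespace is trimmed and empty entries dropped.
    extra = tuple(p.strip() for p in raw.split(",") if p.strip())
    return DEFAULT_PREFIXES + extra


def sensitive_values(environ: Mapping[str, str]) -> dict[str, str]:
    """Non-empty env vars whose names start with a sensitive prefix."""
    prefixes = merged_prefixes(environ)
    # str.startswith accepts a tuple and matches any of its members.
    return {k: v for k, v in environ.items() if k.startswith(prefixes) and v}
```

With `BOT_BOTTLE_SENSITIVE_PREFIXES="MCP_,API_"`, a variable such as `MCP_KEY` would be scanned alongside the `EGRESS_TOKEN_*` values, while unrelated variables such as `PATH` would not.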

### Fragmentation-resistant matching

A new helper `_alnum_projection(text)` strips all non-alphanumeric characters.
`scan_known_secrets` runs three passes per secret:

1. **Exact pass**: existing encoded-variant loop (unchanged).
2. **Alnum-projection pass**: if the secret's alnum projection has ≥ 8 chars,
   check if it appears in the text's alnum projection. Match → block with
   `"fragmented match (separator injection)"` reason.
3. **Partial-substring pass**: if the secret's alnum projection has ≥
   `PARTIAL_MATCH_MIN_LEN` chars (12), slide a window of that length across the
   secret's projection and look for each window in the text's alnum projection.
   First match → block with `"partial match"` reason.

All three passes run only for the `"known_secrets"` detector; the token-pattern
and entropy detectors are unaffected.

### Entropy scoring

New public function:

```python
def scan_entropy(
    text: str,
    *,
    location: str = "body",
    window: int = ENTROPY_WINDOW,  # 64
    threshold: float = ENTROPY_BLOCK_THRESHOLD,  # 5.5
) -> ScanResult | None:
```

`scan_entropy` slides a window of `window` characters across `text` in steps of
`window // 2`. If any window's Shannon entropy is at least `threshold`, it
returns a **warn**-severity `ScanResult`. It never blocks.

`OUTBOUND_DETECTOR_NAMES` gains `"entropy"`. Routes opt in via their `dlp`
block; entropy scanning is **off by default** to avoid false-positive noise on
legitimate binary payloads.
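A minimal sketch of the windowed entropy check, assuming entropy is computed over the per-character frequency distribution within each window. `shannon_entropy` and `has_high_entropy_window` are hypothetical names, and scoring the short trailing windows is an assumption about edge handling. Note that a 64-character window can reach at most log2(64) = 6 bits per character, so the 5.5 threshold sits close to that ceiling.

```python
import math
from collections import Counter

ENTROPY_WINDOW = 64  # value given in this PRD
ENTROPY_BLOCK_THRESHOLD = 5.5  # value given in this PRD


def shannon_entropy(chunk: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not chunk:
        return 0.0
    n = len(chunk)
    return -sum((c / n) * math.log2(c / n) for c in Counter(chunk).values())


def has_high_entropy_window(
    text: str,
    window: int = ENTROPY_WINDOW,
    threshold: float = ENTROPY_BLOCK_THRESHOLD,
) -> bool:
    """True if any window scores at or above the threshold (warn-only signal)."""
    step = max(1, window // 2)
    # Assumption: trailing windows shorter than `window` are scored as they are.
    for start in range(0, len(text), step):
        if shannon_entropy(text[start : start + window]) >= threshold:
            return True
    return False
```

Plain English text uses a small alphabet with a skewed distribution and stays well below the threshold, which is why only dense, near-uniform character mixes are flagged.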

### Binary body handling

In `scan_outbound`, the bytes → str decoding changes from:

```python
body.decode("utf-8", errors="replace")
```

to:

```python
try:
    text = body.decode("utf-8")  # strict: valid UTF-8 is decoded faithfully
except UnicodeDecodeError:
    text = body.decode("latin-1")  # fallback: every byte maps to one code point
```

`latin-1` is a bijective byte↔codepoint mapping; every byte value is preserved
as its corresponding Latin-1 code point, so ASCII-range secret strings remain
intact and `str.find` / regex still locate them correctly. Strict UTF-8 is
tried first so that valid UTF-8 bodies are decoded faithfully; `latin-1` is the
fallback for bodies that fail strict decoding.
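To illustrate the byte-preserving property, here is a small sketch with a hypothetical `decode_body` helper and a made-up secret value. It shows that a Latin-1 decoded body round-trips to the original bytes and that an ASCII secret embedded in a non-UTF-8 payload stays searchable.

```python
def decode_body(body: bytes) -> str:
    """Strict UTF-8 first; Latin-1 fallback, which accepts any byte sequence."""
    try:
        return body.decode("utf-8")
    except UnicodeDecodeError:
        return body.decode("latin-1")


# Hypothetical secret and payload, for illustration only.
secret = "tok_live_12345"
binary = b"\xff\xfe\x00" + secret.encode("ascii") + b"\x80\x81"

text = decode_body(binary)
found = secret in text  # the ASCII run is untouched by the fallback
lossless = text.encode("latin-1") == binary  # one code point per original byte
```

By contrast, decoding with `errors="replace"` substitutes U+FFFD for each invalid byte, so the original byte values cannot be recovered from the resulting string.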

## Implementation

Delivered in three commits on the same branch:

1. **DLP detector changes**: `_alnum_projection`, fragmentation passes,
   `scan_entropy`, broadened `scan_known_secrets`, updated `scan_outbound` and
   `redact_tokens`; all accompanying unit tests.
2. **Canary injection**: `EgressPlan.canary`, `Egress.prepare()`,
   Docker compose + macos-container backend injection.
3. **PRD flip**: `Status: Draft → Active`.