diff --git a/docs/prds/0053-egress-dlp-addon.md b/docs/prds/0053-egress-dlp-addon.md
new file mode 100644
index 0000000..8112700
--- /dev/null
+++ b/docs/prds/0053-egress-dlp-addon.md
@@ -0,0 +1,291 @@
+# PRD 0053: Egress DLP addon
+
+- **Status:** Draft
+- **Author:** claude
+- **Created:** 2026-06-05
+- **Issue:** #195
+
+## Summary
+
+With pipelock removed (PR #193), the egress proxy no longer performs DLP
+scanning on traffic to or from the agent. This PRD specifies a replacement
+directly inside the mitmproxy egress addon: per-route DLP detectors that
+scan outbound requests for credential leakage and inbound responses for
+prompt injection attempts. Configuration is expressed as a new `dlp` block
+on each `egress.routes` entry in the bottle manifest.
+
+The design follows the recommendation in the [DLP research document
+(PR #192)](https://gitea.dideric.is/didericis/bot-bottle/pulls/192) and
+covers all three remaining implementation phases from that plan:
+
+1. Token pattern detection (Phase 1a)
+2. Known-secrets detection (Phase 1b)
+3. Naive prompt injection detection (Phase 2)
+
+## Problem
+
+Pipelock was removed because it could not support per-route response
+scanning, which ruled out selective DLP policies (e.g., skip scanning `.whl`
+downloads while keeping scanning on API calls). Removing it left the egress
+proxy with no DLP capability at all. The egress addon already holds per-route
+logic for path allowlisting and credential injection; DLP rules belong in the
+same place.
+
+## Goals / Success Criteria
+
+1. Outbound request bodies and headers are scanned for known token patterns
+   (AWS, GitHub, Anthropic, etc.) before the request reaches the upstream.
+   Matches are blocked immediately.
+2. Outbound request bodies are scanned for provisioned secrets that the
+   agent should not have direct access to. Matches are blocked immediately.
+3. Inbound response bodies are scanned for prompt disclosure and jailbreak
+   signals.
+   High-confidence matches are blocked; medium-confidence matches
+   emit a log warning and are forwarded.
+4. DLP scanning is enabled by default on every route. Individual routes can
+   selectively disable outbound detectors, inbound detectors, or both via a
+   `dlp` block in the manifest.
+5. All detector logic lives in `egress_addon_core.py` (pure Python, no
+   mitmproxy dependency) and is covered by unit tests on the host.
+6. The `dlp` block is optional and backward-compatible: a route that omits
+   it entirely behaves as if all detectors are enabled.
+
+## Non-goals
+
+- LLM-based semantic prompt injection detection (explicitly deferred to a
+  potential Phase 2b per the research doc).
+- Entropy-based secret detection (excluded from scope; too many false
+  positives on binary API responses and compressed payloads).
+- BIP-39 seed-phrase detection.
+- Generic DLP (credit cards, SSNs, PII). Scope is narrow: AI/credential
+  exfiltration relevant to agent containment.
+- Changes to the cred-proxy sidecar.
+- Streaming response scanning (scan buffered response body only).
+
+## Design
+
+### Manifest schema: `dlp` block
+
+Each `egress.routes` entry gains an optional `dlp` key:
+
+```yaml
+egress:
+  routes:
+    - host: api.anthropic.com
+      # dlp omitted → all detectors on (default)
+
+    - host: files.pythonhosted.org
+      dlp:
+        inbound_detectors: false   # skip response scanning (binary downloads)
+
+    - host: internal-docs.corp
+      dlp:
+        outbound_detectors: false
+        inbound_detectors: false   # trusted internal, no scanning
+```
+
+`outbound_detectors` controls scanning of the *request* body + headers
+leaving the agent. `inbound_detectors` controls scanning of the *response*
+body arriving from the upstream.
+
+Valid values per field:
+- Omitted (or `null`): the default, all detectors active.
+- `false`: scanning disabled for this direction on this route.
+- A list of detector names: only the listed detectors run.
+
+Named outbound detectors: `token_patterns`, `known_secrets`.
+Named inbound detectors: `naive_injection_detection`.
+
+The manifest parser (`manifest_egress.py`) validates the `dlp` block and
+rejects unknown detector names.
+
+### `EgressRoute` changes
+
+`EgressRoute` gains two new fields:
+
+```python
+@dataclass(frozen=True)
+class EgressRoute:
+    Host: str
+    PathAllowlist: tuple[str, ...] = ()
+    AuthScheme: str = ""
+    TokenRef: str = ""
+    Role: tuple[str, ...] = ()
+    OutboundDetectors: tuple[str, ...] | None = None  # None = all enabled
+    InboundDetectors: tuple[str, ...] | None = None   # None = all enabled
+```
+
+`None` means "use defaults" (all active); an empty tuple means "disabled"
+(the manifest's `false`). A named subset is a `tuple[str, ...]` of detector
+names.
+
+`manifest_egress.py` uses `from_dict` to parse the new `dlp` block and
+populate these fields; unknown keys inside `dlp` are rejected.
+
+### `Route` changes in `egress_addon_core.py`
+
+The addon-side `Route` dataclass mirrors the manifest-side change:
+
+```python
+@dataclass(frozen=True)
+class Route:
+    host: str
+    path_allowlist: tuple[str, ...] = ()
+    auth_scheme: str = ""
+    token_env: str = ""
+    outbound_detectors: tuple[str, ...] | None = None
+    inbound_detectors: tuple[str, ...] | None = None
+```
+
+`parse_routes` / `_parse_one` grow the corresponding parsing logic.
+
+### Detector interface
+
+Each detector is a pure function:
+
+```python
+# env=None means "no environment"; avoids a mutable `{}` default.
+def scan(body: str | bytes, *, env: Mapping[str, str] | None = None) -> ScanResult | None:
+    ...
+```
+
+`ScanResult` carries:
+
+```python
+@dataclass(frozen=True)
+class ScanResult:
+    severity: str  # "block" or "warn"
+    reason: str
+```
+
+`scan` returns `None` if the body is clean, `ScanResult` otherwise.
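
To make the field semantics above concrete, here is a minimal sketch of how a route's detector field (`None`, an empty tuple, or a tuple of names) might be resolved against a registry of `scan`-style functions. The names `select_detectors`, `run_detectors`, `toy_marker_detector`, and `REGISTRY` are hypothetical and do not appear in the PRD. Letting a `"block"` result win over a `"warn"` result is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class ScanResult:
    severity: str  # "block" or "warn"
    reason: str


# Hypothetical alias for a function shaped like the PRD's `scan`.
Detector = Callable[..., Optional[ScanResult]]


def select_detectors(
    configured: Optional[tuple[str, ...]],
    registry: Mapping[str, Detector],
) -> list[Detector]:
    """None -> all registered detectors; () -> none; else only the named ones."""
    if configured is None:
        return list(registry.values())
    # Unknown names raise KeyError here; the manifest parser is expected
    # to have rejected them earlier.
    return [registry[name] for name in configured]


def run_detectors(
    body: bytes,
    configured: Optional[tuple[str, ...]],
    registry: Mapping[str, Detector],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[ScanResult]:
    """Run the selected detectors; a "block" wins over a "warn" (assumed)."""
    first_warn = None
    for detector in select_detectors(configured, registry):
        result = detector(body, env=env or {})
        if result is None:
            continue
        if result.severity == "block":
            return result
        first_warn = first_warn or result
    return first_warn


def toy_marker_detector(body: bytes, *, env: Mapping[str, str]) -> Optional[ScanResult]:
    """Illustrative stand-in for a real detector: flags a fixed marker string."""
    if b"LEAK-MARKER" in body:
        return ScanResult(severity="block", reason="toy marker found")
    return None


REGISTRY = {"toy_marker": toy_marker_detector}
```

With this shape, `run_detectors(body, route.outbound_detectors, REGISTRY)` needs no special case for a disabled direction, because an empty tuple selects no detectors and the call returns `None`.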
+
+### Detector: `token_patterns`
+
+Regex patterns for well-known credential formats, applied to the outbound
+request body and the `Authorization` header. The scan runs before the addon
+strips that header, so it sees any credential the agent tried to smuggle:
+
+| Token type | Pattern |
+|------------|---------|
+| AWS access key | `AKIA[0-9A-Z]{16}` |
+| GitHub token (classic) | `ghp_[A-Za-z0-9_]{36}` |
+| GitHub fine-grained | `github_pat_[A-Za-z0-9_]{82}` |
+| Anthropic API key | `sk-ant-[A-Za-z0-9\-_]{93}` |
+| OpenAI API key | `sk-[A-Za-z0-9]{48}` |
+| Stripe live key | `sk_live_[A-Za-z0-9]{24}` |
+| Generic Bearer JWT | `Bearer\s+[A-Za-z0-9._\-]{50,}` |
+
+Action: `"block"` on any match. No tolerance: a credential in an outbound
+request is always a violation.
+
+### Detector: `known_secrets`
+
+At request time the egress addon has access to `os.environ`, which includes
+all `token_env` values declared by route auth blocks. The detector:
+
+1. Collects all `EGRESS_TOKEN_*` values from the environment (the naming
+   contract established by `manifest_egress.py`'s `TokenRef` rendering).
+2. For each secret value, derives encoded variants: raw, base64, URL-encoded,
+   hex.
+3. Scans the outbound request body for any variant.
+
+Action: `"block"` on match.
+
+This detector does **not** accept a custom detector name in the YAML; it
+is always named `known_secrets`. The environment is passed in via the `env`
+keyword argument to `scan`.
+
+### Detector: `naive_injection_detection`
+
+Pattern-based inbound response scanner. Uses three tiers:
+
+**Tier 1, BLOCK (credential + disclosure together):**
+- Response contains a token-pattern match (reuses `token_patterns` regex
+  set) AND a prompt-disclosure phrase (e.g., `system prompt`, `my instructions
+  are`, `hidden rules`).
+
+**Tier 2, WARN (multiple jailbreak signals):**
+- Two or more jailbreak phrases detected (e.g., `ignore previous`,
+  `forget everything`, `pretend you are`, `act as`).
+- OR explicit prompt disclosure (`system prompt:`) without a credential.
+
+**Tier 3, ALLOW:**
+- Single jailbreak keyword without additional context.
+- Common documentation phrases.
+
+See the research doc for the full phrase lists and pseudocode.
+
+### Wiring into `egress_addon.py`
+
+The existing `request` hook gains an outbound scan, and a new `response`
+hook is added alongside it:
+
+```python
+import os
+import sys
+
+from mitmproxy import http
+
+def request(self, flow: http.HTTPFlow) -> None:
+    # ... existing route matching + path-allowlist decision ...
+    # After the route decision, if action == "forward", scan before the
+    # agent's Authorization header is stripped:
+    result = scan_outbound(route, flow.request, os.environ)
+    if result and result.severity == "block":
+        # `...` stands for the elided response headers argument.
+        flow.response = http.Response.make(403, result.reason.encode(), ...)
+        return
+    # ... existing auth strip + optional re-injection ...
+
+def response(self, flow: http.HTTPFlow) -> None:
+    route = match_route(self.routes, flow.request.pretty_host)
+    if route is None:
+        return  # already blocked at request time
+    result = scan_inbound(route, flow.response)
+    if result and result.severity == "block":
+        flow.response = http.Response.make(403, result.reason.encode(), ...)
+    elif result and result.severity == "warn":
+        sys.stderr.write(f"egress DLP warn: {result.reason}\n")
+```
+
+`scan_outbound` and `scan_inbound` are pure functions in
+`egress_addon_core.py` that dispatch to the per-route detector list.
+
+### Ordering: auth strip vs. DLP scan
+
+The DLP outbound scan sees the *agent's original* `Authorization` header
+before the addon strips it. This ensures that a token the agent smuggled
+in the header is caught. The strip + optional re-injection still happens
+afterward, preserving the existing credential-injection security model.
+
+## Implementation chunks
+
+1. **Manifest `dlp` block + `EgressRoute` fields.**
+   `manifest_egress.py`: parse `dlp`, add `OutboundDetectors` /
+   `InboundDetectors` to `EgressRoute`. Extend
+   `tests/unit/test_manifest_egress.py` with `dlp` valid/invalid cases.
+   `egress_addon_core.py`: add `outbound_detectors` / `inbound_detectors`
+   to `Route`; update `_parse_one` and `parse_routes`; extend
+   `tests/unit/test_egress_addon_core.py`.
+
+2. **Token-patterns detector (Phase 1a).**
+   New module `bot_bottle/dlp_detectors.py` (host-importable) and
+   companion flat copy for the sidecar bundle. Add `TokenPatternsDetector`
+   with the regex set above. Wire `scan_outbound` into the `request` hook
+   in `egress_addon.py`. Unit tests in
+   `tests/unit/test_dlp_detectors.py`.
+
+3. **Known-secrets detector (Phase 1b).**
+   Add `KnownSecretsDetector` to `dlp_detectors.py`. Collect
+   `EGRESS_TOKEN_*` from env; derive encoded variants; scan request body.
+   Extend unit tests. Wire into `scan_outbound`.
+
+4. **Naive prompt injection detector (Phase 2).**
+   Add `NaiveInjectionDetector` to `dlp_detectors.py`. Wire
+   `scan_inbound` into the new `response` hook in `egress_addon.py`.
+   Extend unit tests. Activate PRD 0053 (`Status: Draft → Active`) in
+   this commit.
+
+## Open questions
+
+1. **Response body buffering:** mitmproxy's `response` hook already has
+   the full body for non-streaming responses. For streaming (chunked)
+   responses the body may be empty or incomplete at hook time. Scope for
+   now: log a warning and skip scanning on streaming responses; revisit
+   if needed.
+2. **Encoding breadth for `known_secrets`:** Start with raw + base64 +
+   URL-encoded + hex. Add GZIP / base32 if real-world evasion attempts
+   appear.
+3. **`EGRESS_TOKEN_*` naming contract:** The detector relies on the
+   env-var naming convention from `manifest_egress.py`. If that contract
+   changes, the detector must be updated in lock-step.
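
As a rough illustration of the starting encoding set in open question 2 and the naming contract in open question 3, the variant derivation might look like the sketch below. The function names `secret_variants` and `find_known_secret` and the `MIN_SECRET_LENGTH` guard are assumptions made for illustration; the PRD does not define them.

```python
import base64
import binascii
from typing import Mapping, Optional
from urllib.parse import quote

# Assumption: very short values are skipped so they cannot match by accident.
MIN_SECRET_LENGTH = 8


def secret_variants(secret: str) -> set[bytes]:
    """Raw, base64, URL-encoded, and lowercase-hex forms of one secret value."""
    raw = secret.encode("utf-8")
    return {
        raw,
        base64.b64encode(raw),
        quote(secret, safe="").encode("ascii"),
        binascii.hexlify(raw),
    }


def find_known_secret(body: bytes, env: Mapping[str, str]) -> Optional[str]:
    """Return the name of the first EGRESS_TOKEN_* variable whose value
    (in any derived variant) appears in body, or None if the body is clean."""
    for name, value in env.items():
        if not name.startswith("EGRESS_TOKEN_") or len(value) < MIN_SECRET_LENGTH:
            continue
        if any(variant in body for variant in secret_variants(value)):
            return name
    return None
```

The sketch returns the variable name, not the value, so a block reason built from it would not repeat the secret in logs. It matches only exact variants: uppercase hex, unpadded or URL-safe base64, and a secret embedded inside a larger base64 payload would all slip past, which is the kind of gap open question 2 leaves for later.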