docs: rename PRD 0053 to PRD 0052

Renames docs/prds/0053-egress-dlp-addon.md to 0052-egress-dlp-addon.md and updates all references in the documentation.
2026-06-06 16:27:04 +00:00
parent 63a3b9b50a
commit 8c0a9c5bc6
2 changed files with 3 additions and 3 deletions
@@ -0,0 +1,415 @@
+# PRD 0052: Egress DLP addon
+
+- **Status:** Active
+- **Author:** claude
+- **Created:** 2026-06-05
+- **Issue:** #195
+
+## Summary
+
+With pipelock removed (PR #193), the egress proxy no longer performs DLP
+scanning on traffic to or from the agent. This PRD implements a replacement
+directly inside the mitmproxy egress addon: per-route DLP detectors that
+scan outbound requests for credential leakage and inbound responses for
+prompt injection attempts.
+
+The manifest route schema is also upgraded in this PRD from the flat
+`path_allowlist` field to a structured `matches` block modelled on the
+[Kubernetes Gateway API `HTTPRoute`](https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io/v1.HTTPRouteMatch)
+match vocabulary. This upgrade is a hard cutover — no compatibility shim
+for the old format. The rationale and format survey are in the
+[YAML route matching formats research doc](https://gitea.dideric.is/didericis/bot-bottle/src/branch/main/docs/research/yaml-route-matching-formats.md).
+DLP detectors attach to the new `matches`-based routes directly.
+
+The design follows the recommendation in the
+[DLP research document (PR #192)](https://gitea.dideric.is/didericis/bot-bottle/pulls/192)
+and covers all three remaining implementation phases from that plan:
+
+1. Token pattern detection (Phase 1a)
+2. Known-secrets detection (Phase 1b)
+3. Naive prompt injection detection (Phase 2)
+
+## Problem
+
+Pipelock was removed because it could not support per-route response
+scanning, which ruled out selective DLP policies (e.g., skip scanning `.whl`
+downloads while keeping scanning on API calls). Removing it left the egress
+proxy with no DLP capability at all. The egress addon already holds per-route
+logic for path allowlisting and credential injection; DLP rules belong in the
+same place.
+
+The existing `path_allowlist` field is also limiting: it only supports path
+prefixes, with no way to express exact-path, regex, method, or header
+constraints. The Gateway API match vocabulary is a well-specified, widely
+deployed standard that covers all of these without inventing new syntax.
+
+## Goals / Success Criteria
+
+1. Outbound request bodies and headers are scanned for known token patterns
+   (AWS, GitHub, Anthropic, etc.) before the request reaches the upstream.
+   Matches are blocked immediately.
+2. Outbound request bodies are scanned for provisioned secrets that the
+   agent should not have direct access to. Matches are blocked immediately.
+3. Inbound response bodies are scanned for prompt disclosure and jailbreak
+   signals. High-confidence matches are blocked; medium-confidence matches
+   emit a log warning and are forwarded.
+4. DLP scanning is enabled by default on every route. Individual routes can
+   selectively disable outbound detectors, inbound detectors, or both via a
+   `dlp` block in the manifest.
+5. All detector logic lives in `egress_addon_core.py` (pure Python, no
+   mitmproxy dependency) and is covered by unit tests on the host.
+6. Each route's `matches` block supports path (exact/prefix/regex), HTTP
+   method, and header predicates using Gateway API match semantics.
+7. The manifest change is a hard cutover: `path_allowlist` is removed with
+   no fallback, no deprecation alias, and no dedicated error for old-format
+   manifests. Old manifests that use `path_allowlist` fail validation
+   at load time with the generic unknown-key error (the same as any other
+   unrecognised key today).
+
+## Non-goals
+
+- LLM-based semantic prompt injection detection (explicitly deferred to a
+  potential Phase 2b per the research doc).
+- Entropy-based secret detection (excluded from scope; too many false
+  positives on binary API responses and compressed payloads).
+- BIP-39 seed-phrase detection.
+- Generic DLP (credit cards, SSNs, PII) — scope is narrow: AI/credential
+  exfil relevant to agent containment.
+- Changes to the cred-proxy sidecar.
+- Streaming response scanning (scan buffered response body only).
+- Glob-style path matching — regex covers every case glob would handle
+  without adding a third path-matching language.
+
+## Design
+
+### Route matching: Gateway API `matches` vocabulary
+
+The existing `path_allowlist` field is replaced by a `matches` list. The
+vocabulary mirrors Kubernetes Gateway API `HTTPRouteMatch` (see the
+[route matching research doc](https://gitea.dideric.is/didericis/bot-bottle/src/branch/main/docs/research/yaml-route-matching-formats.md)
+for a full format survey and rationale). Gateway API was chosen because it
+is spec-backed, implementation-tested across multiple proxies, and its
+`{type, value}` pattern is consistent and schema-validatable.
+
+**AND/OR semantics** (same as Gateway API):
+- Predicates *within* a single `matches` entry are ANDed.
+- Multiple entries in the `matches` list are ORed — the route matches if
+  any entry matches.
+
+```yaml
+egress:
+  routes:
+    # Bare route — all traffic to this host is forwarded (no path/method/header
+    # constraints). Equivalent to the old path_allowlist-omitted case.
+    - host: api.anthropic.com
+      auth:
+        scheme: Bearer
+        token_ref: EGRESS_TOKEN_0
+
+    # Two match entries (OR): GET/HEAD under the /packages/ prefix OR POST on /upload
+    - host: files.pythonhosted.org
+      matches:
+        - paths:
+            - type: prefix
+              value: /packages/
+          methods: [GET, HEAD]
+        - paths:
+            - type: exact
+              value: /upload
+          methods: [POST]
+      dlp:
+        inbound_detectors: false   # skip response scanning (binary downloads)
+
+    # Header + regex path: only JSON API requests on versioned endpoints
+    - host: internal-api.corp
+      matches:
+        - paths:
+            - type: regex
+              value: "^/v[0-9]+/"
+          headers:
+            - name: Content-Type
+              type: exact
+              value: application/json
+      dlp:
+        outbound_detectors: false
+        inbound_detectors: false
+```
+
+#### Path matching types
+
+| `type` | Semantics |
+|--------|-----------|
+| `exact` | Full path must equal `value` exactly |
+| `prefix` | Path must start with `value` at a segment boundary (value `/api/v1` matches `/api/v1` and `/api/v1/foo`, rejects `/api/v10`) |
+| `regex` | RE2 regex; rejected at load time if pattern fails to compile. Use for wildcard needs: `/api/[^/]+/data` instead of glob |
+
+`type` defaults to `prefix` when omitted (preserves the semantics of the
+old `path_allowlist`).
+
+#### Method matching
+
+`methods` is a list of HTTP method names, case-insensitive at parse time —
+`get`, `GET`, and `Get` are all accepted and stored as uppercase internally.
+An absent or empty `methods` list means all methods are permitted.
+
+#### Header matching
+
+`headers` is a list of `{name, value, type}` objects. ALL listed headers
+must match (AND semantics). To OR on header values, use multiple `matches`
+entries.
+
+| `type` | Semantics |
+|--------|-----------|
+| `exact` | Header value equals `value` (default when `type` omitted) |
+| `regex` | Header value matches RE2 regex |
+
+### Manifest schema — `dlp` block
+
+Each `egress.routes` entry gains an optional `dlp` key alongside `matches`
+and `auth`:
+
+```yaml
+egress:
+  routes:
+    - host: api.anthropic.com
+      # dlp omitted → all detectors on (default)
+
+    - host: files.pythonhosted.org
+      dlp:
+        inbound_detectors: false   # skip response scanning (binary downloads)
+
+    - host: internal-docs.corp
+      dlp:
+        outbound_detectors: false
+        inbound_detectors: false   # trusted internal, no scanning
+```
+
+`outbound_detectors` controls scanning of the *request* body + headers
+leaving the agent. `inbound_detectors` controls scanning of the *response*
+body arriving from the upstream.
+
+Valid values per field:
+- Omitted (or `null`) — default: all detectors active.
+- `false` — scanning disabled for this direction on this route.
+- A list of detector names — only the listed detectors run.
+
+Named outbound detectors: `token_patterns`, `known_secrets`.
+Named inbound detectors: `naive_injection_detection`.
+
+The manifest parser (`manifest_egress.py`) validates the `dlp` block and
+rejects unknown detector names.
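
The validation rules above can be sketched roughly as follows. This is a minimal illustration, not the actual `manifest_egress.py` code: the helper names `parse_detectors` and `parse_dlp` are hypothetical, and mapping `false` to an empty tuple (with `None` meaning "all detectors") is an assumption chosen to line up with the `tuple[str, ...] | None` fields described for `EgressRoute`.

```python
from __future__ import annotations

# Detector names taken from the PRD text.
OUTBOUND_DETECTORS = ("token_patterns", "known_secrets")
INBOUND_DETECTORS = ("naive_injection_detection",)


def parse_detectors(value, known, field):
    """Return None (all detectors), () (disabled), or a tuple of names."""
    if value is None:
        return None  # omitted or null: default, everything on
    if value is False:
        return ()  # explicit `false`: nothing runs in this direction
    if isinstance(value, list) and all(isinstance(n, str) for n in value):
        unknown = [n for n in value if n not in known]
        if unknown:
            raise ValueError(f"{field}: unknown detector(s) {unknown}")
        return tuple(value)
    # `true`, strings, numbers, etc. are not listed as valid values.
    raise ValueError(f"{field}: expected false or a list of detector names")


def parse_dlp(block):
    """Parse a route's optional `dlp` mapping into (outbound, inbound)."""
    block = block or {}
    extra = set(block) - {"outbound_detectors", "inbound_detectors"}
    if extra:
        raise ValueError(f"dlp: unknown key(s) {sorted(extra)}")
    return (
        parse_detectors(
            block.get("outbound_detectors"), OUTBOUND_DETECTORS, "outbound_detectors"
        ),
        parse_detectors(
            block.get("inbound_detectors"), INBOUND_DETECTORS, "inbound_detectors"
        ),
    )
```

With this shape, `parse_dlp({"inbound_detectors": False})` yields `(None, ())`, i.e. all outbound detectors on and inbound scanning off, which corresponds to the `files.pythonhosted.org` example.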
+
+### `EgressRoute` changes
+
+`EgressRoute` replaces `PathAllowlist` with `Matches` and gains two new
+DLP fields. `MatchEntry` captures one AND-predicate block:
+
+```python
+@dataclass(frozen=True)
+class PathMatch:
+    type: str   # "exact" | "prefix" | "regex"
+    value: str
+
+
+@dataclass(frozen=True)
+class HeaderMatch:
+    name: str
+    value: str
+    type: str = "exact"   # "exact" | "regex"
+
+
+@dataclass(frozen=True)
+class MatchEntry:
+    paths: tuple[PathMatch, ...] = ()     # empty = match any path
+    methods: tuple[str, ...] = ()         # empty = match any method (uppercase)
+    headers: tuple[HeaderMatch, ...] = () # empty = match any headers
+
+
+@dataclass(frozen=True)
+class EgressRoute:
+    Host: str
+    Matches: tuple[MatchEntry, ...] = ()  # empty = match all requests
+    AuthScheme: str = ""
+    TokenRef: str = ""
+    Role: tuple[str, ...] = ()
+    OutboundDetectors: tuple[str, ...] | None = None   # None = all enabled
+    InboundDetectors: tuple[str, ...] | None = None    # None = all enabled
+```
+
+`manifest_egress.py`'s `from_dict` parses the new `matches` block and `dlp`
+block; `path_allowlist` is no longer a recognised key and will be rejected
+by the unknown-key check.
+
+### `Route` changes in `egress_addon_core.py`
+
+The addon-side `Route` and its helper types mirror the manifest-side changes.
+`match_route` is extended to evaluate the `Matches` list:
+
+```python
+@dataclass(frozen=True)
+class Route:
+    host: str
+    matches: tuple[MatchEntry, ...] = ()
+    auth_scheme: str = ""
+    token_env: str = ""
+    outbound_detectors: tuple[str, ...] | None = None
+    inbound_detectors: tuple[str, ...] | None = None
+```
+
+`decide()` first calls `match_route` (host lookup, unchanged) and then
+evaluates the route's match entries in order; the request passes as soon
+as one entry matches. A route with no `matches` entries passes all
+requests. Path `prefix` type uses segment-boundary checking
+(`/api/v1` matches `/api/v1/foo` but not `/api/v10`).
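
A minimal sketch of the match evaluation described above, under stated assumptions. The function names (`path_matches`, `header_matches`, `entry_matches`, `route_matches`) are hypothetical. Python's `re` stands in for RE2, which the PRD specifies but which is not in the standard library. Regexes use search semantics, header names are compared case-insensitively, a trailing slash on a prefix value is ignored, and multiple `paths` inside one entry are treated as alternatives. None of those last points is stated in the PRD.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PathMatch:
    type: str  # "exact" | "prefix" | "regex"
    value: str


@dataclass(frozen=True)
class HeaderMatch:
    name: str
    value: str
    type: str = "exact"  # "exact" | "regex"


@dataclass(frozen=True)
class MatchEntry:
    paths: tuple[PathMatch, ...] = ()
    methods: tuple[str, ...] = ()
    headers: tuple[HeaderMatch, ...] = ()


def path_matches(pm: PathMatch, path: str) -> bool:
    if pm.type == "exact":
        return path == pm.value
    if pm.type == "prefix":
        # Segment boundary: /api/v1 matches /api/v1 and /api/v1/foo, not /api/v10.
        base = pm.value.rstrip("/")
        return path == base or path.startswith(base + "/")
    if pm.type == "regex":
        return re.search(pm.value, path) is not None
    raise ValueError(f"unknown path match type: {pm.type!r}")


def header_matches(hm: HeaderMatch, headers: Mapping[str, str]) -> bool:
    lowered = {k.lower(): v for k, v in headers.items()}
    actual = lowered.get(hm.name.lower())
    if actual is None:
        return False
    if hm.type == "regex":
        return re.search(hm.value, actual) is not None
    return actual == hm.value


def entry_matches(entry: MatchEntry, method: str, path: str,
                  headers: Mapping[str, str]) -> bool:
    # Predicates within one entry are ANDed; an empty predicate matches anything.
    return (
        (not entry.paths or any(path_matches(p, path) for p in entry.paths))
        and (not entry.methods or method.upper() in entry.methods)
        and all(header_matches(h, headers) for h in entry.headers)
    )


def route_matches(entries: tuple[MatchEntry, ...], method: str, path: str,
                  headers: Mapping[str, str]) -> bool:
    # Entries are ORed; a route with no entries matches every request.
    return not entries or any(
        entry_matches(e, method, path, headers) for e in entries
    )
```

Keeping these as plain functions over frozen dataclasses fits the goal of testing the logic on the host without a mitmproxy dependency.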
+
+### Detector interface
+
+Each detector is a pure function:
+
+```python
+# env defaults to None rather than a shared mutable {}; treat None as empty.
+def scan(body: str | bytes, *, env: Mapping[str, str] | None = None) -> ScanResult | None:
+    ...
+```
+
+`ScanResult` carries:
+
+```python
+@dataclass(frozen=True)
+class ScanResult:
+    severity: str   # "block" or "warn"
+    reason: str
+```
+
+`scan` returns `None` if the body is clean, `ScanResult` otherwise.
+
+### Detector: `token_patterns`
+
+Regex patterns for well-known credential formats, applied to the outbound
+request body and `Authorization` header (before the addon strips it — the
+strip happens after DLP scanning so that the scan sees any credential the
+agent tried to smuggle):
+
+| Token type | Pattern |
+|------------|---------|
+| AWS access key | `AKIA[0-9A-Z]{16}` |
+| GitHub token (classic) | `ghp_[A-Za-z0-9_]{36}` |
+| GitHub fine-grained | `github_pat_[A-Za-z0-9_]{82}` |
+| Anthropic API key | `sk-ant-[A-Za-z0-9\-_]{93}` |
+| OpenAI API key | `sk-[A-Za-z0-9]{48}` |
+| Stripe live key | `sk_live_[A-Za-z0-9]{24}` |
+| Generic Bearer JWT | `Bearer\s+[A-Za-z0-9._\-]{50,}` |
+
+Action: `"block"` on any match. No tolerance — a credential in an outbound
+request is always a violation.
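
For illustration, the table above could be turned into a detector along these lines. This is a sketch rather than the planned `TokenPatternsDetector`: the function name `scan_token_patterns` and the wording of the `reason` string are assumptions, and the patterns are copied from the table as written.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanResult:
    severity: str  # "block" or "warn"
    reason: str


# Compiled once at import time; keys are the table's "Token type" labels.
TOKEN_PATTERNS = {
    "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "GitHub token (classic)": re.compile(r"ghp_[A-Za-z0-9_]{36}"),
    "GitHub fine-grained": re.compile(r"github_pat_[A-Za-z0-9_]{82}"),
    "Anthropic API key": re.compile(r"sk-ant-[A-Za-z0-9\-_]{93}"),
    "OpenAI API key": re.compile(r"sk-[A-Za-z0-9]{48}"),
    "Stripe live key": re.compile(r"sk_live_[A-Za-z0-9]{24}"),
    "Generic Bearer JWT": re.compile(r"Bearer\s+[A-Za-z0-9._\-]{50,}"),
}


def scan_token_patterns(body: str | bytes) -> ScanResult | None:
    """Return a blocking ScanResult on the first pattern hit, else None."""
    if isinstance(body, bytes):
        # Undecodable bytes become U+FFFD, which none of the patterns match.
        text = body.decode("utf-8", errors="replace")
    else:
        text = body
    for name, pattern in TOKEN_PATTERNS.items():
        if pattern.search(text):
            # Report the token type only; never echo the matched credential.
            return ScanResult("block", f"token pattern matched: {name}")
    return None
```

To cover the `Authorization` header as well, a caller could concatenate the header value and the body before scanning, or scan them separately and return the first hit.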
+
+### Detector: `known_secrets`
+
+At request time the egress addon has access to `os.environ`, which includes
+all `token_env` values declared by route auth blocks. The detector:
+
+1. Collects all `EGRESS_TOKEN_*` values from the environment (the naming
+   contract established by `manifest_egress.py`'s `TokenRef` rendering).
+2. For each secret value, derives encoded variants: raw, base64, URL-encoded,
+   hex.
+3. Scans the outbound request body for any variant.
+
+Action: `"block"` on match.
+
+This detector does **not** accept a custom detector name in the YAML — it
+is always named `known_secrets`. The environment is passed in via the `env`
+keyword argument to `scan`.
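
The three steps above might look roughly like this. It is a sketch under assumptions: the names `secret_variants` and `scan_known_secrets` are hypothetical, "base64" is taken to mean the standard alphabet with padding, "hex" is taken to mean lowercase hex, and URL-encoding percent-encodes every reserved character.

```python
from __future__ import annotations

import base64
import urllib.parse
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ScanResult:
    severity: str  # "block" or "warn"
    reason: str


def secret_variants(secret: str) -> set[bytes]:
    """Encoded forms of one secret: raw, base64, URL-encoded, hex."""
    raw = secret.encode("utf-8")
    return {
        raw,
        base64.b64encode(raw),
        urllib.parse.quote(secret, safe="").encode("ascii"),
        raw.hex().encode("ascii"),
    }


def scan_known_secrets(
    body: str | bytes, *, env: Mapping[str, str] | None = None
) -> ScanResult | None:
    """Block if any EGRESS_TOKEN_* value (or a variant) appears in body."""
    data = body.encode("utf-8") if isinstance(body, str) else body
    for name, value in (env or {}).items():
        if not name.startswith("EGRESS_TOKEN_") or not value:
            continue  # only the provisioned egress tokens; skip empty values
        if any(variant in data for variant in secret_variants(value)):
            # Name the env var, not the secret, in the reason.
            return ScanResult("block", f"known secret detected: {name}")
    return None
```

One limitation of plain substring matching: a secret that is base64-encoded as part of a larger blob will usually not contain the standalone base64 of the secret, so this catches only secrets encoded on their own.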
+
+### Detector: `naive_injection_detection`
+
+Pattern-based inbound response scanner. Uses three tiers:
+
+**Tier 1 — BLOCK (credential + disclosure together):**
+- Response contains a token-pattern match (reuses `token_patterns` regex
+  set) AND a prompt-disclosure phrase (e.g., `system prompt`, `my instructions
+  are`, `hidden rules`).
+
+**Tier 2 — WARN (multiple jailbreak signals):**
+- Two or more jailbreak phrases detected (e.g., `ignore previous`,
+  `forget everything`, `pretend you are`, `act as`).
+- OR explicit prompt disclosure (`system prompt:`) without a credential.
+
+**Tier 3 — ALLOW:**
+- Single jailbreak keyword without additional context.
+- Common documentation phrases.
+
+See the DLP research doc for the full phrase lists and pseudocode.
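
As a rough sketch of how the tiers could be evaluated in order: the phrase lists below contain only the examples quoted in this section (the full lists live in the research doc), the credential regex is an abbreviated subset of the `token_patterns` set, and the function name `scan_naive_injection` is hypothetical. Counting distinct phrases rather than occurrences is also an assumption.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanResult:
    severity: str  # "block" or "warn"
    reason: str


# Abbreviated, illustrative inputs; not the full lists from the research doc.
TOKEN_RE = re.compile(
    r"AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9_]{36}|sk_live_[A-Za-z0-9]{24}"
)
DISCLOSURE_PHRASES = ("system prompt", "my instructions are", "hidden rules")
JAILBREAK_PHRASES = (
    "ignore previous", "forget everything", "pretend you are", "act as",
)


def scan_naive_injection(body: str | bytes) -> ScanResult | None:
    if isinstance(body, bytes):
        text = body.decode("utf-8", errors="replace")
    else:
        text = body
    lowered = text.lower()

    # Tier 1 (block): credential pattern AND a prompt-disclosure phrase.
    disclosure = any(p in lowered for p in DISCLOSURE_PHRASES)
    if disclosure and TOKEN_RE.search(text):
        return ScanResult(
            "block", "credential pattern with prompt-disclosure phrase"
        )

    # Tier 2 (warn): two or more distinct jailbreak phrases ...
    jailbreak_hits = sum(p in lowered for p in JAILBREAK_PHRASES)
    if jailbreak_hits >= 2:
        return ScanResult(
            "warn", f"{jailbreak_hits} jailbreak phrases detected"
        )
    # ... or an explicit disclosure marker without a credential.
    if "system prompt:" in lowered:
        return ScanResult("warn", "explicit prompt disclosure")

    # Tier 3 (allow): a lone keyword or ordinary documentation text.
    return None
```

Plain substring checks are deliberately naive: a phrase like `act as` also matches inside unrelated words, which is one reason a single hit is allowed rather than warned on.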
+
+### Wiring into `egress_addon.py`
+
+The existing `request` hook is extended with an outbound scan, and a new
+`response` hook is added for the inbound scan:
+
+```python
+def request(self, flow: http.HTTPFlow) -> None:
+    # ... existing match + auth-injection logic ...
+    # After route decision, if action == "forward":
+    result = scan_outbound(route, flow.request, os.environ)
+    if result and result.severity == "block":
+        # Answer locally with 403 so the request never reaches the upstream.
+        # The plain-text header is illustrative.
+        flow.response = http.Response.make(
+            403, result.reason.encode(), {"Content-Type": "text/plain"}
+        )
+        return
+
+def response(self, flow: http.HTTPFlow) -> None:
+    route = match_route(self.routes, flow.request.pretty_host)
+    if route is None:
+        return  # already blocked at request time
+    result = scan_inbound(route, flow.response)
+    if result and result.severity == "block":
+        # Replace the upstream response so the agent never sees its body.
+        flow.response = http.Response.make(
+            403, result.reason.encode(), {"Content-Type": "text/plain"}
+        )
+    elif result and result.severity == "warn":
+        # Medium-confidence match: log and forward the response unchanged.
+        sys.stderr.write(f"egress DLP warn: {result.reason}\n")
+
+`scan_outbound` and `scan_inbound` are pure functions in
+`egress_addon_core.py` that dispatch to the per-route detector list.
+
+### Ordering: auth strip vs. DLP scan
+
+The DLP outbound scan sees the *agent's original* `Authorization` header
+before the addon strips it. This ensures that a token the agent smuggled
+in the header is caught. The strip + optional re-injection still happens
+afterward, preserving the existing credential-injection security model.
+
+## Implementation chunks
+
+1. **New `matches` block + `EgressRoute` / `Route` restructure.**
+   Remove `path_allowlist` from `manifest_egress.py` and `egress_addon_core.py`.
+   Add `MatchEntry`, `PathMatch`, `HeaderMatch` types. Parse `matches` in
+   `EgressRoute.from_dict` and `_parse_one`; unknown-key rejection handles
+   old `path_allowlist` manifests. Add `OutboundDetectors` / `InboundDetectors`
+   to `EgressRoute` and `Route`; parse `dlp` block. Extend
+   `tests/unit/test_manifest_egress.py` and `tests/unit/test_egress_addon_core.py`
+   with match and dlp valid/invalid cases.
+
+2. **Token-patterns detector (Phase 1a).**
+   New module `bot_bottle/dlp_detectors.py` (host-importable) and
+   companion flat copy for the sidecar bundle. Add `TokenPatternsDetector`
+   with the regex set above. Wire `scan_outbound` into the `request` hook
+   in `egress_addon.py`. Unit tests in `tests/unit/test_dlp_detectors.py`.
+
+3. **Known-secrets detector (Phase 1b).**
+   Add `KnownSecretsDetector` to `dlp_detectors.py`. Collect
+   `EGRESS_TOKEN_*` from env; derive encoded variants; scan request body.
+   Extend unit tests. Wire into `scan_outbound`.
+
+4. **Naive prompt injection detector (Phase 2).**
+   Add `NaiveInjectionDetector` to `dlp_detectors.py`. Wire
+   `scan_inbound` into the new `response` hook in `egress_addon.py`.
+   Extend unit tests. Activate PRD 0052 (`Status: Draft → Active`) in
+   this commit.
+
+## Open questions
+
+1. **Response body buffering:** mitmproxy's `response` hook already has
+   the full body for non-streaming responses. For streaming (chunked)
+   responses the body may be empty or incomplete at hook time. Scope for
+   now: log a warning and skip scanning on streaming responses; revisit
+   if needed.
+2. **Encoding breadth for `known_secrets`:** Start with raw + base64 +
+   URL-encoded + hex. Add gzip / base32 if real-world evasion attempts
+   appear.
+3. **`EGRESS_TOKEN_*` naming contract:** The detector relies on the
+   env-var naming convention from `manifest_egress.py`. If that contract
+   changes, the detector must be updated in lock-step.