Files
bot-bottle/docs/prds/0053-egress-dlp-addon.md
T
didericis-claude 726713d081
lint / lint (push) Failing after 1m43s
test / unit (pull_request) Successful in 40s
test / integration (pull_request) Successful in 50s
feat(egress): implement PRD 0053 — DLP addon with Gateway API matches
Replace path_allowlist with Gateway API HTTPRoute match vocabulary
(paths, methods, headers with AND/OR semantics) and add DLP scanning
to the egress proxy:

- Token pattern detection (AWS, GitHub, Anthropic, OpenAI, Stripe, JWT)
- Known secret detection (EGRESS_TOKEN_* with base64/URL/hex variants)
- Naive prompt injection detection (disclosure + credential, jailbreak)
- Per-route DLP configuration via manifest dlp block
- Inbound response scanning with block/warn severity

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 19:53:23 +00:00

416 lines
16 KiB
Markdown

# PRD 0053: Egress DLP addon
- **Status:** Active
- **Author:** claude
- **Created:** 2026-06-05
- **Issue:** #195
## Summary
With pipelock removed (PR #193), the egress proxy no longer performs DLP
scanning on traffic to or from the agent. This PRD implements a replacement
directly inside the mitmproxy egress addon: per-route DLP detectors that
scan outbound requests for credential leakage and inbound responses for
prompt injection attempts.
The manifest route schema is also upgraded in this PRD from the flat
`path_allowlist` field to a structured `matches` block modelled on the
[Kubernetes Gateway API `HTTPRoute`](https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io/v1.HTTPRouteMatch)
match vocabulary. This upgrade is a hard cutover — no compatibility shim
for the old format. The rationale and format survey are in the
[YAML route matching formats research doc](https://gitea.dideric.is/didericis/bot-bottle/src/branch/main/docs/research/yaml-route-matching-formats.md).
DLP detectors attach to the new `matches`-based routes directly.
The design follows the recommendation in the
[DLP research document (PR #192)](https://gitea.dideric.is/didericis/bot-bottle/pulls/192)
and covers all three remaining implementation phases from that plan:
1. Token pattern detection (Phase 1a)
2. Known-secrets detection (Phase 1b)
3. Naive prompt injection detection (Phase 2)
## Problem
Pipelock was removed because it could not support per-route response
scanning, blocking selective DLP policies (e.g., skip scanning `.whl`
downloads while keeping scanning on API calls). Removing it left the egress
proxy with no DLP capability at all. The egress addon already holds per-route
logic for path allowlisting and credential injection; DLP rules belong in the
same place.
The existing `path_allowlist` field is also limiting: it only supports path
prefixes, with no way to express exact-path, regex, method, or header
constraints. The Gateway API match vocabulary is a well-specified, widely
deployed standard that covers all of these without inventing new syntax.
## Goals / Success Criteria
1. Outbound request bodies and headers are scanned for known token patterns
(AWS, GitHub, Anthropic, etc.) before the request reaches the upstream.
Matches are blocked immediately.
2. Outbound request bodies are scanned for provisioned secrets that the
agent should not have direct access to. Matches are blocked immediately.
3. Inbound response bodies are scanned for prompt disclosure and jailbreak
signals. High-confidence matches are blocked; medium-confidence matches
emit a log warning and are forwarded.
4. DLP scanning is enabled by default on every route. Individual routes can
selectively disable outbound detectors, inbound detectors, or both via a
`dlp` block in the manifest.
5. All detector logic lives in `egress_addon_core.py` (pure Python, no
mitmproxy dependency) and is covered by unit tests on the host.
6. Each route's `matches` block supports path (exact/prefix/regex), HTTP
method, and header predicates using Gateway API match semantics.
7. The manifest change is a hard cutover: `path_allowlist` is removed with
no fallback, no deprecation alias, and no loud exception for old-format
manifests. Old manifests that use `path_allowlist` will fail validation
at load time with an unknown-key error (same as any other unrecognised
key today).
## Non-goals
- LLM-based semantic prompt injection detection (explicitly deferred to a
potential Phase 2b per the research doc).
- Entropy-based secret detection (excluded from scope; too many false
positives on binary API responses and compressed payloads).
- BIP-39 seed-phrase detection.
- Generic DLP (credit cards, SSNs, PII) — scope is narrow: AI/credential
exfil relevant to agent containment.
- Changes to the cred-proxy sidecar.
- Streaming response scanning (scan buffered response body only).
- Glob-style path matching — regex covers every case glob would handle
without adding a third path-matching language.
## Design
### Route matching: Gateway API `matches` vocabulary
The existing `path_allowlist` field is replaced by a `matches` list. The
vocabulary mirrors Kubernetes Gateway API `HTTPRouteMatch` (see the
[route matching research doc](https://gitea.dideric.is/didericis/bot-bottle/src/branch/main/docs/research/yaml-route-matching-formats.md)
for a full format survey and rationale). Gateway API was chosen because it
is spec-backed, implementation-tested across multiple proxies, and its
`{type, value}` pattern is consistent and schema-validatable.
**AND/OR semantics** (same as Gateway API):
- Predicates *within* a single `matches` entry are ANDed.
- Multiple entries in the `matches` list are ORed — the route matches if
any entry matches.
```yaml
egress:
routes:
# Bare route — all traffic to this host is forwarded (no path/method/header
# constraints). Equivalent to the old path_allowlist-omitted case.
- host: api.anthropic.com
auth:
scheme: Bearer
token_ref: EGRESS_TOKEN_0
# Two match entries (OR): GET/HEAD on /packages/** OR POST on /upload
- host: files.pythonhosted.org
matches:
- paths:
- type: prefix
value: /packages/
methods: [GET, HEAD]
- paths:
- type: exact
value: /upload
methods: [POST]
dlp:
inbound_detectors: false # skip response scanning (binary downloads)
# Header + regex path — only JSON API responses on versioned endpoints
- host: internal-api.corp
matches:
- paths:
- type: regex
value: "^/v[0-9]+/"
headers:
- name: Content-Type
type: exact
value: application/json
dlp:
outbound_detectors: false
inbound_detectors: false
```
#### Path matching types
| `type` | Semantics |
|--------|-----------|
| `exact` | Full path must equal `value` exactly |
| `prefix` | Path must start with `value` at a segment boundary (matches `/api/v1` for value `/api/v1`, rejects `/api/v10`) |
| `regex` | RE2 regex; rejected at load time if pattern fails to compile. Use for wildcard needs: `/api/[^/]+/data` instead of glob |
`type` defaults to `prefix` when omitted (preserves the semantic of the
old `path_allowlist`).
#### Method matching
`methods` is a list of HTTP method names, case-insensitive at parse time —
`get`, `GET`, and `Get` are all accepted and stored as uppercase internally.
An absent or empty `methods` list means all methods are permitted.
#### Header matching
`headers` is a list of `{name, value, type}` objects. ALL listed headers
must match (AND semantics). To OR on header values, use multiple `matches`
entries.
| `type` | Semantics |
|--------|-----------|
| `exact` | Header value equals `value` (default when `type` omitted) |
| `regex` | Header value matches RE2 regex |
### Manifest schema — `dlp` block
Each `egress.routes` entry gains an optional `dlp` key alongside `matches`
and `auth`:
```yaml
egress:
routes:
- host: api.anthropic.com
# dlp omitted → all detectors on (default)
- host: files.pythonhosted.org
dlp:
inbound_detectors: false # skip response scanning (binary downloads)
- host: internal-docs.corp
dlp:
outbound_detectors: false
inbound_detectors: false # trusted internal, no scanning
```
`outbound_detectors` controls scanning of the *request* body + headers
leaving the agent. `inbound_detectors` controls scanning of the *response*
body arriving from the upstream.
Valid values per field:
- Omitted (or `null`) — default: all detectors active.
- `false` — scanning disabled for this direction on this route.
- A list of detector names — only the listed detectors run.
Named outbound detectors: `token_patterns`, `known_secrets`.
Named inbound detectors: `naive_injection_detection`.
The manifest parser (`manifest_egress.py`) validates the `dlp` block and
rejects unknown detector names.
### `EgressRoute` changes
`EgressRoute` replaces `PathAllowlist` with `Matches` and gains two new
DLP fields. `MatchEntry` captures one AND-predicate block:
```python
@dataclass(frozen=True)
class PathMatch:
type: str # "exact" | "prefix" | "regex"
value: str
@dataclass(frozen=True)
class HeaderMatch:
name: str
value: str
type: str = "exact" # "exact" | "regex"
@dataclass(frozen=True)
class MatchEntry:
paths: tuple[PathMatch, ...] = () # empty = match any path
methods: tuple[str, ...] = () # empty = match any method (uppercase)
headers: tuple[HeaderMatch, ...] = () # empty = match any headers
@dataclass(frozen=True)
class EgressRoute:
Host: str
Matches: tuple[MatchEntry, ...] = () # empty = match all requests
AuthScheme: str = ""
TokenRef: str = ""
Role: tuple[str, ...] = ()
OutboundDetectors: tuple[str, ...] | None = None # None = all enabled
InboundDetectors: tuple[str, ...] | None = None # None = all enabled
```
`manifest_egress.py`'s `from_dict` parses the new `matches` block and `dlp`
block; `path_allowlist` is no longer a recognised key and will be rejected
by the unknown-key check.
### `Route` changes in `egress_addon_core.py`
The addon-side `Route` and its helper types mirror the manifest-side changes.
`match_route` is extended to evaluate the `Matches` list:
```python
@dataclass(frozen=True)
class Route:
host: str
matches: tuple[MatchEntry, ...] = ()
auth_scheme: str = ""
token_env: str = ""
outbound_detectors: tuple[str, ...] | None = None
inbound_detectors: tuple[str, ...] | None = None
```
`decide()` feeds through `match_route` (unchanged host lookup) then
evaluates the match entries in order; if the route has no `matches` entries
all requests pass. Path `prefix` type uses segment-boundary checking
(`/api/v1` matches `/api/v1/foo` but not `/api/v10`).
### Detector interface
Each detector is a pure function:
```python
def scan(body: str | bytes, *, env: Mapping[str, str] = {}) -> ScanResult | None:
...
```
`ScanResult` carries:
```python
@dataclass(frozen=True)
class ScanResult:
severity: str # "block" or "warn"
reason: str
```
`scan` returns `None` if the body is clean, `ScanResult` otherwise.
### Detector: `token_patterns`
Regex patterns for well-known credential formats, applied to the outbound
request body and `Authorization` header (before the addon strips it — the
strip happens after DLP scanning so that the scan sees any credential the
agent tried to smuggle):
| Token type | Pattern |
|------------|---------|
| AWS access key | `AKIA[0-9A-Z]{16}` |
| GitHub token (classic) | `ghp_[A-Za-z0-9_]{36}` |
| GitHub fine-grained | `github_pat_[A-Za-z0-9_]{82}` |
| Anthropic API key | `sk-ant-[A-Za-z0-9\-_]{93}` |
| OpenAI API key | `sk-[A-Za-z0-9]{48}` |
| Stripe live key | `sk_live_[A-Za-z0-9]{24}` |
| Generic Bearer JWT | `Bearer\s+[A-Za-z0-9._\-]{50,}` |
Action: `"block"` on any match. No tolerance — a credential in an outbound
request is always a violation.
### Detector: `known_secrets`
At request time the egress addon has access to `os.environ`, which includes
all `token_env` values declared by route auth blocks. The detector:
1. Collects all `EGRESS_TOKEN_*` values from the environment (the naming
contract established by `manifest_egress.py`'s `TokenRef` rendering).
2. For each secret value, derives encoded variants: raw, base64, URL-encoded,
hex.
3. Scans the outbound request body for any variant.
Action: `"block"` on match.
This detector does **not** accept a custom detector name in the YAML — it
is always named `known_secrets`. The environment is passed in via the `env`
keyword argument to `scan`.
### Detector: `naive_injection_detection`
Pattern-based inbound response scanner. Uses two tiers:
**Tier 1 — BLOCK (credential + disclosure together):**
- Response contains a token-pattern match (reuses `token_patterns` regex
set) AND a prompt-disclosure phrase (e.g., `system prompt`, `my instructions
are`, `hidden rules`).
**Tier 2 — WARN (multiple jailbreak signals):**
- Two or more jailbreak phrases detected (e.g., `ignore previous`,
`forget everything`, `pretend you are`, `act as`).
- OR explicit prompt disclosure (`system prompt:`) without a credential.
**Tier 3 — ALLOW:**
- Single jailbreak keyword without additional context.
- Common documentation phrases.
See the DLP research doc for the full phrase lists and pseudocode.
### Wiring into `egress_addon.py`
Two new mitmproxy hooks are added alongside the existing `request` hook:
```python
def request(self, flow: http.HTTPFlow) -> None:
# ... existing match + auth-injection logic ...
# After route decision, if action == "forward":
result = scan_outbound(route, flow.request, os.environ)
if result and result.severity == "block":
flow.response = http.Response.make(403, result.reason.encode(), ...)
return
def response(self, flow: http.HTTPFlow) -> None:
route = match_route(self.routes, flow.request.pretty_host)
if route is None:
return # already blocked at request time
result = scan_inbound(route, flow.response)
if result and result.severity == "block":
flow.response = http.Response.make(403, result.reason.encode(), ...)
elif result and result.severity == "warn":
sys.stderr.write(f"egress DLP warn: {result.reason}\n")
```
`scan_outbound` and `scan_inbound` are pure functions in
`egress_addon_core.py` that dispatch to the per-route detector list.
### Ordering: auth strip vs. DLP scan
The DLP outbound scan sees the *agent's original* `Authorization` header
before the addon strips it. This ensures that a token the agent smuggled
in the header is caught. The strip + optional re-injection still happens
afterward, preserving the existing credential-injection security model.
## Implementation chunks
1. **New `matches` block + `EgressRoute` / `Route` restructure.**
Remove `path_allowlist` from `manifest_egress.py` and `egress_addon_core.py`.
Add `MatchEntry`, `PathMatch`, `HeaderMatch` types. Parse `matches` in
`EgressRoute.from_dict` and `_parse_one`; unknown-key rejection handles
old `path_allowlist` manifests. Add `OutboundDetectors` / `InboundDetectors`
to `EgressRoute` and `Route`; parse `dlp` block. Extend
`tests/unit/test_manifest_egress.py` and `tests/unit/test_egress_addon_core.py`
with match and dlp valid/invalid cases.
2. **Token-patterns detector (Phase 1a).**
New module `bot_bottle/dlp_detectors.py` (host-importable) and
companion flat copy for the sidecar bundle. Add `TokenPatternsDetector`
with the regex set above. Wire `scan_outbound` into the `request` hook
in `egress_addon.py`. Unit tests in `tests/unit/test_dlp_detectors.py`.
3. **Known-secrets detector (Phase 1b).**
Add `KnownSecretsDetector` to `dlp_detectors.py`. Collect
`EGRESS_TOKEN_*` from env; derive encoded variants; scan request body.
Extend unit tests. Wire into `scan_outbound`.
4. **Naive prompt injection detector (Phase 2).**
Add `NaiveInjectionDetector` to `dlp_detectors.py`. Wire
`scan_inbound` into the new `response` hook in `egress_addon.py`.
Extend unit tests. Activate PRD 0053 (`Status: Draft → Active`) in
this commit.
## Open questions
1. **Response body buffering:** mitmproxy's `response` hook already has
the full body for non-streaming responses. For streaming (chunked)
responses the body may be empty or incomplete at hook time. Scope for
now: log a warning and skip scanning on streaming responses; revisit
if needed.
2. **Encoding breadth for `known_secrets`:** Start with raw + base64 +
URL-encoded + hex. Add GZIP / base32 if real-world evasion attempts
appear.
3. **`EGRESS_TOKEN_*` naming contract:** The detector relies on the
env-var naming convention from `manifest_egress.py`. If that contract
changes, the detector must be updated in lock-step.