# PRD 0053: Egress DLP addon

- **Status:** Active
- **Author:** claude
- **Created:** 2026-06-05
- **Issue:** #195

## Summary

With pipelock removed (PR #193), the egress proxy no longer performs DLP
scanning on traffic to or from the agent. This PRD implements a replacement
directly inside the mitmproxy egress addon: per-route DLP detectors that
scan outbound requests for credential leakage and inbound responses for
prompt injection attempts.

The manifest route schema is also upgraded in this PRD from the flat
`path_allowlist` field to a structured `matches` block modelled on the
[Kubernetes Gateway API `HTTPRoute`](https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io/v1.HTTPRouteMatch)
match vocabulary. This upgrade is a hard cutover, with no compatibility shim
for the old format. The rationale and format survey are in the
[YAML route matching formats research doc](https://gitea.dideric.is/didericis/bot-bottle/src/branch/main/docs/research/yaml-route-matching-formats.md).
DLP detectors attach to the new `matches`-based routes directly.

The design follows the recommendation in the
[DLP research document (PR #192)](https://gitea.dideric.is/didericis/bot-bottle/pulls/192)
and covers all three remaining implementation phases from that plan:

1. Token pattern detection (Phase 1a)
2. Known-secrets detection (Phase 1b)
3. Naive prompt injection detection (Phase 2)

## Problem

Pipelock was removed because it could not support per-route response
scanning, which ruled out selective DLP policies (e.g., skipping scans of
`.whl` downloads while still scanning API calls). Removing it left the egress
proxy with no DLP capability at all. The egress addon already holds per-route
logic for path allowlisting and credential injection; DLP rules belong in the
same place.

The existing `path_allowlist` field is also limiting: it only supports path
prefixes, with no way to express exact-path, regex, method, or header
constraints. The Gateway API match vocabulary is a well-specified, widely
deployed standard that covers all of these without inventing new syntax.

## Goals / Success Criteria

1. Outbound request bodies and headers are scanned for known token patterns
   (AWS, GitHub, Anthropic, etc.) before the request reaches the upstream.
   Matches are blocked immediately.
2. Outbound request bodies are scanned for provisioned secrets that the
   agent should not have direct access to. Matches are blocked immediately.
3. Inbound response bodies are scanned for prompt disclosure and jailbreak
   signals. High-confidence matches are blocked; medium-confidence matches
   emit a log warning and are forwarded.
4. DLP scanning is enabled by default on every route. Individual routes can
   selectively disable outbound detectors, inbound detectors, or both via a
   `dlp` block in the manifest.
5. All detector logic lives in `egress_addon_core.py` (pure Python, no
   mitmproxy dependency) and is covered by unit tests on the host.
6. Each route's `matches` block supports path (exact/prefix/regex), HTTP
   method, and header predicates using Gateway API match semantics.
7. The manifest change is a hard cutover: `path_allowlist` is removed with
   no fallback, no deprecation alias, and no dedicated error for old-format
   manifests. Old manifests that use `path_allowlist` fail validation at
   load time with the generic unknown-key error (same as any other
   unrecognised key today).

## Non-goals

- LLM-based semantic prompt injection detection (explicitly deferred to a
  potential Phase 2b per the research doc).
- Entropy-based secret detection (excluded from scope; too many false
  positives on binary API responses and compressed payloads).
- BIP-39 seed-phrase detection.
- Generic DLP (credit cards, SSNs, PII). Scope is narrow: AI/credential
  exfil relevant to agent containment.
- Changes to the cred-proxy sidecar.
- Streaming response scanning (scan buffered response body only).
- Glob-style path matching. Regex covers every case glob would handle
  without adding a third path-matching language.

## Design

### Route matching: Gateway API `matches` vocabulary

The existing `path_allowlist` field is replaced by a `matches` list. The
vocabulary mirrors Kubernetes Gateway API `HTTPRouteMatch` (see the
[route matching research doc](https://gitea.dideric.is/didericis/bot-bottle/src/branch/main/docs/research/yaml-route-matching-formats.md)
for a full format survey and rationale). Gateway API was chosen because it
is spec-backed, implementation-tested across multiple proxies, and its
`{type, value}` pattern is consistent and schema-validatable.

**AND/OR semantics** (same as Gateway API):
- Predicates *within* a single `matches` entry are ANDed.
- Multiple entries in the `matches` list are ORed: the route matches if
  any entry matches.

```yaml
egress:
  routes:
    # Bare route: all traffic to this host is forwarded (no path/method/header
    # constraints). Equivalent to the old path_allowlist-omitted case.
    - host: api.anthropic.com
      auth:
        scheme: Bearer
        token_ref: EGRESS_TOKEN_0

    # Two match entries (OR): GET/HEAD under /packages/ OR POST on /upload
    - host: files.pythonhosted.org
      matches:
        - paths:
            - type: prefix
              value: /packages/
          methods: [GET, HEAD]
        - paths:
            - type: exact
              value: /upload
          methods: [POST]
      dlp:
        inbound_detectors: false  # skip response scanning (binary downloads)

    # Header + regex path: only JSON API requests on versioned endpoints
    - host: internal-api.corp
      matches:
        - paths:
            - type: regex
              value: "^/v[0-9]+/"
          headers:
            - name: Content-Type
              type: exact
              value: application/json
      dlp:
        outbound_detectors: false
        inbound_detectors: false
```

#### Path matching types

| `type` | Semantics |
|--------|-----------|
| `exact` | Full path must equal `value` exactly |
| `prefix` | Path must start with `value` at a segment boundary (value `/api/v1` matches `/api/v1` and `/api/v1/foo`, rejects `/api/v10`) |
| `regex` | RE2 regex; rejected at load time if the pattern fails to compile. Use for wildcard needs: `/api/[^/]+/data` instead of a glob |

`type` defaults to `prefix` when omitted (preserves the semantics of the
old `path_allowlist`).

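The load-time rules in the table above can be sketched as a small parser. This is an illustration only, not the code in `manifest_egress.py`: the function name `parse_path_match` is hypothetical, and the standard-library `re` module stands in for RE2 (its syntax is a superset in some places, so a real RE2 check could reject patterns this one accepts).

```python
import re
from dataclasses import dataclass

_PATH_TYPES = ("exact", "prefix", "regex")


@dataclass(frozen=True)
class PathMatch:
    type: str
    value: str


def parse_path_match(raw: dict) -> PathMatch:
    """Validate one `paths` entry at manifest load time (hypothetical helper)."""
    unknown = set(raw) - {"type", "value"}
    if unknown:
        raise ValueError(f"unknown path match keys: {sorted(unknown)}")

    # `type` falls back to "prefix", matching the old path_allowlist behaviour.
    match_type = raw.get("type", "prefix")
    if match_type not in _PATH_TYPES:
        raise ValueError(f"unknown path match type: {match_type!r}")

    value = raw.get("value")
    if not isinstance(value, str) or not value:
        raise ValueError("path match value must be a non-empty string")

    if match_type == "regex":
        # Reject patterns that do not compile instead of failing per request.
        try:
            re.compile(value)
        except re.error as exc:
            raise ValueError(f"invalid path regex {value!r}: {exc}") from exc

    return PathMatch(type=match_type, value=value)
```

Compiling the pattern during parsing means a bad manifest fails once at startup rather than on the first matching request.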
#### Method matching

`methods` is a list of HTTP method names, case-insensitive at parse time:
`get`, `GET`, and `Get` are all accepted and stored as uppercase internally.
An absent or empty `methods` list means all methods are permitted.

#### Header matching

`headers` is a list of `{name, value, type}` objects. ALL listed headers
must match (AND semantics). To OR on header values, use multiple `matches`
entries.

| `type` | Semantics |
|--------|-----------|
| `exact` | Header value equals `value` (default when `type` omitted) |
| `regex` | Header value matches RE2 regex |

### Manifest schema: `dlp` block

Each `egress.routes` entry gains an optional `dlp` key alongside `matches`
and `auth`:

```yaml
egress:
  routes:
    - host: api.anthropic.com
      # dlp omitted → all detectors on (default)

    - host: files.pythonhosted.org
      dlp:
        inbound_detectors: false  # skip response scanning (binary downloads)

    - host: internal-docs.corp
      dlp:
        outbound_detectors: false
        inbound_detectors: false  # trusted internal, no scanning
```

`outbound_detectors` controls scanning of the *request* body + headers
leaving the agent. `inbound_detectors` controls scanning of the *response*
body arriving from the upstream.

Valid values per field:
- Omitted (or `null`): the default, all detectors active.
- `false`: scanning disabled for this direction on this route.
- A list of detector names: only the listed detectors run.

Named outbound detectors: `token_patterns`, `known_secrets`.
Named inbound detectors: `naive_injection_detection`.

The manifest parser (`manifest_egress.py`) validates the `dlp` block and
rejects unknown detector names.

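One way to read the three valid values is sketched below. The helper names (`parse_dlp`, `active_detectors`) are hypothetical, and representing `false` as an empty tuple is an assumption chosen to fit the `tuple[str, ...] | None` fields described later in this PRD; the real parser may differ.

```python
from __future__ import annotations

OUTBOUND_DETECTORS = ("token_patterns", "known_secrets")
INBOUND_DETECTORS = ("naive_injection_detection",)


def _parse_detectors(raw, known: tuple[str, ...], field: str) -> tuple[str, ...] | None:
    if raw is None:
        return None  # omitted or null: all detectors active
    if raw is False:
        return ()    # false: scanning disabled for this direction (assumed encoding)
    if isinstance(raw, list):
        for name in raw:
            if name not in known:
                raise ValueError(f"{field}: unknown detector {name!r}")
        return tuple(raw)
    raise ValueError(f"{field}: expected false or a list of detector names")


def parse_dlp(block: dict | None) -> tuple[tuple[str, ...] | None, tuple[str, ...] | None]:
    """Return (outbound, inbound) detector settings for one route."""
    block = block or {}
    unknown = set(block) - {"outbound_detectors", "inbound_detectors"}
    if unknown:
        raise ValueError(f"dlp: unknown keys {sorted(unknown)}")
    return (
        _parse_detectors(block.get("outbound_detectors"), OUTBOUND_DETECTORS,
                         "outbound_detectors"),
        _parse_detectors(block.get("inbound_detectors"), INBOUND_DETECTORS,
                         "inbound_detectors"),
    )


def active_detectors(configured: tuple[str, ...] | None,
                     known: tuple[str, ...]) -> tuple[str, ...]:
    """Resolve the stored setting to the detector names that actually run."""
    return known if configured is None else configured
```

Keeping `None` distinct from the empty tuple preserves the difference between "not configured, use the default" and "explicitly disabled".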
### `EgressRoute` changes

`EgressRoute` replaces `PathAllowlist` with `Matches` and gains two new
DLP fields. `MatchEntry` captures one AND-predicate block:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathMatch:
    type: str   # "exact" | "prefix" | "regex"
    value: str


@dataclass(frozen=True)
class HeaderMatch:
    name: str
    value: str
    type: str = "exact"  # "exact" | "regex"


@dataclass(frozen=True)
class MatchEntry:
    paths: tuple[PathMatch, ...] = ()      # empty = match any path
    methods: tuple[str, ...] = ()          # empty = match any method (uppercase)
    headers: tuple[HeaderMatch, ...] = ()  # empty = match any headers


@dataclass(frozen=True)
class EgressRoute:
    Host: str
    Matches: tuple[MatchEntry, ...] = ()  # empty = match all requests
    AuthScheme: str = ""
    TokenRef: str = ""
    Role: tuple[str, ...] = ()
    OutboundDetectors: tuple[str, ...] | None = None  # None = all enabled
    InboundDetectors: tuple[str, ...] | None = None   # None = all enabled
```

`manifest_egress.py`'s `from_dict` parses the new `matches` block and `dlp`
block; `path_allowlist` is no longer a recognised key and will be rejected
by the unknown-key check.

### `Route` changes in `egress_addon_core.py`

The addon-side `Route` and its helper types mirror the manifest-side changes.
Route matching is extended to evaluate the `matches` list:

```python
@dataclass(frozen=True)
class Route:
    host: str
    matches: tuple[MatchEntry, ...] = ()
    auth_scheme: str = ""
    token_env: str = ""
    outbound_detectors: tuple[str, ...] | None = None
    inbound_detectors: tuple[str, ...] | None = None
```

`decide()` first calls `match_route` (host lookup, unchanged), then
evaluates the match entries in order; if the route has no `matches` entries,
all requests pass. Path `prefix` type uses segment-boundary checking
(`/api/v1` matches `/api/v1/foo` but not `/api/v10`).

### Detector interface

Each detector is a pure function:

```python
from collections.abc import Mapping


# `env` defaults to None rather than a shared mutable `{}` default.
def scan(body: str | bytes, *, env: Mapping[str, str] | None = None) -> ScanResult | None:
    ...
```

`ScanResult` carries:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanResult:
    severity: str  # "block" or "warn"
    reason: str
```

`scan` returns `None` if the body is clean, `ScanResult` otherwise.

### Detector: `token_patterns`

Regex patterns for well-known credential formats, applied to the outbound
request body and `Authorization` header (before the addon strips it; the
strip happens after DLP scanning so that the scan sees any credential the
agent tried to smuggle):

| Token type | Pattern |
|------------|---------|
| AWS access key | `AKIA[0-9A-Z]{16}` |
| GitHub token (classic) | `ghp_[A-Za-z0-9_]{36}` |
| GitHub fine-grained | `github_pat_[A-Za-z0-9_]{82}` |
| Anthropic API key | `sk-ant-[A-Za-z0-9\-_]{93}` |
| OpenAI API key | `sk-[A-Za-z0-9]{48}` |
| Stripe live key | `sk_live_[A-Za-z0-9]{24}` |
| Generic Bearer JWT | `Bearer\s+[A-Za-z0-9._\-]{50,}` |

Action: `"block"` on any match. No tolerance: a credential in an outbound
request is always a violation.

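A minimal sketch of this detector, using the patterns from the table as written and the `scan` interface described earlier. The pattern labels and the wording of `reason` are illustrative assumptions; the reason deliberately names the pattern rather than echoing the matched text, so the credential is not repeated in the 403 body or logs.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ScanResult:
    severity: str  # "block" or "warn"
    reason: str


# Patterns copied from the table above; the dict keys are illustrative labels.
TOKEN_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "github_classic": re.compile(r"ghp_[A-Za-z0-9_]{36}"),
    "github_fine_grained": re.compile(r"github_pat_[A-Za-z0-9_]{82}"),
    "anthropic_api_key": re.compile(r"sk-ant-[A-Za-z0-9\-_]{93}"),
    "openai_api_key": re.compile(r"sk-[A-Za-z0-9]{48}"),
    "stripe_live_key": re.compile(r"sk_live_[A-Za-z0-9]{24}"),
    "bearer_jwt": re.compile(r"Bearer\s+[A-Za-z0-9._\-]{50,}"),
}


def scan(body: str | bytes, *, env: Mapping[str, str] | None = None) -> ScanResult | None:
    """Return a block result on the first token pattern found, else None."""
    text = body.decode("utf-8", errors="replace") if isinstance(body, bytes) else body
    for label, pattern in TOKEN_PATTERNS.items():
        if pattern.search(text):
            return ScanResult("block", f"token pattern matched: {label}")
    return None
```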
### Detector: `known_secrets`

At request time the egress addon has access to `os.environ`, which includes
all `token_env` values declared by route auth blocks. The detector:

1. Collects all `EGRESS_TOKEN_*` values from the environment (the naming
   contract established by `manifest_egress.py`'s `TokenRef` rendering).
2. For each secret value, derives encoded variants: raw, base64, URL-encoded,
   hex.
3. Scans the outbound request body for any variant.

Action: `"block"` on match.

This detector does **not** accept a custom detector name in the YAML; it
is always named `known_secrets`. The environment is passed in via the `env`
keyword argument to `scan`.

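The three steps above might look roughly like this. It is a sketch under assumptions: `secret_variants` is a hypothetical helper, base64 uses the standard alphabet with padding, URL-encoding escapes every reserved character, and hex is lowercase. Other encodings of the same secret would need extra variants.

```python
from __future__ import annotations

import base64
import urllib.parse
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ScanResult:
    severity: str  # "block" or "warn"
    reason: str


def secret_variants(secret: str) -> set[str]:
    """Raw, base64, URL-encoded, and hex forms of one secret value."""
    raw = secret.encode("utf-8")
    return {
        secret,
        base64.b64encode(raw).decode("ascii"),
        urllib.parse.quote(secret, safe=""),
        raw.hex(),
    }


def scan(body: str | bytes, *, env: Mapping[str, str] | None = None) -> ScanResult | None:
    """Block if any EGRESS_TOKEN_* value (or an encoded variant) is in the body."""
    text = body.decode("utf-8", errors="replace") if isinstance(body, bytes) else body
    for name, value in (env or {}).items():
        if not name.startswith("EGRESS_TOKEN_") or not value:
            continue  # only provisioned egress tokens are considered
        if any(variant in text for variant in secret_variants(value)):
            # Report the variable name only, never the secret itself.
            return ScanResult("block", f"known secret {name} found in request body")
    return None
```

Note that the base64 variant only matches when the secret was encoded on its own; a secret embedded in a larger base64 payload encodes differently depending on byte alignment and would not be caught by this check.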
### Detector: `naive_injection_detection`

Pattern-based inbound response scanner. Uses two tiers:

**Tier 1: BLOCK (credential + disclosure together):**
- Response contains a token-pattern match (reuses the `token_patterns` regex
  set) AND a prompt-disclosure phrase (e.g., `system prompt`,
  `my instructions are`, `hidden rules`).

**Tier 2: WARN (multiple jailbreak signals):**
- Two or more jailbreak phrases detected (e.g., `ignore previous`,
  `forget everything`, `pretend you are`, `act as`).
- OR explicit prompt disclosure (`system prompt:`) without a credential.

**Tier 3: ALLOW:**
- Single jailbreak keyword without additional context.
- Common documentation phrases.

See the DLP research doc for the full phrase lists and pseudocode.

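The tier logic can be illustrated as follows. This is not the pseudocode from the research doc: the phrase lists contain only the examples quoted above, the token regexes are a subset of the `token_patterns` table chosen for brevity, and matching is a plain case-insensitive substring test.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ScanResult:
    severity: str  # "block" or "warn"
    reason: str


# Example phrases from this section only; the full lists live in the research doc.
DISCLOSURE_PHRASES = ("system prompt", "my instructions are", "hidden rules")
EXPLICIT_DISCLOSURE = ("system prompt:",)
JAILBREAK_PHRASES = ("ignore previous", "forget everything", "pretend you are", "act as")

# Subset of the token_patterns table, for illustration.
TOKEN_RE = re.compile(r"AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9_]{36}|sk-ant-[A-Za-z0-9\-_]{93}")


def scan(body: str | bytes, *, env: Mapping[str, str] | None = None) -> ScanResult | None:
    text = body.decode("utf-8", errors="replace") if isinstance(body, bytes) else body
    lowered = text.lower()

    has_token = TOKEN_RE.search(text) is not None
    has_disclosure = any(p in lowered for p in DISCLOSURE_PHRASES)
    jailbreak_hits = sum(1 for p in JAILBREAK_PHRASES if p in lowered)

    # Tier 1: credential and disclosure phrase together.
    if has_token and has_disclosure:
        return ScanResult("block", "credential pattern with prompt-disclosure phrase")
    # Tier 2: several jailbreak phrases, or an explicit disclosure marker.
    if jailbreak_hits >= 2:
        return ScanResult("warn", f"{jailbreak_hits} jailbreak phrases detected")
    if any(p in lowered for p in EXPLICIT_DISCLOSURE):
        return ScanResult("warn", "explicit prompt disclosure")
    # Tier 3: a single keyword or ordinary documentation text is allowed.
    return None
```

A response that merely mentions "act as" once, or discusses the system prompt without the explicit `system prompt:` marker, passes through unflagged in this sketch.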
### Wiring into `egress_addon.py`

The existing `request` hook is extended and a new mitmproxy `response` hook
is added:

```python
import os
import sys

from mitmproxy import http


def request(self, flow: http.HTTPFlow) -> None:
    # ... existing route match + decision logic ...
    # After the route decision, if action == "forward", scan while the
    # agent's original Authorization header is still present:
    result = scan_outbound(route, flow.request, os.environ)
    if result and result.severity == "block":
        flow.response = http.Response.make(403, result.reason.encode(), ...)
        return
    # ... existing auth strip + injection logic runs after the scan ...


def response(self, flow: http.HTTPFlow) -> None:
    route = match_route(self.routes, flow.request.pretty_host)
    if route is None:
        return  # already blocked at request time
    result = scan_inbound(route, flow.response)
    if result and result.severity == "block":
        flow.response = http.Response.make(403, result.reason.encode(), ...)
    elif result and result.severity == "warn":
        sys.stderr.write(f"egress DLP warn: {result.reason}\n")
```

`scan_outbound` and `scan_inbound` are pure functions in
`egress_addon_core.py` that dispatch to the per-route detector list.

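The dispatch step shared by `scan_outbound` and `scan_inbound` could be sketched as below. The helper name `run_detectors`, the registry dict, and the rule that a `"block"` result takes priority over a `"warn"` are all assumptions; this PRD does not say how multiple detector results are combined.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class ScanResult:
    severity: str  # "block" or "warn"
    reason: str


Detector = Callable[..., Optional[ScanResult]]


def run_detectors(
    configured: tuple[str, ...] | None,
    registry: Mapping[str, Detector],
    body: str | bytes,
    env: Mapping[str, str],
) -> ScanResult | None:
    """Run the route's detectors; return the first block, else the first warn."""
    # None means "all detectors for this direction"; () means disabled.
    names = tuple(registry) if configured is None else configured
    first_warn: ScanResult | None = None
    for name in names:
        result = registry[name](body, env=env)
        if result is None:
            continue
        if result.severity == "block":
            return result  # stop early, nothing outranks a block
        if first_warn is None:
            first_warn = result
    return first_warn
```

Because the function only takes plain values and a registry of callables, it can be unit-tested on the host without importing mitmproxy.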
### Ordering: auth strip vs. DLP scan

The DLP outbound scan sees the *agent's original* `Authorization` header
before the addon strips it. This ensures that a token the agent smuggled
in the header is caught. The strip + optional re-injection still happens
afterward, preserving the existing credential-injection security model.

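To make the ordering concrete, here is a toy model of the outbound path using plain dicts instead of mitmproxy objects. Everything in it is hypothetical (`handle_outbound`, the `scan` callback returning a reason string, the `Bearer` re-injection); it only demonstrates the scan, strip, inject sequence described above.

```python
from __future__ import annotations

from typing import Callable, Optional


def handle_outbound(
    headers: dict[str, str],
    body: str,
    scan: Callable[[str], Optional[str]],
    injected_token: Optional[str],
) -> tuple[str, dict[str, str]]:
    """Return ("block", {}) or ("forward", headers_to_send)."""
    # 1. Scan first, while the agent's own Authorization header is still present.
    original_auth = next(
        (v for k, v in headers.items() if k.lower() == "authorization"), ""
    )
    reason = scan(f"{original_auth}\n{body}")
    if reason is not None:
        return "block", {}

    # 2. Strip whatever Authorization header the agent supplied.
    forwarded = {k: v for k, v in headers.items() if k.lower() != "authorization"}

    # 3. Optionally inject the route's provisioned credential.
    if injected_token:
        forwarded["Authorization"] = f"Bearer {injected_token}"
    return "forward", forwarded
```

If steps 1 and 2 were swapped, a credential placed in the agent's `Authorization` header would be silently dropped instead of being reported as a violation.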
## Implementation chunks

1. **New `matches` block + `EgressRoute` / `Route` restructure.**
   Remove `path_allowlist` from `manifest_egress.py` and `egress_addon_core.py`.
   Add `MatchEntry`, `PathMatch`, `HeaderMatch` types. Parse `matches` in
   `EgressRoute.from_dict` and `_parse_one`; unknown-key rejection handles
   old `path_allowlist` manifests. Add `OutboundDetectors` / `InboundDetectors`
   to `EgressRoute` and `Route`; parse `dlp` block. Extend
   `tests/unit/test_manifest_egress.py` and `tests/unit/test_egress_addon_core.py`
   with match and dlp valid/invalid cases.

2. **Token-patterns detector (Phase 1a).**
   New module `bot_bottle/dlp_detectors.py` (host-importable) and
   companion flat copy for the sidecar bundle. Add `TokenPatternsDetector`
   with the regex set above. Wire `scan_outbound` into the `request` hook
   in `egress_addon.py`. Unit tests in `tests/unit/test_dlp_detectors.py`.

3. **Known-secrets detector (Phase 1b).**
   Add `KnownSecretsDetector` to `dlp_detectors.py`. Collect
   `EGRESS_TOKEN_*` from env; derive encoded variants; scan request body.
   Extend unit tests. Wire into `scan_outbound`.

4. **Naive prompt injection detector (Phase 2).**
   Add `NaiveInjectionDetector` to `dlp_detectors.py`. Wire
   `scan_inbound` into the new `response` hook in `egress_addon.py`.
   Extend unit tests. Activate PRD 0053 (`Status: Draft → Active`) in
   this commit.

## Open questions

1. **Response body buffering:** mitmproxy's `response` hook already has
   the full body for non-streaming responses. For streaming (chunked)
   responses the body may be empty or incomplete at hook time. Scope for
   now: log a warning and skip scanning on streaming responses; revisit
   if needed.
2. **Encoding breadth for `known_secrets`:** Start with raw + base64 +
   URL-encoded + hex. Add gzip / base32 if real-world evasion attempts
   appear.
3. **`EGRESS_TOKEN_*` naming contract:** The detector relies on the
   env-var naming convention from `manifest_egress.py`. If that contract
   changes, the detector must be updated in lock-step.