docs: PRD 0053 — egress DLP addon (token, secret, injection detection)
Adds the product requirements document for replacing pipelock's DLP capability with a per-route mitmproxy addon. Covers three implementation chunks: token-pattern detection, known-secret detection, and naive prompt injection scanning. References the research in PR #192 and issue #195.
This commit is contained in:
@@ -0,0 +1,291 @@
|
||||
# PRD 0053: Egress DLP addon
|
||||
|
||||
- **Status:** Draft
|
||||
- **Author:** claude
|
||||
- **Created:** 2026-06-05
|
||||
- **Issue:** #195
|
||||
|
||||
## Summary
|
||||
|
||||
With pipelock removed (PR #193), the egress proxy no longer performs DLP
|
||||
scanning on traffic to or from the agent. This PRD implements a replacement
|
||||
directly inside the mitmproxy egress addon: per-route DLP detectors that
|
||||
scan outbound requests for credential leakage and inbound responses for
|
||||
prompt injection attempts. Configuration is expressed as a new `dlp` block
|
||||
on each `egress.routes` entry in the bottle manifest.
|
||||
|
||||
The design follows the recommendation in the [DLP research document
|
||||
(PR #192)](https://gitea.dideric.is/didericis/bot-bottle/pulls/192) and
|
||||
covers all three remaining implementation phases from that plan:
|
||||
|
||||
1. Token pattern detection (Phase 1a)
|
||||
2. Known-secrets detection (Phase 1b)
|
||||
3. Naive prompt injection detection (Phase 2)
|
||||
|
||||
## Problem
|
||||
|
||||
Pipelock was removed because it could not support per-route response
|
||||
scanning, blocking selective DLP policies (e.g., skip scanning `.whl`
|
||||
downloads while keeping scanning on API calls). Removing it left the egress
|
||||
proxy with no DLP capability at all. The egress addon already holds per-route
|
||||
logic for path allowlisting and credential injection; DLP rules belong in the
|
||||
same place.
|
||||
|
||||
## Goals / Success Criteria
|
||||
|
||||
1. Outbound request bodies and headers are scanned for known token patterns
|
||||
(AWS, GitHub, Anthropic, etc.) before the request reaches the upstream.
|
||||
Matches are blocked immediately.
|
||||
2. Outbound request bodies are scanned for provisioned secrets that the
|
||||
agent should not have direct access to. Matches are blocked immediately.
|
||||
3. Inbound response bodies are scanned for prompt disclosure and jailbreak
|
||||
signals. High-confidence matches are blocked; medium-confidence matches
|
||||
emit a log warning and are forwarded.
|
||||
4. DLP scanning is enabled by default on every route. Individual routes can
|
||||
selectively disable outbound detectors, inbound detectors, or both via a
|
||||
`dlp` block in the manifest.
|
||||
5. All detector logic lives in `egress_addon_core.py` (pure Python, no
|
||||
mitmproxy dependency) and is covered by unit tests on the host.
|
||||
6. Adding `dlp` configuration to a route that omits it entirely is
|
||||
backward-compatible — the route behaves as if all detectors are enabled.
|
||||
|
||||
## Non-goals
|
||||
|
||||
- LLM-based semantic prompt injection detection (explicitly deferred to a
|
||||
potential Phase 2b per the research doc).
|
||||
- Entropy-based secret detection (excluded from scope; too many false
|
||||
positives on binary API responses and compressed payloads).
|
||||
- BIP-39 seed-phrase detection.
|
||||
- Generic DLP (credit cards, SSNs, PII) — scope is narrow: AI/credential
|
||||
exfil relevant to agent containment.
|
||||
- Changes to the cred-proxy sidecar.
|
||||
- Streaming response scanning (scan buffered response body only).
|
||||
|
||||
## Design
|
||||
|
||||
### Manifest schema — `dlp` block
|
||||
|
||||
Each `egress.routes` entry gains an optional `dlp` key:
|
||||
|
||||
```yaml
|
||||
egress:
|
||||
routes:
|
||||
- host: api.anthropic.com
|
||||
# dlp omitted → all detectors on (default)
|
||||
|
||||
- host: files.pythonhosted.org
|
||||
dlp:
|
||||
inbound_detectors: false # skip response scanning (binary downloads)
|
||||
|
||||
- host: internal-docs.corp
|
||||
dlp:
|
||||
outbound_detectors: false
|
||||
inbound_detectors: false # trusted internal, no scanning
|
||||
```
|
||||
|
||||
`outbound_detectors` controls scanning of the *request* body + headers
|
||||
leaving the agent. `inbound_detectors` controls scanning of the *response*
|
||||
body arriving from the upstream.
|
||||
|
||||
Valid values per field:
|
||||
- Omitted (or `null`) — default: all detectors active.
|
||||
- `false` — scanning disabled for this direction on this route.
|
||||
- A list of detector names — only the listed detectors run.
|
||||
|
||||
Named outbound detectors: `token_patterns`, `known_secrets`.
|
||||
Named inbound detectors: `naive_injection_detection`.
|
||||
|
||||
The manifest parser (`manifest_egress.py`) validates the `dlp` block and
|
||||
rejects unknown detector names.
|
||||
|
||||
### `EgressRoute` changes
|
||||
|
||||
`EgressRoute` gains two new fields:
|
||||
|
||||
```python
|
||||
@dataclass(frozen=True)
|
||||
class EgressRoute:
|
||||
Host: str
|
||||
PathAllowlist: tuple[str, ...] = ()
|
||||
AuthScheme: str = ""
|
||||
TokenRef: str = ""
|
||||
Role: tuple[str, ...] = ()
|
||||
OutboundDetectors: tuple[str, ...] | None = None # None = all enabled
|
||||
InboundDetectors: tuple[str, ...] | None = None # None = all enabled
|
||||
```
|
||||
|
||||
`None` means "use defaults" (all active); an empty `tuple[str, ...]` means
|
||||
"disabled". Named detectors use `tuple[str, ...]` with the detector name.
|
||||
|
||||
`manifest_egress.py` uses `from_dict` to parse the new `dlp` block and
|
||||
populate these fields; unknown keys inside `dlp` are rejected.
|
||||
|
||||
### `Route` changes in `egress_addon_core.py`
|
||||
|
||||
The addon-side `Route` dataclass mirrors the manifest-side change:
|
||||
|
||||
```python
|
||||
@dataclass(frozen=True)
|
||||
class Route:
|
||||
host: str
|
||||
path_allowlist: tuple[str, ...] = ()
|
||||
auth_scheme: str = ""
|
||||
token_env: str = ""
|
||||
outbound_detectors: tuple[str, ...] | None = None
|
||||
inbound_detectors: tuple[str, ...] | None = None
|
||||
```
|
||||
|
||||
`parse_routes` / `_parse_one` grow the corresponding parsing logic.
|
||||
|
||||
### Detector interface
|
||||
|
||||
Each detector is a pure function:
|
||||
|
||||
```python
|
||||
def scan(body: str | bytes, *, env: Mapping[str, str] = {}) -> ScanResult | None:
|
||||
...
|
||||
```
|
||||
|
||||
`ScanResult` carries:
|
||||
|
||||
```python
|
||||
@dataclass(frozen=True)
|
||||
class ScanResult:
|
||||
severity: str # "block" or "warn"
|
||||
reason: str
|
||||
```
|
||||
|
||||
`scan` returns `None` if the body is clean, `ScanResult` otherwise.
|
||||
|
||||
### Detector: `token_patterns`
|
||||
|
||||
Regex patterns for well-known credential formats, applied to the outbound
|
||||
request body and `Authorization` header (before the addon strips it — the
|
||||
strip happens after DLP scanning so that the scan sees any credential the
|
||||
agent tried to smuggle):
|
||||
|
||||
| Token type | Pattern |
|
||||
|------------|---------|
|
||||
| AWS access key | `AKIA[0-9A-Z]{16}` |
|
||||
| GitHub token (classic) | `ghp_[A-Za-z0-9_]{36}` |
|
||||
| GitHub fine-grained | `github_pat_[A-Za-z0-9_]{82}` |
|
||||
| Anthropic API key | `sk-ant-[A-Za-z0-9\-_]{93}` |
|
||||
| OpenAI API key | `sk-[A-Za-z0-9]{48}` |
|
||||
| Stripe live key | `sk_live_[A-Za-z0-9]{24}` |
|
||||
| Generic Bearer JWT | `Bearer\s+[A-Za-z0-9._\-]{50,}` |
|
||||
|
||||
Action: `"block"` on any match. No tolerance — a credential in an outbound
|
||||
request is always a violation.
|
||||
|
||||
### Detector: `known_secrets`
|
||||
|
||||
At request time the egress addon has access to `os.environ`, which includes
|
||||
all `token_env` values declared by route auth blocks. The detector:
|
||||
|
||||
1. Collects all `EGRESS_TOKEN_*` values from the environment (the naming
|
||||
contract established by `manifest_egress.py`'s `TokenRef` rendering).
|
||||
2. For each secret value, derives encoded variants: raw, base64, URL-encoded,
|
||||
hex.
|
||||
3. Scans the outbound request body for any variant.
|
||||
|
||||
Action: `"block"` on match.
|
||||
|
||||
This detector does **not** accept a custom detector name in the YAML — it
|
||||
is always named `known_secrets`. The environment is passed in via the `env`
|
||||
keyword argument to `scan`.
|
||||
|
||||
### Detector: `naive_injection_detection`
|
||||
|
||||
Pattern-based inbound response scanner. Uses two tiers:
|
||||
|
||||
**Tier 1 — BLOCK (credential + disclosure together):**
|
||||
- Response contains a token-pattern match (reuses `token_patterns` regex
|
||||
set) AND a prompt-disclosure phrase (e.g., `system prompt`, `my instructions
|
||||
are`, `hidden rules`).
|
||||
|
||||
**Tier 2 — WARN (multiple jailbreak signals):**
|
||||
- Two or more jailbreak phrases detected (e.g., `ignore previous`,
|
||||
`forget everything`, `pretend you are`, `act as`).
|
||||
- OR explicit prompt disclosure (`system prompt:`) without a credential.
|
||||
|
||||
**Tier 3 — ALLOW:**
|
||||
- Single jailbreak keyword without additional context.
|
||||
- Common documentation phrases.
|
||||
|
||||
See the research doc for the full phrase lists and pseudocode.
|
||||
|
||||
### Wiring into `egress_addon.py`
|
||||
|
||||
Two new mitmproxy hooks are added alongside the existing `request` hook:
|
||||
|
||||
```python
|
||||
def request(self, flow: http.HTTPFlow) -> None:
|
||||
# ... existing path-allowlist + auth-injection logic ...
|
||||
# After route decision, if action == "forward":
|
||||
result = scan_outbound(route, flow.request, os.environ)
|
||||
if result and result.severity == "block":
|
||||
flow.response = http.Response.make(403, result.reason.encode(), ...)
|
||||
return
|
||||
|
||||
def response(self, flow: http.HTTPFlow) -> None:
|
||||
route = match_route(self.routes, flow.request.pretty_host)
|
||||
if route is None:
|
||||
return # already blocked at request time
|
||||
result = scan_inbound(route, flow.response)
|
||||
if result and result.severity == "block":
|
||||
flow.response = http.Response.make(403, result.reason.encode(), ...)
|
||||
elif result and result.severity == "warn":
|
||||
sys.stderr.write(f"egress DLP warn: {result.reason}\n")
|
||||
```
|
||||
|
||||
`scan_outbound` and `scan_inbound` are pure functions in
|
||||
`egress_addon_core.py` that dispatch to the per-route detector list.
|
||||
|
||||
### Ordering: auth strip vs. DLP scan
|
||||
|
||||
The DLP outbound scan sees the *agent's original* `Authorization` header
|
||||
before the addon strips it. This ensures that a token the agent smuggled
|
||||
in the header is caught. The strip + optional re-injection still happens
|
||||
afterward, preserving the existing credential-injection security model.
|
||||
|
||||
## Implementation chunks
|
||||
|
||||
1. **Manifest `dlp` block + `EgressRoute` fields.**
|
||||
`manifest_egress.py`: parse `dlp`, add `OutboundDetectors` /
|
||||
`InboundDetectors` to `EgressRoute`. Extend
|
||||
`tests/unit/test_manifest_egress.py` with `dlp` valid/invalid cases.
|
||||
`egress_addon_core.py`: add `outbound_detectors` / `inbound_detectors`
|
||||
to `Route`; update `_parse_one` and `parse_routes`; extend
|
||||
`tests/unit/test_egress_addon_core.py`.
|
||||
|
||||
2. **Token-patterns detector (Phase 1a).**
|
||||
New module `bot_bottle/dlp_detectors.py` (host-importable) and
|
||||
companion flat copy for the sidecar bundle. Add `TokenPatternsDetector`
|
||||
with the regex set above. Wire `scan_outbound` into the `request` hook
|
||||
in `egress_addon.py`. Unit tests in
|
||||
`tests/unit/test_dlp_detectors.py`.
|
||||
|
||||
3. **Known-secrets detector (Phase 1b).**
|
||||
Add `KnownSecretsDetector` to `dlp_detectors.py`. Collect
|
||||
`EGRESS_TOKEN_*` from env; derive encoded variants; scan request body.
|
||||
Extend unit tests. Wire into `scan_outbound`.
|
||||
|
||||
4. **Naive prompt injection detector (Phase 2).**
|
||||
Add `NaiveInjectionDetector` to `dlp_detectors.py`. Wire
|
||||
`scan_inbound` into the new `response` hook in `egress_addon.py`.
|
||||
Extend unit tests. Activate PRD 0053 (`Status: Draft → Active`) in
|
||||
this commit.
|
||||
|
||||
## Open questions
|
||||
|
||||
1. **Response body buffering:** mitmproxy's `response` hook already has
|
||||
the full body for non-streaming responses. For streaming (chunked)
|
||||
responses the body may be empty or incomplete at hook time. Scope for
|
||||
now: log a warning and skip scanning on streaming responses; revisit
|
||||
if needed.
|
||||
2. **Encoding breadth for `known_secrets`:** Start with raw + base64 +
|
||||
URL-encoded + hex. Add GZIP / base32 if real-world evasion attempts
|
||||
appear.
|
||||
3. **`EGRESS_TOKEN_*` naming contract:** The detector relies on the
|
||||
env-var naming convention from `manifest_egress.py`. If that contract
|
||||
changes, the detector must be updated in lock-step.
|
||||
Reference in New Issue
Block a user