Adds the product requirements document for replacing pipelock's DLP capability with a per-route mitmproxy addon. Covers three detector implementation phases: token-pattern detection, known-secret detection, and naive prompt injection scanning. References the research in PR #192 and issue #195.
PRD 0053: Egress DLP addon
- Status: Draft
- Author: claude
- Created: 2026-06-05
- Issue: #195
Summary
With pipelock removed (PR #193), the egress proxy no longer performs DLP
scanning on traffic to or from the agent. This PRD implements a replacement
directly inside the mitmproxy egress addon: per-route DLP detectors that
scan outbound requests for credential leakage and inbound responses for
prompt injection attempts. Configuration is expressed as a new dlp block
on each egress.routes entry in the bottle manifest.
The design follows the recommendation in the DLP research document (PR #192) and covers all three remaining implementation phases from that plan:
- Token pattern detection (Phase 1a)
- Known-secrets detection (Phase 1b)
- Naive prompt injection detection (Phase 2)
Problem
Pipelock was removed because it could not support per-route response
scanning, which ruled out selective DLP policies (e.g., skipping scanning of
.whl downloads while still scanning API calls). Removing it left the egress
proxy with no DLP capability at all. The egress addon already holds per-route
logic for path allowlisting and credential injection; DLP rules belong in the
same place.
Goals / Success Criteria
- Outbound request bodies and headers are scanned for known token patterns (AWS, GitHub, Anthropic, etc.) before the request reaches the upstream. Matches are blocked immediately.
- Outbound request bodies are scanned for provisioned secrets that the agent should not have direct access to. Matches are blocked immediately.
- Inbound response bodies are scanned for prompt disclosure and jailbreak signals. High-confidence matches are blocked; medium-confidence matches emit a log warning and are forwarded.
- DLP scanning is enabled by default on every route. Individual routes can selectively disable outbound detectors, inbound detectors, or both via a dlp block in the manifest.
- All detector logic lives in egress_addon_core.py (pure Python, no mitmproxy dependency) and is covered by unit tests on the host.
- The dlp block is optional and backward-compatible: a route that omits it behaves as if all detectors are enabled.
Non-goals
- LLM-based semantic prompt injection detection (explicitly deferred to a potential Phase 2b per the research doc).
- Entropy-based secret detection (excluded from scope; too many false positives on binary API responses and compressed payloads).
- BIP-39 seed-phrase detection.
- Generic DLP (credit cards, SSNs, PII) — scope is narrow: AI/credential exfil relevant to agent containment.
- Changes to the cred-proxy sidecar.
- Streaming response scanning (scan buffered response body only).
Design
Manifest schema — dlp block
Each egress.routes entry gains an optional dlp key:
```yaml
egress:
  routes:
    - host: api.anthropic.com
      # dlp omitted → all detectors on (default)
    - host: files.pythonhosted.org
      dlp:
        inbound_detectors: false   # skip response scanning (binary downloads)
    - host: internal-docs.corp
      dlp:
        outbound_detectors: false
        inbound_detectors: false   # trusted internal, no scanning
```
outbound_detectors controls scanning of the request body + headers
leaving the agent. inbound_detectors controls scanning of the response
body arriving from the upstream.
Valid values per field:
- Omitted (or null): the default; all detectors active.
- false: scanning disabled for this direction on this route.
- A list of detector names: only the listed detectors run.
Named outbound detectors: token_patterns, known_secrets.
Named inbound detectors: naive_injection_detection.
The manifest parser (manifest_egress.py) validates the dlp block and
rejects unknown detector names.
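The following is a minimal sketch of these validation rules. It assumes the dlp value has already been loaded from YAML into plain Python objects: a dict, or None when the key is omitted. parse_dlp_block and the constant names are hypothetical and are not the real manifest_egress.py API. The sketch only shows how the three allowed value forms map onto the None / empty-tuple / named-tuple encoding that EgressRoute stores.

```python
from __future__ import annotations

from typing import Any

# Detector names per direction, as listed above (hypothetical constant names).
OUTBOUND_DETECTORS = frozenset({"token_patterns", "known_secrets"})
INBOUND_DETECTORS = frozenset({"naive_injection_detection"})
ALLOWED = {
    "outbound_detectors": OUTBOUND_DETECTORS,
    "inbound_detectors": INBOUND_DETECTORS,
}


def _parse_direction(key: str, value: Any) -> tuple[str, ...] | None:
    """Map one dlp field to None (defaults), () (disabled) or a tuple of names."""
    if value is None:
        return None  # omitted or explicit null: all detectors active
    if value is False:
        return ()  # false: scanning disabled for this direction
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        unknown = sorted(set(value) - ALLOWED[key])
        if unknown:
            raise ValueError(f"dlp.{key}: unknown detector(s) {unknown}")
        return tuple(value)
    raise ValueError(f"dlp.{key}: expected null, false or a list of detector names")


def parse_dlp_block(dlp: Any) -> tuple[tuple[str, ...] | None, tuple[str, ...] | None]:
    """Return (outbound, inbound) settings for one route's optional dlp block."""
    if dlp is None:
        return None, None  # dlp omitted: every detector on
    if not isinstance(dlp, dict):
        raise ValueError("dlp must be a mapping")
    unknown_keys = sorted(set(dlp) - set(ALLOWED))
    if unknown_keys:
        raise ValueError(f"dlp: unknown key(s) {unknown_keys}")
    return (
        _parse_direction("outbound_detectors", dlp.get("outbound_detectors")),
        _parse_direction("inbound_detectors", dlp.get("inbound_detectors")),
    )
```

An empty list parses to (). That behaves like false because it selects no detectors. Rejecting an empty list instead would be an equally reasonable choice.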
EgressRoute changes
EgressRoute gains two new fields:
```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EgressRoute:
    Host: str
    PathAllowlist: tuple[str, ...] = ()
    AuthScheme: str = ""
    TokenRef: str = ""
    Role: tuple[str, ...] = ()
    OutboundDetectors: tuple[str, ...] | None = None  # None = all enabled
    InboundDetectors: tuple[str, ...] | None = None  # None = all enabled
```
None means "use defaults" (all detectors active). An empty tuple means
"disabled" (the manifest value false). A non-empty tuple lists, by name, the
only detectors that run.
manifest_egress.py uses from_dict to parse the new dlp block and
populate these fields; unknown keys inside dlp are rejected.
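On the consuming side, the three-way encoding has to be resolved into the detectors that actually run. The following is a minimal sketch. resolve_detectors and the ALL_OUTBOUND / ALL_INBOUND tuples are hypothetical names, and running the defaults in the order the PRD lists them is an assumption.

```python
from __future__ import annotations

# Hypothetical defaults, in the order the PRD names the detectors.
ALL_OUTBOUND: tuple[str, ...] = ("token_patterns", "known_secrets")
ALL_INBOUND: tuple[str, ...] = ("naive_injection_detection",)


def resolve_detectors(configured: tuple[str, ...] | None,
                      all_names: tuple[str, ...]) -> tuple[str, ...]:
    """Turn a route's detector field into the detectors that should run."""
    if configured is None:
        return all_names  # None: defaults, every detector for this direction
    return configured  # () disables scanning; a non-empty tuple selects detectors
```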
Route changes in egress_addon_core.py
The addon-side Route dataclass mirrors the manifest-side change:
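```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    host: str
    path_allowlist: tuple[str, ...] = ()
    auth_scheme: str = ""
    token_env: str = ""
    outbound_detectors: tuple[str, ...] | None = None  # None = all enabled
    inbound_detectors: tuple[str, ...] | None = None  # None = all enabled
```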
parse_routes / _parse_one grow the corresponding parsing logic.
Detector interface
Each detector is a pure function:
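```python
from __future__ import annotations

from collections.abc import Mapping


def scan(body: str | bytes, *, env: Mapping[str, str] | None = None) -> ScanResult | None:
    # env defaults to None (no environment) rather than a shared mutable {}.
    ...
```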
ScanResult carries:
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanResult:
    severity: str  # "block" or "warn"
    reason: str
```
scan returns None if the body is clean, ScanResult otherwise.
Detector: token_patterns
Regex patterns for well-known credential formats, applied to the outbound
request body and Authorization header (before the addon strips it — the
strip happens after DLP scanning so that the scan sees any credential the
agent tried to smuggle):
| Token type | Pattern |
|---|---|
| AWS access key | AKIA[0-9A-Z]{16} |
| GitHub token (classic) | ghp_[A-Za-z0-9_]{36} |
| GitHub fine-grained | github_pat_[A-Za-z0-9_]{82} |
| Anthropic API key | sk-ant-[A-Za-z0-9\-_]{93} |
| OpenAI API key | sk-[A-Za-z0-9]{48} |
| Stripe live key | sk_live_[A-Za-z0-9]{24} |
| Generic Bearer JWT | Bearer\s+[A-Za-z0-9._\-]{50,} |
Action: "block" on any match. No tolerance — a credential in an outbound
request is always a violation.
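The following is a minimal sketch of the token_patterns detector built from the table above. It makes several assumptions:

- The body is decoded as UTF-8, with undecodable bytes replaced.
- The caller passes the agent's Authorization header value separately.
- scan_token_patterns and its authorization parameter are hypothetical names.

ScanResult is redefined so the snippet runs on its own. The reason string names the token type but never includes the matched value, so blocked secrets do not end up in logs.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanResult:
    severity: str  # "block" or "warn"
    reason: str


# Regex set copied from the token_patterns table.
TOKEN_PATTERNS: dict[str, re.Pattern[str]] = {
    "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "GitHub token (classic)": re.compile(r"ghp_[A-Za-z0-9_]{36}"),
    "GitHub fine-grained": re.compile(r"github_pat_[A-Za-z0-9_]{82}"),
    "Anthropic API key": re.compile(r"sk-ant-[A-Za-z0-9\-_]{93}"),
    "OpenAI API key": re.compile(r"sk-[A-Za-z0-9]{48}"),
    "Stripe live key": re.compile(r"sk_live_[A-Za-z0-9]{24}"),
    "Generic Bearer JWT": re.compile(r"Bearer\s+[A-Za-z0-9._\-]{50,}"),
}


def _as_text(body: str | bytes) -> str:
    # Decode leniently so binary bodies never raise; bad bytes become U+FFFD.
    return body.decode("utf-8", errors="replace") if isinstance(body, bytes) else body


def scan_token_patterns(body: str | bytes, *, authorization: str = "") -> ScanResult | None:
    """Block if the body or the Authorization header contains a known token format."""
    for text in (_as_text(body), authorization):
        for name, pattern in TOKEN_PATTERNS.items():
            if pattern.search(text):
                # Name the token type only; never echo the secret itself.
                return ScanResult(severity="block", reason=f"token_patterns: {name} detected")
    return None
```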
Detector: known_secrets
At request time the egress addon has access to os.environ, which includes
all token_env values declared by route auth blocks. The detector:
- Collects all EGRESS_TOKEN_* values from the environment (the naming contract established by manifest_egress.py's TokenRef rendering).
- For each secret value, derives encoded variants: raw, base64, URL-encoded, hex.
- Scans the outbound request body for any variant.
Action: "block" on match.
This detector does not accept a custom detector name in the YAML — it
is always named known_secrets. The environment is passed in via the env
keyword argument to scan.
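The following is a sketch of known_secrets under stated assumptions:

- Every environment variable whose name starts with EGRESS_TOKEN_ holds exactly one secret.
- Base64 means standard base64.
- URL-encoding means urllib.parse.quote with no safe characters.
- Hex means lowercase hex.

scan_known_secrets and encoded_variants are hypothetical names. Matching is done on bytes so binary bodies need no decoding.

```python
from __future__ import annotations

import base64
import urllib.parse
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanResult:
    severity: str  # "block" or "warn"
    reason: str


SECRET_PREFIX = "EGRESS_TOKEN_"  # naming contract from manifest_egress.py


def encoded_variants(secret: str) -> set[bytes]:
    """Raw, base64, URL-encoded and lowercase-hex forms of one secret."""
    raw = secret.encode("utf-8")
    return {
        raw,
        base64.b64encode(raw),
        urllib.parse.quote(secret, safe="").encode("ascii"),
        raw.hex().encode("ascii"),
    }


def scan_known_secrets(body: str | bytes, *,
                       env: Mapping[str, str] | None = None) -> ScanResult | None:
    """Block if any provisioned EGRESS_TOKEN_* value appears in the body."""
    data = body.encode("utf-8") if isinstance(body, str) else body
    for name, value in (env or {}).items():
        if not name.startswith(SECRET_PREFIX) or not value:
            continue  # skip unrelated variables and empty values
        for variant in encoded_variants(value):
            if variant in data:
                # Report the variable name, never the secret value.
                return ScanResult(severity="block", reason=f"known_secrets: {name} detected")
    return None
```

Exact-substring matching has known gaps. A base64 variant only matches when the secret was encoded on its own, because base64 output shifts with the byte offset of the input. Uppercase hex is also missed. Both gaps fall under the open question on encoding breadth.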
Detector: naive_injection_detection
Pattern-based inbound response scanner. Uses two tiers:
Tier 1 — BLOCK (credential + disclosure together):
- Response contains a token-pattern match (reuses the token_patterns regex set) AND a prompt-disclosure phrase (e.g., "system prompt", "my instructions are", "hidden rules").
Tier 2 — WARN (multiple jailbreak signals):
- Two or more jailbreak phrases detected (e.g., "ignore previous", "forget everything", "pretend you are", "act as").
- OR explicit prompt disclosure ("system prompt:") without a credential.
Tier 3 — ALLOW:
- Single jailbreak keyword without additional context.
- Common documentation phrases.
See the research doc for the full phrase lists and pseudocode.
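The tier logic can be sketched as follows. The sketch makes three simplifications:

- The phrase lists contain only the examples named above, not the research doc's full lists.
- Matching is a case-insensitive substring check.
- The credential check uses a trimmed subset of the token_patterns regexes.

scan_naive_injection is a hypothetical name.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanResult:
    severity: str  # "block" or "warn"
    reason: str


# Trimmed subset of the token_patterns regexes (assumption for brevity).
CREDENTIAL_RE = re.compile(r"AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9_]{36}|sk-ant-[A-Za-z0-9\-_]{93}")
DISCLOSURE_PHRASES = ("system prompt", "my instructions are", "hidden rules")
JAILBREAK_PHRASES = ("ignore previous", "forget everything", "pretend you are", "act as")


def scan_naive_injection(body: str | bytes) -> ScanResult | None:
    text = body.decode("utf-8", errors="replace") if isinstance(body, bytes) else body
    lowered = text.lower()
    has_credential = CREDENTIAL_RE.search(text) is not None
    has_disclosure = any(p in lowered for p in DISCLOSURE_PHRASES)
    jailbreak_hits = sum(p in lowered for p in JAILBREAK_PHRASES)

    # Tier 1: a credential together with prompt-disclosure language.
    if has_credential and has_disclosure:
        return ScanResult("block", "naive_injection_detection: credential with prompt disclosure")
    # Tier 2: several jailbreak phrases, or "system prompt:" (no credential here).
    if jailbreak_hits >= 2:
        return ScanResult("warn", f"naive_injection_detection: {jailbreak_hits} jailbreak phrases")
    if "system prompt:" in lowered:
        return ScanResult("warn", "naive_injection_detection: explicit prompt disclosure")
    # Tier 3: a single keyword or ordinary documentation text is allowed.
    return None
```

Plain substring matching over-triggers on phrases such as "act as", which also occurs inside "contact as". Word-boundary regexes would be a natural refinement.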
Wiring into egress_addon.py
Two new mitmproxy hooks are added alongside the existing request hook:
```python
# Methods on the egress addon class (os, sys and mitmproxy's http module are
# imported at module level).
def request(self, flow: http.HTTPFlow) -> None:
    # ... existing path-allowlist logic (route decision) ...
    # If action == "forward", scan BEFORE the auth strip so the scan sees
    # the agent's original Authorization header:
    result = scan_outbound(route, flow.request, os.environ)
    if result and result.severity == "block":
        flow.response = http.Response.make(403, result.reason.encode(), ...)
        return
    # ... existing auth strip + optional re-injection ...


def response(self, flow: http.HTTPFlow) -> None:
    route = match_route(self.routes, flow.request.pretty_host)
    if route is None:
        return  # already blocked at request time
    result = scan_inbound(route, flow.response)
    if result and result.severity == "block":
        flow.response = http.Response.make(403, result.reason.encode(), ...)
    elif result and result.severity == "warn":
        sys.stderr.write(f"egress DLP warn: {result.reason}\n")
```
scan_outbound and scan_inbound are pure functions in
egress_addon_core.py that dispatch to the per-route detector list.
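The following is a sketch of that dispatch, reduced to plain data so it runs without mitmproxy. It makes several assumptions:

- A registry maps detector names to callables with a hypothetical (body, env) signature.
- The body bytes are passed directly, where the PRD passes the mitmproxy request or response.
- Route is trimmed to its DLP fields.

A blocking result wins over any warning, so an earlier warn never masks a later block.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanResult:
    severity: str  # "block" or "warn"
    reason: str


@dataclass(frozen=True)
class Route:
    """Trimmed to the fields DLP dispatch needs."""
    host: str
    outbound_detectors: tuple[str, ...] | None = None
    inbound_detectors: tuple[str, ...] | None = None


# Hypothetical registry signature: detector(body, env) -> ScanResult | None.
Detector = Callable[[bytes, Mapping[str, str]], Optional[ScanResult]]


def _run(names: tuple[str, ...], registry: Mapping[str, Detector],
         body: bytes, env: Mapping[str, str]) -> ScanResult | None:
    """Run detectors in order; the first block wins, else the first warning."""
    first_warning: ScanResult | None = None
    for name in names:
        result = registry[name](body, env)
        if result is None:
            continue
        if result.severity == "block":
            return result
        if first_warning is None:
            first_warning = result
    return first_warning


def scan_outbound(route: Route, body: bytes, env: Mapping[str, str],
                  registry: Mapping[str, Detector]) -> ScanResult | None:
    # None runs every registered outbound detector; () runs none.
    names = tuple(registry) if route.outbound_detectors is None else route.outbound_detectors
    return _run(names, registry, body, env)


def scan_inbound(route: Route, body: bytes,
                 registry: Mapping[str, Detector]) -> ScanResult | None:
    names = tuple(registry) if route.inbound_detectors is None else route.inbound_detectors
    return _run(names, registry, body, {})  # inbound detectors get no secrets
```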
Ordering: auth strip vs. DLP scan
The DLP outbound scan sees the agent's original Authorization header
before the addon strips it. This ensures that a token the agent smuggled
in the header is caught. The strip + optional re-injection still happens
afterward, preserving the existing credential-injection security model.
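The ordering can be captured in a pure helper, which also keeps it unit-testable without mitmproxy. This is a sketch. prepare_outbound, the detect callback, and modelling headers as a plain mapping are all assumptions, not the addon's actual structure.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping


def prepare_outbound(
    headers: Mapping[str, str],
    body: bytes,
    detect: Callable[[bytes, str], str | None],
    injected_auth: str = "",
) -> tuple[str | None, dict[str, str]]:
    """Scan first, then strip and optionally re-inject Authorization.

    Returns (block_reason, headers_to_forward); block_reason is None when clean.
    """
    # 1. DLP sees the agent's original Authorization value (case-insensitive name).
    agent_auth = next((v for k, v in headers.items() if k.lower() == "authorization"), "")
    reason = detect(body, agent_auth)
    if reason is not None:
        return reason, dict(headers)
    # 2. Only after scanning, drop whatever the agent sent...
    forwarded = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    # 3. ...and re-inject the route's own credential, if the route has one.
    if injected_auth:
        forwarded["Authorization"] = injected_auth
    return None, forwarded
```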
Implementation chunks
- Manifest dlp block + EgressRoute fields. manifest_egress.py: parse dlp, add OutboundDetectors/InboundDetectors to EgressRoute. Extend tests/unit/test_manifest_egress.py with dlp valid/invalid cases. egress_addon_core.py: add outbound_detectors/inbound_detectors to Route; update _parse_one and parse_routes; extend tests/unit/test_egress_addon_core.py.
- Token-patterns detector (Phase 1a). New module bot_bottle/dlp_detectors.py (host-importable) and a companion flat copy for the sidecar bundle. Add TokenPatternsDetector with the regex set above. Wire scan_outbound into the request hook in egress_addon.py. Unit tests in tests/unit/test_dlp_detectors.py.
- Known-secrets detector (Phase 1b). Add KnownSecretsDetector to dlp_detectors.py. Collect EGRESS_TOKEN_* from env; derive encoded variants; scan request body. Extend unit tests. Wire into scan_outbound.
- Naive prompt injection detector (Phase 2). Add NaiveInjectionDetector to dlp_detectors.py. Wire scan_inbound into the new response hook in egress_addon.py. Extend unit tests. Activate PRD 0053 (Status: Draft → Active) in this commit.
Open questions
- Response body buffering: mitmproxy's response hook already has the full body for buffered (non-streamed) responses. When mitmproxy streams a response instead (for example, when the stream_large_bodies option applies), the body is not fully available at hook time. Scope for now: log a warning and skip scanning on streamed responses; revisit if needed.
- Encoding breadth for known_secrets: start with raw + base64 + URL-encoded + hex. Add gzip / base32 if real-world evasion attempts appear.
- EGRESS_TOKEN_* naming contract: the detector relies on the env-var naming convention from manifest_egress.py. If that contract changes, the detector must be updated in lock-step.