PRD 0053: Egress DLP addon #196
Closes #195.
PRD 0053
Summary
With pipelock removed (PR #193), the egress proxy has no DLP scanning. This PRD defines the replacement: a per-route DLP addon built directly into the mitmproxy egress sidecar, following the architecture recommended in the research doc (PR #192).
Three implementation chunks after the manifest schema:
- `dlp` block: a new `dlp:` key on `egress.routes` entries controls which detectors run outbound/inbound per route; omitting `dlp` keeps all detectors on (default-on, backward-compatible).
- `token_patterns` detector (regex for AWS/GitHub/Anthropic/etc. keys) blocks outbound requests containing known credential formats.
- `known_secrets` detector checks provisioned `EGRESS_TOKEN_*` env vars and encoded variants in outbound request bodies.
- `naive_injection_detection` detector blocks/warns on inbound responses containing credential+disclosure combos or multiple jailbreak phrases.

All detector logic is pure Python in `egress_addon_core.py` / a new `dlp_detectors.py`, unit-testable on the host without a mitmproxy dependency.

Also replaces `path_allowlist` with Gateway API HTTPRoute match vocabulary (`matches` block with paths/methods/headers and AND/OR semantics).

Merge rule(s)
All chunks implemented in a single commit. PRD status flipped Draft → Active.
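For readers skimming the description, here is a rough sketch of how the two outbound detectors named in the summary might look as pure Python. It is an illustration under assumptions, not the code in this PR: the `ScanResult` field layout, the two example regexes, the 8-character minimum, and the choice of encodings (base64, URL-safe base64, hex, percent-encoding) are all hypothetical; only the names `scan_token_patterns`, `ScanResult`, and the `EGRESS_TOKEN_*` prefix come from the PR itself.

```python
from __future__ import annotations

import base64
import re
import urllib.parse
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ScanResult:
    # Hypothetical shape; the real ScanResult fields are not shown in this PR page.
    detector: str
    reason: str


# Illustrative patterns only (assumed, not the PR's actual list).
TOKEN_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def scan_token_patterns(text: str) -> Optional[ScanResult]:
    """Return a result if the text contains a known credential format."""
    for name, pattern in TOKEN_PATTERNS.items():
        if pattern.search(text):
            return ScanResult("token_patterns", f"matched {name}")
    return None


def secret_variants(secret: str) -> set[str]:
    """Encodings of a secret to look for (assumed set of encodings).

    Note: base64 variants only catch the secret encoded on its own,
    not the secret embedded inside a larger encoded payload.
    """
    raw = secret.encode("utf-8")
    return {
        secret,
        base64.b64encode(raw).decode("ascii"),
        base64.urlsafe_b64encode(raw).decode("ascii"),
        raw.hex(),
        urllib.parse.quote(secret, safe=""),
    }


def scan_known_secrets(text: str, env: Mapping[str, str]) -> Optional[ScanResult]:
    """Check a request body for provisioned EGRESS_TOKEN_* values."""
    for name, value in env.items():
        # Skip unrelated variables and very short values, which would
        # match almost any body (the 8-char floor is an assumption).
        if not name.startswith("EGRESS_TOKEN_") or len(value) < 8:
            continue
        if any(variant in text for variant in secret_variants(value)):
            return ScanResult("known_secrets", f"{name} found in body")
    return None
```

Passing the environment in as a mapping (for example `os.environ`) rather than reading it inside the function keeps the detector testable on the host without a mitmproxy dependency, which matches the stated goal of the PR.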
@@ -0,0 +473,4 @@
1. **Backward compatibility:** `path_allowlist` is the current field. If
adopting a `match`/`matches` structure, keep `path_allowlist` as a
deprecated alias? Or treat this as a breaking manifest version bump?

Treat it as a breaking version bump/do not preserve anything about previous behavior or manifests. Also do not bother creating loud exceptions or looking for the old format.
@@ -0,0 +476,4 @@
deprecated alias? Or treat this as a breaking manifest version bump?
2. **Glob segment semantics:** adopt shell convention (`*` = intra-segment,
`**` = cross-segment) or ALB convention (`*` = anything including `/`)?
The shell convention is safer; ALB's is simpler.

Drop the glob actually, not strictly necessary/can just use regex
@@ -0,0 +481,4 @@
header values. ALB allows multiple values in one condition. Which is
less surprising for bot-bottle operators? The ALB approach is more
concise for the common case (e.g., `Content-Type: [application/json,
application/x-www-form-urlencoded]`).

Stick with Gateway API
@@ -0,0 +483,4 @@
concise for the common case (e.g., `Content-Type: [application/json,
application/x-www-form-urlencoded]`).
4. **Case sensitivity on method names:** normalize to uppercase at parse
time (fail on unrecognised values) or case-insensitively?

Case insensitivity
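To make the decisions in this thread concrete (regex instead of glob, Gateway API header semantics, case-insensitive methods), here is a minimal sketch of how the `matches` vocabulary might be evaluated. The class names `MatchEntry`, `PathMatch`, and `HeaderMatch` appear in this PR, but their fields, the `matches`/`route_allows` helpers, the use of `re.fullmatch`, and the behaviour for an empty entry list are assumptions made for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


def normalise_method(raw: str) -> str:
    # Any casing is accepted; comparison happens on the uppercase form.
    return raw.upper()


@dataclass(frozen=True)
class PathMatch:
    type: str  # "Exact", "PathPrefix" or "RegularExpression" (Gateway API names)
    value: str

    def matches(self, path: str) -> bool:
        if self.type == "Exact":
            return path == self.value
        if self.type == "PathPrefix":
            # Prefix matches on path-segment boundaries: /v1 matches /v1/x, not /v10.
            prefix = self.value.rstrip("/")
            return path == prefix or path.startswith(prefix + "/")
        if self.type == "RegularExpression":
            return re.fullmatch(self.value, path) is not None
        raise ValueError(f"unknown path match type: {self.type!r}")


@dataclass(frozen=True)
class HeaderMatch:
    name: str
    value: str
    type: str = "Exact"  # or "RegularExpression"

    def matches(self, headers: Mapping[str, str]) -> bool:
        # Header names are case-insensitive.
        lowered = {k.lower(): v for k, v in headers.items()}
        actual = lowered.get(self.name.lower())
        if actual is None:
            return False
        if self.type == "Exact":
            return actual == self.value
        if self.type == "RegularExpression":
            return re.fullmatch(self.value, actual) is not None
        raise ValueError(f"unknown header match type: {self.type!r}")


@dataclass(frozen=True)
class MatchEntry:
    path: Optional[PathMatch] = None
    method: Optional[str] = None
    headers: Sequence[HeaderMatch] = ()

    def matches(self, method: str, path: str, headers: Mapping[str, str]) -> bool:
        # Every condition present in one entry must hold (AND).
        if self.path is not None and not self.path.matches(path):
            return False
        if self.method is not None and normalise_method(self.method) != normalise_method(method):
            return False
        return all(h.matches(headers) for h in self.headers)


def route_allows(entries: Sequence[MatchEntry], method: str, path: str,
                 headers: Mapping[str, str]) -> bool:
    # Any single entry matching is enough (OR across entries).
    return any(e.matches(method, path, headers) for e in entries)


# OR across header values is written as two entries, not one multi-value header.
entries = [
    MatchEntry(PathMatch("PathPrefix", "/v1"), "post",
               (HeaderMatch("Content-Type", "application/json"),)),
    MatchEntry(PathMatch("PathPrefix", "/v1"), "POST",
               (HeaderMatch("Content-Type", "application/x-www-form-urlencoded"),)),
]
```

With Gateway API semantics the `Content-Type` case from question 3 costs one extra entry, but each entry stays a plain conjunction, which keeps the evaluation rule easy to state.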
Got it: hard cutover, no compatibility shim, no fallback parsing for the old `path_allowlist` shape.
Dropping glob. Will update the research doc and PRD to remove it; the `regex` type covers every case glob would handle.
Sticking with Gateway API: OR across header values means a separate match entry, not multiple values in one header block.
Case-insensitive at parse time: `get`, `GET`, `Get` all accepted and normalised to uppercase internally.

@@ -11,3 +5,1 @@
Raises EgressApplyError on any failure — the dashboard
surfaces the message and keeps the proposal pending so the
operator can retry.
egress-block proposal. Fetches current routes.yaml, validates,

Reading through all of `egress_apply.py` makes me think we should just nix this... if a user needs to change the routes for a bottle (or anything else for the bottle), I think they should just change the bottle config. Couple reasons for that:
`y` to everything the bottle asks for/friction isn't exactly bad
I think a better solution for adding something to a bottle during a workflow is probably more like:
This means there will need to be a strict separation between what in the manifest results in a sidecar change vs what in the manifest results in an agent image change. But I think even agent changes could probably just be applied to the running bottle? Maybe? Will need to think more about that.
@didericis-claude yes, open a new issue.
Opened #198.
@@ -0,0 +111,4 @@
def scan_naive_injection(text: str) -> ScanResult | None:
    disclosure = any(p.search(text) for p in DISCLOSURE_PHRASES)
    token = scan_token_patterns(text) is not None

Anything with a token will already be caught by the token detector. Calling it again in here is silly. Let's do the following instead:
@@ -123,1 +65,3 @@
    routes are merged."""
    from .egress_addon_core import MatchEntry as CoreMatchEntry
    from .egress_addon_core import PathMatch as CorePathMatch
    from .egress_addon_core import HeaderMatch as CoreHeaderMatch

Why not import these at the top of the module?
0c285b1712 to 52820278fd