PRD 0053: Extended outbound DLP scan surfaces
- Status: Active
- Author: claude
- Created: 2026-06-06
- Issue: #204
Summary
The outbound DLP scan introduced in PRD 0052 covers only the request body and
the Authorization header. An agent can exfiltrate a provisioned secret or
credential token via several other request surfaces that are currently
unscanned: arbitrary HTTP headers, URL query parameters, URL path segments,
and DNS-level hostnames (DNS tunnelling). This PRD closes those gaps by
extending the scan corpus to cover all four surfaces.
Problem
After PRD 0052, egress_addon.py builds the DLP scan text as:
```python
auth_header = flow.request.headers.get("authorization", "")
scan_text = body
if auth_header:
    scan_text = auth_header + "\n" + body
```
This leaves the following attack surfaces unscanned:
| Surface | Example exfiltration vector |
|---|---|
| Other request headers | X-Api-Key: <secret>, Cookie: token=<secret> |
| Query parameters | GET /search?api_key=<secret> |
| URL path segments | GET /proxy/<base64-secret>/endpoint |
| DNS hostname | <base64-secret>.attacker.com (DNS tunnelling) |
Each surface lets a determined agent bypass the current scan. Through it, the agent can leak a provisioned `EGRESS_TOKEN_*` value, or any string matching a known token pattern, to an attacker-controlled upstream.
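The gap can be shown by re-creating the PRD 0052 scan-text logic as a standalone function. The name `narrow_scan_text`, the plain lowercase-keyed dict in place of mitmproxy's case-insensitive headers, and the sample secret are all hypothetical. The function checks whether a secret carried in a non-auth header reaches the corpus. Query, path and hostname are not even inputs to the original logic.

```python
# Hypothetical stand-in for the PRD 0052 logic in egress_addon.py: only the
# Authorization header and the body reach the DLP scanner. Header keys are
# assumed lowercase because a plain dict is case-sensitive.
def narrow_scan_text(headers: dict[str, str], body: str) -> str:
    auth_header = headers.get("authorization", "")
    scan_text = body
    if auth_header:
        scan_text = auth_header + "\n" + body
    return scan_text


SECRET = "sk-test-0123456789abcdef"  # illustrative value, not a real token

# Each request carries the secret on a surface other than body/Authorization.
leaks = {
    "header": ({"x-api-key": SECRET}, ""),
    "cookie": ({"cookie": f"token={SECRET}"}, ""),
}

for surface, (headers, body) in leaks.items():
    visible = SECRET in narrow_scan_text(headers, body)
    print(f"{surface}: scanned={visible}")
```

Both lines print `scanned=False`. With the same secret in the `authorization` entry, it would be scanned.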
Goals / Success Criteria
- All four surfaces (headers, query params, path, hostname) are included in the outbound DLP scan text for every route that has outbound scanning enabled.
- A pure helper `build_outbound_scan_text(host, path, query, headers, body)` in `egress_addon_core.py` assembles the scan corpus, so the logic is fully unit-testable without a mitmproxy dependency.
- Unit tests demonstrate that `scan_outbound` blocks a request when a known token pattern or provisioned secret appears in each surface independently.
- No manifest schema changes: the `dlp` block's `outbound_detectors` field continues to control which detectors run, and all surfaces are scanned by whichever detectors are active.
- The auth-strip ordering invariant from PRD 0052 is preserved: the outbound scan sees the original `Authorization` header before the addon strips it.
Non-goals
- Scanning inbound response URLs or headers (inbound scan covers response body only; response URL is the same as the outbound request URL and is already scanned there).
- Structured query-param parsing (treating `?k=v` as key/value pairs for per-param matching); scanning the raw query string is sufficient.
- Changes to the `dlp` block schema or detector names.
- Scanning outbound request bodies for prompt injection (inbound only, per PRD 0052 design).
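The reasoning behind skipping structured query parsing can be checked directly. A secret placed in a query parameter appears in the raw query string either verbatim or percent-encoded, and a URL-encoded detector variant matches the percent-encoded form. The sketch assumes that the URL-encoded variant equals `urllib.parse.quote(secret, safe="")`. The actual output of `_encoded_variants` may differ.

```python
from urllib.parse import quote, urlencode

# Illustrative secret containing characters that get percent-encoded in a URL.
secret = "sk-abc+def/ghi="

# What an agent's HTTP client would typically put on the wire.
raw_query = urlencode({"q": "cats", "api_key": secret})

# Assumed URL-encoded variant; the project's _encoded_variants may differ.
url_variant = quote(secret, safe="")

# The verbatim secret is absent but its URL-encoded form is present, so a plain
# substring scan of the raw query string finds it without key/value parsing.
print(secret in raw_query)       # False
print(url_variant in raw_query)  # True
```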
Design
build_outbound_scan_text in egress_addon_core.py
A new pure function assembles all request surfaces into a single newline-
delimited string suitable for passing to scan_outbound:
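```python
import typing


def build_outbound_scan_text(
    host: str,
    path: str,
    query: str,
    headers: typing.Mapping[str, str],
    body: str,
) -> str:
    """Join every outbound request surface into one newline-delimited corpus."""
    # Hostname and path are always included (DNS-tunnelling and path vectors).
    parts: list[str] = [host, path]
    # Raw, unparsed query string; omitted when empty.
    if query:
        parts.append(query)
    # Every header, Authorization included, as a "name: value" line.
    for name, value in headers.items():
        parts.append(f"{name}: {value}")
    # Request body; omitted when empty.
    if body:
        parts.append(body)
    return "\n".join(parts)
```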
Why hostname in the scan corpus?
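DNS tunnelling encodes data into subdomain labels (`<base64-secret>.attacker.com`). The mitmproxy `request` hook sees the `pretty_host` field before the request is forwarded upstream, so scanning it lets the addon block this vector.

The proxy may already have resolved the hostname by then, depending on when mitmproxy opens the upstream connection (its `connection_strategy` option). This check therefore blocks the HTTP request but does not by itself guarantee that no DNS query leaves the proxy.

Both the `token_patterns` and `known_secrets` detectors handle encoded variants (raw, base64, URL-encoded, hex). The existing encoding-variant logic in `_encoded_variants` therefore already covers common DNS-tunnelling encodings.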
egress_addon.py update
The narrow scan-text construction is replaced with a call to `build_outbound_scan_text`. The addon already splits `flow.request.path` into `request_path` and `query` at the top of `request()`, so both values can be passed straight through:
```python
# Build full scan corpus: hostname + path + query + all headers + body
body = flow.request.get_text(strict=False) or ""
scan_text = build_outbound_scan_text(
    flow.request.pretty_host,
    request_path,
    query,
    dict(flow.request.headers),
    body,
)
dlp_result = scan_outbound(route, scan_text, os.environ)
```
The `Authorization` header is still present in `flow.request.headers` at this point, because the strip happens later in `egress_addon.py`, on line 115. The auth-strip ordering invariant is therefore preserved without extra work.
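The ordering argument can be illustrated without mitmproxy by modelling the relevant part of `request()` as pure functions over a plain header dict. `header_corpus`, `scan_then_strip`, `strip_then_scan` and the lowercase dict keys are hypothetical simplifications. A substring test stands in for `scan_outbound`.

```python
SECRET = "sk-test-0123456789abcdef"  # illustrative value, not a real token


def header_corpus(headers: dict[str, str]) -> str:
    # Header portion of the scan corpus, as "name: value" lines.
    return "\n".join(f"{name}: {value}" for name, value in headers.items())


def scan_then_strip(headers: dict[str, str]) -> bool:
    """Ordering this PRD preserves: scan first, then strip Authorization."""
    blocked = SECRET in header_corpus(headers)
    headers.pop("authorization", None)  # keys assumed lowercase in this sketch
    return blocked


def strip_then_scan(headers: dict[str, str]) -> bool:
    """Reversed ordering, shown only for contrast."""
    headers.pop("authorization", None)
    return SECRET in header_corpus(headers)


print(scan_then_strip({"authorization": f"Bearer {SECRET}"}))  # True: detected
print(strip_then_scan({"authorization": f"Bearer {SECRET}"}))  # False: not detected
```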
Test additions
tests/unit/test_egress_addon_core.py gains:
- `TestBuildOutboundScanText`: verifies that hostname, path, query, headers, and body each appear in the assembled text, and that an empty query and body are omitted.
- `TestScanOutbound`: verifies that `scan_outbound` blocks when a known token pattern appears in each surface independently (hostname, path, query, non-auth header, body), and returns `None` for a clean request.
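Below is a minimal sketch of what `TestBuildOutboundScanText` might look like. It uses the standard-library `unittest` module so it runs standalone, and carries a copy of the helper from the Design section instead of importing it from `egress_addon_core`. The marker value, the `BASE` defaults and the per-surface `subTest` layout are illustrative assumptions, not the actual test file.

```python
import typing
import unittest


# Copy of the helper from the Design section so the sketch runs standalone;
# the real test would import it from egress_addon_core.
def build_outbound_scan_text(
    host: str,
    path: str,
    query: str,
    headers: typing.Mapping[str, str],
    body: str,
) -> str:
    parts: list[str] = [host, path]
    if query:
        parts.append(query)
    for name, value in headers.items():
        parts.append(f"{name}: {value}")
    if body:
        parts.append(body)
    return "\n".join(parts)


MARKER = "SECRET-MARKER-42"  # illustrative stand-in for a token or secret

# Clean request; each case below overrides exactly one surface.
BASE = {"host": "api.example.com", "path": "/v1/items", "query": "", "headers": {}, "body": ""}


class TestBuildOutboundScanText(unittest.TestCase):
    def test_each_surface_reaches_corpus(self) -> None:
        cases = {
            "host": {"host": f"{MARKER.lower()}.attacker.com"},
            "path": {"path": f"/proxy/{MARKER}/endpoint"},
            "query": {"query": f"api_key={MARKER}"},
            "header": {"headers": {"X-Api-Key": MARKER}},
            "body": {"body": f'{{"k": "{MARKER}"}}'},
        }
        for surface, override in cases.items():
            with self.subTest(surface=surface):
                text = build_outbound_scan_text(**{**BASE, **override})
                # Case-insensitive because the hostname case uses a lowercased marker.
                self.assertIn(MARKER.lower(), text.lower())

    def test_empty_query_and_body_omitted(self) -> None:
        text = build_outbound_scan_text(**BASE)
        self.assertEqual(text, "api.example.com\n/v1/items")
```

Run it with `python -m unittest`. The same per-surface table could drive `TestScanOutbound` by passing each assembled text to `scan_outbound`.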
Implementation
Single commit:
- Add `build_outbound_scan_text` to `egress_addon_core.py` and its `__all__`.
- Update `egress_addon.py` to import and call it.
- Add `TestBuildOutboundScanText` and `TestScanOutbound` to `tests/unit/test_egress_addon_core.py`.
- Flip this PRD's Status: Draft → Active.