PRD: Extended outbound DLP scan surfaces #205

Merged
didericis merged 7 commits from prd-0053-extended-outbound-scan into main 2026-06-07 23:24:04 -04:00

Closes #204.

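PRD 0055: https://gitea.dideric.is/didericis/bot-bottle/src/commit/83a8b6f411e69e5315362dc5a5ec0f8c8c0d1cbb/docs/prds/0055-extended-outbound-scan.md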

Summary

The outbound DLP scan (PRD 0052) covers only the request body and the Authorization header. This PR extends it to four additional surfaces:

  • All request headers — any header can carry a smuggled credential (e.g. X-Api-Key, Cookie).
  • URL query parameters — e.g. ?api_key=<secret>.
  • URL path segments — e.g. /proxy/<base64-encoded-secret>/endpoint.
  • DNS-level hostname — DNS tunnelling where the secret is encoded in a subdomain label.

A new pure helper build_outbound_scan_text(host, path, query, headers, body) in egress_addon_core.py assembles the scan corpus, keeping the logic unit-testable without mitmproxy. egress_addon.py is updated to call it; the auth-strip ordering invariant from PRD 0052 is preserved.
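
As a rough illustration of what such a helper could look like, here is a minimal sketch. Only the name and parameter list come from this PR; the parameter types, the newline-joined layout, and the percent-decoding step are assumptions, and the auth-strip ordering from PRD 0052 is not modeled.

```python
from urllib.parse import unquote


def build_outbound_scan_text(host, path, query, headers, body):
    """Join every outbound surface into one newline-separated string.

    Assumed types: host, path, query are str; headers is a str -> str
    mapping; body is bytes or str. None of this is confirmed by the PR.
    """
    parts = [host or ""]
    # Assumption: path and query are scanned both raw and percent-decoded,
    # so a secret cannot hide behind %xx escapes.
    for value in (path or "", query or ""):
        parts.append(value)
        decoded = unquote(value)
        if decoded != value:
            parts.append(decoded)
    # Every header is included, not just Authorization.
    for name, value in (headers or {}).items():
        parts.append(f"{name}: {value}")
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    parts.append(body or "")
    # Drop empty surfaces so the corpus has no blank lines.
    return "\n".join(p for p in parts if p)
```

Because the function takes plain values rather than a mitmproxy flow object, a unit test can call it directly with strings and a dict.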

didericis-claude changed title from PRD 0053: Extended outbound DLP scan surfaces to PRD 0055: Extended outbound DLP scan surfaces 2026-06-06 16:49:01 -04:00
didericis-claude changed title from PRD 0055: Extended outbound DLP scan surfaces to PRD: Extended outbound DLP scan surfaces 2026-06-06 22:10:42 -04:00
didericis force-pushed prd-0053-extended-outbound-scan from 10236528d2 to bf8eeb8d3d 2026-06-07 22:42:35 -04:00
didericis added 7 commits 2026-06-07 23:19:16 -04:00
Token patterns: HuggingFace (hf_), Databricks (dapi), Slack (xox[baprs]-),
npm (npm_), SendGrid (SG.x.y), PyPI (pypi-), HashiCorp Vault (hvs.).
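
A prefix-based detector table for these token families might be sketched as follows. The prefixes are taken from the commit message above; the character classes and minimum lengths after each prefix, the dictionary name, and the `find_tokens` helper are illustrative guesses, not the patterns actually committed.

```python
import re

# Prefixes come from the commit message; everything after each prefix
# (character class, length) is an assumption for illustration only.
TOKEN_PATTERNS = {
    "huggingface": re.compile(r"hf_[A-Za-z0-9]{20,}"),
    "databricks": re.compile(r"dapi[0-9a-f]{32}"),
    "slack": re.compile(r"xox[baprs]-[A-Za-z0-9-]{10,}"),
    "npm": re.compile(r"npm_[A-Za-z0-9]{36}"),
    "sendgrid": re.compile(r"SG\.[A-Za-z0-9_-]{16,}\.[A-Za-z0-9_-]{16,}"),
    "pypi": re.compile(r"pypi-[A-Za-z0-9_-]{16,}"),
    "vault": re.compile(r"hvs\.[A-Za-z0-9_-]{24,}"),
}


def find_tokens(text):
    """Return the sorted names of all token families found in text."""
    return sorted(
        name for name, pattern in TOKEN_PATTERNS.items() if pattern.search(text)
    )
```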

Unicode normalization (_normalize_text) applies NFKD + strips combining
marks and control chars before pattern matching, defeating fullwidth-char
and combining-mark evasion.
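
The normalization step described above can be approximated with the standard `unicodedata` module. This is a sketch, not the committed `_normalize_text`: the function name, the choice to drop categories `Cc` and `Cf`, and the decision to keep newline, carriage return, and tab are all assumptions.

```python
import unicodedata

# Assumption: whitespace controls are kept so line structure survives.
_KEEP = {"\n", "\r", "\t"}


def normalize_text(text):
    """NFKD-decompose, then drop combining marks and control/format chars."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        ch
        for ch in decomposed
        if ch in _KEEP
        or (
            not unicodedata.combining(ch)
            and unicodedata.category(ch) not in ("Cc", "Cf")
        )
    )
```

With this in place, a fullwidth spelling of a token prefix folds back to its ASCII form before any pattern is applied, so the prefix patterns do not need Unicode-aware variants.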

CRLF injection (scan_crlf_injection) detects %0d%0a in URLs and literal
\r\n header-injection patterns; runs unconditionally in scan_outbound
regardless of outbound_detectors config.
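
For illustration, the two CRLF checks could look roughly like this. Only the function name comes from the commit message; the signature, the return type, the finding strings, and the exact regular expressions are assumptions.

```python
import re

# Percent-encoded CR LF in a URL, in either letter case.
_ENCODED_CRLF = re.compile(r"%0d%0a", re.IGNORECASE)
# A literal CR LF followed by something shaped like a header name.
_HEADER_INJECTION = re.compile(r"\r\n[A-Za-z0-9-]+:")


def scan_crlf_injection(url, header_values):
    """Return a list of finding strings; an empty list means clean."""
    findings = []
    if _ENCODED_CRLF.search(url):
        findings.append("encoded CRLF in URL")
    if any(_HEADER_INJECTION.search(value) for value in header_values):
        findings.append("literal CRLF header injection")
    return findings
```

Since the commit message says this check runs unconditionally, a caller would presumably invoke it before consulting `outbound_detectors`, rather than registering it as one of the configurable detectors.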
ci(prd): rename PRD to prd-new placeholder per new convention
test / unit (pull_request) Successful in 37s
test / integration (pull_request) Successful in 49s
lint / lint (push) Successful in 1m30s
prd-number / assign-numbers (push) Successful in 32s
test / unit (push) Successful in 31s
test / integration (push) Successful in 42s
Update Quality Badges / update-badges (push) Successful in 1m11s
652c8cb5a7
didericis force-pushed prd-0053-extended-outbound-scan from bf8eeb8d3d to 652c8cb5a7 2026-06-07 23:19:16 -04:00
didericis merged commit 652c8cb5a7 into main 2026-06-07 23:24:04 -04:00
didericis deleted branch prd-0053-extended-outbound-scan 2026-06-07 23:24:05 -04:00