Commit Graph

6 Commits

Author SHA1 Message Date
didericis-claude 451e6fc2fc feat(dlp): add 7 token patterns, Unicode normalization, CRLF injection detection (PRD 0053)
Token patterns: HuggingFace (hf_), Databricks (dapi), Slack (xox[baprs]-),
npm (npm_), SendGrid (SG.x.y), PyPI (pypi-), HashiCorp Vault (hvs.).

Unicode normalization (_normalize_text) applies NFKD + strips combining
marks and control chars before pattern matching, defeating fullwidth-char
and combining-mark evasion.
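
A minimal sketch of the normalization step described above, assuming "control chars" means Unicode category Cc; the function name normalize_text is illustrative and the actual _normalize_text may differ:

```python
import unicodedata


def normalize_text(text: str) -> str:
    """NFKD-decompose, then drop combining marks and control characters.

    Assumption: "control chars" is read as Unicode category Cc, which
    also removes \\r, \\n and \\t. The real helper may be more selective.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        ch
        for ch in decomposed
        if not unicodedata.combining(ch)          # strip combining marks
        and unicodedata.category(ch) != "Cc"      # strip control characters
    )
```

With this reading, fullwidth letters fold to their ASCII forms under NFKD, so a prefix such as hf_ written in fullwidth characters becomes matchable by an ASCII pattern.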

CRLF injection (scan_crlf_injection) detects %0d%0a in URLs and literal
\r\n header-injection patterns; runs unconditionally in scan_outbound
regardless of outbound_detectors config.
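
A rough sketch of the two CRLF checks, under stated assumptions: the function name, its signature, and the header-shaped regex are hypothetical and are not taken from scan_crlf_injection itself:

```python
import re
from typing import Iterable

# Percent-encoded CRLF in a URL, matched case-insensitively (%0d%0a, %0D%0A).
_ENCODED_CRLF = re.compile(r"%0d%0a", re.IGNORECASE)

# Assumption: a literal CRLF followed by something shaped like "Name: value"
# is treated as an injected header line.
_HEADER_INJECTION = re.compile(r"\r\n[A-Za-z0-9-]+:[ \t]*\S")


def has_crlf_injection(url: str, header_values: Iterable[str] = ()) -> bool:
    """Return True if the URL or any header value looks like CRLF injection."""
    if _ENCODED_CRLF.search(url) or _HEADER_INJECTION.search(url):
        return True
    return any(_HEADER_INJECTION.search(value) for value in header_values)
```

Because the commit runs this check unconditionally, a caller in this sketch would invoke has_crlf_injection before consulting any per-route detector list.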
2026-06-07 23:19:11 -04:00
didericis-claude 1ecef55fea feat(dlp): websocket scanning, response headers, extended encoding variants, sk-proj pattern (PRD 0053)
2026-06-07 23:19:11 -04:00
didericis 545ff3582f fix(lint): resolve pylint and pyright issues on egress-log-option
- egress.py: extract _render_match_entry helper to reduce nesting depth
- egress_addon_core.py: make request_method/request_headers keyword-only
  to satisfy too-many-positional-arguments; wrap long lazy import lines
- egress_addon.py: remove unused Route import; add pylint disable for
  import-error on sidecar-only mitmproxy/egress_addon_core imports
- dlp_detectors.py: remove dead _min_distance function (superseded by
  _closest_pair)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 20:10:32 -04:00
didericis 86b0a4d285 feat(egress): add location, context snippets, and token redaction to DLP logging
Each DLP block/warn now reports where the match was found (body,
authorization header, response body) and includes a context snippet:
SNIPPET_CONTEXT chars before and after the match, with the matched
value replaced by REDACT ("********").
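
The snippet construction can be sketched as follows. The value 20 for SNIPPET_CONTEXT is an assumption (the commit does not state it), and context_snippet is a hypothetical helper name:

```python
SNIPPET_CONTEXT = 20   # assumed value; the commit does not give the number
REDACT = "********"    # as stated in the commit message


def context_snippet(text: str, start: int, end: int) -> str:
    """Return the text around text[start:end] with the match itself redacted."""
    before = text[max(0, start - SNIPPET_CONTEXT):start]
    after = text[end:end + SNIPPET_CONTEXT]
    return before + REDACT + after
```

Clamping the left edge with max(0, ...) keeps a match near the start of the text from producing a negative slice index, which would otherwise wrap around to the end of the string.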

scan_token_patterns/scan_known_secrets/scan_naive_injection all gain
`location` and `context` fields on their ScanResult returns. The
outbound scanner takes `auth_header` as a separate kwarg so the two
locations are scanned and reported independently.

redact_tokens() is added to dlp_detectors and used in egress_addon.py
to scrub token patterns and provisioned secrets from host/path fields
before they appear in any log output (level 1 and 2).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 14:41:27 -04:00
didericis-claude abcb336e7c fix(dlp): rework naive injection to proximity-based disclosure+jailbreak
Token detection is already handled by the token_patterns detector
running separately — calling it again from scan_naive_injection was
redundant. New logic:

- Warn on any disclosure phrase
- Warn on any jailbreak phrase
- Block when both appear within 500 chars of each other
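
The three rules above can be sketched as below. The phrase patterns are placeholders, and measuring distance between match start offsets is an assumption; the real detector (which uses _closest_pair) may define both differently:

```python
import re
from typing import Optional

PROXIMITY = 500  # "within 500 chars of each other"

# Placeholder phrase lists; the actual disclosure/jailbreak phrases are not
# given in the commit message.
DISCLOSURE = re.compile(r"reveal your (?:system prompt|secrets)", re.IGNORECASE)
JAILBREAK = re.compile(r"ignore (?:all )?previous instructions", re.IGNORECASE)


def classify(text: str) -> Optional[str]:
    """Return "block", "warn", or None for the given text."""
    disclosure = [m.start() for m in DISCLOSURE.finditer(text)]
    jailbreak = [m.start() for m in JAILBREAK.finditer(text)]
    if disclosure and jailbreak:
        # Distance between the closest disclosure/jailbreak pair.
        closest = min(abs(d - j) for d in disclosure for j in jailbreak)
        if closest <= PROXIMITY:
            return "block"
    if disclosure or jailbreak:
        return "warn"
    return None
```

The pairwise minimum is quadratic in the number of matches, which is acceptable for a sketch; a sorted two-pointer sweep would do the same work in linear time after sorting.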

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 20:34:21 +00:00
didericis-claude 726713d081 feat(egress): implement PRD 0053 — DLP addon with Gateway API matches
Replace path_allowlist with Gateway API HTTPRoute match vocabulary
(paths, methods, headers with AND/OR semantics) and add DLP scanning
to the egress proxy:

- Token pattern detection (AWS, GitHub, Anthropic, OpenAI, Stripe, JWT)
- Known secret detection (EGRESS_TOKEN_* with base64/URL/hex variants)
- Naive prompt injection detection (disclosure + credential, jailbreak)
- Per-route DLP configuration via manifest dlp block
- Inbound response scanning with block/warn severity
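
As an illustration of the known-secret item in the list above, here is a sketch of matching EGRESS_TOKEN_* values in raw, base64, URL-encoded, and hex form. The helper names and the exact set of variants are assumptions, not the detector's actual code:

```python
import base64
from typing import Mapping, Optional, Set
from urllib.parse import quote


def secret_variants(secret: str) -> Set[str]:
    """Raw secret plus its base64, percent-encoded, and hex encodings."""
    raw = secret.encode("utf-8")
    return {
        secret,
        base64.b64encode(raw).decode("ascii"),
        quote(secret, safe=""),
        raw.hex(),
    }


def find_known_secret(text: str, env: Mapping[str, str]) -> Optional[str]:
    """Return the name of the first EGRESS_TOKEN_* variable found in text."""
    for name, value in env.items():
        if not name.startswith("EGRESS_TOKEN_") or not value:
            continue  # skip unrelated or empty variables
        if any(variant in text for variant in secret_variants(value)):
            return name
    return None
```

In use, env would typically be os.environ; the function returns the variable name rather than the value so that the secret itself never reaches a log line.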

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 19:53:23 +00:00