Commit Graph

10 Commits

Author SHA1 Message Date
didericis-codex fcfe2f0849 fix(tests): remove unused dlp entropy import
lint / lint (push) Successful in 2m14s
test / unit (pull_request) Successful in 47s
test / integration (pull_request) Successful in 28s
2026-06-25 02:48:49 +00:00
didericis-claude 330e836085 feat(dlp): fragmentation resistance, entropy detector, broadened known-value scan
- _alnum_projection(): strip non-alphanumeric chars for separator-injection detection
- scan_known_secrets() gains two extra passes per secret after exact-variant matching:
  alnum-projection exact match (catches hyphens/spaces between secret chars) and a
  sliding-window partial-match scan (catches chunked substrings ≥ PARTIAL_MATCH_MIN_LEN)
- scan_known_secrets() accepts sensitive_prefixes param (default ("EGRESS_TOKEN_",))
  so redact_tokens and call-sites can extend the scanned env-var prefix set
- scan_entropy() warn-only detector flagging windows with Shannon entropy ≥ 5.5 bits/char
- "entropy" added to OUTBOUND_DETECTOR_NAMES; scan_outbound opts it in only when
  explicitly listed in dlp.outbound_detectors (never part of the default "all" set)
- scan_outbound reads BOT_BOTTLE_SENSITIVE_PREFIXES from environ to extend
  scan_known_secrets beyond EGRESS_TOKEN_* without schema changes
- Binary bodies decoded via latin-1 fallback (bijective byte↔codepoint) instead
  of utf-8 errors=replace, preserving ASCII secret strings in binary payloads
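
The fragmentation and entropy passes described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function names, the window size, and the value given to PARTIAL_MATCH_MIN_LEN are assumptions; only the 5.5 bits/char threshold comes from the message.

```python
import math
from collections import Counter

PARTIAL_MATCH_MIN_LEN = 8   # assumed value; the commit names the constant only
ENTROPY_THRESHOLD = 5.5     # bits/char, as stated in the commit message
ENTROPY_WINDOW = 64         # assumed window size


def alnum_projection(text: str) -> str:
    """Drop every non-alphanumeric character so injected separators vanish."""
    return "".join(ch for ch in text if ch.isalnum())


def contains_fragmented(secret: str, text: str) -> bool:
    """True if the secret appears in text once separators are stripped from both."""
    return alnum_projection(secret) in alnum_projection(text)


def contains_partial(secret: str, text: str,
                     min_len: int = PARTIAL_MATCH_MIN_LEN) -> bool:
    """True if any min_len-character window of the secret appears in text."""
    if len(secret) < min_len:
        return False
    return any(secret[i:i + min_len] in text
               for i in range(len(secret) - min_len + 1))


def shannon_entropy(window: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not window:
        return 0.0
    total = len(window)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(window).values())


def high_entropy_windows(text: str, size: int = ENTROPY_WINDOW,
                         threshold: float = ENTROPY_THRESHOLD) -> list[int]:
    """Start offsets of fixed-size windows whose entropy meets the threshold."""
    return [i for i in range(max(len(text) - size + 1, 0))
            if shannon_entropy(text[i:i + size]) >= threshold]
```

A window of N characters can reach at most log2(N) bits/char, so a 5.5 threshold needs at least 46 distinct characters in the window. That may be one reason the detector is warn-only and opt-in.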

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-24 22:45:51 -04:00
didericis b411577e76 Stop scanning the request body for CRLF injection
lint / lint (push) Successful in 1m41s
test / unit (pull_request) Successful in 31s
test / integration (pull_request) Successful in 18s
A 403 "egress DLP: URL-encoded CRLF (%0d%0a)" was firing on legitimate
requests (e.g. the Claude Code login flow) and bypassing the on-match
policy entirely, because CRLF blocks carry no matched value and were
routed straight to a hard 403.

Root cause: CRLF injection is only an attack in the request line and
headers. An HTTP body is delimited by Content-Length, so CRLF bytes in
the body cannot split the request — but the scan flattened the body into
the same blob it checked, so form-encoded / multi-line body content
(which legitimately contains %0d%0a) tripped it.

Fix:
- scan_outbound takes a crlf_text param; the addon scans CRLF only over
  the body-excluded request line + headers. crlf_text=None keeps the
  old full-blob behavior for host-side callers/tests; the websocket path
  passes "" since a data frame is not a request line.
- The redact policy now also scrubs CRLF (new strip_crlf helper) from the
  path and headers, so redact is a complete escape hatch and structural
  CRLF in the URL/headers can be forwarded when a route opts into it.
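
As a rough sketch of the body-exclusion idea, the check below looks for URL-encoded CRLF only in the text it is handed. The names build_crlf_text and crlf_found are hypothetical, and the regexes are assumptions; only the crlf_text semantics (None means full blob, "" means nothing to scan) and the strip_crlf name come from the commit.

```python
import re
from typing import Optional

# URL-encoded CRLF pair, any case.
_ENCODED_CRLF = re.compile(r"%0d%0a", re.IGNORECASE)
# Single encoded or literal CR/LF characters, used when scrubbing.
_ANY_CRLF = re.compile(r"%0[da]|[\r\n]", re.IGNORECASE)


def build_crlf_text(method: str, path: str, headers: dict[str, str]) -> str:
    """Request line plus headers, with the body deliberately left out."""
    parts = [f"{method} {path}"] + [f"{k}: {v}" for k, v in headers.items()]
    return " ".join(parts)


def crlf_found(full_blob: str, crlf_text: Optional[str] = None) -> bool:
    """Scan crlf_text when given; fall back to the full blob when it is None."""
    target = full_blob if crlf_text is None else crlf_text
    return _ENCODED_CRLF.search(target) is not None


def strip_crlf(value: str) -> str:
    """Remove literal and URL-encoded CR/LF from a path or header value."""
    return _ANY_CRLF.sub("", value)
```

With this shape, a form-encoded body containing %0d%0a no longer trips the check once the caller passes the body-excluded text, while the same sequence in the path still does.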

Tests: strip_crlf unit tests; scan_outbound crlf_text body-exclusion and
backward-compat tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HnvBjPZC5V7qeQpFbQdDmS
2026-06-24 20:37:26 -04:00
didericis 7f2352287e PRD 0062: supervisor override for egress token blocks
lint / lint (push) Successful in 1m42s
test / unit (pull_request) Successful in 31s
test / integration (pull_request) Successful in 16s
When the outbound DLP catches a token, route the block through the
existing supervisor approval queue instead of returning 403 outright.
The egress proxy holds the request open until the operator answers, then
remembers an approved value for the life of the proxy so the request --
and later ones carrying it -- flow through. Fails closed on rejection,
timeout, malformed response, or when supervise is disabled.
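
The hold-open-and-fail-closed behavior might look something like the sketch below. The poll callable, its return values, and the function name are assumptions made for illustration; the 300-second default mirrors EGRESS_TOKEN_ALLOW_TIMEOUT_SECONDS.

```python
import asyncio
import time
from typing import Any, Callable


async def wait_for_decision(poll: Callable[[], Any],
                            timeout_seconds: float = 300.0,
                            interval: float = 0.5) -> bool:
    """Return True only on an explicit approval; everything else fails closed.

    poll() is a hypothetical queue reader that returns None while the
    proposal is still pending and the operator's answer once it exists.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        answer = poll()
        if answer == "approved":
            return True
        if answer is not None:
            # Rejection or a malformed response: refuse the request.
            return False
        # Yield to the event loop so other requests keep flowing.
        await asyncio.sleep(interval)
    # Timed out without an answer.
    return False
```

Using asyncio.sleep between polls, rather than a blocking sleep, is what keeps one held request from stalling the rest of the proxy.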

- ScanResult.matched carries the raw matched substring (sidecar-only;
  never logged or written to the proposal). scan_outbound and the token
  detectors take a safe_tokens set and skip approved values, continuing
  past a safelisted match so a second secret in the same request is
  still caught.
- New egress-token-allow proposal tool, written directly to the queue by
  the addon (the gitleaks-allow pattern from PRD 0061).
  build_token_allow_payload renders host/method/path/detector reason +
  redacted context.
- Async request hook polls the queue without stalling the proxy event
  loop; EGRESS_TOKEN_ALLOW_TIMEOUT_SECONDS (default 300) bounds the wait.
- Supervisor TUI renders egress-token-allow like gitleaks-allow: report
  only, modify unavailable, approval requires a recorded reason.
- Unit tests for the matched/safe-tokens plumbing, payload builder, tool
  constant round-trip, and TUI paths; README + PRD 0062.
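
To illustrate the safe_tokens behavior from the first bullet, here is a small sketch of a detector that skips approved values but keeps scanning. The single GitHub-style pattern and the function name are placeholders, not the project's detector set.

```python
import re
from typing import Optional

# Illustrative pattern only; the real detectors cover many token formats.
_TOKEN_RE = re.compile(r"ghp_[A-Za-z0-9]{36}")


def first_unapproved_token(text: str,
                           safe_tokens: frozenset = frozenset()) -> Optional[str]:
    """Return the first matched token that has not been approved, if any."""
    for match in _TOKEN_RE.finditer(text):
        if match.group(0) in safe_tokens:
            # Approved earlier: skip it, but keep looking for other secrets.
            continue
        return match.group(0)
    return None
```

Continuing past a safelisted match, instead of returning early, is what lets a second, unapproved secret in the same request still be caught.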

Closes #261.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HnvBjPZC5V7qeQpFbQdDmS
2026-06-24 16:12:50 -04:00
didericis-claude 451e6fc2fc feat(dlp): add 7 token patterns, Unicode normalization, CRLF injection detection (PRD 0053)
Token patterns: HuggingFace (hf_), Databricks (dapi), Slack (xox[baprs]-),
npm (npm_), SendGrid (SG.x.y), PyPI (pypi-), HashiCorp Vault (hvs.).

Unicode normalization (_normalize_text) applies NFKD + strips combining
marks and control chars before pattern matching, defeating fullwidth-char
and combining-mark evasion.
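
A minimal sketch of this kind of normalization, using the standard unicodedata module. The function name and the choice to drop every character in category Cc (which includes tabs and newlines) are assumptions; the commit does not say exactly which control characters are removed.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """NFKD-decompose, then drop combining marks and control characters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.combining(ch)          # combining marks
        and unicodedata.category(ch) != "Cc"      # control characters
    )
```

NFKD maps compatibility forms such as fullwidth letters to their ASCII equivalents, so a token written in fullwidth characters matches the same patterns as the plain one.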

CRLF injection (scan_crlf_injection) detects %0d%0a in URLs and literal
\r\n header-injection patterns; runs unconditionally in scan_outbound
regardless of outbound_detectors config.
2026-06-07 23:19:11 -04:00
didericis-claude 1ecef55fea feat(dlp): websocket scanning, response headers, extended encoding variants, sk-proj pattern (PRD 0053)
2026-06-07 23:19:11 -04:00
didericis 86b0a4d285 feat(egress): add location, context snippets, and token redaction to DLP logging
Each DLP block/warn now reports where the match was found (body,
authorization header, response body) and includes a context snippet:
SNIPPET_CONTEXT chars before and after the match, with the matched
value replaced by REDACT ("********").
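
A context snippet of this shape could be built as below. The value of SNIPPET_CONTEXT and the function name are assumptions; REDACT is the string given in the message.

```python
SNIPPET_CONTEXT = 20   # assumed value; the commit names the constant only
REDACT = "********"    # as stated in the commit message


def context_snippet(text: str, start: int, end: int,
                    context: int = SNIPPET_CONTEXT) -> str:
    """Surrounding text with the matched span text[start:end] redacted."""
    before = text[max(start - context, 0):start]
    after = text[end:end + context]
    return f"{before}{REDACT}{after}"
```

Because the snippet is assembled from the slices on either side of the match, the matched value itself never enters the returned string.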

scan_token_patterns/scan_known_secrets/scan_naive_injection all gain
`location` and `context` fields on their ScanResult returns. The
outbound scanner takes `auth_header` as a separate kwarg so the two
locations are scanned and reported independently.

redact_tokens() is added to dlp_detectors and used in egress_addon.py
to scrub token patterns and provisioned secrets from host/path fields
before they appear in any log output (level 1 and 2).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 14:41:27 -04:00
didericis ab528d9163 fix(types): replace assertIsNotNone with assert for pyright narrowing
test / unit (push) Successful in 38s
test / integration (push) Successful in 51s
Update Quality Badges / update-badges (push) Successful in 1m11s
lint / lint (push) Successful in 1m30s
assertIsNotNone doesn't narrow Optional types; bare assert does.
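
A small example of the difference, with a hypothetical find_user helper standing in for the real code under test:

```python
import unittest
from typing import Optional


def find_user(name: str) -> Optional[str]:
    """Hypothetical helper that may return None."""
    return name.upper() if name else None


class NarrowingExample(unittest.TestCase):
    def test_assert_is_not_none(self) -> None:
        user = find_user("ada")
        self.assertIsNotNone(user)
        # The type checker still treats `user` as Optional[str] here,
        # so calling user.lower() would be reported as an error.

    def test_bare_assert(self) -> None:
        user = find_user("ada")
        assert user is not None
        # After the bare assert, `user` is narrowed to str.
        self.assertEqual(user.lower(), "ada")
```

Both tests pass at runtime; only the second satisfies the type checker when attributes of the value are used afterwards.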

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 00:59:26 -04:00
didericis-claude abcb336e7c fix(dlp): rework naive injection to proximity-based disclosure+jailbreak
lint / lint (push) Failing after 1m24s
test / unit (pull_request) Successful in 30s
test / integration (pull_request) Successful in 44s
Token detection is already handled by the token_patterns detector
running separately — calling it again from scan_naive_injection was
redundant. New logic:

- Warn on any disclosure phrase
- Warn on any jailbreak phrase
- Block when both appear within 500 chars of each other
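
The three rules above can be sketched as follows. The phrase lists are placeholders, and measuring distance between match start offsets is an assumption; the commit gives only the 500-character window.

```python
import re

PROXIMITY_CHARS = 500  # as stated in the commit message

# Placeholder phrases; the real lists are not shown in the commit.
DISCLOSURE = re.compile(
    r"reveal your system prompt|print your instructions", re.IGNORECASE)
JAILBREAK = re.compile(
    r"ignore (?:all )?previous instructions", re.IGNORECASE)


def classify(text: str) -> str:
    """Return "block", "warn", or "ok" for a piece of text."""
    disclosure = [m.start() for m in DISCLOSURE.finditer(text)]
    jailbreak = [m.start() for m in JAILBREAK.finditer(text)]
    # Block only when a phrase of each kind occurs close to the other.
    if any(abs(d - j) <= PROXIMITY_CHARS
           for d in disclosure for j in jailbreak):
        return "block"
    if disclosure or jailbreak:
        return "warn"
    return "ok"
```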

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 20:34:21 +00:00
didericis-claude 726713d081 feat(egress): implement PRD 0053 — DLP addon with Gateway API matches
lint / lint (push) Failing after 1m43s
test / unit (pull_request) Successful in 40s
test / integration (pull_request) Successful in 50s
Replace path_allowlist with Gateway API HTTPRoute match vocabulary
(paths, methods, headers with AND/OR semantics) and add DLP scanning
to the egress proxy:

- Token pattern detection (AWS, GitHub, Anthropic, OpenAI, Stripe, JWT)
- Known secret detection (EGRESS_TOKEN_* with base64/URL/hex variants)
- Naive prompt injection detection (disclosure + credential, jailbreak)
- Per-route DLP configuration via manifest dlp block
- Inbound response scanning with block/warn severity
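
As an illustration of the known-secret bullet, the sketch below derives base64, URL-encoded, and hex variants of each EGRESS_TOKEN_* value and checks whether any appears in a text. Function names are hypothetical, and the real detector may generate more variants than these.

```python
import base64
from typing import Mapping
from urllib.parse import quote


def secret_variants(secret: str) -> set[str]:
    """The raw secret plus its base64, URL-encoded, and hex encodings."""
    raw = secret.encode("utf-8")
    return {
        secret,
        base64.b64encode(raw).decode("ascii"),
        quote(secret, safe=""),
        raw.hex(),
    }


def provisioned_secrets(environ: Mapping[str, str],
                        prefix: str = "EGRESS_TOKEN_") -> list[str]:
    """Non-empty values of environment variables with the given prefix."""
    return [v for k, v in environ.items() if k.startswith(prefix) and v]


def find_known_secret(text: str, environ: Mapping[str, str]) -> bool:
    """True if any variant of any provisioned secret occurs in text."""
    return any(variant in text
               for secret in provisioned_secrets(environ)
               for variant in secret_variants(secret))
```

A caller would typically pass os.environ as the mapping; taking it as a parameter keeps the function easy to test.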

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 19:53:23 +00:00