Stop scanning the request body for CRLF injection

A 403 "egress DLP: URL-encoded CRLF (%0d%0a)" was firing on legitimate
requests (e.g. the Claude Code login flow) and bypassing the on-match
policy entirely, because CRLF blocks carry no matched value and were
routed straight to a hard 403.

Root cause: CRLF injection is only an attack in the request line and
headers. An HTTP body is delimited by Content-Length, so CRLF bytes in
the body cannot split the request — but the scan flattened the body into
the same blob it checked, so form-encoded / multi-line body content
(which legitimately contains %0d%0a) tripped it.
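
The false positive can be reproduced with a minimal sketch. Here
`has_encoded_crlf` is a hypothetical stand-in for the real detector, and the
path, host, and form field are made up for illustration; only `urlencode`
from the standard library is real.

```python
from urllib.parse import urlencode


def has_encoded_crlf(text: str) -> bool:
    """Hypothetical stand-in for the real detector: flags URL-encoded CRLF."""
    return "%0d%0a" in text.lower()


# Illustrative request pieces (assumed values, not taken from the real flow).
request_line = "POST /oauth/token HTTP/1.1"
headers = "Host: example.test\nContent-Type: application/x-www-form-urlencoded"

# A multi-line form field is encoded with %0D%0A by the standard library.
body = urlencode({"note": "line one\r\nline two"})

# Old behavior: the body is flattened into the same blob that gets scanned.
full_blob = "\n".join([request_line, headers, body])
# New behavior: only the request line and headers are scanned for CRLF.
head_only = "\n".join([request_line, headers])

body_trips_full_scan = has_encoded_crlf(full_blob)
head_only_is_clean = not has_encoded_crlf(head_only)
```

Under these assumptions the full-blob scan flags a harmless form body, while
the same request scanned over request line + headers only passes.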

Fix:
- scan_outbound takes a crlf_text param; the addon scans CRLF only over
  the body-excluded request line + headers. crlf_text=None keeps the
  old full-blob behavior for host-side callers/tests; the websocket path
  passes "" since a data frame is not a request line.
- The redact policy now also scrubs CRLF (new strip_crlf helper) from the
  path and headers, so redact is a complete escape hatch: a request with
  structural CRLF in the URL/headers can be forwarded, with the CRLF
  scrubbed, when a route opts into it.
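
For illustration only, one way such a scrub could look. The real strip_crlf
helper is not shown in this excerpt, so `strip_crlf_sketch` and
`scrub_request` below are hypothetical names, and the choice to strip both
raw and percent-encoded CR/LF (repeating until nothing changes) is an
assumption, not a description of the actual helper.

```python
import re
from typing import Mapping

# Raw CR/LF plus their single percent-encoded forms, in either hex case.
_CRLF_RE = re.compile(r"%0[da]|[\r\n]", re.IGNORECASE)


def strip_crlf_sketch(value: str) -> str:
    """Hypothetical scrub: remove CR/LF until the value stops changing.

    Repeating the substitution covers inputs such as "%0%0dd", where a
    single pass would leave a fresh "%0d" behind.
    """
    previous = None
    while previous != value:
        previous = value
        value = _CRLF_RE.sub("", value)
    return value


def scrub_request(path: str, headers: Mapping[str, str]) -> tuple[str, dict[str, str]]:
    """Hypothetical redact step: scrub the path and every header value."""
    return strip_crlf_sketch(path), {
        name: strip_crlf_sketch(value) for name, value in headers.items()
    }
```

This sketch does not handle double-encoded sequences such as %250d; whether
the real helper does is not stated here.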

Tests: strip_crlf unit tests; scan_outbound crlf_text body-exclusion and
backward-compat tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HnvBjPZC5V7qeQpFbQdDmS
2026-06-24 20:37:26 -04:00
parent cdfaaa3de8
commit b411577e76
5 changed files with 108 additions and 31 deletions
+9 -3
@@ -722,6 +722,7 @@ def scan_outbound(
     environ: typing.Mapping[str, str],
     *,
     safe_tokens: typing.AbstractSet[str] | None = None,
+    crlf_text: str | None = None,
 ) -> ScanResult | None:
     # Lazy import to avoid circular deps and keep dlp_detectors optional
     # at import time (the sidecar copies it flat alongside this file).
@@ -740,9 +741,14 @@ def scan_outbound(
     text = body if isinstance(body, str) else body.decode("utf-8", errors="replace")
-    # CRLF injection is never legitimate — runs unconditionally, not gated
-    # by outbound_detectors config, and never override-able by safe_tokens.
-    result = scan_crlf_injection(text)
+    # CRLF injection is only an attack in the request line + headers, never the
+    # body: an HTTP body is delimited by Content-Length, so CRLF bytes there
+    # cannot split the request. Scanning the body produces false positives on
+    # legitimate form-encoded / multi-line content. Callers pass the
+    # body-excluded surfaces as `crlf_text`; `None` falls back to the full text
+    # for backward-compatible callers (host-side tests, websocket frames).
+    crlf_target = text if crlf_text is None else crlf_text
+    result = scan_crlf_injection(crlf_target)
     if result is not None:
         return result
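
The target selection in this hunk can be exercised in isolation. The sketch
below is a simplified stand-in, not the real function: `find_encoded_crlf`
and `scan_sketch` are hypothetical names, the detector only looks for
URL-encoded CRLF, and the sample request text is made up.

```python
from __future__ import annotations


def find_encoded_crlf(text: str) -> str | None:
    """Hypothetical stand-in for scan_crlf_injection: a reason string or None."""
    return "URL-encoded CRLF (%0d%0a)" if "%0d%0a" in text.lower() else None


def scan_sketch(text: str, crlf_text: str | None = None) -> str | None:
    # `is None` rather than truthiness: "" must mean "scan nothing for CRLF",
    # not "fall back to the full text".
    crlf_target = text if crlf_text is None else crlf_text
    return find_encoded_crlf(crlf_target)


# Flattened request whose body carries an encoded CRLF (illustrative values).
blob = "POST /submit\nnote=a%0D%0Ab"

legacy = scan_sketch(blob)                           # None: full-blob fallback
addon = scan_sketch(blob, crlf_text="POST /submit")  # body-excluded surfaces
empty = scan_sketch(blob, crlf_text="")              # nothing to scan for CRLF
```

The `is None` check is the important detail: with a truthiness test, an empty
`crlf_text` would silently fall back to scanning the whole blob.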