Stop scanning the request body for CRLF injection

A 403 "egress DLP: URL-encoded CRLF (%0d%0a)" was firing on legitimate requests (e.g. the Claude Code login flow) and bypassing the on-match policy entirely, because CRLF blocks carry no matched value and were routed straight to a hard 403.

Root cause: CRLF injection is only an attack in the request line and headers. An HTTP body is delimited by Content-Length, so CRLF bytes in the body cannot split the request. The scan, however, flattened the body into the same blob it checked, so form-encoded / multi-line body content (which legitimately contains %0d%0a) tripped it.

Fix:
- scan_outbound takes a crlf_text param; the addon scans CRLF only over the body-excluded request line + headers. crlf_text=None keeps the old full-blob behavior for host-side callers/tests; the websocket path passes "" since a data frame is not a request line.
- The redact policy now also scrubs CRLF (new strip_crlf helper) from the path and headers, so redact is a complete escape hatch and structural CRLF in the URL/headers can be forwarded when a route opts into it.

Tests: strip_crlf unit tests; scan_outbound crlf_text body-exclusion and backward-compat tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HnvBjPZC5V7qeQpFbQdDmS
2026-06-24 20:37:26 -04:00
parent cdfaaa3de8
commit b411577e76
5 changed files with 108 additions and 31 deletions
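The commit message names a `strip_crlf` helper and a body-excluded scan surface but this view does not show either. The following is a minimal sketch of what they might look like, not the code from this commit: the regex `_CRLF_RE`, the body of `strip_crlf`, and the `build_crlf_text` function (its name, signature, and space-joined output) are all assumptions made for illustration.

```python
import re
import typing

# Assumed pattern: raw CR/LF characters plus their URL-encoded forms.
_CRLF_RE = re.compile(r"%0d|%0a|\r|\n", re.IGNORECASE)


def strip_crlf(value: str) -> str:
    """Remove raw and URL-encoded CR/LF from a path or header value.

    Repeats until stable so that removing one match cannot splice a new
    one together (e.g. "%0%0dd" -> "%0d" -> "").
    """
    previous = None
    while previous != value:
        previous = value
        value = _CRLF_RE.sub("", value)
    return value


def build_crlf_text(
    method: str, path: str, headers: typing.Mapping[str, str]
) -> str:
    """Hypothetical helper: request line + headers, with the body left out."""
    parts = [f"{method} {path}"]
    parts.extend(f"{name}: {value}" for name, value in headers.items())
    # Joined with spaces (an assumption) so the separator itself can
    # never look like an injected line break.
    return " ".join(parts)
```

Under these assumptions, a form body such as `a=1%0d%0ab=2` never reaches the CRLF scan, while a path such as `/login%0d%0aSet-Cookie: x` is still scanned, and the redact policy could reduce it to `/loginSet-Cookie: x` before forwarding.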
@@ -722,6 +722,7 @@ def scan_outbound(
    environ: typing.Mapping[str, str],
    *,
    safe_tokens: typing.AbstractSet[str] | None = None,
+    crlf_text: str | None = None,
 ) -> ScanResult | None:
    # Lazy import to avoid circular deps and keep dlp_detectors optional
    # at import time (the sidecar copies it flat alongside this file).
@@ -740,9 +741,14 @@ def scan_outbound(

    text = body if isinstance(body, str) else body.decode("utf-8", errors="replace")

-    # CRLF injection is never legitimate — runs unconditionally, not gated
-    # by outbound_detectors config, and never override-able by safe_tokens.
-    result = scan_crlf_injection(text)
+    # CRLF injection is only an attack in the request line + headers, never the
+    # body: an HTTP body is delimited by Content-Length, so CRLF bytes there
+    # cannot split the request. Scanning the body produces false positives on
+    # legitimate form-encoded / multi-line content. Callers pass the
+    # body-excluded surfaces as `crlf_text`; `None` falls back to the full text
+    # for backward-compatible host-side callers/tests; websocket frames pass "".
+    crlf_target = text if crlf_text is None else crlf_text
+    result = scan_crlf_injection(crlf_target)
    if result is not None:
        return result
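The three cases for `crlf_text` in the hunk above (`None`, an empty string, and a request-line-plus-headers string) can be illustrated in isolation. This is a sketch under stated assumptions: `scan_crlf_injection` here is a simplified stand-in that returns a reason string, not the real detector (which returns a `ScanResult`), and `scan_crlf_surface` is a hypothetical wrapper that only mirrors the `crlf_target` selection.

```python
import re
from typing import Optional

_ENCODED_CRLF = re.compile(r"%0d%0a", re.IGNORECASE)


def scan_crlf_injection(text: str) -> Optional[str]:
    """Stand-in detector: a reason string on a match, otherwise None."""
    if "\r\n" in text or _ENCODED_CRLF.search(text):
        return "CRLF detected"
    return None


def scan_crlf_surface(text: str, crlf_text: Optional[str] = None) -> Optional[str]:
    """Mirror the diff's fallback: None scans the full text, "" scans nothing."""
    crlf_target = text if crlf_text is None else crlf_text
    return scan_crlf_injection(crlf_target)


body = "note=line1%0d%0aline2"

# Legacy callers (crlf_text omitted) still scan the whole blob.
legacy = scan_crlf_surface(body)

# Body-excluded surface: the encoded CRLF in the body is no longer seen.
clean = scan_crlf_surface(body, crlf_text="POST /login Host: example.com")

# Empty string: nothing to scan, as described for websocket data frames.
frame = scan_crlf_surface(body, crlf_text="")

# CRLF in the request line is still caught.
injected = scan_crlf_surface(body, crlf_text="GET /x%0D%0AEvil: 1")
```

The distinction between `None` and `""` matters here: a plain truthiness check (`crlf_text or text`) would send the websocket case back to the full-text scan, which is why the diff tests `is None` explicitly.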