Stop scanning the request body for CRLF injection

A 403 "egress DLP: URL-encoded CRLF (%0d%0a)" was firing on legitimate
requests (e.g. the Claude Code login flow) and bypassing the on-match
policy entirely, because CRLF blocks carry no matched value and were
routed straight to a hard 403.

Root cause: CRLF injection is only an attack in the request line and
headers. An HTTP body is delimited by Content-Length, so CRLF bytes in
the body cannot split the request, but the scan flattened the body into
the same blob it checked, so form-encoded / multi-line body content
(which legitimately contains %0d%0a) tripped it.

Fix:
- scan_outbound takes a crlf_text param; the addon scans CRLF only over
  the body-excluded request line + headers. crlf_text=None keeps the old
  full-blob behavior for host-side callers/tests; the websocket path
  passes "" since a data frame is not a request line.
- The redact policy now also scrubs CRLF (new strip_crlf helper) from
  the path and headers, so redact is a complete escape hatch and
  structural CRLF in the URL/headers can be forwarded when a route opts
  into it.

Tests: strip_crlf unit tests; scan_outbound crlf_text body-exclusion and
backward-compat tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HnvBjPZC5V7qeQpFbQdDmS
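
Illustration of the body-exclusion idea described above. This is a minimal sketch, not the real scan_outbound: the function name has_encoded_crlf, its signature, and the sample request are hypothetical, and only the encoded-CRLF regex and the crlf_text semantics (None = full blob, "" = nothing to scan) are taken from this commit.

```python
import re

# Same pattern as _CRLF_ENCODED_RE in the diff.
_CRLF_ENCODED_RE = re.compile(r"%0[dD]%0[aA]", re.ASCII)


def has_encoded_crlf(request_line, headers, body, crlf_text=None):
    """Hypothetical stand-in for the CRLF part of scan_outbound."""
    if crlf_text is None:
        # Legacy behavior: request line, headers and body are flattened
        # into one blob and scanned together.
        crlf_text = "\n".join([request_line, *headers, body])
    return _CRLF_ENCODED_RE.search(crlf_text) is not None


request_line = "POST /oauth/token HTTP/1.1"
headers = [
    "Host: example.com",
    "Content-Type: application/x-www-form-urlencoded",
]
# A form-encoded body that legitimately carries an encoded line break.
body = "note=line1%0d%0aline2"

# Old full-blob scan: the body trips the check.
legacy_hit = has_encoded_crlf(request_line, headers, body)

# Body-excluded scan: only the request line and headers are checked.
surface = "\n".join([request_line, *headers])
scoped_hit = has_encoded_crlf(request_line, headers, body, crlf_text=surface)

# Websocket-style call: an empty crlf_text means nothing is scanned.
frame_hit = has_encoded_crlf("", [], body, crlf_text="")
```

Under these assumptions the same request is flagged by the full-blob scan and passes the body-excluded one, while an encoded CRLF in the path would still be caught because the path is part of the scanned surface.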
2026-06-24 20:37:26 -04:00
parent cdfaaa3de8
commit b411577e76
5 changed files with 108 additions and 31 deletions
@@ -283,6 +283,14 @@ _CRLF_ENCODED_RE = re.compile(r"%0[dD]%0[aA]", re.ASCII)
 _CRLF_HEADER_INJECT_RE = re.compile(r"\r\n[A-Za-z][A-Za-z0-9\-]+\s*:", re.ASCII)


+def strip_crlf(text: str) -> str:
+    """Remove URL-encoded and literal CRLF injection sequences from a request
+    surface (PRD 0062 redact policy). Used to scrub the request line / headers
+    so the request can be forwarded instead of hard-blocked."""
+    text = _CRLF_ENCODED_RE.sub("", text)
+    return _CRLF_HEADER_INJECT_RE.sub(lambda m: m.group(0)[2:], text)
+
+
 def scan_crlf_injection(text: str) -> ScanResult | None:
    if _CRLF_ENCODED_RE.search(text):
        return ScanResult(
@@ -306,4 +314,5 @@ __all__ = [
    "scan_known_secrets",
    "scan_naive_injection",
    "scan_token_patterns",
+    "strip_crlf",
 ]
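
Usage sketch for strip_crlf. The two regexes and the function body are copied from the hunk above so the block runs on its own; the sample inputs are made up for illustration and are not taken from the test suite in this commit.

```python
import re

# Copied from the diff context above.
_CRLF_ENCODED_RE = re.compile(r"%0[dD]%0[aA]", re.ASCII)
_CRLF_HEADER_INJECT_RE = re.compile(r"\r\n[A-Za-z][A-Za-z0-9\-]+\s*:", re.ASCII)


def strip_crlf(text: str) -> str:
    # Drop every URL-encoded CRLF pair outright.
    text = _CRLF_ENCODED_RE.sub("", text)
    # For a literal CRLF followed by something shaped like "Name:",
    # remove only the two CRLF characters and keep the rest of the match.
    return _CRLF_HEADER_INJECT_RE.sub(lambda m: m.group(0)[2:], text)


# Encoded CRLF in a path is removed.
scrubbed_path = strip_crlf("/login%0d%0aX-Evil: 1")

# Literal CRLF that would start a new header line is removed.
scrubbed_header = strip_crlf("value\r\nX-Injected: yes")

# A literal CRLF that is not followed by a header-like token is left alone.
untouched = strip_crlf("a\r\nb")
```

Note that the helper only removes the line break itself: the injected header text stays in the value, it just can no longer start a new header line.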