fix(pipelock): scan all request headers + fix attack-3 destination

Two related changes the PRD 0022 sandbox-escape test surfaced:

1. `pipelock_build_config` now emits `request_body_scanning.scan_headers: true` and `header_mode: all`. Pipelock's default `header_mode: sensitive` only checks Authorization / Cookie / X-Api-Key / X-Token / Proxy-Authorization / X-Goog-Api-Key, so an agent attempting exfil could trivially pick a non-sensitive header (`X-Custom: $SECRET`) and slip through. `all` closes the gap; pipelock caps it by the same max_body_bytes the body scan uses.

2. Test 3 (HTTP exfil shapes) now targets raw.githubusercontent.com instead of api.anthropic.com. api.anthropic.com is in `DEFAULT_TLS_PASSTHROUGH`: pipelock can't MITM it because real LLM conversation bodies false-positive on DLP scanners (BIP-39 etc.). The trade-off is documented in `pipelock.DEFAULT_TLS_PASSTHROUGH`; the test now exercises a host where the sandbox is actually supposed to block.

All 5 sandbox-escape attacks now produce HTTP 403 with the expected sandbox marker (`egress:`, `pipelock`, or `blocked:`):

- Attack 1 (non-allowlisted host)            ✓ egress
- Attack 2 (non-allowlisted IP + spoof)      ✓ egress
- Attack 3a (URL path)                       ✓ pipelock DLP
- Attack 3b (URL query)                      ✓ pipelock DLP
- Attack 3c (request body)                   ✓ pipelock DLP
- Attack 3d (request header)                 ✓ pipelock DLP (scan_headers)
- Attack 4a (crafted subdomain)              ✓ egress
- Attack 4b (direct dig @8.8.8.8)            ✓ network isolation
- Attack 5 (README push, 3 secret shapes)    ✓ gitleaks (pre-upstream)

489 unit tests pass (1 updated for the new request_body_scanning shape). Full integration suite passes in ~6s.
2026-05-26 22:38:38 -04:00
parent e2231f46a3
commit 23f50f7720
3 changed files with 44 additions and 8 deletions
@@ -179,7 +179,19 @@ def pipelock_build_config(
    # built-in default for request_body_scanning is "warn" (forward
    # with a log line); claude-bottle hard-codes "block" so a hit
    # actually stops the request from leaving the egress network.
-    cfg["request_body_scanning"] = {"action": "block"}
+    #
+    # `scan_headers: true` + `header_mode: all` extends the scan to
+    # every request header — pipelock's default `header_mode:
+    # sensitive` only checks Authorization / Cookie / X-Api-Key /
+    # X-Token / Proxy-Authorization / X-Goog-Api-Key, which an
+    # agent attempting to exfil could trivially avoid by picking
+    # a non-sensitive header name. "all" closes the gap; pipelock
+    # caps it at the same max_body_bytes the body scan uses.
+    cfg["request_body_scanning"] = {
+        "action": "block",
+        "scan_headers": True,
+        "header_mode": "all",
+    }
    if ca_cert_path or ca_key_path:
        if not (ca_cert_path and ca_key_path):
            raise ValueError(
@@ -230,6 +242,10 @@ def pipelock_render_yaml(cfg: dict[str, object]) -> str:
    lines.append("request_body_scanning:")
    rbs = cast(dict[str, object], cfg["request_body_scanning"])
    lines.append(f'  action: "{rbs["action"]}"')
+    if "scan_headers" in rbs:
+        lines.append(f"  scan_headers: {_bool(rbs['scan_headers'])}")
+    if "header_mode" in rbs:
+        lines.append(f'  header_mode: "{rbs["header_mode"]}"')
    if "tls_interception" in cfg:
        lines.append("")
        lines.append("tls_interception:")