egress: optional max-request-size policy (block | noscan) for DLP body scanning #311

Open
opened 2026-06-26 22:58:31 -04:00 by didericis-claude · 0 comments
Collaborator

Summary

Add an optional, explicit per-manifest egress control for how the DLP scanner handles oversized request bodies. Default behavior stays: scan all bodies regardless of size.

Motivation

DLP body scanning is O(body size). Today there is no escape hatch for pathologically large bodies. We considered silently capping the scan length, but that quietly weakens DLP coverage, which is the wrong default.

Why scanning large bodies by default is fine:

  • Large ingress bodies are almost always either git (covered by git-gate, a different path) or an LLM model-provider response (trusted on ingress). So ingress size is not really the concern.
  • Large egress bodies are themselves a red flag — that's exactly the traffic we most want scanned, not skipped.

So the default must be "scan everything." But there's still a legitimate need for an explicit operator-set cutoff (perf ceilings, or a hard size policy), and making it explicit beats a hidden magic number.

Proposed design

Add a max-request-size control to the egress section of the manifest (global and/or per-route — TBD during design):

  • max_request_bytes: <int> — threshold above which the policy applies.
  • on_oversize: block | noscan — what to do when a body exceeds the threshold:
    • block — hard 403 (fail closed; safest).
    • noscan — forward without DLP scanning (explicit, opt-in risk acceptance).

Default: unset → scan all bodies, no cutoff (current behavior).
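The decision rule described above can be sketched as follows. This is only an illustration under stated assumptions: the names `OversizePolicy`, `Action`, and `decide` are hypothetical and not part of the existing codebase, and the comparison is taken to be strictly "greater than" because the text says the policy applies above the threshold.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OnOversize(Enum):
    BLOCK = "block"
    NOSCAN = "noscan"


class Action(Enum):
    SCAN = "scan"                            # run DLP scanning as today
    BLOCK = "block"                          # reject with a hard 403
    FORWARD_UNSCANNED = "forward_unscanned"  # forward, skipping DLP


@dataclass(frozen=True)
class OversizePolicy:
    # Hypothetical container for the two proposed fields.
    # Leaving them unset means "no cutoff": every body is scanned.
    max_request_bytes: Optional[int] = None
    on_oversize: Optional[OnOversize] = None


def decide(policy: OversizePolicy, body_len: int) -> Action:
    """Pick what to do with a request body of body_len bytes."""
    if policy.max_request_bytes is None or policy.on_oversize is None:
        return Action.SCAN  # default: scan everything
    if body_len <= policy.max_request_bytes:
        return Action.SCAN  # at or below the threshold: normal scanning
    if policy.on_oversize is OnOversize.BLOCK:
        return Action.BLOCK  # fail closed
    return Action.FORWARD_UNSCANNED  # explicit opt-in risk acceptance
```

Keeping the unset case as the first branch makes the "scan everything" default hard to lose by accident when new modes are added.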

Open questions

  • Global vs per-route (or both, per-route overriding global)?
  • Does the threshold apply to the decoded/decompressed body size or to the raw (on-the-wire) body size?
  • Should noscan still apply structural checks (CRLF) and the cheap token-pattern regex, skipping only the expensive known-secret projection passes?
  • Does this interact with dlp.outbound_on_match?
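To make the decoded-versus-raw question concrete, here is a small illustration of how far the two sizes can diverge for a compressed body. The 64 KiB threshold and the helper name `exceeds` are assumptions chosen only for the example; nothing here reflects how the scanner currently measures bodies.

```python
import gzip

# Hypothetical threshold, chosen only for illustration.
MAX_REQUEST_BYTES = 64 * 1024


def exceeds(size: int, max_request_bytes: int = MAX_REQUEST_BYTES) -> bool:
    """True when a body of `size` bytes is above the threshold."""
    return size > max_request_bytes


# A highly compressible 1 MB payload, as it would look after decoding.
decoded = b"A" * 1_000_000
# The same payload as it would travel with Content-Encoding: gzip.
raw = gzip.compress(decoded)

# Measured on the wire, the body is under the threshold; measured
# after decompression, it is well over it.
raw_oversize = exceeds(len(raw))
decoded_oversize = exceeds(len(decoded))
```

If the threshold counted only raw bytes, a compressed body like this one would never trigger `on_oversize`, even though the scanner would still have to process the full decoded size.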

Acceptance criteria

  • New manifest field(s) parsed and validated (unknown values rejected at load, matching the rest of the egress schema).
  • Default unchanged: bodies of any size are scanned.
  • block returns a hard 403 for oversize bodies; noscan forwards them with scanning skipped per the resolved policy.
  • Unit coverage for parsing, the block path, and the noscan path.
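A minimal sketch of what load-time validation for the new fields might look like, assuming the egress section arrives as an already-parsed mapping. The names `parse_oversize` and `ManifestError` are hypothetical, and the rule that both fields must be set together is an assumption, not something this issue has decided.

```python
from typing import Any, Mapping, Optional, Tuple

ALLOWED_ON_OVERSIZE = ("block", "noscan")


class ManifestError(ValueError):
    """Hypothetical error type for invalid manifest content."""


def parse_oversize(egress: Mapping[str, Any]) -> Optional[Tuple[int, str]]:
    """Return (max_request_bytes, on_oversize), or None when unset."""
    limit = egress.get("max_request_bytes")
    mode = egress.get("on_oversize")

    if limit is None and mode is None:
        return None  # default: no cutoff, scan all bodies

    # Assumption: a threshold without an action (or the reverse) is an error.
    if limit is None or mode is None:
        raise ManifestError(
            "max_request_bytes and on_oversize must be set together"
        )

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ManifestError("max_request_bytes must be a positive integer")

    if mode not in ALLOWED_ON_OVERSIZE:
        raise ManifestError(
            f"on_oversize must be one of {ALLOWED_ON_OVERSIZE}, got {mode!r}"
        )

    return limit, mode
```

Under these assumptions, `parse_oversize({})` returns `None`, and a value such as `on_oversize: "skip"` is rejected at load rather than silently ignored.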
didericis added the Kind/Feature label 2026-06-29 11:32:41 -04:00
Reference: didericis/bot-bottle#311