Add dlp.outbound_on_match policy (block | redact | supervise)

Give each egress route a policy for what the proxy does when an outbound DLP
detector matches a token, defaulting to the supervise flow added in the
previous commit. The goal is cutting false-positive friction without weakening
default-deny.

- redact: scrub the matched value(s) from the body, non-host headers, and
  path/query via redact_tokens, then re-scan. Forward if clean; fail closed
  with a 403 if a match remains on a surface redaction can't rewrite (the
  hostname, or a unicode-evasion token). For routes where a token-shaped value
  is noise the upstream doesn't need.
- block: the original hard 403, never overridable.
- supervise (default, unset): hold the request for operator approval.

Structural blocks (CRLF, no safelist-able value) stay hard 403s under every
policy.

Threads outbound_on_match from the bottle manifest (manifest_egress) through
the resolved EgressRoute and rendered routes.yaml (egress.py) to the addon's
Route (egress_addon_core), and round-trips it via the list-egress-routes
introspection endpoint. The allow/egress-block tool descriptions document the
new key.

Tests: manifest parse/validation, core parse/validation, full
manifest->render->addon round-trip for redact.

README + PRD 0062 updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HnvBjPZC5V7qeQpFbQdDmS
2026-06-24 16:50:13 -04:00
parent 7f2352287e
commit cdfaaa3de8
10 changed files with 291 additions and 53 deletions
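
Reader's note: the policy dispatch described in the commit message can be sketched roughly as below. This is an illustration under stated assumptions, not the code in this commit. `resolve_policy`, `decide`, the `blocked` field, and the `"forward"`/`"deny"`/`"hold"` outcome strings are hypothetical names; only `ScanResult.matched`, the three policy values, and the fail-closed re-scan come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional

POLICIES = ("block", "redact", "supervise")


@dataclass
class ScanResult:
    blocked: bool      # hypothetical field: True when a detector wants to stop the request
    matched: str = ""  # raw substring the detector found; empty for structural blocks


def resolve_policy(raw: Optional[str]) -> str:
    """An unset policy resolves to 'supervise'; unknown values are rejected."""
    if raw is None:
        return "supervise"
    if raw not in POLICIES:
        raise ValueError(f"outbound_on_match must be one of {POLICIES}, got {raw!r}")
    return raw


def decide(
    policy: str,
    result: ScanResult,
    rescan_after_redact: Callable[[], ScanResult],
    supervise_wired: bool = True,
) -> str:
    """Return 'forward', 'deny' (403), or 'hold' (wait for the operator)."""
    if not result.blocked:
        return "forward"
    if not result.matched:
        # Structural block (e.g. CRLF): nothing to redact or safelist.
        return "deny"
    if policy == "redact":
        # The callback stands in for "scrub the request, then scan it again".
        # Fail closed if a match survives on a surface that cannot be rewritten.
        return "deny" if rescan_after_redact().blocked else "forward"
    if policy == "supervise" and supervise_wired:
        return "hold"
    # 'block', or 'supervise' falling back when the queue is not wired.
    return "deny"
```

For example, `decide("redact", ScanResult(True, "tok"), lambda: ScanResult(False))` forwards the scrubbed request, while the same call with a re-scan that still matches is denied. Keeping the structural check ahead of the policy branch is what makes those blocks hard 403s under every policy.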
@@ -7,12 +7,26 @@

 ## Summary

-When the egress proxy blocks an outbound request because a DLP detector
-matched a token/secret, route that block through the existing supervisor
-approval queue instead of returning `403` immediately. The proxy holds the
-request open until the operator approves or rejects it. On approval, the
-matched token is added to an in-memory "safe tokens" set so the request — and
-any later request carrying the same token — flows through without re-prompting.
+Give each egress route a policy for what happens when an outbound DLP detector
+matches a token, via `dlp.outbound_on_match: block | redact | supervise`
+(default `supervise`):
+
+- **`supervise`** (default) — route the block through the existing supervisor
+  approval queue instead of returning `403` immediately. The proxy holds the
+  request open until the operator approves or rejects it. On approval the
+  matched token is added to an in-memory "safe tokens" set so the request — and
+  any later request carrying the same token — flows through without
+  re-prompting.
+- **`redact`** — scrub the matched value(s) from the request and forward it,
+  no operator in the loop. For routes where a token-shaped value is noise the
+  upstream doesn't need (telemetry/log sinks). Fails closed if a match lands on
+  a surface redaction can't rewrite (the hostname).
+- **`block`** — the original hard `403`; never overridable. For routes where a
+  detected token must always stop.
+
+The motivating goal is reducing friction from false positives without weakening
+the default-deny posture: supervise keeps a human in the loop, redact is an
+explicit per-route opt-in, and block stays available for sensitive routes.

 ## Problem

@@ -58,9 +72,35 @@ fine-grained way to say "this specific value is fine."
  the exact value the detector found.
 - Replacing the per-route `dlp.outbound_detectors` override. That remains the
  way to turn a detector off wholesale.
+- Making `redact` the default. Silent redaction of a true false positive
+  corrupts legitimate data, so it is opt-in per route; `supervise` (human in
+  the loop) stays the default.

 ## Design

+### On-match policy
+
+`dlp.outbound_on_match` is a per-route enum threaded from the bottle manifest
+(`manifest_egress`) through the resolved route (`egress.EgressRoute`), the
+rendered `routes.yaml` (`egress_render_routes`), and the addon's `Route`
+(`egress_addon_core`). Unset renders nothing and resolves to `supervise` at
+request time. The `list-egress-routes` introspection endpoint round-trips it so
+the agent's proposals preserve it.
+
+On an outbound block the addon dispatches on the resolved policy:
+
+- **Structural blocks always 403.** A `ScanResult` with no `matched` value
+  (CRLF injection) is a hard `403` regardless of policy — there is nothing to
+  redact or safelist.
+- **`redact`** runs `redact_tokens` over the body, non-`host` header values,
+  and path/query, then re-scans. If the re-scan is clean the (rewritten)
+  request is forwarded; if a block-severity match remains (e.g. in the
+  hostname, or a unicode-evasion token redaction can't reach) it fails closed
+  with a `403`.
+- **`block`** writes the `403` immediately.
+- **`supervise`** runs the queue-and-wait loop below, falling back to `block`
+  when supervise isn't wired for the bottle.
+
 ### Detected-value plumbing

 `ScanResult` gains a `matched: str = ""` field carrying the raw substring the
@@ -128,8 +168,11 @@ closed.
   required approval reason.
 3. **Addon glue** — async `request`, safe-tokens set, proposal write + async
   poll, allow/block decision; pass `safe_tokens` into the WebSocket path.
-4. **Tests + docs** — core/supervise/TUI unit tests; README egress + supervisor
-   notes.
+4. **On-match policy** — `dlp.outbound_on_match` through manifest → render →
+   addon; `redact` surface scrub with fail-closed re-scan; policy dispatch in
+   the addon's outbound handler.
+5. **Tests + docs** — core/supervise/TUI/manifest/render unit tests; README
+   egress + supervisor notes.

 ## Open questions