Add dlp.outbound_on_match policy (block | redact | supervise)
lint / lint (push) Successful in 1m41s
test / unit (pull_request) Successful in 30s
test / integration (pull_request) Successful in 18s

Give each egress route a policy for what the proxy does when an outbound
DLP detector matches a token, defaulting to the supervise flow added in
the previous commit. The goal is cutting false-positive friction without
weakening default-deny.

- redact: scrub the matched value(s) from the body, non-host headers, and
  path/query via redact_tokens, then re-scan. Forward if clean; fail
  closed with a 403 if a match remains on a surface redaction can't
  rewrite (the hostname, or a unicode-evasion token). For routes where a
  token-shaped value is noise the upstream doesn't need.
- block: the original hard 403, never overridable.
- supervise (default, unset): hold the request for operator approval.

Structural blocks (CRLF, no safelist-able value) stay hard 403s under
every policy.

Threads outbound_on_match from the bottle manifest (manifest_egress)
through the resolved EgressRoute and rendered routes.yaml (egress.py) to
the addon's Route (egress_addon_core), and round-trips it via the
list-egress-routes introspection endpoint. The allow/egress-block tool
descriptions document the new key.

Tests: manifest parse/validation, core parse/validation, full
manifest->render->addon round-trip for redact. README + PRD 0062 updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HnvBjPZC5V7qeQpFbQdDmS
This commit is contained in:
2026-06-24 16:50:13 -04:00
parent 7f2352287e
commit cdfaaa3de8
10 changed files with 291 additions and 53 deletions
@@ -7,12 +7,26 @@
## Summary
When the egress proxy blocks an outbound request because a DLP detector
matched a token/secret, route that block through the existing supervisor
approval queue instead of returning `403` immediately. The proxy holds the
request open until the operator approves or rejects it. On approval, the
matched token is added to an in-memory "safe tokens" set so the request — and
any later request carrying the same token — flows through without re-prompting.
Give each egress route a policy for what happens when an outbound DLP detector
matches a token, via `dlp.outbound_on_match: block | redact | supervise`
(default `supervise`):
- **`supervise`** (default) — route the block through the existing supervisor
approval queue instead of returning `403` immediately. The proxy holds the
request open until the operator approves or rejects it. On approval the
matched token is added to an in-memory "safe tokens" set so the request — and
any later request carrying the same token — flows through without
re-prompting.
- **`redact`** — scrub the matched value(s) from the request and forward it,
no operator in the loop. For routes where a token-shaped value is noise the
upstream doesn't need (telemetry/log sinks). Fails closed if a match lands on
a surface redaction can't rewrite (the hostname).
- **`block`** — the original hard `403`; never overridable. For routes where a
detected token must always stop.
The motivating goal is reducing friction from false positives without weakening
the default-deny posture: supervise keeps a human in the loop, redact is an
explicit per-route opt-in, and block stays available for sensitive routes.
## Problem
@@ -58,9 +72,35 @@ fine-grained way to say "this specific value is fine."
the exact value the detector found.
- Replacing the per-route `dlp.outbound_detectors` override. That remains the
way to turn a detector off wholesale.
- Making `redact` the default. Silent redaction of a true false positive
corrupts legitimate data, so it is opt-in per route; `supervise` (human in
the loop) stays the default.
## Design
### On-match policy
`dlp.outbound_on_match` is a per-route enum threaded from the bottle manifest
(`manifest_egress`) through the resolved route (`egress.EgressRoute`), the
rendered `routes.yaml` (`egress_render_routes`), and the addon's `Route`
(`egress_addon_core`). Unset renders nothing and resolves to `supervise` at
request time. The `list-egress-routes` introspection endpoint round-trips it so
the agent's proposals preserve it.
On an outbound block the addon dispatches on the resolved policy:
- **Structural blocks always 403.** A `ScanResult` with no `matched` value
(CRLF injection) is a hard `403` regardless of policy — there is nothing to
redact or safelist.
- **`redact`** runs `redact_tokens` over the body, non-`host` header values,
and path/query, then re-scans. If the re-scan is clean the (rewritten)
request is forwarded; if a block-severity match remains (e.g. in the
hostname, or a unicode-evasion token redaction can't reach) it fails closed
with a `403`.
- **`block`** writes the `403` immediately.
- **`supervise`** runs the queue-and-wait loop below, falling back to `block`
when supervise isn't wired for the bottle.
### Detected-value plumbing
`ScanResult` gains a `matched: str = ""` field carrying the raw substring the
@@ -128,8 +168,11 @@ closed.
required approval reason.
3. **Addon glue** — async `request`, safe-tokens set, proposal write + async
poll, allow/block decision; pass `safe_tokens` into the WebSocket path.
4. **Tests + docs** — core/supervise/TUI unit tests; README egress + supervisor
notes.
4. **On-match policy**`dlp.outbound_on_match` through manifest → render →
addon; `redact` surface scrub with fail-closed re-scan; policy dispatch in
the addon's outbound handler.
5. **Tests + docs** — core/supervise/TUI/manifest/render unit tests; README
egress + supervisor notes.
## Open questions