bot-bottle/docs/prds/0062-egress-supervisor-token-override.md
didericis cdfaaa3de8
Add dlp.outbound_on_match policy (block | redact | supervise)
Give each egress route a policy for what the proxy does when an outbound
DLP detector matches a token, defaulting to the supervise flow added in
the previous commit. The goal is cutting false-positive friction without
weakening default-deny.

- redact: scrub the matched value(s) from the body, non-host headers, and
  path/query via redact_tokens, then re-scan. Forward if clean; fail
  closed with a 403 if a match remains on a surface redaction can't
  rewrite (the hostname, or a unicode-evasion token). For routes where a
  token-shaped value is noise the upstream doesn't need.
- block: the original hard 403, never overridable.
- supervise (default, unset): hold the request for operator approval.

Structural blocks (CRLF, no safelist-able value) stay hard 403s under
every policy.

Threads outbound_on_match from the bottle manifest (manifest_egress)
through the resolved EgressRoute and rendered routes.yaml (egress.py) to
the addon's Route (egress_addon_core), and round-trips it via the
list-egress-routes introspection endpoint. The allow/egress-block tool
descriptions document the new key.

Tests: manifest parse/validation, core parse/validation, full
manifest->render->addon round-trip for redact. README + PRD 0062 updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HnvBjPZC5V7qeQpFbQdDmS
2026-06-24 16:50:13 -04:00


PRD 0062: Supervisor override for egress token blocks

  • Status: Active
  • Author: claude
  • Created: 2026-06-24
  • Issue: #261

Summary

Give each egress route a policy for what happens when an outbound DLP detector matches a token, via dlp.outbound_on_match: block | redact | supervise (default supervise):

  • supervise (default) — route the block through the existing supervisor approval queue instead of returning 403 immediately. The proxy holds the request open until the operator approves or rejects it. On approval the matched token is added to an in-memory "safe tokens" set so the request — and any later request carrying the same token — flows through without re-prompting.
  • redact — scrub the matched value(s) from the request and forward it, no operator in the loop. For routes where a token-shaped value is noise the upstream doesn't need (telemetry/log sinks). Fails closed if a match lands on a surface redaction can't rewrite (the hostname).
  • block — the original hard 403; never overridable. For routes where a detected token must always stop.

The motivating goal is reducing friction from false positives without weakening the default-deny posture: supervise keeps a human in the loop, redact is an explicit per-route opt-in, and block stays available for sensitive routes.

Problem

The outbound DLP detectors (token_patterns, known_secrets) are deliberately aggressive: any string that looks like a credential is blocked before it leaves the bottle. That is the right default, but it produces false positives — a token-shaped value that is not actually a secret, or a credential the agent legitimately needs to send to a declared host. Today the only recovery is for the operator to notice the egress DLP 403 in the logs and hand-edit the route's dlp.outbound_detectors, which disables the detector for the whole route rather than allowing the one value.

The operator has no in-the-loop signal that a token block happened and no fine-grained way to say "this specific value is fine."

Goals / Success Criteria

  1. Under the default supervise policy, an outbound DLP token block (a ScanResult carrying a matched secret value) creates a supervisor proposal instead of an immediate 403.
  2. The egress proxy holds the blocked request open, polling for the operator's response up to a bounded timeout.
  3. The proposal shows the operator the host, method, path, the detector reason, and a redacted context snippet — never the raw token value.
  4. On approved/modified, the matched token value is added to an in-memory safe-tokens set and the request proceeds normally; later requests carrying the same value skip the block.
  5. On rejected, timeout, malformed response, or missing supervisor wiring, the request fails closed with the same 403 as today.
  6. Structural blocks that carry no token value (CRLF injection) and the route-not-allowlisted / git blocks are unchanged — they stay hard 403s and keep their existing agent-driven allow / egress-block MCP path.
  7. The proxy event loop is not stalled while waiting: the wait is asynchronous, so other flows keep being served.

Non-goals

  • Persisting the safe-tokens set across egress restarts. It lives in process memory only; a restart re-prompts. (The issue explicitly defers persistence.)
  • Supervising inbound (prompt-injection) blocks or WebSocket frame blocks. WebSocket frames still honour the safe-tokens set for already-approved values but cannot wait for approval (there is no response surface after upgrade).
  • Generalising an approved secret across encodings. The safe-tokens set matches the exact value the detector found.
  • Replacing the per-route dlp.outbound_detectors override. That remains the way to turn a detector off wholesale.
  • Making redact the default. Silent redaction of a true false positive corrupts legitimate data, so it is opt-in per route; supervise (human in the loop) stays the default.

Design

On-match policy

dlp.outbound_on_match is a per-route enum threaded from the bottle manifest (manifest_egress) through the resolved route (egress.EgressRoute), the rendered routes.yaml (egress_render_routes), and the addon's Route (egress_addon_core). Unset renders nothing and resolves to supervise at request time. The list-egress-routes introspection endpoint round-trips it so the agent's proposals preserve it.
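
A minimal sketch of how the enum could be validated at manifest parse time and resolved at request time. The function and constant names here are hypothetical; the PRD fixes only the three values and the rule that an unset key resolves to supervise.

```python
from typing import Optional

# Hypothetical names: the PRD specifies only the values and the default.
VALID_ON_MATCH = ("block", "redact", "supervise")
DEFAULT_ON_MATCH = "supervise"


def parse_on_match(raw: Optional[str]) -> Optional[str]:
    """Validate a manifest value. None means the key was not set."""
    if raw is None:
        return None  # unset stays unset, so nothing is rendered
    if raw not in VALID_ON_MATCH:
        raise ValueError(
            f"dlp.outbound_on_match must be one of {VALID_ON_MATCH}, got {raw!r}"
        )
    return raw


def resolve_on_match(stored: Optional[str]) -> str:
    """Resolve at request time: an unset policy falls back to supervise."""
    return stored if stored is not None else DEFAULT_ON_MATCH
```

Keeping "unset" distinct from an explicit supervise is what lets the rendered routes.yaml omit the key while the addon still applies the default.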

On an outbound block the addon dispatches on the resolved policy:

  • Structural blocks always 403. A ScanResult with no matched value (CRLF injection) is a hard 403 regardless of policy — there is nothing to redact or safelist.
  • redact runs redact_tokens over the body, non-host header values, and path/query, then re-scans. If the re-scan is clean the (rewritten) request is forwarded; if a block-severity match remains (e.g. in the hostname, or a unicode-evasion token redaction can't reach) it fails closed with a 403.
  • block writes the 403 immediately.
  • supervise runs the queue-and-wait loop below, falling back to block when supervise isn't wired for the bottle.
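
The dispatch order above can be sketched as a pure decision function. This is an illustration under assumptions, not the addon's code: the ScanResult here is a simplified stand-in with a hypothetical blocked flag, and decide and supervise_wired are invented names.

```python
from dataclasses import dataclass


@dataclass
class ScanResult:
    # Simplified stand-in; only `matched` is named in the PRD.
    blocked: bool
    reason: str = ""
    matched: str = ""


def decide(result: ScanResult, policy: str, supervise_wired: bool) -> str:
    """Return the action the addon would take for an outbound scan result."""
    if not result.blocked:
        return "forward"
    if not result.matched:
        # Structural block (e.g. CRLF): nothing to redact or safelist.
        return "block"
    if policy == "redact":
        return "redact"
    if policy == "supervise" and supervise_wired:
        return "supervise"
    # Explicit block, supervise without wiring, or an unknown policy:
    # fail closed.
    return "block"
```

Checking the structural case first means no policy value can soften a CRLF block.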

Detected-value plumbing

ScanResult gains a matched: str = "" field carrying the raw substring the detector matched. The token detectors (scan_token_patterns, scan_known_secrets) populate it; the structural CRLF detector leaves it empty. The value stays inside the egress sidecar process — it is never written to a log line (logs already use the redacted context) nor to the proposal file.

scan_outbound (and the token detectors it calls) accept a safe_tokens set. A match whose value is in safe_tokens is skipped, so an approved token no longer blocks. The scanners keep searching past a safelisted match so a second, un-approved secret in the same request is still caught.
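
A minimal sketch of the "skip safelisted matches but keep searching" behaviour. The regex is purely illustrative (the real detector patterns are not given in this PRD), and the ScanResult shape and the function signature are assumptions.

```python
import re
from dataclasses import dataclass
from typing import AbstractSet, Optional

# Illustrative pattern only; not the real token_patterns detector.
TOKEN_RE = re.compile(r"sk-[A-Za-z0-9]{20,}")


@dataclass
class ScanResult:
    reason: str
    matched: str = ""  # raw matched substring; never logged


def scan_token_patterns(
    text: str, safe_tokens: AbstractSet[str] = frozenset()
) -> Optional[ScanResult]:
    """Return the first match that is not safelisted, or None if clean."""
    for m in TOKEN_RE.finditer(text):
        value = m.group(0)
        if value in safe_tokens:
            continue  # approved value: keep looking for other secrets
        return ScanResult(reason="token-shaped value found", matched=value)
    return None
```

Iterating over every match, rather than stopping at the first, is what prevents an approved token from masking a second, unapproved one in the same request.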

Supervisor proposal

A new proposal tool constant egress-token-allow is added to supervise.TOOLS. The egress addon writes the proposal directly to SUPERVISE_QUEUE_DIR (the queue is bind-mounted into the sidecar bundle and shared by every daemon, exactly as git-gate's gitleaks-allow proposal in PRD 0061 does). The proposal's proposed_file is a human-readable text payload:

egress blocked an outbound request carrying a detected token
host: api.example.com
method: POST
path: /v1/ingest
detector: OpenAI API key found in body
context: ...before ******** after...

The justification tells the operator to approve only if the value is a false positive or a credential the request legitimately needs.
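
A sketch of how a payload of that shape could be assembled. build_token_allow_payload is named in the implementation chunks, but its signature here is assumed, and redact_context is a hypothetical helper showing one way to keep the raw value out of the snippet.

```python
MASK = "********"


def redact_context(text: str, matched: str, radius: int = 20) -> str:
    """Return a short snippet around the match with the value masked."""
    if not matched:
        return ""
    idx = text.find(matched)
    if idx < 0:
        return ""
    start = max(0, idx - radius)
    end = idx + len(matched) + radius
    # Mask every full occurrence of the value inside the window.
    return "..." + text[start:end].replace(matched, MASK) + "..."


def build_token_allow_payload(
    host: str, method: str, path: str, reason: str, context: str
) -> str:
    """Assemble the human-readable proposal text (assumed signature)."""
    return "\n".join(
        [
            "egress blocked an outbound request carrying a detected token",
            f"host: {host}",
            f"method: {method}",
            f"path: {path}",
            f"detector: {reason}",
            f"context: {context}",
        ]
    )
```

The builder takes the already-redacted context rather than the raw value, so the proposal file cannot contain the token by construction.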

The addon then polls for <proposal-id>.response.json for up to EGRESS_TOKEN_ALLOW_TIMEOUT_SECONDS seconds (default 300). An approved or modified response allows the request and adds the value to the safe-tokens set; a rejected or malformed response, or a timeout, fails the request closed. The proposal and response are archived to processed/ after a decision.

Because the wait happens inside mitmproxy's asyncio loop, the addon's request hook is async and polls with asyncio.sleep, so concurrent flows are unaffected.
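
A minimal sketch of a non-blocking poll loop of this kind. The function name, the polling interval, and the "status" field in the response JSON are assumptions; only the <proposal-id>.response.json file name, the approved/modified/rejected outcomes, and the fail-closed rules come from this PRD.

```python
import asyncio
import json
import time
from pathlib import Path

# Assumed response field values; the PRD names the outcomes, not the schema.
ALLOW_STATUSES = ("approved", "modified")


async def wait_for_decision(
    queue_dir: Path,
    proposal_id: str,
    timeout: float = 300.0,
    interval: float = 0.5,
) -> bool:
    """Poll for the operator's response. True means allow the request."""
    response_path = queue_dir / f"{proposal_id}.response.json"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if response_path.exists():
            try:
                data = json.loads(response_path.read_text())
            except (OSError, ValueError):
                return False  # unreadable or malformed: fail closed
            return isinstance(data, dict) and data.get("status") in ALLOW_STATUSES
        # Yield to the event loop so other flows keep being served.
        await asyncio.sleep(interval)
    return False  # timeout: fail closed
```

Note that this sketch treats a half-written response file as malformed and fails closed, so it assumes the writer creates the file atomically (for example, write to a temporary name and rename).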

Supervisor UI

cli/supervise.py renders egress-token-allow like gitleaks-allow: the text payload is shown, modify is unavailable (there is no file patch to edit), and approval prompts for a non-empty reason that is recorded in the response notes. There is no on-disk config diff, so — like gitleaks-allow and capability-block — it writes no egress audit-log entry.

Failure handling

If SUPERVISE_QUEUE_DIR / SUPERVISE_BOTTLE_SLUG are unset (supervise disabled for the bottle), the addon skips the queue and returns the existing 403. Any error writing the proposal or reading the response also fails closed.
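
The wiring check can be sketched as a small helper. supervise_wiring is a hypothetical name; the two environment variables are the ones named above.

```python
import os
from typing import Mapping, Optional, Tuple


def supervise_wiring(
    env: Mapping[str, str] = os.environ,
) -> Optional[Tuple[str, str]]:
    """Return (queue_dir, bottle_slug), or None when supervise is not wired."""
    queue_dir = env.get("SUPERVISE_QUEUE_DIR", "")
    slug = env.get("SUPERVISE_BOTTLE_SLUG", "")
    if not queue_dir or not slug:
        return None  # caller skips the queue and returns the existing 403
    return queue_dir, slug
```

Treating an empty string the same as a missing variable keeps a misconfigured bottle on the fail-closed path.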

Implementation chunks

  1. Core: ScanResult.matched; thread safe_tokens through scan_outbound / token detectors; build_token_allow_payload.
  2. Supervise + TUI: TOOL_EGRESS_TOKEN_ALLOW; TUI suffix, modify guard, required approval reason.
  3. Addon glue: async request, safe-tokens set, proposal write + async poll, allow/block decision; pass safe_tokens into the WebSocket path.
  4. On-match policy: dlp.outbound_on_match through manifest → render → addon; redact surface scrub with fail-closed re-scan; policy dispatch in the addon's outbound handler.
  5. Tests + docs: core/supervise/TUI/manifest/render unit tests; README egress + supervisor notes.

Open questions

  • Should known_secrets (provisioned EGRESS_TOKEN_* exfiltration) be override-able at all, or only token_patterns? This PRD allows both — approval is an explicit operator decision and the safe-tokens set matches the exact found value — but a future revision could restrict known_secrets to reject-only.