# PRD 0062: Supervisor override for egress token blocks

- **Status:** Active
- **Author:** claude
- **Created:** 2026-06-24
- **Issue:** #261

## Summary

Give each egress route a policy for what happens when an outbound DLP detector matches a token, via `dlp.outbound_on_match: block | redact | supervise` (default `supervise`):

- **`supervise`** (default): route the block through the existing supervisor approval queue instead of returning `403` immediately. The proxy holds the request open until the operator approves or rejects it. On approval the matched token is added to an in-memory "safe tokens" set so the request, and any later request carrying the same token, flows through without re-prompting.
- **`redact`**: scrub the matched value(s) from the request and forward it, with no operator in the loop. For routes where a token-shaped value is noise the upstream doesn't need (telemetry/log sinks). Fails closed if a match lands on a surface redaction can't rewrite (the hostname).
- **`block`**: the original hard `403`; never overridable. For routes where a detected token must always stop.

The motivating goal is reducing friction from false positives without weakening the default-deny posture: supervise keeps a human in the loop, redact is an explicit per-route opt-in, and block stays available for sensitive routes.

## Problem

The outbound DLP detectors (`token_patterns`, `known_secrets`) are deliberately aggressive: any string that looks like a credential is blocked before it leaves the bottle. That is the right default, but it produces false positives: a token-shaped value that is not actually a secret, or a credential the agent legitimately needs to send to a declared host. Today the only recovery is for the operator to notice the `egress DLP` 403 in the logs and hand-edit the route's `dlp.outbound_detectors`, which disables the detector for the whole route rather than allowing the one value.
The operator has no in-the-loop signal that a token block happened and no fine-grained way to say "this specific value is fine."

## Goals / Success Criteria

1. An outbound DLP **token** block (a `ScanResult` carrying a matched secret value) creates a supervisor proposal instead of an immediate `403`.
2. The egress proxy holds the blocked request open, polling for the operator's response up to a bounded timeout.
3. The proposal shows the operator the host, method, path, the detector reason, and a **redacted** context snippet, never the raw token value.
4. On `approved`/`modified`, the matched token value is added to an in-memory safe-tokens set and the request proceeds normally; later requests carrying the same value skip the block.
5. On `rejected`, timeout, malformed response, or missing supervisor wiring, the request fails closed with the same `403` as today.
6. Structural blocks that carry no token value (CRLF injection) and the route-not-allowlisted / git blocks are unchanged: they stay hard `403`s and keep their existing agent-driven `allow` / `egress-block` MCP path.
7. The proxy event loop is not stalled while waiting: the wait is asynchronous, so other flows keep being served.

## Non-goals

- Persisting the safe-tokens set across egress restarts. It lives in process memory only; a restart re-prompts. (The issue explicitly defers persistence.)
- Supervising inbound (prompt-injection) blocks or WebSocket frame blocks. WebSocket frames still honour the safe-tokens set for already-approved values but cannot wait for approval (there is no response surface after upgrade).
- Generalising an approved secret across encodings. The safe-tokens set matches the exact value the detector found.
- Replacing the per-route `dlp.outbound_detectors` override. That remains the way to turn a detector off wholesale.
- Making `redact` the default.
Silent redaction of a true false positive corrupts legitimate data, so it is opt-in per route; `supervise` (human in the loop) stays the default.

## Scope

### In scope

The minimum cut that ships, in build order:

1. **Core**: `ScanResult.matched`; thread `safe_tokens` through `scan_outbound` / the token detectors; `build_token_allow_payload`.
2. **Supervise + TUI**: `TOOL_EGRESS_TOKEN_ALLOW`; TUI suffix, modify guard, required approval reason.
3. **Addon glue**: async `request`, safe-tokens set, proposal write + async poll, allow/block decision; pass `safe_tokens` into the WebSocket path.
4. **On-match policy**: `dlp.outbound_on_match` through manifest → render → addon; `redact` surface scrub with fail-closed re-scan; policy dispatch in the addon's outbound handler.
5. **Tests + docs**: core/supervise/TUI/manifest/render unit tests; README egress + supervisor notes.

### Out of scope

The deferrals enumerated under **Non-goals**: restart persistence, inbound / WebSocket-frame supervision, cross-encoding generalisation, replacing `dlp.outbound_detectors`, and making `redact` the default.

## Proposed Design

### New services / components

A new proposal tool constant `egress-token-allow` (`TOOL_EGRESS_TOKEN_ALLOW`) is added to `supervise.TOOLS`, and the egress addon gains an in-memory safe-tokens set plus the policy-dispatch path that drives it.

On an outbound block the addon dispatches on the resolved policy:

- **Structural blocks always 403.** A `ScanResult` with no `matched` value (CRLF injection) is a hard `403` regardless of policy: there is nothing to redact or safelist.
- **`redact`** runs `redact_tokens` over the body, non-`host` header values, and path/query, then re-scans. If the re-scan is clean the (rewritten) request is forwarded; if a block-severity match remains (e.g. in the hostname, or a unicode-evasion token redaction can't reach) it fails closed with a `403`.
- **`block`** writes the `403` immediately.
- **`supervise`** runs the queue-and-wait loop, falling back to `block` when supervise isn't wired for the bottle.

For `supervise`, the addon writes the proposal directly to `SUPERVISE_QUEUE_DIR` (the queue is bind-mounted into the sidecar bundle and shared by every daemon, exactly as git-gate's `gitleaks-allow` proposal in PRD 0061 does). The proposal's `proposed_file` is a human-readable text payload built by `build_token_allow_payload`:

```
egress blocked an outbound request carrying a detected token
host: api.example.com
method: POST
path: /v1/ingest
detector: OpenAI API key found in body
context: ...before ******** after...
```

The justification tells the operator to approve only if the value is a false positive or a credential the request legitimately needs.

The addon then polls `.response.json` for `EGRESS_TOKEN_ALLOW_TIMEOUT_SECONDS` (default 300). `approved`/`modified` allow the request and add the value to the safe-tokens set; `rejected`, malformed responses, and timeout fail the request closed. The proposal + response are archived to `processed/` after a decision.

Because the wait happens inside mitmproxy's asyncio loop, the addon's `request` hook is async and polls with `asyncio.sleep`, so concurrent flows are unaffected.

### Existing code touched

- **Policy threading.** `dlp.outbound_on_match` is a per-route enum threaded from the bottle manifest (`manifest_egress`) through the resolved route (`egress.EgressRoute`), the rendered `routes.yaml` (`egress_render_routes`), and the addon's `Route` (`egress_addon_core`). Unset renders nothing and resolves to `supervise` at request time. The `list-egress-routes` introspection endpoint round-trips it so the agent's proposals preserve it.
- **Provider-route default.** Agent-provider routes (the agent talking to its own LLM API: `api.anthropic.com`, the Codex backend, etc.) are the worst source of token-shaped false positives because the whole conversation payload flows through them.
`egress_routes_for_bottle` fills `outbound_on_match=redact` on any provider route that doesn't set it explicitly; a provider that sets the policy keeps its choice, and manifest routes are unaffected (they default to `supervise`).
- **Scanners.** `scan_outbound` (and the token detectors `scan_token_patterns` / `scan_known_secrets` it calls) accept a `safe_tokens` set. A match whose value is in `safe_tokens` is skipped, so an approved token no longer blocks; the scanners keep searching past a safelisted match so a second, un-approved secret in the same request is still caught. The WebSocket path is passed the same `safe_tokens` set.
- **Supervisor UI.** `cli/supervise.py` renders `egress-token-allow` like `gitleaks-allow`: the text payload is shown, modify is unavailable (there is no file patch to edit), and approval prompts for a non-empty reason recorded in the response notes. There is no on-disk config diff, so, like `gitleaks-allow` and `capability-block`, it writes no egress audit-log entry.
- **Failure handling.** If `SUPERVISE_QUEUE_DIR` / `SUPERVISE_BOTTLE_SLUG` are unset (supervise disabled for the bottle), the addon skips the queue and returns the existing `403`. Any error writing the proposal or reading the response also fails closed.

### Data model changes

- New per-route manifest field `dlp.outbound_on_match: block | redact | supervise`, rendered into `routes.yaml` (omitted when unset).
- `ScanResult` gains a `matched: str = ""` field carrying the raw substring the detector matched. The token detectors populate it; the structural CRLF detector leaves it empty. The value stays inside the egress sidecar process: it is never written to a log line (logs use the redacted `context`) nor to the proposal file.
- Proposal text payload (above) plus `.response.json` in `SUPERVISE_QUEUE_DIR`, archived to `processed/` after a decision.
- New env var `EGRESS_TOKEN_ALLOW_TIMEOUT_SECONDS` (default 300).

### External dependencies

None.
Reuses the existing supervisor queue (`SUPERVISE_QUEUE_DIR`) and the mitmproxy addon framework already in the egress sidecar.

## Open questions

- Should `known_secrets` (provisioned `EGRESS_TOKEN_*` exfiltration) be override-able at all, or only `token_patterns`? This PRD allows both, since approval is an explicit operator decision and the safe-tokens set matches the exact found value, but a future revision could restrict `known_secrets` to reject-only.

## References

- Issue #261
- PRD 0061: `gitleaks-allow` supervisor proposal pattern this reuses.
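
## Appendix: illustrative sketch of the supervise wait

The queue-and-wait behaviour described under Proposed Design (poll `.response.json` up to a bounded timeout, allow on `approved`/`modified`, fail closed on everything else) could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the addon's actual code: the function name `wait_for_decision`, the `poll_interval` parameter, and the `status` key in the response JSON are hypothetical. The PRD itself only specifies the response file, the decision values, the 300-second default, and the fail-closed rule.

```python
import asyncio
import json
import time
from pathlib import Path

# Decisions that let the held request proceed (per the PRD's goals).
ALLOW_STATUSES = {"approved", "modified"}


async def wait_for_decision(
    response_path: Path,
    timeout: float = 300.0,      # mirrors EGRESS_TOKEN_ALLOW_TIMEOUT_SECONDS
    poll_interval: float = 0.5,  # hypothetical; not specified in the PRD
) -> bool:
    """Return True only if an allowing operator response appears in time.

    Timeout, an unreadable file, malformed JSON, or any other status all
    return False, so the caller fails closed with the existing 403.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if response_path.exists():
            try:
                payload = json.loads(response_path.read_text(encoding="utf-8"))
            except (OSError, ValueError):
                return False  # unreadable or malformed response
            if not isinstance(payload, dict):
                return False
            # "status" is an assumed field name for the operator's decision.
            status = payload.get("status")
            return isinstance(status, str) and status in ALLOW_STATUSES
        # Yield to the event loop so other flows keep being served.
        await asyncio.sleep(poll_interval)
    return False
```

An async `request` hook would await this coroutine and, on `True`, add the matched value to the safe-tokens set before letting the flow continue. Because the wait uses `asyncio.sleep` rather than a blocking sleep, other requests sharing the event loop are not stalled while one is held.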