PRD 0062: supervisor override for egress token blocks

When the outbound DLP catches a token, route the block through the existing supervisor approval queue instead of returning 403 outright. The egress proxy holds the request open until the operator answers, then remembers an approved value for the life of the proxy so the request -- and later ones carrying it -- flow through. Fails closed on rejection, timeout, malformed response, or when supervise is disabled.

- ScanResult.matched carries the raw matched substring (sidecar-only; never logged or written to the proposal). scan_outbound and the token detectors take a safe_tokens set and skip approved values, continuing past a safelisted match so a second secret in the same request is still caught.
- New egress-token-allow proposal tool, written directly to the queue by the addon (the gitleaks-allow pattern from PRD 0061). build_token_allow_payload renders host/method/path/detector reason + redacted context.
- Async request hook polls the queue without stalling the proxy event loop; EGRESS_TOKEN_ALLOW_TIMEOUT_SECONDS (default 300) bounds the wait.
- Supervisor TUI renders egress-token-allow like gitleaks-allow: report only, modify unavailable, approval requires a recorded reason.
- Unit tests for the matched/safe-tokens plumbing, payload builder, tool constant round-trip, and TUI paths; README + PRD 0062.

Closes #261.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HnvBjPZC5V7qeQpFbQdDmS
2026-06-24 16:12:50 -04:00
parent 7cb967770e
commit 7f2352287e
11 changed files with 525 additions and 24 deletions
@@ -0,0 +1,140 @@
+# PRD 0062: Supervisor override for egress token blocks
+
+- **Status:** Active
+- **Author:** claude
+- **Created:** 2026-06-24
+- **Issue:** #261
+
+## Summary
+
+When the egress proxy blocks an outbound request because a DLP detector
+matched a token/secret, route that block through the existing supervisor
+approval queue instead of returning `403` immediately. The proxy holds the
+request open until the operator approves or rejects it. On approval, the
+matched token is added to an in-memory "safe tokens" set so the request — and
+any later request carrying the same token — flows through without re-prompting.
+
+## Problem
+
+The outbound DLP detectors (`token_patterns`, `known_secrets`) are
+deliberately aggressive: any string that looks like a credential is blocked
+before it leaves the bottle. That is the right default, but it produces false
+positives — a token-shaped value that is not actually a secret, or a credential
+the agent legitimately needs to send to a declared host. Today the only
+recovery is for the operator to notice the `egress DLP` 403 in the logs and
+hand-edit the route's `dlp.outbound_detectors`, which disables the detector for
+the whole route rather than allowing the one value.
+
+The operator has no in-the-loop signal that a token block happened and no
+fine-grained way to say "this specific value is fine."
+
+## Goals / Success Criteria
+
+1. An outbound DLP **token** block (a `ScanResult` carrying a matched secret
+   value) creates a supervisor proposal instead of an immediate `403`.
+2. The egress proxy holds the blocked request open, polling for the operator's
+   response up to a bounded timeout.
+3. The proposal shows the operator the host, method, path, the detector reason,
+   and a **redacted** context snippet — never the raw token value.
+4. On `approved`/`modified`, the matched token value is added to an in-memory
+   safe-tokens set and the request proceeds normally; later requests carrying
+   the same value skip the block.
+5. On `rejected`, timeout, malformed response, or missing supervisor wiring,
+   the request fails closed with the same `403` as today.
+6. Structural blocks that carry no token value (CRLF injection) and the
+   route-not-allowlisted / git blocks are unchanged — they stay hard `403`s and
+   keep their existing agent-driven `allow` / `egress-block` MCP path.
+7. The proxy event loop is not stalled while waiting: the wait is asynchronous,
+   so other flows keep being served.
+
+## Non-goals
+
+- Persisting the safe-tokens set across egress restarts. It lives in process
+  memory only; a restart re-prompts. (The issue explicitly defers persistence.)
+- Supervising inbound (prompt-injection) blocks or WebSocket frame blocks.
+  WebSocket frames still honour the safe-tokens set for already-approved values
+  but cannot wait for approval (there is no response surface after upgrade).
+- Generalising an approved secret across encodings. The safe-tokens set matches
+  the exact value the detector found.
+- Replacing the per-route `dlp.outbound_detectors` override. That remains the
+  way to turn a detector off wholesale.
+
+## Design
+
+### Detected-value plumbing
+
+`ScanResult` gains a `matched: str = ""` field carrying the raw substring the
+detector matched. The token detectors (`scan_token_patterns`,
+`scan_known_secrets`) populate it; the structural CRLF detector leaves it
+empty. The value stays inside the egress sidecar process: it is never written
+to a log line (logs already use the redacted `context`) or to the proposal
+file.
+
+`scan_outbound` (and the token detectors it calls) accept a `safe_tokens`
+set. A match whose value is in `safe_tokens` is skipped, so an approved token
+no longer blocks. The scanners keep searching past a safelisted match so a
+second, un-approved secret in the same request is still caught.
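+
+A minimal sketch of the skip-and-continue behaviour follows. It assumes a
+single regex detector. `ScanResult` here is a stand-in: only `matched` comes
+from this PRD, and the other fields are assumptions. `TOKEN_RE`, `redact`, and
+`scan_tokens` are hypothetical names, not the real detector code.
+
+```python
+import re
+from collections.abc import Collection
+from dataclasses import dataclass
+
+
+@dataclass
+class ScanResult:
+    """Stand-in result type; fields other than `matched` are assumed."""
+
+    blocked: bool
+    reason: str = ""
+    context: str = ""  # redacted snippet, safe to log
+    matched: str = ""  # raw value: never log it or write it to a proposal
+
+
+# Hypothetical detector: an OpenAI-style "sk-" key, for illustration only.
+TOKEN_RE = re.compile(r"sk-[A-Za-z0-9]{20,}")
+
+
+def redact(text: str, start: int, end: int, width: int = 10) -> str:
+    """Return a short window around text[start:end] with the match masked."""
+    before = text[max(0, start - width):start]
+    after = text[end:end + width]
+    return f"...{before}********{after}..."
+
+
+def scan_tokens(text: str, safe_tokens: Collection[str] = ()) -> ScanResult:
+    # finditer keeps going after a safelisted match, so a second,
+    # un-approved secret in the same text is still caught.
+    for m in TOKEN_RE.finditer(text):
+        value = m.group(0)
+        if value in safe_tokens:
+            continue
+        return ScanResult(
+            blocked=True,
+            reason="API key found in body",
+            context=redact(text, m.start(), m.end()),
+            matched=value,
+        )
+    return ScanResult(blocked=False)
+```
+
+Because `finditer` resumes after a skipped match, approving the first key in a
+body does not hide a second, different key in the same body.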
+
+### Supervisor proposal
+
+A new proposal tool constant, `egress-token-allow`, is added to
+`supervise.TOOLS`. The egress addon writes the proposal directly to
+`SUPERVISE_QUEUE_DIR`, the same way git-gate writes its `gitleaks-allow`
+proposal in PRD 0061. The queue is bind-mounted into the sidecar bundle and
+shared by every daemon. The proposal's `proposed_file` is a human-readable
+text payload:
+
+```
+egress blocked an outbound request carrying a detected token
+host: api.example.com
+method: POST
+path: /v1/ingest
+detector: OpenAI API key found in body
+context: ...before ******** after...
+```
+
+The justification tells the operator to approve only if the value is a false
+positive or a credential the request legitimately needs.
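+
+One way to keep the raw value out of the proposal is to give the payload
+builder no access to it. The sketch below works under that assumption. The
+function name and signature are hypothetical, not the actual
+`build_token_allow_payload`.
+
+```python
+def render_token_allow_payload(
+    host: str, method: str, path: str, reason: str, context: str
+) -> str:
+    """Render the operator-facing proposal text.
+
+    Only the redacted `context` is accepted; the raw matched value is not a
+    parameter, so it cannot leak into the proposal file.
+    """
+    lines = [
+        "egress blocked an outbound request carrying a detected token",
+        f"host: {host}",
+        f"method: {method}",
+        f"path: {path}",
+        f"detector: {reason}",
+        f"context: {context}",
+    ]
+    return "\n".join(lines) + "\n"
+```
+
+Called with the host, method, path, detector reason, and context from the
+example payload, it reproduces that payload line for line.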
+
+The addon then polls `<proposal-id>.response.json` for
+`EGRESS_TOKEN_ALLOW_TIMEOUT_SECONDS` (default 300). `approved`/`modified`
+allow the request and add the value to the safe-tokens set; `rejected`,
+malformed responses, and timeouts fail the request closed. After a decision,
+the proposal and its response are archived to `processed/`.
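+
+The fail-closed mapping can be sketched as a pure function over the response
+file's text. The `status` field name is an assumption: this PRD does not spell
+out the response schema. `decide` and `ALLOW_STATUSES` are hypothetical names.
+
+```python
+import json
+
+ALLOW_STATUSES = frozenset({"approved", "modified"})
+
+
+def decide(response_text: str) -> bool:
+    """Return True to allow the request, False to fail closed.
+
+    Anything but a JSON object whose (assumed) "status" field is
+    "approved" or "modified" blocks the request.
+    """
+    try:
+        response = json.loads(response_text)
+    except (TypeError, ValueError):  # JSONDecodeError is a ValueError
+        return False
+    if not isinstance(response, dict):
+        return False
+    status = response.get("status")
+    return isinstance(status, str) and status in ALLOW_STATUSES
+```
+
+Without the `isinstance(status, str)` check, a non-string `status` such as a
+list would raise `TypeError` on the set lookup instead of failing closed.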
+
+Because the wait happens inside mitmproxy's asyncio loop, the addon's
+`request` hook is async and polls with `asyncio.sleep`, so concurrent flows
+are unaffected.
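+
+The non-blocking wait can be sketched as follows. The sketch assumes the
+response file appears atomically, for example by being written to a temporary
+name and then renamed into place. Otherwise a partial read would be treated as
+a malformed response. `wait_for_response` and its `interval` parameter are
+hypothetical.
+
+```python
+import asyncio
+from pathlib import Path
+from typing import Optional
+
+
+async def wait_for_response(
+    response_path: Path, timeout: float, interval: float = 0.5
+) -> Optional[str]:
+    """Poll for the response file without blocking the event loop.
+
+    Returns the file's text, or None once `timeout` seconds pass with no
+    file (the caller treats None as a fail-closed timeout).
+    """
+    loop = asyncio.get_running_loop()
+    deadline = loop.time() + timeout
+    while True:
+        try:
+            return response_path.read_text(encoding="utf-8")
+        except FileNotFoundError:
+            pass
+        remaining = deadline - loop.time()
+        if remaining <= 0:
+            return None
+        # Yield to the loop so other flows are served while we wait.
+        await asyncio.sleep(min(interval, remaining))
+```
+
+`asyncio.sleep` yields to the event loop between checks, so other flows keep
+being served. The small synchronous `read_text` call is assumed cheap enough to
+run on the loop.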
+
+### Supervisor UI
+
+`cli/supervise.py` renders `egress-token-allow` like `gitleaks-allow`: the
+text payload is shown, modify is unavailable (there is no file patch to edit),
+and approval prompts for a non-empty reason that is recorded in the response
+notes. There is no on-disk config diff, so — like `gitleaks-allow` and
+`capability-block` — it writes no egress audit-log entry.
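+
+The TUI guard might look like the following sketch. The action names
+(`approve`, `modify`, `reject`) and `validate_decision` are assumptions, not the
+real `cli/supervise.py` API. Only the `TOOL_EGRESS_TOKEN_ALLOW` constant and its
+value come from this PRD.
+
+```python
+from typing import Optional
+
+TOOL_EGRESS_TOKEN_ALLOW = "egress-token-allow"
+
+
+def validate_decision(tool: str, action: str, reason: Optional[str]) -> None:
+    """Raise ValueError for actions the TUI must refuse on this tool."""
+    if tool != TOOL_EGRESS_TOKEN_ALLOW:
+        return  # other tools keep their existing rules
+    if action == "modify":
+        raise ValueError("modify is unavailable: there is no file patch to edit")
+    if action == "approve" and not (reason or "").strip():
+        raise ValueError("approval requires a non-empty reason")
+```
+
+A reason that passes the check is what would then be recorded in the response
+notes.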
+
+### Failure handling
+
+If `SUPERVISE_QUEUE_DIR` / `SUPERVISE_BOTTLE_SLUG` are unset (supervise
+disabled for the bottle), the addon skips the queue and returns the existing
+`403`. Any error writing the proposal or reading the response also fails
+closed.
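+
+The decision wrapper can be sketched by combining the failure rules with the
+safe-tokens update. It treats supervise as enabled only when both variables
+are set and non-empty, which is an assumption. `resolve_token_block` is
+hypothetical, and so is the `request_approval` callable, which stands in for
+the proposal write plus poll.
+
+```python
+from collections.abc import Callable, Mapping
+
+
+def supervise_enabled(env: Mapping[str, str]) -> bool:
+    """Assumed rule: both variables must be set and non-empty."""
+    return bool(env.get("SUPERVISE_QUEUE_DIR")) and bool(
+        env.get("SUPERVISE_BOTTLE_SLUG")
+    )
+
+
+def resolve_token_block(
+    env: Mapping[str, str],
+    matched: str,
+    safe_tokens: set[str],
+    request_approval: Callable[[], bool],
+) -> bool:
+    """Return True to let the request through, False to keep the 403."""
+    if not supervise_enabled(env):
+        return False  # supervise disabled: existing hard 403
+    try:
+        approved = request_approval() is True
+    except Exception:
+        return False  # any error writing the proposal or reading the reply
+    if approved:
+        # In-memory only; an egress restart re-prompts.
+        safe_tokens.add(matched)
+    return approved
+```
+
+Catching every exception is deliberate here. An unexpected error in the
+approval path should leave the existing `403` in place rather than let the
+request through.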
+
+## Implementation chunks
+
+1. **Core** — `ScanResult.matched`; thread `safe_tokens` through
+   `scan_outbound` / token detectors; `build_token_allow_payload`.
+2. **Supervise + TUI** — `TOOL_EGRESS_TOKEN_ALLOW`; TUI suffix, modify guard,
+   required approval reason.
+3. **Addon glue** — async `request`, safe-tokens set, proposal write + async
+   poll, allow/block decision; pass `safe_tokens` into the WebSocket path.
+4. **Tests + docs** — core/supervise/TUI unit tests; README egress + supervisor
+   notes.
+
+## Open questions
+
+- Should `known_secrets` (exfiltration of provisioned `EGRESS_TOKEN_*` values)
+  be overridable at all, or only `token_patterns`? This PRD allows both, since
+  approval is an explicit operator decision and the safe-tokens set matches the
+  exact found value, but a future revision could restrict `known_secrets` to
+  reject-only.