PRD 0062: Supervisor override for egress token blocks
- Status: Active
- Author: claude
- Created: 2026-06-24
- Issue: #261
Summary
Give each egress route a policy for what happens when an outbound DLP detector
matches a token, via dlp.outbound_on_match: block | redact | supervise
(default supervise):
- supervise (default): route the block through the existing supervisor approval queue instead of returning 403 immediately. The proxy holds the request open until the operator approves or rejects it. On approval the matched token is added to an in-memory "safe tokens" set, so the request, and any later request carrying the same token, flows through without re-prompting.
- redact: scrub the matched value(s) from the request and forward it, with no operator in the loop. For routes where a token-shaped value is noise the upstream doesn't need (telemetry/log sinks). Fails closed if a match lands on a surface redaction can't rewrite (the hostname).
- block: the original hard 403; never overridable. For routes where a detected token must always stop.
The motivating goal is reducing friction from false positives without weakening the default-deny posture: supervise keeps a human in the loop, redact is an explicit per-route opt-in, and block stays available for sensitive routes.
Problem
The outbound DLP detectors (token_patterns, known_secrets) are
deliberately aggressive: any string that looks like a credential is blocked
before it leaves the bottle. That is the right default, but it produces false
positives — a token-shaped value that is not actually a secret, or a credential
the agent legitimately needs to send to a declared host. Today the only
recovery is for the operator to notice the egress DLP 403 in the logs and
hand-edit the route's dlp.outbound_detectors, which disables the detector for
the whole route rather than allowing the one value.
The operator has no in-the-loop signal that a token block happened and no fine-grained way to say "this specific value is fine."
Goals / Success Criteria
- An outbound DLP token block (a ScanResult carrying a matched secret value) creates a supervisor proposal instead of an immediate 403.
- The egress proxy holds the blocked request open, polling for the operator's response up to a bounded timeout.
- The proposal shows the operator the host, method, path, the detector reason, and a redacted context snippet, never the raw token value.
- On approved/modified, the matched token value is added to an in-memory safe-tokens set and the request proceeds normally; later requests carrying the same value skip the block.
- On rejected, timeout, malformed response, or missing supervisor wiring, the request fails closed with the same 403 as today.
- Structural blocks that carry no token value (CRLF injection) and the route-not-allowlisted / git blocks are unchanged: they stay hard 403s and keep their existing agent-driven allow/egress-block MCP path.
- The proxy event loop is not stalled while waiting: the wait is asynchronous, so other flows keep being served.
Non-goals
- Persisting the safe-tokens set across egress restarts. It lives in process memory only; a restart re-prompts. (The issue explicitly defers persistence.)
- Supervising inbound (prompt-injection) blocks or WebSocket frame blocks. WebSocket frames still honour the safe-tokens set for already-approved values but cannot wait for approval (there is no response surface after upgrade).
- Generalising an approved secret across encodings. The safe-tokens set matches the exact value the detector found.
- Replacing the per-route dlp.outbound_detectors override. That remains the way to turn a detector off wholesale.
- Making redact the default. Silent redaction of a true false positive corrupts legitimate data, so it is opt-in per route; supervise (human in the loop) stays the default.
Scope
In scope
The minimum cut that ships, in build order:
- Core: ScanResult.matched; thread safe_tokens through scan_outbound / the token detectors; build_token_allow_payload.
- Supervise + TUI: TOOL_EGRESS_TOKEN_ALLOW; TUI suffix, modify guard, required approval reason.
- Addon glue: async request, safe-tokens set, proposal write + async poll, allow/block decision; pass safe_tokens into the WebSocket path.
- On-match policy: dlp.outbound_on_match through manifest → render → addon; redact surface scrub with fail-closed re-scan; policy dispatch in the addon's outbound handler.
- Tests + docs: core/supervise/TUI/manifest/render unit tests; README egress + supervisor notes.
Out of scope
The deferrals enumerated under Non-goals — restart persistence, inbound /
WebSocket-frame supervision, cross-encoding generalisation, replacing
dlp.outbound_detectors, and making redact the default.
Proposed Design
New services / components
A new proposal tool constant egress-token-allow (TOOL_EGRESS_TOKEN_ALLOW)
is added to supervise.TOOLS, and the egress addon gains an in-memory
safe-tokens set plus the policy-dispatch path that drives it.
On an outbound block the addon dispatches on the resolved policy:
- Structural blocks always 403. A ScanResult with no matched value (CRLF injection) is a hard 403 regardless of policy: there is nothing to redact or safelist.
- redact runs redact_tokens over the body, non-host header values, and path/query, then re-scans. If the re-scan is clean the (rewritten) request is forwarded; if a block-severity match remains (e.g. in the hostname, or a unicode-evasion token redaction can't reach) it fails closed with a 403.
- block writes the 403 immediately.
- supervise runs the queue-and-wait loop, falling back to block when supervise isn't wired for the bottle.
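The dispatch rules above can be sketched as a small pure function. This is a minimal illustration rather than the addon's code: the ScanResult shape beyond its matched field, the name dispatch_on_match, the returned action strings, and the rescan_after_redact callback are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ScanResult:
    """Hypothetical shape; only `matched` is named in this PRD."""
    blocked: bool
    reason: str = ""
    matched: str = ""  # raw matched value; empty for structural blocks


def dispatch_on_match(
    result: ScanResult,
    policy: Optional[str],
    supervise_wired: bool,
    rescan_after_redact: Callable[[], ScanResult],
) -> str:
    """Return the action for one outbound scan result.

    Actions (illustrative names): "forward", "forward-redacted",
    "supervise", "block".
    """
    if not result.blocked:
        return "forward"
    if not result.matched:
        # Structural block (e.g. CRLF injection): nothing to redact or safelist.
        return "block"
    policy = policy or "supervise"  # unset resolves to supervise
    if policy == "redact":
        # Fail closed if a block-severity match survives redaction.
        return "block" if rescan_after_redact().blocked else "forward-redacted"
    if policy == "supervise" and supervise_wired:
        return "supervise"
    # Covers policy == "block", supervise without wiring, and (as an
    # assumption of this sketch) any unrecognised policy value.
    return "block"
```

Keeping the decision separate from the I/O (redaction, queue write, 403 response) makes each branch easy to unit-test; whether the real addon is factored this way is not stated here.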
For supervise, the addon writes the proposal directly to
SUPERVISE_QUEUE_DIR (the queue is bind-mounted into the sidecar bundle and
shared by every daemon, exactly as git-gate's gitleaks-allow proposal in PRD
0061 does). The proposal's proposed_file is a human-readable text payload
built by build_token_allow_payload:
egress blocked an outbound request carrying a detected token
host: api.example.com
method: POST
path: /v1/ingest
detector: OpenAI API key found in body
context: ...before ******** after...
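A payload of this shape could be assembled roughly as follows. This is a sketch under stated assumptions: the PRD names build_token_allow_payload but not its signature, so the parameters here, the redact_context helper, the mask string, and the context width are hypothetical.

```python
MASK = "********"  # assumed mask; the sample payload shows eight asterisks


def redact_context(text: str, matched: str, width: int = 20) -> str:
    """Return a short snippet around the match with the value masked.

    Every occurrence of the value is masked before the window is cut,
    so a repeated token cannot leak through the surrounding context.
    """
    if not matched or matched not in text:
        return ""
    masked = text.replace(matched, MASK)
    idx = masked.find(MASK)
    before = masked[max(0, idx - width):idx]
    after = masked[idx + len(MASK):idx + len(MASK) + width]
    return f"...{before}{MASK}{after}..."


def build_token_allow_payload(
    host: str, method: str, path: str, reason: str, context: str
) -> str:
    """Build the operator-facing text; it never receives the raw token."""
    return "\n".join(
        [
            "egress blocked an outbound request carrying a detected token",
            f"host: {host}",
            f"method: {method}",
            f"path: {path}",
            f"detector: {reason}",
            f"context: {context}",
        ]
    )
```

Passing only the already-redacted context into the builder keeps the raw value out of the function that writes to the queue, which matches the requirement that the token never reaches the proposal file.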
The justification tells the operator to approve only if the value is a false
positive or a credential the request legitimately needs. The addon then polls
<proposal-id>.response.json for EGRESS_TOKEN_ALLOW_TIMEOUT_SECONDS (default
300). approved/modified allow the request and add the value to the
safe-tokens set; rejected, malformed responses, and timeout fail the request
closed. The proposal + response are archived to processed/ after a decision.
Because the wait happens inside mitmproxy's asyncio loop, the addon's request
hook is async and polls with asyncio.sleep, so concurrent flows are
unaffected.
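The wait described above might look like the following coroutine. It is a sketch, not the addon's implementation: the function name, the poll interval, and the assumption that the response file is a JSON object with a "status" key are all hypothetical.

```python
import asyncio
import json
import time
from pathlib import Path


async def wait_for_decision(
    queue_dir: Path,
    proposal_id: str,
    timeout: float = 300.0,
    interval: float = 0.5,
) -> bool:
    """Poll for <proposal-id>.response.json; True only on approval.

    Rejection, a malformed or unreadable response, and timeout all
    return False, so the caller fails closed.
    """
    response = queue_dir / f"{proposal_id}.response.json"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if response.exists():
            try:
                data = json.loads(response.read_text())
                status = data.get("status")  # assumed response field
            except (OSError, ValueError, AttributeError):
                return False  # malformed or unreadable: fail closed
            return status in ("approved", "modified")
        # Yield to the event loop so other flows keep being served.
        await asyncio.sleep(interval)
    return False
```

Because a half-written response would be read as malformed and rejected, this sketch assumes the writer creates the file atomically (for example by writing a temporary file and renaming it).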
Existing code touched
- Policy threading. dlp.outbound_on_match is a per-route enum threaded from the bottle manifest (manifest_egress) through the resolved route (egress.EgressRoute), the rendered routes.yaml (egress_render_routes), and the addon's Route (egress_addon_core). Unset renders nothing and resolves to supervise at request time. The list-egress-routes introspection endpoint round-trips it so the agent's proposals preserve it.
- Provider-route default. Agent-provider routes (the agent talking to its own LLM API: api.anthropic.com, the Codex backend, etc.) are the worst source of token-shaped false positives because the whole conversation payload flows through them. egress_routes_for_bottle fills outbound_on_match=redact on any provider route that doesn't set it explicitly; a provider that sets the policy keeps its choice, and manifest routes are unaffected (they default to supervise).
- Scanners. scan_outbound (and the token detectors scan_token_patterns / scan_known_secrets it calls) accept a safe_tokens set. A match whose value is in safe_tokens is skipped, so an approved token no longer blocks; the scanners keep searching past a safelisted match so a second, un-approved secret in the same request is still caught. The WebSocket path is passed the same safe_tokens set.
- Supervisor UI. cli/supervise.py renders egress-token-allow like gitleaks-allow: the text payload is shown, modify is unavailable (there is no file patch to edit), and approval prompts for a non-empty reason recorded in the response notes. There is no on-disk config diff, so, like gitleaks-allow and capability-block, it writes no egress audit-log entry.
- Failure handling. If SUPERVISE_QUEUE_DIR / SUPERVISE_BOTTLE_SLUG are unset (supervise disabled for the bottle), the addon skips the queue and returns the existing 403. Any error writing the proposal or reading the response also fails closed.
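The safe-tokens behaviour of the scanners can be illustrated with a toy detector. This is only a sketch: the regex, the ScanResult fields other than matched, and the reason string are assumptions, and the real scan_token_patterns presumably covers many more patterns.

```python
import re
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical pattern for the example; not the project's detector list.
TOKEN_RE = re.compile(r"sk-[A-Za-z0-9]{8,}")


@dataclass
class ScanResult:
    blocked: bool
    reason: str = ""
    matched: str = ""  # raw matched value; stays inside the process


def scan_token_patterns(
    text: str, safe_tokens: Optional[Set[str]] = None
) -> ScanResult:
    """Report the first token-shaped value that is not safelisted."""
    safe = safe_tokens or set()
    for match in TOKEN_RE.finditer(text):
        value = match.group(0)
        if value in safe:
            # Approved earlier: skip it, but keep searching so a second,
            # un-approved secret in the same request is still caught.
            continue
        return ScanResult(True, "token-shaped value found", value)
    return ScanResult(False)
```

For example, with a body holding two token-shaped values, safelisting the first still leaves the second reported:

```python
body = "a=sk-AAAAAAAA1 b=sk-BBBBBBBB2"
first = scan_token_patterns(body)
second = scan_token_patterns(body, {first.matched})
```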
Data model changes
- New per-route manifest field dlp.outbound_on_match: block | redact | supervise, rendered into routes.yaml (omitted when unset).
- ScanResult gains a matched: str = "" field carrying the raw substring the detector matched. The token detectors populate it; the structural CRLF detector leaves it empty. The value stays inside the egress sidecar process; it is never written to a log line (logs use the redacted context) nor to the proposal file.
- Proposal text payload (above) plus <proposal-id>.response.json in SUPERVISE_QUEUE_DIR, archived to processed/ after a decision.
- New env var EGRESS_TOKEN_ALLOW_TIMEOUT_SECONDS (default 300).
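As a rough illustration of the field and env-var semantics listed above (omitted when unset, resolved to supervise at request time, timeout defaulting to 300), consider the sketch below. The function names and the rendered key layout are hypothetical, and rejecting unknown values at render time and falling back to the default on an unparsable timeout are assumptions, not behaviour this PRD specifies.

```python
import math
import os
from typing import Mapping, Optional

ON_MATCH_VALUES = ("block", "redact", "supervise")
DEFAULT_ON_MATCH = "supervise"
DEFAULT_TIMEOUT_SECONDS = 300.0


def render_route_dlp(on_match: Optional[str]) -> dict:
    """Render the dlp fragment for one route; unset renders nothing."""
    if on_match is None:
        return {}
    if on_match not in ON_MATCH_VALUES:
        # Assumption: invalid values are rejected at render time.
        raise ValueError(f"invalid dlp.outbound_on_match: {on_match!r}")
    return {"outbound_on_match": on_match}


def resolve_on_match(rendered_dlp: Mapping[str, str]) -> str:
    """Resolve the policy at request time; a missing key means supervise."""
    return rendered_dlp.get("outbound_on_match", DEFAULT_ON_MATCH)


def allow_timeout_seconds(environ: Mapping[str, str] = os.environ) -> float:
    """Read EGRESS_TOKEN_ALLOW_TIMEOUT_SECONDS, defaulting to 300."""
    raw = environ.get("EGRESS_TOKEN_ALLOW_TIMEOUT_SECONDS", "")
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_TIMEOUT_SECONDS  # assumption: bad input uses default
    if not math.isfinite(value) or value <= 0:
        return DEFAULT_TIMEOUT_SECONDS
    return value
```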
External dependencies
None. Reuses the existing supervisor queue (SUPERVISE_QUEUE_DIR) and the
mitmproxy addon framework already in the egress sidecar.
Open questions
- Should known_secrets (provisioned EGRESS_TOKEN_* exfiltration) be override-able at all, or only token_patterns? This PRD allows both (approval is an explicit operator decision and the safe-tokens set matches the exact found value), but a future revision could restrict known_secrets to reject-only.
References
- Issue #261
- PRD 0061: gitleaks-allow supervisor proposal pattern this reuses.