# PRD 0062: Supervisor override for egress token blocks

- **Status:** Active
- **Author:** claude
- **Created:** 2026-06-24
- **Issue:** #261

## Summary

Give each egress route a policy for what happens when an outbound DLP detector
matches a token, via `dlp.outbound_on_match: block | redact | supervise`
(default `supervise`):

- **`supervise`** (default): route the block through the existing supervisor
  approval queue instead of returning `403` immediately. The proxy holds the
  request open until the operator approves or rejects it. On approval the
  matched token is added to an in-memory "safe tokens" set so the request, and
  any later request carrying the same token, flows through without
  re-prompting.
- **`redact`**: scrub the matched value(s) from the request and forward it,
  with no operator in the loop. For routes where a token-shaped value is noise
  the upstream doesn't need (telemetry/log sinks). Fails closed if a match
  lands on a surface redaction can't rewrite (the hostname).
- **`block`**: the original hard `403`; never overridable. For routes where a
  detected token must always stop the request.

The motivating goal is reducing friction from false positives without weakening
the default-deny posture: `supervise` keeps a human in the loop, `redact` is an
explicit per-route opt-in, and `block` stays available for sensitive routes.

## Problem

The outbound DLP detectors (`token_patterns`, `known_secrets`) are
deliberately aggressive: any string that looks like a credential is blocked
before it leaves the bottle. That is the right default, but it produces false
positives: a token-shaped value that is not actually a secret, or a credential
the agent legitimately needs to send to a declared host. Today the only
recovery is for the operator to notice the `egress DLP` 403 in the logs and
hand-edit the route's `dlp.outbound_detectors`, which disables the detector for
the whole route rather than allowing the one value.

The operator has no in-the-loop signal that a token block happened and no
fine-grained way to say "this specific value is fine."

## Goals / Success Criteria

1. An outbound DLP **token** block (a `ScanResult` carrying a matched secret
   value) creates a supervisor proposal instead of an immediate `403`.
2. The egress proxy holds the blocked request open, polling for the operator's
   response up to a bounded timeout.
3. The proposal shows the operator the host, method, path, the detector reason,
   and a **redacted** context snippet, never the raw token value.
4. On `approved`/`modified`, the matched token value is added to an in-memory
   safe-tokens set and the request proceeds normally; later requests carrying
   the same value skip the block.
5. On `rejected`, timeout, malformed response, or missing supervisor wiring,
   the request fails closed with the same `403` as today.
6. Structural blocks that carry no token value (CRLF injection) and the
   route-not-allowlisted / git blocks are unchanged: they stay hard `403`s and
   keep their existing agent-driven `allow` / `egress-block` MCP path.
7. The proxy event loop is not stalled while waiting: the wait is asynchronous,
   so other flows keep being served.

## Non-goals

- Persisting the safe-tokens set across egress restarts. It lives in process
  memory only; a restart re-prompts. (The issue explicitly defers persistence.)
- Supervising inbound (prompt-injection) blocks or WebSocket frame blocks.
  WebSocket frames still honour the safe-tokens set for already-approved values
  but cannot wait for approval (there is no response surface after upgrade).
- Generalising an approved secret across encodings. The safe-tokens set matches
  only the exact value the detector found.
- Replacing the per-route `dlp.outbound_detectors` override. That remains the
  way to turn a detector off wholesale.
- Making `redact` the default. Silently redacting a value that turns out to be
  a false positive corrupts legitimate data, so it is opt-in per route;
  `supervise` (human in the loop) stays the default.

## Design

### On-match policy

`dlp.outbound_on_match` is a per-route enum threaded from the bottle manifest
(`manifest_egress`) through the resolved route (`egress.EgressRoute`), the
rendered `routes.yaml` (`egress_render_routes`), and the addon's `Route`
(`egress_addon_core`). An unset value renders nothing and resolves to
`supervise` at request time. The `list-egress-routes` introspection endpoint
round-trips it so the agent's proposals preserve it.

**Provider routes default to `redact`.** Agent-provider routes (the agent
talking to its own LLM API, such as `api.anthropic.com` or the Codex backend)
are the worst source of token-shaped false positives because the whole
conversation payload flows through them. `egress_routes_for_bottle` fills
`outbound_on_match=redact` on any provider route that doesn't set it
explicitly, so a match there is scrubbed and forwarded rather than blocked or
queued. A provider that sets the policy keeps its choice; manifest routes are
unaffected (they default to `supervise`).
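The defaulting rule above can be sketched as follows. This is a minimal illustration, not the project's code: `RouteSketch`, its `is_provider` flag, `fill_provider_default`, and `resolve_on_match` are hypothetical names, and the real `egress.EgressRoute` and `egress_routes_for_bottle` may be shaped differently.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class RouteSketch:
    """Hypothetical stand-in for a resolved egress route."""
    host: str
    is_provider: bool = False                 # assumed flag, not from the PRD
    outbound_on_match: Optional[str] = None   # None means "not set"


def fill_provider_default(route: RouteSketch) -> RouteSketch:
    """Fill `redact` on provider routes that did not choose a policy."""
    if route.is_provider and route.outbound_on_match is None:
        return replace(route, outbound_on_match="redact")
    return route  # explicit policies and manifest routes pass through as-is


def resolve_on_match(route: RouteSketch) -> str:
    """Request-time resolution: an unset policy means `supervise`."""
    return route.outbound_on_match or "supervise"
```

Keeping the fill step separate from request-time resolution mirrors the description: an unset manifest route still renders nothing and only becomes `supervise` when a request is evaluated.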

On an outbound block the addon dispatches on the resolved policy:

- **Structural blocks always 403.** A `ScanResult` with no `matched` value
  (CRLF injection) is a hard `403` regardless of policy: there is nothing to
  redact or safelist.
- **`redact`** runs `redact_tokens` over the body, non-`host` header values,
  and path/query, then re-scans. If the re-scan is clean the (rewritten)
  request is forwarded; if a block-severity match remains (e.g. in the
  hostname, or a Unicode-evasion token that redaction can't reach) it fails
  closed with a `403`.
- **`block`** writes the `403` immediately.
- **`supervise`** runs the queue-and-wait loop below, falling back to `block`
  when supervise isn't wired for the bottle.
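The dispatch order in the list above can be sketched as a small pure function. The function name, its parameters, and the returned action strings are assumptions made for illustration; the `redact_and_rescan` callback stands in for running `redact_tokens` and re-scanning, and the fail-closed branch for an unrecognised policy is also an assumption.

```python
from typing import Callable, Optional


def dispatch_outbound_block(
    policy: Optional[str],
    matched: str,
    supervise_wired: bool,
    redact_and_rescan: Callable[[], bool],
) -> str:
    """Return "forward", "queue", or "block" for one outbound DLP hit.

    `redact_and_rescan` is a hypothetical callback that scrubs the request
    and returns True only if the follow-up scan comes back clean.
    """
    if not matched:
        # Structural block (for example CRLF injection): nothing to
        # redact or safelist, so the policy is never consulted.
        return "block"
    effective = policy or "supervise"  # unset resolves to supervise
    if effective == "block":
        return "block"
    if effective == "redact":
        return "forward" if redact_and_rescan() else "block"
    if effective == "supervise":
        return "queue" if supervise_wired else "block"
    return "block"  # assumption: unknown policy values fail closed
```

For example, `dispatch_outbound_block("redact", "tok", True, lambda: False)` yields `"block"`, which corresponds to a match that survives redaction (such as one in the hostname).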

### Detected-value plumbing

`ScanResult` gains a `matched: str = ""` field carrying the raw substring the
detector matched. The token detectors (`scan_token_patterns`,
`scan_known_secrets`) populate it; the structural CRLF detector leaves it
empty. The value stays inside the egress sidecar process: it is never written
to a log line (logs already use the redacted `context`) or to the proposal
file.

`scan_outbound` (and the token detectors it calls) accept a `safe_tokens`
set. A match whose value is in `safe_tokens` is skipped, so an approved token
no longer blocks. The scanners keep searching past a safelisted match so a
second, un-approved secret in the same request is still caught.
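A minimal sketch of the skip-and-keep-searching behaviour, under stated assumptions: `ScanResultSketch` is a stand-in with only the fields mentioned here, `TOKEN_RE` is an illustrative pattern rather than one of the project's detectors, and `scan_tokens_sketch` is a hypothetical name whose signature need not match `scan_token_patterns`.

```python
import re
from dataclasses import dataclass
from typing import AbstractSet, Optional


@dataclass
class ScanResultSketch:
    """Stand-in for ScanResult; the real class may carry more fields."""
    reason: str
    context: str        # redacted snippet, safe to log
    matched: str = ""   # raw value; must stay inside the process


# Illustrative pattern only, not one of the project's detectors.
TOKEN_RE = re.compile(r"sk-[A-Za-z0-9]{20,}")


def scan_tokens_sketch(
    text: str, safe_tokens: AbstractSet[str] = frozenset()
) -> Optional[ScanResultSketch]:
    """Return the first match that is not safelisted, or None."""
    for m in TOKEN_RE.finditer(text):
        value = m.group(0)
        if value in safe_tokens:
            continue  # approved earlier; keep scanning for other secrets
        start, end = m.span()
        context = text[max(0, start - 10):start] + "********" + text[end:end + 10]
        return ScanResultSketch("token-shaped value found", context, value)
    return None
```

Iterating over every match, instead of stopping at the first one, is what lets a second un-approved secret in the same request still produce a block.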

### Supervisor proposal

A new proposal tool constant `egress-token-allow` is added to
`supervise.TOOLS`. The egress addon writes the proposal directly to
`SUPERVISE_QUEUE_DIR` (the queue is bind-mounted into the sidecar bundle and
shared by every daemon, exactly as git-gate's `gitleaks-allow` proposal in PRD
0061 does). The proposal's `proposed_file` is a human-readable text payload:

```
egress blocked an outbound request carrying a detected token
host: api.example.com
method: POST
path: /v1/ingest
detector: OpenAI API key found in body
context: ...before ******** after...
```

The justification tells the operator to approve only if the value is a false
positive or a credential the request legitimately needs.
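One way to assemble a payload of this shape is sketched below. `build_payload_sketch` and its parameters are hypothetical; the PRD names a `build_token_allow_payload` helper but does not give its signature. The extra masking step is an assumption added as a safety net in case a caller passes an unredacted context.

```python
def build_payload_sketch(
    host: str,
    method: str,
    path: str,
    reason: str,
    context: str,
    matched: str = "",
) -> str:
    """Render the human-readable proposal text without the raw token."""
    # Defensive masking (assumption): scrub the raw value even if the
    # caller forgot to redact the context snippet.
    safe_context = context.replace(matched, "********") if matched else context
    lines = [
        "egress blocked an outbound request carrying a detected token",
        f"host: {host}",
        f"method: {method}",
        f"path: {path}",
        f"detector: {reason}",
        f"context: {safe_context}",
    ]
    return "\n".join(lines) + "\n"
```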

The addon then polls `<proposal-id>.response.json` for up to
`EGRESS_TOKEN_ALLOW_TIMEOUT_SECONDS` seconds (default 300).
`approved`/`modified` allow the request and add the value to the safe-tokens
set; `rejected`, malformed responses, and timeout fail the request closed. The
proposal and response are archived to `processed/` after a decision.

Because the wait happens inside mitmproxy's asyncio loop, the addon's
`request` hook is async and polls with `asyncio.sleep`, so concurrent flows
are unaffected.
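The wait can be sketched as a non-blocking poll loop. This is an illustration under assumptions: the `status` key in the response JSON, the poll interval, and the names `wait_for_decision` and `demo` are hypothetical, and archiving to `processed/` is omitted.

```python
import asyncio
import json
import tempfile
import time
from pathlib import Path

ALLOW_STATUSES = {"approved", "modified"}


async def wait_for_decision(
    queue_dir: Path, proposal_id: str,
    timeout_s: float = 300.0, poll_s: float = 0.5,
) -> bool:
    """Return True only on an allow decision; everything else fails closed."""
    response_path = queue_dir / f"{proposal_id}.response.json"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if response_path.exists():
            try:
                data = json.loads(response_path.read_text())
            except (OSError, ValueError):
                return False  # unreadable or malformed response
            # "status" is an assumed key name for the operator's decision.
            return isinstance(data, dict) and data.get("status") in ALLOW_STATUSES
        await asyncio.sleep(poll_s)  # yields to the loop; other flows proceed
    return False  # timed out


async def demo():
    """Exercise the approved, malformed, and timeout paths in a temp dir."""
    with tempfile.TemporaryDirectory() as tmp:
        queue = Path(tmp)
        (queue / "p1.response.json").write_text(json.dumps({"status": "approved"}))
        (queue / "p2.response.json").write_text("{not json")
        approved = await wait_for_decision(queue, "p1", timeout_s=1, poll_s=0.01)
        malformed = await wait_for_decision(queue, "p2", timeout_s=1, poll_s=0.01)
        timed_out = await wait_for_decision(queue, "p3", timeout_s=0.05, poll_s=0.01)
        return approved, malformed, timed_out
```

Because a half-written response file parses as malformed and is rejected in this sketch, a writer would need to create the file atomically (for example, write to a temporary name and rename) to avoid spurious denials.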

### Supervisor UI

`cli/supervise.py` renders `egress-token-allow` like `gitleaks-allow`: the
text payload is shown, modify is unavailable (there is no file patch to edit),
and approval prompts for a non-empty reason that is recorded in the response
notes. There is no on-disk config diff, so, like `gitleaks-allow` and
`capability-block`, it writes no egress audit-log entry.

### Failure handling

If `SUPERVISE_QUEUE_DIR` / `SUPERVISE_BOTTLE_SLUG` are unset (supervise
disabled for the bottle), the addon skips the queue and returns the existing
`403`. Any error writing the proposal or reading the response also fails
closed.
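The fail-closed rule can be sketched as a thin wrapper. `supervise_wired` and `decide_fail_closed` are hypothetical helpers; treating an empty value, or either variable missing, as "unset" is an assumption, and `ask_supervisor` stands in for the write-proposal-and-wait step.

```python
import os
from typing import Callable, Mapping


def supervise_wired(env: Mapping[str, str] = os.environ) -> bool:
    """True only when both supervise variables are present and non-empty."""
    return bool(env.get("SUPERVISE_QUEUE_DIR")) and bool(env.get("SUPERVISE_BOTTLE_SLUG"))


def decide_fail_closed(env: Mapping[str, str], ask_supervisor: Callable[[], bool]) -> bool:
    """Return True to allow the request; any doubt resolves to a block."""
    if not supervise_wired(env):
        return False  # supervise disabled for the bottle: existing 403
    try:
        return bool(ask_supervisor())
    except Exception:
        return False  # error writing the proposal or reading the response
```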

## Implementation chunks

1. **Core**: `ScanResult.matched`; thread `safe_tokens` through
   `scan_outbound` / token detectors; `build_token_allow_payload`.
2. **Supervise + TUI**: `TOOL_EGRESS_TOKEN_ALLOW`; TUI suffix, modify guard,
   required approval reason.
3. **Addon glue**: async `request`, safe-tokens set, proposal write + async
   poll, allow/block decision; pass `safe_tokens` into the WebSocket path.
4. **On-match policy**: `dlp.outbound_on_match` through manifest → render →
   addon; `redact` surface scrub with fail-closed re-scan; policy dispatch in
   the addon's outbound handler.
5. **Tests + docs**: core/supervise/TUI/manifest/render unit tests; README
   egress + supervisor notes.

## Open questions

- Should `known_secrets` (provisioned `EGRESS_TOKEN_*` exfiltration) be
  overridable at all, or only `token_patterns`? This PRD allows both, since
  approval is an explicit operator decision and the safe-tokens set matches
  only the exact found value, but a future revision could restrict
  `known_secrets` to reject-only.