PRD 0052: Egress DLP addon

  • Status: Active
  • Author: claude
  • Created: 2026-06-05
  • Issue: #195

Summary

With pipelock removed (PR #193), the egress proxy no longer performs DLP scanning on traffic to or from the agent. This PRD implements a replacement directly inside the mitmproxy egress addon: per-route DLP detectors that scan outbound requests for credential leakage and inbound responses for prompt injection attempts.

The manifest route schema is also upgraded in this PRD from the flat path_allowlist field to a structured matches block modelled on the Kubernetes Gateway API HTTPRoute match vocabulary. This upgrade is a hard cutover — no compatibility shim for the old format. The rationale and format survey are in the YAML route matching formats research doc. DLP detectors attach to the new matches-based routes directly.

The design follows the recommendation in the DLP research document (PR #192) and covers all three remaining implementation phases from that plan:

  1. Token pattern detection (Phase 1a)
  2. Known-secrets detection (Phase 1b)
  3. Naive prompt injection detection (Phase 2)

Problem

Pipelock was removed because it could not support per-route response scanning, which ruled out selective DLP policies (e.g., skipping scanning on .whl downloads while keeping it on API calls). Removing it left the egress proxy with no DLP capability at all. The egress addon already holds the per-route logic for path allowlisting and credential injection, so DLP rules belong in the same place.

The existing path_allowlist field is also limiting: it only supports path prefixes, with no way to express exact-path, regex, method, or header constraints. The Gateway API match vocabulary is a well-specified, widely deployed standard that covers all of these without inventing new syntax.

Goals / Success Criteria

  1. Outbound request bodies and headers are scanned for known token patterns (AWS, GitHub, Anthropic, etc.) before the request reaches the upstream. Matches are blocked immediately.
  2. Outbound request bodies are scanned for provisioned secrets that the agent should not have direct access to. Matches are blocked immediately.
  3. Inbound response bodies are scanned for prompt disclosure and jailbreak signals. High-confidence matches are blocked; medium-confidence matches emit a log warning and are forwarded.
  4. DLP scanning is enabled by default on every route. Individual routes can selectively disable outbound detectors, inbound detectors, or both via a dlp block in the manifest.
  5. All detector logic lives in egress_addon_core.py (pure Python, no mitmproxy dependency) and is covered by unit tests on the host.
  6. Each route's matches block supports path (exact/prefix/regex), HTTP method, and header predicates using Gateway API match semantics.
  7. The manifest change is a hard cutover: path_allowlist is removed with no fallback, no deprecation alias, and no special-case error message for old-format manifests. Manifests that still use path_allowlist fail validation at load time with the generic unknown-key error, the same as any other unrecognised key today.

Non-goals

  • LLM-based semantic prompt injection detection (explicitly deferred to a potential Phase 2b per the research doc).
  • Entropy-based secret detection (excluded from scope; too many false positives on binary API responses and compressed payloads).
  • BIP-39 seed-phrase detection.
  • Generic DLP (credit cards, SSNs, PII) — scope is narrow: AI/credential exfil relevant to agent containment.
  • Changes to the cred-proxy sidecar.
  • Streaming response scanning (scan buffered response body only).
  • Glob-style path matching — regex covers every case glob would handle without adding a third path-matching language.

Design

Route matching: Gateway API matches vocabulary

The existing path_allowlist field is replaced by a matches list. The vocabulary mirrors Kubernetes Gateway API HTTPRouteMatch (see the route matching research doc for a full format survey and rationale). Gateway API was chosen because it is spec-backed, implementation-tested across multiple proxies, and its {type, value} pattern is consistent and schema-validatable.

AND/OR semantics (same as Gateway API):

  • Predicates within a single matches entry are ANDed.
  • Multiple entries in the matches list are ORed — the route matches if any entry matches.
egress:
  routes:
    # Bare route — all traffic to this host is forwarded (no path/method/header
    # constraints). Equivalent to the old path_allowlist-omitted case.
    - host: api.anthropic.com
      auth:
        scheme: Bearer
        token_ref: EGRESS_TOKEN_0

    # Two match entries (OR): GET/HEAD under the /packages/ prefix, OR POST to exactly /upload
    - host: files.pythonhosted.org
      matches:
        - paths:
            - type: prefix
              value: /packages/
          methods: [GET, HEAD]
        - paths:
            - type: exact
              value: /upload
          methods: [POST]
      dlp:
        inbound_detectors: false   # skip response scanning (binary downloads)

    # Header + regex path: only JSON requests (Content-Type: application/json) to versioned endpoints
    - host: internal-api.corp
      matches:
        - paths:
            - type: regex
              value: "^/v[0-9]+/"
          headers:
            - name: Content-Type
              type: exact
              value: application/json
      dlp:
        outbound_detectors: false
        inbound_detectors: false
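The AND/OR rule can be shown with a small evaluator. This is a minimal sketch, not the addon's code. Each matches entry is modelled as a list of predicate functions over a hypothetical (method, path) tuple. An entry matches only when all of its predicates hold, and a route matches when any entry does. An empty entry list accepts every request, as in the bare-route example.

```python
from __future__ import annotations

from typing import Callable, Sequence, Tuple

# Hypothetical request shape for illustration: (method, path).
Request = Tuple[str, str]
Predicate = Callable[[Request], bool]


def entry_matches(predicates: Sequence[Predicate], req: Request) -> bool:
    # Predicates inside one matches entry are ANDed.
    return all(p(req) for p in predicates)


def route_matches(entries: Sequence[Sequence[Predicate]], req: Request) -> bool:
    # A route with no matches entries accepts every request (the bare-route case).
    if not entries:
        return True
    # Entries are ORed: the route matches if any single entry matches.
    return any(entry_matches(entry, req) for entry in entries)


# The files.pythonhosted.org route above, hand-translated into predicates.
pypi_entries = [
    [lambda r: r[1].startswith("/packages/"), lambda r: r[0] in ("GET", "HEAD")],
    [lambda r: r[1] == "/upload", lambda r: r[0] == "POST"],
]
```

With these entries, a POST to /packages/x.whl is rejected. The path satisfies the first entry but the method does not, and the second entry fails on the path.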

Path matching types

| type | Semantics |
|---|---|
| exact | Full path must equal value exactly |
| prefix | Path must start with value at a segment boundary: value /api/v1 matches /api/v1 and /api/v1/foo, but not /api/v10 |
| regex | RE2 regex; the pattern is rejected at load time if it fails to compile. Use it for wildcard needs, e.g. /api/[^/]+/data instead of a glob |

type defaults to prefix when omitted, preserving the semantics of the old path_allowlist.
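The next block is a minimal sketch of the three path types. It makes several assumptions that this PRD does not pin down:

- Python's re module stands in for RE2. re accepts constructs that RE2 rejects, such as backreferences, so a faithful implementation needs an RE2 binding or extra validation.
- regex uses search rather than full-match semantics, which is what the ^/v[0-9]+/ example above implies.
- A trailing slash on a prefix value is ignored, in line with Gateway API's element-wise PathPrefix matching.

PathMatch mirrors the dataclass defined later in this PRD. parse_path_match and path_matches are hypothetical helper names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PathMatch:
    type: str  # "exact" | "prefix" | "regex"
    value: str


def parse_path_match(raw: dict) -> PathMatch:
    # type defaults to prefix when omitted.
    pm = PathMatch(type=raw.get("type", "prefix"), value=raw["value"])
    if pm.type not in ("exact", "prefix", "regex"):
        raise ValueError(f"unknown path match type: {pm.type!r}")
    if pm.type == "regex":
        re.compile(pm.value)  # raises re.error at load time for a bad pattern
    return pm


def path_matches(pm: PathMatch, path: str) -> bool:
    if pm.type == "exact":
        return path == pm.value
    if pm.type == "prefix":
        # A trailing slash on the value is ignored; "/" alone matches everything.
        prefix = pm.value.rstrip("/")
        if not prefix:
            return True
        # Segment boundary: the path is the prefix itself or continues with "/".
        return path == prefix or path.startswith(prefix + "/")
    # regex: search semantics, so the pattern anchors itself with "^" when needed.
    return re.search(pm.value, path) is not None
```

Comparing against prefix + "/" instead of calling startswith(value) directly is what rejects /api/v10 for the value /api/v1.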

Method matching

methods is a list of HTTP method names, case-insensitive at parse time — get, GET, and Get are all accepted and stored as uppercase internally. An absent or empty methods list means all methods are permitted.
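Method handling comes down to normalising once at parse time and doing a membership test per request. In this sketch the helper names are hypothetical. The PRD does not say whether unrecognised method names are rejected, so the sketch does no validation.

```python
from __future__ import annotations

from typing import Iterable, Optional, Tuple


def parse_methods(raw: Optional[Iterable[str]]) -> Tuple[str, ...]:
    # Accept any casing in the manifest and store uppercase.
    # None and [] both mean "any method".
    return tuple(m.upper() for m in (raw or ()))


def method_matches(methods: Tuple[str, ...], request_method: str) -> bool:
    # An empty tuple permits every method.
    return not methods or request_method.upper() in methods
```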

Header matching

headers is a list of {name, value, type} objects. ALL listed headers must match (AND semantics). To OR on header values, use multiple matches entries.

| type | Semantics |
|---|---|
| exact | Header value equals value (default when type is omitted) |
| regex | Header value matches an RE2 regex |
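Below is a sketch of header predicates with AND semantics. It assumes the following:

- Header names compare case-insensitively, as HTTP header names do.
- A listed header that is absent from the request never matches.
- regex uses search semantics, with Python's re standing in for RE2.

HeaderMatch mirrors the dataclass later in this PRD; the function names are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class HeaderMatch:
    name: str
    value: str
    type: str = "exact"  # "exact" | "regex"


def header_matches(hm: HeaderMatch, headers: Mapping[str, str]) -> bool:
    # Assumption: header names compare case-insensitively.
    wanted = hm.name.lower()
    actual = next((v for k, v in headers.items() if k.lower() == wanted), None)
    if actual is None:
        return False  # a required header that is absent never matches
    if hm.type == "exact":
        return actual == hm.value
    if hm.type == "regex":
        return re.search(hm.value, actual) is not None
    raise ValueError(f"unknown header match type: {hm.type!r}")


def headers_match(hms: Tuple[HeaderMatch, ...], headers: Mapping[str, str]) -> bool:
    # AND semantics: every listed header must match; an empty list matches anything.
    return all(header_matches(hm, headers) for hm in hms)
```

Exact matching has a consequence worth knowing. The internal-api.corp example above would not match a request sent with Content-Type: application/json; charset=utf-8. A regex such as ^application/json\b covers both forms.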

Manifest schema — dlp block

Each egress.routes entry gains an optional dlp key alongside matches and auth:

egress:
  routes:
    - host: api.anthropic.com
      # dlp omitted → all detectors on (default)

    - host: files.pythonhosted.org
      dlp:
        inbound_detectors: false   # skip response scanning (binary downloads)

    - host: internal-docs.corp
      dlp:
        outbound_detectors: false
        inbound_detectors: false   # trusted internal, no scanning

outbound_detectors controls scanning of the request body + headers leaving the agent. inbound_detectors controls scanning of the response body arriving from the upstream.

Valid values per field:

  • Omitted (or null) — default: all detectors active.
  • false — scanning disabled for this direction on this route.
  • A list of detector names — only the listed detectors run.

Named outbound detectors: token_patterns, known_secrets. Named inbound detectors: naive_injection_detection.

The manifest parser (manifest_egress.py) validates the dlp block and rejects unknown detector names.
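The validation rules above can be sketched as a parser that normalises each field to the EgressRoute representation: None for all detectors, or a tuple of names for a subset. Several choices here are assumptions:

- Mapping false to an empty tuple. The PRD only fixes None as "all enabled".
- Rejecting true, which is not listed as a valid value.
- The function names.

```python
from __future__ import annotations

from typing import Optional, Tuple

OUTBOUND_DETECTORS = frozenset({"token_patterns", "known_secrets"})
INBOUND_DETECTORS = frozenset({"naive_injection_detection"})


def parse_detector_field(raw: object, allowed: frozenset, field: str) -> Optional[Tuple[str, ...]]:
    if raw is None:
        return None  # omitted or null: all detectors active
    if raw is False:
        return ()  # scanning disabled for this direction
    if isinstance(raw, list) and all(isinstance(n, str) for n in raw):
        unknown = sorted(set(raw) - allowed)
        if unknown:
            raise ValueError(f"dlp.{field}: unknown detector(s): {', '.join(unknown)}")
        return tuple(raw)
    raise ValueError(f"dlp.{field}: expected false, null, or a list of detector names")


def parse_dlp(block: Optional[dict]):
    """Return (outbound_detectors, inbound_detectors) for one route."""
    block = block or {}
    extra = set(block) - {"outbound_detectors", "inbound_detectors"}
    if extra:
        raise ValueError(f"dlp: unknown key(s): {', '.join(sorted(extra))}")
    return (
        parse_detector_field(block.get("outbound_detectors"), OUTBOUND_DETECTORS, "outbound_detectors"),
        parse_detector_field(block.get("inbound_detectors"), INBOUND_DETECTORS, "inbound_detectors"),
    )
```

Under this mapping an empty list ends up identical to false. Whether [] should be accepted at all is a choice the real parser has to make.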

Manifest schema — git block

HTTPS Git clone/fetch traffic is not implied by a host-level egress route. Smart HTTP Git fetch uses git-upload-pack, which can transfer large repo packfiles and bypass the git-gate mirror path. It is therefore blocked by default and must be explicitly enabled per route:

egress:
  routes:
    - host: github.com
      git:
        fetch: true

git.fetch: true permits read-only smart HTTP clone/fetch requests (git-upload-pack) after the normal host and matches checks pass. HTTPS Git push (git-receive-pack) remains blocked by the egress addon.
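Here is a sketch of how the addon might tell read-only Git fetches apart from pushes. The endpoints come from Git's smart HTTP protocol. A client first sends GET .../info/refs?service=git-upload-pack (or service=git-receive-pack for a push), then POSTs to .../git-upload-pack or .../git-receive-pack.

The helper names are hypothetical. The sketch expects the request path together with its query string, and is meant to run only after the normal host and matches checks have passed.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import parse_qs, urlsplit


def classify_git_request(method: str, path_and_query: str) -> Optional[str]:
    """Return "fetch", "push", or None for requests that are not smart HTTP Git."""
    parts = urlsplit(path_and_query)
    path = parts.path
    if method == "GET" and path.endswith("/info/refs"):
        # Ref discovery: the service query parameter says which side is starting.
        service = parse_qs(parts.query).get("service", [""])[0]
        return {"git-upload-pack": "fetch", "git-receive-pack": "push"}.get(service)
    if method == "POST" and path.endswith("/git-upload-pack"):
        return "fetch"
    if method == "POST" and path.endswith("/git-receive-pack"):
        return "push"
    return None


def git_allowed(kind: Optional[str], git_fetch: bool) -> bool:
    if kind == "push":
        return False  # git-receive-pack stays blocked regardless of route config
    if kind == "fetch":
        return git_fetch  # git-upload-pack needs git.fetch: true on the route
    return True  # not Git traffic: the normal route decision stands
```

The sketch blocks the receive-pack discovery GET as well as the POST. A push therefore fails before any pack data leaves the container. It recognises only the smart protocol: a dumb-HTTP fetch (plain GETs of repository files) is not classified as Git traffic here.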

EgressRoute changes

EgressRoute replaces PathAllowlist with Matches and gains two new DLP fields. MatchEntry captures one AND-predicate block:

@dataclass(frozen=True)
class PathMatch:
    type: str   # "exact" | "prefix" | "regex"
    value: str


@dataclass(frozen=True)
class HeaderMatch:
    name: str
    value: str
    type: str = "exact"   # "exact" | "regex"


@dataclass(frozen=True)
class MatchEntry:
    paths: tuple[PathMatch, ...] = ()     # empty = match any path
    methods: tuple[str, ...] = ()         # empty = match any method (uppercase)
    headers: tuple[HeaderMatch, ...] = () # empty = match any headers


@dataclass(frozen=True)
class EgressRoute:
    Host: str
    Matches: tuple[MatchEntry, ...] = ()  # empty = match all requests
    AuthScheme: str = ""
    TokenRef: str = ""
    Role: tuple[str, ...] = ()
    GitFetch: bool = False
    OutboundDetectors: tuple[str, ...] | None = None   # None = all enabled
    InboundDetectors: tuple[str, ...] | None = None    # None = all enabled

manifest_egress.py's from_dict parses the new matches block and dlp block; path_allowlist is no longer a recognised key and will be rejected by the unknown-key check.
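The cutover relies on the existing unknown-key check rather than a dedicated migration error, so an old manifest fails the same way as any typo. The block below is a minimal sketch of that check. The known-key set is an assumption assembled from the YAML examples in this PRD; the real set lives in manifest_egress.py.

```python
from __future__ import annotations

from typing import Any, Mapping

# Assumption: only the route keys that appear in this PRD's YAML examples. The real
# set, including whatever key feeds EgressRoute.Role, is defined in manifest_egress.py.
KNOWN_ROUTE_KEYS = frozenset({"host", "matches", "auth", "git", "dlp"})


def check_route_keys(raw: Mapping[str, Any]) -> None:
    unknown = sorted(set(raw) - KNOWN_ROUTE_KEYS)
    if unknown:
        raise ValueError(
            f"egress route {raw.get('host', '?')!r}: unknown key(s): {', '.join(unknown)}"
        )
```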

Route changes in egress_addon_core.py

The addon-side Route and its helper types mirror the manifest-side changes. match_route is extended to evaluate the Matches list:

@dataclass(frozen=True)
class Route:
    host: str
    matches: tuple[MatchEntry, ...] = ()
    auth_scheme: str = ""
    token_env: str = ""
    git_fetch: bool = False
    outbound_detectors: tuple[str, ...] | None = None
    inbound_detectors: tuple[str, ...] | None = None

decide() first calls match_route (the host lookup, unchanged), then evaluates the route's match entries in order; a route with no matches entries accepts every request. The prefix path type uses segment-boundary checking: /api/v1 matches /api/v1/foo but not /api/v10.
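The decision flow can be sketched as a host lookup followed by ordered entry evaluation. This sketch simplifies and assumes the following:

- Routes are kept in a dict keyed by lowercase host.
- Each match entry is collapsed into a single predicate over method and path.
- "block" is the name of the non-forward action; only "forward" appears in this PRD.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Tuple


@dataclass(frozen=True)
class Route:
    host: str
    # Simplification: each match entry is one predicate over (method, path).
    matches: Tuple[Callable[[str, str], bool], ...] = ()


def match_route(routes: Mapping[str, Route], host: str) -> Optional[Route]:
    # Assumption: exact, case-insensitive host lookup.
    return routes.get(host.lower())


def decide(routes: Mapping[str, Route], host: str, method: str, path: str) -> str:
    route = match_route(routes, host)
    if route is None:
        return "block"  # no route for this host
    if not route.matches:
        return "forward"  # no matches entries: every request passes
    # Entries are evaluated in order and the first hit wins (any() short-circuits).
    return "forward" if any(entry(method, path) for entry in route.matches) else "block"
```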

Detector interface

Each detector is a pure function:

def scan(body: str | bytes, *, env: Mapping[str, str] | None = None) -> ScanResult | None:
    ...  # treat env=None as an empty mapping; avoids a shared mutable {} default

ScanResult carries:

@dataclass(frozen=True)
class ScanResult:
    severity: str   # "block" or "warn"
    reason: str

scan returns None if the body is clean, ScanResult otherwise.
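To make the interface concrete, here is a hypothetical detector that satisfies it. It uses None instead of {} as the env default, because Python evaluates a default argument once and shares it across calls. It also decodes bytes with errors="replace", so malformed UTF-8 cannot raise. DLP_CANARY is an invented variable used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal, Mapping, Optional, Union


@dataclass(frozen=True)
class ScanResult:
    severity: Literal["block", "warn"]
    reason: str


def as_text(body: Union[str, bytes]) -> str:
    # Decode leniently: malformed UTF-8 must not crash a detector or skip the scan.
    return body.decode("utf-8", errors="replace") if isinstance(body, bytes) else body


def scan(body: Union[str, bytes], *, env: Optional[Mapping[str, str]] = None) -> Optional[ScanResult]:
    # Hypothetical detector: block if the value of DLP_CANARY (an assumed env var) appears.
    canary = (env or {}).get("DLP_CANARY")
    if canary and canary in as_text(body):
        return ScanResult(severity="block", reason="canary value found in body")
    return None
```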

Detector: token_patterns

Regex patterns for well-known credential formats are applied to the outbound request body and to the Authorization header. The header is scanned before the addon strips it, so the scan sees any credential the agent tried to smuggle in:

| Token type | Pattern |
|---|---|
| AWS access key | `AKIA[0-9A-Z]{16}` |
| GitHub token (classic) | `ghp_[A-Za-z0-9_]{36}` |
| GitHub fine-grained | `github_pat_[A-Za-z0-9_]{82}` |
| Anthropic API key | `sk-ant-[A-Za-z0-9\-_]{93}` |
| OpenAI API key | `sk-[A-Za-z0-9]{48}` |
| Stripe live key | `sk_live_[A-Za-z0-9]{24}` |
| Generic Bearer JWT | `Bearer\s+[A-Za-z0-9._\-]{50,}` |

Action: "block" on any match. No tolerance — a credential in an outbound request is always a violation.
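Below is a minimal sketch of the token_patterns detector, using the regexes from the table above with Python's re. The function name is an assumption, as is the extra authorization parameter (the Authorization header value, scanned before the body). The reason string names the token type but never echoes the matched value, so the block message itself cannot leak the credential.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class ScanResult:
    severity: str  # "block" or "warn"
    reason: str


# Patterns copied from the table above.
TOKEN_PATTERNS = {
    "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "GitHub token (classic)": re.compile(r"ghp_[A-Za-z0-9_]{36}"),
    "GitHub fine-grained": re.compile(r"github_pat_[A-Za-z0-9_]{82}"),
    "Anthropic API key": re.compile(r"sk-ant-[A-Za-z0-9\-_]{93}"),
    "OpenAI API key": re.compile(r"sk-[A-Za-z0-9]{48}"),
    "Stripe live key": re.compile(r"sk_live_[A-Za-z0-9]{24}"),
    "Generic Bearer JWT": re.compile(r"Bearer\s+[A-Za-z0-9._\-]{50,}"),
}


def scan_token_patterns(body: Union[str, bytes], authorization: str = "") -> Optional[ScanResult]:
    text = body.decode("utf-8", errors="replace") if isinstance(body, bytes) else body
    for where, haystack in (("Authorization header", authorization), ("request body", text)):
        for name, pattern in TOKEN_PATTERNS.items():
            if pattern.search(haystack):
                # Report the token type only; never echo the matched secret.
                return ScanResult("block", f"{name} found in {where}")
    return None
```

As written, the Generic Bearer JWT pattern also fires on the agent's own Authorization: Bearer header whenever that header carries 50 or more token characters, because the header is scanned before the strip. An agent that relies on credential injection would therefore need to send no Authorization header, or only a short placeholder.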

Detector: known_secrets

At request time the egress addon has access to os.environ, which includes all token_env values declared by route auth blocks. The detector:

  1. Collects all EGRESS_TOKEN_* values from the environment (the naming contract established by manifest_egress.py's TokenRef rendering).
  2. For each secret value, derives encoded variants: raw, base64, URL-encoded, hex.
  3. Scans the outbound request body for any variant.

Action: "block" on match.

This detector does not accept a custom detector name in the YAML — it is always named known_secrets. The environment is passed in via the env keyword argument to scan.
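Here is a sketch of known_secrets following the three steps above. It goes beyond the PRD in three ways:

- base64 variants drop their = padding, so both padded and unpadded encodings match (the padded form contains the unpadded one).
- Empty values are skipped, because the empty string is a substring of every body.
- The function name is hypothetical.

```python
from __future__ import annotations

import base64
import urllib.parse
from dataclasses import dataclass
from typing import Mapping, Optional, Union


@dataclass(frozen=True)
class ScanResult:
    severity: str  # "block" or "warn"
    reason: str


def secret_variants(secret: str) -> set[str]:
    raw = secret.encode("utf-8")
    return {
        secret,                                             # raw
        base64.b64encode(raw).decode("ascii").rstrip("="),  # base64, padding dropped
        urllib.parse.quote(secret, safe=""),                # URL-encoded
        raw.hex(),                                          # hex (lowercase)
    }


def scan_known_secrets(body: Union[str, bytes], *, env: Optional[Mapping[str, str]] = None) -> Optional[ScanResult]:
    text = body.decode("utf-8", errors="replace") if isinstance(body, bytes) else body
    for name, value in (env or {}).items():
        # EGRESS_TOKEN_* is the naming contract from manifest_egress.py. Empty values
        # are skipped because "" is a substring of every body.
        if not name.startswith("EGRESS_TOKEN_") or not value:
            continue
        if any(variant in text for variant in secret_variants(value)):
            # Name the env var, never the secret itself.
            return ScanResult("block", f"value of {name} found in request body")
    return None
```

This does not catch spellings outside the four listed variants, such as uppercase hex, URL-safe base64, or a secret base64-encoded together with surrounding data. That gap is the subject of the encoding-breadth open question below.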

Detector: naive_injection_detection

Pattern-based inbound response scanner. Uses two tiers:

Tier 1 — BLOCK (credential + disclosure together):

  • Response contains a token-pattern match (reuses token_patterns regex set) AND a prompt-disclosure phrase (e.g., system prompt, my instructions are, hidden rules).

Tier 2 — WARN (multiple jailbreak signals):

  • Two or more jailbreak phrases detected (e.g., ignore previous, forget everything, pretend you are, act as).
  • OR explicit prompt disclosure (system prompt:) without a credential.

Tier 3 — ALLOW:

  • Single jailbreak keyword without additional context.
  • Common documentation phrases.

See the DLP research doc for the full phrase lists and pseudocode.
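A simplified sketch of the three tiers follows. Its limitations:

- The phrase lists contain only the examples given above; the full lists are in the DLP research doc.
- The token regex is a two-pattern stand-in for the token_patterns set.
- Word-boundary matching is a choice made here, so that, for example, "act as" does not fire inside "exact assignment".

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Iterable, Optional, Union


@dataclass(frozen=True)
class ScanResult:
    severity: str  # "block" or "warn"
    reason: str


# Illustrative phrases taken from the examples above; the full lists live in the research doc.
DISCLOSURE_PHRASES = ("system prompt", "my instructions are", "hidden rules")
JAILBREAK_PHRASES = ("ignore previous", "forget everything", "pretend you are", "act as")
# Stand-in for the token_patterns regex set (two patterns only, for brevity).
TOKEN_RE = re.compile(r"AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9_]{36}")


def count_phrases(phrases: Iterable[str], lowered: str) -> int:
    # Word boundaries keep "act as" from firing inside "exact assignment".
    return sum(1 for p in phrases if re.search(rf"\b{re.escape(p)}\b", lowered))


def scan_naive_injection(body: Union[str, bytes]) -> Optional[ScanResult]:
    text = body.decode("utf-8", errors="replace") if isinstance(body, bytes) else body
    lowered = text.lower()
    has_token = TOKEN_RE.search(text) is not None
    disclosure = count_phrases(DISCLOSURE_PHRASES, lowered) > 0
    jailbreaks = count_phrases(JAILBREAK_PHRASES, lowered)

    if has_token and disclosure:  # Tier 1: credential + disclosure together
        return ScanResult("block", "credential alongside prompt-disclosure phrase")
    if jailbreaks >= 2:  # Tier 2: multiple jailbreak signals
        return ScanResult("warn", f"{jailbreaks} jailbreak phrases")
    if "system prompt:" in lowered:  # Tier 2: explicit disclosure without a credential
        return ScanResult("warn", "explicit prompt disclosure")
    return None  # Tier 3: allow
```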

Wiring into egress_addon.py

The existing request hook is extended with an outbound scan, and a new response hook is added for inbound scanning:

def request(self, flow: http.HTTPFlow) -> None:
    # ... existing match + auth-injection logic ...
    # After route decision, if action == "forward":
    result = scan_outbound(route, flow.request, os.environ)
    if result and result.severity == "block":
        flow.response = http.Response.make(403, result.reason.encode(), ...)
        return

def response(self, flow: http.HTTPFlow) -> None:
    route = match_route(self.routes, flow.request.pretty_host)
    if route is None:
        return  # already blocked at request time
    result = scan_inbound(route, flow.response)
    if result and result.severity == "block":
        flow.response = http.Response.make(403, result.reason.encode(), ...)
    elif result and result.severity == "warn":
        sys.stderr.write(f"egress DLP warn: {result.reason}\n")

scan_outbound and scan_inbound are pure functions in egress_addon_core.py that dispatch to the per-route detector list.
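scan_outbound and scan_inbound could share a single dispatch loop. In this sketch several things are assumptions: the registry dict, run_detectors, and the rule that a block result takes precedence over an earlier warning. None selects every registered detector, and an empty selection (the assumed form of false) runs none.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Sequence


@dataclass(frozen=True)
class ScanResult:
    severity: str  # "block" or "warn"
    reason: str


# A detector follows the interface above: scan(body, *, env) -> ScanResult | None.
Detector = Callable[..., Optional[ScanResult]]


def run_detectors(
    registry: Mapping[str, Detector],
    selected: Optional[Sequence[str]],
    body: bytes,
    env: Mapping[str, str],
) -> Optional[ScanResult]:
    # None means every registered detector; an empty tuple means scanning is off.
    names = list(registry) if selected is None else list(selected)
    first_warning: Optional[ScanResult] = None
    for name in names:
        result = registry[name](body, env=env)
        if result is None:
            continue
        if result.severity == "block":
            return result  # a block wins immediately; later detectors are skipped
        if first_warning is None:
            first_warning = result  # remember the warning but keep looking for a block
    return first_warning
```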

Ordering: auth strip vs. DLP scan

The DLP outbound scan sees the agent's original Authorization header before the addon strips it. This ensures that a token the agent smuggled in the header is caught. The strip + optional re-injection still happens afterward, preserving the existing credential-injection security model.
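The ordering can be sketched as a three-step header pipeline: scan, strip, re-inject. The function and its parameters are hypothetical, and the scanner is reduced to a callable that returns a block reason or None. Scanning first has a second benefit: the detectors never see the credential the addon itself injects. Otherwise token_patterns would flag that credential as a leak whenever it matches one of the known formats.

```python
from __future__ import annotations

from typing import Callable, Dict, Optional


def prepare_outbound_headers(
    headers: Dict[str, str],
    scan: Callable[[str], Optional[str]],
    auth_scheme: str,
    token: Optional[str],
) -> Dict[str, str]:
    """Return the headers to forward, or raise PermissionError if the DLP scan blocks."""
    # 1. Scan first, while the agent's own Authorization header is still present.
    agent_auth = next((v for k, v in headers.items() if k.lower() == "authorization"), "")
    reason = scan(agent_auth)
    if reason:
        raise PermissionError(reason)
    # 2. Strip whatever Authorization header the agent sent.
    out = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    # 3. Re-inject the route's credential, if the route has an auth block.
    if auth_scheme and token:
        out["Authorization"] = f"{auth_scheme} {token}"
    return out
```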

Implementation chunks

  1. New matches block + EgressRoute / Route restructure. Remove path_allowlist from manifest_egress.py and egress_addon_core.py. Add MatchEntry, PathMatch, HeaderMatch types. Parse matches in EgressRoute.from_dict and _parse_one; unknown-key rejection handles old path_allowlist manifests. Add OutboundDetectors / InboundDetectors to EgressRoute and Route; parse dlp block. Extend tests/unit/test_manifest_egress.py and tests/unit/test_egress_addon_core.py with match and dlp valid/invalid cases.

  2. Token-patterns detector (Phase 1a). New module bot_bottle/dlp_detectors.py (host-importable) and companion flat copy for the sidecar bundle. Add TokenPatternsDetector with the regex set above. Wire scan_outbound into the request hook in egress_addon.py. Unit tests in tests/unit/test_dlp_detectors.py.

  3. Known-secrets detector (Phase 1b). Add KnownSecretsDetector to dlp_detectors.py. Collect EGRESS_TOKEN_* from env; derive encoded variants; scan request body. Extend unit tests. Wire into scan_outbound.

  4. Naive prompt injection detector (Phase 2). Add NaiveInjectionDetector to dlp_detectors.py. Wire scan_inbound into the new response hook in egress_addon.py. Extend unit tests. Activate PRD 0052 (Status: Draft → Active) in this commit.

Open questions

  1. Response body buffering: mitmproxy's response hook already has the full body for non-streaming responses. For streaming (chunked) responses the body may be empty or incomplete at hook time. Scope for now: log a warning and skip scanning on streaming responses; revisit if needed.
  2. Encoding breadth for known_secrets: Start with raw + base64 + URL-encoded + hex. Add GZIP / base32 if real-world evasion attempts appear.
  3. EGRESS_TOKEN_* naming contract: The detector relies on the env-var naming convention from manifest_egress.py. If that contract changes, the detector must be updated in lock-step.