# DLP alternatives to pipelock: per-route configuration and response handling

## Question

Pipelock lacks support for per-route or per-host response scanning rules, making it impossible to skip DLP scanning for large binary downloads (e.g., `.whl` files) while keeping scanning enabled for other traffic on the same host. Should we replace pipelock with a purpose-built DLP/token-scanning proxy that supports granular per-route configuration?

## Summary

Yes. Pipelock's flat, global configuration is fundamentally at odds with the per-route model bot-bottle is built on. A custom or configurable DLP proxy built atop mitmproxy (which we already use for egress) would let us:

1. **Skip DLP scanning selectively** — e.g., scan responses from PyPI for credentials but skip scanning `.whl` file contents
2. **Configure scanning per-route** — different rules for different hosts/paths without global toggles
3. **Reduce operational surface** — one proxy (egress) instead of two (egress + pipelock)
4. **Target AI-specific threats** — focus on credential exfiltration and prompt injection instead of generic DLP

**Tradeoff:** We'd need to maintain our own scanning logic. Pipelock provides out-of-the-box BIP-39 seed-phrase detection, entropy checks, and pluggable DLP rules. Building custom logic means we need to be explicit about what we're protecting against and keep that code auditable.

## Current pipelock limitations

### Issue 1: No per-route response scanning rules

Pipelock's response scanning is part of TLS interception — a global feature with no per-host knobs:

```yaml
tls_interception:
  enabled: true
  passthrough_domains: [...]  # Can skip MITM, but not just response scanning
```

**Status:** Tested with pipelock v2.3.0.
Confirmed that:

- `response_body_scanning` config field doesn't exist
- No way to set per-host response size limits
- No way to skip scanning for specific file extensions
- `tls_passthrough: true` disables both request AND response scanning (we want request scanning to stay on)

### Issue 2: Global configuration only

All of pipelock's scanning rules are global. If route A wants to skip `.whl` scanning and route B wants to skip `.tar.gz`, there's nowhere to express that distinction — the config is flat.

### Issue 3: LLM prompt-specific false positives

Pipelock's BIP-39 seed-phrase detector fires on any sequence of 12+ English words that passes the BIP-39 checksum, and such sequences are common in LLM prompts and responses. Bot-bottle therefore disables this detector globally, sacrificing its protection.

### Issue 4: No prompt injection detection

**Important clarification:** Pipelock does NOT detect prompt injections. It detects:

- Token patterns (regex)
- Entropy (random-looking strings)
- BIP-39 seed phrases (12+ word checksums)

But it cannot detect semantic attacks like:

- Attempts to exfiltrate system prompts
- Jailbreak attempts ("ignore previous instructions")
- Model output that reveals internal system details

This is a novel threat specific to LLM agents that pipelock wasn't designed for.
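The noise described in Issue 3 follows from how little the checksum filters. A 12-word BIP-39 phrase encodes 132 bits, of which only 4 are checksum (the first 4 bits of SHA-256 over the 128-bit entropy), so about 1 in 16 uniformly random 12-word sequences validates. The sketch below checks that arithmetic. It works on word indices (0 to 2047) rather than the real English wordlist to stay self-contained, and it assumes pipelock's detector relies on the standard checksum alone; we have not verified pipelock's internals.

```python
import hashlib
import random


def bip39_checksum_ok(word_indices: list[int]) -> bool:
    """Check the BIP-39 checksum of a 12-word phrase given as wordlist indices (0..2047)."""
    if len(word_indices) != 12 or not all(0 <= i < 2048 for i in word_indices):
        raise ValueError("expected 12 indices in range 0..2047")
    bits = 0
    for index in word_indices:
        bits = (bits << 11) | index      # each word contributes 11 bits, 132 in total
    entropy = bits >> 4                  # high 128 bits: entropy
    checksum = bits & 0xF                # low 4 bits: checksum
    digest = hashlib.sha256(entropy.to_bytes(16, "big")).digest()
    return checksum == digest[0] >> 4    # checksum must equal first 4 bits of SHA-256(entropy)


def random_pass_rate(trials: int = 20_000, seed: int = 0) -> float:
    """Fraction of uniformly random 12-word sequences that pass the checksum."""
    rng = random.Random(seed)
    passes = sum(
        bip39_checksum_ok([rng.randrange(2048) for _ in range(12)])
        for _ in range(trials)
    )
    return passes / trials


if __name__ == "__main__":
    # "abandon" x11 + "about" (indices 0 and 3) is the standard all-zero-entropy test vector.
    print(bip39_checksum_ok([0] * 11 + [3]))  # True
    print(random_pass_rate())                 # approximately 1/16 (0.0625)
```

With only 4 bits of checksum, validation removes roughly 94% of random candidates, so any detector that scans long runs of wordlist words will still see frequent checksum-valid windows.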
## Replacement design: mitmproxy-based DLP addon

Since bot-bottle already uses mitmproxy for egress (PRD 0017), we can extend the mitmproxy addon to do DLP scanning alongside egress rules:

### Architecture

```
Agent
  ↓ (HTTP_PROXY=http://egress:8080)
Egress (mitmproxy)
  ├─ Addon 1: Path allowlisting (current)
  ├─ Addon 2: Credential injection (current)
  └─ Addon 3: DLP scanning (NEW)
       ├─ Config: per-route scanning rules from manifest
       ├─ Detectors: token patterns, prompt injection, entropy
       └─ Action: block/warn based on route config
```

### Per-route configuration in manifest

Routes separately configure **outbound** (request to upstream) and **inbound** (response from upstream) scanning:

```yaml
egress:
  routes:
    - host: api.anthropic.com
      dlp:
        outbound_detectors: [token_patterns, known_secrets]  # default
        inbound_detectors: [naive_injection_detection]       # default
    - host: files.pythonhosted.org
      dlp:
        outbound_detectors: [token_patterns, known_secrets]
        inbound_detectors: false  # Skip response scanning (binary downloads)
    - host: internal-service.corp
      dlp:
        outbound_detectors: false
        inbound_detectors: false  # Trusted internal, no scanning
```

**Detectors:**

- `token_patterns` — API keys, GitHub tokens, AWS credentials, etc.
- `known_secrets` — Secrets we provisioned (API keys, OAuth tokens passed via cred-proxy)
- `naive_injection_detection` — Semantic attacks on system prompt (see section below)

### Detector design

Three core detectors, each with tunable sensitivity:

1. **Token detector**
   - Regex patterns for API keys (AWS `AKIA`, GitHub `ghp_`, etc.)
   - Anthropic/OpenAI API keys
   - OAuth tokens (Bearer patterns)
   - Action: Block immediately on any match (no warn-only tier)
2. **Entropy detector**
   - Shannon entropy threshold (bits/char)
   - Flags high-entropy secrets (tunable per-route)
   - Current pipelock default: 4.5 bits/char
   - Action: Warn or block based on route config
3. **Prompt injection detector** (phase 2)
   - Detect attempts to exfiltrate system prompts via LLM outputs
   - Pattern: responses containing "system prompt", "instructions", or "directive" together with a credential
   - Action: Block or sample for audit

### Advantages over pipelock

| Aspect | Pipelock | Mitmproxy addon |
|--------|----------|-----------------|
| Per-route rules | ❌ (global only) | ✅ (manifest-driven) |
| Response-specific config | ❌ (all-or-nothing) | ✅ (per-route `inbound_detectors`, extension skips) |
| Request scanning overhead | ✅ (lightweight) | Likely higher (Python vs. Go; benchmark needed) |
| Maintenance burden | Low (third-party) | High (custom code) |
| Auditability | External Go codebase | ✅ (in-repo) |
| AI-specific detection | Limited | ✅ (token patterns, prompt injection) |
| Code reuse | None | ✅ (egress addon framework) |

### Disadvantages

1. **Maintenance responsibility** — We own the security logic. Any bugs in detector regexes or entropy thresholds are our problem.
2. **Feature parity gap** — Pipelock's BIP-39 detector is sophisticated. We'd need to decide: replicate it, skip it, or ship a simplified version.
3. **Performance** — Custom Python detectors will be slower than pipelock's Go implementation. Benchmarking needed.
4. **Coverage breadth** — Pipelock covers generic DLP (credit cards, SSNs, etc.). We'd focus narrowly on AI/credential exfil.

## Alternative: Configurable pipelock fork

Rather than build from scratch, fork pipelock and add `response_body_scanning` config:

```yaml
response_body_scanning:
  enabled: true
  skip_extensions: [".whl", ".tar.gz"]
  max_response_bytes: 104857600  # 100 MiB
```

**Pros:**

- Reuses pipelock's existing, mature detectors
- Lower maintenance burden
- Clear path to upstream (could be PR'd)

**Cons:**

- We'd still maintain a fork
- Pipelock's maintainers may not want per-host rules in what is currently a global config
- Go code is farther from our codebase (harder to audit)
- Doesn't solve prompt-injection detection

## Recommendation

**Build the mitmproxy addon** (phase 1: tokens + entropy; phase 2: prompt injection).
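To make the recommendation concrete, here is a minimal sketch of the addon's per-route plumbing, not a drop-in implementation. It assumes the manifest's `dlp` blocks are already parsed into dicts, falls back to the defaults from the manifest example when a key is missing, and adds a hypothetical per-route `skip_extensions` key for the `.whl`-on-a-shared-host case from the Question. A single regex stands in for the token detector; `known_secrets` and entropy checks are omitted. The method names follow mitmproxy's `request`, `responseheaders`, and `response` event hooks, and the class only touches flow attributes mitmproxy provides (`pretty_host`, `path`, `url`, `get_text`, `stream`, `kill`, `status_code`, `text`), so the routing logic can be exercised without mitmproxy installed.

```python
import re
from typing import Any, Callable, Optional

# Defaults mirror the manifest example; False disables scanning in that direction.
DEFAULT_DLP: dict[str, Any] = {
    "outbound_detectors": ["token_patterns", "known_secrets"],
    "inbound_detectors": ["naive_injection_detection"],
    "skip_extensions": [],  # hypothetical key, not part of the manifest schema
}

# Stand-in for the full token detector: AWS access key IDs and GitHub classic PATs only.
TOKEN_RE = re.compile(r"AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9_]{36}")

# Same shape as a scan_response-style detector: returns (severity, reason).
InboundScan = Callable[[str], tuple[Optional[str], Optional[str]]]


class DlpAddon:
    """Per-route DLP sketch; method names follow mitmproxy's event hooks."""

    def __init__(self, routes: list[dict[str, Any]],
                 inbound_scan: Optional[InboundScan] = None) -> None:
        # host -> the route's dlp settings merged over the defaults
        self.routes = {r["host"]: {**DEFAULT_DLP, **r.get("dlp", {})} for r in routes}
        self.inbound_scan = inbound_scan or (lambda text: (None, None))

    def _config(self, host: str) -> dict[str, Any]:
        # Hosts without a route entry get the defaults (scan both directions).
        return self.routes.get(host, DEFAULT_DLP)

    def _skip_inbound(self, flow) -> bool:
        cfg = self._config(flow.request.pretty_host)
        path = flow.request.path.split("?", 1)[0]  # ignore the query string
        return not cfg["inbound_detectors"] or path.endswith(tuple(cfg["skip_extensions"]))

    def request(self, flow) -> None:
        detectors = self._config(flow.request.pretty_host)["outbound_detectors"]
        if not detectors or "token_patterns" not in detectors:
            return
        # Scan the URL (query strings can carry secrets) plus the decoded body.
        text = flow.request.url + "\n" + (flow.request.get_text(strict=False) or "")
        if TOKEN_RE.search(text):
            flow.kill()  # drop the request before it reaches upstream

    def responseheaders(self, flow) -> None:
        # Stream skipped bodies (e.g. large .whl downloads) instead of buffering them.
        if self._skip_inbound(flow):
            flow.response.stream = True

    def response(self, flow) -> None:
        if self._skip_inbound(flow):
            return  # streamed or exempt route: nothing to scan
        body = flow.response.get_text(strict=False) or ""
        severity, reason = self.inbound_scan(body)
        if severity == "BLOCK":
            flow.response.status_code = 403
            flow.response.text = f"Blocked by DLP: {reason}"
        # WARN handling (logging, audit sampling) is omitted from this sketch.
```

In a mitmproxy script, an instance would be registered through the module-level `addons` list, e.g. `addons = [DlpAddon(routes, PromptInjectionDetector().scan_response)]`. Setting `flow.response.stream = True` in `responseheaders` is mitmproxy's mechanism for passing a body through without buffering it, which is what makes skipping large downloads cheap; `response` re-checks the route because mitmproxy does not buffer a streamed body for later inspection.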
**Rationale:**

1. Bot-bottle already owns the mitmproxy egress addon — extending it keeps security logic in-repo and auditable.
2. Per-route DLP configuration aligns with bot-bottle's design (PRD 0017 is already per-route).
3. Replacing pipelock reduces sidecar count and operational surface.
4. AI-specific detectors (tokens, prompt injection) matter more than generic DLP for agent containment.

**Fallback:** If performance testing shows unacceptable latency in the Python addon, revisit the pipelock fork approach.

## Naive prompt injection detector design

Since pipelock doesn't detect prompt injections, we need a custom detector. Here's a permissive design that favors missing attacks over false positives:

### What to detect

**High confidence (block immediately):**

1. Response contains known credential pattern + "system prompt" phrase together
2. Response contains both "instructions" and a token pattern

**Medium confidence (warn):**

1. Response contains prompt-disclosure phrases without credentials (might be innocent documentation)
2. Multiple jailbreak keywords in single response

**Ignore (too noisy):**

- Single jailbreak keywords without additional context
- "system prompt" in documentation contexts
- Common phrases like "instructions provided"

### Naive detector pseudocode

```python
import re
from typing import Optional


class PromptInjectionDetector:
    # Phrases that suggest prompt exfiltration
    DISCLOSURE_PHRASES = [
        re.compile(r'(?i)(system\s+prompt|instructions\s+given|your\s+role\s+is|you\s+are\s+an?)'),
        re.compile(r'(?i)(original\s+instructions|secret\s+instructions|hidden\s+rules)'),
    ]

    # Phrases suggesting jailbreak attempts; each group counts at most once
    JAILBREAK_PHRASES = [
        re.compile(r'(?i)(ignore\s+previous|forget\s+everything|disregard)'),
        re.compile(r'(?i)(from\s+now\s+on|pretend|act\s+as)'),
        re.compile(r'(?i)(bypass|circumvent|override)'),
    ]

    TOKEN_PATTERNS = [
        re.compile(r'AKIA[0-9A-Z]{16}'),              # AWS access key ID
        re.compile(r'ghp_[A-Za-z0-9_]{36}'),          # GitHub personal access token
        re.compile(r'sk_live_[A-Za-z0-9]{24}'),       # Stripe live secret key (prefix match)
        re.compile(r'Bearer\s+[A-Za-z0-9._-]{50,}'),  # JWT-like bearer tokens
    ]

    def scan_response(self, response_body: str) -> tuple[Optional[str], Optional[str]]:
        """Returns (severity, reason), or (None, None) if clean."""
        # Rule 1: Disclosure + token = HIGH confidence block
        disclosure_found = any(p.search(response_body) for p in self.DISCLOSURE_PHRASES)
        token_found = any(p.search(response_body) for p in self.TOKEN_PATTERNS)
        if disclosure_found and token_found:
            return ("BLOCK", "Prompt disclosure with embedded credential")

        # Rule 2: Two or more distinct jailbreak phrase groups = WARN
        jailbreak_count = sum(1 for p in self.JAILBREAK_PHRASES if p.search(response_body))
        if jailbreak_count >= 2:
            return ("WARN", f"{jailbreak_count} jailbreak phrase groups matched")

        # Rule 3: Disclosure alone without tokens = WARN only if very explicit
        if disclosure_found and "system prompt:" in response_body.lower():
            return ("WARN", "Explicit system prompt disclosure")

        # Otherwise: clean
        return (None, None)
```

### Why this is permissive

1. **Single keywords ignored** — "ignore previous instructions" in a legitimate conversation doesn't trigger
2. **Context required** — disclosure phrases only block alongside a token (or warn on an explicit `system prompt:` marker), and jailbreak phrases only warn when two or more groups match
3. **Documentation exemption** — "instructions provided" in a help section won't block
4. **Warn vs. block** — Only block on high-confidence signals; warn on medium
5. **No entropy-based guessing** — We don't try to be clever about detecting obfuscated prompts

### False negatives this misses

This detector intentionally lets through:

- Prompt injections using novel phrasing we haven't seen
- Obfuscated jailbreak attempts ("behave differently", "role-play")
- Exfiltration via indirect methods ("describe the system", "what are your constraints")
- Sophisticated attacks that split the prompt across multiple exchanges

**Rationale:** Better to miss a sophisticated jailbreak than block legitimate agent output 100 times/day.

### Per-route configuration

Routes can enable or disable prompt injection scanning through `inbound_detectors`, using the same schema as the manifest above:

```yaml
egress:
  routes:
    - host: api.anthropic.com
      dlp:
        outbound_detectors: [token_patterns, known_secrets]
        inbound_detectors: [naive_injection_detection]
    - host: internal-docs.corp
      dlp:
        outbound_detectors: [token_patterns]
        inbound_detectors: false  # Skip prompt injection (trusted internal)
```

## Implementation phases

### Phase 1: Secret exfiltration detection (2-3 weeks)

**Goal:** Prevent credentials from leaking to upstream services

- **Token patterns detector** — API keys, GitHub tokens, AWS credentials (regex-based)
- **Known secrets detector** — Check if provisioned credentials appear in outbound traffic
  - Secrets passed to cred-proxy or agent environment
  - Multiple encodings (base64, hex, URL-encoded variants)
- **Outbound scanning by default** — enabled for all routes unless explicitly disabled
- **Per-route config:** `outbound_detectors: [token_patterns, known_secrets]`
- **Action:** Block immediately on token match; warn when a value exceeds the entropy threshold (set conservatively high, i.e. low sensitivity, to limit false positives)

### Phase 2: Prompt injection detection (1-2 weeks)

**Goal:** Prevent agents from exfiltrating system prompts or being jailbroken

- **Naive injection detector** — as sketched above
- **Inbound scanning by default** — enabled for all routes unless explicitly disabled
- **Per-route config:** `inbound_detectors: [naive_injection_detection]`
- **Actions:**
  - BLOCK: Credential + prompt disclosure detected
  - WARN: Multiple jailbreak keywords or explicit prompt disclosure
  - ALLOW: Single keywords or documentation phrases

### Phase 3: Hardening & tuning (2-3 weeks, optional)

- Real-world false positive analysis from Phase 1 & 2
- Rate limiting on DLP blocks
- Audit/sampling mode for flagged responses
- Additional encodings for known_secrets (GZIP, base32, etc.)

## Open questions

1. **Performance:** How much latency does Python string-matching add? Benchmark against pipelock.
2. **False positives:** Will the entropy detector trip on legitimate high-entropy traffic (e.g., binary API responses)? Needs real-world testing.
3. **Coverage:** Are regex patterns sufficient, or do we need more sophisticated token detection (e.g., format validation)?
4. **Upstream:** If we build this, should we also propose equivalent per-route response-scanning options to pipelock upstream, or keep the work bot-bottle-specific?
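Open question 2 can be explored offline before any proxy work. Below is a minimal sketch of the entropy detector from "Detector design": per-token Shannon entropy, H = -Σ p·log2(p) in bits per character, compared against the 4.5 bits/char pipelock default cited there. The tokenization regex and the 20-character minimum are assumptions for illustration, not pipelock's or bot-bottle's actual rules.

```python
import math
import re
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of s in bits per character: -sum(p * log2(p)) over distinct chars."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def high_entropy_tokens(text: str, threshold: float = 4.5, min_len: int = 20) -> list[str]:
    """Return tokens of at least min_len chars whose entropy exceeds threshold.

    Tokenization (runs of base64/URL-safe characters) is an assumed rule for this sketch.
    """
    tokens = re.findall(r"[A-Za-z0-9+/_\-]+", text)
    return [t for t in tokens if len(t) >= min_len and shannon_entropy(t) > threshold]
```

Two properties bear directly on the question. A string of n characters scores at most log2(n) bits/char, so a 4.5 threshold can only fire on tokens of 23 or more characters (log2 22 ≈ 4.46). And hex encodes at most 4 bits/char, so a hex-encoded secret never crosses 4.5 however random it is; those are better caught by the hex variant of the `known_secrets` detector.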