# DLP alternatives to pipelock: per-route configuration and response handling

## Question

Pipelock lacks support for per-route or per-host response scanning rules, making it impossible to skip DLP scanning for large binary downloads (e.g., `.whl` files) while keeping scanning enabled for other traffic on the same host. Should we replace pipelock with a purpose-built DLP/token-scanning proxy that supports granular per-route configuration?

## Summary

Yes. Pipelock's flat, global configuration is fundamentally at odds with the per-route model bot-bottle is built on. A custom or configurable DLP proxy built atop mitmproxy (which we already use for egress) would let us:

1. **Skip DLP scanning selectively** — e.g., scan responses from PyPI for credentials but skip scanning `.whl` file contents
2. **Configure scanning per-route** — different rules for different hosts/paths without global toggles
3. **Reduce operational surface** — one proxy (egress) instead of two (egress + pipelock)
4. **Target AI-specific threats** — focus on credential exfiltration and prompt injection instead of generic DLP

**Tradeoff:** We'd need to maintain our own scanning logic. Pipelock provides out-of-the-box BIP-39 seed-phrase detection, entropy checks, and pluggable DLP rules. Building custom logic means we need to be explicit about what we're protecting against and keep that code auditable.

## Current pipelock limitations

### Issue 1: No per-route response scanning rules

Pipelock's response scanning is part of TLS interception — a global feature with no per-host knobs:

```yaml
tls_interception:
  enabled: true
  passthrough_domains: [...]  # Can skip MITM, but not just response scanning
```

**Status:** Tested with pipelock v2.3.0. Confirmed that:

- `response_body_scanning` config field doesn't exist
- No way to set per-host response size limits
- No way to skip scanning for specific file extensions
- `tls_passthrough: true` disables both request AND response scanning (we want request scanning to stay on)

### Issue 2: Global configuration only

All of pipelock's scanning rules are global. If route A wants to skip `.whl` scanning and route B wants to skip `.tar.gz`, there's nowhere to express that distinction — the config is flat.
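
### Issue 3: LLM prompt-specific false positives

Pipelock's BIP-39 seed-phrase detector fires on any sequence of 12+ English words from the BIP-39 wordlist that happens to pass the checksum, which is common in LLM prompts and responses. The checksum is a weak filter: a 12-word phrase carries only 4 checksum bits, so roughly 1 in 16 arbitrary 12-word sequences from the list validates. Bot-bottle therefore disables this detector globally, sacrificing its protection.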
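
The checksum arithmetic behind that false-positive rate is easy to spell out. In BIP-39 each word encodes 11 bits and one of every 33 bits is checksum, so an N-word phrase carries N/3 checksum bits. The sketch below models only the BIP-39 spec; it makes no assumptions about how pipelock tokenizes text or picks candidate windows, and the function names are illustrative.

```python
from fractions import Fraction

VALID_WORD_COUNTS = (12, 15, 18, 21, 24)  # phrase lengths defined by BIP-39


def checksum_bits(word_count: int) -> int:
    """Checksum bits in a BIP-39 phrase: 11 bits per word, 1 of every 33 bits is checksum."""
    if word_count not in VALID_WORD_COUNTS:
        raise ValueError(f"BIP-39 phrases have 12-24 words in steps of 3, got {word_count}")
    return word_count * 11 // 33


def random_pass_rate(word_count: int) -> Fraction:
    """Probability that an arbitrary sequence of wordlist words has a valid checksum."""
    return Fraction(1, 2 ** checksum_bits(word_count))


for n in VALID_WORD_COUNTS:
    print(n, "words:", checksum_bits(n), "checksum bits, pass rate", random_pass_rate(n))
```

Only the longest phrases get meaningful protection from the checksum; the 12-word case, which is the one that matters for prose, passes 1 time in 16.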

### Issue 4: No prompt injection detection

**Important clarification:** Pipelock does NOT detect prompt injections. It detects:

- Token patterns (regex)
- Entropy (random-looking strings)
- BIP-39 seed phrases (12+ word checksums)

But it cannot detect semantic attacks like:

- Attempts to exfiltrate system prompts
- Jailbreak attempts ("ignore previous instructions")
- Model output that reveals internal system details

This is a novel threat specific to LLM agents that pipelock wasn't designed for.

## Replacement design: mitmproxy-based DLP addon

Since bot-bottle already uses mitmproxy for egress (PRD 0017), we can extend the mitmproxy addon to do DLP scanning alongside egress rules:

### Architecture

```
Agent
  ↓ (HTTP_PROXY=http://egress:8080)
Egress (mitmproxy)
  ├─ Addon 1: Path allowlisting (current)
  ├─ Addon 2: Credential injection (current)
  └─ Addon 3: DLP scanning (NEW)
       ├─ Config: per-route scanning rules from manifest
       ├─ Detectors: token patterns, prompt injection, entropy
       └─ Action: block/warn based on route config
```
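
A minimal sketch of how Addon 3 could plug into mitmproxy's standard event hooks (`request`, `responseheaders`, `response`). The class name `DlpAddon`, the resolved `routes` shape and the detector callables are hypothetical. Flows are duck-typed so the sketch runs without mitmproxy installed. The attributes it touches (`pretty_host`, `pretty_url`, `headers`, `get_text(strict=False)`, `flow.kill()`, `flow.response.stream`) are mitmproxy's own. Setting `flow.response.stream = True` in `responseheaders` is mitmproxy's mechanism for passing a body through without buffering it. That is what would let a route with no inbound detectors serve large `.whl` downloads unscanned while its requests are still checked.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical detector signature: takes decoded text, returns a reason if it fires.
Detector = Callable[[str], Optional[str]]


class DlpAddon:
    """Sketch of 'Addon 3': run per-route outbound/inbound detectors."""

    def __init__(self, routes: Dict[str, Dict[str, List[str]]],
                 detectors: Dict[str, Detector]) -> None:
        self.routes = routes          # host -> {"outbound": [...], "inbound": [...]}
        self.detectors = detectors    # detector name -> callable

    def _scan(self, host: str, direction: str, text: Optional[str]) -> Optional[str]:
        route = self.routes.get(host)
        if route is None or not text:
            return None  # unknown hosts are left to the allowlisting addon
        for name in route.get(direction, []):
            reason = self.detectors[name](text)
            if reason:
                return f"{name}: {reason}"
        return None

    @staticmethod
    def _outbound_text(req) -> str:
        # Secrets can leak via the URL and headers as well as the body.
        headers = "\n".join(f"{k}: {v}" for k, v in req.headers.items())
        return "\n".join([req.pretty_url, headers, req.get_text(strict=False) or ""])

    def request(self, flow) -> None:
        """Outbound: scan the request before it is sent upstream."""
        reason = self._scan(flow.request.pretty_host, "outbound",
                            self._outbound_text(flow.request))
        if reason:
            flow.kill()  # drop the request; nothing reaches upstream

    def responseheaders(self, flow) -> None:
        """Routes with no inbound detectors stream the body instead of buffering it."""
        route = self.routes.get(flow.request.pretty_host)
        if route is not None and not route.get("inbound"):
            flow.response.stream = True

    def response(self, flow) -> None:
        """Inbound: scan the buffered response body before the agent sees it."""
        reason = self._scan(flow.request.pretty_host, "inbound",
                            flow.response.get_text(strict=False))
        if reason:
            flow.response.status_code = 403
            flow.response.text = f"Blocked by DLP ({reason})"
```

In a real mitmproxy script the instance would be exported through a module-level `addons = [...]` list. It would probably need to run before credential injection (Addon 2); otherwise the credentials that addon injects would trip the outbound detectors.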

### Per-route configuration in manifest

Routes separately configure **outbound** (request to upstream) and **inbound** (response from upstream) scanning:

```yaml
egress:
  routes:
    - host: api.anthropic.com
      dlp:
        outbound_detectors: [token_patterns, known_secrets]  # default
        inbound_detectors: [naive_injection_detection]  # default

    - host: files.pythonhosted.org
      dlp:
        outbound_detectors: [token_patterns, known_secrets]
        inbound_detectors: false  # Skip response scanning (binary downloads)

    - host: internal-service.corp
      dlp:
        outbound_detectors: false
        inbound_detectors: false  # Trusted internal, no scanning
```

**Detectors:**

- `token_patterns` — API keys, GitHub tokens, AWS credentials, etc.
- `known_secrets` — Secrets we provisioned (API keys, OAuth tokens passed via cred-proxy)
- `naive_injection_detection` — Semantic attacks on system prompt (see section below)
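
The manifest semantics above can be pinned down in a small resolver. An omitted key falls back to the default list, since scanning is on by default. `false` turns that direction off, and an explicit list replaces the default. This is a sketch under assumptions: the function names, the error on unknown detector names and the output shape are hypothetical. The manifest is shown as an already-parsed dict, which is what a YAML loader would return for the example above.

```python
from typing import Dict, List, Union

# Defaults taken from the manifest example; the set of known names is an assumption.
DEFAULT_OUTBOUND = ["token_patterns", "known_secrets"]
DEFAULT_INBOUND = ["naive_injection_detection"]
KNOWN_DETECTORS = {"token_patterns", "known_secrets", "naive_injection_detection"}

DetectorSetting = Union[None, bool, List[str]]


def resolve_detectors(value: DetectorSetting, default: List[str]) -> List[str]:
    """Map one manifest value to the detector names that should run."""
    if value is None:        # key omitted: scanning is on by default
        return list(default)
    if value is False:       # `false`: explicitly disabled for this direction
        return []
    if isinstance(value, list):
        unknown = sorted(set(value) - KNOWN_DETECTORS)
        if unknown:
            raise ValueError(f"unknown detectors: {unknown}")
        return list(value)
    # Anything else (including `true`) is rejected rather than guessed at.
    raise ValueError(f"expected a detector list or false, got {value!r}")


def resolve_routes(manifest: dict) -> Dict[str, Dict[str, List[str]]]:
    """host -> {"outbound": [...], "inbound": [...]} for every egress route."""
    resolved = {}
    for route in manifest.get("egress", {}).get("routes", []):
        dlp = route.get("dlp") or {}
        resolved[route["host"]] = {
            "outbound": resolve_detectors(dlp.get("outbound_detectors"), DEFAULT_OUTBOUND),
            "inbound": resolve_detectors(dlp.get("inbound_detectors"), DEFAULT_INBOUND),
        }
    return resolved
```

Returning `[]` rather than `None` for a disabled direction keeps downstream code free of special cases: an empty list simply means there is nothing to run.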

### Detector design

Three core detectors, each with tunable sensitivity:

1. **Token detector**
   - Regex patterns for API keys (AWS `AKIA`, GitHub `ghp_`, etc.)
   - Anthropic/OpenAI API keys
   - OAuth tokens (Bearer patterns)
   - Action: Block immediately on any match (no false-positive tolerance)

2. **Entropy detector**
   - Shannon entropy threshold (bits/char)
   - Flags high-entropy strings as likely secrets (threshold tunable per-route)
   - Current pipelock default: 4.5 bits/char
   - Action: Warn or block based on route config

3. **Prompt injection detector** (phase 2)
   - Detect attempts to exfiltrate system prompts via LLM outputs
   - Pattern: responses containing "system prompt", "instructions", "directive" + credential
   - Action: Block or sample for audit

### Advantages over pipelock

| Aspect | Pipelock | Mitmproxy addon |
|--------|----------|-----------------|
| Per-route rules | ❌ (global only) | ✅ (manifest-driven) |
| Response-specific config | ❌ (all-or-nothing) | ✅ (per-route `inbound_detectors`) |
| Request scanning overhead | ✅ (lightweight) | Likely higher (Python; needs benchmarking) |
| Maintenance burden | Low (third-party) | High (custom code) |
| Auditability | Third-party Go code, outside our repo | ✅ (in-repo) |
| AI-specific detection | Limited | ✅ (token patterns, prompt injection) |
| Code reuse | None | ✅ (egress addon framework) |

### Disadvantages

1. **Maintenance responsibility** — We own the security logic. Any bugs in detector regexes or entropy thresholds are our problem.
2. **Feature parity gap** — Pipelock's BIP-39 detector is sophisticated. We'd need to decide: replicate it, skip it, or ship a simplified version.
3. **Performance** — Custom Python detectors are likely to be slower than pipelock's Go implementation. Benchmarking needed.
4. **Coverage breadth** — Pipelock covers generic DLP (credit cards, SSNs, etc.). We'd focus narrowly on AI/credential exfil.

## Alternative: Configurable pipelock fork

Rather than build from scratch, fork pipelock and add `response_body_scanning` config:

```yaml
response_body_scanning:
  enabled: true
  skip_extensions: [".whl", ".tar.gz"]
  max_response_bytes: 104857600  # 100 MiB
```
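
The intended semantics of `skip_extensions` and `max_response_bytes` are easy to get subtly wrong. Multi-part extensions like `.tar.gz`, query strings, letter case and a missing `Content-Length` all need handling, so it may help to pin them down before writing Go. The sketch below keeps Python for consistency with the rest of this doc. The function name is hypothetical, and so is the decision to scan when the length is unknown. It also assumes oversized bodies are passed through unscanned rather than blocked; that is a policy choice the proposal doesn't settle.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def should_scan_response(url: str,
                         content_length: Optional[int],
                         skip_extensions: Iterable[str] = (".whl", ".tar.gz"),
                         max_response_bytes: int = 104_857_600) -> bool:
    """Decide whether a response body goes through DLP scanning (hypothetical semantics)."""
    path = urlsplit(url).path.lower()  # ignore query string and fragment; case-insensitive
    # endswith() handles multi-part extensions such as ".tar.gz" naturally.
    if any(path.endswith(ext.lower()) for ext in skip_extensions):
        return False
    # Unknown length (e.g. chunked transfer encoding): fall through and scan.
    if content_length is not None and content_length > max_response_bytes:
        return False
    return True
```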

**Pros:**

- Reuses pipelock's existing, mature detectors
- Lower maintenance burden
- Clear path to upstream (could be PR'd)

**Cons:**

- We'd still be maintaining a fork
- Pipelock's maintainers may not accept per-host rules upstream
- Go code is farther from our codebase (harder to audit)
- Doesn't solve prompt-injection detection

## Recommendation

**Build the mitmproxy addon** (phase 1: secret exfiltration detection; phase 2: prompt injection detection).

**Rationale:**

1. Bot-bottle already owns the mitmproxy egress addon — extending it keeps security logic in-repo and auditable.
2. Per-route DLP configuration aligns with bot-bottle's design (PRD 0017 is already per-route).
3. Replacing pipelock reduces sidecar count and operational surface.
4. AI-specific detectors (tokens, prompt injection) matter more than generic DLP for agent containment.

**Fallback:** If performance testing shows unacceptable latency in the Python addon, revisit the pipelock fork approach.

## Naive prompt injection detector design

Since pipelock doesn't detect prompt injections, we need a custom detector. Here's a permissive design that favors missing attacks over false positives:

### What to detect

**High confidence (block immediately):**

1. Response contains a known credential pattern together with a "system prompt" phrase
2. Response contains an instruction-disclosure phrase (e.g., "original instructions", "instructions given") and a token pattern

**Medium confidence (warn):**

1. Response contains an explicit prompt-disclosure phrase such as "system prompt:" but no credentials (might be innocent documentation)
2. Multiple jailbreak keywords in a single response

**Ignore (too noisy):**

- Single jailbreak keywords without additional context
- "system prompt" in documentation contexts
- Common phrases like "instructions provided"

### Naive detector sketch

```python
import re
from typing import Optional, Tuple


class PromptInjectionDetector:
    # Phrases that suggest prompt exfiltration
    DISCLOSURE_PHRASES = [
        re.compile(r"(?i)(system\s+prompt|instructions\s+given|your\s+role\s+is|you\s+are\s+an?)"),
        re.compile(r"(?i)(original\s+instructions|secret\s+instructions|hidden\s+rules)"),
    ]

    # Phrases suggesting jailbreak attempts; each pattern group counts at most once
    JAILBREAK_PHRASES = [
        re.compile(r"(?i)(ignore\s+previous|forget\s+everything|disregard)"),
        re.compile(r"(?i)(from\s+now\s+on|pretend|act\s+as)"),
        re.compile(r"(?i)(bypass|circumvent|override)"),
    ]

    TOKEN_PATTERNS = [
        re.compile(r"AKIA[0-9A-Z]{16}"),              # AWS access key ID
        re.compile(r"ghp_[A-Za-z0-9_]{36}"),          # GitHub personal access token
        re.compile(r"sk_live_[A-Za-z0-9]{24}"),       # Stripe live secret key
        re.compile(r"Bearer\s+[A-Za-z0-9._-]{50,}"),  # JWT-like bearer tokens
    ]

    def scan_response(self, response_body: str) -> Tuple[Optional[str], Optional[str]]:
        """Returns (severity, reason), or (None, None) if clean."""
        # Rule 1: disclosure + token = high-confidence BLOCK
        disclosure_found = any(p.search(response_body) for p in self.DISCLOSURE_PHRASES)
        token_found = any(p.search(response_body) for p in self.TOKEN_PATTERNS)
        if disclosure_found and token_found:
            return ("BLOCK", "Prompt disclosure with embedded credential")

        # Rule 2: two or more distinct jailbreak pattern groups = WARN
        jailbreak_count = sum(1 for p in self.JAILBREAK_PHRASES if p.search(response_body))
        if jailbreak_count >= 2:
            return ("WARN", f"{jailbreak_count} jailbreak pattern groups matched")

        # Rule 3: disclosure without a token = WARN only if very explicit
        if disclosure_found and "system prompt:" in response_body.lower():
            return ("WARN", "Explicit system prompt disclosure")

        # Otherwise: clean
        return (None, None)
```

### Why this is permissive

1. **Single keywords ignored** — "ignore previous instructions" in a legitimate conversation doesn't trigger
2. **Context required** — disclosure phrases only block when paired with a token; jailbreak phrases only warn when two or more pattern groups match
3. **Documentation exemption** — "instructions provided" in a help section won't block
4. **Warn vs. block** — Only block on high-confidence signals; warn on medium
5. **No entropy-based guessing** — We don't try to be clever about detecting obfuscated prompts

### False negatives this misses

This detector intentionally lets through:

- Prompt injections using novel phrasing we haven't seen
- Obfuscated jailbreak attempts ("behave differently", "role-play")
- Exfiltration via indirect methods ("describe the system", "what are your constraints")
- Sophisticated attacks that split the prompt across multiple exchanges

**Rationale:** Better to miss a sophisticated jailbreak than block legitimate agent output 100 times/day.

### Per-route configuration

Routes enable or disable prompt injection scanning through `inbound_detectors`:

```yaml
egress:
  routes:
    - host: api.anthropic.com
      dlp:
        outbound_detectors: [token_patterns, known_secrets]
        inbound_detectors: [naive_injection_detection]

    - host: internal-docs.corp
      dlp:
        outbound_detectors: [token_patterns]
        inbound_detectors: false  # Skip prompt injection (trusted internal)
```

## Implementation phases

### Phase 1: Secret exfiltration detection (2-3 weeks)

**Goal:** Prevent credentials from leaking to upstream services

- **Token patterns detector** — API keys, GitHub tokens, AWS credentials (regex-based)
- **Known secrets detector** — Check if provisioned credentials appear in outbound traffic
  - Secrets passed to cred-proxy or agent environment
  - Multiple encodings (base64, hex, URL-encoded variants)
- **Outbound scanning by default** — enabled for all routes unless explicitly disabled
- **Per-route config:** `outbound_detectors: [token_patterns, known_secrets]`
- **Action:** Block immediately on token match; warn when the entropy threshold is exceeded (threshold set conservatively high to avoid false positives)
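
The known-secrets check can be sketched as follows: precompute every encoded form of each provisioned secret once, then do plain substring checks per request. This minimal sketch assumes secrets arrive as a name-to-value map; the class and function names are hypothetical. The base64 handling goes a step beyond encoding the secret on its own. When a secret sits inside a larger base64 blob, its encoding depends on its byte offset modulo 3, so the sketch derives one needle per alignment. Each needle has the leading and trailing characters that mix in neighboring bytes trimmed off. Variants shorter than 8 characters are dropped as too collision-prone.

```python
import base64
import urllib.parse
from typing import Dict, List, Set, Tuple


def _base64_needles(raw: bytes) -> Set[str]:
    """Base64 substrings present whenever `raw` sits anywhere inside a base64 blob."""
    needles = set()
    for offset in range(3):  # the secret's byte offset modulo 3 within the blob
        padded = b"\x00" * offset + raw
        encoded = base64.b64encode(padded).decode("ascii")
        # Skip leading characters that mix in the dummy prefix bytes: 0, 2 or 3 of them.
        start = (0, 2, 3)[offset]
        # Stop before any character that depends on the byte after the secret (and before '=').
        end = len(padded) // 3 * 4 + len(padded) % 3
        needle = encoded[start:end]
        needles.add(needle)
        needles.add(needle.replace("+", "-").replace("/", "_"))  # base64url alphabet
    return needles


def encoded_variants(secret: str, min_len: int = 8) -> Set[str]:
    """Raw, hex, URL-encoded and base64 forms of a secret."""
    raw = secret.encode("utf-8")
    variants = {
        secret,
        raw.hex(),
        raw.hex().upper(),
        urllib.parse.quote(secret, safe=""),
        urllib.parse.quote_plus(secret, safe=""),
    }
    variants |= _base64_needles(raw)
    return {v for v in variants if len(v) >= min_len}


class KnownSecretsDetector:
    """Hypothetical outbound detector for credentials we provisioned ourselves."""

    def __init__(self, secrets: Dict[str, str]) -> None:
        self._needles: List[Tuple[str, str]] = [
            (needle, name)
            for name, value in secrets.items()
            for needle in encoded_variants(value)
        ]

    def scan(self, body: str) -> List[str]:
        """Names of provisioned secrets found in `body`, in any supported encoding."""
        return sorted({name for needle, name in self._needles if needle in body})
```

Each scan is a handful of substring checks per secret. If the number of provisioned secrets grows large, a multi-pattern matcher such as Aho-Corasick would be the usual next step.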

### Phase 2: Prompt injection detection (1-2 weeks)

**Goal:** Prevent agents from exfiltrating system prompts or being jailbroken

- **Naive injection detector** — as sketched above
- **Inbound scanning by default** — enabled for all routes unless explicitly disabled
- **Per-route config:** `inbound_detectors: [naive_injection_detection]`
- **Actions:**
  - BLOCK: Credential + prompt disclosure detected
  - WARN: Multiple jailbreak keywords or explicit prompt disclosure
  - ALLOW: Single keywords or documentation phrases

### Phase 3: Hardening & tuning (2-3 weeks, optional)

- Real-world false positive analysis from Phases 1 and 2
- Rate limiting on DLP blocks
- Audit/sampling mode for flagged responses
- Additional encodings for `known_secrets` (gzip, base32, etc.)

## Open questions

1. **Performance:** How much latency does Python string matching add? Benchmark against pipelock.
2. **False positives:** Will the entropy detector trip on legitimate high-entropy traffic (e.g., binary API responses)? Needs real-world testing.
3. **Coverage:** Are regex patterns sufficient, or do we need more sophisticated token detection (e.g., format validation)?
4. **Upstream:** If we build this, should we upstream it as an option to pipelock, or keep it bot-bottle-specific?