docs: evaluate CaMeL prompt injection framework for integration

Add analysis of Google DeepMind's CaMeL (arXiv:2503.18813), which prevents prompt injections architecturally rather than detecting them. Key findings: - CaMeL operates at the agent execution layer (P-LLM/Q-LLM split + capability-based data flow tracking), not the network layer - Not a replacement for pipelock/DLP — different threat surface - Not viable today: research artifact, requires agent rearchitecture, doubles LLM costs, 7% utility loss on AgentDojo - Worth watching: its capability model could complement bot-bottle's network controls if it matures into production software Also clarifies pipelock's actual detection capabilities (no prompt injection detection) and adds naive detector sketch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-04 14:13:32 -04:00
parent e6b3cd1824
commit c94a2542bd
1 changed files with 79 additions and 0 deletions
@@ -418,9 +418,88 @@ class LLMPromptInjectionDetector:
 3. How to update model if new jailbreak techniques emerge?
 4. Should we ensemble: LLM + patterns for high-confidence blocks?

+## CaMeL: Architectural prompt injection defense
+
+[CaMeL](https://github.com/google-research/camel-prompt-injection) (CApabilities for MachinE Learning) is a Google DeepMind research project that takes a fundamentally different approach to prompt injection. Rather than *detecting* injections at the network layer (like everything above), CaMeL *prevents* them by architecture — it makes prompt injections structurally unable to cause harm even when the underlying LLM is fully compromised.
+
+### How CaMeL works
+
+CaMeL wraps the agent in a protective execution layer with three components:
+
+1. **Privileged LLM (P-LLM):** Sees ONLY the user's trusted query. Generates a plan as restricted Python code — which tools to call, in what order, with what data flow. Never sees untrusted content (tool outputs, external data).
+
+2. **Quarantined LLM (Q-LLM):** Processes untrusted data (parsing emails, extracting names from documents). Has NO tool access. Even if prompt-injected, it can only return parsed data — it cannot change which tools run or how they're called.
+
+3. **CaMeL interpreter:** Executes the P-LLM's plan, maintains a **data flow graph** tracking every value's provenance (where it came from, who can read it), and enforces **capability-based security policies** before each tool call.
+
+Example flow for "Send Bob the document from our meeting notes":
+
+```
+P-LLM generates plan (trusted, no external data):
+    notes = find_meeting_notes()
+    doc_name = query_quarantined_llm(f"extract doc name from {notes}")
+    doc = fetch_document(doc_name)
+    address = query_quarantined_llm(f"extract Bob's email from {notes}")
+    send_email(doc, address)
+
+Interpreter executes:
+    1. find_meeting_notes() → tags result as source:notes, readers:[user]
+    2. Q-LLM extracts doc name → even if injected, Q-LLM can't call tools
+    3. fetch_document() → tags result with source:drive
+    4. Q-LLM extracts email → even if injected to return attacker@evil.com,
+       the capability check on send_email() catches that `address` came from
+       an untrusted source and blocks or prompts the user
+```
+
+The key insight: **control flow is determined by the trusted P-LLM's code, not by data content**. Untrusted data can never alter which tools run. And capabilities track provenance so that even manipulated *data values* get caught at tool-call time.
+
+### Performance
+
+On AgentDojo benchmark: **77% task completion with provable security guarantees** vs. 84% for an undefended agent. The 7% utility cost comes from the interpreter occasionally being too conservative (blocking legitimate operations where data provenance is ambiguous).
+
+### Integration with bot-bottle: assessment
+
+**CaMeL is NOT a replacement for pipelock or a network-layer DLP scanner.** It operates at a completely different layer — it's an agent execution framework, not a proxy. It wouldn't help with the original problem (scanning `.whl` downloads for credentials).
+
+However, CaMeL is deeply relevant to bot-bottle's broader security model:
+
+| Layer | Current bot-bottle | CaMeL equivalent |
+|-------|-------------------|------------------|
+| Network egress | Pipelock (hostname allowlist + DLP) | N/A (doesn't operate here) |
+| Credential injection | Egress addon (per-route auth) | N/A |
+| Tool access control | None (agent has full permissions) | **Capability-based policies** |
+| Data provenance | None | **Data flow graph** |
+| Control flow integrity | None (agent decides everything) | **P-LLM generates plan, interpreter enforces** |
+
+**What CaMeL would add that bot-bottle lacks today:**
+- **Data flow tracking** — bot-bottle controls *which hosts* an agent can reach, but not *what data* flows to those hosts. CaMeL tracks provenance per-value.
+- **Tool-call policies** — bot-bottle doesn't restrict which tools an agent calls or what arguments it passes. CaMeL enforces policies at every tool invocation.
+- **Separation of planning and execution** — bot-bottle gives the agent full autonomy. CaMeL splits planning (trusted) from data processing (untrusted).
+
+**Why CaMeL is NOT viable for bot-bottle today:**
+
+1. **Research artifact, not production software.** The README explicitly warns: "the interpreter implementation likely contains bugs...and might not be fully secure." Apache-2.0 licensed but no maintenance commitment.
+
+2. **Requires restructuring the agent.** CaMeL doesn't wrap an existing agent — it *replaces* the agent's execution model. Claude Code / Codex would need to be fundamentally rearchitected to generate CaMeL-compatible plans instead of directly calling tools. This is not a drop-in.
+
+3. **LLM overhead.** CaMeL requires two LLM calls per step (P-LLM for planning, Q-LLM for data parsing). For a coding agent that makes hundreds of tool calls per session, this doubles API costs and adds significant latency.
+
+4. **Utility cost.** 7% task completion loss on AgentDojo. For a coding agent where correctness matters, even small degradation in capability could be unacceptable.
+
+5. **Scope mismatch.** CaMeL protects against prompt injection via untrusted data sources. Bot-bottle's primary threat model is credential exfiltration and sandbox escape — different attack surface.
+
+### Verdict
+
+**Don't integrate CaMeL now.** It solves a real problem (prompt injection via data flow manipulation) but at a layer bot-bottle doesn't currently operate at, and with maturity/integration costs that are too high.
+
+**Watch it for the future.** If CaMeL matures into a production-ready library, its capability model could complement bot-bottle's network-layer controls — bot-bottle handles "which hosts can the agent reach" while CaMeL handles "what data can flow to those hosts." The combination would be defense-in-depth across both network and application layers.
+
+**For now, our phases stand:** Phase 1 (outbound secret exfiltration via DLP addon) and Phase 2 (inbound prompt injection via naive pattern detector) address bot-bottle's immediate needs at the network layer where we already operate.
+
 ## Open questions

 1. **Performance:** How much latency does Python string-matching add? Benchmark against pipelock.
 2. **False positives:** Will entropy detector trip on legitimate high-entropy traffic (e.g., binary API responses)? Need real-world testing.
 3. **Coverage:** Are regex patterns sufficient, or do we need more sophisticated token detection (e.g., format validation)?
 4. **Upstream:** If we build this, should we upstream it as an option to pipelock, or keep it bot-bottle-specific?
+5. **CaMeL long-term:** Monitor the project for production readiness. If it stabilizes, evaluate as a complementary application-layer defense alongside our network-layer DLP.