Add analysis of Google DeepMind's CaMeL (arXiv:2503.18813), which
prevents prompt injections architecturally rather than detecting them.
Key findings:
- CaMeL operates at the agent execution layer (P-LLM/Q-LLM split +
capability-based data flow tracking), not the network layer
- Not a replacement for pipelock/DLP — different threat surface
- Not viable today: research artifact, requires agent rearchitecture,
doubles LLM costs, 7% utility loss on AgentDojo
- Worth watching: its capability model could complement bot-bottle's
network controls if it matures into production software
Also clarifies pipelock's actual detection capabilities (no prompt
injection detection) and adds naive detector sketch.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove all time estimates (2-3 weeks, 1-2 weeks, etc.)
- Add detailed analysis of using LLM for prompt injection detection
- Survey existing models (none purpose-built for this)
- Sketch DistilBERT fine-tuning approach (~67MB quantized)
- Analyze latency/footprint tradeoffs (50-150ms vs. <5ms for patterns)
- Recommend pattern-based Phase 2, with LLM as optional Phase 2b
- Include code sketch of LLM detector with timeout fallback
- List open questions for LLM deployment
Conclusion: Patterns are faster/simpler for now; LLM only if patterns
miss sophisticated attacks in production.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Per feedback from PR 192:
- Restructure around outbound_detectors (requests to upstream) and
inbound_detectors (responses from upstream)
- Rename to 'secret exfiltration' detection for Phase 1
- Add 'known_secrets' detector for provisioned credentials
- Make scanning enabled by default per detector type
- Clarify that multiple encodings of secrets should be checked
Phase 1 now focuses on preventing outbound credential leaks.
Phase 2 handles inbound prompt injection attacks.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Investigates replacing pipelock with a custom mitmproxy-based DLP addon
that supports per-route configuration, response-specific rules, and
AI-specific threat detection (tokens, prompt injection).
Recommends building the addon in-repo to align with bot-bottle's
per-route design model and keep security logic auditable.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>