- Remove all time estimates (2-3 weeks, 1-2 weeks, etc.)
- Add detailed analysis of using an LLM for prompt injection detection
- Survey existing models (none purpose-built for this)
- Sketch DistilBERT fine-tuning approach (~67MB quantized)
- Analyze latency/footprint tradeoffs (50-150 ms for an LLM vs. <5 ms for patterns)
- Recommend pattern-based Phase 2, with LLM as optional Phase 2b
- Include code sketch of LLM detector with timeout fallback
- List open questions for LLM deployment
Conclusion: patterns are faster and simpler for now; add the LLM only if
patterns miss sophisticated attacks in production.
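
For reference, a minimal sketch of the timeout-fallback idea, not the code in the design doc itself. The names `pattern_detect`, `detect_injection`, `llm_classify`, the example patterns, and the 150 ms budget are all assumptions made for illustration: patterns run first, the LLM gets a bounded time budget, and on timeout or error the pattern verdict stands.

```python
import re
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable

# Hypothetical example patterns; a real rule set would be larger.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"disregard (the )?system prompt", re.I),
]

# Shared worker pool so a slow LLM call never blocks the caller.
_executor = ThreadPoolExecutor(max_workers=2)


def pattern_detect(text: str) -> bool:
    """Fast path: True if any known injection pattern matches."""
    return any(p.search(text) for p in INJECTION_PATTERNS)


def detect_injection(
    text: str,
    llm_classify: Callable[[str], bool],  # assumed: returns True for an attack
    timeout_s: float = 0.15,              # assumed latency budget
) -> bool:
    """Patterns first, then the LLM with a deadline; fall back to patterns."""
    if pattern_detect(text):
        return True
    future = _executor.submit(llm_classify, text)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        future.cancel()  # best effort; a call that is already running continues
        return False     # pattern verdict (no match) stands
    except Exception:
        return False     # LLM failure must not block traffic
```

Falling back to the pattern verdict keeps worst-case added latency bounded by `timeout_s`, at the cost of missing attacks that only the LLM would catch.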
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Per feedback from PR 192:
- Restructure around outbound_detectors (requests to upstream) and
inbound_detectors (responses from upstream)
- Rename to 'secret exfiltration' detection for Phase 1
- Add 'known_secrets' detector for provisioned credentials
- Make scanning enabled by default per detector type
- Clarify that multiple encodings of secrets should be checked
Phase 1 now focuses on preventing outbound credential leaks.
Phase 2 handles inbound prompt injection attacks.
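
A rough sketch of what checking multiple encodings of a provisioned secret could look like. The function names, the `known_secrets` mapping shape, and the choice of encodings (plain, base64, URL-safe base64, hex, percent-encoding) are assumptions for illustration, not the detector's actual interface.

```python
import base64
from urllib.parse import quote


def secret_variants(secret: str) -> set[str]:
    """Return the secret plus a few common encodings of it."""
    raw = secret.encode("utf-8")
    return {
        secret,
        base64.b64encode(raw).decode("ascii"),
        base64.urlsafe_b64encode(raw).decode("ascii"),
        raw.hex(),
        quote(secret, safe=""),
    }


def find_leaked_secrets(body: str, known_secrets: dict[str, str]) -> list[str]:
    """Return the names of provisioned secrets that appear in an outbound body."""
    leaked = []
    for name, value in known_secrets.items():
        if any(variant in body for variant in secret_variants(value)):
            leaked.append(name)
    return leaked
```

This substring check only catches a secret that was encoded on its own. A secret base64-encoded as part of a longer string will usually not match, and uppercase hex is not covered, so a real detector would need additional handling for both.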
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Investigates replacing pipelock with a custom mitmproxy-based DLP addon
that supports per-route configuration, response-specific rules, and
AI-specific threat detection (tokens, prompt injection).
Recommends building the addon in-repo to align with bot-bottle's
per-route design model and keep security logic auditable.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>