feat(dlp): websocket scanning, response headers, extended encoding variants, sk-proj pattern (PRD 0053)
@@ -57,14 +57,15 @@ upstream attacker.

## Non-goals

- Scanning inbound response URLs (the response URL is the same as the
  outbound request URL and is already scanned there).
- Structured query-param parsing (treating `?k=v` as key/value pairs for
  per-param matching); scanning the raw query string is sufficient.
- Raw UDP/DNS queries: these bypass the HTTP proxy entirely and require a
  network-level DNS sinkhole (tracked separately in issue #205).
- Changes to the `dlp` block schema or detector names.
- Scanning outbound request bodies for prompt injection (injection scanning
  is inbound only, per PRD 0052 design).
- LLM-based semantic detection or entropy-based secret scanning (deferred,
  per PRD 0052 non-goals).

## Design

@@ -123,24 +124,47 @@ The `Authorization` header is present in `flow.request.headers` at this
point (the strip happens below on line 115), so the auth-strip ordering
invariant is automatically preserved.

### Test additions

`tests/unit/test_egress_addon_core.py` gains:

- `TestBuildOutboundScanText`: verifies that the hostname, path, query,
  headers, and body each appear in the assembled text, and that an empty
  query or body is omitted.
- `TestScanOutbound`: verifies that `scan_outbound` blocks when a known token
  pattern appears in each surface independently (hostname, path, query,
  non-auth header, body), and returns `None` for a clean request.

### `build_inbound_scan_text` in `egress_addon_core.py`

Analogous to `build_outbound_scan_text`, this helper assembles the inbound
response corpus (all response headers plus the body) for `scan_inbound`.
The `response()` hook now passes this combined text instead of the body
alone, closing the response-header injection vector.
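
The two corpus builders described in this section can be sketched as follows. This is a minimal sketch, not the code in `egress_addon_core.py`: the `ReqParts`/`RespParts` containers and both function signatures are hypothetical stand-ins for mitmproxy flow data, and the sketch includes every header it is handed (how the real outbound helper treats `Authorization` is not specified here). Joining surfaces with newlines keeps a pattern built from a character class such as `[A-Za-z0-9_\-]` from matching across the boundary between two surfaces.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Headers = List[Tuple[str, str]]


@dataclass
class ReqParts:
    """Hypothetical flattened view of an outbound request."""
    host: str
    path: str                      # path without the query string
    query: str = ""                # raw query string; "" when absent
    headers: Headers = field(default_factory=list)
    body: str = ""


@dataclass
class RespParts:
    """Hypothetical flattened view of an inbound response."""
    headers: Headers = field(default_factory=list)
    body: str = ""


def build_outbound_scan_text(req: ReqParts) -> str:
    """Join hostname, path, query, headers and body into one corpus."""
    parts = [req.host, req.path]
    if req.query:                  # an empty query is omitted
        parts.append(req.query)
    parts.extend(f"{name}: {value}" for name, value in req.headers)
    if req.body:                   # an empty body is omitted
        parts.append(req.body)
    return "\n".join(parts)


def build_inbound_scan_text(resp: RespParts) -> str:
    """Join all response headers plus the body for scan_inbound."""
    parts = [f"{name}: {value}" for name, value in resp.headers]
    if resp.body:
        parts.append(resp.body)
    return "\n".join(parts)
```

With this shape, an injection payload carried in a hypothetical `X-Note` response header reaches `scan_inbound` as an `X-Note: ...` line next to the body, where a body-only scan would never see it.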

### WebSocket frame scanning

A new `websocket_message` hook in `EgressAddon` scans every frame after the
HTTP 101 upgrade. Outbound frames (`from_client=True`) are scanned for
credential patterns and known secrets; inbound frames are scanned for prompt
injection. On a block, the entire WebSocket connection is killed via
`flow.kill()` (there is no HTTP response surface to write to after upgrade).
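
The per-frame dispatch can be sketched as below. mitmproxy calls a `websocket_message(flow)` hook for each frame and exposes the newest frame as `flow.websocket.messages[-1]`, with `from_client` and `content` (bytes) attributes; to keep the sketch runnable without mitmproxy, those objects are replaced by small stand-in dataclasses. The `EgressAddonSketch` class, the scanner callables and their shape (text in, a reason string or `None` out), and the lenient UTF-8 decode are assumptions, not the addon's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


# Stand-ins mirroring the mitmproxy attributes the hook relies on.
@dataclass
class WebSocketMessage:
    from_client: bool
    content: bytes


@dataclass
class WebSocketData:
    messages: List[WebSocketMessage] = field(default_factory=list)


@dataclass
class Flow:
    websocket: WebSocketData
    killed: bool = False

    def kill(self) -> None:
        # mitmproxy's flow.kill() tears down the connection; here we record it.
        self.killed = True


# Hypothetical scanner shape: return a block reason, or None when clean.
Scanner = Callable[[str], Optional[str]]


class EgressAddonSketch:
    def __init__(self, scan_outbound: Scanner, scan_inbound: Scanner) -> None:
        self.scan_outbound = scan_outbound
        self.scan_inbound = scan_inbound

    def websocket_message(self, flow: Flow) -> None:
        msg = flow.websocket.messages[-1]  # the frame that triggered the hook
        # Binary frames are decoded leniently so the scanners always get text.
        text = msg.content.decode("utf-8", errors="replace")
        # Client-to-server frames: credentials/secrets; server-to-client: injection.
        scan = self.scan_outbound if msg.from_client else self.scan_inbound
        if scan(text) is not None:
            # After the HTTP 101 upgrade there is no response to rewrite,
            # so the only available block is to kill the whole connection.
            flow.kill()
```

Routing on `from_client` mirrors the HTTP path, where `request()` feeds the outbound scan and `response()` feeds the inbound scan.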

### Extended encoding variants in `_encoded_variants`

`_encoded_variants` is extended from 4 to 9 encoding forms:

| Added encoding | Rationale |
|---|---|
| Standard base64 without padding | Common in log lines where `=` is stripped |
| URL-safe base64 with padding | JWT / OAuth standard alphabet |
| URL-safe base64 without padding | Same, padding stripped |
| Hex uppercase | Complements existing hex-lowercase variant |
| Base32 | TOTP seeds; some DNS-exfil channels use base32 subdomains |
| gzip + base64 | Recognisable by `H4sI` prefix; naive compression before encode |
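
The six added forms in the table can be generated from a known secret with the standard library alone, as in this sketch. The function name `added_encoded_variants` is hypothetical, and it covers only the table's additions, not the pre-existing forms. Two caveats apply to any precomputed-variant approach: base64 and base32 variants are only guaranteed to match when the secret was encoded on its own rather than in the middle of a larger payload, and a precomputed gzip variant only matches byte-identical compressor output (hence `mtime=0` here). The `H4sI` note in the table suggests the real code may instead spot and decode gzip blobs, which this sketch does not attempt.

```python
import base64
import gzip
from typing import List


def added_encoded_variants(secret: str) -> List[str]:
    """Return the six encodings added in this PRD, in table order."""
    raw = secret.encode("utf-8")
    std = base64.b64encode(raw).decode("ascii")
    url = base64.urlsafe_b64encode(raw).decode("ascii")
    return [
        std.rstrip("="),                        # standard base64, no padding
        url,                                    # URL-safe base64, padded
        url.rstrip("="),                        # URL-safe base64, no padding
        raw.hex().upper(),                      # hex uppercase
        base64.b32encode(raw).decode("ascii"),  # base32 (RFC 4648, padded)
        # gzip + base64; mtime=0 makes the gzip header deterministic.
        base64.b64encode(gzip.compress(raw, mtime=0)).decode("ascii"),
    ]
```

For the secret `abcd` the list starts `YWJjZA`, `YWJjZA==`, `YWJjZA`, `61626364`, `MFRGGZA=`, and the last entry begins with `H4sI`, the base64 form of the gzip magic bytes `1f 8b 08`. Standard and URL-safe forms coincide whenever the encoding contains no `+` or `/`, so callers may want to de-duplicate.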

### OpenAI project key pattern

`TOKEN_PATTERNS` gains `sk-proj-[A-Za-z0-9_\-]{48,}`, covering OpenAI's
newer project-scoped API key format.
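
A quick check of the new pattern's boundaries is sketched below. The regular expression is copied from the PRD; the one-element `TOKEN_PATTERNS` list and the `first_token_hit` helper are hypothetical stand-ins for the real pattern list and scanner. The `{48,}` quantifier means a 47-character suffix is not flagged, and because `_` and `-` are in the character class, a match keeps extending through any trailing run of those characters.

```python
import re
from typing import Optional

# Hypothetical one-entry list; only the sk-proj regex comes from the PRD.
TOKEN_PATTERNS = [
    re.compile(r"sk-proj-[A-Za-z0-9_\-]{48,}"),
]


def first_token_hit(text: str) -> Optional[str]:
    """Return the first substring matching any token pattern, else None."""
    for pattern in TOKEN_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(0)
    return None
```

For example, `first_token_hit("sk-proj-" + "a" * 48)` returns the whole key, while the same call with 47 `a` characters returns `None`.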

## Implementation

Delivered across three commits on the same branch:

1. **Outbound scan surfaces**: `build_outbound_scan_text` (added to
   `egress_addon_core.py` and its `__all__`), the `egress_addon.py`
   `request()` rewrite that imports and calls it, and the
   `TestBuildOutboundScanText` and `TestScanOutbound` tests in
   `tests/unit/test_egress_addon_core.py`.
2. **Remaining gaps**: extended `_encoded_variants`, the `sk-proj-` pattern,
   `build_inbound_scan_text`, response-header scanning, the
   `websocket_message` hook, and matching unit tests.
3. **PRD flip**: `Status: Draft → Active` (committed with the first
   implementation commit; updated here to reflect final scope).