DLP injection-check perf, bounded variant cache, dedup supervise schema #312

Merged
didericis-claude merged 1 commit from dlp-supervise-quality-fixes into main 2026-06-26 23:30:20 -04:00
Collaborator

Summary

Addresses three findings from a quality eval of the DLP / supervise core. No behavior change to the egress decision surface; tests added for the rewritten proximity path.

  • dlp_detectors._closest_pair — linearize + early-out. The naive-injection detector's disclosure×jailbreak proximity check ran an O(n*m) cross product over regex matches in attacker-controlled response bodies that have already passed the body-size cap — a latent DoS. Replaced with an O(n log n) sort + O(n) two-pointer merge (advance the span that ends first), with an early-out once any pair falls inside PROXIMITY_CHARS. Extracted _match_gap so the caller reuses the same span-gap calculation.
  • Bound the encoded-variant cache. _compute_encoded_variants was memoized in an unbounded module-level dict; a long-lived proxy seeing rotating secrets would grow it without limit. Swapped to a bounded functools.lru_cache(maxsize=256).
  • De-duplicate the supervise proposal schema. egress-allow and egress-block carried a ~40-line routes_yaml inputSchema block duplicated verbatim (silent-drift risk). Extracted _ROUTES_YAML_DESCRIPTION + _proposal_input_schema() as a single source of truth.

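The bounded cache in the second bullet amounts to decorating the variant builder with `functools.lru_cache`. In this sketch, `compute_encoded_variants` and its particular encodings (base64, hex, percent-encoding) are hypothetical stand-ins. The PR says only that the unbounded module-level dict was replaced with `functools.lru_cache(maxsize=256)`.

```python
import base64
import functools
import urllib.parse
from typing import Tuple


@functools.lru_cache(maxsize=256)  # evicts the least-recently-used entry once full
def compute_encoded_variants(secret: str) -> Tuple[str, ...]:
    """Hypothetical: encodings of `secret` that a DLP scan might look for.

    Returns a tuple because lru_cache hands the same cached object to every
    caller, so the value should be immutable.
    """
    raw = secret.encode("utf-8")
    return (
        secret,
        base64.b64encode(raw).decode("ascii"),
        raw.hex(),
        urllib.parse.quote(secret, safe=""),
    )


if __name__ == "__main__":
    # A long-lived proxy seeing rotating secrets: the cache stops growing at maxsize.
    for n in range(1000):
        compute_encoded_variants(f"rotated-secret-{n}")
    info = compute_encoded_variants.cache_info()
    print(info.currsize, info.maxsize)  # prints "256 256"
```

The decorator also provides `cache_info()` and `cache_clear()`, which makes it easy for tests to start from a cold cache or to check that eviction actually happens.
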
Tests

  • New scan_naive_injection cases: a near pair hidden among far-apart matches still blocks (exercises the merge, not just first-of-each-list); many far-apart matches stay warn.
  • Full unit suite green (1486 tests); pyright clean on the changed modules.
didericis-claude added 1 commit 2026-06-26 23:22:52 -04:00
perf(dlp): linearize injection proximity check; bound variant cache; dedup supervise schema
lint / lint (push) Successful in 2m21s
test / unit (pull_request) Successful in 1m1s
test / integration (pull_request) Successful in 27s
test / coverage (pull_request) Successful in 1m15s
b7f5f6439e
- dlp_detectors._closest_pair: replace the O(n*m) cross product with an
  O(n log n) sort + O(n) two-pointer merge, and early-out once a pair
  falls within the proximity threshold. The inputs are attacker-controlled
  response-body matches past the body-size cap, so the quadratic form was a
  latent DoS. Extract _match_gap to share the span-gap calc with the caller.
- dlp_detectors._compute_encoded_variants: back the memo with a bounded
  functools.lru_cache instead of an unbounded module dict, so a long-lived
  proxy seeing rotating secrets evicts rather than growing without limit.
- supervise_server: extract the duplicated routes.yaml inputSchema into
  _proposal_input_schema()/_ROUTES_YAML_DESCRIPTION so the egress-allow and
  egress-block tools can't drift.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
didericis-claude merged commit a256e5762a into main 2026-06-26 23:30:20 -04:00