perf(dlp): memoize encoded variants and linearize partial-window scan

Two per-request hot-path costs in the egress DLP scanner:

- `_encoded_variants` derived the full variant set (gzip + nine encodings)
  for every provisioned secret on every redaction and known-secret scan,
  once per host, path, header, and body. Cache it per distinct secret;
  callers still get a fresh list so they can't corrupt the shared cached
  tuple.

- `_find_partial_window` searched the text once per secret n-gram, giving
  O(len(secret) * len(text)). Build the secret's n-gram set once and sweep
  the text a single time: O(len(text)), no coverage loss.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
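The partial-window change described in the message can be illustrated with a minimal sketch. This is not the scanner's actual code: the function names `find_partial_naive` and `find_partial_sweep`, the window size `n=8`, and the "return the earliest match index, or -1" contract are all assumptions made for the example.

```python
def find_partial_naive(text, secret, n=8):
    """Hypothetical 'before' shape: one full text search per secret n-gram."""
    best = -1
    for i in range(len(secret) - n + 1):
        pos = text.find(secret[i:i + n])  # scans the text again for each n-gram
        if pos != -1 and (best == -1 or pos < best):
            best = pos
    return best


def find_partial_sweep(text, secret, n=8):
    """Hypothetical 'after' shape: build the n-gram set once, sweep the text once."""
    # Every length-n substring of the secret; empty if the secret is shorter than n.
    grams = {secret[i:i + n] for i in range(len(secret) - n + 1)}
    if not grams:
        return -1
    # One pass over the text. Each step slices n characters and does a set
    # lookup, so for a fixed n the work grows linearly with len(text).
    for i in range(len(text) - n + 1):
        if text[i:i + n] in grams:
            return i
    return -1
```

Both versions flag the same windows, since a text window matches exactly when it equals some n-gram of the secret; only the number of passes over the text differs.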
@@ -281,6 +281,17 @@ class TestEncodedVariants(unittest.TestCase):
         v = self._variants()
         self.assertEqual(len(v), len(set(v)))
 
+    def test_repeated_calls_equal(self):
+        # Memoization must not change observable output.
+        self.assertEqual(self._variants(), self._variants())
+
+    def test_returns_fresh_list_each_call(self):
+        # Callers mutate/iterate the result; the cached set must not be
+        # exposed by reference, or one caller could corrupt another's view.
+        first = self._variants()
+        first.append("MUTATED")
+        self.assertNotIn("MUTATED", self._variants())
+
 
 class TestUnicodeNormalization(unittest.TestCase):
     def test_fullwidth_chars_normalized(self):