DLP hot-path perf + manifest load_for_agent split #310

Merged

didericis-claude merged 2 commits from dlp-perf-manifest-cleanup into main

2026-06-26 23:03:41 -04:00



didericis

2a67a85835

refactor(manifest): split load_for_agent into eager/lazy methods

lint / lint (push) Successful in 2m18s
test / unit (pull_request) Successful in 1m1s
test / integration (pull_request) Successful in 28s
test / coverage (pull_request) Successful in 1m17s

`ManifestIndex.load_for_agent` was a ~100-line method branching across
the eager (from_json_obj) and lazy (from disk) resolution modes, with
the git-user merge tail duplicated in both branches. Split into
`_load_for_agent_eager` / `_load_for_agent_lazy` behind a small
dispatcher and extract the shared tail into
`_manifest_with_merged_git_user`. No behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9

2026-06-26 22:53:27 -04:00
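Below is a minimal Python sketch of the dispatcher shape this commit describes. The PR does not show the real `ManifestIndex`, so several details are hypothetical: the constructor arguments, the `Manifest` dataclass, the `{"agents": {...}}` JSON layout, the `<root>/<agent>.json` file naming, and the "agent entry overrides defaults" git-user merge rule. Only the four method names come from the commit message.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Optional


@dataclass(frozen=True)
class Manifest:
    # Hypothetical shape: the real manifest type is not shown in the PR.
    agent: str
    git_user: dict[str, str] = field(default_factory=dict)


class ManifestIndex:
    def __init__(self, root: Optional[Path] = None,
                 json_obj: Optional[dict[str, Any]] = None,
                 default_git_user: Optional[dict[str, str]] = None) -> None:
        self._root = root                  # lazy mode: manifests live on disk
        self._json_obj = json_obj          # eager mode: already-parsed JSON
        self._default_git_user = default_git_user or {}

    def load_for_agent(self, agent: str) -> Manifest:
        # Small dispatcher: choose the resolution mode, then delegate.
        if self._json_obj is not None:
            return self._load_for_agent_eager(agent)
        return self._load_for_agent_lazy(agent)

    def _load_for_agent_eager(self, agent: str) -> Manifest:
        # Eager mode: the entry was parsed up front (from_json_obj).
        entry = self._json_obj["agents"][agent]
        return self._manifest_with_merged_git_user(agent, entry)

    def _load_for_agent_lazy(self, agent: str) -> Manifest:
        # Lazy mode: read this agent's manifest from disk on demand.
        path = self._root / f"{agent}.json"  # hypothetical file layout
        entry = json.loads(path.read_text())
        return self._manifest_with_merged_git_user(agent, entry)

    def _manifest_with_merged_git_user(self, agent: str,
                                       entry: dict[str, Any]) -> Manifest:
        # Shared tail, previously duplicated in both branches. Assumed rule:
        # keys from the agent entry override the index-wide defaults.
        merged = {**self._default_git_user, **entry.get("git_user", {})}
        return Manifest(agent=agent, git_user=merged)
```

The benefit of the split is that each mode only has to resolve a raw entry, and the merge logic lives in exactly one place, so a later fix to the git-user merge cannot drift between the eager and lazy paths.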

didericis

0bb47bd754

perf(dlp): memoize encoded variants and linearize partial-window scan

Two per-request hot-path costs in the egress DLP scanner:

- `_encoded_variants` derived the full variant set (gzip + nine
  encodings) for every provisioned secret on every redaction and
  known-secret scan — once per host, path, header, and body. Cache it
  per distinct secret; callers still get a fresh list so they can't
  corrupt the shared cached tuple.
- `_find_partial_window` searched the text once per secret n-gram,
  giving O(len(secret) * len(text)). Build the secret's n-gram set once
  and sweep the text a single time: O(len(text)), no coverage loss.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9

2026-06-26 22:53:27 -04:00
