PRD: Strengthen outbound exfiltration detection #263
Closes #259.
PRD
Summary
- Inject a canary (`BOT_BOTTLE_CANARY`) into the agent environment and `EGRESS_TOKEN_CANARY` into the egress sidecar so any outbound leak is detected with zero false positives
- Extend `scan_known_secrets` with a `sensitive_prefixes` parameter; `scan_outbound` reads `BOT_BOTTLE_SENSITIVE_PREFIXES` from `environ` to support arbitrary additional credential prefixes
- Add alnum-projection and sliding-window passes to `scan_known_secrets` to defeat separator-injection and chunked-substring evasion
- Add `scan_entropy` warn-only detector for high-entropy blobs; register `"entropy"` in `OUTBOUND_DETECTOR_NAMES`
- Decode binary bodies as `latin-1` (bijective, byte-preserving) instead of `utf-8 errors=replace`

affd628df6 to 00e50973c2
00e50973c2 to 217eadf9a1

- _alnum_projection(): strip non-alphanumeric chars for separator-injection detection
- scan_known_secrets() gains two extra passes per secret after exact-variant matching: alnum-projection exact match (catches hyphens/spaces between secret chars) and a sliding-window partial-match scan (catches chunked substrings ≥ PARTIAL_MATCH_MIN_LEN)
- scan_known_secrets() accepts sensitive_prefixes param (default ("EGRESS_TOKEN_",)) so redact_tokens and call-sites can extend the scanned env-var prefix set
- scan_entropy() warn-only detector flagging windows with Shannon entropy ≥ 5.5 bits/char
- "entropy" added to OUTBOUND_DETECTOR_NAMES; scan_outbound opts it in only when explicitly listed in dlp.outbound_detectors (never part of the default "all" set)
- scan_outbound reads BOT_BOTTLE_SENSITIVE_PREFIXES from environ to extend scan_known_secrets beyond EGRESS_TOKEN_* without schema changes
- Binary bodies decoded via latin-1 fallback (bijective byte↔codepoint) instead of utf-8 errors=replace, preserving ASCII secret strings in binary payloads

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

EgressPlan gains a `canary: str` field (default "") populated in Egress.prepare() using secrets.token_urlsafe(32). Each launched bottle:
- sidecar receives EGRESS_TOKEN_CANARY=<value> (literal env entry, scanned by existing known-secrets detector without any detector code changes)
- agent receives BOT_BOTTLE_CANARY=<value> (visible fake secret that signals exfiltration with zero false positives if it appears in outbound traffic)

Docker compose and macos-container backends updated; smolmachines shares docker compose and so picks this up automatically. Unit tests cover canary uniqueness, detection via scan_known_secrets, and EgressPlan backward-compat default.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fcfe2f0849 to 0a7e166b35

@didericis-codex
Let's not name the canary "Canary"; it's a bit too obvious. Let's try creating a random name with the suffix `SECRET` as the env var, something like `CANON_ALPHA_SECRET`. Have it change on startup. If there's no stdlib way of generating random words, create a fixed set of random words and choose a random two for the secret name. If you think of a better way to do this, or see any problems with this approach, let me know.

Done in 4808ef5.

The canary now uses a per-launch random `WORD_WORD_SECRET` env var name generated from a fixed word list. Both Docker and macOS launch paths inject that same name/value into the agent and sidecar, and the sidecar also gets `BOT_BOTTLE_SENSITIVE_PREFIXES=<that exact env name>` so the existing known-secret detector scans it without exposing a fixed canary-looking name.

Validation: pyright clean; focused egress/rendering tests pass; full unit suite passed before the final rebase.
Good catch: smolmachines was not covered by the Docker compose launch path. It shares the sidecar image and `EgressPlan`, but has separate `docker run` bundle env assembly plus separate smolvm guest env stamping.

I fixed that in 14ae895:
- sidecar `docker run` bundle env: `<random>_SECRET=<canary>` and `BOT_BOTTLE_SENSITIVE_PREFIXES=<random>_SECRET`
- smolvm guest env: `<random>_SECRET=<canary>`

Validation: pyright clean; focused smolmachines/egress/rendering tests passed; full unit suite passed.
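
To illustrate why the sidecar needs both entries, here is a hedged sketch of how a known-secret scan could pick up the canary through `BOT_BOTTLE_SENSITIVE_PREFIXES`, including the alnum-projection and sliding-window passes named in the first commit message. It is not the PR's `scan_known_secrets`. The helper names `collect_known_secrets` and `find_leaks` are hypothetical, the comma-separated format of `BOT_BOTTLE_SENSITIVE_PREFIXES` is an assumption, and the value 16 for `PARTIAL_MATCH_MIN_LEN` is assumed because the thread names the constant without stating it.

```python
import re

# Assumed value; the thread names PARTIAL_MATCH_MIN_LEN but does not state it.
PARTIAL_MATCH_MIN_LEN = 16
_NON_ALNUM = re.compile(r"[^A-Za-z0-9]")


def _alnum_projection(text: str) -> str:
    """Drop every non-alphanumeric character."""
    return _NON_ALNUM.sub("", text)


def collect_known_secrets(environ, default_prefixes=("EGRESS_TOKEN_",)):
    """Map env var name -> value for every variable under a sensitive prefix."""
    # Assumption: extra prefixes arrive as a comma-separated list.
    extra = environ.get("BOT_BOTTLE_SENSITIVE_PREFIXES", "")
    prefixes = tuple(default_prefixes) + tuple(
        p.strip() for p in extra.split(",") if p.strip()
    )
    return {
        name: value
        for name, value in environ.items()
        if value and name.startswith(prefixes)
    }


def find_leaks(body: str, secrets_by_name) -> list[tuple[str, str]]:
    """Return (env_name, match_kind) for each secret found in body."""
    findings = []
    body_alnum = _alnum_projection(body)
    n = PARTIAL_MATCH_MIN_LEN
    for name, secret in secrets_by_name.items():
        # Pass 1: the secret appears verbatim.
        if secret in body:
            findings.append((name, "exact"))
            continue
        # Pass 2: separators were injected between the secret's characters.
        secret_alnum = _alnum_projection(secret)
        if len(secret_alnum) >= n and secret_alnum in body_alnum:
            findings.append((name, "alnum-projection"))
            continue
        # Pass 3: any chunk of at least n consecutive characters appears.
        if any(secret[i:i + n] in body for i in range(len(secret) - n + 1)):
            findings.append((name, "partial"))
    return findings
```

Because `str.startswith` accepts a tuple, passing the exact canary env name as a "prefix" matches that one variable without any schema change.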
@@ -353,6 +353,9 @@ def _sidecar_env_entries(plan: MacosContainerBottlePlan) -> tuple[str, ...]:
    env: list[str] = []
    if plan.egress_plan.routes:
        env.extend(sorted(plan.egress_plan.token_env_map.keys()))
    if plan.egress_plan.canary and plan.egress_plan.canary_env:

The fact that we needed to update all the backends to propagate the env vars here is a red flag; this should be backend-agnostic. Can we move this sidecar env provisioning to a location that gets shared between backends?
Addressed the backend-agnostic concern from https://gitea.dideric.is/didericis/bot-bottle/pulls/263/files#issuecomment-2326 in 5204b98.

The egress launch env rules now live in shared helpers:
- `egress_sidecar_env_entries(plan.egress_plan)` returns route token env slots plus the canary env and `BOT_BOTTLE_SENSITIVE_PREFIXES`
- `egress_agent_env_entries(plan.egress_plan)` returns the agent-visible canary bait env

Docker, macOS container, and smolmachines all call those helpers now, so the canary/sensitive-prefix propagation no longer has per-backend policy copies. Added direct unit coverage for the shared helpers.
Validation: pyright clean; focused backend/rendering tests passed; full unit suite passed.
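
A rough sketch of what such shared helpers might look like, based only on the fields visible in the diff hunk quoted earlier (`routes`, `token_env_map`, `canary`, `canary_env`). The `EgressPlan` below is a simplified stand-in with assumed field types, and the helper bodies are an illustration, not the code from 5204b98.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EgressPlan:
    """Simplified stand-in; field types are assumptions."""
    routes: tuple[str, ...] = ()
    token_env_map: dict[str, str] = field(default_factory=dict)
    canary: str = ""
    canary_env: str = ""


def egress_sidecar_env_entries(plan: EgressPlan) -> tuple[str, ...]:
    """Env entries for the egress sidecar, independent of the backend."""
    env: list[str] = []
    if plan.routes:
        # Bare names: assumed to be forwarded from the launcher's environment.
        env.extend(sorted(plan.token_env_map.keys()))
    if plan.canary and plan.canary_env:
        # Literal NAME=value entry plus the prefix that makes the detector scan it.
        env.append(f"{plan.canary_env}={plan.canary}")
        env.append(f"BOT_BOTTLE_SENSITIVE_PREFIXES={plan.canary_env}")
    return tuple(env)


def egress_agent_env_entries(plan: EgressPlan) -> tuple[str, ...]:
    """Env entries for the agent: only the visible bait secret."""
    if plan.canary and plan.canary_env:
        return (f"{plan.canary_env}={plan.canary}",)
    return ()
```

Each backend would then translate the returned tuples into its own launch syntax, so the policy of which variables go where lives in one place.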
@@ -1,4 +1,4 @@
-"""DLP detectors for the egress proxy (PRD 0053).
+"""DLP detectors for the egress proxy (PRD 0053, prd-new).

Remove `prd-new` citations in docstrings.
Addressed in d9a9eef.

Removed the `prd-new` placeholder citations from Python docstrings/section comments in the detector and related tests. I also scanned `bot_bottle` and `tests` Python files to confirm no `prd-new` references remain.

Validation: pyright clean; focused detector/egress tests passed.
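
For readers unfamiliar with the warn-only entropy detector that this PR adds to the detector module, the following is a hedged sketch of a windowed Shannon-entropy scan. Only the 5.5 bits/char threshold comes from the commit message; the window size, step, and the name `scan_entropy_sketch` are assumptions, and the PR's `scan_entropy` may differ.

```python
import math
from collections import Counter

ENTROPY_THRESHOLD = 5.5  # bits/char, from the commit message
WINDOW = 256             # assumed window size
STEP = 64                # assumed stride between windows


def shannon_entropy(text: str) -> float:
    """Empirical Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def scan_entropy_sketch(body: str) -> list[tuple[int, float]]:
    """Return (offset, entropy) for each window at or above the threshold."""
    findings = []
    # A body shorter than WINDOW is scanned as a single window.
    last_start = max(len(body) - WINDOW, 0)
    for start in range(0, last_start + 1, STEP):
        h = shannon_entropy(body[start:start + WINDOW])
        if h >= ENTROPY_THRESHOLD:
            findings.append((start, h))
    return findings
```

A window can only reach 5.5 bits/char if it contains at least 46 distinct characters (2^5.5 ≈ 45.3), so short windows can never trigger; this is one reason the sketch uses a fairly wide window.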