From c74bd5cf26ec782fa497a9a91ad68619a21199f1 Mon Sep 17 00:00:00 2001
From: didericis <eric@dideric.is>
Date: Fri, 8 May 2026 00:00:51 -0400
Subject: [PATCH] docs: add research note on multi-encoding secret exfil
 tripwires

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .../secret-exfil-tripwire-encodings.md        | 459 ++++++++++++++++++
 1 file changed, 459 insertions(+)
 create mode 100644 docs/research/secret-exfil-tripwire-encodings.md

diff --git a/docs/research/secret-exfil-tripwire-encodings.md b/docs/research/secret-exfil-tripwire-encodings.md
new file mode 100644
index 0000000..d5c4636
--- /dev/null
+++ b/docs/research/secret-exfil-tripwire-encodings.md
@@ -0,0 +1,459 @@
+# Secret exfiltration tripwires: detecting a known string in many encodings
+
+Research into tooling that, given a known secret string, generates its
+representation in many common encodings so that simple string or regex matchers
+can detect the secret in outbound traffic or logged output — regardless of
+which naive encoding a misbehaving agent uses.
+
+## Summary
+
+- **No off-the-shelf tool does this exactly.** A search of GitHub, PyPI, npm,
+  and security tool indexes (awesome-lists, awesome-honeypots, awesome-yara,
+  awesome-canaries) found no project whose stated purpose is "given a secret,
+  emit N encoded forms suitable for use as grep or regex patterns against
+  outbound traffic." The capability exists in fragments across several adjacent
+  categories, none of which compose into the complete picture.
+- **The closest existing answer is YARA's `base64` / `base64wide` string
+  modifiers** (added in YARA 4.0). A `base64` declaration generates all
+  three base64 offset permutations automatically, and pairing it with `wide`
+  does the same for the UTF-16LE form of the secret. Together with a separate
+  plaintext `ascii wide` declaration, this covers four of the most common
+  encoding forms with no Python needed.
+- **Secret scanners** (gitleaks, trufflehog, detect-secrets, ggshield) solve
+  the inverse problem: detecting *unknown* secrets matching known patterns.
+  They are not designed to scan for a specific known literal in N encodings.
+  Gitleaks (v8.20+) and TruffleHog do perform multi-encoding decodes before
+  running their detectors, but only as a preprocessing step — not as a way to
+  produce N encoded forms for downstream matchers.
+- **Canary token services** (Thinkst Canarytokens, OpenCanary) are callback
+  canaries: detection fires when the token itself is accessed and phones home.
+  They do not scan outbound streams for encoded representations of the canary
+  value. They address a different threat model.
+- **Enterprise DLP** products (Microsoft Purview, Symantec DLP/Broadcom,
+  Nightfall, Cyberhaven) perform encoding-aware matching internally as a
+  black-box feature. The capability is real but not exposed as an API and
+  not available for use in a self-hosted container sidecar. Symantec DLP
+  explicitly does not decode base64 or ROT13 in all inspection paths due
+  to processing overhead concerns.
+- Rolling this in roughly 150 lines of Python is feasible and is probably the
+  right path for claude-bottle v1. The limiting factor is not the encoding
+  logic, which is straightforward, but the false-positive rate from common
+  base64 alphabet collisions and the zero coverage against any re-encoding
+  that involves a key (encryption) or spreads the secret across messages
+  (packet splitting).
+
+---
+
+## Encoding catalog
+
+The following encodings are the realistic candidates for a naive agent
+performing unintentional or low-sophistication exfiltration. For each,
+the "FP risk" column notes the false-positive risk in a pattern matcher
+scanning general HTTP request bodies or log lines.
+
+| # | Encoding | Notes | FP risk |
+|---|----------|-------|---------|
+| 1 | Raw UTF-8 bytes (literal) | Baseline. Exact substring match. | Lowest |
+| 2 | Base64 standard (RFC 4648 §4) with `=` padding | Up to 3 offset variants depending on byte alignment at encode boundary; YARA covers all three with `base64` modifier. | Low–medium; base64 collisions are rare for 16+ char secrets |
+| 3 | Base64 without padding | Same byte content, trailing `=` stripped. Matches appear inside longer base64 blobs; needs anchor-free search. | Medium |
+| 4 | URL-safe base64 (RFC 4648 §5) | `+` → `-`, `/` → `_`. Required by many JWT and OAuth token formats. A separate encoding class from standard base64. | Low |
+| 5 | Base64 of UTF-16LE encoding | Re-encode the secret as UTF-16LE, then base64. Produces different output than UTF-8 base64. YARA covers this with `wide base64`; `base64wide` instead matches the base64 text itself stored as UTF-16LE. | Low |
+| 6 | Base64 of UTF-16BE encoding | Same as above but big-endian byte order. Rare in practice but trivial to generate; include for completeness. | Very low |
+| 7 | Double base64 | Base64 applied twice. Some log-shipping pipelines or config-encoding layers do this naively. | Very low |
+| 8 | Hex lowercase (`0-9a-f`) | Every byte becomes two hex digits; trivially detectable with a fixed-length hex string search. | Low |
+| 9 | Hex uppercase (`0-9A-F`) | Same encoded value, different case. Separate regex needed if your matcher is case-sensitive. | Low |
+| 10 | Percent/URL encoding of all bytes | `%xx` for every byte. Browsers and curl typically only encode special chars; a misbehaving agent might encode all bytes. | Low in raw bodies; medium in URL query strings |
+| 11 | Percent encoding of non-ASCII bytes only | The common case. Only high-bytes and special chars encoded; printable ASCII runs verbatim. | Medium; overlaps with normal URL encoding |
+| 12 | Double percent encoding | `%` → `%25`, so `%41` → `%2541`. Common WAF bypass; appears in logged URLs after server-side decode. | Very low |
+| 13 | JSON string escaping | `"` → `\"`, `\` → `\\`, non-ASCII → `\uXXXX`. Relevant when a secret is serialized into a JSON payload body. | Medium; very common in request bodies |
+| 14 | HTML/XML entity encoding | `&amp;`, `&#xXX;`, `&#DDD;`. Relevant for form POST bodies and SOAP/XML egress. | Medium in HTML context; low in JSON |
+| 15 | UTF-16LE raw bytes | Interleaved NUL bytes; `ABC` becomes `41 00 42 00 43 00`. Visible in PCAP or raw log hex dumps. YARA `wide` modifier covers this. | Low in text content; medium in binary/multipart streams |
+| 16 | UTF-32LE raw bytes | Four bytes per character. Unusual in web payloads but trivial for an agent to produce via Python's `encode('utf-32-le')`. | Very low |
+| 17 | ROT13 | Caesar cipher with shift 13 over the letters A–Z and a–z only; digits and special chars are unchanged. Weak obfuscation, cheap to detect. | Medium; common English words collide |
+| 18 | ROT47 | Extends ROT13 over printable ASCII range 33–126. Transforms digits and symbols too. Less collision-prone than ROT13. | Low |
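+| 19 | gzip + base64 | `gzip(plaintext)` then base64-encode the binary output. Any such blob starts with the `H4sI` base64 prefix (RFC 1952 magic bytes `1f 8b 08`), but the prefix only marks gzip data: header fields such as the timestamp vary, so confirming the secret means decoding and decompressing the blob. | Low; `H4sI` prefix is a cheap anchor |
+| 20 | zlib/deflate + base64 | `zlib.compress()` (DEFLATE with zlib header) then base64. At the default compression level the header bytes `78 9c` encode to a base64 prefix of `eJ`; other levels give other prefixes. | Low; magic prefix detectable |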
+| 21 | Leetspeak / character substitution | `e` → `3`, `a` → `@`, `o` → `0`, `i` → `1`, etc. No fixed mapping; generates many variants. High FP against non-secret content. | High; impractical to enumerate exhaustively |
+| 22 | Reversed bytes | Secret reversed character-by-character. Trivial, occasionally used as a confusion layer. | Low |
+| 23 | Space-separated characters | `SECRET` → `S E C R E T`. Defeats substring search; requires regex `S\s+E\s+C\s+R\s+E\s+T`. | Very low |
+| 24 | Null-separated characters (wide variant) | Like space-separated but with literal `\x00` bytes. Same as UTF-16LE for ASCII-only secrets. | Very low |
+| 25 | Base32 (RFC 4648 §6) | Used in TOTP seeds, some DNS exfil channels. Alphabet is `A-Z2-7`. Longer output than base64. | Low for secrets ≥ 10 chars |
+
+**Diminishing returns note.** Encodings 1–13 cover the vast majority of
+realistic naive-exfil vectors for an agent using standard Python or shell
+tools. Encodings 14–25 are worth including in a comprehensive scanner but
+individually contribute little marginal risk reduction.
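+
+As a rough illustration of entries 19 and 20, the sketch below shows the
+fixed base64 prefixes and a decode-then-search check for compressed blobs.
+The helper name `blob_contains_secret` is hypothetical, and `mtime=0` is an
+assumption made only to keep the gzip header reproducible.
+
+```python
+import base64
+import gzip
+import zlib
+
+SECRET = b"my-secret-value"
+
+# gzip output starts with magic bytes 1f 8b 08, which base64 renders as "H4sI".
+gz_b64 = base64.b64encode(gzip.compress(SECRET, mtime=0)).decode("ascii")
+# zlib at the default level starts with header bytes 78 9c, i.e. "eJ" in base64.
+zl_b64 = base64.b64encode(zlib.compress(SECRET)).decode("ascii")
+
+
+def blob_contains_secret(blob: str, secret: bytes) -> bool:
+    """Hypothetical helper: base64-decode a blob, decompress it, search it."""
+    try:
+        raw = base64.b64decode(blob, validate=True)
+    except ValueError:
+        return False  # not valid base64
+    for decompress in (gzip.decompress, zlib.decompress):
+        try:
+            if secret in decompress(raw):
+                return True
+        except (OSError, EOFError, zlib.error):
+            continue  # not this compression format
+    return False
+
+
+print(gz_b64[:4], zl_b64[:2])
+```
+
+A scanner built this way would treat the prefix as a cheap trigger and only
+pay for decompression on blobs that carry it.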
+
+---
+
+## Adjacent category survey
+
+### Secret scanners
+
+The major open-source secret scanners — gitleaks, TruffleHog, detect-secrets,
+and ggshield — all solve the *inverse* problem: given a body of text, find
+strings that look like secrets using entropy analysis or regular expressions
+trained on known credential formats. They are not designed to answer "does
+this payload contain *this specific known value* in any encoding?"
+
+The distinction matters. These tools look for patterns that match the
+structural form of, e.g., an AWS access key (`AKIA...`). They are not
+designed to take a user-provided literal and report all byte-equivalent
+encodings of it.
+
+That said, two of these tools have encoding-aware *preprocessing* steps
+that are directly relevant:
+
+- **Gitleaks** ([github.com/gitleaks/gitleaks](https://github.com/gitleaks/gitleaks))
+  added a `--max-decode-depth` flag in v8.20.0. When set to a non-zero
+  value, it recursively decodes segments of the input before running
+  regex detectors, supporting three encodings: hex, percent-encoding
+  (URL encoding), and base64. The purpose is to find secrets that have
+  been naively encoded before committing. This is functionally what
+  a content-tripwire needs to do, but hard-coded to Gitleaks' own
+  detector ruleset rather than user-supplied literals. The flag is off
+  by default.
+
+- **TruffleHog** ([github.com/trufflesecurity/trufflehog](https://github.com/trufflesecurity/trufflehog))
+  decodes four encoding types before running its detectors: UTF-8, UTF-16,
+  Base64, and Escaped Unicode (`\uXXXX` form). It also detects secrets
+  hidden in archived and compressed files. A Truffle Security blog post
+  from October 2024 documents this in detail
+  ([trufflesecurity.com/blog/secret-scanning-encoded-and-archived-data](https://trufflesecurity.com/blog/secret-scanning-encoded-and-archived-data)).
+  Same caveat as gitleaks: the decode step feeds the existing pattern
+  detectors, not a user-supplied literal search.
+
+- **detect-secrets** ([github.com/Yelp/detect-secrets](https://github.com/Yelp/detect-secrets))
+  and **ggshield** ([github.com/GitGuardian/ggshield](https://github.com/GitGuardian/ggshield))
+  do not appear to have multi-encoding decode steps; they operate on
+  the input text as-is.
+
+None of these tools expose a "match this literal in N encodings" API.
+The closest workflow would be to feed a custom gitleaks rule that matches
+pre-computed encoded variants, but that requires generating those variants
+externally (i.e., the exact gap this research note addresses).
+
+### Canary token services
+
+Canary token services operate on a fundamentally different detection model
+and should not be confused with matcher canaries.
+
+**Callback canaries** work by embedding a unique URL or resource reference
+in a document, credential file, or environment variable. When an agent
+(or attacker) reads and uses the credential, the canary service receives
+an HTTP callback. The detection signal is the *access of the resource*,
+not the presence of an encoded form in an outbound byte stream.
+
+- **Thinkst Canarytokens** ([canarytokens.org](https://canarytokens.org) /
+  [github.com/thinkst/canarytokens](https://github.com/thinkst/canarytokens))
+  offers AWS key canaries, Azure login canaries, PDF canaries, and
+  many others. All rely on callback detection. A Canarytokens bypass
+  issue ([github.com/thinkst/canarytokens/issues/36](https://github.com/thinkst/canarytokens/issues/36))
+  specifically documents that an attacker who extracts the canary value
+  and uses it without triggering the callback URL (e.g., by sending the
+  raw credential string to an external API over a non-canary channel) can
+  bypass the detection entirely. This is the exact gap that encoding-aware
+  content inspection would close.
+
+- **OpenCanary** ([github.com/thinkst/opencanary](https://github.com/thinkst/opencanary))
+  is Thinkst's self-hosted daemon that mimics network services (SSH,
+  FTP, Telnet, HTTP, SMB, etc.) and alerts when they are probed. It is
+  a network-layer honeypot, not an outbound content scanner. Detection
+  is interaction-based, not encoding-aware content matching.
+
+- **IndicatorOfCanary** by HackingLZ
+  ([github.com/HackingLZ/IndicatorOfCanary](https://github.com/HackingLZ/IndicatorOfCanary))
+  is conceptually the nearest to what is needed: it is a red-team tool
+  for *detecting the presence of canary tokens inside files* before using
+  those files, to avoid triggering callback alerts. It searches for known
+  canary IoCs (callback domain patterns) in file metadata and content.
+  It is the adversary-side mirror image — red team detecting canaries
+  before they can be tripped — but it shows the art of the possible for
+  encoding-aware document inspection.
+
+**The gap**: no canary service offers a "here is your secret; here are
+12 encoded forms of it; ingest these into your egress scanner" API.
+
+### Enterprise DLP
+
+Enterprise DLP products do perform encoding-aware content matching, but
+as an internal, closed-source capability:
+
+- **Symantec DLP (Broadcom)**
+  ([broadcom.com](https://www.broadcom.com/products/cybersecurity/information-protection/data-loss-prevention))
+  An official Broadcom knowledge base article explicitly states that
+  Symantec DLP is not able to inspect and alert on base64 and ROT13
+  encoded files in all inspection paths, citing processing overhead as
+  the reason
+  ([knowledge.broadcom.com/external/article/184415](https://knowledge.broadcom.com/external/article/184415/is-symantec-data-loss-prevention-dlp-abl.html)).
+  This is a documented limitation, not marketing copy.
+
+- **Microsoft Purview DLP**
+  ([learn.microsoft.com/en-us/purview/dlp-policy-reference](https://learn.microsoft.com/en-us/purview/dlp-policy-reference))
+  supports custom sensitive information types and trainable classifiers
+  but the encoding-awareness of its content matching engine is not
+  publicly documented at the rule-authoring level. No public API exists
+  for generating encoded variant patterns.
+
+- **Nightfall AI** ([nightfall.ai](https://www.nightfall.ai/))
+  uses deep-learning classifiers rather than regex, with 100+ AI-based
+  detectors. It offers a REST API that accepts arbitrary strings and files
+  and returns findings. Its encoding-awareness is model-dependent and
+  not configurable by the caller. No "user-supplied literal + encoding
+  sweep" mode is documented.
+
+- **Cyberhaven** ([cyberhaven.com](https://www.cyberhaven.com/))
+  is notable for its data-lineage approach: it tracks data transformations
+  (copy, compress, rename, convert) and ties exfiltration events to
+  original sensitive files even after transformation. This is a more
+  powerful model than pure byte matching but requires a full endpoint
+  agent and cloud backend. Not suitable for a container sidecar.
+
+The enterprise DLP space shows that encoding-aware detection exists at
+enterprise scale, with gaps such as the Symantec limitation above, but the
+implementations are either closed-source SaaS products, require endpoint
+agents, or are not configurable with user-supplied literals.
+
+### Pentest / red-team encoding generators
+
+Several red-team tools generate many encodings of a payload, treating
+encoding as a *generation* problem rather than a detection problem. They
+are directly useful for producing the encoding catalog needed to build
+tripwire patterns.
+
+- **hURL** ([github.com/fnord0/hURL](https://github.com/fnord0/hURL))
+  is a command-line encoder/decoder supporting URL encoding, double URL
+  encoding, base64, HTML entities, ASCII-to-hex, integer-to-hex, ROT13,
+  and SHA family hashes. It is packaged in Kali Linux (`apt install hurl`).
+  It does not produce an "all encodings of this string" output in one
+  command, since each encoding is a separate invocation flag, but the
+  encoding catalog it covers aligns well with the practical catalog above.
+
+- **CyberChef** ([gchq.github.io/CyberChef](https://gchq.github.io/CyberChef) /
+  [github.com/gchq/CyberChef](https://github.com/gchq/CyberChef))
+  is the GCHQ "cyber Swiss Army Knife," a browser-based tool with 400+
+  encoding/decoding/transformation operations. It can be scripted via
+  the `cyberchef-node` npm package
+  ([github.com/nicowillis/cyberchef-node](https://github.com/nicowillis/cyberchef-node))
+  to generate many encodings programmatically. The community recipe list
+  ([github.com/mattnotmax/cyberchef-recipes](https://github.com/mattnotmax/cyberchef-recipes))
+  is a good reference for the encoding chains real attackers use. CyberChef
+  is the best single reference for what an exhaustive encoding catalog
+  looks like in practice.
+
+- **Burp Suite Intruder** (PortSwigger, commercial with community edition)
+  has a payload processing rule chain in its Intruder module that can apply
+  sequences of encoding transformations (URL, HTML, base64, ASCII hex,
+  built-in strings) to a wordlist. Not scriptable outside Burp; primarily
+  useful for interactive enumeration during a pentest.
+
+- **wfuzz** ([github.com/xmendez/wfuzz](https://github.com/xmendez/wfuzz))
+  supports encoder plugins (base64, urlencode, md5, sha1, double-urlencode,
+  html, etc.) that can be chained with the `@` syntax in payload specs.
+  It is a brute-force fuzzer, not a pattern generator, but its encoder
+  catalog is a useful reference list.
+
+None of these tools emit "N regex patterns for detecting this secret in
+any of its encoded forms in an outbound stream." They are all generation
+tools for attacks, not detection tools for defense.
+
+---
+
+## YARA string modifiers — the closest existing answer
+
+YARA ([virustotal.github.io/yara-x](https://virustotal.github.io/yara-x) /
+[yara.readthedocs.io](https://yara.readthedocs.io/en/stable/writingrules.html))
+has the most complete existing treatment of "match this string in multiple
+encoding forms" via its text-string modifier system. This was designed for
+malware detection in binary files and network captures, but the same logic
+applies to outbound traffic inspection.
+
+### Available modifiers
+
+Four modifiers apply directly to the encoding problem:
+
+- **`ascii`**: match the string as raw ASCII/UTF-8 bytes. This is the default
+  when no modifiers are specified.
+- **`wide`**: match the string in UTF-16LE form (each ASCII byte interleaved
+  with a NUL byte). Designed for detecting strings in Windows PE binaries.
+- **`base64`**: generate all three base64 offset permutations of the string
+  and search for any of them. The three permutations arise because base64
+  encodes 3 bytes at a time; depending on where a string starts within the
+  3-byte group, its encoding takes one of three different forms.
+  YARA computes all three at compile time and emits patterns for each,
+  so the rule author does not need to pre-compute them.
+- **`base64wide`**: same as `base64`, but the resulting base64 text is then
+  converted to wide (UTF-16LE) form. This covers base64 output that was
+  itself stored as a wide string. It is not the base64 encoding of a
+  UTF-16LE secret; that case is written `wide base64`, because `ascii` and
+  `wide` are applied to the string before the base64 step.
+
+When `base64` or `base64wide` is present on a declaration, the plaintext
+`ascii` / `wide` forms of that string are not searched, so the plaintext
+forms need a declaration of their own. A rule that covers raw UTF-8, raw
+UTF-16LE, and the base64 encodings of both looks like:
+
+```yara
+rule secret_tripwire {
+  strings:
+    $plain = "my-secret-value" ascii wide
+    $b64 = "my-secret-value" ascii wide base64
+  condition:
+    any of them
+}
+```
+
+YARA will generate and search for eight forms from these two
+declarations: raw UTF-8, raw UTF-16LE, three base64 variants of the UTF-8
+form, and three base64 variants of the UTF-16LE form. Adding `base64wide`
+to `$b64` would additionally match those base64 strings stored as UTF-16LE.
+
+A fifth modifier, **`xor`** (added in YARA 3.8), searches for variants of the
+string obfuscated with a single-byte XOR key, trying every key by default;
+a range such as `xor(0x01-0xff)` narrows the key space. The `xor`
+modifier cannot be combined with `base64` or `base64wide` in a single string
+declaration (it causes a compiler error). To cover both XOR and base64, two
+separate string declarations are required.
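+
+For pipelines that do not use YARA, the single-byte XOR forms are cheap to
+enumerate directly. The following is a minimal sketch; the function names
+`xor_variants` and `find_xor_keys` are hypothetical, and key 0 is skipped
+because it reproduces the plaintext.
+
+```python
+SECRET = b"my-secret-value"
+
+
+def xor_variants(secret: bytes) -> dict[int, bytes]:
+    """Map each non-zero single-byte key to the XOR-obfuscated secret."""
+    return {key: bytes(b ^ key for b in secret) for key in range(1, 256)}
+
+
+def find_xor_keys(data: bytes, secret: bytes) -> list[int]:
+    """Return every key whose obfuscated form of the secret appears in data."""
+    return [key for key, form in xor_variants(secret).items() if form in data]
+
+
+# Example: a payload carrying the secret XORed with key 0x5a.
+payload = b"junk" + bytes(b ^ 0x5A for b in SECRET) + b"tail"
+print(find_xor_keys(payload, SECRET))
+```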
+
+**Custom base64 alphabets:** The `base64` and `base64wide` modifiers accept
+an optional 64-character custom alphabet string. This covers URL-safe
+base64 (`-_` substituted for `+/`) and any custom alphabets.
+
+### Limitations of YARA for this use case
+
+- YARA does not natively cover hex encoding, percent encoding, JSON string
+  escaping, gzip+base64, ROT13, or the other entries in the encoding catalog
+  above. Those would require pre-computing the encoded forms externally and
+  writing them as explicit hex-pattern strings in the rule.
+- YARA operates on files or byte buffers passed by the calling application;
+  it does not natively hook network streams. Integration with a proxy or
+  a log-scanning pipeline requires an application layer to call
+  `libyara` or the `yara-python` bindings on each captured request body.
+- YARA's `base64` modifier has a documented minimum-length constraint: strings
+  shorter than three characters cannot be base64-matched reliably due to
+  the offset permutation math. This is unlikely to matter for real secrets
+  but worth noting.
+
+### DissectMalware/base64_substring
+
+The tool `base64_substring`
+([github.com/DissectMalware/base64_substring](https://github.com/DissectMalware/base64_substring))
+generates a YARA rule to find base64-encoded files containing a specific
+keyword, by enumerating all three offset permutations and emitting them as
+a YARA rule. This predates YARA's built-in `base64` modifier and is largely
+superseded by it, but the repository is useful as a reference for the
+permutation math.
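+
+The permutation math can be sketched in a few lines of Python. The function
+name `base64_offset_variants` is hypothetical, and the trimming rule is this
+sketch's reading of the technique rather than code taken from YARA or
+`base64_substring`: characters whose bits are shared with neighbouring bytes
+are dropped, leaving only the part that is stable at each alignment.
+
+```python
+import base64
+
+
+def base64_offset_variants(secret: bytes) -> list[str]:
+    """Return the stable base64 substring of secret for alignments 0, 1, 2."""
+    variants = []
+    for offset in range(3):
+        # Pad with placeholder bytes to shift the secret within a 3-byte group.
+        encoded = base64.b64encode(b"\x00" * offset + secret).decode("ascii")
+        stable = encoded.rstrip("=")
+        # A partial final group ends in a character that also carries bits
+        # from whatever byte follows the secret, so drop it.
+        if (offset + len(secret)) % 3:
+            stable = stable[:-1]
+        # Leading characters that carry bits from the preceding bytes.
+        lead = (0, 2, 3)[offset]
+        variants.append(stable[lead:])
+    return variants
+
+
+for variant in base64_offset_variants(b"my-secret-value"):
+    print(variant)
+```
+
+Searching an outbound body for any of the three returned strings finds the
+secret wherever it sits inside a larger base64-encoded payload; very short
+secrets leave little or nothing after trimming.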
+
+---
+
+## DIY sketch
+
+There is no off-the-shelf tool that takes a known secret and emits N patterns
+for outbound stream matching. The remaining question is how much work it would
+be to write one.
+
+A minimal `tripwire-encode` script in Python (~80–120 lines) would:
+
+1. Accept a secret string on stdin or as a CLI argument.
+2. Emit one encoded form per line (or a JSON object mapping encoding names
+   to encoded values) for encodings 1–20 from the catalog above.
+3. The encoding logic for each form is 1–4 lines of Python using the
+   standard library (`base64`, `codecs`, `urllib.parse`, `gzip`, `io`);
+   no third-party dependencies are required.
+4. For the YARA output mode, emit a `.yar` rule with one string declaration
+   per encoding (or let YARA's `ascii`, `wide`, and `base64` modifiers cover
+   the raw and base64 forms, then add explicit hex-string patterns for the
+   remaining forms).
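+
+The encoding step above might look roughly like the sketch below. It covers
+only a subset of the catalog, the function name `encode_forms` and the
+dictionary labels are hypothetical, and the base64 entries encode the secret
+on its own, so a secret embedded in a larger base64 blob needs the offset
+variants as well.
+
+```python
+import base64
+import codecs
+import json
+
+
+def encode_forms(secret: str) -> dict[str, bytes]:
+    """Return a subset of the catalog's encoded forms, keyed by a label."""
+    raw = secret.encode("utf-8")
+    percent_all = "".join(f"%{b:02X}" for b in raw)
+    return {
+        "raw": raw,
+        "base64": base64.b64encode(raw),
+        "base64_nopad": base64.b64encode(raw).rstrip(b"="),
+        "base64_urlsafe": base64.urlsafe_b64encode(raw),
+        "base64_utf16le": base64.b64encode(secret.encode("utf-16-le")),
+        "base64_double": base64.b64encode(base64.b64encode(raw)),
+        "hex_lower": raw.hex().encode("ascii"),
+        "hex_upper": raw.hex().upper().encode("ascii"),
+        "percent_all": percent_all.encode("ascii"),
+        "percent_double": percent_all.replace("%", "%25").encode("ascii"),
+        # json.dumps adds surrounding quotes; strip them to get the escaped body.
+        "json_escaped": json.dumps(secret)[1:-1].encode("ascii"),
+        "utf16le": secret.encode("utf-16-le"),
+        "rot13": codecs.encode(secret, "rot_13").encode("utf-8"),
+        "reversed": secret[::-1].encode("utf-8"),
+        "base32": base64.b32encode(raw),
+    }
+
+
+for label, form in encode_forms("my-secret-value").items():
+    print(label, form)
+```
+
+Returning bytes rather than strings keeps the UTF-16LE form intact and lets
+the same dictionary feed a `bytes.find` loop without re-encoding.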
+
+A companion `tripwire-grep` script (~30–50 lines) would:
+1. Accept the secret (or the pre-computed encoding list) and a stream on stdin.
+2. Compile the encodings into a single `re.search` call or `bytes.find` loop.
+3. Exit non-zero and print the matching line/offset if any form is found.
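+
+A minimal sketch of that matcher follows. The names `build_forms`, `scan`,
+and `main` are hypothetical, and `build_forms` is only a small stand-in for
+the full pre-computed encoding list.
+
+```python
+import base64
+import sys
+
+
+def build_forms(secret: bytes) -> dict[str, bytes]:
+    """Stand-in for the pre-computed encoding list (labels are illustrative)."""
+    return {
+        "raw": secret,
+        "base64": base64.b64encode(secret).rstrip(b"="),
+        "hex": secret.hex().encode("ascii"),
+    }
+
+
+def scan(data: bytes, forms: dict[str, bytes]) -> list[tuple[str, int]]:
+    """Return (label, byte offset) for every encoded form found in data."""
+    hits = []
+    for label, needle in forms.items():
+        offset = data.find(needle)
+        if offset != -1:
+            hits.append((label, offset))
+    return hits
+
+
+def main(argv: list[str]) -> int:
+    """Scan stdin for the secret given as argv[1]; return 1 on any hit."""
+    forms = build_forms(argv[1].encode("utf-8"))
+    hits = scan(sys.stdin.buffer.read(), forms)
+    for label, offset in hits:
+        print(f"tripwire: {label} form at byte offset {offset}")
+    return 1 if hits else 0
+```
+
+Calling `sys.exit(main(sys.argv))` under a `__main__` guard would turn this
+into the CLI described above. Reading stdin as bytes avoids decode errors on
+binary bodies.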
+
+For a proxy integration, the same encoding set can be compiled once at
+container startup and injected into a mitmproxy addon or a small filter
+script that wraps the proxy log output.
+
+**Rough size estimate:** the full encoding catalog (25 entries) in Python
+requires roughly 80 lines of encoding logic, 30 lines of CLI scaffolding,
+and 40 lines of grep/match logic. Total under 200 lines.
+
+**Important caveat on ROT13 and leetspeak.** These encodings have
+non-trivial false-positive rates for short secrets and for secrets
+containing common English substrings. If the secret contains only
+alphanumerics, ROT13 and ROT47 patterns should be included but flagged
+at warning level rather than error level in the output. Leetspeak has no
+fixed mapping (catalog entry 21), so its variants cannot be enumerated
+exhaustively.
+
+**The honest limitation of this entire approach** is that it is effective
+only against naive, deterministic encodings where a single fixed plaintext
+always produces the same fixed encoded output. Any exfiltration method that
+involves:
+- A secret key (AES, ChaCha, XOR with a non-trivial key),
+- Compression with a novel dictionary,
+- Splitting the secret across multiple messages or packets,
+- Steganography (hiding data in image pixels, audio samples, whitespace),
+
+...will produce output with no detectable relationship to the input plaintext.
+For claude-bottle's threat model — an untrusted but not adversarially
+sophisticated agent that might accidentally or naively encode secrets before
+logging or posting them — naive-encoding detection covers the realistic concern.
+A deliberately adversarial agent with knowledge of its own monitoring is a
+different threat model entirely.
+
+---
+
+## Integration point in claude-bottle
+
+The natural choke point for encoding-aware content inspection in claude-bottle
+is the body of outbound HTTP/HTTPS requests, visible at the egress proxy layer
+(for HTTPS, only when the proxy terminates TLS rather than tunnelling it).
+The egress-guard note documents smokescreen (Stripe,
+[github.com/stripe/smokescreen](https://github.com/stripe/smokescreen))
+and squid as sidecar proxy options. Both can be configured to pass request
+bodies through a content-inspection hook. Smokescreen has an experimental
+MITM mode with detailed HTTP logging; squid supports `icap_service` ICAP
+integration for content inspection. Either could be extended with a small
+filter script that runs the encoded-form search against each captured body.
+
+**SSH is opaque to a proxy.** SSH connections carry encrypted payloads;
+the proxy cannot inspect their contents. Egress SSH (e.g., git push to a
+permitted remote) is outside the reach of this defense unless the container
+is configured to log git command arguments before they execute — which would
+catch a literal secret passed on the command line but not a secret embedded
+in a commit object or passed via an environment variable to a subprocess.
+For v1, this defense should be scoped explicitly to HTTP/HTTPS request bodies
+and to logged shell command output, and SSH egress should be called out as
+a gap.
+
+---
+
+## Limitations and bypass surface
+
+This defense catches secrets in transit only when the encoding is deterministic
+and pre-computable. The following techniques defeat it entirely:
+
+- **Symmetric encryption with a key** (AES-CBC, ChaCha20, XOR with a 16+
+  byte key). The ciphertext has no statistical relationship to the plaintext.
+  Detection requires key recovery, which is out of scope for a content scanner.
+
+- **Splitting across messages or packets.** If a secret is split as
+  `SECRET[0:4]` in one request and `SECRET[4:]` in another, no single-
+  message content scanner will fire. Stateful stream reassembly would be
+  required, which is a much harder problem.
+
+- **Compression with a novel or pre-shared dictionary** (LZ4, Brotli,
+  zstd). Standard gzip/zlib produce recognizable magic-byte prefixes in
+  base64, but a custom compression configuration can produce output that
+  is entropy-high and format-unrecognizable.
+
+- **Steganography.** Encoding data into image LSBs, audio samples,
+  whitespace-as-binary in a text file, or Unicode homoglyphs. The output
+  looks like a legitimate artifact; no regex will find it.
+
+- **Hash-chained exfiltration.** Sending the SHA-256 of secret fragments
+  in sequence; a receiver who knows the secret can verify, but no string
+  scanner can reconstruct the original.
+
+- **Protocol-level encoding not visible at the HTTP body layer.** DNS
+  exfiltration (encoding data in query subdomains), ICMP covert channels,
+  or TCP sequence number steganography all operate below HTTP and are not
+  intercepted by an HTTP proxy at all. The egress-guard note covers DNS
+  separately via a controlled resolver.
+
+In the claude-bottle context, the primary realistic concern is an agent
+that naively embeds a secret in a log line, a curl argument, a JSON body,
+or a shell heredoc without specifically intending to obfuscate. All of the
+above bypass techniques require deliberate, adversarially motivated engineering
+from a process that is assumed to be untrusted but not intentionally malicious.
+The encoding-aware tripwire is an appropriate control for this threat model,
+provided its scope is communicated clearly.