- _alnum_projection(): strip non-alphanumeric chars for separator-injection detection
- scan_known_secrets() gains two extra passes per secret after exact-variant matching:
alnum-projection exact match (catches hyphens/spaces between secret chars) and a
sliding-window partial-match scan (catches chunked substrings ≥ PARTIAL_MATCH_MIN_LEN)
- scan_known_secrets() accepts sensitive_prefixes param (default ("EGRESS_TOKEN_",))
so redact_tokens and call-sites can extend the scanned env-var prefix set
- scan_entropy() warn-only detector flagging windows with Shannon entropy ≥ 5.5 bits/char
- "entropy" added to OUTBOUND_DETECTOR_NAMES; scan_outbound opts it in only when
explicitly listed in dlp.outbound_detectors (never part of the default "all" set)
- scan_outbound reads BOT_BOTTLE_SENSITIVE_PREFIXES from environ to extend
scan_known_secrets beyond EGRESS_TOKEN_* without schema changes
- Binary bodies decoded via latin-1 fallback (bijective byte↔codepoint) instead
of utf-8 errors=replace, preserving ASCII secret strings in binary payloads
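
The two extra passes and the entropy measure can be illustrated with a minimal sketch. This is not the code from dlp_detectors: the function names are stand-ins, and the value of PARTIAL_MATCH_MIN_LEN is an assumption because the commit does not state it.

```python
import math
from collections import Counter

PARTIAL_MATCH_MIN_LEN = 8  # assumed value; the commit does not state it


def alnum_projection(text: str) -> str:
    """Drop every non-alphanumeric character, keeping the original order."""
    return "".join(ch for ch in text if ch.isalnum())


def matches_with_separators(secret: str, haystack: str) -> bool:
    """True if the secret appears once separators are stripped from both sides."""
    needle = alnum_projection(secret)
    return bool(needle) and needle in alnum_projection(haystack)


def has_partial_match(
    secret: str, haystack: str, min_len: int = PARTIAL_MATCH_MIN_LEN
) -> bool:
    """Slide a min_len window over the secret; report any chunk found verbatim."""
    if len(secret) < min_len:
        return False
    return any(
        secret[i : i + min_len] in haystack
        for i in range(len(secret) - min_len + 1)
    )


def shannon_entropy(window: str) -> float:
    """Shannon entropy of the window, in bits per character."""
    if not window:
        return 0.0
    total = len(window)
    counts = Counter(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Since entropy is bounded by log2 of the number of distinct symbols, a window needs at least 46 distinct characters to reach 5.5 bits/char, which is one reason a detector like this is plausibly kept warn-only and opt-in.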
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A 403 "egress DLP: URL-encoded CRLF (%0d%0a)" was firing on legitimate
requests (e.g. the Claude Code login flow) and bypassing the on-match
policy entirely, because CRLF blocks carry no matched value and were
routed straight to a hard 403.
Root cause: CRLF injection is only an attack in the request line and
headers. An HTTP body is delimited by Content-Length, so CRLF bytes in
the body cannot split the request — but the scan flattened the body into
the same blob it checked, so form-encoded / multi-line body content
(which legitimately contains %0d%0a) tripped it.
Fix:
- scan_outbound takes a crlf_text param; the addon scans CRLF only over
the body-excluded request line + headers. crlf_text=None keeps the
old full-blob behavior for host-side callers/tests; the websocket path
passes "" since a data frame is not a request line.
- The redact policy now also scrubs CRLF (new strip_crlf helper) from the
path and headers, so redact is a complete escape hatch and structural
CRLF in the URL/headers can be forwarded when a route opts into it.
Tests: strip_crlf unit tests; scan_outbound crlf_text body-exclusion and
backward-compat tests.
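
A minimal sketch of the crlf_text contract described above, for illustration only. It assumes the detector looks for the URL-encoded sequence named in the 403 message; has_crlf is a hypothetical stand-in for the CRLF part of scan_outbound, and this strip_crlf is a guess at the helper's behavior, not its actual body.

```python
import re
from typing import Optional

# Assumption: only the URL-encoded form from the 403 message is matched.
_ENCODED_CRLF = re.compile(r"%0d%0a", re.IGNORECASE)


def has_crlf(full_blob: str, crlf_text: Optional[str] = None) -> bool:
    """Scan crlf_text when given, otherwise fall back to the full blob.

    crlf_text=None -> old behavior: scan everything, body included
    crlf_text=""   -> nothing to scan (e.g. a websocket data frame)
    """
    target = full_blob if crlf_text is None else crlf_text
    return _ENCODED_CRLF.search(target) is not None


def strip_crlf(value: str) -> str:
    """Remove encoded CRLF, repeating until removal cannot create a new match."""
    previous = None
    while previous != value:
        previous = value
        value = _ENCODED_CRLF.sub("", value)
    return value
```

For a form-encoded login body such as `a=1%0D%0Ab=2`, passing only the request line and headers as crlf_text keeps the request from tripping the detector, while the same sequence in the path is still caught.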
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HnvBjPZC5V7qeQpFbQdDmS
Give each egress route a policy for what the proxy does when an outbound
DLP detector matches a token, defaulting to the supervise flow added in
the previous commit. The goal is cutting false-positive friction without
weakening default-deny.
- redact: scrub the matched value(s) from the body, non-host headers, and
path/query via redact_tokens, then re-scan. Forward if clean; fail
closed with a 403 if a match remains on a surface redaction can't
rewrite (the hostname, or a unicode-evasion token). Intended for routes
where a token-shaped value is noise the upstream doesn't need.
- block: the original hard 403, never overridable.
- supervise (default, unset): hold the request for operator approval.
Structural blocks (CRLF, no safelist-able value) stay hard 403s under
every policy.
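
The policy rules above can be read as a small decision table. The sketch below is illustrative only: Match and decide are hypothetical names, and the returned strings are placeholders for whatever the addon actually does at each outcome.

```python
from dataclasses import dataclass
from typing import Optional

VALID_POLICIES = ("redact", "block", "supervise")


@dataclass
class Match:
    reason: str
    matched: Optional[str]  # None for structural blocks such as CRLF


def decide(policy: Optional[str], match: Match, clean_after_redact: bool) -> str:
    """Return 'forward', 'hold', or '403' for a detector match."""
    policy = policy or "supervise"  # an unset key means supervise
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown outbound_on_match policy: {policy!r}")
    if match.matched is None:
        return "403"  # structural block: nothing to redact or safelist
    if policy == "block":
        return "403"
    if policy == "redact":
        # Forward only if the re-scan after redaction came back clean.
        return "forward" if clean_after_redact else "403"
    return "hold"  # supervise: wait for operator approval
```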
Threads outbound_on_match from the bottle manifest (manifest_egress)
through the resolved EgressRoute and rendered routes.yaml (egress.py) to
the addon's Route (egress_addon_core), and round-trips it via the
list-egress-routes introspection endpoint. The allow/egress-block tool
descriptions document the new key.
Tests: manifest parse/validation, core parse/validation, full
manifest->render->addon round-trip for redact. README + PRD 0062 updated.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HnvBjPZC5V7qeQpFbQdDmS
When the outbound DLP catches a token, route the block through the
existing supervisor approval queue instead of returning 403 outright.
The egress proxy holds the request open until the operator answers, then
remembers an approved value for the life of the proxy so the request --
and later ones carrying it -- flow through. Fails closed on rejection,
timeout, malformed response, or when supervise is disabled.
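
A minimal sketch of the hold-and-poll behavior, assuming a hypothetical read_decision callable that returns None while the proposal is pending. The decision strings and the poll interval are illustrative, not the queue's real format.

```python
import asyncio
import time
from typing import Callable

EGRESS_TOKEN_ALLOW_TIMEOUT_SECONDS = 300.0


async def wait_for_decision(
    read_decision: Callable[[], object],
    timeout: float = EGRESS_TOKEN_ALLOW_TIMEOUT_SECONDS,
    poll_interval: float = 0.5,
) -> bool:
    """Return True only on an explicit approval; everything else fails closed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        decision = read_decision()
        if decision == "approved":
            return True
        if decision is not None:
            # Rejection or a malformed response: do not keep waiting.
            return False
        # Yield to the event loop so other flows keep moving while we wait.
        await asyncio.sleep(poll_interval)
    return False  # timed out
```

Using asyncio.sleep rather than a blocking sleep is what lets one held request wait without stalling unrelated connections.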
- ScanResult.matched carries the raw matched substring (sidecar-only;
never logged or written to the proposal). scan_outbound and the token
detectors take a safe_tokens set and skip approved values, continuing
past a safelisted match so a second secret in the same request is
still caught.
- New egress-token-allow proposal tool, written directly to the queue by
the addon (the gitleaks-allow pattern from PRD 0061).
build_token_allow_payload renders host/method/path/detector reason +
redacted context.
- Async request hook polls the queue without stalling the proxy event
loop; EGRESS_TOKEN_ALLOW_TIMEOUT_SECONDS (default 300) bounds the wait.
- Supervisor TUI renders egress-token-allow like gitleaks-allow: report
only, modify unavailable, approval requires a recorded reason.
- Unit tests for the matched/safe-tokens plumbing, payload builder, tool
constant round-trip, and TUI paths; README + PRD 0062.
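
The safe_tokens behavior (skip an approved value but keep scanning) can be sketched as follows. TOKEN_RE is a made-up token shape for illustration, and first_unapproved_token is a hypothetical stand-in for the real detectors.

```python
import re
from typing import AbstractSet, Optional

# Hypothetical token shape, for illustration only.
TOKEN_RE = re.compile(r"tok_[A-Za-z0-9]{8,}")


def first_unapproved_token(
    text: str, safe_tokens: AbstractSet[str] = frozenset()
) -> Optional[str]:
    """Return the first token not in safe_tokens, or None if all are approved."""
    for m in TOKEN_RE.finditer(text):
        if m.group(0) in safe_tokens:
            continue  # approved earlier; keep scanning for a second secret
        return m.group(0)
    return None
```

Returning on the first safelisted hit instead of continuing would let a second, unapproved secret ride along with an approved one.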
Closes #261.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HnvBjPZC5V7qeQpFbQdDmS
- Set http_proxy/https_proxy (lowercase) alongside uppercase variants in smolmachines guest env for tools that only check lowercase
- Replace dataclasses.asdict with route_to_yaml_dict in /allowlist introspection so returned routes use YAML-schema-compatible keys
- Expand routes_yaml tool description in supervise_server to document all accepted route keys, making the round-trip from list-egress-routes to propose/apply explicit
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each DLP block/warn now reports where the match was found (body,
authorization header, response body) and includes a context snippet:
SNIPPET_CONTEXT chars before and after the match, with the matched
value replaced by REDACT ("********").
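
A minimal sketch of how such a snippet could be built. REDACT matches the value quoted above; the SNIPPET_CONTEXT value of 20 is an assumption, and context_snippet is a hypothetical helper name.

```python
SNIPPET_CONTEXT = 20  # assumed width; the commit does not state the value
REDACT = "********"


def context_snippet(
    text: str, start: int, end: int, width: int = SNIPPET_CONTEXT
) -> str:
    """Return `width` chars either side of text[start:end], with the match masked."""
    before = text[max(0, start - width) : start]
    after = text[end : end + width]
    return before + REDACT + after
```

For example, with `width=7` a match on `sk-secret-value` inside `Authorization: Bearer sk-secret-value; trace=1` yields `Bearer ********; trace`, so the report shows where the match sat without echoing it.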
scan_token_patterns/scan_known_secrets/scan_naive_injection all gain
`location` and `context` fields on their ScanResult returns. The
outbound scanner takes `auth_header` as a separate kwarg so the two
locations are scanned and reported independently.
redact_tokens() is added to dlp_detectors and used in egress_addon.py
to scrub token patterns and provisioned secrets from host/path fields
before they appear in any log output (level 1 and 2).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Level 0 (off, default): no stderr output beyond boot line.
Level 1 (blocks): each block/warn emitted as JSON with reason and
request context (host, method, path, response_status for inbound).
Level 2 (full): level-1 events + egress_request and egress_response
JSON lines for every forwarded connection.
Block logging at level 1+ replaces the previous plain-text stderr write.
DLP warn logging is also gated on level 1+. All block call sites now pass
_req_ctx(flow) so the blocked request is visible in the log entry.
Boot message shows log level label (off/blocks/full).
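
The level gating can be sketched roughly as below. The event names "block" and "warn" and the log_event helper are assumptions for illustration; PRD 0053 is the reference for the real event shapes.

```python
import json
import sys
from typing import Any, Dict, TextIO

LOG_OFF, LOG_BLOCKS, LOG_FULL = 0, 1, 2
LEVEL_LABELS = {LOG_OFF: "off", LOG_BLOCKS: "blocks", LOG_FULL: "full"}


def log_event(
    level: int, event: str, fields: Dict[str, Any], out: TextIO = sys.stderr
) -> bool:
    """Write one JSON line if the configured level permits; report whether it did."""
    # Blocks and warns need level 1; per-connection events need level 2.
    required = LOG_BLOCKS if event in ("block", "warn") else LOG_FULL
    if level < required:
        return False
    out.write(json.dumps({"event": event, **fields}) + "\n")
    return True
```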
Adds PRD 0053 documenting wire format, manifest format, and all log event
shapes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a top-level `log: true` option to the egress config that logs the
full request (method, path, headers, body) and response (status, headers,
body) for every forwarded connection as JSON lines on stderr.
Wire format: `log: true` at the root of routes.yaml, parsed into the new
`Config` dataclass alongside `routes`. The sidecar addon switches from
`self.routes` to `self.config` and writes `_log_request` / `_log_response`
JSON lines when `self.config.log` is set.
Manifest: `egress.log: true` in bottle YAML flows through `EgressConfig.Log`
→ `Egress.prepare()` → `egress_render_routes(..., log=)` → routes.yaml.
`EgressPlan` also carries the flag for introspection.
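
A rough sketch of the wire-format parse, assuming routes.yaml has already been loaded into a mapping. Only the `Config` name and its `routes`/`log` fields come from the commit; `parse_config` and the validation rules are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping


@dataclass
class Config:
    routes: List[Dict[str, Any]] = field(default_factory=list)
    log: bool = False


def parse_config(doc: Mapping[str, Any]) -> Config:
    """Build a Config from the already-loaded routes.yaml mapping."""
    log = doc.get("log", False)
    if not isinstance(log, bool):
        raise ValueError("log must be a boolean")
    routes = doc.get("routes", [])
    if not isinstance(routes, list):
        raise ValueError("routes must be a list")
    return Config(routes=list(routes), log=log)
```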
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Strip pipelock from all unit and integration test fixtures:
proxy_plan fields removed from DockerBottlePlan/SmolmachinesBottlePlan
constructors; pipelock-specific test classes deleted or renamed
- Update test_sidecar_init: remove test_pipelock_loses_egress_tokens,
rename "pipelock" daemon fixtures to "git-gate" throughout
- Remove test_pipelock_binary_present_and_versioned from integration test
- Remove test_pipelock_answers_on_bundle_ip from smolmachines launch test
- Update _SANDBOX_BLOCK_MARKERS: remove "pipelock" marker (egress blocks)
- Dockerfile.sidecars: remove pipelock build stage and COPY; update layout
comments and port table
- egress_entrypoint.sh: update comments now that egress is sole proxy
- Clean up pipelock references in comments/docstrings across backend,
network, manifest, supervise, git_gate, yaml_subset, agent_provider,
sidecar_bundle, sidecar_init, egress_addon_core modules
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove 35+ unused imports across 20+ files (W0611). Wrap 19 lines
to fit under the 100-character limit (C0301). Add type casts and
annotations in egress_addon_core.py to resolve pyright errors
caused by JSON parsing of untyped objects.
Key changes:
- Remove unused imports (abstractmethod, mock utilities, etc.)
- Split long lines at logical breaks (method calls, error messages)
- Add typing.cast() for proper type inference in JSON parsing
- Explicit type annotations for dict/list accesses
Results:
- Pylint rating: 8.73/10
- egress_addon_core.py: 0 pyright errors (was 15)
- All W0611 and C0301 issues fixed
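
As an illustration of the typing.cast pattern for untyped JSON (not the actual egress_addon_core.py code; route_hosts and the document shape are hypothetical):

```python
import json
from typing import Any, Dict, List, cast


def route_hosts(raw: str) -> List[str]:
    """Extract string "host" values from a JSON document of routes."""
    doc = json.loads(raw)  # the type checker sees this as Any
    if not isinstance(doc, dict):
        raise ValueError("expected a JSON object")
    # cast() has no runtime effect; it only tells the checker what the
    # isinstance check above established.
    typed = cast(Dict[str, Any], doc)
    routes = typed.get("routes", [])
    if not isinstance(routes, list):
        raise ValueError("routes must be a list")
    hosts: List[str] = []
    for item in cast(List[Any], routes):
        if isinstance(item, dict):
            host = cast(Dict[str, Any], item).get("host")
            if isinstance(host, str):
                hosts.append(host)
    return hosts
```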
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>