feat(egress-proxy): retarget remediation flow (PRD 0017 chunk 3) #30
Summary
Final chunk of PRD 0017. The `cred-proxy-block` MCP tool is renamed and its remediation apply path is retargeted at egress-proxy. Includes a follow-up commit that reinstates a single-value `role` marker on egress-proxy routes so the agent's placeholder OAuth env trigger isn't locked to a specific `token_ref` string.

Net +174 LOC, 364 unit + integration tests pass.
What changes for users
- The MCP tool is now `egress-proxy-block` (was `cred-proxy-block`). The tool description points at `/etc/claude-bottle/current-config/routes.yaml` as the current state to compose against; the agent passes the full new `routes.yaml` content as JSON.
- Audit entries are recorded under the `egress-proxy` component (was `cred-proxy`).
- `routes edit <bottle>` in the dashboard now writes a file with a `.yaml` extension and discovers running egress-proxy sidecars.
- Bottles that want the `CLAUDE_CODE_OAUTH_TOKEN` placeholder set (so claude-code starts) declare `role: claude_code_oauth` on the route that injects the Anthropic OAuth header. The host env-var name is the operator's choice; the role tag is what triggers the placeholder and telemetry-off envs.

Apply flow
The addon's request-hook SIGHUP handler (chunk 1) swaps the route table atomically without dropping in-flight connections.
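The swap described above can be sketched as follows. This is a minimal illustration, not the addon's actual code: `RouteTable`, `install_sighup_handler`, and the `loader` callable are hypothetical names. The idea is that a reload builds a complete new table first and then rebinds a single attribute, so a request that already took a snapshot keeps working against the old table, and a failed reload leaves the old table in place.

```python
import signal
from typing import Callable, Iterable, Tuple


class RouteTable:
    """Holds an immutable snapshot of routes (hypothetical helper)."""

    def __init__(self, loader: Callable[[], Iterable[dict]]) -> None:
        self._loader = loader
        self._routes: Tuple[dict, ...] = tuple(loader())

    def snapshot(self) -> Tuple[dict, ...]:
        # A single attribute read: callers see either the old table or the
        # new one, never a partially updated table.
        return self._routes

    def reload(self, signum=None, frame=None) -> bool:
        """Rebuild the table; keep the old one if loading fails."""
        try:
            new_routes = tuple(self._loader())
        except Exception:
            return False  # old table stays in memory
        self._routes = new_routes  # one rebinding, no in-place mutation
        return True


def install_sighup_handler(table: RouteTable) -> None:
    # POSIX only; signal handlers must be installed from the main thread.
    signal.signal(signal.SIGHUP, table.reload)
```

A request handler would call `table.snapshot()` once at the start of a request and match against that tuple for the rest of the request.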
Role field
Reinstating a minimal role marker on `EgressProxyRoute`:

- `EGRESS_PROXY_ROLES = frozenset({"claude_code_oauth"})`: one marker for now; the field is back so the role enum can grow.
- `EGRESS_PROXY_SINGLETON_ROLES`: `claude_code_oauth` is a singleton (only one route per bottle can carry it).
- `roles: tuple[str, ...]` field on `EgressProxyRoute` (manifest + runtime), parsed as string or list-of-strings; unknown roles are rejected so typos can't become silent no-ops.

`prepare.py:has_anthropic_auth` checks for `"claude_code_oauth" in r.roles` instead of matching a literal token_ref string. Bottles can name their host OAuth env var anything (e.g. `CLAUDE_BOTTLE_OAUTH_TOKEN`); the role marker is what flips on `CLAUDE_CODE_OAUTH_TOKEN=<placeholder>` and the telemetry-off env vars on the agent.

Code-level
- `supervise.py`: `TOOL_CRED_PROXY_BLOCK` → `TOOL_EGRESS_PROXY_BLOCK`; `COMPONENT_FOR_TOOL` rewired.
- `supervise_server.py`: tool definition + description rewritten for egress-proxy semantics. `validate_proposed_file` dispatches on the new tool ID.
- `backend/docker/egress_proxy_apply.py`: renamed from `cred_proxy_apply.py`. Validation goes through `egress_proxy_addon_core.load_routes` so both sides agree on shape (catches partial auth pairs etc. before SIGHUP).
- `cli/dashboard.py`: wires the new apply + `discover_egress_proxy_slugs`; operator-edit flow writes `.yaml`. Removed stale follow-up comment about path-aware filtering (PRD 0017 settled it).
- `manifest.py`: `EGRESS_PROXY_ROLES` constant, `roles` field on `EgressProxyRoute`, singleton-role validation.
- `egress_proxy.py`: `roles` propagated onto the runtime `EgressProxyRoute`.
- `backend/docker/prepare.py`: anthropic placeholder detection switched from token_ref string match to `"claude_code_oauth" in r.roles`.
- `tests/integration/test_supervise_sidecar.py`: restores the round-trip approval test that chunk 2 had switched to reject. Stubs `apply_routes_change` so the test focuses on supervise plumbing, not docker-exec.
- `tests/unit/test_egress_proxy_apply.py`: rewritten validator tests; covers JSON shape, missing keys, partial auth.
- `tests/unit/test_manifest_egress_proxy.py`: 7 new role-validation tests.

PRD annotations
- `docs/prds/0010-cred-proxy.md`: Status: Superseded by PRD 0017. Historical text preserved; header callout points at the migration section.
- `docs/prds/0014-cred-proxy-block-remediation.md`: Status: Retargeted by PRD 0017. Same callout explaining the tool rename + apply-path move + audit-component change.

Validated locally
`python3 -m unittest discover -s tests -t .` → 364 unit + integration pass (1 environment-dependent skip).

Finishes PRD 0017. The `cred-proxy-block` MCP tool is renamed and its remediation apply path is repointed at egress-proxy.

- `claude_bottle/supervise.py`: `TOOL_CRED_PROXY_BLOCK` → `TOOL_EGRESS_PROXY_BLOCK`; `COMPONENT_FOR_TOOL` maps the new tool ID to `egress-proxy` for audit-log routing.
- `claude_bottle/supervise_server.py`: tool definition renamed + description rewritten: "Call when egress-proxy refused your HTTPS request ... Read the current routes.yaml from /etc/claude-bottle/current-config/routes.yaml, compose a modified version, pass the full new file plus a justification." The syntactic validator dispatches on the new tool ID.
- `claude_bottle/backend/docker/egress_proxy_apply.py`: renamed from `cred_proxy_apply.py`. Reads routes.yaml from /etc/egress-proxy/routes.yaml via `docker exec cat`; validates via `egress_proxy_addon_core.load_routes` (so both sides use the same parser); writes via `docker cp`; SIGHUPs egress-proxy with `docker kill --signal HUP`. `EgressProxyApplyError` replaces `CredProxyApplyError`.
- `claude_bottle/cli/dashboard.py`: wires the new apply + `discover_egress_proxy_slugs` helper; the operator-initiated `routes edit <bottle>` verb now writes to egress-proxy with `.yaml` suffix. Stale follow-up comment about path-aware filtering removed (PRD 0017 settled that question).
- `tests/integration/test_supervise_sidecar.py`: restores the approval round-trip test (chunk 2 had switched it to a reject path because no cred-proxy existed). Approval stubs `apply_routes_change` so the test focuses on the supervise queue/response plumbing rather than docker-exec into a real egress-proxy sidecar (that's covered separately).
- `tests/unit/test_egress_proxy_apply.py`: rewritten against the new validator; covers JSON shape, missing routes key, partial-auth-pair rejection (the addon-core parser catches these before SIGHUP).
- PRDs 0010 + 0014: status headers updated to Superseded / Retargeted with a callout block pointing at PRD 0017's migration section. Historical text preserved.

384 unit + integration tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The chunk 2 detection keyed on `token_ref == "CLAUDE_CODE_OAUTH_TOKEN"`, which broke any bottle whose host env var has a different name (e.g. `CLAUDE_BOTTLE_OAUTH_TOKEN`). The token_ref is the user's choice, so the placeholder-env trigger shouldn't be locked to one specific string.

Restoring a minimal `role` marker on `EgressProxyRoute`:

- `EGRESS_PROXY_ROLES = frozenset({"claude_code_oauth"})`: one marker for now; the field is back so we can grow it.
- `EGRESS_PROXY_SINGLETON_ROLES`: `claude_code_oauth` is a singleton (only one route per bottle can carry it).
- `roles: tuple[str, ...]` field on `EgressProxyRoute` (manifest + runtime), parsed as string or list-of-strings; unknown roles are rejected so typos can't become silent no-ops.

`prepare.py:has_anthropic_auth` now checks for `"claude_code_oauth" in r.roles` instead of matching a literal token_ref string. Bottles can name their host OAuth env var anything; the role marker is what flips on `CLAUDE_CODE_OAUTH_TOKEN=<placeholder>` and the telemetry-off env vars on the agent.

Test coverage: 7 new manifest tests (omitted / string / list / unknown role rejected / non-string rejected / list-item non-string rejected / singleton enforced). 364 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

mitmdump crashed at boot with PermissionError on `~/.mitmproxy/mitmproxy-ca.pem`. Cause: `docker cp` preserves the host file's mode AND uid. The CA files were 0600 owned by the host user (uid 501 on macOS), so inside the container the mitmproxy user (uid 1000, set by the USER directive in the Dockerfile) couldn't read them.

Fix:

- `egress_proxy_tls_init`: chmod 644 the cert-only + the cert+key concat on the host stage dir.
- `DockerEgressProxy.start`: chmod 644 routes.yaml and the pipelock CA before `docker cp` into the egress-proxy container (pipelock itself runs as root so its in-pipelock copy is unaffected).

The host stage_dir is mode 700, so other host users still can't traverse in and the cert+key concat isn't actually exposed despite the 644 mode. The container side gets world-readable, which is fine inside the per-bottle container.

Reproduces against today's main: the bottle's egress-proxy sidecar crashes with PermissionError; after this patch mitmdump boots and listens on :9099.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

`--set ssl_verify_upstream_trusted_ca` REPLACES mitmproxy's default trust store with the file we point it at. The earlier wiring pointed it at just pipelock's CA, which broke for any host pipelock passes through (api.anthropic.com is in DEFAULT_TLS_PASSTHROUGH): pipelock CONNECT-tunnels the handshake to the real upstream, egress-proxy sees the real public cert (signed by e.g. DigiCert), and refuses to validate because pipelock's CA doesn't sign it.

Fix in Dockerfile entrypoint: when EGRESS_PROXY_UPSTREAM_CA is set, concatenate /etc/ssl/certs/ca-certificates.crt + the pipelock CA into /home/mitmproxy/.mitmproxy/combined-trust.pem, and pass that as ssl_verify_upstream_trusted_ca. Covers both legs:

- pipelock-MITM'd hosts → leaf cert signed by pipelock CA → validates against the pipelock half of the bundle.
- pipelock-passthrough hosts (api.anthropic.com et al.) → real upstream cert → validates against the system half.

Standalone runs of the image (no EGRESS_PROXY_UPSTREAM_CA) skip the concat and use mitmproxy's default trust store.

Reproduces against today's main: agent gets "Unable to connect to API: SSL certificate verification failed" on api.anthropic.com, egress-proxy logs "Server TLS handshake failed. Certificate verify failed: unable to get local issuer certificate".
After this patch the trust bundle includes the real upstream root + pipelock's CA and both validation paths succeed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

CVE-2016-5388 ("httpoxy") mitigation: libcurl ignores uppercase `HTTP_PROXY` for http:// URLs to prevent untrusted CGI HTTP_* headers from hijacking the proxy. Only lowercase `http_proxy` is honored for HTTP.

Without the lowercase var, plain-HTTP requests from the agent skip egress-proxy entirely. They go direct, which is "network unreachable" on the agent's --internal bridge, not the egress-proxy 403 we expect. Confirmed against a live bottle: `curl http://1.1.1.1/` reported "Immediate connect fail for 1.1.1.1: Network is unreachable" instead of the addon's "host not in allowlist" 403.

With both cases set the agent's curl honors the proxy and our allowlist enforcement kicks in. Also set lowercase `https_proxy` + `no_proxy` for symmetry. Some tools check one case only; sending both means we don't have to audit which convention each tool uses.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

`apply_routes_change` wrote the proposed routes via `tempfile.mkstemp` (default mode 0600) then `docker cp`'d the file into the egress-proxy container. `docker cp` preserves mode + host uid, so the file landed inside the container as 0600 owned by the host user's uid, which is not the mitmproxy user (uid 1000) the addon runs as. The SIGHUP-triggered reload then failed with PermissionError on the re-read, the old routes table stayed in memory, and the operator-approved route never took effect.

Symptoms reported:

- Operator approves an egress-proxy-block proposal that adds google.com to routes.
- Agent retries `curl https://google.com` and still gets 403 "egress-proxy: host 'google.com' is not in the bottle's egress_proxy.routes allowlist."
- `docker exec <egress-proxy> cat /etc/egress-proxy/routes.yaml` returns "Permission denied" (mitmproxy user can't read it, so the reload couldn't either).
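The mode mismatch behind these symptoms comes from the temp file's default permissions. Below is a minimal sketch, assuming a POSIX host, of staging a routes file so that it is world-readable before being copied elsewhere. `stage_routes_file` is a hypothetical name, not the project's helper, and the copy step itself is left out.

```python
import os
import tempfile


def stage_routes_file(content: str) -> str:
    """Write content to a temp file and make it readable by any uid."""
    # mkstemp creates the file with mode 0600 (owner read/write only).
    fd, path = tempfile.mkstemp(suffix=".yaml")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(content)
    # Widen to 0644 so a different uid on the receiving side can read it
    # once a mode-preserving copy has been made.
    os.chmod(path, 0o644)
    return path
```

The caller is responsible for removing the returned path after the copy. This only makes sense when the content carries no secrets.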
Fix: chmod 0644 on the host tmp file before `docker cp`. Mirrors the same pattern in `DockerEgressProxy.start`, which already chmods the original routes.yaml + the CAs before cp. The proposed routes content carries no secrets (tokens live in the egress-proxy container's environ, not the routes file), so 0644 in /tmp for the brief window between write and cp is safe.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

When the operator approves an egress-proxy-block proposal that adds a host to egress-proxy's routes, the request would still 403 downstream at pipelock: pipelock's hostname allowlist is set at bottle launch and doesn't learn about routes added later. The agent saw "Approved" but the very next retry still failed.

Fix: `apply_routes_change` now mirrors every host in the proposed routes onto pipelock's allowlist before flipping egress-proxy. Order matters: pipelock goes first so a pipelock failure doesn't leave egress-proxy in a half-state:

1. Validate the new routes content.
2. Extract the hosts.
3. Merge them onto pipelock's current allowlist (`apply_allowlist_change`, which restarts pipelock with the merged yaml). No-op when every host is already present.
4. `docker cp` the new routes.yaml into egress-proxy + SIGHUP.

If pipelock's restart fails, egress-proxy is untouched and the operator gets a clear error pointing at the pipelock half-state. If egress-proxy's update fails after pipelock succeeded, pipelock just has the host pre-allowlisted, which is harmlessly extra-permissive until the operator retries.

Adds a `_hosts_in_routes` helper using the addon's own parser (so the mirrored host set matches exactly what the addon will match on). 4 new unit tests; 368 total pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Reshape the allowlist topology so the egress-proxy is the bottle's single allowlist surface, and replace the agent-side routes/allowlist file mounts with a live MCP tool.
Policy change (move defaults to egress-proxy):

- `egress_proxy_routes_for_bottle(bottle)` now folds in DEFAULT_ALLOWLIST (the claude-code defaults) and `bottle.egress.allowlist` (user adds) as bare-pass routes (no auth, no path filter), on top of the bottle's `egress_proxy.routes`. Manifest routes win on host collision.
- `pipelock_effective_allowlist(bottle)` mirrors egress-proxy's effective host set when egress-proxy is in use. Pipelock is no longer the bottle's primary allowlist authority; it enforces a downstream copy as defense-in-depth + does DLP body scanning.
- Split out `egress_proxy_manifest_routes(bottle)` for callers that want just the manifest entries (tests, internal use).
- DEFAULT_ALLOWLIST moves from `pipelock.py` to `egress_proxy.py` (pipelock re-imports for the no-egress-proxy fallback path).
- Dropped the `egress-proxy` auto-allow on pipelock's allowlist: the agent never dials egress-proxy via the proxy mechanism; pipelock only sees upstream hostnames from egress-proxy's CONNECTs.

Introspection endpoint (existing mitmproxy feature):

- Egress-proxy addon recognises requests to the magic host `_egress-proxy.local` and synthesizes responses via `flow.response = http.Response.make(...)`: no upstream connection, no allowlist enforcement on the magic host.
- `GET /allowlist` returns the in-memory route table as JSON (host + path_allowlist + auth_scheme + token_env per route; no token VALUES).
- Smoke-tested end-to-end against a real egress-proxy container.

MCP tool (existing supervise plumbing):

- New `list-egress-proxy-routes` tool (no inputs, no operator approval). Handler fetches via egress-proxy's introspection endpoint using urllib's ProxyHandler against `EGRESS_PROXY_FORWARD_PROXY`. Returns the JSON payload as the tool's text content; `isError: true` if the proxy is unreachable.
- `egress-proxy-block` description now points the agent at `list-egress-proxy-routes` instead of a staged file path.
- `pipelock-block` description acknowledges the mirror: agents should prefer `egress-proxy-block` to add hosts; pipelock-block stays for the rare divergence case.

Drop agent-side file mounts:

- Supervise's `current-config` dir staging no longer writes routes.yaml / allowlist. Only `Dockerfile` remains (capability-block still reads it from `/etc/claude-bottle/current-config/Dockerfile`).
- `prepare.py` stops passing `routes_content` / `allowlist_content` to `supervise.prepare`.
- `Supervise.prepare` signature simplified to one `dockerfile_content` kwarg.

Tests: 400 unit + integration pass. Added coverage for defaults-folding (`TestRoutesForBottleFoldsDefaults`), the new tool definition + handler, and the updated supervise.prepare shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Instead of asking the agent to compose and submit a full routes file, the tool now takes ONE proposed route (host + optional path_allowlist + optional auth) and the supervisor merges it into the live routes table at approval time. The agent no longer needs to fetch / reproduce / extend the existing allowlist; it just describes the host it wants reachable.

Tool input (new):

- `host` (required)
- `path_allowlist` (optional, array of absolute path prefixes)
- `auth` (optional, {scheme, token_ref})
- `justification` (required)

Merge semantics (in `egress_proxy_apply._merge_single_route`):

- Host NOT in current routes → append the proposed route as a new entry. If `auth` is set, assign the next EGRESS_PROXY_TOKEN_N slot.
- Host already present → union the proposed `path_allowlist` with the existing one (proposed entries appended after existing, deduped). Existing `auth_scheme` / `token_env` preserved; proposed `auth` ignored (operator-controlled, not agent-controlled).
- Hostname comparison is case-insensitive.

Dashboard wiring: `approve()` on an egress-proxy-block proposal now calls `add_route(slug, proposed_route_json)` instead of `apply_routes_change(slug, full_file)`.
`add_route` fetches the current routes from the running egress-proxy, merges, and calls `apply_routes_change` with the merged content, so the pipelock-mirror + SIGHUP plumbing from chunk 3 still runs end-to-end. Audit diff still captures the full-file before/after.

Tool description rewritten to make the new shape obvious and to stop pointing the agent at the routes file. The `list-egress-proxy-routes` tool stays available for agents that want to see what's currently allowed.

Tests: 9 new `_merge_single_route` cases (host absent/present, path-allowlist union+dedup, auth-slot indexing, case-insensitive match, existing-auth preservation, missing-host rejection, malformed-current rejection). 407 unit + integration pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

`_mirror_hosts_to_pipelock` runs BEFORE the egress-proxy write in `apply_routes_change`, so if it raises, egress-proxy is left intact. The error message claimed the opposite ("egress-proxy routes updated but pipelock allowlist mirror failed"), pointing the operator at the wrong half-state.

Reword to make the actual state clear: pipelock failed, egress-proxy NOT updated, fix pipelock manually with `pipelock edit <bottle>` then retry.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Previous fix stripped wildcard hosts entirely from the pipelock mirror; the operator wanted the suffix kept so pipelock pins the base hostname. Now `*.example.com` becomes `example.com` in the mirror: egress-proxy keeps the wildcard for its own host match, pipelock allows the suffix.

Behavior change:

- `*.example.com` → `example.com` (was: dropped)
- `*.foo.bar.com` → `foo.bar.com` (one `*.` strip, not recursive)
- `*` → dropped (normalises to empty)
- `example.com` → `example.com` (unchanged)
- `[::1]`, etc. → dropped (still off pipelock's charset after any prefix strip)

Adds explicit de-dup so `*.example.com` + `example.com` collapse to one entry. Existing wildcard-strip test reshaped + 3 new edge-case tests.
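The normalisation rules listed above can be sketched as follows. This is an illustration only: `mirror_hosts` is a hypothetical name, and the regular expression is an assumed stand-in for pipelock's accepted hostname charset (letters, digits, hyphens, dots), which the source does not spell out. Lower-casing is also an assumption. In this sketch a bare `*` is dropped by the charset check rather than by normalising to an empty string, which gives the same result.

```python
import re
from typing import Iterable, List

# Assumed charset: lower-case letters, digits, hyphens and dots, starting
# and ending with an alphanumeric character.
_HOST_RE = re.compile(r"^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$")


def mirror_hosts(hosts: Iterable[str]) -> List[str]:
    """Normalise route hosts into a de-duplicated pipelock-style list."""
    seen = set()
    mirrored = []
    for raw in hosts:
        host = raw.strip().lower()
        if host.startswith("*."):
            host = host[2:]  # strip one leading "*." only, not recursively
        if not _HOST_RE.match(host):
            continue  # "*", "", "[::1]" and similar are dropped
        if host in seen:
            continue  # "*.example.com" and "example.com" collapse
        seen.add(host)
        mirrored.append(host)
    return mirrored
```

For example, `mirror_hosts(["*.example.com", "example.com", "*", "[::1]"])` keeps a single `example.com` entry.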
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

PRD 0017 v1 deliberately punted wildcards ("Exact match in v1 — globs / wildcards are a follow-up"). Now that the supervise mirror strips `*.` to its suffix for pipelock, the addon needs to actually match wildcard hosts on its side or the route is dead weight.

Addon `match_route` now does two passes:

1. Exact (case-insensitive) literal match on the hostname.
2. Wildcard suffix match: a route whose host starts with `*.` matches any request host that ends with `.<suffix>`. So `*.example.com` matches `foo.example.com` and `a.b.example.com`, but NOT the apex `example.com` and not `barexample.com` (the leading `.` of the suffix is required).

Exact wins, so operators can layer a specific route (e.g. `api.github.com` with auth) on top of a broader wildcard (e.g. `*.github.com` bare-pass).

8 new unit tests: direct subdomain match, nested subdomain match, apex rejection, overlapping-suffix rejection, case-insensitive, exact-wins-over-wildcard (both route orders), no-match fall-through. 395 unit + integration pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

`*.example.com` now matches `example.com` itself in addition to every subdomain. RFC 6125 TLS-wildcard semantics excluded the apex; an allowlist's natural reading of `*.example.com` is "all of example.com". The pipelock mirror already strips `*.example.com` to `example.com`, so without the apex match the two layers disagreed (pipelock allowed the apex, egress-proxy blocked it).

Behavior:

- `*.example.com` matches `example.com` (apex)
- `*.example.com` matches `foo.example.com` (subdomain)
- `*.example.com` matches `a.b.example.com` (nested)
- `*.example.com` does NOT match `barexample.com` (label boundary required)

Test renamed: `test_wildcard_does_not_match_apex` → `test_wildcard_matches_apex`. 395 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
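The final matching behaviour (two passes, exact wins, wildcard covers apex and subdomains) can be sketched as below. This is a minimal illustration under assumptions, not the addon's code: routes are modelled as plain dicts with a `host` key, and the signature of `match_route` here is invented for the example.

```python
from typing import Optional, Sequence


def match_route(routes: Sequence[dict], request_host: str) -> Optional[dict]:
    """Return the route for request_host, or None when nothing matches."""
    host = request_host.lower()

    # Pass 1: exact, case-insensitive match. Runs over all routes first so
    # a specific route wins over a wildcard regardless of route order.
    for route in routes:
        if route["host"].lower() == host:
            return route

    # Pass 2: "*.suffix" matches the apex and any subdomain. Requiring the
    # "." before the suffix keeps "barexample.com" from matching.
    for route in routes:
        pattern = route["host"].lower()
        if pattern.startswith("*."):
            suffix = pattern[2:]
            if host == suffix or host.endswith("." + suffix):
                return route

    return None
```

With routes `*.github.com` (bare-pass) and `api.github.com` (with auth), a request for `api.github.com` resolves to the specific route in either order, while `raw.github.com` and `github.com` fall through to the wildcard.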