feat(egress-proxy): cutover from cred-proxy (PRD 0017 chunk 2)

Hard cutover. cred-proxy is deleted; egress-proxy is now the agent's
HTTP_PROXY (when routes are declared) with pipelock on its outbound leg.
Two per-bottle CAs are minted: egress-proxy's (agent trust store) and
pipelock's (egress-proxy's outbound trust store).

Manifest:
- `bottle.cred_proxy` → hard error with a migration recipe.
- `bottle.egress_proxy` is the new shape (PRD 0017 chunk 1).
- CredProxy* types + role validators removed.

Wiring:
- launch.py: `egress_proxy_tls_init` mints the egress-proxy CA (cert+key
  concat for mitmproxy + cert-only for agent trust);
  `DockerEgressProxy.start` docker-cps both CAs in, sets
  `HTTPS_PROXY=pipelock` + `EGRESS_PROXY_UPSTREAM_CA` so mitmdump trusts
  pipelock's MITM. Agent's HTTP_PROXY points at egress-proxy when routes
  exist, else falls back to pipelock (no-routes bottles unchanged).
- prepare.py / backend.py: `cred_proxy` arg → `egress_proxy`;
  sidecar-orphan probe + plan field + dashboard view all renamed.
- provision_ca: selects the egress-proxy CA when present, else
  pipelock's (filename renamed to claude-bottle-mitm-ca.crt).
- bottle.provision: cred-proxy dotfile rewrites (~/.npmrc, ~/.gitconfig
  insteadOf, tea config) are gone; HTTP_PROXY catches everything
  respecting it.

Pipelock helpers:
- `pipelock_token_hosts` → `pipelock_route_hosts` (now reading
  egress_proxy.routes).
- cred-proxy hostname auto-allow → egress-proxy hostname auto-allow.
- Anthropic seed-phrase workaround now triggers when an egress_proxy
  route targets api.anthropic.com (was based on the cred-proxy
  `anthropic-base-url` role).

Dockerfile.egress-proxy:
- Entrypoint conditionally passes
  `--set ssl_verify_upstream_trusted_ca=$EGRESS_PROXY_UPSTREAM_CA` (via
  the `${VAR:+...}` shell expansion) so standalone runs without a
  mounted pipelock CA still boot.
- mkdirs `/home/mitmproxy/.mitmproxy` ahead of `docker cp`.
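
Illustration of the conditional entrypoint flag described above. The
real entrypoint is shell and is not shown in this commit view; this is
only a Python sketch of the same rule, assuming `${VAR:+...}` semantics
(expand only when the variable is set and non-empty). The function name
`mitmdump_args` and the CA path in the usage note are hypothetical.

```python
def mitmdump_args(env: dict) -> list:
    """Build the mitmdump argv, adding the upstream-CA option only when
    EGRESS_PROXY_UPSTREAM_CA is set and non-empty (the `${VAR:+...}` rule)."""
    args = ["mitmdump"]
    ca = env.get("EGRESS_PROXY_UPSTREAM_CA")
    if ca:  # unset or empty: flag is omitted, so standalone runs still boot
        args += ["--set", f"ssl_verify_upstream_trusted_ca={ca}"]
    return args
```

With a hypothetical path, `mitmdump_args({"EGRESS_PROXY_UPSTREAM_CA":
"/tmp/pipelock-ca.pem"})` includes the `--set` pair, while
`mitmdump_args({})` returns only `["mitmdump"]`.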

Deleted: claude_bottle/{cred_proxy,cred_proxy_server}.py,
backend/docker/{cred_proxy,provision/cred_proxy}.py,
Dockerfile.cred-proxy, plus the corresponding unit + integration tests.
backend/docker/cred_proxy_apply.py stays as a stub for chunk 3 to
rewrite (its container-name + routes-path constants are inlined so it
survives without the deleted module).

Test changes:
- test_pipelock_allowlist rewritten against egress-proxy routes + the
  new `pipelock_route_hosts`.
- test_manifest_md_load + test_pipelock_yaml + test_yaml_subset fixtures
  migrated to the `egress_proxy: { routes: [...] }` shape.
- test_supervise_sidecar's round-trip test switched from
  `dashboard.approve` to `dashboard.reject`: the approval-apply path on
  cred-proxy-block proposals hits a deleted sidecar in chunk 2's
  transitional state. Chunk 3 restores the approval test once the
  remediation flow is retargeted at egress-proxy.

376 tests pass (was 427; net delta is removed cred-proxy tests).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
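
For readers skimming the message, a minimal self-contained sketch of the
routing and allowlist rules this commit describes. The dataclasses are
simplified stand-ins for the real manifest types (the real `Bottle`
keeps its allowlist under `bottle.egress.allowlist`), and the hostname
strings "egress-proxy", "supervise" and "pipelock" are assumed values
for the corresponding constants, which this commit view does not show.

```python
from dataclasses import dataclass, field


@dataclass
class Route:  # stand-in: only the `Host` field appears in the diff
    Host: str


@dataclass
class EgressProxy:
    routes: list = field(default_factory=list)


@dataclass
class Bottle:  # simplified stand-in for manifest.Bottle
    allowlist: list = field(default_factory=list)
    egress_proxy: EgressProxy = field(default_factory=EgressProxy)
    supervise: bool = False


def agent_proxy_target(bottle: Bottle) -> str:
    """egress-proxy when routes are declared, else fall back to pipelock."""
    return "egress-proxy" if bottle.egress_proxy.routes else "pipelock"


def route_hosts(bottle: Bottle) -> list:
    """Sorted, deduplicated hosts from the declared routes."""
    return sorted({r.Host for r in bottle.egress_proxy.routes if r.Host})


def effective_allowlist(bottle: Bottle, defaults: list) -> list:
    """Union of defaults, manifest allowlist, route hosts and sidecar names."""
    hosts = set(defaults) | {h for h in bottle.allowlist if h}
    hosts |= set(route_hosts(bottle))
    if bottle.egress_proxy.routes:
        hosts.add("egress-proxy")  # assumed value of EGRESS_PROXY_HOSTNAME
    if bottle.supervise:
        hosts.add("supervise")  # assumed value of SUPERVISE_HOSTNAME
    return sorted(hosts)


def seed_phrase_detection_enabled(bottle: Bottle) -> bool:
    """Detector stays on unless a route targets api.anthropic.com."""
    return not any(
        r.Host == "api.anthropic.com" for r in bottle.egress_proxy.routes
    )
```

A bottle with no routes keeps pipelock as the agent's proxy and leaves
seed-phrase detection on; declaring a route to api.anthropic.com flips
both.
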
2026-05-25 14:30:39 -04:00
parent 9e41845a2b
commit 70f773ac61
30 changed files with 573 additions and 3451 deletions
@@ -3,9 +3,14 @@
 Pipelock (https://github.com/luckyPipewrench/pipelock) is an HTTP
 forward proxy with hostname allowlisting + DLP scanning + URL-entropy
 checks. One sidecar per agent, attached to the agent's --internal
-network and a per-agent user-defined egress bridge. Combined with
-HTTPS_PROXY/HTTP_PROXY pointing at the sidecar's service name, pipelock
-is the only egress route the agent has.
+network and a per-agent user-defined egress bridge.
+
+Post-PRD-0017 topology: the agent's HTTP_PROXY points at egress-proxy
+(not pipelock); egress-proxy sets `HTTPS_PROXY=pipelock` on its
+outbound leg. So pipelock no longer sees the agent's connections
+directly — it sees the egress-proxy → upstream leg, applies the
+hostname allowlist + DLP body scan there, and forwards to the real
+upstream.

 Image pin: ghcr.io/luckypipewrench/pipelock@sha256:<digest> for tag 2.3.0.
 """
@@ -17,7 +22,7 @@ from dataclasses import dataclass
 from pathlib import Path
 from typing import cast

-from .cred_proxy import CRED_PROXY_HOSTNAME
+from .egress_proxy import EGRESS_PROXY_HOSTNAME
 from .supervise import SUPERVISE_HOSTNAME
 from .manifest import Bottle

@@ -57,48 +62,45 @@ def pipelock_bottle_allowlist(bottle: Bottle) -> list[str]:
    return list(bottle.egress.allowlist)


-def pipelock_token_hosts(bottle: Bottle) -> list[str]:
-    """Hostnames the cred-proxy sidecar (PRD 0010) talks to upstream
-    on the agent's behalf. Derived from each route's
-    `upstream.UpstreamHost` in `bottle.cred_proxy.routes`. Returned
-    sorted+deduped.
+def pipelock_route_hosts(bottle: Bottle) -> list[str]:
+    """Hostnames declared in `bottle.egress_proxy.routes`. Returned
+    sorted + deduped.

-    These hosts must be on pipelock's allowlist so cred-proxy's
-    outbound HTTPS traffic can leave the egress network. They are
-    NOT auto-added to passthrough_domains: cred-proxy's HTTPS client
-    trusts pipelock's per-bottle CA at runtime (installed via
-    docker cp + update-ca-certificates in the cred-proxy image),
-    so pipelock MITMs and body-scans the cred-proxy → upstream leg
-    the same way it does direct agent traffic."""
-    hosts = {r.UpstreamHost for r in bottle.cred_proxy.routes if r.UpstreamHost}
+    Post-cutover topology (PRD 0017): the agent's HTTPS_PROXY points
+    at egress-proxy, not pipelock; egress-proxy's outbound leg sets
+    `HTTPS_PROXY=pipelock`. So pipelock no longer terminates the
+    agent's connections — it sees the egress-proxy → upstream leg
+    only. Each declared route's host still needs to be on pipelock's
+    allowlist so that leg can leave the egress network."""
+    hosts = {r.Host for r in bottle.egress_proxy.routes if r.Host}
    return sorted(hosts)


 def pipelock_effective_allowlist(bottle: Bottle) -> list[str]:
    """Deduplicated union of: baked-in defaults, bottle.egress.allowlist,
-    the cred-proxy upstream hosts derived from bottle.cred_proxy.routes,
-    the cred-proxy sidecar's own hostname when any cred_proxy route is
-    declared, and the supervise sidecar's hostname when bottle.supervise
-    is enabled. Sorted for stability. Git upstreams declared in
-    `bottle.git` do NOT contribute here — git traffic flows through the
-    per-agent git-gate sidecar (PRD 0008), not pipelock.
+    the egress-proxy route hosts (from bottle.egress_proxy.routes), the
+    egress-proxy sidecar's own hostname when any route is declared, and
+    the supervise sidecar's hostname when bottle.supervise is enabled.
+    Sorted for stability. Git upstreams declared in `bottle.git` do NOT
+    contribute here — git traffic flows through the per-agent git-gate
+    sidecar (PRD 0008), not pipelock.

-    The cred-proxy + supervise hostnames are auto-added because the
-    agent's HTTP_PROXY points at pipelock, so a manifest-driven URL
-    like `http://cred-proxy:9099/anthropic/...` or
-    `http://supervise:9100/` arrives at pipelock as a request for the
-    sidecar hostname. Without this auto-allow, pipelock would 403 the
-    request before it reached the sidecar."""
+    The egress-proxy + supervise hostnames are auto-added because the
+    sidecars sit on the bottle's internal network alongside the agent;
+    requests that pass through pipelock for `egress-proxy:9099` or
+    `supervise:9100` (e.g. when egress-proxy uses HTTPS_PROXY=pipelock
+    on its upstream leg) would otherwise be 403'd by pipelock's
+    hostname gate."""
    seen: dict[str, None] = {}
    for h in DEFAULT_ALLOWLIST:
        seen.setdefault(h, None)
    for h in pipelock_bottle_allowlist(bottle):
        if h:
            seen.setdefault(h, None)
-    for h in pipelock_token_hosts(bottle):
+    for h in pipelock_route_hosts(bottle):
        seen.setdefault(h, None)
-    if bottle.cred_proxy.routes:
-        seen.setdefault(CRED_PROXY_HOSTNAME, None)
+    if bottle.egress_proxy.routes:
+        seen.setdefault(EGRESS_PROXY_HOSTNAME, None)
    if bottle.supervise:
        seen.setdefault(SUPERVISE_HOSTNAME, None)
    return sorted(seen.keys())
@@ -122,16 +124,16 @@ def pipelock_seed_phrase_detection_enabled(bottle: Bottle) -> bool:

    Empirically only `seed_phrase_detection.enabled: false`
    actually stops the block (verified by sending a 12-word BIP-39
-    body through three pipelock instances). It is a global toggle
-    — there is no per-path / per-host knob in pipelock 2.3.0 — so
-    we turn the detector off for the entire bottle when an
-    `anthropic-base-url` route is declared. The trade-off is
+    body through three pipelock instances). It is a global toggle —
+    no per-path / per-host knob in pipelock 2.3.0 — so we turn the
+    detector off for the entire bottle when the bottle declares an
+    egress-proxy route to `api.anthropic.com`. The trade-off is
    accepted: BIP-39 detection has little value in claude-bottle's
-    threat model (the agent has no access to a user's crypto
-    wallet seeds; the patterns that matter — gh*_, sk-ant-, AKIA,
-    etc. — keep firing)."""
+    threat model (the agent has no access to a user's crypto wallet
+    seeds; the patterns that matter — gh*_, sk-ant-, AKIA, etc. —
+    keep firing)."""
    return not any(
-        "anthropic-base-url" in r.Role for r in bottle.cred_proxy.routes
+        r.Host == "api.anthropic.com" for r in bottle.egress_proxy.routes
    )


@@ -143,16 +145,12 @@ def pipelock_effective_tls_passthrough(bottle: Bottle) -> list[str]:
    other allowlisted host is MITM'd by pipelock's per-bottle CA so
    its body scanner sees the cleartext.

-    cred-proxy upstream hosts (github, gitea, npm) are deliberately
-    NOT auto-added here. cred-proxy's HTTPS client trusts pipelock's
-    CA at runtime (folded into its trust store via docker cp +
-    update-ca-certificates), so pipelock can MITM the cred-proxy →
-    upstream leg and body-scan it the same way it body-scans the
-    agent's direct HTTPS traffic. Without this, an agent that pushed
-    a secret via cred-proxy's /gh-git/ path would have no body
-    scanner in front of it. The PRD's earlier reasoning that
-    cred-proxy hosts needed passthrough was a workaround for the
-    cert-trust gap that no longer exists.
+    egress-proxy route hosts (github, gitea, npm) are deliberately
+    NOT auto-added here. egress-proxy's HTTPS client trusts pipelock's
+    CA at runtime (folded into its trust store via docker cp), so
+    pipelock MITMs and body-scans the egress-proxy → upstream leg the
+    same way it body-scanned the agent's direct HTTPS traffic before
+    the PRD 0017 cutover.

    `bottle` is kept on the signature for forward-compat (a future
    knob might let a manifest opt a host into passthrough); today
@@ -207,13 +205,13 @@ def pipelock_build_config(

    `ssrf_ip_allowlist` is the list of IPs / CIDRs that bypass
    pipelock's SSRF guard. Pipelock blocks RFC1918-resolved
-    destinations by default, which would catch the agent's
-    cred-proxy traffic (cred-proxy sits on the bottle's internal
-    Docker network in 172.x space). Pass the bottle's internal
-    network CIDR here so `cred-proxy:9099` requests get through
-    pipelock while api_allowlist + body-scanning still apply. Empty
-    by default; omitted from the rendered yaml when empty so
-    pipelock keeps its built-in SSRF defaults."""
+    destinations by default, which would catch sibling-sidecar
+    traffic on the bottle's internal Docker network in 172.x space
+    (e.g. egress-proxy → pipelock on the upstream leg). Pass the
+    bottle's internal network CIDR here so internal-network requests
+    pass through pipelock while api_allowlist + body-scanning still
+    apply. Empty by default; omitted from the rendered yaml when
+    empty so pipelock keeps its built-in SSRF defaults."""
    cfg: dict[str, object] = {
        "version": 1,
        "mode": "strict",
@@ -322,9 +320,9 @@ class PipelockProxyPlan:
    that they are populated.

    `internal_network_cidr` ends up on pipelock's `ssrf.ip_allowlist`
-    so the agent's requests at `cred-proxy:9099` (or any other
-    bottle-internal sidecar) bypass pipelock's RFC1918 SSRF guard
-    while api_allowlist and body-scanning still apply."""
+    so traffic from sibling sidecars (egress-proxy → pipelock on the
+    upstream leg, etc.) bypasses pipelock's RFC1918 SSRF guard while
+    api_allowlist and body-scanning still apply."""

    yaml_path: Path
    slug: str