Files
bot-bottle/claude_bottle/egress_proxy.py
T
didericis 3be70eb07a
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m7s
feat(supervise): list-egress-proxy-routes MCP tool, defaults on egress-proxy
Reshape the allowlist topology so the egress-proxy is the bottle's
single allowlist surface, and replace the agent-side
routes/allowlist file mounts with a live MCP tool.

Policy change (move defaults to egress-proxy):

  - `egress_proxy_routes_for_bottle(bottle)` now folds in
    DEFAULT_ALLOWLIST (the claude-code defaults) and
    `bottle.egress.allowlist` (user adds) as bare-pass routes (no
    auth, no path filter), on top of the bottle's
    `egress_proxy.routes`. Manifest routes win on host collision.
  - `pipelock_effective_allowlist(bottle)` mirrors egress-proxy's
    effective host set when egress-proxy is in use. Pipelock is
    no longer the bottle's primary allowlist authority; it
    enforces a downstream copy as defense-in-depth + does DLP body
    scanning.
  - Split out `egress_proxy_manifest_routes(bottle)` for callers
    that want just the manifest entries (tests, internal use).
  - DEFAULT_ALLOWLIST moves from `pipelock.py` to `egress_proxy.py`
    (pipelock re-imports for the no-egress-proxy fallback path).
  - Dropped the `egress-proxy` auto-allow on pipelock's allowlist
    — the agent never dials egress-proxy via the proxy mechanism;
    pipelock only sees upstream hostnames from egress-proxy's
    CONNECTs.

Introspection endpoint (existing mitmproxy feature):

  - Egress-proxy addon recognises requests to the magic host
    `_egress-proxy.local` and synthesizes responses via
    `flow.response = http.Response.make(...)` — no upstream
    connection, no allowlist enforcement on the magic host.
  - `GET /allowlist` returns the in-memory route table as JSON
    (host + path_allowlist + auth_scheme + token_env per route;
    no token VALUES).
  - Smoke-tested end-to-end against a real egress-proxy container.

MCP tool (existing supervise plumbing):

  - New `list-egress-proxy-routes` tool (no inputs, no operator
    approval). Handler fetches via egress-proxy's introspection
    endpoint using urllib's ProxyHandler against
    `EGRESS_PROXY_FORWARD_PROXY`. Returns the JSON payload as the
    tool's text content; `isError: true` if the proxy is
    unreachable.
  - `egress-proxy-block` description now points the agent at
    `list-egress-proxy-routes` instead of a staged file path.
  - `pipelock-block` description acknowledges the mirror — agents
    should prefer `egress-proxy-block` to add hosts; pipelock-block
    stays for the rare divergence case.

Drop agent-side file mounts:

  - Supervise's `current-config` dir staging no longer writes
    routes.yaml / allowlist. Only `Dockerfile` remains
    (capability-block still reads it from
    `/etc/claude-bottle/current-config/Dockerfile`).
  - `prepare.py` stops passing `routes_content` /
    `allowlist_content` to `supervise.prepare`.
  - `Supervise.prepare` signature simplified to one
    `dockerfile_content` kwarg.

Tests: 400 unit + integration pass. Added coverage for
defaults-folding (`TestRoutesForBottleFoldsDefaults`), the new
tool definition + handler, and the updated supervise.prepare
shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 18:23:01 -04:00

347 lines
13 KiB
Python

"""Per-bottle egress proxy (PRD 0017).
Replaces the cred-proxy sidecar (PRD 0010) with a mitmproxy-based
sidecar that becomes the agent's `HTTP_PROXY` / `HTTPS_PROXY`. It
owns three jobs:
1. MITM the agent's HTTPS with the per-bottle CA (moved from
pipelock).
2. Enforce manifest-declared `path_allowlist` per route.
3. Inject `Authorization` headers for routes that declare an
`auth` block, the same way cred-proxy does today.
This module defines the abstract proxy (`EgressProxy`), its plan
dataclass (`EgressProxyPlan`), and the resolved per-route shape
(`EgressProxyRoute`). The sidecar's start/stop lifecycle is backend-
specific and lives on concrete subclasses (see
`claude_bottle/backend/docker/egress_proxy.py`).
Chunks 1+2 of the PRD: this module + the mitmproxy addon + the Docker
lifecycle are wired into the agent's `HTTP_PROXY` path; cred-proxy
has been removed. Chunk 3 retargets the cred-proxy-block remediation
flow (PRD 0014) at egress-proxy and renames the MCP tool.
"""
from __future__ import annotations
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from .log import die
from .manifest import Bottle
# DNS name agents will dial for the per-bottle egress-proxy sidecar.
# Backend-agnostic by contract: every concrete backend (Docker today,
# others later) attaches this name to its sidecar on the bottle's
# internal network. The agent's `HTTP_PROXY` env var resolves to
# `http://egress-proxy:<port>` once chunk 2 cuts over.
EGRESS_PROXY_HOSTNAME = "egress-proxy"
# In-container path the addon reads. Pre-created in
# `Dockerfile.egress-proxy` so `docker cp` can drop the file directly.
# `.yaml` extension per PRD 0017 — content is JSON (valid YAML) so
# both sides can use stdlib `json`.
EGRESS_PROXY_ROUTES_IN_CONTAINER = "/etc/egress-proxy/routes.yaml"
@dataclass(frozen=True)
class EgressProxyRoute:
"""One resolved route on the egress-proxy sidecar.
`host` matches the request's hostname (case-insensitive). The
optional `path_allowlist` constrains the URL path; empty tuple
means no path-level filtering. The `auth_scheme` / `token_env` /
`token_ref` triple is the credential-injection config; empty
strings mean "no auth injection" (the manifest's nested `auth`
block was omitted).
`token_env` is the env-var slot inside the egress-proxy container
(e.g. `EGRESS_PROXY_TOKEN_0`); `token_ref` is the host env var
the CLI reads at launch and forwards into the container's environ
under `token_env`. Routes that share a `token_ref` coalesce to
one `token_env` slot.
`roles` carries the manifest route's optional role markers (see
`manifest.EGRESS_PROXY_ROLES`). The launch step reads these for
side effects like the claude-code OAuth placeholder env."""
host: str
path_allowlist: tuple[str, ...] = ()
auth_scheme: str = ""
token_env: str = ""
token_ref: str = ""
roles: tuple[str, ...] = ()
@dataclass(frozen=True)
class EgressProxyPlan:
"""Output of EgressProxy.prepare; consumed by .start.
The slug + routes_path + routes + token_env_map fields are
filled at prepare time (host-side, side-effect-free on docker).
The network + CA + pipelock fields are populated by the backend's
launch step via `dataclasses.replace` once those resources
exist. Empty defaults are sentinels meaning "not yet set";
`.start` validates that they are populated.
`token_env_map` is `{<token_env in container>: <token_ref on host>}`.
The backend's start step reads `os.environ[token_ref]` and
forwards the value into the egress-proxy container's environ
under `token_env`. The plan itself never holds token values —
secrets never land in a dataclass that might be logged.
`mitmproxy_ca_host_path` is the host path of the per-bottle
egress-proxy CA (single PEM with cert+key concatenated) minted
by `egress_proxy_tls_init`. `.start` docker-cps it into the
sidecar at `~/.mitmproxy/mitmproxy-ca.pem` — mitmproxy reads
that file at boot to mint per-host leaf certs.
`mitmproxy_ca_cert_only_host_path` is the cert-only PEM (no
key) for installing into the agent's trust store via
`provision_ca`. Separate file rather than re-parsing the
concat so secrets and trust artefacts stay on distinct paths.
`pipelock_ca_host_path` is the host path of the pipelock CA
(cert only). `.start` docker-cps it into the sidecar so the
proxy's outbound HTTPS client trusts pipelock's MITM on the
egress-proxy → upstream leg.
`pipelock_proxy_url` is the URL egress-proxy sets as `HTTPS_PROXY`
in its environ so outbound HTTPS traverses pipelock — keeping
pipelock's hostname allowlist + DLP body scanner on the
egress-proxy → upstream leg.
"""
slug: str
routes_path: Path
routes: tuple[EgressProxyRoute, ...]
token_env_map: dict[str, str]
internal_network: str = ""
egress_network: str = ""
mitmproxy_ca_host_path: Path = Path()
mitmproxy_ca_cert_only_host_path: Path = Path()
pipelock_ca_host_path: Path = Path()
pipelock_proxy_url: str = ""
# Hosts the agent needs by default for claude-code itself. Folded
# into every bottle's egress-proxy routes table as bare-pass entries
# (no auth, no path filter) so the agent reaches them without each
# bottle having to opt in. Pipelock used to own this list; PRD 0017
# moves it to egress-proxy because egress-proxy is the primary gate
# now and pipelock's allowlist is mirrored from egress-proxy.
DEFAULT_ALLOWLIST: tuple[str, ...] = (
"api.anthropic.com",
"statsig.anthropic.com",
"sentry.io",
"claude.ai",
"platform.claude.com",
"downloads.claude.ai",
"raw.githubusercontent.com",
)
def egress_proxy_manifest_routes(
bottle: Bottle,
) -> tuple[EgressProxyRoute, ...]:
"""Lift each `bottle.egress_proxy.routes[]` manifest entry into a
resolved EgressProxyRoute. Order is preserved so route lookup at
the proxy is stable.
Token-env slots are assigned per distinct `token_ref`: the first
authenticated route with `token_ref` "GH_PAT" gets
`EGRESS_PROXY_TOKEN_0`; a second route with the same `token_ref`
shares slot 0. Unauthenticated routes (`auth` omitted) contribute
no slot.
Does NOT include the folded-in DEFAULT_ALLOWLIST /
bottle.egress.allowlist bare-pass entries — see
`egress_proxy_routes_for_bottle` for the effective set the
addon enforces."""
out: list[EgressProxyRoute] = []
slot_for_token: dict[str, str] = {}
for r in bottle.egress_proxy.routes:
if r.AuthScheme and r.TokenRef:
token_env = slot_for_token.get(r.TokenRef)
if token_env is None:
token_env = f"EGRESS_PROXY_TOKEN_{len(slot_for_token)}"
slot_for_token[r.TokenRef] = token_env
out.append(EgressProxyRoute(
host=r.Host,
path_allowlist=r.PathAllowlist,
auth_scheme=r.AuthScheme,
token_env=token_env,
token_ref=r.TokenRef,
roles=r.Role,
))
else:
out.append(EgressProxyRoute(
host=r.Host,
path_allowlist=r.PathAllowlist,
roles=r.Role,
))
return tuple(out)
def egress_proxy_routes_for_bottle(
bottle: Bottle,
) -> tuple[EgressProxyRoute, ...]:
"""Effective egress-proxy routes: manifest routes followed by
bare-pass entries for DEFAULT_ALLOWLIST hosts and
`bottle.egress.allowlist` hosts. This is what gets rendered into
routes.yaml + what the addon enforces.
Manifest routes win over defaults on host collision (manifest
routes carry more specific config — auth, path filter, role
markers). Hostname comparison is case-insensitive."""
out: list[EgressProxyRoute] = list(egress_proxy_manifest_routes(bottle))
claimed: set[str] = {r.host.lower() for r in out}
for host in DEFAULT_ALLOWLIST:
if host.lower() not in claimed:
out.append(EgressProxyRoute(host=host))
claimed.add(host.lower())
for host in bottle.egress.allowlist:
if host and host.lower() not in claimed:
out.append(EgressProxyRoute(host=host))
claimed.add(host.lower())
return tuple(out)
def egress_proxy_token_env_map(
routes: tuple[EgressProxyRoute, ...],
) -> dict[str, str]:
"""Collapse the route list into `{token_env: token_ref}` for the
authenticated routes. Routes without `auth` contribute no entry.
Conflict detection: two routes that share a `token_env` slot but
name different `token_ref` host vars is a programming error in
`egress_proxy_routes_for_bottle`; surface it as a die rather than
silently picking one."""
out: dict[str, str] = {}
for r in routes:
if not r.token_env:
continue
existing = out.get(r.token_env)
if existing is not None and existing != r.token_ref:
die(
f"egress-proxy plan conflict: {r.token_env} maps to both "
f"{existing!r} and {r.token_ref!r}. Two routes sharing a "
f"token slot must reference the same host env var."
)
out[r.token_env] = r.token_ref
return out
def egress_proxy_render_routes(
routes: tuple[EgressProxyRoute, ...],
) -> str:
"""Serialize the route table for the addon to read.
JSON content (valid YAML), no token values, no host env-var
names — the only thing the addon needs at runtime is the host →
path_allowlist + auth_scheme + in-container env-var mapping. The
actual token values arrive via the container's environ.
Authenticated routes carry `auth_scheme` + `token_env`;
unauthenticated routes omit both keys (the addon's parser
enforces both-or-neither)."""
payload_routes: list[dict[str, object]] = []
for r in routes:
entry: dict[str, object] = {"host": r.host}
if r.path_allowlist:
entry["path_allowlist"] = list(r.path_allowlist)
if r.auth_scheme and r.token_env:
entry["auth_scheme"] = r.auth_scheme
entry["token_env"] = r.token_env
payload_routes.append(entry)
payload = {"routes": payload_routes}
return json.dumps(payload, indent=2, sort_keys=False) + "\n"
def egress_proxy_resolve_token_values(
token_env_map: dict[str, str],
host_env: dict[str, str],
) -> dict[str, str]:
"""Read `host_env[TokenRef]` for each entry in `token_env_map` and
return `{token_env: <value>}`. Dies (with a pointer at the missing
var name) if any TokenRef is unset.
Pure function: takes the host env as an argument so tests can pass
a sealed mapping without touching `os.environ`."""
out: dict[str, str] = {}
for token_env, token_ref in token_env_map.items():
value = host_env.get(token_ref)
if value is None:
die(
f"egress-proxy: host env var '{token_ref}' is unset. Set it "
f"before launching, or remove the corresponding auth block "
f"from bottle.egress_proxy.routes."
)
if not value:
die(
f"egress-proxy: host env var '{token_ref}' is empty. The "
f"egress-proxy will not inject an empty token; set it to "
f"the real value or remove the route's auth block."
)
out[token_env] = value
return out
class EgressProxy(ABC):
"""The per-bottle egress proxy. Encapsulates the host-side prepare
(route lift + routes.yaml render + token-env-map derivation); the
sidecar's start/stop lifecycle is backend-specific and lives on
concrete subclasses."""
def prepare(self, bottle: Bottle, slug: str, stage_dir: Path) -> EgressProxyPlan:
"""Lift `bottle.egress_proxy.routes` into resolved routes,
render the routes file (mode 600) under `stage_dir`, and
return the plan. Pure host-side, no docker subprocess. The
token-env map records the mapping the launch step uses to
forward values from the host's environ into the sidecar's
environ.
Returned plan is incomplete: the launch step must fill
`internal_network` / `egress_network` / `pipelock_proxy_url`
via `dataclasses.replace` before passing it to `.start`."""
routes = egress_proxy_routes_for_bottle(bottle)
routes_path = stage_dir / "egress_proxy_routes.yaml"
routes_path.write_text(egress_proxy_render_routes(routes))
routes_path.chmod(0o600)
return EgressProxyPlan(
slug=slug,
routes_path=routes_path,
routes=routes,
token_env_map=egress_proxy_token_env_map(routes),
)
@abstractmethod
def start(self, plan: EgressProxyPlan) -> str:
"""Bring up the egress-proxy sidecar according to `plan`.
Returns the target string identifying the running instance —
the same value to pass to `.stop`. Backend-specific."""
@abstractmethod
def stop(self, target: str) -> None:
"""Tear down the egress-proxy sidecar identified by `target`
(the value `.start` returned). Idempotent: a missing target
is success. Backend-specific."""
__all__ = [
"DEFAULT_ALLOWLIST",
"EGRESS_PROXY_HOSTNAME",
"EGRESS_PROXY_ROUTES_IN_CONTAINER",
"EgressProxy",
"EgressProxyPlan",
"EgressProxyRoute",
"egress_proxy_manifest_routes",
"egress_proxy_render_routes",
"egress_proxy_resolve_token_values",
"egress_proxy_routes_for_bottle",
"egress_proxy_token_env_map",
]