PRD 0017: Egress-proxy — universal MITM via mitmproxy (replaces cred-proxy) #27

# PRD 0017: Egress-proxy — universal MITM with path filtering + auth injection

- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-25
- **Supersedes:** the cred-proxy sidecar (PRD 0010), as a hard cutover.

## Summary

Replace the per-bottle cred-proxy sidecar with a new `egress-proxy`
sidecar built on mitmproxy. The egress-proxy is the agent's
`HTTP_PROXY` / `HTTPS_PROXY`: every agent HTTP/HTTPS request flows
through it before reaching pipelock. It owns three jobs that today
are split between cred-proxy and pipelock:

1. **MITM the agent's HTTPS.** Uses the per-bottle CA currently held
   by pipelock; that key moves to the egress-proxy.
2. **Path-level allow/deny.** A manifest-declared `path_allowlist`
   per route. Universal coverage: every HTTPS path the agent requests
   is inspected here, not just traffic that voluntarily dials the
   cred-proxy URL.
3. **Credential injection.** Continues cred-proxy's existing role:
   match by hostname (or hostname + path), strip the inbound
   `Authorization` header, and inject a new one based on the route's
   optional `auth: { scheme, token_ref }` block.

Pipelock's role narrows to hostname allowlisting + DLP body scanning
on the egress-proxy → upstream leg. Pipelock no longer holds the
CA private key and is no longer the agent's direct proxy.

## Problem

PR #25's pipelock-block flow exposed an honest gap: pipelock's
`api_allowlist` is hostname-only (verified by probing the binary's
strict preset and the `pipelock check --url` output). Approving a
proposed `pipelock-block` opens the entire host, not just the URL's
path. For shared platforms (github.com, gitlab.com, public
registries) operators routinely want finer-than-host granularity:
allow github.com/didericis but block github.com/somebody-else.

Cred-proxy already does path-prefix routing for credentialed APIs,
but it only sees the requests the agent voluntarily routes to it
(via `ANTHROPIC_BASE_URL`, `~/.gitconfig` `insteadOf`, npmrc
`registry=`). A raw `curl https://github.com/anyone` from the agent
goes to `HTTPS_PROXY=pipelock` directly and bypasses cred-proxy
entirely. So extending cred-proxy with `path_allowlist` (the earlier
PRD 0017 draft) buys *opt-in* path filtering, not enforcement.

For enforcement we need a layer that sits on the agent's
`HTTPS_PROXY` path, giving universal coverage of agent egress.

## Goals / Success Criteria

A bottle manifest declares an egress-proxy route with a
`path_allowlist`. From inside the bottle,
`curl https://github.com/didericis/foo` succeeds;
`curl https://github.com/somebody-else/secret` gets a 403 from
egress-proxy and never reaches pipelock or the real github. The same
holds for any tool inside the bottle that respects `HTTPS_PROXY`:
claude-code, git over HTTPS, npm, raw curl, arbitrary Python
`requests` code. No tool-specific rewrite is required for path
enforcement.

Existing cred-proxy responsibilities continue to work after the
cutover. Anthropic OAuth injection for claude-code moves to
proxy-side header injection rather than the dotfile rewrite. Git
`insteadOf` routing into the proxy stays useful for hostname
canonicalisation but is no longer load-bearing for credential
delivery.

## Non-goals

- Replacing pipelock. Pipelock keeps doing hostname allowlisting +
  DLP body scanning on the egress-proxy → upstream leg.
- Building our own MITM stack. mitmproxy already does it; we ship
  addons.
- Backward compatibility with `bottle.cred_proxy.routes[]`. Hard
  cutover (see Migration).
- Path-level rules in pipelock. The upstream feature request is a
  separate track (to be filed independently); this PRD doesn't
  depend on it.

## Scope

### In scope

- A new `egress-proxy` sidecar replacing the cred-proxy sidecar:
  mitmproxy image, pinned by digest, with addons in Python.
- Per-bottle CA generation **moves from pipelock to egress-proxy**.
  The agent's trust store is rebuilt against the egress-proxy CA
  (previously pipelock's CA).
- Manifest rename: `bottle.cred_proxy.routes[]` →
  `bottle.egress_proxy.routes[]`. The route shape gains an optional
  `path_allowlist: [<prefix>, ...]` and an optional nested
  `auth: { scheme, token_ref }` block. The presence or absence of
  `auth` is the authenticated vs unauthenticated signal, replacing
  the old `auth_scheme: "none"` pattern.
- The agent's `HTTP_PROXY` / `HTTPS_PROXY` env vars are repointed at
  the egress-proxy (previously pipelock).
- Pipelock retains its sidecar slot and its own DLP + hostname
  scanner. The agent no longer dials it directly; egress-proxy
  uses `HTTPS_PROXY=pipelock` for its outbound leg, matching the
  current cred-proxy → pipelock pattern.
- Existing PRDs that depend on cred-proxy:
  - PRD 0014 (cred-proxy-block remediation): renamed, with its apply
    path retargeted. SIGHUP reload semantics carry over to
    egress-proxy.
  - PRD 0013 (supervise plane): the `cred-proxy-block` MCP tool
    stays; its proposed file format is updated to the new route
    shape.
- Removal of the old cred-proxy code: `claude_bottle/cred_proxy.py`,
  `cred_proxy_server.py`, `backend/docker/cred_proxy.py`,
  `provision/cred_proxy.py`, and `Dockerfile.cred-proxy`. Tests
  updated.

### Out of scope

- Pipelock CA path: pipelock keeps generating its *own* CA for
  any internal TLS termination it still does (e.g., on the
  egress-proxy → upstream leg, if pipelock is the MITM there).
  Whether pipelock needs that CA at all post-cutover is an open
  question (probably not: egress-proxy has already terminated TLS,
  and pipelock is now downstream of a plain-HTTP forward from
  egress-proxy).
- Glob / regex matching in `path_allowlist`. v1 ships prefix
  matching; more expressive forms are a follow-up.
- An MCP tool for the agent to propose `path_allowlist`
  additions. For now the operator manages these via the manifest +
  the existing `routes edit <bottle>` TUI verb (renamed to
  `egress-proxy edit <bottle>`).

## Proposed design

### Topology

```
[Agent] --HTTP_PROXY=egress-proxy-->
    [egress-proxy (mitmproxy)]
        MITM with per-bottle CA
        path_allowlist enforcement
        Authorization header injection
    --HTTPS_PROXY=pipelock-->
    [pipelock]
        hostname allowlist
        DLP body scan
    --egress--> Internet
```

Universal coverage: every HTTP/HTTPS request the agent makes (from
any client that honours the proxy env vars) hits egress-proxy first.
cred-proxy's URL convention (`http://cred-proxy:9099/...`) goes
away; the agent no longer needs to address the proxy by name because
it is already on the default proxy path.
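
A minimal sketch of the environment that puts egress-proxy on the agent's default proxy path. It assumes mitmproxy's default listen port (8080) and a hypothetical `agent_proxy_env` provisioning helper; this PRD specifies neither. Both spellings are exported because curl reads `http_proxy` only in lower case, while other tools may read either form.

```python
import os
import urllib.request


def agent_proxy_env(host: str = "egress-proxy", port: int = 8080) -> dict[str, str]:
    """Hypothetical helper: proxy variables for the agent container.

    The default port is an assumption (mitmproxy listens on 8080 by default).
    """
    # The agent speaks plain HTTP to the proxy; HTTPS goes through CONNECT.
    url = f"http://{host}:{port}"
    env: dict[str, str] = {}
    for name in ("http_proxy", "https_proxy"):
        # curl only honours lower-case http_proxy; export both spellings
        # to cover tools that read either one.
        env[name] = url
        env[name.upper()] = url
    return env


if __name__ == "__main__":
    os.environ.update(agent_proxy_env())
    # Python's stdlib resolves proxies from the same variables, so stdlib
    # HTTP clients route through egress-proxy without a per-tool rewrite.
    print(urllib.request.getproxies_environment())
```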

### Manifest

```yaml
egress_proxy:
  routes:
    # Authenticated route: the `auth` block carries the injection
    # config. path_allowlist is optional.
    - host: "api.github.com"
      auth:
        scheme: "Bearer"
        token_ref: "GH_PAT"
      path_allowlist:
        - "/repos/didericis/"
        - "/users/didericis"
    # Unauthenticated path-filtered route: `auth` omitted
    # entirely (presence/absence of the key is the auth signal).
    - host: "github.com"
      path_allowlist:
        - "/didericis/"
    # Bare-pass route: no auth, no path constraint. Useful when
    # you want a host to skip path filtering but still be
    # DLP-scanned by pipelock on the outbound leg.
    - host: "api.anthropic.com"
```

Route matching is on `host` (previously a `path` prefix). The
hostname determines whether a route applies; `path_allowlist` (if
present) constrains the URL path under that host. The optional
`auth` block carries the credential-injection config:

- Omit `auth` → no Authorization header is injected (replaces the
  earlier draft's `auth_scheme: "none"`).
- `auth.scheme` → one of `Bearer` or `token` (the values cred-proxy
  supports today; this sidesteps the gitea-token quirk).
- `auth.token_ref` → the host env var holding the secret. Same
  semantics as cred-proxy's `TokenRef` field today.

Validation: `auth`, if present, must contain both `scheme` and
`token_ref`. An empty `auth: {}` is an error rather than a synonym
for "no auth"; that is what omission is for.
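
The route shape and validation rules above can be sketched as a small parser over an already-parsed YAML mapping. This is an illustration only. `Route`, `Auth`, `RouteError` and `parse_route` are hypothetical names, YAML loading is left out, and lower-casing `host` is an assumption based on hostnames comparing case-insensitively.

```python
from dataclasses import dataclass
from typing import Optional

# The two schemes cred-proxy supports today, per this PRD.
ALLOWED_SCHEMES = ("Bearer", "token")


class RouteError(ValueError):
    """A malformed egress_proxy route (hypothetical exception name)."""


@dataclass(frozen=True)
class Auth:
    scheme: str
    token_ref: str  # name of the host env var holding the secret


@dataclass(frozen=True)
class Route:
    host: str
    path_allowlist: Optional[tuple[str, ...]] = None  # None: no path constraint
    auth: Optional[Auth] = None                       # None: inject nothing


def parse_route(raw: dict) -> Route:
    """Validate one route mapping from egress_proxy.routes[]."""
    host = raw.get("host")
    if not isinstance(host, str) or not host:
        raise RouteError("route needs a non-empty 'host'")

    allow = raw.get("path_allowlist")
    if allow is not None:
        if not isinstance(allow, list) or not all(
            isinstance(p, str) and p.startswith("/") for p in allow
        ):
            raise RouteError(f"{host}: path_allowlist must be a list of '/...' prefixes")
        allow = tuple(allow)

    auth = None
    if "auth" in raw:
        block = raw["auth"]
        # Presence of the key means "authenticated", so an empty or partial
        # block is an error; omitting `auth` is how to say "no auth".
        if not isinstance(block, dict) or not {"scheme", "token_ref"}.issubset(block):
            raise RouteError(f"{host}: 'auth' needs both 'scheme' and 'token_ref'")
        if block["scheme"] not in ALLOWED_SCHEMES:
            raise RouteError(f"{host}: auth.scheme must be one of {ALLOWED_SCHEMES}")
        auth = Auth(scheme=block["scheme"], token_ref=block["token_ref"])

    return Route(host=host.lower(), path_allowlist=allow, auth=auth)
```

With this sketch, the bare-pass `api.anthropic.com` route parses to a `Route` with neither `path_allowlist` nor `auth` set.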

### mitmproxy addon shape

The egress-proxy ships a small Python addon that:

- Loads the per-bottle routes from `/etc/egress-proxy/routes.yaml`
  (rendered by the prepare step and docker-cp'd in, like cred-proxy's
  current `routes.json`).
- On the `request` hook, matches `flow.request.host` to a route. If
  no route matches, the request is forwarded unchanged (pipelock will
  hostname-gate it). If a route matches and has a `path_allowlist`,
  `flow.request.path` is checked against the prefix list; a miss gets
  a 403 with a clear reason.
- On approved requests, strips the inbound `Authorization` header. If
  the route carries an `auth` block, injects
  `Authorization: <auth.scheme> <token-from-env-named-by-auth.token_ref>`.
  If the route omits `auth`, leaves `Authorization` unset.
- Watches `routes.yaml` (SIGHUP or file mtime) for hot-reload, with
  the same semantics as today's cred-proxy SIGHUP path.
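
The following is a minimal sketch of the request-time logic. It assumes mitmproxy's addon API: the `request(flow)` hook, `flow.request.host` / `.path` / `.headers`, and `http.Response.make` for short-circuit replies. The class and helper names are hypothetical, and routes are passed in rather than loaded from `routes.yaml`. mitmproxy is imported lazily, so the pure helpers run without it installed. The dot-segment check and the 502-on-missing-token behaviour go beyond what this PRD specifies.

```python
"""Sketch of the egress-proxy addon's request hook (hypothetical module)."""
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Auth:
    scheme: str
    token_ref: str


@dataclass(frozen=True)
class Route:
    host: str
    path_allowlist: Optional[tuple[str, ...]] = None
    auth: Optional[Auth] = None


class MissingToken(RuntimeError):
    """The env var named by auth.token_ref is unset or empty."""


def path_allowed(route: Route, path: str) -> bool:
    """v1 semantics: plain prefix match on the path, ignoring the query."""
    if route.path_allowlist is None:
        return True  # route has no path constraint
    bare = path.split("?", 1)[0]
    # Extra guard (not in the PRD): refuse dot-segments so
    # "/didericis/../somebody-else" can't pass a prefix check that the
    # origin server would later normalise away.
    if any(seg in (".", "..") for seg in bare.split("/")):
        return False
    return any(bare.startswith(prefix) for prefix in route.path_allowlist)


def auth_header(route: Route, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Authorization value to inject, or None when the route omits `auth`."""
    if route.auth is None:
        return None
    token = env.get(route.auth.token_ref)
    if not token:
        raise MissingToken(f"{route.auth.token_ref} is not set")
    return f"{route.auth.scheme} {token}"


def _reply(flow, status: int, text: str) -> None:
    from mitmproxy import http  # lazy: only needed to build a response

    # Setting flow.response in the request hook stops the flow going upstream.
    flow.response = http.Response.make(status, text, {"Content-Type": "text/plain"})


class EgressProxy:
    def __init__(self, routes: list[Route]):
        self.routes = {r.host.lower(): r for r in routes}

    def request(self, flow) -> None:
        route = self.routes.get(flow.request.host.lower())
        if route is None:
            return  # no route: forward unchanged; pipelock hostname-gates it
        if not path_allowed(route, flow.request.path):
            _reply(flow, 403, f"egress-proxy: path not allowed for {route.host}\n")
            return
        try:
            value = auth_header(route)
        except MissingToken as exc:
            _reply(flow, 502, f"egress-proxy: {exc}\n")
            return
        # Approved: never forward whatever credential the agent supplied.
        flow.request.headers.pop("Authorization", None)
        if value is not None:
            flow.request.headers["Authorization"] = value


# A real script would end with something like (load_routes is hypothetical):
# addons = [EgressProxy(load_routes("/etc/egress-proxy/routes.yaml"))]
```

Plain prefix matching has a sharp edge under the v1 semantics. The `/users/didericis` prefix from the manifest example also admits `/users/didericis-impostor`, because `startswith` ignores path-segment boundaries. Adding a trailing slash (as in `/repos/didericis/`) avoids this, but it would also stop matching the bare `/users/didericis` endpoint. This sketch also does not handle percent-encoded dot-segments (`%2e%2e`).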

mitmproxy's built-in certificate handling mints per-host leaf certs
on the fly (keyed on the SNI), signed by its CA. The per-bottle CA is
generated at bottle launch (previously pipelock's tls-init step; now
egress-proxy's). The agent's trust store gets the egress-proxy CA
installed in place of pipelock's.
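
Here is a hedged sketch of how the per-bottle CA and the pipelock leg might map onto mitmproxy's own options. mitmproxy reads its CA from `mitmproxy-ca.pem` (key + cert) in its `confdir` and generates one there if none exists; its `upstream:` mode forwards every flow to another proxy. The pipelock port, the paths and both helper names are assumptions. The second helper assumes the public cert is mounted into the agent. It covers runtimes that carry their own CA bundle instead of using the OS trust store.

```python
from pathlib import Path

LISTEN_PORT = 8080                      # assumption: mitmproxy's default port
PIPELOCK_URL = "http://pipelock:8888"   # assumption: pipelock's listen port


def mitmdump_argv(confdir: Path, addon: Path) -> list[str]:
    """Hypothetical command line for the egress-proxy container.

    A per-bottle confdir yields a per-bottle CA: mitmproxy loads
    <confdir>/mitmproxy-ca.pem, or generates it on first start.
    """
    return [
        "mitmdump",
        "--listen-host", "0.0.0.0",
        "--listen-port", str(LISTEN_PORT),
        # Send every flow on to pipelock rather than dialling origins.
        "--mode", f"upstream:{PIPELOCK_URL}",
        "--set", f"confdir={confdir}",
        "-s", str(addon),
    ]


def agent_ca_env(cert: Path) -> dict[str, str]:
    """Point runtimes with their own CA bundles at the egress-proxy CA.

    `cert` is where the public cert (never the key) is mounted in the agent.
    Installing it into the OS store typically covers curl and git, but Node
    (claude-code, npm) and Python `requests` ship their own bundles.
    """
    return {
        "NODE_EXTRA_CA_CERTS": str(cert),  # added to Node's built-in roots
        "REQUESTS_CA_BUNDLE": str(cert),   # replaces certifi's bundle for requests
    }
```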

### Trust-domain concentration

The egress-proxy now holds:

- Every credential the bottle declared in `egress_proxy.routes[]`
  (OAuth tokens, PATs, npm tokens).
- The per-bottle MITM CA private key.

This is a deliberate concentration. With the previous split:

- cred-proxy held tokens.
- pipelock held the CA.

A memory disclosure in cred-proxy exposed tokens; one in pipelock
exposed the CA. Both were bad, but neither exposed everything.

A single disclosure in the new egress-proxy exposes both.
Mitigations:

- mitmproxy runs as an unprivileged user inside the container.
- Tokens live in the container's environment (same as cred-proxy
  today). The CA private key is mounted from the host's stage_dir
  (mode 600).
- Pipelock stays a separate sidecar, so compromising egress-proxy
  doesn't disable pipelock's hostname check + DLP on the outbound
  leg. The attacker can forge certs to the agent but can't easily
  exfiltrate data from inside the agent without pipelock noticing.
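
The "mode 600" mitigation can regress silently. Below is a minimal sketch of a startup guard. It assumes the egress-proxy entrypoint should refuse to run when the mounted key is readable by group or others. `check_ca_key_mode` is a hypothetical name, and hard-failing is an assumed policy.

```python
import os
import stat
from pathlib import Path


def check_ca_key_mode(path: Path) -> None:
    """Refuse to start if the CA private key is accessible beyond its owner."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any group or other permission bit (read, write or execute) fails the check.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(
            f"{path}: CA key has mode {mode:o}; expected owner-only (e.g. 600)"
        )
```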

The user (per the PR #25 discussion) accepted this concentration in
exchange for consolidating into one sidecar. This PRD records it
explicitly.

### Migration — hard cutover

No backward-compat alias for `bottle.cred_proxy.routes[]`:

- At manifest load, a `cred_proxy:` block → `die()` with a clear
  pointer to this PRD and a migration recipe (rename to
  `egress_proxy:`, rename `path` → `host`, drop the agent-side URL
  prefix).
- The `cred_proxy_routes` field on existing dataclasses is removed.
- `Dockerfile.cred-proxy` is deleted.
- `claude_bottle/cred_proxy*.py` is deleted.
- `claude_bottle/backend/docker/cred_proxy*.py` is consolidated into
  `egress_proxy*.py`.
- Provisioner files are renamed.
- PRDs 0010 (cred-proxy) and 0014 (cred-proxy-block remediation) are
  retroactively annotated as "superseded by 0017": old text
  preserved, header updated.
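
Here is a minimal sketch of the hard-fail at manifest load. It assumes the loader sees each bottle as a parsed mapping. `ManifestError` stands in for the existing `die()` path, `reject_legacy_cred_proxy` is a hypothetical name, and the message paraphrases the migration recipe above.

```python
class ManifestError(Exception):
    """Stand-in for the loader's existing die() path (hypothetical)."""


MIGRATION_HINT = """\
bottle.cred_proxy was removed by PRD 0017 (hard cutover, no alias).
To migrate:
  1. rename the `cred_proxy:` block to `egress_proxy:`;
  2. rename each route's `path` to `host` (the upstream hostname);
  3. drop the agent-side URL prefix (http://cred-proxy:9099/...);
     the agent now reaches egress-proxy through HTTPS_PROXY.
"""


def reject_legacy_cred_proxy(bottle: dict) -> None:
    """Fail loudly if a bottle manifest still declares `cred_proxy`."""
    if "cred_proxy" in bottle:
        raise ManifestError(MIGRATION_HINT)
```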

### Implementation chunks

Plausibly three implementation PRs after this PRD lands:

1. **egress-proxy sidecar core.** Dockerfile + mitmproxy addon +
   `routes.yaml` schema + lifecycle (prepare / start / stop / SIGHUP).
2. **Manifest + provisioner migration.** Rename cred-proxy
   throughout the codebase, hard-fail on legacy manifests, and point
   the agent's CA trust at egress-proxy.
3. **PRD 0014 retargeting.** Repoint cred-proxy-block remediation's
   apply path at egress-proxy (SIGHUP, audit log, etc.) and update
   the supervise tool description.
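
For the SIGHUP / mtime reload in chunk 1, here is a minimal mtime-polling sketch. It assumes the addon calls `get()` at the start of each request hook. `MtimeReloader` is a hypothetical name, and the parser is injected; in practice it would be a YAML loader plus route validation. Polling on use avoids wiring a signal handler into mitmproxy's event loop. The trade-off is that filesystems with coarse mtime resolution can miss two writes within one tick.

```python
import os
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class MtimeReloader(Generic[T]):
    """Re-parse a file whenever its modification time changes (sketch)."""

    def __init__(self, path: str, parse: Callable[[str], T]):
        self.path = path
        self.parse = parse
        self._mtime_ns: Optional[int] = None
        self._value: Optional[T] = None

    def get(self) -> T:
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns != self._mtime_ns:
            with open(self.path, encoding="utf-8") as fh:
                # Parse first: if parsing raises, the previous value and
                # mtime are left untouched and the error propagates.
                value = self.parse(fh.read())
            self._value, self._mtime_ns = value, mtime_ns
        return self._value
```

If `get()` raises, the calling hook would want to fail closed (for example, by replying with an error) rather than forward the request unfiltered. That choice is an assumption, not something this PRD specifies.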

## Open questions

- **mitmproxy addon distribution.** Mount the addon Python file
  from stage_dir, or bake it into the image? Mounting is easier to
  hot-reload; baking in is more reproducible. Recommend baking in,
  with `routes.yaml` as the only mounted state.
- **Path match semantics.** Prefix-only for v1 (in keeping with the
  PRD 0017 v1 spirit). Globs / regex are a follow-up if operators
  ask.
- **Inbound `Authorization` stripping.** Pipelock has a similar
  strip in `sensitive_headers`. Confirm there's no double-strip that
  makes a legitimate header set by the agent disappear unexpectedly.
  Egress-proxy should probably be the only stripper for routes that
  match.
- **Pipelock's TLS interception post-cutover.** Today pipelock
  MITMs the cred-proxy → upstream leg using its own CA. After the
  cutover, that leg starts as a CONNECT tunnel from egress-proxy
  (egress-proxy treats pipelock as a plain HTTPS forward proxy).
  Does pipelock still need to MITM? Probably not: egress-proxy has
  already terminated TLS, and body content is already inspected (or
  could be) by egress-proxy's addons. But that means moving DLP from
  pipelock to egress-proxy, which expands egress-proxy's trust
  domain *further*. Punted to the implementation PR to decide.
- **Performance.** Two MITM hops in the worst case (agent ↔
  egress-proxy, and pipelock ↔ upstream if pipelock keeps its
  interception). Measure under realistic load; if it's a problem,
  the likely answer is to disable pipelock's TLS interception and
  let it operate hostname-only.
- **Agent's existing dotfile rewrites.** Today cred-proxy
  provisions `~/.npmrc` with `registry=http://cred-proxy:9099/npm/`,
  `~/.gitconfig` with `insteadOf` rules, etc. After the cutover
  none of those rewrites is strictly necessary for routing
  (`HTTPS_PROXY` catches everything), but they may still be useful
  for canonicalisation (so the agent's `npm install` doesn't
  surprise itself by talking to a different registry). Decide per
  dotfile in the migration PR.

## References

- PRD 0010: cred-proxy (superseded by this PRD).
- PRD 0014: cred-proxy-block remediation (retargeted).
- PRD 0013: supervise plane (tool descriptions updated).
- PR #25: the supervise loop, whose `_apply_pipelock_url`
  docstring flagged the original "path filtering belongs
  somewhere" follow-up.
- mitmproxy (https://mitmproxy.org/): chosen as the egress-proxy
  engine because it's the canonical scriptable MITM forward proxy.