docs: switch cred-proxy to sidecar shape

Make the cred-proxy a per-bottle sidecar container on the bottle's
internal docker network instead of a root-owned process inside the
agent container. The boundary becomes container namespace
separation, matching pipelock and git-gate. Update summary,
problem, goals, in-scope, architecture diagram, components,
existing code touched, external deps, and open questions; add a
"Considered alternatives" section recording the rejected
in-container shape.
2026-05-13 00:40:16 -04:00
parent 7dc3914abc
commit fe9d05664c
+145 -88
@@ -6,15 +6,16 @@
## Summary
Per-bottle reverse proxy that holds API tokens (Anthropic OAuth,
GitHub PAT, Gitea PAT, npm token) in a root-owned process inside
the agent container. The agent (`node`, UID 1000) keeps only URLs
in its environ; the proxy injects the right `Authorization` header
and forwards over TLS. The boundary that makes this meaningful is
the kernel's `ptrace_may_access` check: `node` cannot read root's
`/proc/<pid>/environ` and cannot `ptrace` attach without
`CAP_SYS_PTRACE` / `CAP_PERFMON`, which claude-bottle does not
grant.
Per-bottle sidecar container that holds API tokens (Anthropic
OAuth, GitHub PAT, Gitea PAT, npm token). The agent container
keeps only URLs in its environ; the sidecar injects the right
`Authorization` header and forwards over TLS to the upstream. The
boundary is the container line — PID, mount, and network
namespaces separate the agent's container from the sidecar's, so
from inside the agent the sidecar's processes are not visible in
`/proc`, cannot be `ptrace`'d, and share no memory. Reaching the
sidecar's environ requires escaping the agent container — the same
threshold pipelock and git-gate already rely on.
AWS / SigV4 is explicitly out of scope — it is per-request signing,
not header injection, and does not fit this proxy's shape. If a
@@ -34,11 +35,10 @@ nothing else).
Linux has no per-env-var ACL — once a variable is in a process's
environ, the process and its descendants own it. The credible
boundary is process-level: hold the credential in a different
process the agent cannot read. Default Docker already enforces
that boundary at the kernel line via `ptrace_may_access`, the
same property the (removed) ssh-gate and the current git-gate
rely on.
boundary is container-level: hold the credential in a separate
container the agent cannot reach. Default Docker's namespace
isolation enforces that boundary, the same property pipelock and
git-gate already rely on.
The research note
[`agent-credential-proxy-landscape.md`](../research/agent-credential-proxy-landscape.md)
@@ -55,15 +55,16 @@ supported kinds (anthropic, github, gitea, npm):
1. **No plaintext tokens in the agent's environ.** `printenv` and
`cat /proc/self/environ` from the agent's shell return only
URLs pointing at `127.0.0.1:<PORT>/...`. None of the
URLs pointing at `cred-proxy:<PORT>/...`. None of the
`bottle.tokens[].TokenRef` values appear.
2. **Kernel boundary holds.** From the agent's shell,
`cat /proc/<cred-proxy-pid>/environ` returns `EACCES` and
`gdb -p <cred-proxy-pid>` / `strace -p <cred-proxy-pid>` fails
with `EPERM`.
2. **Container boundary holds.** From the agent's shell, `ps aux`
does not list the cred-proxy process; there is no `/proc/<X>`
entry for it to read. The sidecar's hostname (`cred-proxy`)
resolves only on the bottle's internal network — from a
different bottle or from the host, the name does not resolve.
3. **Anthropic API works.** `claude` makes a successful streaming
tool-use round-trip via `ANTHROPIC_BASE_URL`
`127.0.0.1:<PORT>/anthropic`. SSE chunks arrive without
`cred-proxy:<PORT>/anthropic`. SSE chunks arrive without
buffering; `anthropic-version`, `anthropic-beta`, and
`X-Claude-Code-Session-Id` headers round-trip untouched.
4. **Git push to declared remotes works.** `git push` against a
@@ -117,36 +118,41 @@ supported kinds (anthropic, github, gitea, npm):
`npm`), an optional `Url` (required for `gitea`, defaulted for
the others), and `TokenRef` (the name of a host env var the
CLI resolves at launch time).
- **cred-proxy process.** Runs as root inside the agent
container, listens on `127.0.0.1:<PORT>`. Holds the tokens in
its own environ — never on argv, never written to disk.
- **cred-proxy sidecar.** Runs as its own container on the
bottle's internal docker network with hostname `cred-proxy`,
listening on `0.0.0.0:<PORT>`, reachable only via the internal network.
No host port published. Holds the tokens in the sidecar
container's environ — never on argv, never written to disk.
Per-`Kind` route handler: inject the right header, forward
over TLS, stream the response back to the client without
buffering.
over TLS, stream the response back without buffering.
- **Agent-side rewrites.** Provisioner writes:
- `ANTHROPIC_BASE_URL=http://127.0.0.1:<PORT>/anthropic` to
- `ANTHROPIC_BASE_URL=http://cred-proxy:<PORT>/anthropic` to
the agent's environ
- `~/.npmrc` `registry = http://127.0.0.1:<PORT>/npm/`
- `~/.npmrc` `registry = http://cred-proxy:<PORT>/npm/`
- `~/.gitconfig` `[url …] insteadOf = …` for each declared
`github` / `gitea` upstream
- `~/.config/tea/config.yml` with the proxy URL for each
declared `gitea` entry
- **Process lifecycle.** Container entrypoint launches the proxy
first as root, waits for it to bind, then `exec setpriv …
--reuid=node --regid=node …` for the claude child. Proxy
death is fatal (the container exits); this is also the
PID-1-zombie story.
- **pipelock interop.** Drop `api.anthropic.com` from pipelock's
TLS-MITM list; keep it on the allowlist as a plain HTTPS host
(cred-proxy is the trust endpoint now). Verify pipelock still
lets cred-proxy's HTTPS connections out for the four upstream
hosts.
- **Sidecar lifecycle.** Mirrors `DockerGitGate` /
`DockerPipelockProxy` in shape: `prepare` is host-side and
side-effect-free; `start` does `docker create` + `docker start`
on the bottle's internal network with hostname `cred-proxy`;
`stop` is idempotent `docker rm -f`. Container name:
`claude-bottle-cred-proxy-<slug>`. The agent container starts
after the sidecar is up so DNS resolution succeeds on the
agent's first call.
- **pipelock interop.** cred-proxy's outbound HTTPS still
traverses pipelock — pipelock keeps its egress-allowlist role
for the four upstream hosts. Drop `api.anthropic.com` from
pipelock's TLS-MITM list (cred-proxy is now the trust endpoint
for that host); the host stays on the plain HTTPS allowlist.
- **Plan rendering.** `bottle_plan.py` and the y/N preflight
show: which tokens are configured (kind + ref name, not the
value), the proxy port, the routes the proxy will publish.
- **Drop the existing `CLAUDE_CODE_OAUTH_TOKEN` forward in
`prepare.py`.** Today it lands in the agent's environ; once
this PRD ships, it lands in the proxy's environ instead.
this PRD ships, it lands in the cred-proxy sidecar's environ
instead.
- **Tests.** Integration tests for each of the six success
criteria; unit tests for manifest parsing, route table
generation, header injection.
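
The sidecar lifecycle described above can be sketched as three argv builders. This is a minimal sketch, not the implementation: the helper names (`sidecar_name`, `create_argv`, `start_argv`, `stop_argv`) and the network and image arguments are hypothetical, and the `--network-alias` flag is an assumption added here because Docker's embedded DNS resolves container names and network aliases on user-defined networks.

```python
from __future__ import annotations


def sidecar_name(slug: str) -> str:
    # Container name format taken from the "Sidecar lifecycle" bullet.
    return f"claude-bottle-cred-proxy-{slug}"


def create_argv(slug: str, network: str, image: str,
                token_env: list[str]) -> list[str]:
    """Build the `docker create` argv for the sidecar (hypothetical helper)."""
    argv = [
        "docker", "create",
        "--name", sidecar_name(slug),
        "--network", network,
        "--hostname", "cred-proxy",
        # Assumption: an alias so `cred-proxy` resolves for the agent.
        "--network-alias", "cred-proxy",
    ]
    for key in token_env:
        # `-e KEY` without `=VALUE` tells docker to copy the value from the
        # caller's environment, so the token never appears on argv.
        argv += ["-e", key]
    argv.append(image)
    return argv


def start_argv(slug: str) -> list[str]:
    return ["docker", "start", sidecar_name(slug)]


def stop_argv(slug: str) -> list[str]:
    # Forced removal, matching the `docker rm -f` stop step.
    return ["docker", "rm", "-f", sidecar_name(slug)]
```

No published-port flag is emitted, which matches the "No host port published" requirement; the caller would run these with the token variables present in its own environment.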
@@ -175,22 +181,23 @@ supported kinds (anthropic, github, gitea, npm):
│ GITEA_SERVER_TOKEN, NPM_TOKEN │
│ │ docker run -e KEY (no =VALUE on argv) │
│ ▼ │
│ ┌── Bottle container ────────────────────────────────────────┐ │
│ ┌── per-bottle internal docker network ──────────────────────┐ │
│ │ │ │
│ │ ┌── UID 1000 (node) ─────────────────────────────────┐ │ │
│ │ │ claude --dangerously-skip-permissions │ │ │
│ │ ┌── agent container ─────────────────────────────────┐ │ │
│ │ │ claude as node (UID 1000) │ │ │
│ │ │ --dangerously-skip-permissions │ │ │
│ │ │ environ: URLs only, no plaintext tokens │ │ │
│ │ │ ANTHROPIC_BASE_URL=http://127.0.0.1:PORT/anth.. │ │ │
│ │ │ npm registry → http://127.0.0.1:PORT/npm/ │ │ │
│ │ │ git remote.url → http://127.0.0.1:PORT/... │ │ │
│ │ │ tea --url → http://127.0.0.1:PORT/gitea │ │ │
│ │ │ ANTHROPIC_BASE_URL=http://cred-proxy:PORT/an.. │ │ │
│ │ │ npm registry → http://cred-proxy:PORT/npm/ │ │ │
│ │ │ git insteadOf → http://cred-proxy:PORT/... │ │ │
│ │ │ tea --url → http://cred-proxy:PORT/gite │ │ │
│ │ └────────────┬───────────────────────────────────────┘ │ │
│ │ │ plain HTTP, loopback │ │
│ │ │ HTTP, DNS → cred-proxy │ │
│ │ ▼ │ │
│ │ ┌── UID 0 (root) ────────────────────────────────────┐ │ │
│ │ │ cred-proxy listens 127.0.0.1:PORT │ │ │
│ │ │ tokens live ONLY in this process's environ │ │ │
│ │ │ per-route: inject auth header, forward over TLS │ │ │
│ │ ┌── cred-proxy sidecar ──────────────────────────────┐ │ │
│ │ │ distroless image, no shell, runs as root │ │ │
│ │ │ hostname: cred-proxy listens 0.0.0.0:PORT │ │ │
│ │ │ tokens live ONLY in this container's environ │ │ │
│ │ │ /anthropic → api.anthropic.com Bearer │ │ │
│ │ │ /gh-api → api.github.com Bearer │ │ │
│ │ │ /gh-git → github.com Bearer │ │ │
@@ -200,7 +207,7 @@ supported kinds (anthropic, github, gitea, npm):
│ │ └────────────┬───────────────────────────────────────┘ │ │
│ │ │ HTTPS │ │
│ │ ▼ │ │
│ │ ┌── pipelock (egress allowlist) ─────────────────────┐ │ │
│ │ ┌── pipelock sidecar (egress allowlist) ─────────────┐ │ │
│ │ │ allow: api.anthropic.com, api.github.com, │ │ │
│ │ │ github.com, gitea.dideric.is, │ │ │
│ │ │ registry.npmjs.org │ │ │
@@ -213,35 +220,40 @@ supported kinds (anthropic, github, gitea, npm):
Upstream APIs
Why node@1000 can't just steal the tokens:
┌─────────────────────────────────────────────────────────┐
node tries:
cat /proc/<cred-proxy-pid>/environ → EACCES
ptrace(PTRACE_ATTACH, <cred-proxy-pid>, ...) → EPERM
Kernel's ptrace_may_access rejects: UID mismatch
and no CAP_SYS_PTRACE / CAP_PERFMON in the container.
└─────────────────────────────────────────────────────────┘
Why the agent can't reach the sidecar's environ:
┌──────────────────────────────────────────────────────────────┐
│ Different container = different PID, mount, and network ns.  │
│ The agent's /proc shows only the agent's own processes;      │
│ the cred-proxy PID is not visible: no /proc/<X>/environ      │
│ to read, no PID to ptrace, no shared memory.                 │
│ Reaching the sidecar's environ requires escaping the agent   │
│ container, the same threshold pipelock and git-gate rely     │
│ on. Default Docker isolation is the boundary.                │
└──────────────────────────────────────────────────────────────┘
```
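
The "no `/proc/<X>` entry" claim is something an integration test could check from inside the agent container. A minimal sketch, assuming the proxy's process name (`comm`) is `cred-proxy`; that name and the helper `visible_pids_named` are hypothetical, and a Python-based proxy would show up under the interpreter's name instead.

```python
import os


def visible_pids_named(name: str, proc_root: str = "/proc") -> list[int]:
    """Return the PIDs under proc_root whose `comm` equals name."""
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                comm = fh.read().strip()
        except OSError:
            continue  # process exited mid-scan or entry is unreadable
        if comm == name:
            matches.append(int(entry))
    return sorted(matches)
```

Run in the agent container, an empty result for `visible_pids_named("cred-proxy")` is consistent with the boundary holding; the `proc_root` parameter exists only so the scan can be exercised against a fixture directory.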
### New components
- **`claude_bottle/cred_proxy.py`** (new): abstract `CredProxy`
+ `CredProxyPlan` dataclass. `prepare` is host-side and
side-effect-free on Docker; renders the route table and
resolves `TokenRef`s against host env. Mirrors the existing
`GitGate` / `Pipelock` shape.
side-effect-free; renders the route table and resolves
`TokenRef`s against host env. Mirrors the existing `GitGate` /
`Pipelock` shape.
- **`claude_bottle/backend/docker/cred_proxy.py`** (new):
`DockerCredProxy` concrete subclass. Bakes the proxy binary
into the agent image; `start` writes the route table to a
mode-600 file under `stage_dir` and arranges the entrypoint
so the proxy boots first.
`DockerCredProxy` concrete subclass. `start` does
`docker create` on the bottle's internal network with hostname
`cred-proxy`, copies the route-table file into the container,
then `docker start`. `stop` is idempotent `docker rm -f`.
Container name: `claude-bottle-cred-proxy-<slug>`.
- **`claude_bottle/backend/docker/provision/cred_proxy.py`**
(new): renders `ANTHROPIC_BASE_URL`, `~/.npmrc`,
`~/.gitconfig` `insteadOf` blocks, and `~/.config/tea/config.yml`
into the agent's home for each declared kind.
- **The proxy binary itself.** Bundled into the agent image at
`/usr/local/libexec/cred-proxy`. See "External dependencies"
for the language choice.
into the agent's home for each declared kind — all pointing at
`http://cred-proxy:<PORT>/...`.
- **cred-proxy image.** Minimal base + the proxy binary, no
shell. Pinned by digest, baked at build time. Footprint sized
to match git-gate's image rather than the full agent image.
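
The provisioner's output can be illustrated with a few pure rendering functions. This is a sketch under assumptions: the function names are hypothetical, the `gh-git` route name is borrowed from the architecture diagram, and the exact trailing-slash and path conventions are guesses rather than the PRD's decision.

```python
def render_anthropic_env(port: int) -> dict[str, str]:
    # Only a URL reaches the agent's environ, never a token.
    return {"ANTHROPIC_BASE_URL": f"http://cred-proxy:{port}/anthropic"}


def render_npmrc(port: int) -> str:
    return f"registry = http://cred-proxy:{port}/npm/\n"


def render_git_insteadof(port: int, route: str, upstream: str) -> str:
    """Render one gitconfig `[url "<base>"] insteadOf = <prefix>` block.

    git rewrites any remote URL starting with `upstream` so that it
    starts with the proxy base instead.
    """
    base = f"http://cred-proxy:{port}/{route}/"
    return f'[url "{base}"]\n\tinsteadOf = {upstream}\n'
```

For example, `render_git_insteadof(8080, "gh-git", "https://github.com/")` yields a block that sends `https://github.com/org/repo.git` through the sidecar without the agent ever holding the PAT.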
### Existing code touched
@@ -251,14 +263,17 @@ Why node@1000 can't just steal the tokens:
carry multiple Urls).
- **`claude_bottle/backend/docker/prepare.py`** — delete the
`CLAUDE_BOTTLE_OAUTH_TOKEN` → `CLAUDE_CODE_OAUTH_TOKEN` branch
in the agent's forwarded env. The OAuth token now flows to
the proxy's environ via the cred-proxy lifecycle.
in the agent's forwarded env. The OAuth token is forwarded
into the cred-proxy sidecar's environ at sidecar `docker create`
time instead.
- **`claude_bottle/backend/docker/backend.py`** — instantiate
`DockerCredProxy`; thread its `prepare` / `start` / `stop`
`DockerCredProxy` alongside `DockerPipelockProxy` and
`DockerGitGate`; thread its `prepare` / `start` / `stop`
through `resolve_plan` / `launch`.
- **`claude_bottle/backend/docker/launch.py`** — add cred-proxy
start before the cred-proxy provisioner runs (provisioner
writes URLs that reference the proxy port, so it must be up).
start/stop to the `ExitStack` alongside pipelock and git-gate;
the sidecar must be up before the agent container starts so
DNS resolution for `cred-proxy` succeeds on first contact.
- **`claude_bottle/backend/docker/bottle_plan.py`** — new
`CredProxyPlan` field; preflight shows kind + ref name +
port + route table.
@@ -330,14 +345,17 @@ The proxy binary. Two real options:
no new pip packages. Matches CLAUDE.md's "bash-first, low-deps"
posture. SSE pass-through is fiddly but doable.
- **Go single binary** — cleaner SSE story, smaller runtime,
one static binary baked into the image. New build dependency.
one static binary in a scratch/distroless image. New build
dependency.
Default: Python, baked into the agent image. Reconsider in the
implementation PR if SSE behavior is troublesome under load.
Default: Python in a minimal `python:3.X-slim` image (or alpine
if we want smaller). Reconsider in the implementation PR if SSE
behavior is troublesome under load.
No new Python packages. No DB. No admin API. The proxy's
configuration is a single mode-600 JSON file passed in via
`/run/cred-proxy/routes.json`.
configuration is a single mode-600 JSON file copied into the
sidecar at `docker create` time and read by the proxy at startup
from `/run/cred-proxy/routes.json`.
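
To make the route-table idea concrete, here is one possible shape for `routes.json` and the per-request rewrite it drives. Everything about the schema (`prefix`, `upstream`, `scheme`, `token_env`) and the function names is an assumption for illustration; the PRD does not define the file format. The `Bearer` scheme and upstream hosts follow the architecture diagram.

```python
import json

# Hypothetical routes.json content; token values are NOT stored here,
# only the name of the environ variable that holds each one.
ROUTES_JSON = """
{"routes": [
  {"prefix": "/anthropic", "upstream": "https://api.anthropic.com",
   "scheme": "Bearer", "token_env": "CLAUDE_CODE_OAUTH_TOKEN"},
  {"prefix": "/gh-api", "upstream": "https://api.github.com",
   "scheme": "Bearer", "token_env": "GITHUB_TOKEN"}
]}
"""


def load_routes(text: str) -> list[dict]:
    routes = json.loads(text)["routes"]
    # Longest prefix first so overlapping prefixes match deterministically.
    return sorted(routes, key=lambda r: len(r["prefix"]), reverse=True)


def rewrite(routes: list[dict], path: str, headers: dict[str, str],
            environ: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Map an incoming request to (upstream URL, outgoing headers)."""
    for route in routes:
        prefix = route["prefix"]
        if path == prefix or path.startswith(prefix + "/"):
            # Drop any client-supplied Authorization and Host; pass the
            # rest (anthropic-version, anthropic-beta, ...) through untouched.
            out = {k: v for k, v in headers.items()
                   if k.lower() not in ("authorization", "host")}
            token = environ[route["token_env"]]
            out["Authorization"] = f'{route["scheme"]} {token}'
            return route["upstream"] + path[len(prefix):], out
    raise LookupError(f"no route for {path}")
```

Keeping only the variable name in the file means the mode-600 file is defense in depth rather than the secret store; the token itself stays in the sidecar's environ, as the components section requires.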
## Future work
@@ -353,12 +371,51 @@ configuration is a single mode-600 JSON file passed in via
PATs with TTL), have the proxy mint a fresh per-session
child credential from a long-lived parent.
- **Smolmachines colocation.** Same packing question as
pipelock / git-gate; the cred-proxy can sit inside the agent
VM (current shape) or in a separate VM (stricter isolation,
per-bottle TCP hop). Backend decision, not a manifest decision.
pipelock / git-gate; under a future microVM backend the
cred-proxy could share a VM with the agent (today's per-bottle
network gives it its own container, not its own VM) or sit in
its own VM (stricter isolation, an extra TCP hop). Backend
decision, not a manifest decision.
- **More kinds.** PyPI, Bun, cargo, Docker Hub. The routing
pattern generalizes; add as needed.
## Considered alternatives
### In-container proxy (root inside the agent container)
Run cred-proxy as PID 1 of the agent container, listening on
`127.0.0.1:<PORT>`, with claude exec'd as `node` (UID 1000) only
after the proxy is bound. The boundary in that shape is the
kernel's cross-UID `ptrace_may_access` check — `node` cannot read
root's `/proc/<pid>/environ` and cannot `ptrace` attach.
Pros: one less container per bottle; slightly faster bottle
startup; no extra docker create/start/stop dance.
Rejected because:
- **Weaker isolation.** The boundary collapses to UID separation
alone. Any container-root compromise inside the agent (setuid
bug in the image, accidentally mounted docker socket, a kernel
CVE, accidental `--privileged`) reads the proxy's environ via
`/proc/<pid>/environ`. The sidecar's namespace separation
cannot be bypassed from inside the agent container without a
container escape.
- **Inconsistent with the existing topology.** pipelock and
git-gate are already sidecars on the bottle's internal network.
cred-proxy slots into the same shape and reuses the same
lifecycle abstractions (`BottleBackend.prepare/start/stop`,
`ExitStack` ordering, plan rendering).
- **Coupled to the agent image.** The proxy binary, its
entrypoint, and its priv-drop logic would all live in the
agent's Dockerfile. A sidecar image evolves independently —
agents can change base, language, or tooling without touching
the proxy.
- **PID-1 babysitting.** The "proxy supervises, then `exec
setpriv → node`" entrypoint introduces a class of issues
(zombie reaping, signal forwarding, exit-code propagation) that
the sidecar shape avoids.
## Open questions
- **Field name.** `bottle.tokens` is the working name. The
@@ -368,11 +425,11 @@ configuration is a single mode-600 JSON file passed in via
`bottle.cred_proxy`. Default: `bottle.tokens`.
- **Python vs Go for the proxy.** Default: Python, revisit
during implementation if SSE pass-through is unreliable.
- **Process inside the agent container vs sidecar container.**
v1: inside (simpler lifecycle, no extra container; ptrace
boundary is enough). The sidecar option becomes attractive
only if we want a network-layer split between proxy and agent
on top of the UID split.
- **Sidecar image base.** Distroless (smallest, no shell — hardest
to debug), Python slim (debuggable, larger), or scratch + a
statically-linked Go binary (smallest if Go). Default: whatever
fits the chosen language with the smallest non-shell base;
revisit if debuggability bites during implementation.
- **Belt-and-braces on outbound telemetry.** Set
`CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1` and
`DISABLE_ERROR_REPORTING=1` in the agent's environ by