From fe9d05664c43d3876ace894e28d197c87dedd355 Mon Sep 17 00:00:00 2001
From: didericis
Date: Wed, 13 May 2026 00:40:16 -0400
Subject: [PATCH] docs: switch cred-proxy to sidecar shape

Make the cred-proxy a per-bottle sidecar container on the bottle's
internal docker network instead of a root-owned process inside the
agent container. The boundary becomes container namespace separation,
matching pipelock and git-gate.

Update summary, problem, goals, in-scope, architecture diagram,
components, existing code touched, external deps, and open questions;
add a "Considered alternatives" section recording the rejected
in-container shape.
---
 docs/prds/0010-cred-proxy.md | 233 ++++++++++++++++++++++-------------
 1 file changed, 145 insertions(+), 88 deletions(-)

diff --git a/docs/prds/0010-cred-proxy.md b/docs/prds/0010-cred-proxy.md
index 8f1756b..760c2a8 100644
--- a/docs/prds/0010-cred-proxy.md
+++ b/docs/prds/0010-cred-proxy.md
@@ -6,15 +6,16 @@
 
 ## Summary
 
-Per-bottle reverse proxy that holds API tokens (Anthropic OAuth,
-GitHub PAT, Gitea PAT, npm token) in a root-owned process inside
-the agent container. The agent (`node`, UID 1000) keeps only URLs
-in its environ; the proxy injects the right `Authorization` header
-and forwards over TLS. The boundary that makes this meaningful is
-the kernel's `ptrace_may_access` check: `node` cannot read root's
-`/proc//environ` and cannot `ptrace` attach without
-`CAP_SYS_PTRACE` / `CAP_PERFMON`, which claude-bottle does not
-grant.
+Per-bottle sidecar container that holds API tokens (Anthropic
+OAuth, GitHub PAT, Gitea PAT, npm token). The agent container
+keeps only URLs in its environ; the sidecar injects the right
+`Authorization` header and forwards over TLS to the upstream. The
+boundary is the container line — PID, mount, and network
+namespaces separate the agent's container from the sidecar's, so
+from inside the agent the sidecar's processes are not visible in
+`/proc`, cannot be `ptrace`'d, and share no memory. Reaching the
+sidecar's environ requires escaping the agent container — the same
+threshold pipelock and git-gate already rely on.
 
 AWS / SigV4 is explicitly out of scope — it is per-request signing,
 not header injection, and does not fit this proxy's shape. If a
@@ -34,11 +35,10 @@ nothing else).
 
 Linux has no per-env-var ACL — once a variable is in a process's
 environ, the process and its descendants own it. The credible
-boundary is process-level: hold the credential in a different
-process the agent cannot read. Default Docker already enforces
-that boundary at the kernel line via `ptrace_may_access`, the
-same property the (removed) ssh-gate and the current git-gate
-rely on.
+boundary is container-level: hold the credential in a separate
+container the agent cannot reach. Default Docker's namespace
+isolation enforces that — the same property pipelock and git-gate
+already rely on.
 
 The research note
 [`agent-credential-proxy-landscape.md`](../research/agent-credential-proxy-landscape.md)
@@ -55,15 +55,16 @@ supported kinds (anthropic, github, gitea, npm):
 
 1. **No plaintext tokens in the agent's environ.** `printenv` and
    `cat /proc/self/environ` from the agent's shell return only
-   URLs pointing at `127.0.0.1:/...`. None of the
+   URLs pointing at `cred-proxy:/...`. None of the
    `bottle.tokens[].TokenRef` values appear.
-2. **Kernel boundary holds.** From the agent's shell,
-   `cat /proc//environ` returns `EACCES` and
-   `gdb -p ` / `strace -p ` fails
-   with `EPERM`.
+2. **Container boundary holds.** From the agent's shell, `ps aux`
+   does not list the cred-proxy process; there is no `/proc/`
+   entry for it to read. The sidecar's hostname (`cred-proxy`)
+   resolves only on the bottle's internal network — from a
+   different bottle or from the host, the name does not resolve.
 3. **Anthropic API works.** `claude` makes a successful streaming
    tool-use round-trip via `ANTHROPIC_BASE_URL` →
-   `127.0.0.1:/anthropic`. SSE chunks arrive without
+   `cred-proxy:/anthropic`. SSE chunks arrive without
    buffering; `anthropic-version`, `anthropic-beta`, and
    `X-Claude-Code-Session-Id` headers round-trip untouched.
 4. **Git push to declared remotes works.** `git push` against a
@@ -117,36 +118,41 @@ supported kinds (anthropic, github, gitea, npm):
   `npm`), an optional `Url` (required for `gitea`, defaulted for
   the others), and `TokenRef` (the name of a host env var the
   CLI resolves at launch time).
-- **cred-proxy process.** Runs as root inside the agent
-  container, listens on `127.0.0.1:`. Holds the tokens in
-  its own environ — never on argv, never written to disk.
+- **cred-proxy sidecar.** Runs as its own container on the
+  bottle's internal docker network with hostname `cred-proxy`,
+  listening on `0.0.0.0:` bound to the internal interface.
+  No host port published. Holds the tokens in the sidecar
+  container's environ — never on argv, never written to disk.
   Per-`Kind` route handler: inject the right header, forward
-  over TLS, stream the response back to the client without
-  buffering.
+  over TLS, stream the response back without buffering.
 - **Agent-side rewrites.** Provisioner writes:
-  - `ANTHROPIC_BASE_URL=http://127.0.0.1:/anthropic` to
+  - `ANTHROPIC_BASE_URL=http://cred-proxy:/anthropic` to
     the agent's environ
-  - `~/.npmrc` `registry = http://127.0.0.1:/npm/`
+  - `~/.npmrc` `registry = http://cred-proxy:/npm/`
   - `~/.gitconfig` `[url …] insteadOf = …` for each declared
     `github` / `gitea` upstream
   - `~/.config/tea/config.yml` with the proxy URL for each
     declared `gitea` entry
-- **Process lifecycle.** Container entrypoint launches the proxy
-  first as root, waits for it to bind, then `exec setpriv …
-  --reuid=node --regid=node …` for the claude child. Proxy
-  death is fatal (the container exits); this is also the
-  PID-1-zombie story.
-- **pipelock interop.** Drop `api.anthropic.com` from pipelock's
-  TLS-MITM list; keep it on the allowlist as a plain HTTPS host
-  (cred-proxy is the trust endpoint now). Verify pipelock still
-  lets cred-proxy's HTTPS connections out for the four upstream
-  hosts.
+- **Sidecar lifecycle.** Mirrors `DockerGitGate` /
+  `DockerPipelockProxy` in shape: `prepare` is host-side and
+  side-effect-free; `start` does `docker create` + `docker start`
+  on the bottle's internal network with hostname `cred-proxy`;
+  `stop` is idempotent `docker rm -f`. Container name:
+  `claude-bottle-cred-proxy-`. The agent container starts
+  after the sidecar is up so DNS resolution succeeds on the
+  agent's first call.
+- **pipelock interop.** cred-proxy's outbound HTTPS still
+  traverses pipelock — pipelock keeps its egress-allowlist role
+  for the four upstream hosts. Drop `api.anthropic.com` from
+  pipelock's TLS-MITM list (cred-proxy is now the trust endpoint
+  for that host); the host stays on the plain HTTPS allowlist.
 - **Plan rendering.** `bottle_plan.py` and the y/N preflight
   show: which tokens are configured (kind + ref name, not the
   value), the proxy port, the routes the proxy will publish.
 - **Drop the existing `CLAUDE_CODE_OAUTH_TOKEN` forward in
   `prepare.py`.** Today it lands in the agent's environ; once
-  this PRD ships, it lands in the proxy's environ instead.
+  this PRD ships, it lands in the cred-proxy sidecar's environ
+  instead.
 - **Tests.** Integration tests for each of the six success
   criteria; unit tests for manifest parsing, route table
   generation, header injection.
@@ -175,22 +181,23 @@ supported kinds (anthropic, github, gitea, npm):
 │    GITEA_SERVER_TOKEN, NPM_TOKEN                                 │
 │            │ docker run -e KEY (no =VALUE on argv)               │
 │            ▼                                                     │
-│  ┌── Bottle container ────────────────────────────────────────┐  │
+│  ┌── per-bottle internal docker network ──────────────────────┐  │
 │  │                                                            │  │
-│  │  ┌── UID 1000 (node) ─────────────────────────────────┐    │  │
-│  │  │  claude --dangerously-skip-permissions             │    │  │
+│  │  ┌── agent container ─────────────────────────────────┐    │  │
+│  │  │  claude as node (UID 1000)                         │    │  │
+│  │  │    --dangerously-skip-permissions                  │    │  │
 │  │  │  environ: URLs only, no plaintext tokens           │    │  │
-│  │  │   ANTHROPIC_BASE_URL=http://127.0.0.1:PORT/anth..  │    │  │
-│  │  │   npm registry   → http://127.0.0.1:PORT/npm/      │    │  │
-│  │  │   git remote.url → http://127.0.0.1:PORT/...       │    │  │
-│  │  │   tea --url      → http://127.0.0.1:PORT/gitea     │    │  │
+│  │  │   ANTHROPIC_BASE_URL=http://cred-proxy:PORT/an..   │    │  │
+│  │  │   npm registry   → http://cred-proxy:PORT/npm/     │    │  │
+│  │  │   git insteadOf  → http://cred-proxy:PORT/...      │    │  │
+│  │  │   tea --url      → http://cred-proxy:PORT/gite     │    │  │
 │  │  └────────────┬───────────────────────────────────────┘    │  │
-│  │               │ plain HTTP, loopback                       │  │
+│  │               │ HTTP, DNS → cred-proxy                     │  │
 │  │               ▼                                            │  │
-│  │  ┌── UID 0 (root) ────────────────────────────────────┐    │  │
-│  │  │  cred-proxy  listens 127.0.0.1:PORT                │    │  │
-│  │  │  tokens live ONLY in this process's environ        │    │  │
-│  │  │  per-route: inject auth header, forward over TLS   │    │  │
+│  │  ┌── cred-proxy sidecar ──────────────────────────────┐    │  │
+│  │  │  distroless image, no shell, runs as root          │    │  │
+│  │  │  hostname: cred-proxy  listens 0.0.0.0:PORT        │    │  │
+│  │  │  tokens live ONLY in this container's environ      │    │  │
 │  │  │   /anthropic → api.anthropic.com  Bearer           │    │  │
 │  │  │   /gh-api    → api.github.com     Bearer           │    │  │
 │  │  │   /gh-git    → github.com         Bearer           │    │  │
@@ -200,7 +207,7 @@ supported kinds (anthropic, github, gitea, npm):
 │  │  └────────────┬───────────────────────────────────────┘    │  │
 │  │               │ HTTPS                                      │  │
 │  │               ▼                                            │  │
-│  │  ┌── pipelock (egress allowlist) ─────────────────────┐    │  │
+│  │  ┌── pipelock sidecar (egress allowlist) ─────────────┐    │  │
 │  │  │  allow: api.anthropic.com, api.github.com,         │    │  │
 │  │  │         github.com, gitea.dideric.is,              │    │  │
 │  │  │         registry.npmjs.org                         │    │  │
@@ -213,35 +220,40 @@ supported kinds (anthropic, github, gitea, npm):
 
              Upstream APIs
 
-Why node@1000 can't just steal the tokens:
-  ┌─────────────────────────────────────────────────────────┐
-  │  node tries:                                            │
-  │    cat /proc//environ           → EACCES                │
-  │    ptrace(PTRACE_ATTACH, , ...) → EPERM                 │
-  │  Kernel's ptrace_may_access rejects: UID mismatch       │
-  │  and no CAP_SYS_PTRACE / CAP_PERFMON in the container.  │
-  └─────────────────────────────────────────────────────────┘
+Why the agent can't reach the sidecar's environ:
+  ┌───────────────────────────────────────────────────────────────┐
+  │  Different container = different PID, mount, and network ns.  │
+  │  The agent's /proc shows only the agent's own processes;      │
+  │  the cred-proxy PID is not visible — no /proc//environ        │
+  │  to read, no PID to ptrace, no shared memory.                 │
+  │                                                               │
+  │  Reaching the sidecar's environ requires escaping the agent  │
+  │  container — the same threshold pipelock and git-gate rely    │
+  │  on. Default Docker isolation is the boundary.                │
+  └───────────────────────────────────────────────────────────────┘
 ```
 
 ### New components
 
 - **`claude_bottle/cred_proxy.py`** (new): abstract `CredProxy`
   + `CredProxyPlan` dataclass. `prepare` is host-side and
-  side-effect-free on Docker; renders the route table and
-  resolves `TokenRef`s against host env. Mirrors the existing
-  `GitGate` / `Pipelock` shape.
+  side-effect-free; renders the route table and resolves
+  `TokenRef`s against host env. Mirrors the existing `GitGate` /
+  `Pipelock` shape.
 - **`claude_bottle/backend/docker/cred_proxy.py`** (new):
-  `DockerCredProxy` concrete subclass. Bakes the proxy binary
-  into the agent image; `start` writes the route table to a
-  mode-600 file under `stage_dir` and arranges the entrypoint
-  so the proxy boots first.
+  `DockerCredProxy` concrete subclass. `start` does
+  `docker create` on the bottle's internal network with hostname
+  `cred-proxy`, copies the route-table file into the container,
+  then `docker start`. `stop` is idempotent `docker rm -f`.
+  Container name: `claude-bottle-cred-proxy-`.
 - **`claude_bottle/backend/docker/provision/cred_proxy.py`**
   (new): renders `ANTHROPIC_BASE_URL`, `~/.npmrc`,
   `~/.gitconfig` `insteadOf` blocks, and `~/.config/tea/config.yml`
-  into the agent's home for each declared kind.
-- **The proxy binary itself.** Bundled into the agent image at
-  `/usr/local/libexec/cred-proxy`. See "External dependencies"
-  for the language choice.
+  into the agent's home for each declared kind — all pointing at
+  `http://cred-proxy:/...`.
+- **cred-proxy image.** Minimal base + the proxy binary, no
+  shell. Pinned by digest, baked at build time. Footprint sized
+  to match git-gate's image rather than the full agent image.
 
 ### Existing code touched
 
@@ -251,14 +263,17 @@ Why node@1000 can't just steal the tokens:
   carry multiple Urls).
 - **`claude_bottle/backend/docker/prepare.py`** — delete the
   `CLAUDE_BOTTLE_OAUTH_TOKEN` → `CLAUDE_CODE_OAUTH_TOKEN` branch
-  in the agent's forwarded env. The OAuth token now flows to
-  the proxy's environ via the cred-proxy lifecycle.
+  in the agent's forwarded env. The OAuth token is forwarded
+  into the cred-proxy sidecar's environ at sidecar `docker create`
+  time instead.
 - **`claude_bottle/backend/docker/backend.py`** — instantiate
-  `DockerCredProxy`; thread its `prepare` / `start` / `stop`
+  `DockerCredProxy` alongside `DockerPipelockProxy` and
+  `DockerGitGate`; thread its `prepare` / `start` / `stop`
   through `resolve_plan` / `launch`.
 - **`claude_bottle/backend/docker/launch.py`** — add cred-proxy
-  start before the cred-proxy provisioner runs (provisioner
-  writes URLs that reference the proxy port, so it must be up).
+  start/stop to the `ExitStack` alongside pipelock and git-gate;
+  the sidecar must be up before the agent container starts so
+  DNS resolution for `cred-proxy` succeeds on first contact.
 - **`claude_bottle/backend/docker/bottle_plan.py`** — new
   `CredProxyPlan` field; preflight shows kind + ref name + port
   + route table.
@@ -330,14 +345,17 @@ The proxy binary. Two real options:
   no new pip packages. Matches CLAUDE.md's "bash-first, low-deps"
   posture. SSE pass-through is fiddly but doable.
 - **Go single binary** — cleaner SSE story, smaller runtime,
-  one static binary baked into the image. New build dependency.
+  one static binary in a scratch/distroless image. New build
+  dependency.
 
-Default: Python, baked into the agent image. Reconsider in the
-implementation PR if SSE behavior is troublesome under load.
+Default: Python in a minimal `python:3.X-slim` image (or alpine
+if we want smaller). Reconsider in the implementation PR if SSE
+behavior is troublesome under load.
 
 No new Python packages. No DB. No admin API. The proxy's
-configuration is a single mode-600 JSON file passed in via
-`/run/cred-proxy/routes.json`.
+configuration is a single mode-600 JSON file copied into the
+sidecar at `docker create` time and read by the proxy at startup
+from `/run/cred-proxy/routes.json`.
 
 ## Future work
 
@@ -353,12 +371,51 @@ configuration is a single mode-600 JSON file passed in via
   PATs with TTL), have the proxy mint a fresh per-session child
   credential from a long-lived parent.
 - **Smolmachines colocation.** Same packing question as
-  pipelock / git-gate; the cred-proxy can sit inside the agent
-  VM (current shape) or in a separate VM (stricter isolation,
-  per-bottle TCP hop). Backend decision, not a manifest decision.
+  pipelock / git-gate; under a future microVM backend the
+  cred-proxy could share a VM with the agent (today's per-bottle
+  network gives it its own container, not its own VM) or sit in
+  its own VM (stricter isolation, an extra TCP hop). Backend
+  decision, not a manifest decision.
 - **More kinds.** PyPI, Bun, cargo, Docker Hub. The routing
   pattern generalizes; add as needed.
 
+## Considered alternatives
+
+### In-container proxy (root inside the agent container)
+
+Run cred-proxy as PID 1 of the agent container, listening on
+`127.0.0.1:`, with claude exec'd as `node` (UID 1000) only
+after the proxy is bound. The boundary in that shape is the
+kernel's cross-UID `ptrace_may_access` check — `node` cannot read
+root's `/proc//environ` and cannot `ptrace` attach.
+
+Pros: one less container per bottle; slightly faster bottle
+startup; no extra docker create/start/stop dance.
+
+Rejected because:
+
+- **Weaker isolation.** The boundary collapses to UID separation
+  alone. Any container-root compromise inside the agent (setuid
+  bug in the image, accidentally mounted docker socket, a kernel
+  CVE, accidental `--privileged`) reads the proxy's environ via
+  `/proc//environ`. The sidecar's namespace separation
+  cannot be bypassed from inside the agent container without a
+  container escape.
+- **Inconsistent with the existing topology.** pipelock and
+  git-gate are already sidecars on the bottle's internal network.
+  cred-proxy slots into the same shape and reuses the same
+  lifecycle abstractions (`BottleBackend.prepare/start/stop`,
+  `ExitStack` ordering, plan rendering).
+- **Coupled to the agent image.** The proxy binary, its
+  entrypoint, and its priv-drop logic would all live in the
+  agent's Dockerfile. A sidecar image evolves independently —
+  agents can change base, language, or tooling without touching
+  the proxy.
+- **PID-1 babysitting.** The "proxy supervises, then `exec
+  setpriv → node`" entrypoint introduces a class of issues
+  (zombie reaping, signal forwarding, exit-code propagation) that
+  the sidecar shape avoids.
+
 ## Open questions
 
 - **Field name.** `bottle.tokens` is the working name. The
@@ -368,11 +425,11 @@ configuration is a single mode-600 JSON file passed in via
   `bottle.cred_proxy`. Default: `bottle.tokens`.
 - **Python vs Go for the proxy.** Default: Python, revisit
   during implementation if SSE pass-through is unreliable.
-- **Process inside the agent container vs sidecar container.**
-  v1: inside (simpler lifecycle, no extra container; ptrace
-  boundary is enough). The sidecar option becomes attractive
-  only if we want a network-layer split between proxy and agent
-  on top of the UID split.
+- **Sidecar image base.** Distroless (smallest, no shell — hardest
+  to debug), Python slim (debuggable, larger), or scratch + a
+  statically-linked Go binary (smallest if Go). Default: whatever
+  fits the chosen language with the smallest non-shell base;
+  revisit if debuggability bites during implementation.
 - **Belt-and-braces on outbound telemetry.** Set
   `CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1` and
   `DISABLE_ERROR_REPORTING=1` in the agent's environ by