From 2a687449d4d062281a1834bfc2a6216d35ffca62 Mon Sep 17 00:00:00 2001
From: didericis
Date: Wed, 13 May 2026 00:18:55 -0400
Subject: [PATCH] docs: add PRD 0010 for credential proxy

Per-bottle reverse proxy that holds API tokens (Anthropic OAuth, GitHub
PAT, Gitea PAT, npm) in a root-owned process; agent gets only URLs in
its environ. AWS / SigV4 explicitly out of scope.
---
 docs/prds/0010-cred-proxy.md | 420 +++++++++++++++++++++++++++++++++++
 1 file changed, 420 insertions(+)
 create mode 100644 docs/prds/0010-cred-proxy.md

diff --git a/docs/prds/0010-cred-proxy.md b/docs/prds/0010-cred-proxy.md
new file mode 100644
index 0000000..65f153f
--- /dev/null
+++ b/docs/prds/0010-cred-proxy.md
@@ -0,0 +1,420 @@
+# PRD 0010: Credential proxy for agent-bound API tokens
+
+- **Status:** Draft
+- **Author:** didericis
+- **Created:** 2026-05-13
+
+## Summary
+
+Per-bottle reverse proxy that holds API tokens (Anthropic OAuth,
+GitHub PAT, Gitea PAT, npm token) in a root-owned process inside
+the agent container. The agent (`node`, UID 1000) keeps only URLs
+in its environ; the proxy injects the right `Authorization` header
+and forwards over TLS. The boundary that makes this meaningful is
+the kernel's `ptrace_may_access` check: `node` cannot read root's
+`/proc//environ` and cannot `ptrace` attach without
+`CAP_SYS_PTRACE` / `CAP_PERFMON`, which claude-bottle does not
+grant.
+
+AWS / SigV4 is explicitly out of scope — it is per-request signing,
+not header injection, and does not fit this proxy's shape. If a
+bottle needs AWS credentials later, that lives in a separate PRD.
+
+## Problem
+
+Today `CLAUDE_CODE_OAUTH_TOKEN` (and any `bottle.env` secrets such
+as a Gitea PAT, GitHub PAT, or npm token) gets `docker run -e`'d
+straight into the agent's environ. Inside the bottle the agent
+runs as `node` with `--dangerously-skip-permissions`; its Bash
+tool can do `printenv`, `cat /proc/self/environ`, or
+`node -e 'console.log(process.env)'` and capture every value into
+the conversation. From there a prompt-injected or hijacked agent
+can exfil over any allowed egress (api.anthropic.com itself if
+nothing else).
+
+Linux has no per-env-var ACL — once a variable is in a process's
+environ, the process and its descendants own it. The credible
+boundary is process-level: hold the credential in a different
+process the agent cannot read. Default Docker already enforces
+that boundary at the kernel line via `ptrace_may_access`, the
+same property the (removed) ssh-gate and the current git-gate
+rely on.
+
+The research note
+[`agent-credential-proxy-landscape.md`](../research/agent-credential-proxy-landscape.md)
+surveys the existing tools and concludes that a small
+claude-bottle-specific reverse proxy is less work and less risk
+than either adopting nono (alpha, unaudited) or Infisical Agent
+Vault (TLS-MITM topology that doubles up on pipelock's CA stack).
+This PRD is the build.
+
+## Goals / Success Criteria
+
+Each test runs inside a bottle whose manifest declares the four
+supported kinds (anthropic, github, gitea, npm):
+
+1. **No plaintext tokens in the agent's environ.** `printenv` and
+   `cat /proc/self/environ` from the agent's shell return only
+   URLs pointing at `127.0.0.1:/...`. None of the
+   `bottle.tokens[].TokenRef` values appear.
+2. **Kernel boundary holds.** From the agent's shell,
+   `cat /proc//environ` returns `EACCES` and
+   `gdb -p ` / `strace -p ` fails
+   with `EPERM`.
+3. **Anthropic API works.** `claude` makes a successful streaming
+   tool-use round-trip via `ANTHROPIC_BASE_URL` →
+   `127.0.0.1:/anthropic`. SSE chunks arrive without
+   buffering; `anthropic-version`, `anthropic-beta`, and
+   `X-Claude-Code-Session-Id` headers round-trip untouched.
+4. **Git push to declared remotes works.** `git push` against a
+   `bottle.tokens[].Kind: github` or `gitea` upstream succeeds;
+   the upstream sees the proxy's token, not the agent's.
+5. **npm install works.** `npm install `
+   succeeds against the registry pointed at the proxy. A scoped
+   install that requires the token (e.g. against a private
+   registry) also succeeds.
+6. **Wrong token rejected at the source, not silently swapped.**
+   If the agent tries to send its own `Authorization: …` header,
+   the proxy strips it and replaces it with the configured one. A
+   manifest token revoked at the upstream produces a 401 to the
+   agent, not a 5xx.
+
+## Non-goals
+
+- **AWS / SigV4.** Per-request signing is a different shape; a
+  bearer-injecting proxy doesn't help. Hold for a future PRD
+  (likely an IMDS emulator sidecar handing out short-lived STS
+  credentials).
+- **DB-backed credential store.** Flat env / mode-600 file only.
+  The LiteLLM CVE-2026-42208 incident is the cautionary tale:
+  any DB-backed credential gateway is itself a high-value attack
+  target.
+- **Generic LLM-gateway features.** No cost tracking, no
+  fallbacks, no virtual keys, no multi-tenant routing, no usage
+  metering. The proxy is a credential-injection trust endpoint,
+  not a gateway.
+- **Subsuming pipelock.** pipelock keeps its egress-allowlist
+  role. It drops the `api.anthropic.com` TLS-MITM job because
+  cred-proxy is now the trust endpoint for that host; everything
+  else pipelock does stays.
+- **TLS interception inside the bottle.** The agent talks plain
+  HTTP to loopback; cred-proxy speaks real HTTPS outbound. No
+  container-local CA, no `golang/go#28866` loopback workaround.
+- **Cross-bottle credential sharing.** One proxy per bottle, same
+  one-sidecar-per-agent posture as pipelock and git-gate.
+- **`claude --bare` mode.** Reads only `ANTHROPIC_API_KEY`, not
+  the OAuth token. Not in claude-bottle's flow today.
+- **MCP-server tokens, package-installer tokens for languages
+  beyond npm.** PyPI / Bun / cargo can land in a follow-up if
+  needed; the routing pattern generalizes.
+
+## Scope
+
+### In scope
+
+- **Manifest field.** `bottle.tokens: [TokenEntry, ...]`. Each
+  entry carries `Kind` (`anthropic` | `github` | `gitea` |
+  `npm`), an optional `Url` (required for `gitea`, defaulted for
+  the others), and `TokenRef` (the name of a host env var the
+  CLI resolves at launch time).
+- **cred-proxy process.** Runs as root inside the agent
+  container, listens on `127.0.0.1:`. Holds the tokens in
+  its own environ — never on argv, never written to disk.
+  Per-`Kind` route handler: inject the right header, forward
+  over TLS, stream the response back to the client without
+  buffering.
+- **Agent-side rewrites.** Provisioner writes:
+  - `ANTHROPIC_BASE_URL=http://127.0.0.1:/anthropic` to
+    the agent's environ
+  - `~/.npmrc` `registry = http://127.0.0.1:/npm/`
+  - `~/.gitconfig` `[url …] insteadOf = …` for each declared
+    `github` / `gitea` upstream
+  - `~/.config/tea/config.yml` with the proxy URL for each
+    declared `gitea` entry
+- **Process lifecycle.** Container entrypoint launches the proxy
+  first as root, waits for it to bind, then `exec setpriv …
+  --reuid=node --regid=node …` for the claude child. Proxy
+  death is fatal (the container exits); this is also the
+  PID-1-zombie story.
+- **pipelock interop.** Drop `api.anthropic.com` from pipelock's
+  TLS-MITM list; keep it on the allowlist as a plain HTTPS host
+  (cred-proxy is the trust endpoint now). Verify pipelock still
+  lets cred-proxy's HTTPS connections out for the four upstream
+  hosts.
+- **Plan rendering.** `bottle_plan.py` and the y/N preflight
+  show: which tokens are configured (kind + ref name, not the
+  value), the proxy port, the routes the proxy will publish.
+- **Drop the existing `CLAUDE_CODE_OAUTH_TOKEN` forward in
+  `prepare.py`.** Today it lands in the agent's environ; once
+  this PRD ships, it lands in the proxy's environ instead.
+- **Tests.** Integration tests for each of the six success
+  criteria; unit tests for manifest parsing, route table
+  generation, header injection.
+
+### Out of scope
+
+- AWS / SigV4 (see Non-goals).
+- Per-method / per-path allowlist *inside* a kind. Defer to a
+  follow-up once observed traffic stabilizes.
+- Replacing `bottle.env` for non-token secrets. The proxy
+  handles the four kinds listed above; other env vars keep their
+  current path.
+- Migrating an in-flight bottle from "token in agent env" to
+  "token via proxy" mid-session. Restart required.
+- Audit logging. The proxy doesn't write request logs in v1.
+  Add only if a concrete debugging need surfaces.
+
+## Proposed Design
+
+### Architecture
+
+```
+┌── Host (macOS) ──────────────────────────────────────────────────┐
+│  Secrets at rest (keychain / .env):                              │
+│    CLAUDE_BOTTLE_OAUTH_TOKEN, GITHUB_TOKEN,                      │
+│    GITEA_SERVER_TOKEN, NPM_TOKEN                                 │
+│                    │ docker run -e KEY (no =VALUE on argv)       │
+│                    ▼                                             │
+│  ┌── Bottle container ────────────────────────────────────────┐  │
+│  │                                                            │  │
+│  │  ┌── UID 1000 (node) ───────────────────────────────────┐  │  │
+│  │  │ claude --dangerously-skip-permissions                │  │  │
+│  │  │ environ: URLs only, no plaintext tokens              │  │  │
+│  │  │   ANTHROPIC_BASE_URL=http://127.0.0.1:PORT/anth..    │  │  │
+│  │  │   npm registry   → http://127.0.0.1:PORT/npm/        │  │  │
+│  │  │   git remote.url → http://127.0.0.1:PORT/...         │  │  │
+│  │  │   tea --url      → http://127.0.0.1:PORT/gitea       │  │  │
+│  │  └──────────────┬───────────────────────────────────────┘  │  │
+│  │                 │ plain HTTP, loopback                     │  │
+│  │                 ▼                                          │  │
+│  │  ┌── UID 0 (root) ──────────────────────────────────────┐  │  │
+│  │  │ cred-proxy listens 127.0.0.1:PORT                    │  │  │
+│  │  │ tokens live ONLY in this process's environ           │  │  │
+│  │  │ per-route: inject auth header, forward over TLS      │  │  │
+│  │  │   /anthropic → api.anthropic.com   Bearer            │  │  │
+│  │  │   /gh-api    → api.github.com      Bearer            │  │  │
+│  │  │   /gh-git    → github.com          Bearer            │  │  │
+│  │  │   /gitea     → gitea.dideric.is    token             │  │  │
+│  │  │   /npm       → registry.npmjs.org  Bearer            │  │  │
+│  │  │ SSE pass-through, no buffering                       │  │  │
+│  │  └──────────────┬───────────────────────────────────────┘  │  │
+│  │                 │ HTTPS                                    │  │
+│  │                 ▼                                          │  │
+│  │  ┌── pipelock (egress allowlist) ───────────────────────┐  │  │
+│  │  │ allow: api.anthropic.com, api.github.com,            │  │  │
+│  │  │        github.com, gitea.dideric.is,                 │  │  │
+│  │  │        registry.npmjs.org                            │  │  │
+│  │  │ block: statsig, sentry, autoupdater, *               │  │  │
+│  │  └──────────────┬───────────────────────────────────────┘  │  │
+│  └─────────────────┼──────────────────────────────────────────┘  │
+│                    ▼                                             │
+└────────────────────┼─────────────────────────────────────────────┘
+                     ▼
+               Upstream APIs
+
+
+Why node@1000 can't just steal the tokens:
+  ┌─────────────────────────────────────────────────────────┐
+  │ node tries:                                             │
+  │   cat /proc//environ           → EACCES                 │
+  │   ptrace(PTRACE_ATTACH, , ...) → EPERM                  │
+  │ Kernel's ptrace_may_access rejects: UID mismatch        │
+  │ and no CAP_SYS_PTRACE / CAP_PERFMON in the container.   │
+  └─────────────────────────────────────────────────────────┘
+```
+
+### New components
+
+- **`claude_bottle/cred_proxy.py`** (new): abstract `CredProxy` +
+  `CredProxyPlan` dataclass. `prepare` is host-side and
+  side-effect-free on Docker; renders the route table and
+  resolves `TokenRef`s against host env. Mirrors the existing
+  `GitGate` / `Pipelock` shape.
+- **`claude_bottle/backend/docker/cred_proxy.py`** (new):
+  `DockerCredProxy` concrete subclass. Bakes the proxy binary
+  into the agent image; `start` writes the route table to a
+  mode-600 file under `stage_dir` and arranges the entrypoint
+  so the proxy boots first.
+- **`claude_bottle/backend/docker/provision/cred_proxy.py`**
+  (new): renders `ANTHROPIC_BASE_URL`, `~/.npmrc`,
+  `~/.gitconfig` `insteadOf` blocks, and `~/.config/tea/config.yml`
+  into the agent's home for each declared kind.
+- **The proxy binary itself.** Bundled into the agent image at
+  `/usr/local/libexec/cred-proxy`. See "External dependencies"
+  for the language choice.
+
+### Existing code touched
+
+- **`claude_bottle/manifest.py`** — add `TokenEntry`,
+  `Bottle.tokens: tuple[TokenEntry, ...] = ()`, parse + validate
+  (at most one entry per `Kind` except `gitea`, which may
+  carry multiple Urls).
+- **`claude_bottle/backend/docker/prepare.py`** — delete the
+  `CLAUDE_BOTTLE_OAUTH_TOKEN` → `CLAUDE_CODE_OAUTH_TOKEN` branch
+  in the agent's forwarded env. The OAuth token now flows to
+  the proxy's environ via the cred-proxy lifecycle.
+- **`claude_bottle/backend/docker/backend.py`** — instantiate
+  `DockerCredProxy`; thread its `prepare` / `start` / `stop`
+  through `resolve_plan` / `launch`.
+- **`claude_bottle/backend/docker/launch.py`** — add cred-proxy
+  start before the cred-proxy provisioner runs (provisioner
+  writes URLs that reference the proxy port, so it must be up).
+- **`claude_bottle/backend/docker/bottle_plan.py`** — new
+  `CredProxyPlan` field; preflight shows kind + ref name +
+  port + route table.
+- **`claude_bottle/pipelock.py`** — drop the `api.anthropic.com`
+  TLS-MITM branch; the host stays on the allowlist as a plain
+  HTTPS destination. Confirm the four upstream hosts are
+  allowlisted by default when `bottle.tokens` declares them.
+- **`README.md`** — replace the architecture diagram with the
+  one above; document the `bottle.tokens` field.
+- **`claude-bottle.example.json`** — add a `tokens` array to
+  one bottle showing each Kind.
+- **Tests** — new unit tests for manifest parsing, route table
+  generation, header injection; new integration tests for the
+  six success criteria. Delete the bits of `prepare.py` tests
+  that asserted on `CLAUDE_CODE_OAUTH_TOKEN` landing in the
+  agent's env.
+
+### Data model changes
+
+```python
+@dataclass(frozen=True)
+class TokenEntry:
+    Kind: Literal["anthropic", "github", "gitea", "npm"]
+    TokenRef: str  # name of host env var
+    Url: str | None = None  # required for gitea; defaulted otherwise
+
+@dataclass(frozen=True)
+class Bottle:
+    ...
+    tokens: tuple[TokenEntry, ...] = ()
+```
+
+Validation:
+
+- `Kind` must be one of the four supported values.
+- `TokenRef` must resolve against `os.environ` at launch (fail
+  fast with a clear "host env var X is unset" if missing).
+- `gitea` entries require `Url`; others fall back to the
+  documented upstream.
+- At most one entry per `Kind` except `gitea`, which may have
+  multiple distinct `Url`s.
+- No silent overlap with `bottle.git` upstreams that already
+  flow through git-gate; if a `tokens[].Kind: github|gitea`
+  entry's `Url` collides with a `git[].Upstream`'s host, parse
+  fails with a "git-gate already brokers this remote, drop one"
+  hint. (Both paths broker credentials; doubling up is a
+  configuration smell, not a feature.)
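The validation rules above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the PRD's implementation: `validate_tokens`, `ManifestError`, `DEFAULT_URLS`, and the `git_gate_hosts` parameter are hypothetical names, the default URLs are read off the routing table, and every `Url` is assumed to carry a scheme so that `urlsplit` can extract its host.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Literal, Mapping
from urllib.parse import urlsplit

KINDS = ("anthropic", "github", "gitea", "npm")

# Assumed defaults taken from the routing table (github also routes to
# api.github.com; only the git host matters for the git-gate overlap check).
DEFAULT_URLS = {
    "anthropic": "https://api.anthropic.com",
    "github": "https://github.com",
    "npm": "https://registry.npmjs.org",
}


class ManifestError(ValueError):
    """Hypothetical error type for a manifest that fails validation."""


@dataclass(frozen=True)
class TokenEntry:
    Kind: Literal["anthropic", "github", "gitea", "npm"]
    TokenRef: str  # name of host env var
    Url: str | None = None  # required for gitea; defaulted otherwise


def validate_tokens(
    tokens: Iterable[TokenEntry],
    host_env: Mapping[str, str],
    git_gate_hosts: frozenset[str] = frozenset(),
) -> None:
    """Raise ManifestError on the first rule a token entry violates."""
    seen: set[tuple[str, str | None]] = set()
    for entry in tokens:
        if entry.Kind not in KINDS:
            raise ManifestError(f"unsupported Kind {entry.Kind!r}")
        if entry.TokenRef not in host_env:
            raise ManifestError(f"host env var {entry.TokenRef} is unset")
        if entry.Kind == "gitea" and not entry.Url:
            raise ManifestError("gitea entries require Url")
        # gitea is keyed by (Kind, Url); every other kind by Kind alone.
        key = (entry.Kind, entry.Url if entry.Kind == "gitea" else None)
        if key in seen:
            raise ManifestError(f"duplicate {entry.Kind} entry")
        seen.add(key)
        if entry.Kind in ("github", "gitea"):
            host = urlsplit(entry.Url or DEFAULT_URLS[entry.Kind]).hostname
            if host in git_gate_hosts:
                raise ManifestError(
                    f"git-gate already brokers this remote ({host}), drop one"
                )
```

A caller would pass `os.environ` as `host_env` and the set of hosts derived from `git[].Upstream` as `git_gate_hosts`. Raising on the first violation keeps the fail-fast behaviour the rules describe; collecting every error before raising would be an equally reasonable choice.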
+
+### Routing table
+
+| Kind      | Proxy path     | Upstream                | Header                     |
+|-----------|----------------|-------------------------|----------------------------|
+| anthropic | `/anthropic/`  | `api.anthropic.com`     | `Authorization: Bearer …`  |
+| github    | `/gh-api/`     | `api.github.com`        | `Authorization: Bearer …`  |
+| github    | `/gh-git/`     | `github.com`            | `Authorization: Bearer …`  |
+| gitea     | `/gitea/`      | configured `Url`        | `Authorization: token …`   |
+| npm       | `/npm/`        | `registry.npmjs.org`    | `Authorization: Bearer …`  |
+
+Gitea uses `Authorization: token` rather than `Bearer` to
+sidestep `go-gitea/gitea#16734`. The proxy strips any incoming
+`Authorization` header before injecting its own — the agent
+cannot smuggle a stolen token through this path.
+
+### External dependencies
+
+The proxy binary. Two real options:
+
+- **Python (stdlib)** — `http.server` + `urllib`/`http.client`,
+  no new pip packages. Matches CLAUDE.md's "bash-first, low-deps"
+  posture. SSE pass-through is fiddly but doable.
+- **Go single binary** — cleaner SSE story, smaller runtime,
+  one static binary baked into the image. New build dependency.
+
+Default: Python, baked into the agent image. Reconsider in the
+implementation PR if SSE behavior is troublesome under load.
+
+No new Python packages. No DB. No admin API. The proxy's
+configuration is a single mode-600 JSON file passed in via
+`/run/cred-proxy/routes.json`.
+
+## Future work
+
+- **AWS / SigV4.** Likely an IMDS emulator sidecar handing out
+  short-lived STS tokens. Different threat model (the agent
+  ends up holding the STS creds — the proxy just shortens
+  their lifetime). Separate PRD.
+- **Per-method / per-path allowlist** inside a kind. Once the
+  set of API operations claude actually performs is observed,
+  reject everything else. Narrows the within-allowlist surface.
+- **Short-lived token minting.** For services that support it
+  (GitHub Apps, GitLab project-access tokens, fine-grained
+  PATs with TTL), have the proxy mint a fresh per-session
+  child credential from a long-lived parent.
+- **Smolmachines colocation.** Same packing question as
+  pipelock / git-gate; the cred-proxy can sit inside the agent
+  VM (current shape) or in a separate VM (stricter isolation,
+  per-bottle TCP hop). Backend decision, not a manifest decision.
+- **More kinds.** PyPI, Bun, cargo, Docker Hub. The routing
+  pattern generalizes; add as needed.
+
+## Open questions
+
+- **Field name.** `bottle.tokens` is the working name. The
+  research note used `bottle.forge` for the gitea/github
+  generalization, but "forge" doesn't fit `anthropic` or
+  `npm`. Alternatives: `bottle.brokered`, `bottle.upstreams`,
+  `bottle.cred_proxy`. Default: `bottle.tokens`.
+- **Python vs Go for the proxy.** Default: Python, revisit
+  during implementation if SSE pass-through is unreliable.
+- **Process inside the agent container vs sidecar container.**
+  v1: inside (simpler lifecycle, no extra container; ptrace
+  boundary is enough). The sidecar option becomes attractive
+  only if we want a network-layer split between proxy and agent
+  on top of the UID split.
+- **Belt-and-braces on outbound telemetry.** Set
+  `CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1` and
+  `DISABLE_ERROR_REPORTING=1` in the agent's environ by
+  default? Default: yes — they don't route through
+  `ANTHROPIC_BASE_URL`, so the proxy doesn't catch them; the
+  flags are the only off switch.
+- **`git push` over a rewritten URL vs. credential-helper
+  shim.** `[url "http://…"] insteadOf = "https://github.com/"`
+  captures push/fetch/clone/pull/ls-remote in one config knob;
+  a credential helper would need separate wiring. Default:
+  `insteadOf`.
+- **Token-refresh story for the Anthropic OAuth token.** The
+  token is ~1-year and there's no client-side refresh, so the
+  proxy holds a static value. The 1-year blast radius is the
+  cost, documented in
+  [`claude-code-token-revocation.md`](../research/claude-code-token-revocation.md).
+  No design change here; flagged for awareness.
+- **`anthropics/claude-code#36998`.** Older claude-code
+  versions bypassed `ANTHROPIC_BASE_URL` for some startup
+  calls (auth validation, org lookup). Marked closed upstream;
+  the implementation PR verifies with `strace -e connect`
+  against the pinned claude-code build before trusting the
+  isolation.
+
+## References
+
+- [`docs/research/agent-credential-proxy-landscape.md`](../research/agent-credential-proxy-landscape.md)
+  — landscape research; this PRD is the build path that note
+  recommends.
+- [`docs/research/secret-minimization-over-dlp.md`](../research/secret-minimization-over-dlp.md)
+  — architectural framing: why moving the credential matters
+  more than scanning egress.
+- PRD 0006: pipelock TLS interception — the
+  `api.anthropic.com` TLS-MITM responsibility cred-proxy takes
+  over.
+- PRD 0008: Git gate — the credential-broker pattern this PRD
+  reuses (gate holds creds, agent gets a rewritten URL, gate
+  makes the upstream connection).
+- [`anthropics/claude-code#36998`](https://github.com/anthropics/claude-code/issues/36998)
+  — historic `ANTHROPIC_BASE_URL` bypass.
+- [`go-gitea/gitea#16734`](https://github.com/go-gitea/gitea/issues/16734)
+  — why Gitea uses `Authorization: token`, not `Bearer`.
+- [`golang/go#28866`](https://github.com/golang/go/issues/28866)
+  — the `HTTPS_PROXY` loopback bug; not hit here because we're
+  a reverse proxy, not a forward proxy.