docs: add PRD 0010 for credential proxy

Per-bottle reverse proxy that holds API tokens (Anthropic OAuth, GitHub PAT, Gitea PAT, npm) in a root-owned process; agent gets only URLs in its environ. AWS / SigV4 explicitly out of scope.
2026-05-13 00:18:55 -04:00
parent 3d9103d5b5
commit 2a687449d4
1 changed files with 420 additions and 0 deletions
@@ -0,0 +1,420 @@
+# PRD 0010: Credential proxy for agent-bound API tokens
+
+- **Status:** Draft
+- **Author:** didericis
+- **Created:** 2026-05-13
+
+## Summary
+
+Per-bottle reverse proxy that holds API tokens (Anthropic OAuth,
+GitHub PAT, Gitea PAT, npm token) in a root-owned process inside
+the agent container. The agent (`node`, UID 1000) keeps only URLs
+in its environ; the proxy injects the right `Authorization` header
+and forwards over TLS. The boundary that makes this meaningful is
+the kernel's `ptrace_may_access` check: `node` cannot read root's
+`/proc/<pid>/environ` and cannot `ptrace` attach without
+`CAP_SYS_PTRACE` / `CAP_PERFMON`, which claude-bottle does not
+grant.
+
+AWS / SigV4 is explicitly out of scope — it is per-request signing,
+not header injection, and does not fit this proxy's shape. If a
+bottle needs AWS credentials later, that lives in a separate PRD.
+
+## Problem
+
+Today `CLAUDE_CODE_OAUTH_TOKEN` (and any `bottle.env` secrets such
+as a Gitea PAT, GitHub PAT, or npm token) gets `docker run -e`'d
+straight into the agent's environ. Inside the bottle the agent
+runs as `node` with `--dangerously-skip-permissions`; its Bash
+tool can do `printenv`, `cat /proc/self/environ`, or
+`node -e 'console.log(process.env)'` and capture every value into
+the conversation. From there a prompt-injected or hijacked agent
+can exfiltrate them over any allowed egress (api.anthropic.com
+itself if nothing else).
+
+Linux has no per-env-var ACL — once a variable is in a process's
+environ, the process and its descendants own it. The credible
+boundary is process-level: hold the credential in a different
+process the agent cannot read. Default Docker already enforces
+that boundary at the kernel level via `ptrace_may_access`, the
+same property the (removed) ssh-gate and the current git-gate
+rely on.
+
+The research note
+[`agent-credential-proxy-landscape.md`](../research/agent-credential-proxy-landscape.md)
+surveys the existing tools and concludes that a small
+claude-bottle-specific reverse proxy is less work and less risk
+than either adopting nono (alpha, unaudited) or Infisical Agent
+Vault (TLS-MITM topology that doubles up on pipelock's CA stack).
+This PRD is the build.
+
+## Goals / Success Criteria
+
+Each test runs inside a bottle whose manifest declares the four
+supported kinds (anthropic, github, gitea, npm):
+
+1. **No plaintext tokens in the agent's environ.** `printenv` and
+   `cat /proc/self/environ` from the agent's shell return only
+   URLs pointing at `127.0.0.1:<PORT>/...`. None of the secrets
+   that the `bottle.tokens[].TokenRef` names resolve to appear.
+2. **Kernel boundary holds.** From the agent's shell,
+   `cat /proc/<cred-proxy-pid>/environ` returns `EACCES` and
+   `gdb -p <cred-proxy-pid>` / `strace -p <cred-proxy-pid>` fails
+   with `EPERM`.
+3. **Anthropic API works.** `claude` makes a successful streaming
+   tool-use round-trip via `ANTHROPIC_BASE_URL` →
+   `127.0.0.1:<PORT>/anthropic`. SSE chunks arrive without
+   buffering; `anthropic-version`, `anthropic-beta`, and
+   `X-Claude-Code-Session-Id` headers round-trip untouched.
+4. **Git push to declared remotes works.** `git push` against a
+   `bottle.tokens[].Kind: github` or `gitea` upstream succeeds;
+   the upstream sees the proxy's configured token; the agent
+   never holds one.
+5. **npm install works.** `npm install <public-package>`
+   succeeds against the registry pointed at the proxy. A scoped
+   install that requires the token (e.g. against a private
+   registry) also succeeds.
+6. **Wrong token rejected at the source, not silently swapped.**
+   If the agent tries to send its own `Authorization: …` header,
+   the proxy strips it and injects the configured one instead. A
+   manifest token revoked at the upstream produces a 401 to the
+   agent, not a 5xx.
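Criterion 1 could be checked with something like the sketch below. The helper name `leaked_secrets`, the port `8080`, and the sample token are hypothetical and not part of claude-bottle; a real test would feed in the agent's actual environ and the resolved token values.

```python
from collections.abc import Iterable, Mapping


def leaked_secrets(environ: Mapping[str, str], secrets: Iterable[str]) -> list[str]:
    """Return the names of environ variables whose value contains any secret."""
    # Ignore empty strings: "" is a substring of every value.
    needles = [s for s in secrets if s]
    return sorted(
        name
        for name, value in environ.items()
        if any(needle in value for needle in needles)
    )


# Hypothetical agent environ after provisioning: URLs only, no tokens.
agent_env = {
    "ANTHROPIC_BASE_URL": "http://127.0.0.1:8080/anthropic",
    "HOME": "/home/node",
}
print(leaked_secrets(agent_env, ["sk-example-token"]))
```

An empty list means no declared secret shows up in any variable; any non-empty result names the variables that would fail the criterion.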
+
+## Non-goals
+
+- **AWS / SigV4.** Per-request signing is a different shape; a
+  bearer-injecting proxy doesn't help. Hold for a future PRD
+  (likely an IMDS emulator sidecar handing out short-lived STS
+  credentials).
+- **DB-backed credential store.** Flat env / mode-600 file only.
+  The LiteLLM CVE-2026-42208 incident is the cautionary tale:
+  any DB-backed credential gateway is itself a high-value attack
+  target.
+- **Generic LLM-gateway features.** No cost tracking, no
+  fallbacks, no virtual keys, no multi-tenant routing, no usage
+  metering. The proxy is a credential-injection trust endpoint,
+  not a gateway.
+- **Subsuming pipelock.** pipelock keeps its egress-allowlist
+  role. It drops the `api.anthropic.com` TLS-MITM job because
+  cred-proxy is now the trust endpoint for that host; everything
+  else pipelock does stays.
+- **TLS interception inside the bottle.** The agent talks plain
+  HTTP to loopback; cred-proxy speaks real HTTPS outbound. No
+  container-local CA, no `golang/go#28866` loopback workaround.
+- **Cross-bottle credential sharing.** One proxy per bottle, same
+  one-sidecar-per-agent posture as pipelock and git-gate.
+- **`claude --bare` mode.** Reads only `ANTHROPIC_API_KEY`, not
+  the OAuth token. Not in claude-bottle's flow today.
+- **MCP-server tokens, package-installer tokens for ecosystems
+  beyond npm.** PyPI / Bun / cargo can land in a follow-up if
+  needed; the routing pattern generalizes.
+
+## Scope
+
+### In scope
+
+- **Manifest field.** `bottle.tokens: [TokenEntry, ...]`. Each
+  entry carries `Kind` (`anthropic` | `github` | `gitea` |
+  `npm`), an optional `Url` (required for `gitea`, defaulted for
+  the others), and `TokenRef` (the name of a host env var the
+  CLI resolves at launch time).
+- **cred-proxy process.** Runs as root inside the agent
+  container, listens on `127.0.0.1:<PORT>`. Holds the tokens in
+  its own environ — never on argv, never written to disk.
+  Per-`Kind` route handler: inject the right header, forward
+  over TLS, stream the response back to the client without
+  buffering.
+- **Agent-side rewrites.** Provisioner writes:
+  - `ANTHROPIC_BASE_URL=http://127.0.0.1:<PORT>/anthropic` to
+    the agent's environ
+  - `~/.npmrc` `registry = http://127.0.0.1:<PORT>/npm/`
+  - `~/.gitconfig` `[url …] insteadOf = …` for each declared
+    `github` / `gitea` upstream
+  - `~/.config/tea/config.yml` with the proxy URL for each
+    declared `gitea` entry
+- **Process lifecycle.** Container entrypoint launches the proxy
+  first as root, waits for it to bind, then `exec setpriv …
+  --reuid=node --regid=node …` for the claude child. Proxy
+  death is fatal (the container exits); this is also the
+  PID-1-zombie story.
+- **pipelock interop.** Drop `api.anthropic.com` from pipelock's
+  TLS-MITM list; keep it on the allowlist as a plain HTTPS host
+  (cred-proxy is the trust endpoint now). Verify pipelock still
+  lets cred-proxy's HTTPS connections out to the upstream hosts
+  of all four kinds.
+- **Plan rendering.** `bottle_plan.py` and the y/N preflight
+  show: which tokens are configured (kind + ref name, not the
+  value), the proxy port, the routes the proxy will publish.
+- **Drop the existing `CLAUDE_CODE_OAUTH_TOKEN` forward in
+  `prepare.py`.** Today it lands in the agent's environ; once
+  this PRD ships, it lands in the proxy's environ instead.
+- **Tests.** Integration tests for each of the six success
+  criteria; unit tests for manifest parsing, route table
+  generation, header injection.
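The process-lifecycle item above has the entrypoint wait for the proxy to bind before it drops privileges. A minimal sketch of that wait, assuming a Python helper (the PRD does not fix the entrypoint language, and `wait_for_bind` with its parameters is hypothetical):

```python
import socket
import time


def wait_for_bind(host: str, port: int, timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll until something accepts TCP connections on host:port, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the listener is up; close it right away.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

If the call returns `False`, the entrypoint would exit non-zero rather than start the agent, which matches the "proxy death is fatal" posture.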
+
+### Out of scope
+
+- AWS / SigV4 (see Non-goals).
+- Per-method / per-path allowlist *inside* a kind. Defer to a
+  follow-up once observed traffic stabilizes.
+- Replacing `bottle.env` for non-token secrets. The proxy
+  handles the four kinds listed above; other env vars keep their
+  current path.
+- Migrating an in-flight bottle from "token in agent env" to
+  "token via proxy" mid-session. Restart required.
+- Audit logging. The proxy doesn't write request logs in v1.
+  Add only if a concrete debugging need surfaces.
+
+## Proposed Design
+
+### Architecture
+
+```
+┌── Host (macOS) ──────────────────────────────────────────────────┐
+│   Secrets at rest (keychain / .env):                             │
+│     CLAUDE_BOTTLE_OAUTH_TOKEN, GITHUB_TOKEN,                     │
+│     GITEA_SERVER_TOKEN, NPM_TOKEN                                │
+│        │ docker run -e KEY  (no =VALUE on argv)                  │
+│        ▼                                                         │
+│   ┌── Bottle container ────────────────────────────────────────┐ │
+│   │                                                            │ │
+│   │   ┌── UID 1000 (node) ─────────────────────────────────┐   │ │
+│   │   │  claude --dangerously-skip-permissions             │   │ │
+│   │   │  environ: URLs only, no plaintext tokens           │   │ │
+│   │   │    ANTHROPIC_BASE_URL=http://127.0.0.1:PORT/anth.. │   │ │
+│   │   │    npm  registry    → http://127.0.0.1:PORT/npm/   │   │ │
+│   │   │    git  remote.url  → http://127.0.0.1:PORT/...    │   │ │
+│   │   │    tea  --url       → http://127.0.0.1:PORT/gitea  │   │ │
+│   │   └────────────┬───────────────────────────────────────┘   │ │
+│   │                │ plain HTTP, loopback                       │ │
+│   │                ▼                                            │ │
+│   │   ┌── UID 0 (root) ────────────────────────────────────┐   │ │
+│   │   │  cred-proxy   listens 127.0.0.1:PORT               │   │ │
+│   │   │  tokens live ONLY in this process's environ        │   │ │
+│   │   │  per-route: inject auth header, forward over TLS   │   │ │
+│   │   │    /anthropic → api.anthropic.com   Bearer         │   │ │
+│   │   │    /gh-api    → api.github.com      Bearer         │   │ │
+│   │   │    /gh-git    → github.com          Bearer         │   │ │
+│   │   │    /gitea     → gitea.dideric.is    token          │   │ │
+│   │   │    /npm       → registry.npmjs.org  Bearer         │   │ │
+│   │   │  SSE pass-through, no buffering                    │   │ │
+│   │   └────────────┬───────────────────────────────────────┘   │ │
+│   │                │ HTTPS                                      │ │
+│   │                ▼                                            │ │
+│   │   ┌── pipelock (egress allowlist) ─────────────────────┐   │ │
+│   │   │  allow: api.anthropic.com, api.github.com,         │   │ │
+│   │   │         github.com, gitea.dideric.is,              │   │ │
+│   │   │         registry.npmjs.org                         │   │ │
+│   │   │  block: statsig, sentry, autoupdater, *            │   │ │
+│   │   └────────────┬───────────────────────────────────────┘   │ │
+│   └────────────────┼──────────────────────────────────────────┘ │
+│                    ▼                                             │
+└────────────────────┼─────────────────────────────────────────────┘
+                     ▼
+              Upstream APIs
+
+
+Why node@1000 can't just steal the tokens:
+   ┌─────────────────────────────────────────────────────────┐
+   │  node tries:                                            │
+   │     cat /proc/<cred-proxy-pid>/environ   → EACCES       │
+   │     ptrace(PTRACE_ATTACH, <cred-proxy-pid>, ...) → EPERM│
+   │  Kernel's ptrace_may_access rejects: UID mismatch       │
+   │  and no CAP_SYS_PTRACE / CAP_PERFMON in the container.  │
+   └─────────────────────────────────────────────────────────┘
+```
+
+### New components
+
+- **`claude_bottle/cred_proxy.py`** (new): abstract `CredProxy`
+  and `CredProxyPlan` dataclass. `prepare` is host-side and
+  side-effect-free on Docker; renders the route table and
+  resolves `TokenRef`s against host env. Mirrors the existing
+  `GitGate` / `Pipelock` shape.
+- **`claude_bottle/backend/docker/cred_proxy.py`** (new):
+  `DockerCredProxy` concrete subclass. Bakes the proxy binary
+  into the agent image; `start` writes the route table to a
+  mode-600 file under `stage_dir` and arranges the entrypoint
+  so the proxy boots first.
+- **`claude_bottle/backend/docker/provision/cred_proxy.py`**
+  (new): renders `ANTHROPIC_BASE_URL`, `~/.npmrc`,
+  `~/.gitconfig` `insteadOf` blocks, and `~/.config/tea/config.yml`
+  into the agent's home for each declared kind.
+- **The proxy binary itself.** Bundled into the agent image at
+  `/usr/local/libexec/cred-proxy`. See "External dependencies"
+  for the language choice.
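The provisioner's output for npm and git can be sketched as two small renderers. The function names and the port are hypothetical; the `registry` line, the `insteadOf` form, and the `/npm/` and `/gh-git/` route paths come from this PRD.

```python
def render_npmrc(port: int) -> str:
    """Return the ~/.npmrc line that points npm at the proxy's /npm/ route."""
    return f"registry = http://127.0.0.1:{port}/npm/\n"


def render_insteadof(port: int, route: str, upstream_prefix: str) -> str:
    """Return one ~/.gitconfig block rewriting an upstream prefix to a proxy route."""
    return (
        f'[url "http://127.0.0.1:{port}/{route}/"]\n'
        f"\tinsteadOf = {upstream_prefix}\n"
    )


# Hypothetical port; the real one comes from the resolved plan.
print(render_npmrc(8080), end="")
print(render_insteadof(8080, "gh-git", "https://github.com/"), end="")
```

With this rewrite in place, git maps any `https://github.com/...` remote onto the loopback route for fetch and push alike, so the agent's git config never needs a credential.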
+
+### Existing code touched
+
+- **`claude_bottle/manifest.py`** — add `TokenEntry`,
+  `Bottle.tokens: tuple[TokenEntry, ...] = ()`, parse + validate
+  (at most one entry per `Kind` except `gitea`, which may
+  carry multiple `Url`s).
+- **`claude_bottle/backend/docker/prepare.py`** — delete the
+  `CLAUDE_BOTTLE_OAUTH_TOKEN` → `CLAUDE_CODE_OAUTH_TOKEN` branch
+  in the agent's forwarded env. The OAuth token now flows to
+  the proxy's environ via the cred-proxy lifecycle.
+- **`claude_bottle/backend/docker/backend.py`** — instantiate
+  `DockerCredProxy`; thread its `prepare` / `start` / `stop`
+  through `resolve_plan` / `launch`.
+- **`claude_bottle/backend/docker/launch.py`** — add cred-proxy
+  start before the cred-proxy provisioner runs (the provisioner
+  writes URLs that reference the proxy port, so the proxy must
+  already be listening).
+- **`claude_bottle/backend/docker/bottle_plan.py`** — new
+  `CredProxyPlan` field; preflight shows kind + ref name +
+  port + route table.
+- **`claude_bottle/pipelock.py`** — drop the `api.anthropic.com`
+  TLS-MITM branch; the host stays on the allowlist as a plain
+  HTTPS destination. Confirm the upstream hosts are allowlisted
+  by default when `bottle.tokens` declares their kinds.
+- **`README.md`** — replace the architecture diagram with the
+  one above; document the `bottle.tokens` field.
+- **`claude-bottle.example.json`** — add a `tokens` array to
+  one bottle showing each Kind.
+- **Tests** — new unit tests for manifest parsing, route table
+  generation, header injection; new integration tests for the
+  six success criteria. Delete the bits of `prepare.py` tests
+  that asserted on `CLAUDE_CODE_OAUTH_TOKEN` landing in the
+  agent's env.
+
+### Data model changes
+
+```python
+@dataclass(frozen=True)
+class TokenEntry:
+    Kind: Literal["anthropic", "github", "gitea", "npm"]
+    TokenRef: str             # name of host env var
+    Url: str | None = None    # required for gitea; defaulted otherwise
+
+@dataclass(frozen=True)
+class Bottle:
+    ...
+    tokens: tuple[TokenEntry, ...] = ()
+```
+
+Validation:
+
+- `Kind` must be one of the four supported values.
+- `TokenRef` must resolve against `os.environ` at launch (fail
+  fast with a clear "host env var X is unset" if missing).
+- `gitea` entries require `Url`; others fall back to the
+  documented upstream.
+- At most one entry per `Kind` except `gitea`, which may have
+  multiple distinct `Url`s.
+- No silent overlap with `bottle.git` upstreams that already
+  flow through git-gate; if a `tokens[].Kind: github|gitea`
+  entry's `Url` collides with a `git[].Upstream`'s host, parse
+  fails with a "git-gate already brokers this remote, drop one"
+  hint. (Both paths broker credentials; doubling up is a
+  configuration smell, not a feature.)
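The first four validation rules can be sketched as below. This is an illustration only: `validate_tokens` and the error strings are hypothetical, `host_env` stands in for `os.environ`, and the git-gate collision rule is left out because it depends on `bottle.git`, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Literal, Optional

KINDS = ("anthropic", "github", "gitea", "npm")


@dataclass(frozen=True)
class TokenEntry:
    Kind: Literal["anthropic", "github", "gitea", "npm"]
    TokenRef: str
    Url: Optional[str] = None


def validate_tokens(tokens, host_env) -> None:
    """Raise ValueError on the first rule violated; return None otherwise."""
    seen = set()
    for t in tokens:
        if t.Kind not in KINDS:
            raise ValueError(f"unsupported Kind {t.Kind!r}")
        if t.TokenRef not in host_env:
            raise ValueError(f"host env var {t.TokenRef} is unset")
        if t.Kind == "gitea" and not t.Url:
            raise ValueError("gitea entries require Url")
        # gitea may repeat with distinct Urls; every other kind appears once.
        key = (t.Kind, t.Url) if t.Kind == "gitea" else (t.Kind,)
        if key in seen:
            raise ValueError(f"duplicate entry for {t.Kind}")
        seen.add(key)
```

Raising on the first violation keeps the launch-time failure message specific, in line with the "fail fast" wording above.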
+
+### Routing table
+
+| Kind      | Proxy path     | Upstream                | Header                     |
+|-----------|----------------|-------------------------|----------------------------|
+| anthropic | `/anthropic/`  | `api.anthropic.com`     | `Authorization: Bearer …`  |
+| github    | `/gh-api/`     | `api.github.com`        | `Authorization: Bearer …`  |
+| github    | `/gh-git/`     | `github.com`            | `Authorization: Bearer …`  |
+| gitea     | `/gitea/<Url>` | configured `Url`        | `Authorization: token …`   |
+| npm       | `/npm/`        | `registry.npmjs.org`    | `Authorization: Bearer …`  |
+
+The `gitea` route sends `Authorization: token` rather than
+`Bearer` to sidestep `go-gitea/gitea#16734`. The proxy strips any incoming
+`Authorization` header before injecting its own — the agent
+cannot smuggle a stolen token through this path.
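The strip-then-inject step can be sketched as a pure function over a header mapping. `inject_auth` and `SCHEMES` are hypothetical names; the per-kind schemes mirror the routing table above.

```python
# Authorization scheme per kind, as listed in the routing table.
SCHEMES = {
    "anthropic": "Bearer",
    "github": "Bearer",
    "gitea": "token",
    "npm": "Bearer",
}


def inject_auth(headers: dict[str, str], kind: str, token: str) -> dict[str, str]:
    """Drop any client-supplied Authorization header, then set the proxy's own."""
    # Header names are case-insensitive, so compare in lower case.
    out = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    out["Authorization"] = f"{SCHEMES[kind]} {token}"
    return out
```

All other headers, such as `anthropic-version`, pass through unchanged, which is what success criterion 3 requires.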
+
+### External dependencies
+
+The proxy binary. Two real options:
+
+- **Python (stdlib)** — `http.server` + `urllib`/`http.client`,
+  no new pip packages. Matches CLAUDE.md's "bash-first, low-deps"
+  posture. SSE pass-through is fiddly but doable.
+- **Go single binary** — cleaner SSE story, smaller runtime,
+  one static binary baked into the image. New build dependency.
+
+Default: Python, baked into the agent image. Reconsider in the
+implementation PR if SSE behavior is troublesome under load.
+
+No new Python packages. No DB. No admin API. The proxy's
+configuration is a single mode-600 JSON file, read from
+`/run/cred-proxy/routes.json`.
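For the stdlib option, the SSE pass-through comes down to copying bytes as they arrive and flushing after every write. A minimal sketch, assuming `upstream` is an `http.client.HTTPResponse` (or any buffered reader with `read1`) and `downstream` is the handler's `wfile`; the name `relay` is hypothetical:

```python
import io


def relay(upstream, downstream, chunk_size: int = 4096) -> int:
    """Copy upstream to downstream as bytes arrive, flushing after every chunk."""
    total = 0
    while True:
        # read1 returns whatever a single underlying read yields, b"" at EOF,
        # instead of blocking until chunk_size bytes have accumulated.
        chunk = upstream.read1(chunk_size)
        if not chunk:
            return total
        downstream.write(chunk)
        downstream.flush()
        total += len(chunk)


# In-memory stand-ins for the upstream response and the client socket.
src = io.BytesIO(b"data: a\n\ndata: b\n\n")
dst = io.BytesIO()
relay(src, dst, chunk_size=4)
```

Using `read1` rather than `read` is the design choice that matters here: `read(n)` can hold an event back until `n` bytes are available, which is the buffering that criterion 3 rules out.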
+
+## Future work
+
+- **AWS / SigV4.** Likely an IMDS emulator sidecar handing out
+  short-lived STS tokens. Different threat model (the agent
+  ends up holding the STS creds — the proxy just shortens
+  their lifetime). Separate PRD.
+- **Per-method / per-path allowlist** inside a kind. Once the
+  set of API operations claude actually performs is observed,
+  reject everything else. Narrows the within-allowlist surface.
+- **Short-lived token minting.** For services that support it
+  (GitHub Apps, GitLab project-access tokens, fine-grained
+  PATs with TTL), have the proxy mint a fresh per-session
+  child credential from a long-lived parent.
+- **Smolmachines colocation.** Same packing question as
+  pipelock / git-gate; the cred-proxy can sit inside the agent
+  VM (current shape) or in a separate VM (stricter isolation,
+  per-bottle TCP hop). Backend decision, not a manifest decision.
+- **More kinds.** PyPI, Bun, cargo, Docker Hub. The routing
+  pattern generalizes; add as needed.
+
+## Open questions
+
+- **Field name.** `bottle.tokens` is the working name. The
+  research note used `bottle.forge` for the gitea/github
+  generalization, but "forge" doesn't fit `anthropic` or
+  `npm`. Alternatives: `bottle.brokered`, `bottle.upstreams`,
+  `bottle.cred_proxy`. Default: `bottle.tokens`.
+- **Python vs Go for the proxy.** Default: Python, revisit
+  during implementation if SSE pass-through is unreliable.
+- **Process inside the agent container vs sidecar container.**
+  v1: inside (simpler lifecycle, no extra container; ptrace
+  boundary is enough). The sidecar option becomes attractive
+  only if we want a network-layer split between proxy and agent
+  on top of the UID split.
+- **Belt-and-braces on outbound telemetry.** Set
+  `CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1` and
+  `DISABLE_ERROR_REPORTING=1` in the agent's environ by
+  default? Default: yes. Telemetry calls don't route through
+  `ANTHROPIC_BASE_URL`, so the proxy never sees them; the
+  flags are the only off switch.
+- **`git push` over a rewritten URL vs. credential-helper
+  shim.** `[url "http://…"] insteadOf = "https://github.com/"`
+  captures push/fetch/clone/pull/ls-remote in one config knob;
+  a credential helper would need separate wiring. Default:
+  `insteadOf`.
+- **Token-refresh story for the Anthropic OAuth token.** The
+  token is valid for roughly one year and there's no client-side
+  refresh, so the proxy holds a static value. The one-year blast
+  radius is the cost, documented in
+  [`claude-code-token-revocation.md`](../research/claude-code-token-revocation.md).
+  No design change here; flagged for awareness.
+- **`anthropics/claude-code#36998`.** Older claude-code
+  versions bypassed `ANTHROPIC_BASE_URL` for some startup
+  calls (auth validation, org lookup). Marked closed upstream;
+  the implementation PR verifies with `strace -e connect`
+  against the pinned claude-code build before trusting the
+  isolation.
+
+## References
+
+- [`docs/research/agent-credential-proxy-landscape.md`](../research/agent-credential-proxy-landscape.md)
+  — landscape research; this PRD is the build path that note
+  recommends.
+- [`docs/research/secret-minimization-over-dlp.md`](../research/secret-minimization-over-dlp.md)
+  — architectural framing: why moving the credential matters
+  more than scanning egress.
+- PRD 0006: pipelock TLS interception — the
+  `api.anthropic.com` TLS-MITM responsibility cred-proxy takes
+  over.
+- PRD 0008: Git gate — the credential-broker pattern this PRD
+  reuses (gate holds creds, agent gets a rewritten URL, gate
+  makes the upstream connection).
+- [`anthropics/claude-code#36998`](https://github.com/anthropics/claude-code/issues/36998)
+  — historic `ANTHROPIC_BASE_URL` bypass.
+- [`go-gitea/gitea#16734`](https://github.com/go-gitea/gitea/issues/16734)
+  — why Gitea uses `Authorization: token`, not `Bearer`.
+- [`golang/go#28866`](https://github.com/golang/go/issues/28866)
+  — the `HTTPS_PROXY` loopback bug; not hit here because we're
+  a reverse proxy, not a forward proxy.