docs: add PRD 0010 for credential proxy

Per-bottle reverse proxy that holds API tokens (Anthropic OAuth,
GitHub PAT, Gitea PAT, npm) in a root-owned process; agent gets
only URLs in its environ. AWS / SigV4 explicitly out of scope.
2026-05-13 00:18:55 -04:00
parent 3d9103d5b5
commit 2a687449d4
@@ -0,0 +1,420 @@
# PRD 0010: Credential proxy for agent-bound API tokens
- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-13
## Summary
Per-bottle reverse proxy that holds API tokens (Anthropic OAuth,
GitHub PAT, Gitea PAT, npm token) in a root-owned process inside
the agent container. The agent (`node`, UID 1000) keeps only URLs
in its environ; the proxy injects the right `Authorization` header
and forwards over TLS. The boundary that makes this meaningful is
the kernel's `ptrace_may_access` check: `node` cannot read root's
`/proc/<pid>/environ` and cannot `ptrace` attach without
`CAP_SYS_PTRACE` / `CAP_PERFMON`, which claude-bottle does not
grant.

AWS / SigV4 is explicitly out of scope: it is per-request signing,
not header injection, and does not fit this proxy's shape. If a
bottle needs AWS credentials later, that lives in a separate PRD.
## Problem
Today `CLAUDE_CODE_OAUTH_TOKEN` (and any `bottle.env` secrets such
as a Gitea PAT, GitHub PAT, or npm token) gets `docker run -e`'d
straight into the agent's environ. Inside the bottle the agent
runs as `node` with `--dangerously-skip-permissions`; its Bash
tool can do `printenv`, `cat /proc/self/environ`, or
`node -e 'console.log(process.env)'` and capture every value into
the conversation. From there a prompt-injected or hijacked agent
can exfiltrate them over any allowed egress (`api.anthropic.com`
itself, if nothing else).

Linux has no per-env-var ACL: once a variable is in a process's
environ, the process and its descendants own it. The credible
boundary is process-level: hold the credential in a different
process the agent cannot read. Default Docker already enforces
that boundary at the kernel level via `ptrace_may_access`, the
same property the (removed) ssh-gate and the current git-gate
rely on.

The research note
[`agent-credential-proxy-landscape.md`](../research/agent-credential-proxy-landscape.md)
surveys the existing tools and concludes that a small
claude-bottle-specific reverse proxy is less work and less risk
than either adopting nono (alpha, unaudited) or Infisical Agent
Vault (TLS-MITM topology that doubles up on pipelock's CA stack).
This PRD is the build.
## Goals / Success Criteria
Each test runs inside a bottle whose manifest declares the four
supported kinds (anthropic, github, gitea, npm):
1. **No plaintext tokens in the agent's environ.** `printenv` and
`cat /proc/self/environ` from the agent's shell return only
URLs pointing at `127.0.0.1:<PORT>/...`. None of the
`bottle.tokens[].TokenRef` values appear.
2. **Kernel boundary holds.** From the agent's shell,
`cat /proc/<cred-proxy-pid>/environ` returns `EACCES` and
`gdb -p <cred-proxy-pid>` / `strace -p <cred-proxy-pid>` fails
with `EPERM`.
3. **Anthropic API works.** `claude` makes a successful streaming
tool-use round-trip with `ANTHROPIC_BASE_URL` set to
`http://127.0.0.1:<PORT>/anthropic`. SSE chunks arrive without
buffering; `anthropic-version`, `anthropic-beta`, and
`X-Claude-Code-Session-Id` headers round-trip untouched.
4. **Git push to declared remotes works.** `git push` against a
`bottle.tokens[].Kind: github` or `gitea` upstream succeeds;
the upstream sees the proxy's token, not one supplied by the agent.
5. **npm install works.** `npm install <public-package>`
succeeds against the registry pointed at the proxy. A scoped
install that requires the token (e.g. against a private
registry) also succeeds.
6. **Agent-supplied tokens are replaced; upstream rejections pass through.**
If the agent sends its own `Authorization: …` header, the proxy
strips it and injects the configured one. A manifest token
revoked at the upstream produces a 401 to the agent, not a 5xx.
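
Criterion 1 lends itself to a mechanical check. The sketch below is one way an integration test might scan a captured environ for leaked values; the helper names `parse_proc_environ` and `leaked_token_refs` are hypothetical, not existing claude-bottle functions, and the port in the usage note is a placeholder.

```python
from typing import Mapping


def parse_proc_environ(raw: bytes) -> dict[str, str]:
    """Parse the NUL-separated KEY=VALUE layout of /proc/<pid>/environ."""
    env: dict[str, str] = {}
    for item in raw.split(b"\0"):
        if b"=" in item:
            key, _, value = item.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def leaked_token_refs(
    agent_environ: Mapping[str, str],
    secrets: Mapping[str, str],
) -> list[str]:
    """Return TokenRef names whose secret value appears in any environ value.

    `secrets` maps a TokenRef name to the plaintext the host resolved for it.
    """
    leaked = []
    for ref_name, secret in secrets.items():
        if not secret:
            continue  # an empty secret would match every value
        if any(secret in value for value in agent_environ.values()):
            leaked.append(ref_name)
    return sorted(leaked)
```

A test would feed it the bytes read from `/proc/self/environ` inside the agent's shell and assert that the result is an empty list.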
## Non-goals
- **AWS / SigV4.** Per-request signing is a different shape; a
bearer-injecting proxy doesn't help. Hold for a future PRD
(likely an IMDS emulator sidecar handing out short-lived STS
credentials).
- **DB-backed credential store.** Flat env / mode-600 file only.
The LiteLLM CVE-2026-42208 incident is the cautionary tale:
any DB-backed credential gateway is itself a high-value attack
target.
- **Generic LLM-gateway features.** No cost tracking, no
fallbacks, no virtual keys, no multi-tenant routing, no usage
metering. The proxy is a credential-injection trust endpoint,
not a gateway.
- **Subsuming pipelock.** pipelock keeps its egress-allowlist
role. It drops the `api.anthropic.com` TLS-MITM job because
cred-proxy is now the trust endpoint for that host; everything
else pipelock does stays.
- **TLS interception inside the bottle.** The agent talks plain
HTTP to loopback; cred-proxy speaks real HTTPS outbound. No
container-local CA, no `golang/go#28866` loopback workaround.
- **Cross-bottle credential sharing.** One proxy per bottle, same
one-sidecar-per-agent posture as pipelock and git-gate.
- **`claude --bare` mode.** Reads only `ANTHROPIC_API_KEY`, not
the OAuth token. Not in claude-bottle's flow today.
- **MCP-server tokens, package-installer tokens for languages
beyond npm.** PyPI / Bun / cargo can land in a follow-up if
needed; the routing pattern generalizes.
## Scope
### In scope
- **Manifest field.** `bottle.tokens: [TokenEntry, ...]`. Each
entry carries `Kind` (`anthropic` | `github` | `gitea` |
`npm`), an optional `Url` (required for `gitea`, defaulted for
the others), and `TokenRef` (the name of a host env var the
CLI resolves at launch time).
- **cred-proxy process.** Runs as root inside the agent
container, listens on `127.0.0.1:<PORT>`. Holds the tokens in
its own environ — never on argv, never written to disk.
Per-`Kind` route handler: inject the right header, forward
over TLS, stream the response back to the client without
buffering.
- **Agent-side rewrites.** Provisioner writes:
  - `ANTHROPIC_BASE_URL=http://127.0.0.1:<PORT>/anthropic` to
    the agent's environ
  - `~/.npmrc` `registry = http://127.0.0.1:<PORT>/npm/`
  - `~/.gitconfig` `[url …] insteadOf = …` for each declared
    `github` / `gitea` upstream
  - `~/.config/tea/config.yml` with the proxy URL for each
    declared `gitea` entry
- **Process lifecycle.** Container entrypoint launches the proxy
first as root, waits for it to bind, then `exec setpriv …
--reuid=node --regid=node …` for the claude child. Proxy
death is fatal (the container exits); this is also the
PID-1-zombie story.
- **pipelock interop.** Drop `api.anthropic.com` from pipelock's
TLS-MITM list; keep it on the allowlist as a plain HTTPS host
(cred-proxy is the trust endpoint now). Verify pipelock still
lets cred-proxy's HTTPS connections out to every upstream host
in the routing table.
- **Plan rendering.** `bottle_plan.py` and the y/N preflight
show: which tokens are configured (kind + ref name, not the
value), the proxy port, the routes the proxy will publish.
- **Drop the existing `CLAUDE_CODE_OAUTH_TOKEN` forward in
`prepare.py`.** Today it lands in the agent's environ; once
this PRD ships, it lands in the proxy's environ instead.
- **Tests.** Integration tests for each of the six success
criteria; unit tests for manifest parsing, route table
generation, header injection.
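
The agent-side rewrites above are plain text rendering. A minimal sketch of the `~/.npmrc` and `~/.gitconfig` pieces follows, assuming hypothetical helper names (`render_npmrc`, `render_gitconfig_insteadof`) and a route mapping of proxy path to upstream base URL; the real provisioner may structure this differently.

```python
def render_npmrc(port: int) -> str:
    """Render the registry line that points npm at the proxy's /npm/ route."""
    return f"registry = http://127.0.0.1:{port}/npm/\n"


def render_gitconfig_insteadof(port: int, routes: dict[str, str]) -> str:
    """Render one [url] block per declared upstream.

    `routes` maps a proxy path prefix (e.g. "/gh-git/") to the upstream
    base URL git should rewrite (e.g. "https://github.com/").
    """
    blocks = []
    for proxy_path, upstream in routes.items():
        blocks.append(
            f'[url "http://127.0.0.1:{port}{proxy_path}"]\n'
            f"\tinsteadOf = {upstream}\n"
        )
    return "".join(blocks)
```

With these blocks in place, any git command addressed to the upstream URL is transparently rewritten to the loopback proxy URL, which is why a single `insteadOf` entry covers push, fetch, and clone alike.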
### Out of scope
- AWS / SigV4 (see Non-goals).
- Per-method / per-path allowlist *inside* a kind. Defer to a
follow-up once observed traffic stabilizes.
- Replacing `bottle.env` for non-token secrets. The proxy
handles the four kinds listed above; other env vars keep their
current path.
- Migrating an in-flight bottle from "token in agent env" to
"token via proxy" mid-session. Restart required.
- Audit logging. The proxy doesn't write request logs in v1.
Add only if a concrete debugging need surfaces.
## Proposed Design
### Architecture
```
┌── Host (macOS) ──────────────────────────────────────────────────┐
│  Secrets at rest (keychain / .env):                             │
│    CLAUDE_BOTTLE_OAUTH_TOKEN, GITHUB_TOKEN,                      │
│    GITEA_SERVER_TOKEN, NPM_TOKEN                                 │
│                    │  docker run -e KEY (no =VALUE on argv)      │
│                    ▼                                             │
│  ┌── Bottle container ────────────────────────────────────────┐  │
│  │                                                            │  │
│  │  ┌── UID 1000 (node) ───────────────────────────────────┐  │  │
│  │  │  claude --dangerously-skip-permissions               │  │  │
│  │  │  environ: URLs only, no plaintext tokens             │  │  │
│  │  │    ANTHROPIC_BASE_URL=http://127.0.0.1:PORT/anth..   │  │  │
│  │  │    npm registry   → http://127.0.0.1:PORT/npm/       │  │  │
│  │  │    git remote.url → http://127.0.0.1:PORT/...        │  │  │
│  │  │    tea --url      → http://127.0.0.1:PORT/gitea      │  │  │
│  │  └──────────────┬───────────────────────────────────────┘  │  │
│  │                 │  plain HTTP, loopback                    │  │
│  │                 ▼                                          │  │
│  │  ┌── UID 0 (root) ──────────────────────────────────────┐  │  │
│  │  │  cred-proxy listens 127.0.0.1:PORT                   │  │  │
│  │  │  tokens live ONLY in this process's environ          │  │  │
│  │  │  per-route: inject auth header, forward over TLS    │  │  │
│  │  │    /anthropic → api.anthropic.com   Bearer           │  │  │
│  │  │    /gh-api    → api.github.com      Bearer           │  │  │
│  │  │    /gh-git    → github.com          Bearer           │  │  │
│  │  │    /gitea     → gitea.dideric.is    token            │  │  │
│  │  │    /npm       → registry.npmjs.org  Bearer           │  │  │
│  │  │  SSE pass-through, no buffering                      │  │  │
│  │  └──────────────┬───────────────────────────────────────┘  │  │
│  │                 │  HTTPS                                   │  │
│  │                 ▼                                          │  │
│  │  ┌── pipelock (egress allowlist) ───────────────────────┐  │  │
│  │  │  allow: api.anthropic.com, api.github.com,           │  │  │
│  │  │         github.com, gitea.dideric.is,                │  │  │
│  │  │         registry.npmjs.org                           │  │  │
│  │  │  block: statsig, sentry, autoupdater, *              │  │  │
│  │  └──────────────┬───────────────────────────────────────┘  │  │
│  └─────────────────┼──────────────────────────────────────────┘  │
│                    ▼                                             │
└────────────────────┼─────────────────────────────────────────────┘
               Upstream APIs

Why node@1000 can't just steal the tokens:
┌───────────────────────────────────────────────────────────┐
│  node tries:                                              │
│    cat /proc/<cred-proxy-pid>/environ           → EACCES  │
│    ptrace(PTRACE_ATTACH, <cred-proxy-pid>, ...) → EPERM   │
│  Kernel's ptrace_may_access rejects: UID mismatch         │
│  and no CAP_SYS_PTRACE / CAP_PERFMON in the container.    │
└───────────────────────────────────────────────────────────┘
```
### New components
- **`claude_bottle/cred_proxy.py`** (new): abstract `CredProxy`
+ `CredProxyPlan` dataclass. `prepare` is host-side and
side-effect-free on Docker; renders the route table and
resolves `TokenRef`s against host env. Mirrors the existing
`GitGate` / `Pipelock` shape.
- **`claude_bottle/backend/docker/cred_proxy.py`** (new):
`DockerCredProxy` concrete subclass. Bakes the proxy binary
into the agent image; `start` writes the route table to a
mode-600 file under `stage_dir` and arranges the entrypoint
so the proxy boots first.
- **`claude_bottle/backend/docker/provision/cred_proxy.py`**
(new): renders `ANTHROPIC_BASE_URL`, `~/.npmrc`,
`~/.gitconfig` `insteadOf` blocks, and `~/.config/tea/config.yml`
into the agent's home for each declared kind.
- **The proxy binary itself.** Bundled into the agent image at
`/usr/local/libexec/cred-proxy`. See "External dependencies"
for the language choice.
### Existing code touched
- **`claude_bottle/manifest.py`** — add `TokenEntry`,
`Bottle.tokens: tuple[TokenEntry, ...] = ()`, parse + validate
(at most one entry per `Kind` except `gitea`, which may
carry multiple Urls).
- **`claude_bottle/backend/docker/prepare.py`** — delete the
`CLAUDE_BOTTLE_OAUTH_TOKEN` → `CLAUDE_CODE_OAUTH_TOKEN` branch
in the agent's forwarded env. The OAuth token now flows to
the proxy's environ via the cred-proxy lifecycle.
- **`claude_bottle/backend/docker/backend.py`** — instantiate
`DockerCredProxy`; thread its `prepare` / `start` / `stop`
through `resolve_plan` / `launch`.
- **`claude_bottle/backend/docker/launch.py`** — add cred-proxy
start before the cred-proxy provisioner runs (the provisioner
writes URLs that reference the proxy port, so the proxy must already be up).
- **`claude_bottle/backend/docker/bottle_plan.py`** — new
`CredProxyPlan` field; preflight shows kind + ref name +
port + route table.
- **`claude_bottle/pipelock.py`** — drop the `api.anthropic.com`
TLS-MITM branch; the host stays on the allowlist as a plain
HTTPS destination. Confirm every upstream host in the routing table is
allowlisted by default when `bottle.tokens` declares its kind.
- **`README.md`** — replace the architecture diagram with the
one above; document the `bottle.tokens` field.
- **`claude-bottle.example.json`** — add a `tokens` array to
one bottle showing each Kind.
- **Tests** — new unit tests for manifest parsing, route table
generation, header injection; new integration tests for the
six success criteria. Delete the bits of `prepare.py` tests
that asserted on `CLAUDE_CODE_OAUTH_TOKEN` landing in the
agent's env.
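
The ordering constraint in `launch.py` (proxy bound before the provisioner or the agent runs) needs some readiness probe. One possible shape is a TCP connect loop, sketched below; `wait_for_bind` and its default timings are assumptions for illustration, not existing code.

```python
import socket
import time


def wait_for_bind(
    port: int,
    host: str = "127.0.0.1",
    timeout: float = 5.0,
    interval: float = 0.05,
) -> bool:
    """Poll until something accepts TCP connections on host:port.

    Returns True once a connection succeeds, False if `timeout` seconds
    pass first. The caller decides whether False is fatal.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the proxy is not up yet.
            time.sleep(interval)
    return False
```

Since proxy death is meant to be fatal, a launcher built on this would presumably abort the bottle when the function returns `False` rather than start the agent without its credential path.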
### Data model changes
```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class TokenEntry:
    Kind: Literal["anthropic", "github", "gitea", "npm"]
    TokenRef: str            # name of host env var
    Url: str | None = None   # required for gitea; defaulted otherwise


@dataclass(frozen=True)
class Bottle:
    ...  # existing fields unchanged
    tokens: tuple[TokenEntry, ...] = ()
```
Validation:
- `Kind` must be one of the four supported values.
- `TokenRef` must resolve against `os.environ` at launch (fail
fast with a clear "host env var X is unset" if missing).
- `gitea` entries require `Url`; others fall back to the
documented upstream.
- At most one entry per `Kind` except `gitea`, which may have
multiple distinct `Url`s.
- No silent overlap with `bottle.git` upstreams that already
flow through git-gate; if a `tokens[].Kind: github|gitea`
entry's `Url` collides with a `git[].Upstream`'s host, parse
fails with a "git-gate already brokers this remote, drop one"
hint. (Both paths broker credentials; doubling up is a
configuration smell, not a feature.)
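
The validation rules above can be sketched as a single pass over the entries. This is an illustration under stated assumptions: `ManifestError`, `validate_tokens`, and the `git_gate_hosts` parameter are hypothetical names, a `github` entry without `Url` is assumed to target `github.com`, and `Url` values are assumed to carry a scheme so that `urlsplit` can extract the host.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Optional
from urllib.parse import urlsplit

KINDS = ("anthropic", "github", "gitea", "npm")


class ManifestError(ValueError):
    """Raised when bottle.tokens fails validation."""


@dataclass(frozen=True)
class TokenEntry:
    Kind: str
    TokenRef: str
    Url: Optional[str] = None


def validate_tokens(
    tokens: Iterable[TokenEntry],
    host_env: Mapping[str, str],
    git_gate_hosts: frozenset = frozenset(),
) -> None:
    """Raise ManifestError on the first rule an entry violates."""
    seen = set()
    for entry in tokens:
        if entry.Kind not in KINDS:
            raise ManifestError(f"unsupported Kind {entry.Kind!r}")
        if not host_env.get(entry.TokenRef):
            raise ManifestError(f"host env var {entry.TokenRef} is unset")
        if entry.Kind == "gitea" and not entry.Url:
            raise ManifestError("gitea entries require Url")
        # gitea is keyed by (Kind, Url); every other kind by Kind alone.
        key = (entry.Kind, entry.Url if entry.Kind == "gitea" else None)
        if key in seen:
            raise ManifestError(f"duplicate entry for {entry.Kind}")
        seen.add(key)
        if entry.Kind in ("github", "gitea"):
            # Assumption: a github entry without Url targets github.com.
            host = urlsplit(entry.Url).hostname if entry.Url else "github.com"
            if host in git_gate_hosts:
                raise ManifestError(f"git-gate already brokers {host}, drop one")
```

Passing the host environment in as a mapping, rather than reading `os.environ` directly, keeps the function testable without touching real secrets.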
### Routing table
| Kind | Proxy path | Upstream | Header |
|-----------|----------------|-------------------------|----------------------------|
| anthropic | `/anthropic/` | `api.anthropic.com` | `Authorization: Bearer …` |
| github | `/gh-api/` | `api.github.com` | `Authorization: Bearer …` |
| github | `/gh-git/` | `github.com` | `Authorization: Bearer …` |
| gitea | `/gitea/<Url>` | configured `Url` | `Authorization: token …` |
| npm | `/npm/` | `registry.npmjs.org` | `Authorization: Bearer …` |
Gitea uses `Authorization: token` rather than `Bearer` to
sidestep `go-gitea/gitea#16734`. The proxy strips any incoming
`Authorization` header before injecting its own — the agent
cannot smuggle a stolen token through this path.
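
The strip-then-inject rule and the prefix routing can be sketched together. The function names and the `ROUTES` dict shape below are assumptions for illustration; only the fixed-upstream rows of the table are included, since `gitea` routes depend on the manifest's `Url`.

```python
from typing import Optional

# Proxy path prefix -> (upstream host, Authorization scheme), per the table.
ROUTES = {
    "/anthropic/": ("api.anthropic.com", "Bearer"),
    "/gh-api/": ("api.github.com", "Bearer"),
    "/gh-git/": ("github.com", "Bearer"),
    "/npm/": ("registry.npmjs.org", "Bearer"),
}


def match_route(path: str) -> Optional[tuple[str, str, str]]:
    """Return (upstream host, scheme, upstream path), or None if unrouted."""
    for prefix, (host, scheme) in ROUTES.items():
        if path.startswith(prefix):
            return host, scheme, "/" + path[len(prefix):]
    return None


def inject_authorization(
    incoming: list[tuple[str, str]], scheme: str, token: str
) -> list[tuple[str, str]]:
    """Drop every incoming Authorization header, then add the proxy's own.

    Header names are compared case-insensitively, as HTTP requires.
    """
    kept = [(k, v) for k, v in incoming if k.lower() != "authorization"]
    kept.append(("Authorization", f"{scheme} {token}"))
    return kept
```

Unrouted paths return `None` so the caller can answer with an error instead of forwarding anywhere by default.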
### External dependencies
The proxy binary. Two real options:
- **Python (stdlib)** — `http.server` + `urllib`/`http.client`,
no new pip packages. Matches CLAUDE.md's "bash-first, low-deps"
posture. SSE pass-through is fiddly but doable.
- **Go single binary** — cleaner SSE story, smaller runtime,
one static binary baked into the image. New build dependency.

Default: Python, baked into the agent image. Reconsider in the
implementation PR if SSE behavior is troublesome under load.

No new Python packages. No DB. No admin API. The proxy's
configuration is a single mode-600 JSON file passed in via
`/run/cred-proxy/routes.json`.
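
For the stdlib option, the "fiddly" part of SSE pass-through is mostly avoiding reads that block until a full buffer arrives. A minimal sketch of an unbuffered relay loop follows, assuming the upstream object exposes `read1` (as `http.client.HTTPResponse` and `io.BytesIO` both do); `relay_stream` is a hypothetical helper name.

```python
import io


def relay_stream(
    upstream: io.BufferedIOBase,
    downstream: io.BufferedIOBase,
    chunk_size: int = 8192,
) -> int:
    """Copy upstream to downstream chunk by chunk, flushing after each write.

    read1() returns whatever bytes are available (up to chunk_size) instead
    of waiting to fill the buffer, so each SSE event is forwarded as soon
    as it arrives. Returns the number of bytes relayed.
    """
    total = 0
    while True:
        chunk = upstream.read1(chunk_size)
        if not chunk:
            break  # upstream closed the stream
        downstream.write(chunk)
        downstream.flush()
        total += len(chunk)
    return total
```

In a handler built on `http.server`, `downstream` would be the request handler's `wfile`; whether this holds up under load is exactly the question the implementation PR is meant to answer.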
## Future work
- **AWS / SigV4.** Likely an IMDS emulator sidecar handing out
short-lived STS tokens. Different threat model (the agent
ends up holding the STS creds — the proxy just shortens
their lifetime). Separate PRD.
- **Per-method / per-path allowlist** inside a kind. Once the
set of API operations claude actually performs is observed,
reject everything else. Narrows the within-allowlist surface.
- **Short-lived token minting.** For services that support it
(GitHub Apps, GitLab project-access tokens, fine-grained
PATs with TTL), have the proxy mint a fresh per-session
child credential from a long-lived parent.
- **Smolmachines colocation.** Same packing question as
pipelock / git-gate; the cred-proxy can sit inside the agent
VM (current shape) or in a separate VM (stricter isolation,
per-bottle TCP hop). Backend decision, not a manifest decision.
- **More kinds.** PyPI, Bun, cargo, Docker Hub. The routing
pattern generalizes; add as needed.
## Open questions
- **Field name.** `bottle.tokens` is the working name. The
research note used `bottle.forge` for the gitea/github
generalization, but "forge" doesn't fit `anthropic` or
`npm`. Alternatives: `bottle.brokered`, `bottle.upstreams`,
`bottle.cred_proxy`. Default: `bottle.tokens`.
- **Python vs Go for the proxy.** Default: Python, revisit
during implementation if SSE pass-through is unreliable.
- **Process inside the agent container vs sidecar container.**
v1: inside (simpler lifecycle, no extra container; ptrace
boundary is enough). The sidecar option becomes attractive
only if we want a network-layer split between proxy and agent
on top of the UID split.
- **Belt-and-braces on outbound telemetry.** Set
`CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1` and
`DISABLE_ERROR_REPORTING=1` in the agent's environ by
default? Default: yes. That traffic doesn't route through
`ANTHROPIC_BASE_URL`, so the proxy never sees it; the
flags are the only off switch.
- **`git push` over a rewritten URL vs. credential-helper
shim.** `[url "http://…"] insteadOf = "https://github.com/"`
captures push/fetch/clone/pull/ls-remote in one config knob;
a credential helper would need separate wiring. Default:
`insteadOf`.
- **Token-refresh story for the Anthropic OAuth token.** The
token is valid for roughly one year and there's no client-side
refresh, so the proxy holds a static value. The one-year blast
radius is the cost, documented in
[`claude-code-token-revocation.md`](../research/claude-code-token-revocation.md).
No design change here; flagged for awareness.
- **`anthropics/claude-code#36998`.** Older claude-code
versions bypassed `ANTHROPIC_BASE_URL` for some startup
calls (auth validation, org lookup). Marked closed upstream;
the implementation PR verifies with `strace -e connect`
against the pinned claude-code build before trusting the
isolation.
## References
- [`docs/research/agent-credential-proxy-landscape.md`](../research/agent-credential-proxy-landscape.md)
— landscape research; this PRD is the build path that note
recommends.
- [`docs/research/secret-minimization-over-dlp.md`](../research/secret-minimization-over-dlp.md)
— architectural framing: why moving the credential matters
more than scanning egress.
- PRD 0006: pipelock TLS interception — the
`api.anthropic.com` TLS-MITM responsibility cred-proxy takes
over.
- PRD 0008: Git gate — the credential-broker pattern this PRD
reuses (gate holds creds, agent gets a rewritten URL, gate
makes the upstream connection).
- [`anthropics/claude-code#36998`](https://github.com/anthropics/claude-code/issues/36998)
— historic `ANTHROPIC_BASE_URL` bypass.
- [`go-gitea/gitea#16734`](https://github.com/go-gitea/gitea/issues/16734)
— why Gitea uses `Authorization: token`, not `Bearer`.
- [`golang/go#28866`](https://github.com/golang/go/issues/28866)
— the `HTTPS_PROXY` loopback bug; not hit here because we're
a reverse proxy, not a forward proxy.