bot-bottle/docs/prds/0010-cred-proxy.md
didericis fe9d05664c
docs: switch cred-proxy to sidecar shape
Make the cred-proxy a per-bottle sidecar container on the bottle's
internal docker network instead of a root-owned process inside the
agent container. The boundary becomes container namespace
separation, matching pipelock and git-gate. Update summary,
problem, goals, in-scope, architecture diagram, components,
existing code touched, external deps, and open questions; add a
"Considered alternatives" section recording the rejected
in-container shape.
2026-05-13 00:40:16 -04:00

# PRD 0010: Credential proxy for agent-bound API tokens
- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-13
## Summary
Per-bottle sidecar container that holds API tokens (Anthropic
OAuth, GitHub PAT, Gitea PAT, npm token). The agent container
keeps only URLs in its environ; the sidecar injects the right
`Authorization` header and forwards over TLS to the upstream. The
boundary is the container line — PID, mount, and network
namespaces separate the agent's container from the sidecar's, so
from inside the agent the sidecar's processes are not visible in
`/proc`, cannot be `ptrace`'d, and share no memory. Reaching the
sidecar's environ requires escaping the agent container — the same
threshold pipelock and git-gate already rely on.
AWS / SigV4 is explicitly out of scope — it is per-request signing,
not header injection, and does not fit this proxy's shape. If a
bottle needs AWS credentials later, that lives in a separate PRD.
## Problem
Today `CLAUDE_CODE_OAUTH_TOKEN` (and any `bottle.env` secrets such
as a Gitea PAT, GitHub PAT, or npm token) gets `docker run -e`'d
straight into the agent's environ. Inside the bottle the agent
runs as `node` with `--dangerously-skip-permissions`; its Bash
tool can do `printenv`, `cat /proc/self/environ`, or
`node -e 'console.log(process.env)'` and capture every value into
the conversation. From there a prompt-injected or hijacked agent
can exfil over any allowed egress (api.anthropic.com itself if
nothing else).
Linux has no per-env-var ACL — once a variable is in a process's
environ, the process and its descendants own it. The credible
boundary is container-level: hold the credential in a separate
container the agent cannot reach. Docker's default namespace
isolation enforces that; pipelock and git-gate already rely on
the same property.
The research note
[`agent-credential-proxy-landscape.md`](../research/agent-credential-proxy-landscape.md)
surveys the existing tools and concludes that a small
claude-bottle-specific reverse proxy is less work and less risk
than either adopting nono (alpha, unaudited) or Infisical Agent
Vault (TLS-MITM topology that doubles up on pipelock's CA stack).
This PRD is the build.
## Goals / Success Criteria
Each test runs inside a bottle whose manifest declares the four
supported kinds (anthropic, github, gitea, npm):
1. **No plaintext tokens in the agent's environ.** `printenv` and
`cat /proc/self/environ` from the agent's shell return only
URLs pointing at `cred-proxy:<PORT>/...`. None of the
`bottle.tokens[].TokenRef` values appear.
2. **Container boundary holds.** From the agent's shell, `ps aux`
does not list the cred-proxy process; there is no `/proc/<X>`
entry for it to read. The sidecar's hostname (`cred-proxy`)
resolves only on the bottle's internal network — from a
different bottle or from the host, the name does not resolve.
3. **Anthropic API works.** `claude` makes a successful streaming
tool-use round-trip via
`ANTHROPIC_BASE_URL=http://cred-proxy:<PORT>/anthropic`. SSE chunks arrive without
buffering; `anthropic-version`, `anthropic-beta`, and
`X-Claude-Code-Session-Id` headers round-trip untouched.
4. **Git push to declared remotes works.** `git push` against a
`bottle.tokens[].Kind: github` or `gitea` upstream succeeds;
the upstream sees the proxy's token, not the agent's.
5. **npm install works.** `npm install <public-package>`
succeeds against the registry pointed at the proxy. A scoped
install that requires the token (e.g. against a private
registry) also succeeds.
6. **Agent-supplied tokens are overridden; upstream rejections pass through.**
If the agent tries to send its own `Authorization: …` header,
the proxy strips it and injects the configured one. A
manifest token revoked at the upstream produces a 401 to the
agent, not a 5xx.
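
A minimal sketch of how criterion 1 might be asserted in an integration test, assuming the test can capture the agent's environ as a mapping and knows the resolved token values. `leaked_tokens` is a hypothetical helper name, and port 8080 and the token string are placeholders, not values from claude-bottle.

```python
from typing import Iterable, Mapping


def leaked_tokens(environ: Mapping[str, str], secrets: Iterable[str]) -> list[str]:
    """Return the names of environ variables whose value contains any secret."""
    # Ignore empty strings: "" is a substring of every value.
    secret_list = [s for s in secrets if s]
    return sorted(
        name
        for name, value in environ.items()
        if any(secret in value for secret in secret_list)
    )


# Hypothetical agent environ after provisioning: URLs only.
agent_env = {
    "ANTHROPIC_BASE_URL": "http://cred-proxy:8080/anthropic",
    "HOME": "/home/node",
}
print(leaked_tokens(agent_env, ["sk-ant-example-token"]))
```

A substring match rather than an equality check also catches a token embedded in a URL or a composite variable.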
## Non-goals
- **AWS / SigV4.** Per-request signing is a different shape; a
bearer-injecting proxy doesn't help. Hold for a future PRD
(likely an IMDS emulator sidecar handing out short-lived STS
credentials).
- **DB-backed credential store.** Flat env / mode-600 file only.
The LiteLLM CVE-2026-42208 incident is the cautionary tale:
any DB-backed credential gateway is itself a high-value attack
target.
- **Generic LLM-gateway features.** No cost tracking, no
fallbacks, no virtual keys, no multi-tenant routing, no usage
metering. The proxy is a credential-injection trust endpoint,
not a gateway.
- **Subsuming pipelock.** pipelock keeps its egress-allowlist
role. It drops the `api.anthropic.com` TLS-MITM job because
cred-proxy is now the trust endpoint for that host; everything
else pipelock does stays.
- **TLS interception inside the bottle.** The agent talks plain
HTTP to `cred-proxy` over the bottle's internal network; cred-proxy
speaks real HTTPS outbound. No
container-local CA, no `golang/go#28866` loopback workaround.
- **Cross-bottle credential sharing.** One proxy per bottle, same
one-sidecar-per-agent posture as pipelock and git-gate.
- **`claude --bare` mode.** Reads only `ANTHROPIC_API_KEY`, not
the OAuth token. Not in claude-bottle's flow today.
- **MCP-server tokens, package-installer tokens for languages
beyond npm.** PyPI / Bun / cargo can land in a follow-up if
needed; the routing pattern generalizes.
## Scope
### In scope
- **Manifest field.** `bottle.tokens: [TokenEntry, ...]`. Each
entry carries `Kind` (`anthropic` | `github` | `gitea` |
`npm`), an optional `Url` (required for `gitea`, defaulted for
the others), and `TokenRef` (the name of a host env var the
CLI resolves at launch time).
- **cred-proxy sidecar.** Runs as its own container on the
bottle's internal docker network with hostname `cred-proxy`,
listening on `0.0.0.0:<PORT>` bound to the internal interface.
No host port published. Holds the tokens in the sidecar
container's environ — never on argv, never written to disk.
Per-`Kind` route handler: inject the right header, forward
over TLS, stream the response back without buffering.
- **Agent-side rewrites.** Provisioner writes:
  - `ANTHROPIC_BASE_URL=http://cred-proxy:<PORT>/anthropic` to
    the agent's environ
  - `~/.npmrc` `registry = http://cred-proxy:<PORT>/npm/`
  - `~/.gitconfig` `[url …] insteadOf = …` for each declared
    `github` / `gitea` upstream
  - `~/.config/tea/config.yml` with the proxy URL for each
    declared `gitea` entry
- **Sidecar lifecycle.** Mirrors `DockerGitGate` /
`DockerPipelockProxy` in shape: `prepare` is host-side and
side-effect-free; `start` does `docker create` + `docker start`
on the bottle's internal network with hostname `cred-proxy`;
`stop` is idempotent `docker rm -f`. Container name:
`claude-bottle-cred-proxy-<slug>`. The agent container starts
after the sidecar is up so DNS resolution succeeds on the
agent's first call.
- **pipelock interop.** cred-proxy's outbound HTTPS still
traverses pipelock, which keeps its egress-allowlist role
for every upstream host in the routing table. Drop `api.anthropic.com` from
pipelock's TLS-MITM list (cred-proxy is now the trust endpoint
for that host); the host stays on the plain HTTPS allowlist.
- **Plan rendering.** `bottle_plan.py` and the y/N preflight
show: which tokens are configured (kind + ref name, not the
value), the proxy port, the routes the proxy will publish.
- **Drop the existing `CLAUDE_CODE_OAUTH_TOKEN` forward in
`prepare.py`.** Today it lands in the agent's environ; once
this PRD ships, it lands in the cred-proxy sidecar's environ
instead.
- **Tests.** Integration tests for each of the six success
criteria; unit tests for manifest parsing, route table
generation, header injection.
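
The agent-side rewrites listed above could be rendered roughly as follows. This is a sketch only: `render_npmrc` and `render_gitconfig` are hypothetical helper names, and port 8080 stands in for `<PORT>`.

```python
def render_npmrc(proxy_base: str) -> str:
    """Point npm's default registry at the proxy's /npm/ route."""
    return f"registry={proxy_base}/npm/\n"


def render_gitconfig(proxy_base: str, rewrites: dict[str, str]) -> str:
    """Emit one [url ...] insteadOf block per declared upstream.

    `rewrites` maps a proxy path (e.g. "/gh-git/") to the upstream URL
    prefix that git should transparently replace.
    """
    blocks = []
    for proxy_path, upstream_prefix in rewrites.items():
        blocks.append(
            f'[url "{proxy_base}{proxy_path}"]\n'
            f"\tinsteadOf = {upstream_prefix}\n"
        )
    return "".join(blocks)


base = "http://cred-proxy:8080"  # placeholder port
print(render_npmrc(base), end="")
print(render_gitconfig(base, {"/gh-git/": "https://github.com/"}), end="")
```

With this shape, the agent keeps using ordinary `https://github.com/...` remotes and git rewrites them to the proxy URL at request time.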
### Out of scope
- AWS / SigV4 (see Non-goals).
- Per-method / per-path allowlist *inside* a kind. Defer to a
follow-up once observed traffic stabilizes.
- Replacing `bottle.env` for non-token secrets. The proxy
handles the four kinds listed above; other env vars keep their
current path.
- Migrating an in-flight bottle from "token in agent env" to
"token via proxy" mid-session. Restart required.
- Audit logging. The proxy doesn't write request logs in v1.
Add only if a concrete debugging need surfaces.
## Proposed Design
### Architecture
```
┌── Host (macOS) ──────────────────────────────────────────────────┐
│  Secrets at rest (keychain / .env):                              │
│    CLAUDE_BOTTLE_OAUTH_TOKEN, GITHUB_TOKEN,                      │
│    GITEA_SERVER_TOKEN, NPM_TOKEN                                 │
│      │  docker run -e KEY  (no =VALUE on argv)                   │
│      ▼                                                           │
│  ┌── per-bottle internal docker network ──────────────────────┐  │
│  │                                                            │  │
│  │  ┌── agent container ───────────────────────────────────┐  │  │
│  │  │  claude as node (UID 1000)                           │  │  │
│  │  │    --dangerously-skip-permissions                    │  │  │
│  │  │  environ: URLs only, no plaintext tokens             │  │  │
│  │  │    ANTHROPIC_BASE_URL=http://cred-proxy:PORT/an..    │  │  │
│  │  │    npm registry  → http://cred-proxy:PORT/npm/       │  │  │
│  │  │    git insteadOf → http://cred-proxy:PORT/...        │  │  │
│  │  │    tea --url     → http://cred-proxy:PORT/gite       │  │  │
│  │  └────────────┬─────────────────────────────────────────┘  │  │
│  │               │  HTTP, DNS → cred-proxy                    │  │
│  │               ▼                                            │  │
│  │  ┌── cred-proxy sidecar ────────────────────────────────┐  │  │
│  │  │  distroless image, no shell, runs as root            │  │  │
│  │  │  hostname: cred-proxy   listens 0.0.0.0:PORT         │  │  │
│  │  │  tokens live ONLY in this container's environ        │  │  │
│  │  │    /anthropic → api.anthropic.com    Bearer          │  │  │
│  │  │    /gh-api    → api.github.com       Bearer          │  │  │
│  │  │    /gh-git    → github.com           Bearer          │  │  │
│  │  │    /gitea     → gitea.dideric.is     token           │  │  │
│  │  │    /npm       → registry.npmjs.org   Bearer          │  │  │
│  │  │  SSE pass-through, no buffering                      │  │  │
│  │  └────────────┬─────────────────────────────────────────┘  │  │
│  │               │  HTTPS                                     │  │
│  │               ▼                                            │  │
│  │  ┌── pipelock sidecar (egress allowlist) ───────────────┐  │  │
│  │  │  allow: api.anthropic.com, api.github.com,           │  │  │
│  │  │         github.com, gitea.dideric.is,                │  │  │
│  │  │         registry.npmjs.org                           │  │  │
│  │  │  block: statsig, sentry, autoupdater, *              │  │  │
│  │  └────────────┬─────────────────────────────────────────┘  │  │
│  └───────────────┼────────────────────────────────────────────┘  │
│                  ▼                                               │
└──────────────────┼───────────────────────────────────────────────┘
             Upstream APIs

Why the agent can't reach the sidecar's environ:
┌───────────────────────────────────────────────────────────────┐
│  Different container = different PID, mount, and network ns.  │
│  The agent's /proc shows only the agent's own processes;      │
│  the cred-proxy PID is not visible: no /proc/<X>/environ      │
│  to read, no PID to ptrace, no shared memory.                 │
│                                                               │
│  Reaching the sidecar's environ requires escaping the agent   │
│  container, the same threshold pipelock and git-gate rely     │
│  on. Default Docker isolation is the boundary.                │
└───────────────────────────────────────────────────────────────┘
```
### New components
- **`claude_bottle/cred_proxy.py`** (new): abstract `CredProxy`
+ `CredProxyPlan` dataclass. `prepare` is host-side and
side-effect-free; renders the route table and resolves
`TokenRef`s against host env. Mirrors the existing `GitGate` /
`Pipelock` shape.
- **`claude_bottle/backend/docker/cred_proxy.py`** (new):
`DockerCredProxy` concrete subclass. `start` does
`docker create` on the bottle's internal network with hostname
`cred-proxy`, copies the route-table file into the container,
then `docker start`. `stop` is idempotent `docker rm -f`.
Container name: `claude-bottle-cred-proxy-<slug>`.
- **`claude_bottle/backend/docker/provision/cred_proxy.py`**
(new): renders `ANTHROPIC_BASE_URL`, `~/.npmrc`,
`~/.gitconfig` `insteadOf` blocks, and `~/.config/tea/config.yml`
into the agent's home for each declared kind — all pointing at
`http://cred-proxy:<PORT>/...`.
- **cred-proxy image.** Minimal base + the proxy binary, no
shell. Pinned by digest, baked at build time. Footprint sized
to match git-gate's image rather than the full agent image.
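
As a rough illustration of the `start` step, the `docker create` argv for the sidecar could be assembled along these lines. `build_create_argv` and the example network and image names are hypothetical; the flags (`--name`, `--network`, `--hostname`, `-e`) are standard Docker CLI options. Passing `-e KEY` without `=VALUE` makes Docker copy the value from the calling process's environment, so the token never appears on argv.

```python
def build_create_argv(
    slug: str,
    network: str,
    image: str,
    token_env_names: list[str],
) -> list[str]:
    """Build the `docker create` command line for the cred-proxy sidecar."""
    argv = [
        "docker", "create",
        "--name", f"claude-bottle-cred-proxy-{slug}",
        "--network", network,
        "--hostname", "cred-proxy",
        # Deliberately no -p/--publish: the port stays internal to the network.
    ]
    for name in token_env_names:
        if "=" in name:
            # Guard against a value sneaking onto the command line.
            raise ValueError(f"expected a bare env var name, got {name!r}")
        argv += ["-e", name]
    argv.append(image)
    return argv


print(build_create_argv(
    "demo",
    "bottle-demo-net",           # placeholder network name
    "cred-proxy@sha256:...",     # placeholder digest-pinned image reference
    ["CLAUDE_BOTTLE_OAUTH_TOKEN", "GITHUB_TOKEN"],
))
```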
### Existing code touched
- **`claude_bottle/manifest.py`** — add `TokenEntry`,
`Bottle.tokens: tuple[TokenEntry, ...] = ()`, parse + validate
(at most one entry per `Kind` except `gitea`, which may
carry multiple Urls).
- **`claude_bottle/backend/docker/prepare.py`** — delete the
`CLAUDE_BOTTLE_OAUTH_TOKEN` → `CLAUDE_CODE_OAUTH_TOKEN` branch
in the agent's forwarded env. The OAuth token is forwarded
into the cred-proxy sidecar's environ at sidecar `docker create`
time instead.
- **`claude_bottle/backend/docker/backend.py`** — instantiate
`DockerCredProxy` alongside `DockerPipelockProxy` and
`DockerGitGate`; thread its `prepare` / `start` / `stop`
through `resolve_plan` / `launch`.
- **`claude_bottle/backend/docker/launch.py`** — add cred-proxy
start/stop to the `ExitStack` alongside pipelock and git-gate;
the sidecar must be up before the agent container starts so
DNS resolution for `cred-proxy` succeeds on first contact.
- **`claude_bottle/backend/docker/bottle_plan.py`** — new
`CredProxyPlan` field; preflight shows kind + ref name +
port + route table.
- **`claude_bottle/pipelock.py`** — drop the `api.anthropic.com`
TLS-MITM branch; the host stays on the allowlist as a plain
HTTPS destination. Confirm every upstream host in the routing
table is allowlisted by default when `bottle.tokens` declares its kind.
- **`README.md`** — replace the architecture diagram with the
one above; document the `bottle.tokens` field.
- **`claude-bottle.example.json`** — add a `tokens` array to
one bottle showing each Kind.
- **Tests** — new unit tests for manifest parsing, route table
generation, header injection; new integration tests for the
six success criteria. Delete the bits of `prepare.py` tests
that asserted on `CLAUDE_CODE_OAUTH_TOKEN` landing in the
agent's env.
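
The start/stop ordering described for `launch.py` can be illustrated with `contextlib.ExitStack`. This sketch uses a stand-in `running` context manager rather than the real sidecar classes, and the relative order of the three sidecars is an assumption; the property it shows is that every sidecar is up before the agent runs and that teardown happens in reverse.

```python
from contextlib import ExitStack, contextmanager


@contextmanager
def running(name: str, log: list[str]):
    """Stand-in for a sidecar's start/stop pair; records what happened."""
    log.append(f"start {name}")
    try:
        yield
    finally:
        # Runs even if the agent step raises, mirroring an idempotent stop.
        log.append(f"stop {name}")


def launch(log: list[str]) -> list[str]:
    with ExitStack() as stack:
        for sidecar in ("pipelock", "git-gate", "cred-proxy"):
            stack.enter_context(running(sidecar, log))
        # Only reached once every sidecar has started.
        log.append("run agent")
    return log


for event in launch([]):
    print(event)
```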
### Data model changes
```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class TokenEntry:
    Kind: Literal["anthropic", "github", "gitea", "npm"]
    TokenRef: str           # name of host env var
    Url: str | None = None  # required for gitea; defaulted otherwise


@dataclass(frozen=True)
class Bottle:
    ...  # existing fields unchanged
    tokens: tuple[TokenEntry, ...] = ()
```
Validation:
- `Kind` must be one of the four supported values.
- `TokenRef` must resolve against `os.environ` at launch (fail
fast with a clear "host env var X is unset" if missing).
- `gitea` entries require `Url`; others fall back to the
documented upstream.
- At most one entry per `Kind` except `gitea`, which may have
multiple distinct `Url`s.
- No silent overlap with `bottle.git` upstreams that already
flow through git-gate; if a `tokens[].Kind: github|gitea`
entry's `Url` collides with a `git[].Upstream`'s host, parse
fails with a "git-gate already brokers this remote, drop one"
hint. (Both paths broker credentials; doubling up is a
configuration smell, not a feature.)
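
A sketch of how the first four validation rules might look in code. `validate_tokens` is a hypothetical function that collects error strings rather than raising, the `TokenEntry` here is a simplified local copy, and the git-gate overlap check is omitted because it depends on the `bottle.git` model.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence

SUPPORTED_KINDS = ("anthropic", "github", "gitea", "npm")


@dataclass(frozen=True)
class TokenEntry:
    Kind: str
    TokenRef: str
    Url: Optional[str] = None


def validate_tokens(
    entries: Sequence[TokenEntry], environ: Mapping[str, str]
) -> list[str]:
    """Return a list of human-readable validation errors (empty if valid)."""
    errors: list[str] = []
    seen: set[tuple[str, Optional[str]]] = set()
    for entry in entries:
        if entry.Kind not in SUPPORTED_KINDS:
            errors.append(f"unsupported Kind {entry.Kind!r}")
            continue
        if entry.TokenRef not in environ:
            errors.append(f"host env var {entry.TokenRef} is unset")
        if entry.Kind == "gitea" and not entry.Url:
            errors.append("gitea entry requires Url")
        # gitea may repeat with distinct Urls; every other kind appears once.
        key = (entry.Kind, entry.Url if entry.Kind == "gitea" else None)
        if key in seen:
            errors.append(f"duplicate entry for Kind {entry.Kind!r}")
        seen.add(key)
    return errors


host_env = {"GITHUB_TOKEN": "placeholder"}
print(validate_tokens([TokenEntry("gitea", "GITEA_SERVER_TOKEN")], host_env))
```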
### Routing table
| Kind | Proxy path | Upstream | Header |
|-----------|----------------|-------------------------|----------------------------|
| anthropic | `/anthropic/` | `api.anthropic.com` | `Authorization: Bearer …` |
| github | `/gh-api/` | `api.github.com` | `Authorization: Bearer …` |
| github | `/gh-git/` | `github.com` | `Authorization: Bearer …` |
| gitea | `/gitea/<Url>` | configured `Url` | `Authorization: token …` |
| npm | `/npm/` | `registry.npmjs.org` | `Authorization: Bearer …` |
Gitea uses `Authorization: token` rather than `Bearer` to
sidestep `go-gitea/gitea#16734`. The proxy strips any incoming
`Authorization` header before injecting its own — the agent
cannot smuggle a stolen token through this path.
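
The strip-then-inject rule can be sketched as a pure function over the request path and headers. `rewrite_request`, the `ROUTES` dict, and keying `tokens` by path prefix are illustrative assumptions; the gitea row is left out because its path depends on the configured `Url`. Dropping `Host` as well is an assumption, made so the HTTP client can set it for the upstream.

```python
ROUTES = {
    # proxy path prefix: (upstream host, Authorization scheme)
    "/anthropic/": ("api.anthropic.com", "Bearer"),
    "/gh-api/": ("api.github.com", "Bearer"),
    "/gh-git/": ("github.com", "Bearer"),
    "/npm/": ("registry.npmjs.org", "Bearer"),
}


def rewrite_request(
    path: str, headers: dict[str, str], tokens: dict[str, str]
) -> tuple[str, str, dict[str, str]]:
    """Map an incoming proxy request to (upstream host, upstream path, headers)."""
    for prefix, (host, scheme) in ROUTES.items():
        if path.startswith(prefix):
            break
    else:
        raise LookupError(f"no route for {path!r}")

    # Header names are case-insensitive, so compare lowercased.
    forwarded = {
        name: value
        for name, value in headers.items()
        if name.lower() not in ("authorization", "host")
    }
    forwarded["Authorization"] = f"{scheme} {tokens[prefix]}"
    upstream_path = "/" + path[len(prefix):]
    return host, upstream_path, forwarded


print(rewrite_request(
    "/anthropic/v1/messages",
    {"authorization": "Bearer agent-supplied", "anthropic-version": "2023-06-01"},
    {"/anthropic/": "configured-token"},
))
```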
### External dependencies
The proxy binary. Two real options:
- **Python (stdlib)** — `http.server` + `urllib`/`http.client`,
no new pip packages. Matches CLAUDE.md's "bash-first, low-deps"
posture. SSE pass-through is fiddly but doable.
- **Go single binary** — cleaner SSE story, smaller runtime,
one static binary in a scratch/distroless image. New build
dependency.
Default: Python in a minimal `python:3.X-slim` image (or alpine
if we want smaller). Reconsider in the implementation PR if SSE
behavior is troublesome under load.
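
For the stdlib option, the unbuffered pass-through could look something like the loop below. `relay_stream` is a hypothetical helper; it relies on `read1`, which `http.client.HTTPResponse` and `io.BytesIO` both provide, to return whatever bytes are available instead of waiting to fill the buffer, and it flushes after every write so SSE events are not held back.

```python
import io


def relay_stream(
    upstream: io.BufferedIOBase, downstream: io.BufferedIOBase, chunk_size: int = 4096
) -> int:
    """Copy bytes from upstream to downstream as they arrive; return the byte count."""
    total = 0
    while True:
        # read1 performs at most one underlying read, so a partial SSE
        # event is forwarded immediately rather than buffered.
        chunk = upstream.read1(chunk_size)
        if not chunk:
            break  # empty bytes means EOF
        downstream.write(chunk)
        downstream.flush()
        total += len(chunk)
    return total


source = io.BytesIO(b"data: first\n\ndata: second\n\n")
sink = io.BytesIO()
print(relay_stream(source, sink, chunk_size=8), sink.getvalue())
```

In a real handler, `upstream` would be the `HTTPResponse` from the outbound connection and `downstream` the request handler's `wfile`; whether this is sufficient under load is exactly the question the default above leaves open.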
No new Python packages. No DB. No admin API. The proxy's
configuration is a single mode-600 JSON file copied into the
sidecar at `docker create` time and read by the proxy at startup
from `/run/cred-proxy/routes.json`.
## Future work
- **AWS / SigV4.** Likely an IMDS emulator sidecar handing out
short-lived STS tokens. Different threat model (the agent
ends up holding the STS creds — the proxy just shortens
their lifetime). Separate PRD.
- **Per-method / per-path allowlist** inside a kind. Once the
set of API operations claude actually performs is observed,
reject everything else. Narrows the within-allowlist surface.
- **Short-lived token minting.** For services that support it
(GitHub Apps, GitLab project-access tokens, fine-grained
PATs with TTL), have the proxy mint a fresh per-session
child credential from a long-lived parent.
- **Smolmachines colocation.** Same packing question as
pipelock / git-gate; under a future microVM backend the
cred-proxy could share a VM with the agent (today's per-bottle
network gives it its own container, not its own VM) or sit in
its own VM (stricter isolation, an extra TCP hop). Backend
decision, not a manifest decision.
- **More kinds.** PyPI, Bun, cargo, Docker Hub. The routing
pattern generalizes; add as needed.
## Considered alternatives
### In-container proxy (root inside the agent container)
Run cred-proxy as PID 1 of the agent container, listening on
`127.0.0.1:<PORT>`, with claude exec'd as `node` (UID 1000) only
after the proxy is bound. The boundary in that shape is the
kernel's cross-UID `ptrace_may_access` check — `node` cannot read
root's `/proc/<pid>/environ` and cannot `ptrace` attach.
Pros: one less container per bottle; slightly faster bottle
startup; no extra docker create/start/stop dance.
Rejected because:
- **Weaker isolation.** The boundary collapses to UID separation
alone. Any container-root compromise inside the agent (setuid
bug in the image, accidentally mounted docker socket, a kernel
CVE, accidental `--privileged`) reads the proxy's environ via
`/proc/<pid>/environ`. The sidecar's namespace separation
cannot be bypassed from inside the agent container without a
container escape.
- **Inconsistent with the existing topology.** pipelock and
git-gate are already sidecars on the bottle's internal network.
cred-proxy slots into the same shape and reuses the same
lifecycle abstractions (`BottleBackend.prepare/start/stop`,
`ExitStack` ordering, plan rendering).
- **Coupled to the agent image.** The proxy binary, its
entrypoint, and its priv-drop logic would all live in the
agent's Dockerfile. A sidecar image evolves independently —
agents can change base, language, or tooling without touching
the proxy.
- **PID-1 babysitting.** The "proxy supervises, then `exec
setpriv → node`" entrypoint introduces a class of issues
(zombie reaping, signal forwarding, exit-code propagation) that
the sidecar shape avoids.
## Open questions
- **Field name.** `bottle.tokens` is the working name. The
research note used `bottle.forge` for the gitea/github
generalization, but "forge" doesn't fit `anthropic` or
`npm`. Alternatives: `bottle.brokered`, `bottle.upstreams`,
`bottle.cred_proxy`. Default: `bottle.tokens`.
- **Python vs Go for the proxy.** Default: Python, revisit
during implementation if SSE pass-through is unreliable.
- **Sidecar image base.** Distroless (smallest, no shell — hardest
to debug), Python slim (debuggable, larger), or scratch + a
statically-linked Go binary (smallest if Go). Default: whatever
fits the chosen language with the smallest non-shell base;
revisit if debuggability bites during implementation.
- **Belt-and-braces on outbound telemetry.** Set
`CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1` and
`DISABLE_ERROR_REPORTING=1` in the agent's environ by
default? Default: yes — they don't route through
`ANTHROPIC_BASE_URL`, so the proxy doesn't catch them; the
flags are the only off switch.
- **`git push` over a rewritten URL vs. credential-helper
shim.** `[url "http://…"] insteadOf = "https://github.com/"`
captures push/fetch/clone/pull/ls-remote in one config knob;
a credential helper would need separate wiring. Default:
`insteadOf`.
- **Token-refresh story for the Anthropic OAuth token.** The
token is ~1-year and there's no client-side refresh, so the
proxy holds a static value. The 1-year blast radius is the
cost, documented in
[`claude-code-token-revocation.md`](../research/claude-code-token-revocation.md).
No design change here; flagged for awareness.
- **`anthropics/claude-code#36998`.** Older claude-code
versions bypassed `ANTHROPIC_BASE_URL` for some startup
calls (auth validation, org lookup). Marked closed upstream;
the implementation PR verifies with `strace -e connect`
against the pinned claude-code build before trusting the
isolation.
## References
- [`docs/research/agent-credential-proxy-landscape.md`](../research/agent-credential-proxy-landscape.md)
— landscape research; this PRD is the build path that note
recommends.
- [`docs/research/secret-minimization-over-dlp.md`](../research/secret-minimization-over-dlp.md)
— architectural framing: why moving the credential matters
more than scanning egress.
- PRD 0006: pipelock TLS interception — the
`api.anthropic.com` TLS-MITM responsibility cred-proxy takes
over.
- PRD 0008: Git gate — the credential-broker pattern this PRD
reuses (gate holds creds, agent gets a rewritten URL, gate
makes the upstream connection).
- [`anthropics/claude-code#36998`](https://github.com/anthropics/claude-code/issues/36998)
— historic `ANTHROPIC_BASE_URL` bypass.
- [`go-gitea/gitea#16734`](https://github.com/go-gitea/gitea/issues/16734)
— why Gitea uses `Authorization: token`, not `Bearer`.
- [`golang/go#28866`](https://github.com/golang/go/issues/28866)
— the `HTTPS_PROXY` loopback bug; not hit here because we're
a reverse proxy, not a forward proxy.