PRD 0010: Credential proxy for agent-bound API tokens

Status: Draft
Author: didericis
Created: 2026-05-13

Summary

Per-bottle reverse proxy that holds API tokens (Anthropic OAuth, GitHub PAT, Gitea PAT, npm token) in a root-owned process inside the agent container. The agent (node, UID 1000) keeps only URLs in its environ; the proxy injects the right Authorization header and forwards over TLS. The boundary that makes this meaningful is the kernel's ptrace_may_access check: node cannot read root's /proc/<pid>/environ and cannot ptrace attach without CAP_SYS_PTRACE / CAP_PERFMON, which claude-bottle does not grant.

AWS / SigV4 is explicitly out of scope — it is per-request signing, not header injection, and does not fit this proxy's shape. If a bottle needs AWS credentials later, that lives in a separate PRD.

Problem

Today CLAUDE_CODE_OAUTH_TOKEN (and any bottle.env secrets such as a Gitea PAT, GitHub PAT, or npm token) is passed with docker run -e straight into the agent's environ. Inside the bottle the agent runs as node with --dangerously-skip-permissions; its Bash tool can run printenv, cat /proc/self/environ, or node -e 'console.log(process.env)' and capture every value into the conversation. From there a prompt-injected or hijacked agent can exfiltrate the values over any allowed egress (api.anthropic.com itself, if nothing else).

Linux has no per-env-var ACL: once a variable is in a process's environ, the process and its descendants own it. The credible boundary is process-level: hold the credential in a different process the agent cannot read. Default Docker already enforces that boundary at the kernel level via ptrace_may_access, the same property the (removed) ssh-gate and the current git-gate rely on.

The research note agent-credential-proxy-landscape.md surveys the existing tools and concludes that a small claude-bottle-specific reverse proxy is less work and less risk than either adopting nono (alpha, unaudited) or Infisical Agent Vault (TLS-MITM topology that doubles up on pipelock's CA stack). This PRD is the build.

Goals / Success Criteria

Each test runs inside a bottle whose manifest declares the four supported kinds (anthropic, github, gitea, npm):

No plaintext tokens in the agent's environ. printenv and cat /proc/self/environ from the agent's shell return only URLs pointing at 127.0.0.1:<PORT>/.... None of the bottle.tokens[].TokenRef values appear.
Kernel boundary holds. From the agent's shell, cat /proc/<cred-proxy-pid>/environ returns EACCES and gdb -p <cred-proxy-pid> / strace -p <cred-proxy-pid> fails with EPERM.
Anthropic API works. claude makes a successful streaming tool-use round-trip via ANTHROPIC_BASE_URL → 127.0.0.1:<PORT>/anthropic. SSE chunks arrive without buffering; anthropic-version, anthropic-beta, and X-Claude-Code-Session-Id headers round-trip untouched.
Git push to declared remotes works. git push against a bottle.tokens[].Kind: github or gitea upstream succeeds; the upstream sees the proxy's token, not one supplied by the agent.
npm install works. npm install <public-package> succeeds against the registry pointed at the proxy. A scoped install that requires the token (e.g. against a private registry) also succeeds.
Wrong token rejected at the source, not silently swapped. If the agent sends its own Authorization: … header, the proxy strips it and injects the configured one. A manifest token that has been revoked at the upstream produces a 401 to the agent, not a 5xx.
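The first criterion lends itself to a mechanical check. A minimal sketch, assuming the test harness already knows the resolved token values and has captured the agent's environ as a dict; `find_leaked_tokens` and `non_loopback_urls` are hypothetical helper names, not part of claude-bottle:

```python
from urllib.parse import urlsplit


def find_leaked_tokens(environ: dict[str, str], token_values: list[str]) -> list[str]:
    """Return names of env vars whose value contains any plaintext token."""
    return sorted(
        name
        for name, value in environ.items()
        if any(tok and tok in value for tok in token_values)
    )


def non_loopback_urls(environ: dict[str, str]) -> list[str]:
    """Return names of URL-valued env vars that do not point at 127.0.0.1."""
    bad = []
    for name, value in environ.items():
        try:
            parts = urlsplit(value)
        except ValueError:
            continue  # not parseable as a URL, so not a URL-valued variable
        if parts.scheme in ("http", "https") and parts.hostname != "127.0.0.1":
            bad.append(name)
    return sorted(bad)
```

An integration test could assert that both functions return an empty list for the environ captured from the agent's shell.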

Non-goals

AWS / SigV4. Per-request signing is a different shape; a bearer-injecting proxy doesn't help. Hold for a future PRD (likely an IMDS emulator sidecar handing out short-lived STS credentials).
DB-backed credential store. Flat env / mode-600 file only. The LiteLLM CVE-2026-42208 incident is the cautionary tale: any DB-backed credential gateway is itself a high-value attack target.
Generic LLM-gateway features. No cost tracking, no fallbacks, no virtual keys, no multi-tenant routing, no usage metering. The proxy is a credential-injection trust endpoint, not a gateway.
Subsuming pipelock. pipelock keeps its egress-allowlist role. It drops the api.anthropic.com TLS-MITM job because cred-proxy is now the trust endpoint for that host; everything else pipelock does stays.
TLS interception inside the bottle. The agent talks plain HTTP to loopback; cred-proxy speaks real HTTPS outbound. No container-local CA, no golang/go#28866 loopback workaround.
Cross-bottle credential sharing. One proxy per bottle, same one-sidecar-per-agent posture as pipelock and git-gate.
claude --bare mode. Reads only ANTHROPIC_API_KEY, not the OAuth token. Not in claude-bottle's flow today.
MCP-server tokens, package-installer tokens for languages beyond npm. PyPI / Bun / cargo can land in a follow-up if needed; the routing pattern generalizes.

Scope

In scope

Manifest field. bottle.tokens: [TokenEntry, ...]. Each entry carries Kind (anthropic | github | gitea | npm), an optional Url (required for gitea, defaulted for the others), and TokenRef (the name of a host env var the CLI resolves at launch time).
cred-proxy process. Runs as root inside the agent container, listens on 127.0.0.1:<PORT>. Holds the tokens in its own environ — never on argv, never written to disk. Per-Kind route handler: inject the right header, forward over TLS, stream the response back to the client without buffering.
Agent-side rewrites. Provisioner writes:
- ANTHROPIC_BASE_URL=http://127.0.0.1:<PORT>/anthropic to the agent's environ
- ~/.npmrc registry = http://127.0.0.1:<PORT>/npm/
- ~/.gitconfig [url …] insteadOf = … for each declared github / gitea upstream
- ~/.config/tea/config.yml with the proxy URL for each declared gitea entry
Process lifecycle. Container entrypoint launches the proxy first as root, waits for it to bind, then runs exec setpriv … --reuid=node --regid=node … to start the claude child. Proxy death is fatal (the container exits); this is also the PID-1-zombie story.
pipelock interop. Drop api.anthropic.com from pipelock's TLS-MITM list; keep it on the allowlist as a plain HTTPS host (cred-proxy is the trust endpoint now). Verify pipelock still lets cred-proxy's HTTPS connections out to every upstream host the four kinds route to.
Plan rendering. bottle_plan.py and the y/N preflight show: which tokens are configured (kind + ref name, not the value), the proxy port, the routes the proxy will publish.
Drop the existing CLAUDE_CODE_OAUTH_TOKEN forward in prepare.py. Today it lands in the agent's environ; once this PRD ships, it lands in the proxy's environ instead.
Tests. Integration tests for each of the six success criteria; unit tests for manifest parsing, route table generation, header injection.
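The agent-side rewrites listed above can be sketched as a pure rendering function. This is an illustration under assumptions: `render_agent_config` and its return shape are hypothetical, the proxy paths are taken from the architecture diagram, and the tea config is left out because its file schema is not specified in this PRD.

```python
def render_agent_config(port: int, git_upstreams: dict[str, str]) -> dict[str, str]:
    """Render the agent-side files and env entries that point at the proxy.

    git_upstreams maps an upstream URL prefix (e.g. "https://github.com/")
    to the proxy path that brokers it (e.g. "/gh-git/").
    """
    base = f"http://127.0.0.1:{port}"
    # One [url ...] block per upstream: git rewrites any remote that starts
    # with the insteadOf prefix so that it goes through the proxy instead.
    gitconfig = "".join(
        f'[url "{base}{path}"]\n\tinsteadOf = {upstream}\n'
        for upstream, path in sorted(git_upstreams.items())
    )
    return {
        "env:ANTHROPIC_BASE_URL": f"{base}/anthropic",
        "~/.npmrc": f"registry={base}/npm/\n",
        "~/.gitconfig": gitconfig,
    }
```

Keeping the rendering free of side effects would let the preflight show exactly what the provisioner is going to write.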

Out of scope

AWS / SigV4 (see Non-goals).
Per-method / per-path allowlist inside a kind. Defer to a follow-up once observed traffic stabilizes.
Replacing bottle.env for non-token secrets. The proxy handles the four kinds listed above; other env vars keep their current path.
Migrating an in-flight bottle from "token in agent env" to "token via proxy" mid-session. Restart required.
Audit logging. The proxy doesn't write request logs in v1. Add only if a concrete debugging need surfaces.

Proposed Design

Architecture

┌── Host (macOS) ──────────────────────────────────────────────────┐
│   Secrets at rest (keychain / .env):                             │
│     CLAUDE_BOTTLE_OAUTH_TOKEN, GITHUB_TOKEN,                     │
│     GITEA_SERVER_TOKEN, NPM_TOKEN                                │
│        │ docker run -e KEY  (no =VALUE on argv)                  │
│        ▼                                                         │
│   ┌── Bottle container ────────────────────────────────────────┐ │
│   │                                                            │ │
│   │   ┌── UID 1000 (node) ─────────────────────────────────┐   │ │
│   │   │  claude --dangerously-skip-permissions             │   │ │
│   │   │  environ: URLs only, no plaintext tokens           │   │ │
│   │   │    ANTHROPIC_BASE_URL=http://127.0.0.1:PORT/anth.. │   │ │
│   │   │    npm  registry    → http://127.0.0.1:PORT/npm/   │   │ │
│   │   │    git  remote.url  → http://127.0.0.1:PORT/...    │   │ │
│   │   │    tea  --url       → http://127.0.0.1:PORT/gitea  │   │ │
│   │   └────────────┬───────────────────────────────────────┘   │ │
│   │                │ plain HTTP, loopback                      │ │
│   │                ▼                                           │ │
│   │   ┌── UID 0 (root) ────────────────────────────────────┐   │ │
│   │   │  cred-proxy   listens 127.0.0.1:PORT               │   │ │
│   │   │  tokens live ONLY in this process's environ        │   │ │
│   │   │  per-route: inject auth header, forward over TLS   │   │ │
│   │   │    /anthropic → api.anthropic.com   Bearer         │   │ │
│   │   │    /gh-api    → api.github.com      Bearer         │   │ │
│   │   │    /gh-git    → github.com          Bearer         │   │ │
│   │   │    /gitea     → gitea.dideric.is    token          │   │ │
│   │   │    /npm       → registry.npmjs.org  Bearer         │   │ │
│   │   │  SSE pass-through, no buffering                    │   │ │
│   │   └────────────┬───────────────────────────────────────┘   │ │
│   │                │ HTTPS                                     │ │
│   │                ▼                                           │ │
│   │   ┌── pipelock (egress allowlist) ─────────────────────┐   │ │
│   │   │  allow: api.anthropic.com, api.github.com,         │   │ │
│   │   │         github.com, gitea.dideric.is,              │   │ │
│   │   │         registry.npmjs.org                         │   │ │
│   │   │  block: statsig, sentry, autoupdater, *            │   │ │
│   │   └────────────┬───────────────────────────────────────┘   │ │
│   └────────────────┼───────────────────────────────────────────┘ │
│                    ▼                                             │
└────────────────────┼─────────────────────────────────────────────┘
                     ▼
              Upstream APIs


Why node@1000 can't just steal the tokens:
   ┌─────────────────────────────────────────────────────────┐
   │  node tries:                                            │
   │     cat /proc/<cred-proxy-pid>/environ   → EACCES       │
   │     ptrace(PTRACE_ATTACH, <cred-proxy-pid>, ...) → EPERM│
   │  Kernel's ptrace_may_access rejects: UID mismatch       │
   │  and no CAP_SYS_PTRACE / CAP_PERFMON in the container.  │
   └─────────────────────────────────────────────────────────┘
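
The environ half of that boundary can be probed from inside the bottle. A minimal sketch, assuming a Linux /proc; `probe_environ` is a hypothetical helper name:

```python
import os


def probe_environ(pid: int) -> str:
    """Classify whether the calling process can read another process's environ."""
    try:
        with open(f"/proc/{pid}/environ", "rb") as fh:
            fh.read(1)
        return "readable"
    except PermissionError:
        return "denied"   # the outcome expected when node probes the root-owned proxy
    except FileNotFoundError:
        return "missing"  # no such pid, or no /proc on this platform
```

Run as node against the cred-proxy pid, the expected result is "denied"; run against the caller's own pid, it is "readable".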

New components

claude_bottle/cred_proxy.py (new): abstract CredProxy + CredProxyPlan dataclass. prepare is host-side and side-effect-free on Docker; it renders the route table and resolves TokenRefs against host env. Mirrors the existing GitGate / Pipelock shape.
claude_bottle/backend/docker/cred_proxy.py (new): DockerCredProxy concrete subclass. Bakes the proxy binary into the agent image; start writes the route table to a mode-600 file under stage_dir and arranges the entrypoint so the proxy boots first.
claude_bottle/backend/docker/provision/cred_proxy.py (new): renders ANTHROPIC_BASE_URL, ~/.npmrc, ~/.gitconfig insteadOf blocks, and ~/.config/tea/config.yml into the agent's home for each declared kind.
The proxy binary itself. Bundled into the agent image at /usr/local/libexec/cred-proxy. See "External dependencies" for the language choice.

Existing code touched

claude_bottle/manifest.py — add TokenEntry, Bottle.tokens: tuple[TokenEntry, ...] = (), parse + validate (at most one entry per Kind except gitea, which may carry multiple Urls).
claude_bottle/backend/docker/prepare.py — delete the CLAUDE_BOTTLE_OAUTH_TOKEN → CLAUDE_CODE_OAUTH_TOKEN branch in the agent's forwarded env. The OAuth token now flows to the proxy's environ via the cred-proxy lifecycle.
claude_bottle/backend/docker/backend.py — instantiate DockerCredProxy; thread its prepare / start / stop through resolve_plan / launch.
claude_bottle/backend/docker/launch.py — add cred-proxy start before the cred-proxy provisioner runs (provisioner writes URLs that reference the proxy port, so it must be up).
claude_bottle/backend/docker/bottle_plan.py — new CredProxyPlan field; preflight shows kind + ref name + port + route table.
claude_bottle/pipelock.py: drop the api.anthropic.com TLS-MITM branch; the host stays on the allowlist as a plain HTTPS destination. Confirm that every upstream host the declared kinds route to is allowlisted by default when bottle.tokens declares them.
README.md — replace the architecture diagram with the one above; document the bottle.tokens field.
claude-bottle.example.json — add a tokens array to one bottle showing each Kind.
Tests — new unit tests for manifest parsing, route table generation, header injection; new integration tests for the six success criteria. Delete the bits of prepare.py tests that asserted on CLAUDE_CODE_OAUTH_TOKEN landing in the agent's env.
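
The launch ordering above depends on knowing when the proxy has bound its port. One way to wait for that is to poll the loopback port, sketched here under assumptions: `wait_for_bind` is a hypothetical helper, and the timeout values are placeholders rather than tuned numbers.

```python
import socket
import time


def wait_for_bind(host: str, port: int, timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll until something accepts TCP connections on host:port, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the listener is up; close immediately.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

A False return would be treated as a launch failure, so the provisioner never writes URLs for a proxy that is not listening.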

Data model changes

from dataclasses import dataclass
from typing import Literal

@dataclass(frozen=True)
class TokenEntry:
    Kind: Literal["anthropic", "github", "gitea", "npm"]
    TokenRef: str             # name of host env var, resolved at launch
    Url: str | None = None    # required for gitea; defaulted otherwise

@dataclass(frozen=True)
class Bottle:
    ...
    tokens: tuple[TokenEntry, ...] = ()

Validation:

Kind must be one of the four supported values.
TokenRef must resolve against os.environ at launch (fail fast with a clear "host env var X is unset" if missing).
gitea entries require Url; others fall back to the documented upstream.
At most one entry per Kind except gitea, which may have multiple distinct Urls.
No silent overlap with bottle.git upstreams that already flow through git-gate; if a tokens[].Kind: github|gitea entry's Url collides with a git[].Upstream's host, parse fails with a "git-gate already brokers this remote, drop one" hint. (Both paths broker credentials; doubling up is a configuration smell, not a feature.)

Routing table

| Kind | Proxy path | Upstream | Header |
|---|---|---|---|
| anthropic | `/anthropic/` | `api.anthropic.com` | `Authorization: Bearer …` |
| github | `/gh-api/` | `api.github.com` | `Authorization: Bearer …` |
| github | `/gh-git/` | `github.com` | `Authorization: Bearer …` |
| gitea | `/gitea/<Url>` | configured `Url` | `Authorization: token …` |
| npm | `/npm/` | `registry.npmjs.org` | `Authorization: Bearer …` |

Gitea uses Authorization: token rather than Bearer to sidestep go-gitea/gitea#16734. The proxy strips any incoming Authorization header before injecting its own — the agent cannot smuggle a stolen token through this path.
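
The strip-then-inject step can be sketched as follows. `inject_auth` is a hypothetical helper name, and the headers are modelled as a list of pairs purely for illustration:

```python
def inject_auth(headers: list[tuple[str, str]], scheme: str, token: str) -> list[tuple[str, str]]:
    """Drop every client-supplied Authorization header, then add the proxy's own."""
    # Header names are case-insensitive, so compare on the lowercased name.
    kept = [(k, v) for k, v in headers if k.lower() != "authorization"]
    kept.append(("Authorization", f"{scheme} {token}"))
    return kept
```

For a gitea route the call would pass `scheme="token"`; for the other kinds, `scheme="Bearer"`. All other headers pass through in their original order.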

External dependencies

The proxy binary. Two real options:

Python (stdlib) — http.server + urllib/http.client, no new pip packages. Matches CLAUDE.md's "bash-first, low-deps" posture. SSE pass-through is fiddly but doable.
Go single binary — cleaner SSE story, smaller runtime, one static binary baked into the image. New build dependency.

Default: Python, baked into the agent image. Reconsider in the implementation PR if SSE behavior is troublesome under load.
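
For the Python option, the fiddly part of SSE pass-through is avoiding reads that block until a full buffer has accumulated. A minimal sketch of one approach, assuming the upstream object is an `http.client.HTTPResponse` (or anything else that exposes `read1`); `relay_stream` is a hypothetical name:

```python
import io


def relay_stream(upstream: io.BufferedIOBase, client: io.BufferedIOBase, chunk_size: int = 4096) -> int:
    """Copy bytes from upstream to client as they arrive, flushing every chunk."""
    total = 0
    while True:
        # read1 returns whatever is available (up to chunk_size) instead of
        # waiting for chunk_size bytes, so small SSE events are not held back.
        chunk = upstream.read1(chunk_size)
        if not chunk:
            break
        client.write(chunk)
        client.flush()
        total += len(chunk)
    return total
```

Whether this holds up under load is the question the implementation PR would have to answer before the Python default is confirmed.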

No new Python packages. No DB. No admin API. The proxy's configuration is a single mode-600 JSON file passed in via /run/cred-proxy/routes.json.
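
Writing that file with mode 600 from the moment it is created could look like the sketch below. The JSON schema shown in the usage note (a `routes` list whose entries name the env var holding the token rather than the token itself) is an assumption for illustration; the PRD does not define the file's contents. `write_routes` is a hypothetical name.

```python
import json
import os


def write_routes(path: str, routes: list[dict[str, str]]) -> None:
    """Create the routes file with mode 600 from the first byte onward."""
    # O_EXCL refuses to reuse an existing file; the 0o600 mode is applied at
    # creation, so there is no window in which the file is world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump({"routes": routes}, fh, indent=2)
```

A call might pass entries such as `{"path": "/npm/", "upstream": "registry.npmjs.org", "scheme": "Bearer", "token_env": "NPM_TOKEN"}`, which keeps token values out of the file and in the proxy's environ.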

Future work

AWS / SigV4. Likely an IMDS emulator sidecar handing out short-lived STS tokens. Different threat model (the agent ends up holding the STS creds — the proxy just shortens their lifetime). Separate PRD.
Per-method / per-path allowlist inside a kind. Once the set of API operations claude actually performs is observed, reject everything else. Narrows the within-allowlist surface.
Short-lived token minting. For services that support it (GitHub Apps, GitLab project-access tokens, fine-grained PATs with TTL), have the proxy mint a fresh per-session child credential from a long-lived parent.
Smolmachines colocation. Same packing question as pipelock / git-gate; the cred-proxy can sit inside the agent VM (current shape) or in a separate VM (stricter isolation, per-bottle TCP hop). Backend decision, not a manifest decision.
More kinds. PyPI, Bun, cargo, Docker Hub. The routing pattern generalizes; add as needed.

Open questions

Field name. bottle.tokens is the working name. The research note used bottle.forge for the gitea/github generalization, but "forge" doesn't fit anthropic or npm. Alternatives: bottle.brokered, bottle.upstreams, bottle.cred_proxy. Default: bottle.tokens.
Python vs Go for the proxy. Default: Python, revisit during implementation if SSE pass-through is unreliable.
Process inside the agent container vs sidecar container. v1: inside (simpler lifecycle, no extra container; ptrace boundary is enough). The sidecar option becomes attractive only if we want a network-layer split between proxy and agent on top of the UID split.
Belt-and-braces on outbound telemetry. Set CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 and DISABLE_ERROR_REPORTING=1 in the agent's environ by default? Default: yes — they don't route through ANTHROPIC_BASE_URL, so the proxy doesn't catch them; the flags are the only off switch.
git push over a rewritten URL vs. credential-helper shim. [url "http://…"] insteadOf = "https://github.com/" captures push/fetch/clone/pull/ls-remote in one config knob; a credential helper would need separate wiring. Default: insteadOf.
Token-refresh story for the Anthropic OAuth token. The token is valid for roughly one year and there is no client-side refresh, so the proxy holds a static value. The one-year blast radius is the cost, documented in claude-code-token-revocation.md. No design change here; flagged for awareness.
anthropics/claude-code#36998. Older claude-code versions bypassed ANTHROPIC_BASE_URL for some startup calls (auth validation, org lookup). Marked closed upstream; the implementation PR verifies with strace -e connect against the pinned claude-code build before trusting the isolation.

References

docs/research/agent-credential-proxy-landscape.md — landscape research; this PRD is the build path that note recommends.
docs/research/secret-minimization-over-dlp.md — architectural framing: why moving the credential matters more than scanning egress.
PRD 0006: pipelock TLS interception — the api.anthropic.com TLS-MITM responsibility cred-proxy takes over.
PRD 0008: Git gate — the credential-broker pattern this PRD reuses (gate holds creds, agent gets a rewritten URL, gate makes the upstream connection).
anthropics/claude-code#36998 — historic ANTHROPIC_BASE_URL bypass.
go-gitea/gitea#16734 — why Gitea uses Authorization: token, not Bearer.
golang/go#28866 — the HTTPS_PROXY loopback bug; not hit here because we're a reverse proxy, not a forward proxy.
