PRD 0001: Per-agent egress proxy via pipelock (#1)

2026-05-08 01:56:43 -04:00
parent 08597ebcf8
commit ba7616a4ae
20 changed files with 1977 additions and 12 deletions
@@ -0,0 +1,222 @@
+# PRD 0001: Per-agent egress proxy via pipelock
+
+- **Status:** Draft
+- **Author:** didericis
+- **Created:** 2026-05-08
+
+## Summary
+
+Run pipelock as a sidecar container on each claude-bottle agent's only
+egress route, scanning all outbound HTTP for hostname allowlist violations
+and DLP matches.
+
+## Problem
+
+Today nothing constrains or inspects what an agent container sends out,
+at either the network level or the content level. Specifically:
+
+- Containers have unrestricted internet egress; a misbehaving agent can
+  POST to any host.
+- Allowed channels (`api.anthropic.com`, git remotes) can still carry
+  content-level exfil with no detection.
+- DNS exfil via subdomain encoding is not detected anywhere in the stack.
+- MCP tool calls and responses pass through unscanned.
+
+These gaps are documented in `docs/research/network-egress-guard.md`,
+`docs/research/secret-exfil-tripwire-encodings.md`, and
+`docs/research/pipelock-assessment.md`. The pipelock assessment recommends
+adopting pipelock as the v2 sidecar (in place of smokescreen) layered on
+top of a v1 iptables+dnsmasq floor.
+
+## Goals / Success Criteria
+
+The feature works when all of the following are observable:
+
+- The agent container has no default route; `curl https://example.com`
+  from inside fails when `example.com` is not on the allowlist.
+- The agent container can reach `api.anthropic.com` and `claude` runs
+  end-to-end through the proxy.
+- Pipelock blocks a known credential pattern in a request body and
+  surfaces a structured log line for the block.
+- The subdomain-entropy check fires on a `<base64-of-secret>.evil.com`
+  request.
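
To make the subdomain-entropy criterion concrete, the sketch below computes the Shannon entropy (bits per character) of a hostname's first label. It is only an illustration of the general idea: the helper names `label_entropy` and `first_label` are hypothetical, and pipelock's actual heuristic and threshold are not described in this PRD.

```shell
#!/usr/bin/env bash
# Illustration only: NOT pipelock's implementation. Function names are
# hypothetical and no blocking threshold is implied.

# Shannon entropy of a string, in bits per character, to two decimals.
label_entropy() {
  awk -v s="$1" 'BEGIN {
    n = length(s)
    if (n == 0) { print "0.00"; exit }
    # Count occurrences of each character.
    for (i = 1; i <= n; i++) count[substr(s, i, 1)]++
    h = 0
    for (c in count) {
      p = count[c] / n
      h += p * log(1 / p) / log(2)
    }
    printf "%.2f\n", h
  }'
}

# First label of a hostname, i.e. the part an encoded payload would occupy.
first_label() {
  printf '%s\n' "${1%%.*}"
}
```

A caller could compare `label_entropy "$(first_label "$host")"` against a chosen cutoff; a base64-encoded secret generally scores well above an ordinary word such as `api` or `www`.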
+
+The feature is **done** when all of the following ship:
+
+- `cli.sh start` brings up a per-agent pipelock sidecar on a `--internal`
+  Docker network and points the agent's `HTTPS_PROXY` at it.
+- A per-agent pipelock YAML config is generated from a bottle-level
+  `egress.allowlist` field, plus baked-in defaults for Claude Code's
+  required hosts so basic bottles work out of the box.
+- The existing `cli.sh` y/N preflight shows the resolved allowlist before
+  launch.
+- When the agent container exits, the pipelock sidecar and the internal
+  network are torn down cleanly (no orphaned containers or networks).
+
+## Non-goals
+
+- Closing every exfil vector. SSH session content, raw TCP, UDP, ICMP,
+  and TLS-SNI domain fronting all remain known gaps after this PRD ships
+  and are explicitly out of scope.
+- Audit logging or persistent log storage of pipelock decisions. v1 logs
+  to stdout only; durable logging is a follow-up PRD.
+- Replacing the v1 iptables layer. Pipelock sits above iptables, not in
+  place of it (see `pipelock-assessment.md` §Recommendation).
+- Multi-tenant or remote-pipelock deployments. v1 is one pipelock
+  container per agent container, on the same Docker host.
+
+## Scope
+
+### In scope
+
+- New manifest schema: an `egress` object on each bottle, with
+  `allowlist: [hostname]` for v1.
+- Generation of a per-bottle pipelock YAML config at launch.
+- Per-agent Docker `--internal` network creation and teardown.
+- Pipelock sidecar container lifecycle (start, attach to network,
+  receive config, stop on agent exit).
+- `HTTPS_PROXY` / `HTTP_PROXY` injection into the agent container.
+- Preflight integration: the existing y/N plan in `cli.sh` lists the
+  resolved allowlist.
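
As a rough sketch of the preflight integration, the function below prints a resolved allowlist and asks for confirmation. The name `preflight_confirm_egress` and the output wording are assumptions; the existing y/N plan in `cli.sh` is not shown in this PRD and may be structured differently.

```shell
# Hypothetical helper: list the resolved allowlist, then ask y/N.
# Returns 0 only on an explicit "y" or "Y"; anything else, including
# end-of-input, counts as a refusal.
preflight_confirm_egress() {
  local host reply
  echo "Egress allowlist (via pipelock):"
  for host in "$@"; do
    echo "  - $host"
  done
  printf 'Proceed? [y/N] '
  read -r reply || reply=""
  case "$reply" in
    y|Y) return 0 ;;
    *)   return 1 ;;
  esac
}
```

Defaulting to "no" on empty input matches the capital `N` in the prompt, so pressing Enter never launches a container by accident.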
+
+### Out of scope
+
+- The v1 iptables + ipset + dnsmasq layer (separate PRD; see
+  `network-egress-guard.md`).
+- TLS interception / domain-fronting mitigation. Pipelock does not
+  terminate TLS and this PRD does not introduce CA-trust injection.
+- Per-bottle DLP rule customization beyond pipelock's 48 built-in
+  patterns. Custom signed rule bundles are deferred.
+- Mediator-signed action receipts and any other pipelock features
+  potentially gated under the ELv2 enterprise subtree (see open
+  question on licensing in `pipelock-assessment.md`).
+
+## Proposed Design
+
+### New services / components
+
+Two new files under `lib/`:
+
+- **`lib/pipelock.sh`** — pipelock-specific logic. Generates the
+  per-bottle YAML config from the manifest's `egress` block plus baked-in
+  defaults; copies the YAML into the sidecar via `docker cp`; starts and
+  stops the sidecar container; resolves the allowlist for display in the
+  preflight.
+- **`lib/network.sh`** — Docker network plumbing. Creates the per-agent
+  `--internal` network (named `claude-bottle-net-<slug>` with the same
+  slug-and-suffix scheme used for container names), attaches the agent
+  and sidecar to it, removes it on teardown. Kept separate from
+  `lib/docker.sh` so a future PRD can add non-pipelock network controls
+  without entangling them with pipelock specifics.
+
+This split mirrors the existing per-concern `lib/` pattern
+(`manifest.sh`, `env_resolve.sh`, `skills.sh`, `ssh.sh`).
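
A minimal sketch of what the `lib/network.sh` helpers might look like, assuming the slug is already computed by the caller. The function names and the overridable `DOCKER` variable are hypothetical; only the `claude-bottle-net-<slug>` prefix and the `--internal` flag come from this PRD.

```shell
# DOCKER is overridable so the helpers can be dry-run or stubbed in tests.
DOCKER="${DOCKER:-docker}"

# The PRD fixes only the prefix; how <slug> is derived is defined
# elsewhere and is not reproduced here.
network_name() {
  printf 'claude-bottle-net-%s\n' "$1"
}

# Create the internal network unless it already exists, so a retried
# launch does not fail on a leftover network.
network_ensure() {
  local name
  name="$(network_name "$1")"
  "$DOCKER" network inspect "$name" >/dev/null 2>&1 ||
    "$DOCKER" network create --internal "$name"
}

# Remove the network; a missing network is not treated as an error,
# which keeps teardown safe to call more than once.
network_remove() {
  "$DOCKER" network rm "$(network_name "$1")" >/dev/null 2>&1 || true
}
```

Making both operations idempotent is one way to meet the "no orphaned containers or networks" criterion even when a previous run was interrupted.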
+
+### Existing code touched
+
+- **`cli.sh`** — wire the new lifecycle into `start`: create the
+  internal network, launch the pipelock sidecar, then launch the agent
+  container with `HTTPS_PROXY` / `HTTP_PROXY` pointing at the sidecar's
+  container name, which Docker's embedded DNS resolves on the internal
+  network. Add the resolved allowlist to the preflight y/N output.
+  Tear down sidecar + network in the existing exit trap.
+- **`README.md`** — public-facing description should mention that
+  agent containers route HTTP egress through pipelock by default, and
+  document the new `egress.allowlist` bottle field.
+
+`Dockerfile` is intentionally not touched for v1 — `HTTPS_PROXY` /
+`HTTP_PROXY` are injected per-launch via `docker run -e`, not baked into
+the image. This keeps the image agnostic to whether a sidecar is in use
+(useful if a future bottle definition opts out of the proxy for testing).
+
+`lib/docker.sh` may grow one or two helpers if there is a clean place
+for shared primitives, but the network-specific helpers live in
+`lib/network.sh`. Decide during implementation; not a contract.
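
The launch ordering described above could be sketched as follows. Several details are assumptions not stated in this PRD: the sidecar container name, the proxy port (passed in by the caller), and starting the sidecar on Docker's default bridge so that it keeps a route to the internet before being connected to the internal network. Config delivery via `docker cp` is omitted because the config path inside the sidecar is not specified here.

```shell
# DOCKER is overridable so the flow can be dry-run with DOCKER=echo.
DOCKER="${DOCKER:-docker}"

net_name()     { printf 'claude-bottle-net-%s\n' "$1"; }
sidecar_name() { printf 'claude-bottle-pipelock-%s\n' "$1"; }  # hypothetical name

# Create the internal network, start the sidecar, then join it to the
# network. Assumption: the sidecar starts on the default bridge so it
# can still reach upstream hosts.
egress_up() {
  local slug="$1" pipelock_image="$2"
  "$DOCKER" network create --internal "$(net_name "$slug")" &&
  "$DOCKER" run -d --name "$(sidecar_name "$slug")" "$pipelock_image" &&
  "$DOCKER" network connect "$(net_name "$slug")" "$(sidecar_name "$slug")"
}

# Run the agent on the internal network only, with the proxy variables
# injected per launch instead of baked into the image.
agent_run() {
  local slug="$1" agent_image="$2" port="$3"
  local proxy="http://$(sidecar_name "$slug"):${port}"
  "$DOCKER" run --rm --network "$(net_name "$slug")" \
    -e "HTTPS_PROXY=${proxy}" -e "HTTP_PROXY=${proxy}" \
    "$agent_image"
}

# Idempotent teardown, safe to call from an EXIT trap.
egress_down() {
  local slug="$1"
  "$DOCKER" rm -f "$(sidecar_name "$slug")" >/dev/null 2>&1 || true
  "$DOCKER" network rm "$(net_name "$slug")" >/dev/null 2>&1 || true
}
```

A caller might run `egress_up "$slug" "$image"`, register `trap 'egress_down "$slug"' EXIT`, and then call `agent_run`, so teardown happens whether the agent exits normally or the script is interrupted.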
+
+### Data model changes
+
+The bottle schema gains an `egress` object. The structure is designed
+to allow incremental additions without a breaking rename:
+
+```jsonc
+{
+  "bottles": {
+    "default": {
+      "env":  { "...": "..." },
+      "ssh":  [],
+      "egress": {
+        "allowlist": [
+          "api.anthropic.com",
+          "github.com"
+        ]
+      }
+    }
+  }
+}
+```
+
+Resolution rules:
+
+- The effective allowlist is `<baked-in-defaults> ∪ <bottle.egress.allowlist>`.
+- Baked-in defaults cover hosts Claude Code itself needs:
+  `api.anthropic.com`, `statsig.anthropic.com`, `sentry.io`,
+  `claude.ai`, `platform.claude.com`, `downloads.claude.ai`,
+  `raw.githubusercontent.com` (per `pipelock-assessment.md` and
+  Claude Code's network-config docs).
+- Bottles with no `egress` block use defaults only.
+- Future keys (`dlp`, `mode`, `data_budget`, etc.) are reserved under
+  the same `egress` object; v1 ignores unknown keys.
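
The union rule above can be sketched in a few lines of shell. The variable and function names are hypothetical, and the sketch assumes the bottle's hostnames have already been extracted from the manifest and are passed as arguments.

```shell
# Baked-in defaults, copied from the list in this section.
DEFAULT_ALLOWLIST="api.anthropic.com statsig.anthropic.com sentry.io claude.ai platform.claude.com downloads.claude.ai raw.githubusercontent.com"

# Print <defaults> ∪ <arguments>, one host per line. Order is preserved
# and the first occurrence of a duplicate wins.
resolve_allowlist() {
  # shellcheck disable=SC2086  # word splitting of the defaults is intended
  printf '%s\n' $DEFAULT_ALLOWLIST "$@" | awk 'NF && !seen[$0]++'
}
```

Calling `resolve_allowlist` with no arguments yields the defaults only, which matches the rule for bottles without an `egress` block; `resolve_allowlist github.com api.anthropic.com` adds `github.com` once and drops the duplicate default.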
+
+The `agent` schema is unchanged. Egress is a property of the
+container/sandbox, not the task — multiple agents pointing at the same
+bottle share the same allowlist.
+
+### External dependencies
+
+- **Pipelock binary** is pulled from
+  `ghcr.io/luckypipewrench/pipelock@sha256:<digest>`. The digest is
+  pinned in `lib/pipelock.sh` (or a sibling `.env`-shaped constants
+  file) and bumped deliberately, mirroring the claude-code version
+  pinning pattern in `Dockerfile`.
+- No new host-side runtimes. The pipelock image is the only new
+  external artifact.
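
One way to keep the pin honest is a small guard that refuses any image reference not pinned by a full digest. This is a sketch under stated assumptions: `is_digest_pinned` is a hypothetical helper, and the real digest value is deliberately not shown because this PRD leaves it as `<digest>`.

```shell
# Succeeds only for references of the form name@sha256:<64 lowercase hex
# characters>. Mutable tags such as ":latest" and unfilled placeholders
# fail the check.
is_digest_pinned() {
  printf '%s\n' "$1" | grep -Eq '^[^@[:space:]]+@sha256:[0-9a-f]{64}$'
}
```

`lib/pipelock.sh` could call this on its pinned constant before starting the sidecar, so an accidental edit from a digest to a tag fails fast instead of silently pulling a different image.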
+
+## Open questions
+
+- **ELv2 licensing.** Several capabilities discussed in
+  `pipelock-assessment.md` (mediator-signed action receipts, signed
+  rule bundles) may live under the `enterprise/` subtree and require
+  accepting Elastic License v2 terms. Before implementation, audit
+  which features used by this PRD are Apache-2.0-core. v1's plan
+  (proxy + 48 default DLP patterns + subdomain entropy + sidecar
+  topology) is expected to be core-only, but this should be confirmed.
+- **Where to put the digest pin.** A constant in `lib/pipelock.sh` is
+  the lowest-friction option; a separate `lib/versions.sh` (or similar)
+  may be cleaner once there are multiple pinned dependencies. Decide
+  during implementation.
+- **Per-agent overrides.** The PRD scopes egress to the bottle. If a
+  later use case calls for tightening (not loosening) the allowlist for
+  one agent within a bottle, revisit. Out of scope for v1.
+- **Default-allowlist drift.** Claude Code's required hostnames may
+  change with new versions. v1 hardcodes the current set; a follow-up
+  could derive them from the pinned claude-code version or a published
+  manifest from Anthropic.
+- **Sidecar log surface.** Pipelock decisions go to the sidecar's
+  stdout. v1 leaves these visible only via `docker logs <sidecar>` —
+  fine for inspection but not aggregated. Persistent / structured
+  logging is a non-goal here, called out for the follow-up.
+- **DNS resolver routing.** Pipelock's subdomain-entropy check fires
+  on URLs it sees, not on raw UDP/53. Without the iptables+dnsmasq
+  layer described in `network-egress-guard.md`, the agent could still
+  query a non-allowlisted resolver directly. Either document the
+  dependency on that PRD, or state explicitly that v1 of this PRD ships
+  with that gap if the iptables PRD lands later.
+
+## References
+
+- `docs/research/pipelock-assessment.md` — recommendation and rationale.
+- `docs/research/network-egress-guard.md` — v1 iptables+dnsmasq baseline.
+- `docs/research/secret-exfil-tripwire-encodings.md` — content-tripwire
+  framing this PRD partially addresses via pipelock's DLP layer.
+- Pipelock README:
+  <https://github.com/luckyPipewrench/pipelock/blob/main/README.md>
+- Claude Code network configuration:
+  <https://code.claude.com/docs/en/network-config>