diff --git a/docs/prds/0007-ssh-egress-gate.md b/docs/prds/0007-ssh-egress-gate.md
new file mode 100644
index 0000000..d4fca13
--- /dev/null
+++ b/docs/prds/0007-ssh-egress-gate.md
@@ -0,0 +1,188 @@
+# PRD 0007: SSH egress gate
+
+- **Status:** Draft
+- **Author:** didericis
+- **Created:** 2026-05-12
+
+## Summary
+
+Per-agent TCP-forwarder sidecar built from `bottle.ssh` entries; SSH stops
+going through pipelock; pipelock keeps full TLS interception with no
+SSH carve-outs.
+
+## Problem
+
+`git fetch` over SSH from inside an implementer-agent bottle is broken
+on `main`. The error surfaced after PRD 0006 enabled pipelock's
+native `tls_interception`:
+
+```
+kex_exchange_identification: Connection closed by remote host
+Connection closed by UNKNOWN port 65535
+fatal: Could not read from remote repository.
+```
+
+The agent's ssh client tunnels through pipelock via a `ProxyCommand
+socat - PROXY:pipelock:%h:%p` and pipelock now bumps that CONNECT
+tunnel. SSH sends its banner instead of a TLS ClientHello; pipelock's
+SNI gate rejects it; the tunnel closes mid-kex. Every bottle with an
+`ssh` entry hits this — including the implementer agent used by the
+free-agent workflow, which can't pull or push.
+
+## Goals / Success Criteria
+
+Integration test: spin up a bottle with an SSH entry, exec `git
+fetch` against a real-ish SSH host from inside the agent, observe
+exit 0. This is the same signal that's broken today; flipping it
+back to green is the test.
+
+## Non-goals
+
+- Pluggable forwarder backend. One TCP forwarder image is baked in;
+  abstracting over haproxy / nginx-stream / etc. is deferred.
+- SSH-protocol awareness. The gate stays at L4. No SSH-version
+  sniffing, no kex inspection, no per-key gating beyond what ssh
+  itself enforces inside the agent.
+- Replacing pipelock for anything else. HTTPS / HTTP traffic
+  continues to flow through pipelock unchanged. This PRD adds a
+  sidecar; it doesn't displace one.
+- Connection rate limits or quotas. No per-host or per-agent rate
+  limiting on the gate; future PRD if it ever matters.
+
+## Scope
+
+### In scope
+
+- **Gate sidecar lifecycle.** `DockerSSHGate` class with
+  `prepare` / `start` / `stop`, mirroring `DockerPipelockProxy`'s
+  shape and network attachment story.
+- **ssh provisioner rewrite.** `provision/ssh.py` drops the socat
+  `ProxyCommand`; `~/.ssh/config` points each `Host` at the gate
+  container and the per-host listen port.
+- **Pipelock carve-out removal.** Strip
+  `pipelock_bottle_ssh_trusted_domains`,
+  `pipelock_bottle_ssh_ip_cidrs`, and the related code paths in
+  `pipelock_build_config` + tests. After this PRD, pipelock has no
+  knowledge of `bottle.ssh`.
+- **Plan rendering / dry-run.** `bottle_plan.py` and the y/N
+  preflight surface the new gate sidecar (name, listen ports,
+  upstream targets).
+
+### Out of scope
+
+- SSH key generation / rotation. Bottle keys are still
+  user-supplied via `IdentityFile`; the gate doesn't manage key
+  material.
+- Per-host audit logging. The gate is dumb TCP forwarding; no
+  in-band visibility into SSH session content. (Connection-level
+  logs from socat are a nice-to-have, not a goal.)
+- Non-Docker backends. Implementation lands for Docker only; the
+  `BottleBackend` abstraction can grow the hook but other backends
+  are deferred.
+- Manifest schema changes. `bottle.ssh` stays exactly as it is
+  today; this PRD is internals-only.
+
+## Proposed Design
+
+### New services / components
+
+Mirror the pipelock layout:
+
+- **`claude_bottle/ssh_gate.py`** (new): abstract `SSHGate` +
+  `SSHGatePlan` dataclass. `prepare` is host-side / side-effect-free
+  on docker; renders the forwarder config under `stage_dir`.
+- **`claude_bottle/backend/docker/ssh_gate.py`** (new):
+  `DockerSSHGate` concrete subclass — `start` does `docker create`
+  on the internal network, copies the config in, attaches the
+  egress network, `docker start`. `stop` is idempotent `docker rm
+  -f`. Container name: `claude-bottle-ssh-gate-`.
+
+Forwarder image: `alpine/socat`, pinned by digest. One socat
+process per ssh entry, multiplexed inside the same gate container
+via an entrypoint script that backgrounds N socat invocations:
+
+```
+socat TCP-LISTEN:,reuseaddr,fork TCP::
+```
+
+Listen ports are assigned deterministically per ssh entry (e.g.
+`30000 + index`). One container, N listeners, N upstreams.
+
+### Existing code touched
+
+- **`claude_bottle/backend/docker/provision/ssh.py`**: drop the
+  `ProxyCommand socat - PROXY:...` plumbing and the
+  `pipelock_proxy_host_port` import. The rendered `~/.ssh/config`
+  block per entry becomes:
+  ```
+  Host
+    HostName
+    User
+    Port
+    IdentityAgent
+  ```
+  `known_hosts` entries are keyed off `` and the new
+  `[]:` form so OpenSSH's strict
+  host-key checking still matches.
+- **`claude_bottle/pipelock.py`**: delete
+  `pipelock_bottle_ssh_hostnames`, `pipelock_bottle_ssh_trusted_domains`,
+  `pipelock_bottle_ssh_ip_cidrs`, and the calls into them from
+  `pipelock_effective_allowlist` and `pipelock_build_config`. The
+  effective allowlist becomes baked-defaults ∪ `bottle.egress.allowlist`.
+- **`claude_bottle/backend/docker/backend.py`**: instantiate
+  `DockerSSHGate` alongside `DockerPipelockProxy`; thread its
+  `prepare` / `start` / `stop` through `resolve_plan` / `launch`.
+- **`claude_bottle/backend/docker/launch.py`**: add gate start /
+  stop to the `ExitStack` in the right order — gate must be up
+  before `provision_ssh` runs so the agent can dial it on first
+  boot.
+- **`claude_bottle/backend/docker/bottle_plan.py`**: new
+  `SSHGatePlan` field on `DockerBottlePlan`; preflight rendering
+  surfaces the gate sidecar (name, per-entry listen ports,
+  upstream `Hostname:Port` targets).
+- **Tests**: update `tests/fixtures.py` callers; rewrite
+  `tests/unit/test_pipelock_yaml.py::TestBuildConfig::test_ssh_shape`
+  to assert pipelock no longer reflects ssh entries; add unit
+  tests for `SSHGate.prepare` + render shape; add an integration
+  test in `tests/integration/` for the `git fetch` round-trip.
+
+### Data model changes
+
+None. `bottle.ssh` schema is unchanged; one new internal plan
+dataclass (`SSHGatePlan`) under `claude_bottle/ssh_gate.py`.
+
+### External dependencies
+
+- `alpine/socat` image, pinned by digest (declared next to the
+  `PIPELOCK_IMAGE` constant). No new Python packages.
+
+## Open questions
+
+- Network topology: does the gate need its own per-agent egress
+  bridge, or can it share pipelock's egress network? Sharing is
+  simpler; per-gate isolates failure modes. Decide during
+  implementation; default to "share pipelock's egress network"
+  unless a concrete reason emerges.
+- Socat container restart policy: a single socat that crashes
+  takes one upstream offline; do we want a wrapper that restarts
+  individual listeners, or just rely on `docker restart`? Default
+  to no-restart for v1 (matches pipelock).
+- Connection-level audit log: socat's `-v` mode logs every
+  connect/close. Worth piping into the bottle's stderr stream, or
+  is that noise? Default off, reconsider if debugging gets hard.
+- Docker DNS for the `` hostname inside the
+  agent: works via Docker's embedded resolver on user-defined
+  networks. Verify on the `--internal` network specifically before
+  implementation.
+
+## References
+
+- PRD 0001: per-agent egress proxy via pipelock — the parent
+  topology this PRD slots into.
+- PRD 0006: pipelock native TLS interception — the change that
+  surfaced this regression by making pipelock incompatible with
+  SSH-over-CONNECT.
+- `claude_bottle/backend/docker/provision/ssh.py` — current SSH
+  provisioning that this PRD rewrites.
+- `claude_bottle/pipelock.py` — current pipelock config builder
+  that holds the `bottle.ssh`-derived fields this PRD removes.