docs(prd): 0007 SSH egress gate

PRD 0006 enabled pipelock's native TLS interception, which broke
git fetch over SSH from inside the agent: pipelock's SNI gate
rejects the SSH banner that follows CONNECT. Document the
architectural fix — a dedicated per-agent TCP-forwarder sidecar
built from bottle.ssh entries — so pipelock can stay maximally
strict on the HTTPS path with no SSH carve-outs.
2026-05-12 15:41:26 -04:00
parent 6eb898ffca
commit 02a0fe679d
+188
@@ -0,0 +1,188 @@
# PRD 0007: SSH egress gate
- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-12
## Summary
Per-agent TCP-forwarder sidecar built from `bottle.ssh` entries; SSH stops
going through pipelock; pipelock keeps full TLS interception with no
SSH carve-outs.
## Problem
`git fetch` over SSH from inside an implementer-agent bottle is broken
on `main`. The error surfaced after PRD 0006 enabled pipelock's
native `tls_interception`:
```
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535
fatal: Could not read from remote repository.
```
The agent's ssh client tunnels through pipelock via a `ProxyCommand
socat - PROXY:pipelock:%h:%p` and pipelock now bumps that CONNECT
tunnel. SSH sends its banner instead of a TLS ClientHello; pipelock's
SNI gate rejects it; the tunnel closes mid-kex. Every bottle with an
`ssh` entry hits this — including the implementer agent used by the
free-agent workflow, which can't pull or push.
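The mismatch can be illustrated with a minimal sketch. This is not pipelock's actual gate logic; `classify_first_bytes` is a hypothetical helper that only shows why the first bytes of an SSH session cannot pass a check that expects a TLS handshake record.

```python
SSH_BANNER_PREFIX = b"SSH-"   # SSH identification strings start with "SSH-"
TLS_HANDSHAKE_RECORD = 0x16   # TLS record content type for handshake messages


def classify_first_bytes(data: bytes) -> str:
    """Classify the first bytes a client sends inside a CONNECT tunnel.

    Hypothetical helper for illustration only.
    """
    if data.startswith(SSH_BANNER_PREFIX):
        return "ssh-banner"
    # A TLS record starts with the content type (0x16 for handshake)
    # followed by a two-byte legacy version whose major byte is 0x03.
    if len(data) >= 3 and data[0] == TLS_HANDSHAKE_RECORD and data[1] == 0x03:
        return "tls-handshake"
    return "unknown"


if __name__ == "__main__":
    print(classify_first_bytes(b"SSH-2.0-OpenSSH_9.6\r\n"))
    print(classify_first_bytes(bytes([0x16, 0x03, 0x01, 0x02, 0x00])))
```

An interceptor that needs an SNI has nothing to parse in the first case, so closing the tunnel is the expected outcome rather than a bug in the gate.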
## Goals / Success Criteria
Integration test: spin up a bottle with an SSH entry, exec `git
fetch` against a real-ish SSH host from inside the agent, observe
exit 0. This is the same signal that's broken today; flipping it
back to green is the test.
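A rough sketch of the signal this test checks, assuming the agent container name and repository path are supplied by the test fixture. `git_fetch_exit_code` and the injectable `runner` are hypothetical names, not existing helpers in this repository.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def _run(argv: Sequence[str]) -> int:
    """Run a command and return its exit code without raising on failure."""
    return subprocess.run(list(argv), check=False).returncode


def git_fetch_exit_code(
    agent_container: str,
    repo_dir: str,
    runner: Runner = _run,
) -> int:
    """Exec `git fetch` inside the agent container and return the exit code.

    Both arguments are assumptions: the real test would take them from
    whatever fixture spins up the bottle.
    """
    argv = ["docker", "exec", agent_container, "git", "-C", repo_dir, "fetch"]
    return runner(argv)
```

The integration test would then assert that the returned code is 0. Injecting `runner` keeps the helper itself unit-testable without Docker.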
## Non-goals
- Pluggable forwarder backend. One TCP forwarder image is baked in;
abstracting over haproxy / nginx-stream / etc. is deferred.
- SSH-protocol awareness. The gate stays at L4. No SSH-version
sniffing, no kex inspection, no per-key gating beyond what ssh
itself enforces inside the agent.
- Replacing pipelock for anything else. HTTPS / HTTP traffic
continues to flow through pipelock unchanged. This PRD adds a
sidecar; it doesn't displace one.
- Connection rate limits or quotas. No per-host or per-agent rate
limiting on the gate; future PRD if it ever matters.
## Scope
### In scope
- **Gate sidecar lifecycle.** `DockerSSHGate` class with
`prepare` / `start` / `stop`, mirroring `DockerPipelockProxy`'s
shape and network attachment story.
- **ssh provisioner rewrite.** `provision/ssh.py` drops the socat
`ProxyCommand`; `~/.ssh/config` points each `Host` at the gate
container and the per-host listen port.
- **Pipelock carve-out removal.** Strip
`pipelock_bottle_ssh_trusted_domains`,
`pipelock_bottle_ssh_ip_cidrs`, and the related code paths in
`pipelock_build_config` + tests. After this PRD, pipelock has no
knowledge of `bottle.ssh`.
- **Plan rendering / dry-run.** `bottle_plan.py` and the y/N
preflight surface the new gate sidecar (name, listen ports,
upstream targets).
### Out of scope
- SSH key generation / rotation. Bottle keys are still
user-supplied via `IdentityFile`; the gate doesn't manage key
material.
- Per-host audit logging. The gate is dumb TCP forwarding; no
in-band visibility into SSH session content. (Connection-level
logs from socat are a nice-to-have, not a goal.)
- Non-Docker backends. Implementation lands for Docker only; the
`BottleBackend` abstraction can grow the hook but other backends
are deferred.
- Manifest schema changes. `bottle.ssh` stays exactly as it is
today; this PRD is internals-only.
## Proposed Design
### New services / components
Mirror the pipelock layout:
- **`claude_bottle/ssh_gate.py`** (new): abstract `SSHGate` +
`SSHGatePlan` dataclass. `prepare` runs host-side with no Docker
side effects; it renders the forwarder config under `stage_dir`.
- **`claude_bottle/backend/docker/ssh_gate.py`** (new):
`DockerSSHGate` concrete subclass — `start` does `docker create`
on the internal network, copies the config in, attaches the
egress network, `docker start`. `stop` is idempotent `docker rm
-f`. Container name: `claude-bottle-ssh-gate-<slug>`.
Forwarder image: `alpine/socat`, pinned by digest. One socat
process per ssh entry, multiplexed inside the same gate container
via an entrypoint script that backgrounds N socat invocations:
```
socat TCP-LISTEN:<port_i>,reuseaddr,fork TCP:<Hostname_i>:<Port_i>
```
Listen ports are assigned deterministically per ssh entry (e.g.
`30000 + index`). One container, N listeners, N upstreams.
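As a sketch of how the entrypoint script might be rendered, assuming the `30000 + index` scheme above: `SshEntry` is a hypothetical stand-in for a `bottle.ssh` entry, and `render_entrypoint` is an illustrative name, not the function the implementation will necessarily use.

```python
import shlex
from dataclasses import dataclass

BASE_LISTEN_PORT = 30000  # taken from the "30000 + index" example


@dataclass(frozen=True)
class SshEntry:
    """Hypothetical stand-in for one bottle.ssh entry."""
    name: str
    hostname: str
    port: int = 22


def listen_port(index: int) -> int:
    """Deterministic listen port for the ssh entry at position `index`."""
    return BASE_LISTEN_PORT + index


def render_entrypoint(entries: list[SshEntry]) -> str:
    """Render a POSIX sh script that backgrounds one socat per entry."""
    lines = ["#!/bin/sh", "set -eu"]
    for index, entry in enumerate(entries):
        listen = shlex.quote(f"TCP-LISTEN:{listen_port(index)},reuseaddr,fork")
        upstream = shlex.quote(f"TCP:{entry.hostname}:{entry.port}")
        lines.append(f"socat {listen} {upstream} &")
    # `wait` blocks until every backgrounded socat has exited, so the
    # container stays up while at least one listener is still running.
    lines.append("wait")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_entrypoint([SshEntry("github", "github.com")]), end="")
```

Because the port depends only on the entry's position, reordering `bottle.ssh` entries changes the mapping. Keying on a stable attribute would avoid that, at the cost of a slightly more involved allocation scheme.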
### Existing code touched
- **`claude_bottle/backend/docker/provision/ssh.py`**: drop the
`ProxyCommand socat - PROXY:...` plumbing and the
`pipelock_proxy_host_port` import. The rendered `~/.ssh/config`
block per entry becomes:
```
Host <name>
HostName <gate-container>
User <user>
Port <listen-port>
IdentityAgent <public-socket>
```
`known_hosts` entries are keyed off `<name>` and the new
`[<gate-container>]:<listen-port>` form so OpenSSH's strict
host-key checking still matches.
- **`claude_bottle/pipelock.py`**: delete
`pipelock_bottle_ssh_hostnames`, `pipelock_bottle_ssh_trusted_domains`,
`pipelock_bottle_ssh_ip_cidrs`, and the calls into them from
`pipelock_effective_allowlist` and `pipelock_build_config`. The
effective allowlist becomes the baked-in defaults plus
`bottle.egress.allowlist`.
- **`claude_bottle/backend/docker/backend.py`**: instantiate
`DockerSSHGate` alongside `DockerPipelockProxy`; thread its
`prepare` / `start` / `stop` through `resolve_plan` / `launch`.
- **`claude_bottle/backend/docker/launch.py`**: add gate start /
stop to the `ExitStack` in the right order — gate must be up
before `provision_ssh` runs so the agent can dial it on first
boot.
- **`claude_bottle/backend/docker/bottle_plan.py`**: new
`SSHGatePlan` field on `DockerBottlePlan`; preflight rendering
surfaces the gate sidecar (name, per-entry listen ports,
upstream `Hostname:Port` targets).
- **Tests**: update `tests/fixtures.py` callers; rewrite
`tests/unit/test_pipelock_yaml.py::TestBuildConfig::test_ssh_shape`
to assert pipelock no longer reflects ssh entries; add unit
tests for `SSHGate.prepare` + render shape; add an integration
test in `tests/integration/` for the `git fetch` round-trip.
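The `~/.ssh/config` block and the matching `known_hosts` line from the `provision/ssh.py` item above could be rendered roughly as follows. Function names and parameters are hypothetical; the only external fact relied on is that OpenSSH records hosts on non-default ports as `[host]:port` in `known_hosts`.

```python
def render_ssh_config_block(
    name: str,
    gate_container: str,
    user: str,
    listen_port: int,
    identity_agent: str,
) -> str:
    """Render one Host block that points at the gate instead of the upstream."""
    return "\n".join([
        f"Host {name}",
        f"    HostName {gate_container}",
        f"    User {user}",
        f"    Port {listen_port}",
        f"    IdentityAgent {identity_agent}",
    ]) + "\n"


def known_hosts_line(
    name: str,
    gate_container: str,
    listen_port: int,
    key_type: str,
    key_b64: str,
) -> str:
    """Render a known_hosts line keyed off both the alias and the gate address.

    The host field is a comma-separated pattern list; the bracketed form
    is how OpenSSH writes a host on a non-default port.
    """
    hosts = f"{name},[{gate_container}]:{listen_port}"
    return f"{hosts} {key_type} {key_b64}"
```

The key material itself is still the upstream server's host key; only the name it is filed under changes, since the client now dials the gate.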
### Data model changes
None. `bottle.ssh` schema is unchanged; one new internal plan
dataclass (`SSHGatePlan`) under `claude_bottle/ssh_gate.py`.
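One possible shape for the plan dataclass, sketched under the assumption that it carries just what the preflight needs to show (container name, listen ports, upstream targets). The field names, `GateListener`, and `build_plan` are illustrative, not a settled design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateListener:
    """One forwarded ssh entry (hypothetical field names)."""
    entry_name: str
    listen_port: int
    upstream_host: str
    upstream_port: int


@dataclass(frozen=True)
class SSHGatePlan:
    """Illustrative plan shape; the real dataclass may differ."""
    container_name: str
    listeners: tuple[GateListener, ...]

    def preflight_lines(self) -> list[str]:
        """Lines a y/N preflight could print for the gate sidecar."""
        lines = [f"ssh gate: {self.container_name}"]
        for item in self.listeners:
            lines.append(
                f"  {item.entry_name}: :{item.listen_port} -> "
                f"{item.upstream_host}:{item.upstream_port}"
            )
        return lines


def build_plan(slug: str, entries: list[tuple[str, str, int]]) -> SSHGatePlan:
    """Build a plan from (name, hostname, port) tuples using 30000 + index."""
    listeners = tuple(
        GateListener(name, 30000 + index, host, port)
        for index, (name, host, port) in enumerate(entries)
    )
    return SSHGatePlan(f"claude-bottle-ssh-gate-{slug}", listeners)
```

Keeping the plan frozen and free of Docker handles fits a `prepare` step that has no side effects: the same object can feed both the dry-run output and the later `start`.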
### External dependencies
- `alpine/socat` image, pinned by digest (declared next to the
`PIPELOCK_IMAGE` constant). No new Python packages.
## Open questions
- Network topology: does the gate need its own per-agent egress
bridge, or can it share pipelock's egress network? Sharing is
simpler; per-gate isolates failure modes. Decide during
implementation; default to "share pipelock's egress network"
unless a concrete reason emerges.
- Socat container restart policy: a single socat that crashes
takes one upstream offline; do we want a wrapper that restarts
individual listeners, or just rely on `docker restart`? Default
to no-restart for v1 (matches pipelock).
- Connection-level audit log: socat's `-d -d` verbosity logs each
accepted connection and its close (`-v` would instead dump the
transferred bytes to stderr). Worth piping into the bottle's
stderr stream, or is that noise? Default off, reconsider if
debugging gets hard.
- Docker DNS for the `<gate-container>` hostname inside the
agent: works via Docker's embedded resolver on user-defined
networks. Verify on the `--internal` network specifically before
implementation.
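The DNS question above could be checked with a small probe run from inside the agent container. This is a sketch using only the Python standard library; the gate hostname and port passed in would be whatever the plan assigns.

```python
import socket


def resolves(hostname: str) -> bool:
    """Return True if the local resolver can resolve `hostname`."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False
    return True


def can_connect(hostname: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to hostname:port can be opened."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running `resolves("<gate-container>")` and then `can_connect("<gate-container>", <listen-port>)` from the agent separates a name-resolution failure on the `--internal` network from a listener that simply is not up yet.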
## References
- PRD 0001: per-agent egress proxy via pipelock — the parent
topology this PRD slots into.
- PRD 0006: pipelock native TLS interception — the change that
surfaced this regression by making pipelock incompatible with
SSH-over-CONNECT.
- `claude_bottle/backend/docker/provision/ssh.py` — current SSH
provisioning that this PRD rewrites.
- `claude_bottle/pipelock.py` — current pipelock config builder
that currently holds the `bottle.ssh`-derived fields this PRD removes.