docs(prd): 0007 SSH egress gate
PRD 0006 enabled pipelock's native TLS interception, which broke git fetch over SSH from inside the agent: pipelock's SNI gate rejects the SSH banner that follows CONNECT. Document the architectural fix — a dedicated per-agent TCP-forwarder sidecar built from bottle.ssh entries — so pipelock can stay maximally strict on the HTTPS path with no SSH carve-outs.
@@ -0,0 +1,188 @@
# PRD 0007: SSH egress gate

- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-12

## Summary

Per-agent TCP-forwarder sidecar built from `bottle.ssh` entries; SSH stops
going through pipelock; pipelock keeps full TLS interception with no
SSH carve-outs.

## Problem

`git fetch` over SSH from inside an implementer-agent bottle is broken
on `main`. The error surfaced after PRD 0006 enabled pipelock's
native `tls_interception`:

```
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535
fatal: Could not read from remote repository.
```

The agent's ssh client tunnels through pipelock via a `ProxyCommand
socat - PROXY:pipelock:%h:%p` directive, and pipelock now intercepts
("bumps") that CONNECT tunnel. SSH sends its banner instead of a TLS
ClientHello; pipelock's SNI gate rejects it; the tunnel closes mid-kex.
Every bottle with an `ssh` entry hits this, including the implementer
agent used by the free-agent workflow, which can't pull or push.
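
To make the failure concrete, here is a minimal sketch of why the first bytes inside the tunnel cannot pass a gate that expects TLS. It is not pipelock's implementation; the function names and sample bytes are made up for illustration. A TLS handshake record starts with content type `0x16` followed by a `0x03` major version byte, whereas an SSH identification string starts with the ASCII text `SSH-`.

```python
def looks_like_tls_handshake(first_bytes: bytes) -> bool:
    """True if the bytes open a TLS handshake record (type 0x16, major version 0x03)."""
    return len(first_bytes) >= 3 and first_bytes[0] == 0x16 and first_bytes[1] == 0x03


def looks_like_ssh_banner(first_bytes: bytes) -> bool:
    """True if the bytes open an SSH identification string such as 'SSH-2.0-...'."""
    return first_bytes.startswith(b"SSH-")


# Illustrative inputs: an SSH client banner and the header of a TLS handshake record.
ssh_banner = b"SSH-2.0-OpenSSH_9.6\r\n"
tls_record_header = bytes([0x16, 0x03, 0x01, 0x00, 0x2F])

print(looks_like_tls_handshake(ssh_banner))         # False
print(looks_like_tls_handshake(tls_record_header))  # True
print(looks_like_ssh_banner(ssh_banner))            # True
```

Because there is no ClientHello, there is no SNI to evaluate, so an interception layer that requires one has nothing to match against an allowlist.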

## Goals / Success Criteria

Integration test: spin up a bottle with an SSH entry, exec `git
fetch` against a real-ish SSH host from inside the agent, observe
exit 0. This is the same signal that's broken today; flipping it
back to green is the test.

## Non-goals

- Pluggable forwarder backend. One TCP forwarder image is baked in;
  abstracting over haproxy / nginx-stream / etc. is deferred.
- SSH-protocol awareness. The gate stays at L4. No SSH-version
  sniffing, no kex inspection, no per-key gating beyond what ssh
  itself enforces inside the agent.
- Replacing pipelock for anything else. HTTPS / HTTP traffic
  continues to flow through pipelock unchanged. This PRD adds a
  sidecar; it doesn't displace one.
- Connection rate limits or quotas. No per-host or per-agent rate
  limiting on the gate; future PRD if it ever matters.

## Scope

### In scope

- **Gate sidecar lifecycle.** `DockerSSHGate` class with
  `prepare` / `start` / `stop`, mirroring `DockerPipelockProxy`'s
  shape and network attachment story.
- **ssh provisioner rewrite.** `provision/ssh.py` drops the socat
  `ProxyCommand`; `~/.ssh/config` points each `Host` at the gate
  container and the per-host listen port.
- **Pipelock carve-out removal.** Strip
  `pipelock_bottle_ssh_trusted_domains`,
  `pipelock_bottle_ssh_ip_cidrs`, and the related code paths in
  `pipelock_build_config` + tests. After this PRD, pipelock has no
  knowledge of `bottle.ssh`.
- **Plan rendering / dry-run.** `bottle_plan.py` and the y/N
  preflight surface the new gate sidecar (name, listen ports,
  upstream targets).
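
The lifecycle item above can be sketched roughly as follows. Only the method names `prepare` / `start` / `stop` come from this PRD; the signatures, the `RecordingGate` stand-in, and the `run_with_gate` helper are hypothetical and exist only to show the call order and the cleanup guarantee an `ExitStack` gives.

```python
from abc import ABC, abstractmethod
from contextlib import ExitStack
from typing import Callable


class SSHGate(ABC):
    """Lifecycle shape named in this PRD; the signatures here are assumptions."""

    @abstractmethod
    def prepare(self) -> None: ...

    @abstractmethod
    def start(self) -> None: ...

    @abstractmethod
    def stop(self) -> None: ...


class RecordingGate(SSHGate):
    """Hypothetical stand-in that records calls instead of driving Docker."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def prepare(self) -> None:
        self.calls.append("prepare")

    def start(self) -> None:
        self.calls.append("start")

    def stop(self) -> None:
        self.calls.append("stop")


def run_with_gate(gate: SSHGate, provision_ssh: Callable[[], None]) -> None:
    """Bring the gate up before provisioning and always tear it down afterwards."""
    gate.prepare()
    with ExitStack() as stack:
        # Register stop first: it is meant to be idempotent, so it is safe
        # to run even if start fails partway through.
        stack.callback(gate.stop)
        gate.start()
        provision_ssh()


gate = RecordingGate()
run_with_gate(gate, lambda: gate.calls.append("provision_ssh"))
print(gate.calls)  # ['prepare', 'start', 'provision_ssh', 'stop']
```

In a real launch the agent session would run inside the `with` block; the sketch exits immediately so the teardown order is visible.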

### Out of scope

- SSH key generation / rotation. Bottle keys are still
  user-supplied via `IdentityFile`; the gate doesn't manage key
  material.
- Per-host audit logging. The gate is dumb TCP forwarding; no
  in-band visibility into SSH session content. (Connection-level
  logs from socat are a nice-to-have, not a goal.)
- Non-Docker backends. Implementation lands for Docker only; the
  `BottleBackend` abstraction can grow the hook but other backends
  are deferred.
- Manifest schema changes. `bottle.ssh` stays exactly as it is
  today; this PRD is internals-only.

## Proposed Design

### New services / components

Mirror the pipelock layout:

- **`claude_bottle/ssh_gate.py`** (new): abstract `SSHGate` +
  `SSHGatePlan` dataclass. `prepare` runs host-side and has no Docker
  side effects; it renders the forwarder config under `stage_dir`.
- **`claude_bottle/backend/docker/ssh_gate.py`** (new):
  `DockerSSHGate` concrete subclass. `start` does `docker create`
  on the internal network, copies the config in, attaches the
  egress network, then runs `docker start`. `stop` is an idempotent
  `docker rm -f`. Container name: `claude-bottle-ssh-gate-<slug>`.

Forwarder image: `alpine/socat`, pinned by digest. One socat
process per ssh entry, multiplexed inside the same gate container
via an entrypoint script that backgrounds N socat invocations:

```
socat TCP-LISTEN:<port_i>,reuseaddr,fork TCP:<Hostname_i>:<Port_i>
```

Listen ports are assigned deterministically per ssh entry (e.g.
`30000 + index`). One container, N listeners, N upstreams.
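
A minimal sketch of how `prepare` might turn ssh entries into the entrypoint script described above. The `Forward` dataclass, both function names, and the `BASE_LISTEN_PORT` constant are illustrative assumptions (the PRD only offers `30000 + index` as an example), and the script assumes the image provides a POSIX `sh`.

```python
from dataclasses import dataclass

BASE_LISTEN_PORT = 30000  # taken from the "30000 + index" example; not a settled value


@dataclass(frozen=True)
class Forward:
    """One listener inside the gate container and the upstream it forwards to."""
    listen_port: int
    upstream_host: str
    upstream_port: int


def assign_forwards(entries: list[tuple[str, int]]) -> list[Forward]:
    """Assign listen ports deterministically from each entry's position."""
    return [
        Forward(BASE_LISTEN_PORT + index, host, port)
        for index, (host, port) in enumerate(entries)
    ]


def render_entrypoint(forwards: list[Forward]) -> str:
    """Render a shell script that backgrounds one socat per forward, then waits."""
    lines = ["#!/bin/sh", "set -eu"]
    for fwd in forwards:
        lines.append(
            f"socat TCP-LISTEN:{fwd.listen_port},reuseaddr,fork "
            f"TCP:{fwd.upstream_host}:{fwd.upstream_port} &"
        )
    lines.append("wait")  # keep the script alive while the listeners run
    return "\n".join(lines) + "\n"


forwards = assign_forwards([("github.com", 22), ("git.example.com", 2222)])
print(render_entrypoint(forwards), end="")
```

Because ports derive from list position, reordering `bottle.ssh` entries changes the port each host gets; that is harmless as long as the ssh config and the gate are rendered from the same plan.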

### Existing code touched

- **`claude_bottle/backend/docker/provision/ssh.py`**: drop the
  `ProxyCommand socat - PROXY:...` plumbing and the
  `pipelock_proxy_host_port` import. The rendered `~/.ssh/config`
  block per entry becomes:
  ```
  Host <name>
    HostName <gate-container>
    User <user>
    Port <listen-port>
    IdentityAgent <public-socket>
  ```
  `known_hosts` entries are keyed off `<name>` and the new
  `[<gate-container>]:<listen-port>` form so OpenSSH's strict
  host-key checking still matches.
- **`claude_bottle/pipelock.py`**: delete
  `pipelock_bottle_ssh_hostnames`, `pipelock_bottle_ssh_trusted_domains`,
  `pipelock_bottle_ssh_ip_cidrs`, and the calls into them from
  `pipelock_effective_allowlist` and `pipelock_build_config`. The
  effective allowlist becomes baked-defaults ∪ `bottle.egress.allowlist`.
- **`claude_bottle/backend/docker/backend.py`**: instantiate
  `DockerSSHGate` alongside `DockerPipelockProxy`; thread its
  `prepare` / `start` / `stop` through `resolve_plan` / `launch`.
- **`claude_bottle/backend/docker/launch.py`**: add gate start /
  stop to the `ExitStack` in the right order: the gate must be up
  before `provision_ssh` runs so the agent can dial it on first
  boot.
- **`claude_bottle/backend/docker/bottle_plan.py`**: new
  `SSHGatePlan` field on `DockerBottlePlan`; preflight rendering
  surfaces the gate sidecar (name, per-entry listen ports,
  upstream `Hostname:Port` targets).
- **Tests**: update `tests/fixtures.py` callers; rewrite
  `tests/unit/test_pipelock_yaml.py::TestBuildConfig::test_ssh_shape`
  to assert pipelock no longer reflects ssh entries; add unit
  tests for `SSHGate.prepare` + render shape; add an integration
  test in `tests/integration/` for the `git fetch` round-trip.
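
As a rough illustration of the `provision/ssh.py` change listed above, the per-entry config block and the matching `known_hosts` host field could be rendered like this. The function names and parameters are hypothetical, and the example values (`demo` slug, socket path) are invented; only the emitted keywords and the `[host]:port` form come from the text.

```python
def render_host_block(
    name: str,
    gate_container: str,
    user: str,
    listen_port: int,
    identity_agent: str,
) -> str:
    """Render one ~/.ssh/config block that dials the gate instead of the real host."""
    return (
        f"Host {name}\n"
        f"    HostName {gate_container}\n"
        f"    User {user}\n"
        f"    Port {listen_port}\n"
        f"    IdentityAgent {identity_agent}\n"
    )


def known_hosts_host_field(name: str, gate_container: str, listen_port: int) -> str:
    """Build the comma-separated host field of a known_hosts line.

    OpenSSH records hosts reached on a non-default port as [host]:port.
    """
    return f"{name},[{gate_container}]:{listen_port}"


block = render_host_block(
    name="github",
    gate_container="claude-bottle-ssh-gate-demo",  # hypothetical slug
    user="git",
    listen_port=30000,
    identity_agent="/run/agent.sock",  # hypothetical socket path
)
print(block, end="")
print(known_hosts_host_field("github", "claude-bottle-ssh-gate-demo", 30000))
```

A full `known_hosts` line would append the upstream's key type and base64 key after this host field; those values still have to come from the real upstream host, since the gate only forwards bytes.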

### Data model changes

None. `bottle.ssh` schema is unchanged; one new internal plan
dataclass (`SSHGatePlan`) under `claude_bottle/ssh_gate.py`.

### External dependencies

- `alpine/socat` image, pinned by digest (declared next to the
  `PIPELOCK_IMAGE` constant). No new Python packages.

## Open questions

- Network topology: does the gate need its own per-agent egress
  bridge, or can it share pipelock's egress network? Sharing is
  simpler; a per-gate bridge isolates failure modes. Decide during
  implementation; default to "share pipelock's egress network"
  unless a concrete reason emerges.
- Socat container restart policy: a single socat that crashes
  takes one upstream offline; do we want a wrapper that restarts
  individual listeners, or just rely on `docker restart`? Default
  to no-restart for v1 (matches pipelock).
- Connection-level audit log: socat's `-d -d` verbosity logs each
  accepted connection and its teardown as notice-level messages
  (`-v` would instead dump the transferred bytes). Worth piping into
  the bottle's stderr stream, or is that noise? Default off,
  reconsider if debugging gets hard.
- Docker DNS for the `<gate-container>` hostname inside the
  agent: works via Docker's embedded resolver on user-defined
  networks. Verify on the `--internal` network specifically before
  implementation.

## References

- PRD 0001 (per-agent egress proxy via pipelock): the parent
  topology this PRD slots into.
- PRD 0006 (pipelock native TLS interception): the change that
  surfaced this regression by making pipelock incompatible with
  SSH-over-CONNECT.
- `claude_bottle/backend/docker/provision/ssh.py`: current SSH
  provisioning that this PRD rewrites.
- `claude_bottle/pipelock.py`: current pipelock config builder,
  which carries the `bottle.ssh`-derived fields this PRD removes.