PRD 0007: SSH egress gate #10
# PRD 0007: SSH egress gate

- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-12

## Summary

Per-agent TCP-forwarder sidecar built from `bottle.ssh` entries; SSH stops
going through pipelock; pipelock keeps full TLS interception with no
SSH carve-outs.

## Problem

`git fetch` over SSH from inside an implementer-agent bottle is broken
on `main`. The error surfaced after PRD 0006 enabled pipelock's
native `tls_interception`:

```
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535
fatal: Could not read from remote repository.
```

The agent's ssh client tunnels through pipelock via
`ProxyCommand socat - PROXY:pipelock:%h:%p`, and pipelock now bumps
(TLS-intercepts) that CONNECT tunnel. SSH sends its banner instead of
a TLS ClientHello; pipelock's SNI gate rejects it; the tunnel closes
mid-kex. Every bottle with an `ssh` entry hits this, including the
implementer agent used by the free-agent workflow, which can't pull
or push.

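
The mismatch described above can be illustrated by looking at the first bytes each protocol puts on the wire. This is a minimal sketch, not pipelock's code: `classify_first_bytes` is a hypothetical helper that only shows why a gate expecting a TLS ClientHello finds no SNI to inspect when an SSH client speaks first.

```python
def classify_first_bytes(data: bytes) -> str:
    """Guess the protocol from the first bytes a client sends through a tunnel.

    Hypothetical helper for illustration; pipelock's real check is not shown
    in this PRD.
    """
    # An SSH identification string (RFC 4253) starts with the ASCII text "SSH-".
    if data.startswith(b"SSH-"):
        return "ssh"
    # A TLS record carrying a ClientHello starts with content type 0x16
    # (handshake) followed by a 0x03 major-version byte.
    if len(data) >= 2 and data[0] == 0x16 and data[1] == 0x03:
        return "tls"
    return "unknown"


print(classify_first_bytes(b"SSH-2.0-OpenSSH_9.6\r\n"))
print(classify_first_bytes(bytes([0x16, 0x03, 0x01, 0x00, 0x05])))
```

An interceptor that only proceeds on the `"tls"` case has no valid next step for an SSH banner, which matches the mid-kex close seen in the error above.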
## Goals / Success Criteria

Integration test: spin up a bottle with an SSH entry, exec `git fetch`
against a real-ish SSH host from inside the agent, observe exit 0.
This is the same signal that's broken today; flipping it back to
green is the test.

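
The success signal above could be checked with a helper along these lines. This is a sketch under assumptions: the function names, the container name, and the repository path are hypothetical, and the `run` parameter exists only so the check can be exercised without Docker.

```python
import subprocess
from typing import Callable, List


def git_fetch_argv(agent_container: str, repo_dir: str) -> List[str]:
    """Build the `docker exec` command that runs `git fetch` inside the agent."""
    # `-w` sets the working directory for the exec'd process.
    return ["docker", "exec", "-w", repo_dir, agent_container, "git", "fetch"]


def git_fetch_succeeds(
    agent_container: str,
    repo_dir: str,
    run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
) -> bool:
    """Return True when `git fetch` inside the agent container exits 0."""
    result = run(
        git_fetch_argv(agent_container, repo_dir),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

An integration test would call `git_fetch_succeeds` against a running bottle and assert the result is `True`; a unit test can pass a fake `run` to cover the failure path.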
## Non-goals

- Pluggable forwarder backend. One TCP forwarder image is baked in;
  abstracting over haproxy / nginx-stream / etc. is deferred.
- SSH-protocol awareness. The gate stays at L4. No SSH-version
  sniffing, no kex inspection, no per-key gating beyond what ssh
  itself enforces inside the agent.
- Replacing pipelock for anything else. HTTPS / HTTP traffic
  continues to flow through pipelock unchanged. This PRD adds a
  sidecar; it doesn't displace one.
- Connection rate limits or quotas. No per-host or per-agent rate
  limiting on the gate; future PRD if it ever matters.

## Scope

### In scope

- **Gate sidecar lifecycle.** `DockerSSHGate` class with
  `prepare` / `start` / `stop`, mirroring `DockerPipelockProxy`'s
  shape and network attachment story.
- **ssh provisioner rewrite.** `provision/ssh.py` drops the socat
  `ProxyCommand`; `~/.ssh/config` points each `Host` at the gate
  container and the per-host listen port.
- **Pipelock carve-out removal.** Strip
  `pipelock_bottle_ssh_trusted_domains`,
  `pipelock_bottle_ssh_ip_cidrs`, and the related code paths in
  `pipelock_build_config` + tests. After this PRD, pipelock has no
  knowledge of `bottle.ssh`.
- **Plan rendering / dry-run.** `bottle_plan.py` and the y/N
  preflight surface the new gate sidecar (name, listen ports,
  upstream targets).

### Out of scope

- SSH key generation / rotation. Bottle keys are still
  user-supplied via `IdentityFile`; the gate doesn't manage key
  material.
- Per-host audit logging. The gate is dumb TCP forwarding; no
  in-band visibility into SSH session content. (Connection-level
  logs from socat are a nice-to-have, not a goal.)
- Non-Docker backends. Implementation lands for Docker only; the
  `BottleBackend` abstraction can grow the hook but other backends
  are deferred.
- Manifest schema changes. `bottle.ssh` stays exactly as it is
  today; this PRD is internals-only.

## Proposed Design

### New services / components

Mirror the pipelock layout:

- **`claude_bottle/ssh_gate.py`** (new): abstract `SSHGate` +
  `SSHGatePlan` dataclass. `prepare` is host-side / side-effect-free
  on docker; renders the forwarder config under `stage_dir`.
- **`claude_bottle/backend/docker/ssh_gate.py`** (new):
  `DockerSSHGate` concrete subclass. `start` runs `docker create`
  on the internal network, copies the config in, attaches the
  egress network, then runs `docker start`. `stop` is an idempotent
  `docker rm -f`. Container name: `claude-bottle-ssh-gate-<slug>`.

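
The `start` / `stop` sequence in the list above maps onto four Docker CLI calls plus one cleanup call. The sketch below only builds the argument lists; the function names, the in-container script path, and the `--entrypoint /bin/sh` override are assumptions for illustration, not the planned `DockerSSHGate` implementation.

```python
from typing import List

# Hypothetical in-container location for the staged forwarder script.
GATE_SCRIPT_PATH = "/ssh-gate-entrypoint.sh"


def gate_container_name(slug: str) -> str:
    """Container name format given in this PRD."""
    return f"claude-bottle-ssh-gate-{slug}"


def start_commands(
    slug: str,
    image: str,
    internal_network: str,
    egress_network: str,
    staged_script: str,
) -> List[List[str]]:
    """Return the docker commands for `start`, in the order they must run."""
    name = gate_container_name(slug)
    return [
        # 1. Create the container on the internal network (not started yet).
        ["docker", "create", "--name", name, "--network", internal_network,
         "--entrypoint", "/bin/sh", image, GATE_SCRIPT_PATH],
        # 2. Copy the rendered forwarder config into the created container.
        ["docker", "cp", staged_script, f"{name}:{GATE_SCRIPT_PATH}"],
        # 3. Attach the egress network so upstream SSH hosts are reachable.
        ["docker", "network", "connect", egress_network, name],
        # 4. Start the container.
        ["docker", "start", name],
    ]


def stop_command(slug: str) -> List[str]:
    """Return the docker command for `stop`."""
    return ["docker", "rm", "-f", gate_container_name(slug)]
```

Keeping command construction separate from execution would let the unit tests assert on ordering without a Docker daemon.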
Forwarder image: `alpine/socat`, pinned by digest. One socat
process per ssh entry, multiplexed inside the same gate container
via an entrypoint script that backgrounds N socat invocations:

```
socat TCP-LISTEN:<port_i>,reuseaddr,fork TCP:<Hostname_i>:<Port_i>
```

Listen ports are assigned deterministically per ssh entry (e.g.
`30000 + index`). One container, N listeners, N upstreams.

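
One way `prepare` might render that entrypoint script is sketched below. The `SSHEntry` dataclass and its field names are hypothetical stand-ins for whatever `bottle.ssh` entries look like in code, and `30000 + index` is the example port scheme from the text, not a settled choice.

```python
from dataclasses import dataclass
from typing import Iterable

BASE_LISTEN_PORT = 30000  # example base from the PRD text


@dataclass(frozen=True)
class SSHEntry:
    """Hypothetical shape of one `bottle.ssh` entry."""
    name: str
    hostname: str
    port: int = 22


def listen_port(index: int) -> int:
    """Deterministic listen port for the ssh entry at `index`."""
    return BASE_LISTEN_PORT + index


def render_entrypoint(entries: Iterable[SSHEntry]) -> str:
    """Render a shell script that backgrounds one socat listener per entry."""
    lines = ["#!/bin/sh", "set -eu"]
    for index, entry in enumerate(entries):
        lines.append(
            f"socat TCP-LISTEN:{listen_port(index)},reuseaddr,fork "
            f"TCP:{entry.hostname}:{entry.port} &"
        )
    # `wait` keeps the script in the foreground while the listeners run.
    lines.append("wait")
    return "\n".join(lines) + "\n"


print(render_entrypoint([SSHEntry("github", "github.com"),
                         SSHEntry("internal", "git.example.internal", 2222)]))
```

Because the port depends only on the entry's position, the same mapping can be recomputed when rendering `~/.ssh/config`, with no shared state between the two steps.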
### Existing code touched

- **`claude_bottle/backend/docker/provision/ssh.py`**: drop the
  `ProxyCommand socat - PROXY:...` plumbing and the
  `pipelock_proxy_host_port` import. The rendered `~/.ssh/config`
  block per entry becomes:
  ```
  Host <name>
    HostName <gate-container>
    User <user>
    Port <listen-port>
    IdentityAgent <public-socket>
  ```
  `known_hosts` entries are keyed off `<name>` and the new
  `[<gate-container>]:<listen-port>` form so OpenSSH's strict
  host-key checking still matches.
- **`claude_bottle/pipelock.py`**: delete
  `pipelock_bottle_ssh_hostnames`, `pipelock_bottle_ssh_trusted_domains`,
  `pipelock_bottle_ssh_ip_cidrs`, and the calls into them from
  `pipelock_effective_allowlist` and `pipelock_build_config`. The
  effective allowlist becomes baked-defaults ∪ `bottle.egress.allowlist`.
- **`claude_bottle/backend/docker/backend.py`**: instantiate
  `DockerSSHGate` alongside `DockerPipelockProxy`; thread its
  `prepare` / `start` / `stop` through `resolve_plan` / `launch`.
- **`claude_bottle/backend/docker/launch.py`**: add gate start /
  stop to the `ExitStack` in the right order: the gate must be up
  before `provision_ssh` runs so the agent can dial it on first
  boot.
- **`claude_bottle/backend/docker/bottle_plan.py`**: new
  `SSHGatePlan` field on `DockerBottlePlan`; preflight rendering
  surfaces the gate sidecar (name, per-entry listen ports,
  upstream `Hostname:Port` targets).
- **Tests**: update `tests/fixtures.py` callers; rewrite
  `tests/unit/test_pipelock_yaml.py::TestBuildConfig::test_ssh_shape`
  to assert pipelock no longer reflects ssh entries; add unit
  tests for `SSHGate.prepare` + render shape; add an integration
  test in `tests/integration/` for the `git fetch` round-trip.

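
The `~/.ssh/config` block and `known_hosts` keying described for `provision/ssh.py` above could be rendered roughly as follows. This is a sketch under assumptions: both function names and all parameters are hypothetical, and the key material in the usage note is a placeholder, not a real host key.

```python
def render_ssh_config_block(
    name: str,
    gate_container: str,
    user: str,
    listen_port: int,
    identity_agent: str,
) -> str:
    """Render one `Host` block that points ssh at the gate container."""
    lines = [
        f"Host {name}",
        f"  HostName {gate_container}",
        f"  User {user}",
        f"  Port {listen_port}",
        f"  IdentityAgent {identity_agent}",
    ]
    return "\n".join(lines) + "\n"


def known_hosts_line(
    name: str,
    gate_container: str,
    listen_port: int,
    key_type: str,
    key_b64: str,
) -> str:
    """Render a known_hosts line keyed off the alias and the gate address.

    OpenSSH writes hosts on a non-default port as `[host]:port`, and several
    host patterns can share one line when separated by commas.
    """
    hosts = f"{name},[{gate_container}]:{listen_port}"
    return f"{hosts} {key_type} {key_b64}\n"
```

For example, `known_hosts_line("github", "claude-bottle-ssh-gate-demo", 30000, "ssh-ed25519", "<base64-key>")` yields a line whose host field is `github,[claude-bottle-ssh-gate-demo]:30000`, so the pinned upstream key still applies when ssh dials the gate instead of the real host.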
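
The ordering constraint on `launch.py` (gate up before `provision_ssh`, gate torn down on exit) can be sketched with `contextlib.ExitStack`. The function and its callable parameters are hypothetical; the real `launch` also manages pipelock and the agent container, which are left out here.

```python
from contextlib import ExitStack
from typing import Callable, TypeVar

T = TypeVar("T")


def launch_with_gate(
    gate_start: Callable[[], None],
    gate_stop: Callable[[], None],
    provision_ssh: Callable[[], None],
    run_agent: Callable[[], T],
) -> T:
    """Start the gate, provision ssh, run the agent, and always stop the gate."""
    with ExitStack() as stack:
        gate_start()
        # Register cleanup right after a successful start so the gate is
        # removed even if provisioning or the agent run raises.
        stack.callback(gate_stop)
        # The gate is already listening by the time ssh config is written.
        provision_ssh()
        return run_agent()
```

Registering `gate_stop` only after `gate_start` returns means a failed start does not trigger a stop for a container that was never created.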
### Data model changes

None. `bottle.ssh` schema is unchanged; one new internal plan
dataclass (`SSHGatePlan`) under `claude_bottle/ssh_gate.py`.

### External dependencies

- `alpine/socat` image, pinned by digest (declared next to the
  `PIPELOCK_IMAGE` constant). No new Python packages.

## Open questions

- Network topology: does the gate need its own per-agent egress
  bridge, or can it share pipelock's egress network? Sharing is
  simpler; per-gate isolates failure modes. Decide during
  implementation; default to "share pipelock's egress network"
  unless a concrete reason emerges.
- Socat container restart policy: a single socat that crashes
  takes one upstream offline; do we want a wrapper that restarts
  individual listeners, or just rely on `docker restart`? Default
  to no-restart for v1 (matches pipelock).
- Connection-level audit log: socat's `-v` mode logs every
  connect/close. Worth piping into the bottle's stderr stream, or
  is that noise? Default off, reconsider if debugging gets hard.
- Docker DNS for the `<gate-container>` hostname inside the
  agent: works via Docker's embedded resolver on user-defined
  networks. Verify on the `--internal` network specifically before
  implementation.

## References

- PRD 0001: per-agent egress proxy via pipelock. The parent
  topology this PRD slots into.
- PRD 0006: pipelock native TLS interception. The change that
  surfaced this regression by making pipelock incompatible with
  SSH-over-CONNECT.
- `claude_bottle/backend/docker/provision/ssh.py`: current SSH
  provisioning that this PRD rewrites.
- `claude_bottle/pipelock.py`: current pipelock config builder,
  which carries the `bottle.ssh`-derived fields this PRD removes.