bot-bottle/docs/prds/0007-ssh-egress-gate.md
didericis a3d77cd015
fix(ssh-gate): listen on the upstream port so URL-supplied ports work
Bug: git fetch failed with "connect to host
claude-bottle-ssh-gate-implementer port 30009: Connection refused".
OpenSSH treats a URL-supplied port (the user's remote was
ssh://git@gitea.dideric.is:30009/...) as overriding the
~/.ssh/config Port directive, so even though the config wrote
Port 30000 the agent dialed :30009 — where nothing was listening
because the gate had been assigned BASE_LISTEN_PORT + index.

Fix: the gate's listen port now equals the upstream port. Same
script, same socat, just port = entry.Port. Two entries on the
same upstream port are rejected at prepare time (the gate is one
container with a flat port space).

Re-smoked: probe nc github.com via the gate at :22, banner came
back as expected.

PRD 0007 updated to record the design refinement.
2026-05-12 16:19:07 -04:00


PRD 0007: SSH egress gate

  • Status: Draft
  • Author: didericis
  • Created: 2026-05-12

Summary

Per-agent TCP-forwarder sidecar built from bottle.ssh entries; SSH stops going through pipelock; pipelock keeps full TLS interception with no SSH carve-outs.

Problem

git fetch over SSH from inside an implementer-agent bottle is broken on main. The error surfaced after PRD 0006 enabled pipelock's native tls_interception:

kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535
fatal: Could not read from remote repository.

The agent's ssh client tunnels through pipelock via ProxyCommand socat - PROXY:pipelock:%h:%p, and pipelock now bumps (TLS-intercepts) that CONNECT tunnel. SSH sends its banner instead of a TLS ClientHello, pipelock's SNI gate rejects it, and the tunnel closes mid-kex. Every bottle with an ssh entry hits this, including the implementer agent used by the free-agent workflow, which can't pull or push.
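
To make the failure mode concrete, here is a minimal sketch of how the first bytes of the two protocols differ. It is not pipelock's code; `classify_first_bytes` is a hypothetical helper that only illustrates why a component expecting a TLS ClientHello cannot make sense of an SSH banner.

```python
def classify_first_bytes(data: bytes) -> str:
    """Guess the protocol from the first bytes a client sends through a tunnel.

    Illustrative only: a real TLS interceptor parses the full ClientHello
    (including SNI), not just the record header.
    """
    # A TLS handshake record starts with content type 0x16, followed by a
    # protocol version whose major byte is 0x03.
    if len(data) >= 2 and data[0] == 0x16 and data[1] == 0x03:
        return "tls"
    # An SSH peer opens with an identification string "SSH-<protoversion>-...".
    if data.startswith(b"SSH-"):
        return "ssh"
    return "unknown"


if __name__ == "__main__":
    print(classify_first_bytes(b"SSH-2.0-OpenSSH_9.6\r\n"))
    print(classify_first_bytes(bytes([0x16, 0x03, 0x01, 0x00, 0x05])))
```

An SSH banner carries no SNI at all, so an SNI-based gate has nothing to match against and the only safe outcome is to close the tunnel.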

Goals / Success Criteria

Integration test: spin up a bottle with an SSH entry, exec git fetch against a real-ish SSH host from inside the agent, observe exit 0. This is the same signal that's broken today; flipping it back to green is the test.

Non-goals

  • Pluggable forwarder backend. One TCP forwarder image is baked in; abstracting over haproxy / nginx-stream / etc. is deferred.
  • SSH-protocol awareness. The gate stays at L4. No SSH-version sniffing, no kex inspection, no per-key gating beyond what ssh itself enforces inside the agent.
  • Replacing pipelock for anything else. HTTPS / HTTP traffic continues to flow through pipelock unchanged. This PRD adds a sidecar; it doesn't displace one.
  • Connection rate limits or quotas. No per-host or per-agent rate limiting on the gate; future PRD if it ever matters.

Scope

In scope

  • Gate sidecar lifecycle. DockerSSHGate class with prepare / start / stop, mirroring DockerPipelockProxy's shape and network attachment story.
  • ssh provisioner rewrite. provision/ssh.py drops the socat ProxyCommand; ~/.ssh/config points each Host at the gate container and the per-host listen port.
  • Pipelock carve-out removal. Strip pipelock_bottle_ssh_trusted_domains, pipelock_bottle_ssh_ip_cidrs, and the related code paths in pipelock_build_config + tests. After this PRD, pipelock has no knowledge of bottle.ssh.
  • Plan rendering / dry-run. bottle_plan.py and the y/N preflight surface the new gate sidecar (name, listen ports, upstream targets).

Out of scope

  • SSH key generation / rotation. Bottle keys are still user-supplied via IdentityFile; the gate doesn't manage key material.
  • Per-host audit logging. The gate is dumb TCP forwarding; no in-band visibility into SSH session content. (Connection-level logs from socat are a nice-to-have, not a goal.)
  • Non-Docker backends. Implementation lands for Docker only; the BottleBackend abstraction can grow the hook but other backends are deferred.
  • Manifest schema changes. bottle.ssh stays exactly as it is today; this PRD is internals-only.

Proposed Design

New services / components

Mirror the pipelock layout:

  • claude_bottle/ssh_gate.py (new): abstract SSHGate + SSHGatePlan dataclass. prepare is host-side / side-effect-free on docker; renders the forwarder config under stage_dir.
  • claude_bottle/backend/docker/ssh_gate.py (new): DockerSSHGate concrete subclass — start does docker create on the internal network, copies the config in, attaches the egress network, docker start. stop is idempotent docker rm -f. Container name: claude-bottle-ssh-gate-<slug>.

Forwarder image: alpine/socat, pinned by digest. Must be self-sufficient at boot (no apk/apt pulls on first run) because the gate's agent-facing leg sits on the --internal network and has no internet at startup. One socat process per ssh entry, multiplexed inside the same gate container via an entrypoint script that backgrounds N socat invocations:

socat TCP-LISTEN:<port_i>,reuseaddr,fork TCP:<Hostname_i>:<Port_i>

Listen ports mirror the upstream port (entry Port, default 22). That choice is load-bearing: OpenSSH treats a URL-supplied port (e.g. ssh://git@host:30009/repo.git) as overriding the config's Port directive, so the gate has to be reachable on the same port the URL names — otherwise git fetch hits "connection refused" on the URL's port even though the config block points elsewhere. Two ssh entries sharing an upstream port are a config error and rejected at prepare time. One container, N listeners, N upstreams.
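
The port rule above can be sketched as follows. This is an assumption-laden illustration, not the actual `SSHGate.prepare`: `SSHEntry`, `plan_listeners`, and `render_socat_commands` are hypothetical names, and the real plan dataclass may carry different fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSHEntry:
    """Hypothetical stand-in for one bottle.ssh entry."""
    name: str
    hostname: str
    port: int = 22  # the PRD's stated default upstream port


def plan_listeners(entries: list[SSHEntry]) -> dict[int, SSHEntry]:
    """Map listen port -> entry, where listen port == upstream port.

    Two entries on the same upstream port cannot coexist, because the gate
    is a single container with one flat port space.
    """
    listeners: dict[int, SSHEntry] = {}
    for entry in entries:
        clash = listeners.get(entry.port)
        if clash is not None:
            raise ValueError(
                f"ssh entries {clash.name!r} and {entry.name!r} "
                f"both use upstream port {entry.port}"
            )
        listeners[entry.port] = entry
    return listeners


def render_socat_commands(listeners: dict[int, SSHEntry]) -> list[str]:
    """Render one socat invocation per listener, in port order."""
    return [
        f"socat TCP-LISTEN:{port},reuseaddr,fork TCP:{entry.hostname}:{entry.port}"
        for port, entry in sorted(listeners.items())
    ]


if __name__ == "__main__":
    entries = [
        SSHEntry("gitea", "gitea.dideric.is", 30009),
        SSHEntry("github", "github.com"),
    ]
    for command in render_socat_commands(plan_listeners(entries)):
        print(command)
```

Failing at prepare time keeps the error on the host side, before any container is created, which matches the PRD's description of prepare as side-effect-free on docker.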

Existing code touched

  • claude_bottle/backend/docker/provision/ssh.py: drop the ProxyCommand socat - PROXY:... plumbing and the pipelock_proxy_host_port import. The rendered ~/.ssh/config block per entry becomes:
    Host <name>
      HostName <gate-container>
      User <user>
      Port <listen-port>
      IdentityAgent <public-socket>
    
    known_hosts entries are keyed off <name> and the new [<gate-container>]:<listen-port> form so OpenSSH's strict host-key checking still matches.
  • claude_bottle/pipelock.py: delete pipelock_bottle_ssh_hostnames, pipelock_bottle_ssh_trusted_domains, pipelock_bottle_ssh_ip_cidrs, and the calls into them from pipelock_effective_allowlist and pipelock_build_config. The effective allowlist becomes the baked-in defaults plus bottle.egress.allowlist.
  • claude_bottle/backend/docker/backend.py: instantiate DockerSSHGate alongside DockerPipelockProxy; thread its prepare / start / stop through resolve_plan / launch.
  • claude_bottle/backend/docker/launch.py: add gate start / stop to the ExitStack in the right order — gate must be up before provision_ssh runs so the agent can dial it on first boot.
  • claude_bottle/backend/docker/bottle_plan.py: new SSHGatePlan field on DockerBottlePlan; preflight rendering surfaces the gate sidecar (name, per-entry listen ports, upstream Hostname:Port targets).
  • Tests: update tests/fixtures.py callers; rewrite tests/unit/test_pipelock_yaml.py::TestBuildConfig::test_ssh_shape to assert pipelock no longer reflects ssh entries; add unit tests for SSHGate.prepare + render shape; add an integration test in tests/integration/ for the git fetch round-trip.
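
As a rough illustration of the provision/ssh.py change listed above, the per-entry config block could be rendered like this. The function names and the example socket path are hypothetical; only the directive layout comes from the PRD. The port-22 special case reflects how OpenSSH itself writes known_hosts names (bare host for the default port, bracketed form otherwise) and is an assumption about how the provisioner would want to key its entries.

```python
def render_ssh_config_block(
    name: str,
    gate_container: str,
    user: str,
    listen_port: int,
    identity_agent: str,
) -> str:
    """Render one ~/.ssh/config Host block that points at the gate container."""
    lines = [
        f"Host {name}",
        f"  HostName {gate_container}",
        f"  User {user}",
        f"  Port {listen_port}",
        f"  IdentityAgent {identity_agent}",
    ]
    return "\n".join(lines) + "\n"


def known_hosts_names(name: str, gate_container: str, listen_port: int) -> list[str]:
    """Host names under which the upstream's host key would be recorded."""
    if listen_port == 22:
        # OpenSSH uses the bare host name for the default port.
        gate_name = gate_container
    else:
        # Non-default ports use the "[host]:port" form.
        gate_name = f"[{gate_container}]:{listen_port}"
    return [name, gate_name]


if __name__ == "__main__":
    gate = "claude-bottle-ssh-gate-implementer"
    # "/run/ssh-agent.sock" is a placeholder path, not the project's real socket.
    print(render_ssh_config_block("gitea.dideric.is", gate, "git", 30009, "/run/ssh-agent.sock"))
    print(known_hosts_names("gitea.dideric.is", gate, 30009))
```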

Data model changes

None. bottle.ssh schema is unchanged; one new internal plan dataclass (SSHGatePlan) under claude_bottle/ssh_gate.py.

External dependencies

  • alpine/socat image, pinned by digest (declared next to the PIPELOCK_IMAGE constant). No new Python packages.

Open questions

  • Network topology: does the gate need its own per-agent egress bridge, or can it share pipelock's egress network? Sharing is simpler; per-gate isolates failure modes. Decide during implementation; default to "share pipelock's egress network" unless a concrete reason emerges.
  • Socat container restart policy: a single socat that crashes takes one upstream offline; do we want a wrapper that restarts individual listeners, or just rely on docker restart? Default to no-restart for v1 (matches pipelock).
  • Connection-level audit log: socat's -v mode logs every connect/close. Worth piping into the bottle's stderr stream, or is that noise? Default off, reconsider if debugging gets hard.
  • Docker DNS for the <gate-container> hostname inside the agent: expected to work via Docker's embedded resolver on user-defined networks, but it needed verifying on the --internal network specifically before implementation. Resolved: a spike confirmed that a container on a --internal user-defined network resolves another container's name via the embedded resolver at 127.0.0.11 and reaches it over TCP, while egress to the public internet remains blocked. The PRD's design assumption holds.
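
A probe along the following lines would be enough to repeat the DNS spike from inside an agent container. It is a sketch, not the script the spike actually used: `can_reach` is a hypothetical helper, and the container name in the usage example is a placeholder.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if host resolves and accepts a TCP connection on port.

    Resolution failures, refused connections, and timeouts are all OSError
    subclasses, so they all count as "not reachable" here.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder gate name; expected True from inside the agent if the gate is up.
    print(can_reach("claude-bottle-ssh-gate-implementer", 22))
    # Expected False on an --internal network, where public egress is blocked.
    print(can_reach("github.com", 22))
```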

References

  • PRD 0001: per-agent egress proxy via pipelock — the parent topology this PRD slots into.
  • PRD 0006: pipelock native TLS interception — the change that surfaced this regression by making pipelock incompatible with SSH-over-CONNECT.
  • claude_bottle/backend/docker/provision/ssh.py — current SSH provisioning that this PRD rewrites.
  • claude_bottle/pipelock.py — current pipelock config builder, which carries the bottle.ssh-derived fields this PRD removes.