bot-bottle/docs/prds/0007-ssh-egress-gate.md
didericis 02a0fe679d
docs(prd): 0007 SSH egress gate
PRD 0006 enabled pipelock's native TLS interception, which broke
git fetch over SSH from inside the agent: pipelock's SNI gate
rejects the SSH banner that follows CONNECT. Document the
architectural fix — a dedicated per-agent TCP-forwarder sidecar
built from bottle.ssh entries — so pipelock can stay maximally
strict on the HTTPS path with no SSH carve-outs.
2026-05-12 15:41:26 -04:00


PRD 0007: SSH egress gate

  • Status: Draft
  • Author: didericis
  • Created: 2026-05-12

Summary

Per-agent TCP-forwarder sidecar built from bottle.ssh entries; SSH stops going through pipelock; pipelock keeps full TLS interception with no SSH carve-outs.

Problem

git fetch over SSH from inside an implementer-agent bottle is broken on main. The error surfaced after PRD 0006 enabled pipelock's native tls_interception:

kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535
fatal: Could not read from remote repository.

The agent's ssh client tunnels through pipelock via a ProxyCommand, socat - PROXY:pipelock:%h:%p, and pipelock now bumps (TLS-intercepts) that CONNECT tunnel. SSH sends its identification banner where pipelock expects a TLS ClientHello, so pipelock's SNI gate rejects it and the tunnel closes mid-kex. Every bottle with an ssh entry hits this, including the implementer agent used by the free-agent workflow, which can't pull or push.
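The mismatch is visible in the first bytes each protocol puts on the wire. The following is a minimal sketch, not pipelock's actual code: both helper functions are hypothetical and only illustrate why a gate that expects a TLS handshake record cannot make sense of an SSH identification string.

```python
def looks_like_tls_client_hello(first_bytes: bytes) -> bool:
    """True if the bytes start a TLS handshake record carrying a ClientHello.

    A TLS record starts with content type 0x16 (handshake) and a 0x03 major
    version byte; after the 2-byte length, the handshake type 0x01 marks a
    ClientHello.
    """
    return (
        len(first_bytes) >= 6
        and first_bytes[0] == 0x16
        and first_bytes[1] == 0x03
        and first_bytes[5] == 0x01
    )


def looks_like_ssh_banner(first_bytes: bytes) -> bool:
    """True if the bytes start an SSH identification string ("SSH-...")."""
    return first_bytes.startswith(b"SSH-")


# Illustrative first bytes an OpenSSH client might write into the tunnel.
ssh_first_bytes = b"SSH-2.0-OpenSSH_9.6\r\n"
# Illustrative start of a TLS record: handshake, version 3.1, length, ClientHello.
tls_first_bytes = bytes([0x16, 0x03, 0x01, 0x02, 0x00, 0x01])

print(looks_like_tls_client_hello(ssh_first_bytes))  # False
print(looks_like_ssh_banner(ssh_first_bytes))        # True
print(looks_like_tls_client_hello(tls_first_bytes))  # True
```

Because the gate described here stays at L4, it never needs a check like this; the sketch only explains the failure on the pipelock path.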

Goals / Success Criteria

Integration test: spin up a bottle with an SSH entry, exec git fetch against a real-ish SSH host from inside the agent, and observe exit 0. That command is exactly what fails today; the PRD is done when it passes again.
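The check itself could be sketched as below, assuming the bottle is already running and the agent container is reachable through docker exec. The helper names, the container name, and the repository path are hypothetical; bringing the bottle up and down is left to whatever fixtures the integration suite already has.

```python
import subprocess


def git_fetch_command(agent_container: str, repo_dir: str) -> list[str]:
    """Build the docker exec invocation that runs git fetch inside the agent."""
    return ["docker", "exec", "--workdir", repo_dir, agent_container, "git", "fetch"]


def assert_git_fetch_succeeds(agent_container: str, repo_dir: str) -> None:
    """Run git fetch in the agent container and fail loudly on a non-zero exit."""
    result = subprocess.run(
        git_fetch_command(agent_container, repo_dir),
        capture_output=True,
        text=True,
        timeout=60,
    )
    assert result.returncode == 0, f"git fetch failed: {result.stderr}"
```

A test would call assert_git_fetch_succeeds with the agent container's name and the checkout path once the bottle is up.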

Non-goals

  • Pluggable forwarder backend. One TCP forwarder image is baked in; abstracting over haproxy / nginx-stream / etc. is deferred.
  • SSH-protocol awareness. The gate stays at L4. No SSH-version sniffing, no kex inspection, no per-key gating beyond what ssh itself enforces inside the agent.
  • Replacing pipelock for anything else. HTTPS / HTTP traffic continues to flow through pipelock unchanged. This PRD adds a sidecar; it doesn't displace one.
  • Connection rate limits or quotas. No per-host or per-agent rate limiting on the gate; future PRD if it ever matters.

Scope

In scope

  • Gate sidecar lifecycle. DockerSSHGate class with prepare / start / stop, mirroring DockerPipelockProxy's shape and network attachment story.
  • ssh provisioner rewrite. provision/ssh.py drops the socat ProxyCommand; ~/.ssh/config points each Host at the gate container and the per-host listen port.
  • Pipelock carve-out removal. Strip pipelock_bottle_ssh_trusted_domains, pipelock_bottle_ssh_ip_cidrs, and the related code paths in pipelock_build_config + tests. After this PRD, pipelock has no knowledge of bottle.ssh.
  • Plan rendering / dry-run. bottle_plan.py and the y/N preflight surface the new gate sidecar (name, listen ports, upstream targets).

Out of scope

  • SSH key generation / rotation. Bottle keys are still user-supplied via IdentityFile; the gate doesn't manage key material.
  • Per-host audit logging. The gate is dumb TCP forwarding; no in-band visibility into SSH session content. (Connection-level logs from socat are a nice-to-have, not a goal.)
  • Non-Docker backends. Implementation lands for Docker only; the BottleBackend abstraction can grow the hook but other backends are deferred.
  • Manifest schema changes. bottle.ssh stays exactly as it is today; this PRD is internals-only.

Proposed Design

New services / components

Mirror the pipelock layout:

  • claude_bottle/ssh_gate.py (new): abstract SSHGate + SSHGatePlan dataclass. prepare runs host-side and makes no docker calls; it renders the forwarder config under stage_dir.
  • claude_bottle/backend/docker/ssh_gate.py (new): DockerSSHGate concrete subclass. start runs docker create on the internal network, copies the config in, attaches the egress network, then runs docker start. stop is an idempotent docker rm -f. Container name: claude-bottle-ssh-gate-<slug>.
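The prepare / start / stop lifecycle above might look like the following sketch. The method signatures are assumptions (the real ones will likely take a plan or stage_dir), and RecordingGate is a hypothetical stand-in that records calls instead of invoking docker.

```python
from abc import ABC, abstractmethod
from contextlib import ExitStack


class SSHGate(ABC):
    """Hypothetical shape of the abstract gate: prepare, then start, then stop."""

    @abstractmethod
    def prepare(self) -> None: ...

    @abstractmethod
    def start(self) -> None: ...

    @abstractmethod
    def stop(self) -> None: ...


class RecordingGate(SSHGate):
    """Stand-in for DockerSSHGate that records calls instead of running docker."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def prepare(self) -> None:
        self.events.append("prepare")

    def start(self) -> None:
        self.events.append("start")

    def stop(self) -> None:
        self.events.append("stop")


gate = RecordingGate()
gate.prepare()  # host-side rendering only, safe to call for a dry run
with ExitStack() as stack:
    # Registered before start so a partially created container is still removed;
    # this relies on stop being idempotent.
    stack.callback(gate.stop)
    gate.start()
    gate.events.append("provision_ssh")  # placeholder for the provisioning step
print(gate.events)  # ['prepare', 'start', 'provision_ssh', 'stop']
```

Registering stop ahead of start is one possible choice; it only works because stop is specified as an idempotent docker rm -f.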

Forwarder image: alpine/socat, pinned by digest. One socat process per ssh entry, multiplexed inside the same gate container via an entrypoint script that backgrounds N socat invocations:

socat TCP-LISTEN:<port_i>,reuseaddr,fork TCP:<Hostname_i>:<Port_i>

Listen ports are assigned deterministically per ssh entry (e.g. 30000 + index). One container, N listeners, N upstreams.
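A minimal sketch of how the entrypoint script could be rendered from the ssh entries, assuming the "30000 + index" port scheme from the example above. The function name, the BASE_LISTEN_PORT constant, and the (hostname, port) tuple input are hypothetical; the real prepare step may read bottle.ssh differently.

```python
import shlex

BASE_LISTEN_PORT = 30000  # assumption: taken from the "30000 + index" example


def render_entrypoint(entries: list[tuple[str, int]]) -> str:
    """Render a POSIX sh script that backgrounds one socat per (hostname, port)."""
    lines = ["#!/bin/sh", "set -eu"]
    for index, (hostname, port) in enumerate(entries):
        listen = f"TCP-LISTEN:{BASE_LISTEN_PORT + index},reuseaddr,fork"
        upstream = f"TCP:{hostname}:{port}"
        # shlex.quote leaves these unchanged unless a hostname has shell-unsafe characters.
        lines.append(f"socat {shlex.quote(listen)} {shlex.quote(upstream)} &")
    lines.append("wait")  # block until the background listeners exit
    return "\n".join(lines) + "\n"


print(render_entrypoint([("github.com", 22), ("git.example.com", 2222)]))
```

Because the ports depend only on entry order, the same bottle.ssh list always yields the same listeners, which is what lets the ssh provisioner compute matching Port values without talking to the gate.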

Existing code touched

  • claude_bottle/backend/docker/provision/ssh.py: drop the ProxyCommand socat - PROXY:... plumbing and the pipelock_proxy_host_port import. The rendered ~/.ssh/config block per entry becomes:
    Host <name>
      HostName <gate-container>
      User <user>
      Port <listen-port>
      IdentityAgent <public-socket>
    
    known_hosts entries are keyed off <name> and the new [<gate-container>]:<listen-port> form so OpenSSH's strict host-key checking still matches.
  • claude_bottle/pipelock.py: delete pipelock_bottle_ssh_hostnames, pipelock_bottle_ssh_trusted_domains, pipelock_bottle_ssh_ip_cidrs, and the calls into them from pipelock_effective_allowlist and pipelock_build_config. The effective allowlist becomes the union of the baked-in defaults and bottle.egress.allowlist.
  • claude_bottle/backend/docker/backend.py: instantiate DockerSSHGate alongside DockerPipelockProxy; thread its prepare / start / stop through resolve_plan / launch.
  • claude_bottle/backend/docker/launch.py: add gate start / stop to the ExitStack in the right order — gate must be up before provision_ssh runs so the agent can dial it on first boot.
  • claude_bottle/backend/docker/bottle_plan.py: new SSHGatePlan field on DockerBottlePlan; preflight rendering surfaces the gate sidecar (name, per-entry listen ports, upstream Hostname:Port targets).
  • Tests: update tests/fixtures.py callers; rewrite tests/unit/test_pipelock_yaml.py::TestBuildConfig::test_ssh_shape to assert pipelock no longer reflects ssh entries; add unit tests for SSHGate.prepare + render shape; add an integration test in tests/integration/ for the git fetch round-trip.
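The ~/.ssh/config block and known_hosts key described for provision/ssh.py above could be rendered roughly as follows. This is a sketch under assumptions: the function names, the example slug, and the agent socket path are hypothetical, and the real provisioner may structure this differently.

```python
def render_ssh_host_block(
    name: str, user: str, gate_container: str, listen_port: int, agent_socket: str
) -> str:
    """Render one ~/.ssh/config Host block that dials the gate, not the upstream."""
    return (
        f"Host {name}\n"
        f"  HostName {gate_container}\n"
        f"  User {user}\n"
        f"  Port {listen_port}\n"
        f"  IdentityAgent {agent_socket}\n"
    )


def known_hosts_key(gate_container: str, listen_port: int) -> str:
    """Host pattern OpenSSH uses in known_hosts: bare host on 22, else [host]:port."""
    if listen_port == 22:
        return gate_container
    return f"[{gate_container}]:{listen_port}"


gate = "claude-bottle-ssh-gate-demo"  # hypothetical slug
print(render_ssh_host_block("github", "git", gate, 30000, "/run/ssh-agent.sock"))
print(known_hosts_key(gate, 30000))  # [claude-bottle-ssh-gate-demo]:30000
```

The bracketed form matters because the agent now connects to the gate on a non-default port, so a known_hosts line keyed only on the upstream hostname would no longer match.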

Data model changes

None. bottle.ssh schema is unchanged; one new internal plan dataclass (SSHGatePlan) under claude_bottle/ssh_gate.py.
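One possible shape for the internal plan dataclass, shown only as a sketch: every field name here, the SSHGateForward helper type, and preflight_lines are assumptions chosen to cover what the preflight needs to display (name, listen ports, upstream targets).

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SSHGateForward:
    """One listener: the agent dials listen_port, the gate forwards upstream."""

    name: str
    listen_port: int
    upstream_host: str
    upstream_port: int


@dataclass(frozen=True)
class SSHGatePlan:
    """Everything start would need, computed up front by prepare."""

    container_name: str
    config_path: Path
    forwards: tuple[SSHGateForward, ...]

    def preflight_lines(self) -> list[str]:
        """Lines for the dry-run / y/N preflight output."""
        lines = [f"ssh gate: {self.container_name}"]
        for fwd in self.forwards:
            lines.append(
                f"  {fwd.name}: :{fwd.listen_port} -> {fwd.upstream_host}:{fwd.upstream_port}"
            )
        return lines


plan = SSHGatePlan(
    container_name="claude-bottle-ssh-gate-demo",  # hypothetical slug
    config_path=Path("stage/ssh-gate/entrypoint.sh"),  # hypothetical path
    forwards=(SSHGateForward("github", 30000, "github.com", 22),),
)
print("\n".join(plan.preflight_lines()))
```

Keeping the plan frozen fits a prepare step that is meant to be free of side effects: the plan can be rendered for a dry run and later handed to start unchanged.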

External dependencies

  • alpine/socat image, pinned by digest (declared next to the PIPELOCK_IMAGE constant). No new Python packages.

Open questions

  • Network topology: does the gate need its own per-agent egress bridge, or can it share pipelock's egress network? Sharing is simpler; per-gate isolates failure modes. Decide during implementation; default to "share pipelock's egress network" unless a concrete reason emerges.
  • Socat container restart policy: a single socat that crashes takes one upstream offline; do we want a wrapper that restarts individual listeners, or just rely on docker restart? Default to no-restart for v1 (matches pipelock).
  • Connection-level audit log: socat's -v mode logs every connect/close. Worth piping into the bottle's stderr stream, or is that noise? Default off, reconsider if debugging gets hard.
  • Docker DNS for the <gate-container> hostname inside the agent: works via Docker's embedded resolver on user-defined networks. Verify on the --internal network specifically before implementation.
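For the DNS question in the last item, a small probe run from inside the agent container could confirm both name resolution and reachability in one step. This is a sketch only; the function name is hypothetical, and the host and port passed in would be the gate container name and one of its listen ports.

```python
import socket


def gate_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Resolve host and open a TCP connection; False on DNS or connect failure."""
    try:
        # create_connection resolves the name first, so a resolver failure
        # (socket.gaierror, an OSError subclass) is caught here as well.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A True result only shows that the listener accepted a TCP connection; it says nothing about whether the upstream SSH host behind it is healthy.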

References

  • PRD 0001: per-agent egress proxy via pipelock — the parent topology this PRD slots into.
  • PRD 0006: pipelock native TLS interception — the change that surfaced this regression by making pipelock incompatible with SSH-over-CONNECT.
  • claude_bottle/backend/docker/provision/ssh.py — current SSH provisioning that this PRD rewrites.
  • claude_bottle/pipelock.py: current pipelock config builder, which today holds the bottle.ssh-derived fields this PRD removes.