# PRD 0007: SSH egress gate
- **Status:** Superseded by PRD 0009 (2026-05-13)
- **Author:** didericis
- **Created:** 2026-05-12
> **Superseded.** The ssh-gate sidecar and `bottle.ssh` manifest field
> described below were removed in PRD 0009. Every upstream this PRD
> targeted has since been folded into PRD 0008's git-gate, which
> covers the same use case with credential isolation and gitleaks
> scanning instead of bare L4 forwarding. Kept in-tree as a record of
> the original intent.
## Summary
Add a per-agent TCP-forwarder sidecar built from `bottle.ssh` entries.
SSH traffic stops going through pipelock, and pipelock keeps full TLS
interception with no SSH carve-outs.
## Problem
`git fetch` over SSH from inside an implementer-agent bottle is broken
on `main`. The error surfaced after PRD 0006 enabled pipelock's
native `tls_interception`:
```
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535
fatal: Could not read from remote repository.
```
The agent's ssh client tunnels through pipelock via a
`ProxyCommand socat - PROXY:pipelock:%h:%p`, and pipelock now bumps
(TLS-intercepts) that CONNECT tunnel. SSH opens with its plain-text
identification banner instead of a TLS ClientHello, so pipelock's SNI
gate rejects the connection. The tunnel closes during the SSH
identification exchange, before key exchange can start. Every bottle
with an `ssh` entry hits this, including the implementer agent used by
the free-agent workflow, which can't pull or push.
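The mismatch shows up in the first bytes of the stream. The sketch below is illustrative only. It is not pipelock's code, and the helper names are hypothetical. It shows the distinction an SNI gate effectively relies on: a TLS connection opens with a handshake record header, while SSH opens with an ASCII `SSH-` line (RFC 4253).

```python
TLS_HANDSHAKE = 0x16      # TLS record content type for handshake messages
TLS_MAJOR_VERSION = 0x03  # major version byte in SSL 3.0 / TLS 1.x record headers

def looks_like_tls_client_hello(first_bytes: bytes) -> bool:
    """True if the bytes could start a TLS handshake record (where SNI lives)."""
    return (
        len(first_bytes) >= 3
        and first_bytes[0] == TLS_HANDSHAKE
        and first_bytes[1] == TLS_MAJOR_VERSION
    )

def looks_like_ssh_banner(first_bytes: bytes) -> bool:
    """True if the bytes are an SSH identification string."""
    return first_bytes.startswith(b"SSH-")

ssh_first = b"SSH-2.0-OpenSSH_9.6\r\n"
# 5-byte record header (type, version, length) followed by handshake type 0x01 (ClientHello)
tls_first = bytes([0x16, 0x03, 0x01, 0x02, 0x00, 0x01])

print(looks_like_tls_client_hello(ssh_first))  # False
print(looks_like_ssh_banner(ssh_first))        # True
print(looks_like_tls_client_hello(tls_first))  # True
```

A gate that needs a server name from a ClientHello has nothing to parse in an SSH banner, so it can only drop the connection.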
## Goals / Success Criteria
Integration test: spin up a bottle with an SSH entry, run `git fetch`
from inside the agent against a real (or realistic test) SSH host, and
observe exit status 0. This is the same signal that's broken today;
flipping it back to green is the test.
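Here is a minimal sketch of that check. It assumes the test drives the agent with `docker exec`. `git_fetch_argv`, `fetch_succeeds`, and the container and repo arguments are hypothetical placeholders, not names from the codebase. The runner is injectable, so the command shape can be unit-tested without Docker.

```python
import subprocess
from typing import Callable

# Hypothetical helpers; container name and repo path would come from test fixtures.
Runner = Callable[..., subprocess.CompletedProcess]

def git_fetch_argv(container: str, repo_dir: str) -> list[str]:
    """`docker exec` command that runs `git fetch` inside the agent container."""
    return ["docker", "exec", container, "git", "-C", repo_dir, "fetch"]

def fetch_succeeds(container: str, repo_dir: str, run: Runner = subprocess.run) -> bool:
    """The PRD's success signal: `git fetch` inside the agent exits 0."""
    result = run(git_fetch_argv(container, repo_dir), capture_output=True, text=True)
    return result.returncode == 0
```

Once the bottle is up, the integration test would reduce to `assert fetch_succeeds(agent_container, repo_dir)`.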
## Non-goals
- Pluggable forwarder backend. One TCP forwarder image is baked in;
abstracting over haproxy / nginx-stream / etc. is deferred.
- SSH-protocol awareness. The gate stays at L4. No SSH-version
sniffing, no kex inspection, no per-key gating beyond what ssh
itself enforces inside the agent.
- Replacing pipelock for anything else. HTTPS / HTTP traffic
continues to flow through pipelock unchanged. This PRD adds a
sidecar; it doesn't displace one.
- Connection rate limits or quotas. No per-host or per-agent rate
limiting on the gate; future PRD if it ever matters.
## Scope
### In scope
- **Gate sidecar lifecycle.** `DockerSSHGate` class with
`prepare` / `start` / `stop`, mirroring `DockerPipelockProxy`'s
shape and network attachment story.
- **ssh provisioner rewrite.** `provision/ssh.py` drops the socat
`ProxyCommand`; `~/.ssh/config` points each `Host` at the gate
container and the per-host listen port.
- **Pipelock carve-out removal.** Strip
`pipelock_bottle_ssh_trusted_domains`,
`pipelock_bottle_ssh_ip_cidrs`, and the related code paths in
`pipelock_build_config` + tests. After this PRD, pipelock has no
knowledge of `bottle.ssh`.
- **Plan rendering / dry-run.** `bottle_plan.py` and the y/N
preflight surface the new gate sidecar (name, listen ports,
upstream targets).
### Out of scope
- SSH key generation / rotation. Bottle keys are still
user-supplied via `IdentityFile`; the gate doesn't manage key
material.
- Per-host audit logging. The gate is dumb TCP forwarding; no
in-band visibility into SSH session content. (Connection-level
logs from socat are a nice-to-have, not a goal.)
- Non-Docker backends. Implementation lands for Docker only; the
`BottleBackend` abstraction can grow the hook but other backends
are deferred.
- Manifest schema changes. `bottle.ssh` stays exactly as it is
today; this PRD is internals-only.
## Proposed Design
### New services / components
Mirror the pipelock layout:
- **`claude_bottle/ssh_gate.py`** (new): abstract `SSHGate` and the
`SSHGatePlan` dataclass. `prepare` runs host-side with no Docker
side effects and renders the forwarder config under `stage_dir`.
- **`claude_bottle/backend/docker/ssh_gate.py`** (new):
`DockerSSHGate` concrete subclass. `start` runs `docker create`
on the internal network, copies the config in, attaches the
egress network, then runs `docker start`. `stop` is an idempotent
`docker rm -f`. Container name: `claude-bottle-ssh-gate-<slug>`.
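This sketch shows one way `DockerSSHGate.start` and `stop` could map onto the Docker CLI, with each call written as an argv list. Several details are assumptions for illustration, not decisions from this PRD: the `GateDockerArgs` dataclass, the network names, the `/tmp/gate-entrypoint.sh` destination, and the `--entrypoint /bin/sh` override.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class GateDockerArgs:
    # Hypothetical bundle of what start/stop need.
    slug: str
    image: str             # digest-pinned socat image
    internal_network: str  # agent-facing --internal network
    egress_network: str    # network with a route to the upstreams
    stage_dir: Path        # where prepare() rendered the entrypoint

    @property
    def name(self) -> str:
        return f"claude-bottle-ssh-gate-{self.slug}"

def start_commands(a: GateDockerArgs) -> list[list[str]]:
    """docker CLI calls for start(), in the order described above."""
    return [
        # Create on the internal network only; run the staged script with sh.
        ["docker", "create", "--name", a.name, "--network", a.internal_network,
         "--entrypoint", "/bin/sh", a.image, "/tmp/gate-entrypoint.sh"],
        # docker cp works on a created-but-not-started container.
        ["docker", "cp", str(a.stage_dir / "gate-entrypoint.sh"),
         f"{a.name}:/tmp/gate-entrypoint.sh"],
        ["docker", "network", "connect", a.egress_network, a.name],
        ["docker", "start", a.name],
    ]

def stop_commands(a: GateDockerArgs) -> list[list[str]]:
    """stop(): the caller treats a missing container as already stopped."""
    return [["docker", "rm", "-f", a.name]]
```

Each argv would be passed in order to `subprocess.run(..., check=True)`. Keeping the commands as data makes the sequence easy to assert on in unit tests.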
Forwarder image: `alpine/socat`, pinned by digest. Must be
self-sufficient at boot (no apk/apt pulls on first run) because
the gate's agent-facing leg sits on the `--internal` network and
has no internet at startup. One socat process per ssh entry,
multiplexed inside the same gate container via an entrypoint
script that backgrounds N socat invocations:
```
socat TCP-LISTEN:<port_i>,reuseaddr,fork TCP:<Hostname_i>:<Port_i>
```
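Below is a minimal sketch of how `prepare` might render that entrypoint script into `stage_dir`. The `SSHEntry` shape and the `render_entrypoint` name are hypothetical. The socat address syntax is the one shown above.

```python
import shlex
from dataclasses import dataclass

@dataclass(frozen=True)
class SSHEntry:
    # Hypothetical shape: only the bottle.ssh fields the gate needs.
    hostname: str
    port: int = 22

def render_entrypoint(entries: list[SSHEntry]) -> str:
    """Render a POSIX sh entrypoint that backgrounds one socat per entry."""
    lines = ["#!/bin/sh"]
    for e in entries:
        listen = f"TCP-LISTEN:{e.port},reuseaddr,fork"  # listen port mirrors upstream port
        upstream = f"TCP:{e.hostname}:{e.port}"
        # shlex.quote guards against shell metacharacters in manifest-supplied hostnames.
        lines.append(f"socat {shlex.quote(listen)} {shlex.quote(upstream)} &")
    # Keep the container's main process alive until every listener has exited.
    lines.append("wait")
    return "\n".join(lines) + "\n"
```

With a bare `wait`, one crashed listener does not take down the others or the container. That fits the no-restart default raised under Open questions.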
Listen ports mirror the upstream port (entry `Port`, default 22).
That choice is load-bearing: git passes a URL-supplied port
(e.g. `ssh://git@host:30009/repo.git`) to ssh as `-p 30009`, and
OpenSSH lets command-line options override the config's `Port`
directive. The gate therefore has to be reachable on the same port
the URL names; otherwise git fetch hits "connection refused" on the
URL's port even though the config block points elsewhere. Two ssh
entries sharing an upstream port are a config error and are
rejected at prepare time. One container, N listeners, N upstreams.
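The prepare-time rejection could be as simple as the sketch below. `assign_listen_ports`, its input shape (entry name mapped to `(Hostname, Port)`), and the choice of `ValueError` are all assumptions.

```python
from collections import defaultdict

def assign_listen_ports(entries: dict[str, tuple[str, int]]) -> dict[int, str]:
    """Map listen port -> entry name, rejecting entries that share an upstream port."""
    by_port: dict[int, list[str]] = defaultdict(list)
    for name, (_hostname, port) in entries.items():
        by_port[port].append(name)
    clashes = {p: names for p, names in by_port.items() if len(names) > 1}
    if clashes:
        detail = "; ".join(
            f"port {p}: {', '.join(sorted(names))}" for p, names in sorted(clashes.items())
        )
        raise ValueError(f"ssh entries share an upstream port: {detail}")
    return {port: names[0] for port, names in by_port.items()}
```

One consequence follows directly from the rule: under this scheme, two upstreams that both serve SSH on port 22 (say, two forges) cannot share one gate.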
### Existing code touched
- **`claude_bottle/backend/docker/provision/ssh.py`**: drop the
  `ProxyCommand socat - PROXY:...` plumbing and the
  `pipelock_proxy_host_port` import. The rendered `~/.ssh/config`
  block per entry becomes:
  ```
  Host <name>
      HostName <gate-container>
      User <user>
      Port <listen-port>
      IdentityAgent <public-socket>
  ```
  `known_hosts` entries are keyed off `<name>` and the gate address
  (`[<gate-container>]:<listen-port>`, or bare `<gate-container>` when
  the listen port is 22, since OpenSSH only brackets non-default
  ports) so OpenSSH's strict host-key checking still matches.
- **`claude_bottle/pipelock.py`**: delete
`pipelock_bottle_ssh_hostnames`, `pipelock_bottle_ssh_trusted_domains`,
`pipelock_bottle_ssh_ip_cidrs`, and the calls into them from
`pipelock_effective_allowlist` and `pipelock_build_config`. The
effective allowlist becomes the baked-in defaults plus `bottle.egress.allowlist`.
- **`claude_bottle/backend/docker/backend.py`**: instantiate
`DockerSSHGate` alongside `DockerPipelockProxy`; thread its
`prepare` / `start` / `stop` through `resolve_plan` / `launch`.
- **`claude_bottle/backend/docker/launch.py`**: add gate start /
stop to the `ExitStack`, ordered so the gate is up before
`provision_ssh` runs and the agent can dial it on first boot.
- **`claude_bottle/backend/docker/bottle_plan.py`**: new
`SSHGatePlan` field on `DockerBottlePlan`; preflight rendering
surfaces the gate sidecar (name, per-entry listen ports,
upstream `Hostname:Port` targets).
- **Tests**: update `tests/fixtures.py` callers; rewrite
`tests/unit/test_pipelock_yaml.py::TestBuildConfig::test_ssh_shape`
to assert pipelock no longer reflects ssh entries; add unit
tests for `SSHGate.prepare` + render shape; add an integration
test in `tests/integration/` for the `git fetch` round-trip.
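Here is a minimal sketch of the per-entry rendering `provision/ssh.py` would do. The function names are hypothetical. It encodes OpenSSH's known_hosts host format, which brackets the address only when the port isn't 22.

```python
def known_hosts_host(hostname: str, port: int) -> str:
    """Host field OpenSSH uses in known_hosts: bare for port 22, else [host]:port."""
    return hostname if port == 22 else f"[{hostname}]:{port}"

def render_host_block(name: str, gate: str, user: str, port: int, agent_socket: str) -> str:
    """One ~/.ssh/config block pointing an entry's alias at the gate container."""
    return "\n".join([
        f"Host {name}",
        f"    HostName {gate}",
        f"    User {user}",
        f"    Port {port}",
        f"    IdentityAgent {agent_socket}",
    ]) + "\n"

def known_hosts_line(name: str, gate: str, port: int, key: str) -> str:
    """Pin the upstream's host key (e.g. 'ssh-ed25519 AAAA...') under alias and gate address."""
    return f"{name},{known_hosts_host(gate, port)} {key}"
```

For example, `known_hosts_line("gitea", "claude-bottle-ssh-gate-demo", 30009, "ssh-ed25519 AAAA...")` yields `gitea,[claude-bottle-ssh-gate-demo]:30009 ssh-ed25519 AAAA...`. OpenSSH looks up known_hosts by the resolved `HostName` unless `HostKeyAlias` is set. That makes the gate-address pattern the one that matches in practice.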
### Data model changes
None. `bottle.ssh` schema is unchanged; one new internal plan
dataclass (`SSHGatePlan`) under `claude_bottle/ssh_gate.py`.
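The sketch below shows what `SSHGatePlan` might hold, together with the lines the y/N preflight would surface: gate name, listen ports, and upstream targets. The field names and output format are assumptions. The dataclasses are frozen, matching a side-effect-free `prepare`.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class GateListener:
    listen_port: int
    upstream_host: str
    upstream_port: int

@dataclass(frozen=True)
class SSHGatePlan:
    container_name: str
    stage_dir: Path
    listeners: tuple[GateListener, ...] = ()

    def preflight_lines(self) -> list[str]:
        """Human-readable summary for the launch preflight."""
        lines = [f"ssh gate: {self.container_name}"]
        for listener in self.listeners:
            lines.append(
                f"  :{listener.listen_port} -> "
                f"{listener.upstream_host}:{listener.upstream_port}"
            )
        return lines
```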
### External dependencies
- `alpine/socat` image, pinned by digest (declared next to the
`PIPELOCK_IMAGE` constant). No new Python packages.
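The PRD doesn't record the digest, so none is reproduced here. A small guard test could keep the pin from regressing to a floating tag. `is_digest_pinned` is a hypothetical name, and the check covers only the `@sha256:<64 hex>` form.

```python
import string

def is_digest_pinned(image_ref: str) -> bool:
    """True for references like 'alpine/socat@sha256:<64 lowercase hex chars>'."""
    name, sep, digest = image_ref.partition("@")
    if not name or not sep:
        return False  # no '@digest' suffix: a tag or bare name, not pinned
    algorithm, _, hex_digest = digest.partition(":")
    return (
        algorithm == "sha256"
        and len(hex_digest) == 64
        and all(c in string.hexdigits.lower() for c in hex_digest)
    )
```

Usage would be a one-line unit test such as `assert is_digest_pinned(SOCAT_IMAGE)`, where `SOCAT_IMAGE` is a hypothetical constant declared beside `PIPELOCK_IMAGE`.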
## Open questions
- Network topology: does the gate need its own per-agent egress
bridge, or can it share pipelock's egress network? Sharing is
simpler; per-gate isolates failure modes. Decide during
implementation; default to "share pipelock's egress network"
unless a concrete reason emerges.
- Socat container restart policy: a single socat that crashes
takes one upstream offline; do we want a wrapper that restarts
individual listeners, or just rely on `docker restart`? Default
to no-restart for v1 (matches pipelock).
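- Connection-level audit log: at `-d -d`, socat logs a notice when
each connection is accepted, when it connects upstream, and when it
closes (`-v` would additionally dump the transferred bytes). Worth
piping into the bottle's stderr stream, or is that noise? Default
off, reconsider if debugging gets hard.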
- ~~Docker DNS for the `<gate-container>` hostname inside the
agent: works via Docker's embedded resolver on user-defined
networks. Verify on the `--internal` network specifically before
implementation.~~ **Resolved.** Spike confirmed: a container on
a `--internal` user-defined network resolves another
container's name via the embedded resolver at 127.0.0.11 and
reaches it over TCP, while egress to the public internet
remains blocked. The PRD's design assumption holds.
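The spike's two observations can be re-checked from inside the agent container with a few lines of Python. `can_reach` is a hypothetical helper. Name lookup goes through the resolver listed in the container's `/etc/resolv.conf`, which on a user-defined network is Docker's embedded 127.0.0.11.

```python
import socket

def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Resolve `host` and open a TCP connection; True only if both succeed."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers resolution failures (gaierror), refusals, and timeouts
        return False
```

On the `--internal` network, `can_reach("claude-bottle-ssh-gate-<slug>", 22)` should return `True` and `can_reach("example.com", 443)` should return `False`.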
## References
- PRD 0001 (per-agent egress proxy via pipelock): the parent
topology this PRD slots into.
- PRD 0006 (pipelock native TLS interception): the change that
surfaced this regression by making pipelock incompatible with
SSH-over-CONNECT.
- `claude_bottle/backend/docker/provision/ssh.py`: current SSH
provisioning that this PRD rewrites.
- `claude_bottle/pipelock.py`: current pipelock config builder,
which carries the `bottle.ssh`-derived fields this PRD removes.