# PRD 0007: SSH egress gate

- **Status:** Superseded by PRD 0009 (2026-05-13)
- **Author:** didericis
- **Created:** 2026-05-12

> **Superseded.** The ssh-gate sidecar and `bottle.ssh` manifest field
> described below were removed in PRD 0009. Every upstream this PRD
> targeted has since been folded into PRD 0008's git-gate, which
> covers the same use case with credential isolation and gitleaks
> scanning instead of bare L4 forwarding. Kept in-tree for the
> history of intent.

## Summary

Per-agent TCP-forwarder sidecar built from `bottle.ssh` entries; SSH stops
going through pipelock; pipelock keeps full TLS interception with no
SSH carve-outs.

## Problem

`git fetch` over SSH from inside an implementer-agent bottle is broken
on `main`. The error surfaced after PRD 0006 enabled pipelock's
native `tls_interception`:

```
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535
fatal: Could not read from remote repository.
```

The agent's ssh client tunnels through pipelock via a `ProxyCommand
socat - PROXY:pipelock:%h:%p`, and pipelock now bumps that CONNECT
tunnel. SSH sends its banner instead of a TLS ClientHello; pipelock's
SNI gate rejects it; the tunnel closes mid-kex. Every bottle with an
`ssh` entry hits this, including the implementer agent used by the
free-agent workflow, which can't pull or push.

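The mismatch is visible in the first bytes on the wire. The sketch below is only an illustration of that difference, not pipelock's actual gate logic; `classify_first_bytes` is a hypothetical helper name.

```python
def classify_first_bytes(data: bytes) -> str:
    """Classify the first bytes a client sends through a CONNECT tunnel.

    Illustrative only: this is not how pipelock is implemented.
    """
    # An SSH client opens with an identification string such as
    # b"SSH-2.0-OpenSSH_9.6\r\n" (RFC 4253, section 4.2).
    if data.startswith(b"SSH-"):
        return "ssh-banner"
    # A TLS handshake record starts with content type 0x16 followed by
    # the 0x03 major-version byte; the ClientHello (and its SNI) lives
    # inside that record.
    if len(data) >= 2 and data[0] == 0x16 and data[1] == 0x03:
        return "tls-handshake"
    return "unknown"
```

An interceptor that expects the second shape has no SNI to read when it receives the first, which is consistent with the tunnel closing during key exchange.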
## Goals / Success Criteria

Integration test: spin up a bottle with an SSH entry, exec `git
fetch` against a real-ish SSH host from inside the agent, observe
exit 0. This is the same signal that's broken today; flipping it
back to green is the test.

## Non-goals

- Pluggable forwarder backend. One TCP forwarder image is baked in;
  abstracting over haproxy / nginx-stream / etc. is deferred.
- SSH-protocol awareness. The gate stays at L4. No SSH-version
  sniffing, no kex inspection, no per-key gating beyond what ssh
  itself enforces inside the agent.
- Replacing pipelock for anything else. HTTPS / HTTP traffic
  continues to flow through pipelock unchanged. This PRD adds a
  sidecar; it doesn't displace one.
- Connection rate limits or quotas. No per-host or per-agent rate
  limiting on the gate; future PRD if it ever matters.

## Scope

### In scope

- **Gate sidecar lifecycle.** `DockerSSHGate` class with
  `prepare` / `start` / `stop`, mirroring `DockerPipelockProxy`'s
  shape and network attachment story.
- **ssh provisioner rewrite.** `provision/ssh.py` drops the socat
  `ProxyCommand`; `~/.ssh/config` points each `Host` at the gate
  container and the per-host listen port.
- **Pipelock carve-out removal.** Strip
  `pipelock_bottle_ssh_trusted_domains`,
  `pipelock_bottle_ssh_ip_cidrs`, and the related code paths in
  `pipelock_build_config` + tests. After this PRD, pipelock has no
  knowledge of `bottle.ssh`.
- **Plan rendering / dry-run.** `bottle_plan.py` and the y/N
  preflight surface the new gate sidecar (name, listen ports,
  upstream targets).

### Out of scope

- SSH key generation / rotation. Bottle keys are still
  user-supplied via `IdentityFile`; the gate doesn't manage key
  material.
- Per-host audit logging. The gate is dumb TCP forwarding; no
  in-band visibility into SSH session content. (Connection-level
  logs from socat are a nice-to-have, not a goal.)
- Non-Docker backends. Implementation lands for Docker only; the
  `BottleBackend` abstraction can grow the hook but other backends
  are deferred.
- Manifest schema changes. `bottle.ssh` stays exactly as it is
  today; this PRD is internals-only.

## Proposed Design

### New services / components

Mirror the pipelock layout:

- **`claude_bottle/ssh_gate.py`** (new): abstract `SSHGate` +
  `SSHGatePlan` dataclass. `prepare` is host-side / side-effect-free
  on docker; renders the forwarder config under `stage_dir`.
- **`claude_bottle/backend/docker/ssh_gate.py`** (new):
  `DockerSSHGate` concrete subclass. `start` does `docker create`
  on the internal network, copies the config in, attaches the
  egress network, then `docker start`. `stop` is idempotent `docker rm
  -f`. Container name: `claude-bottle-ssh-gate-<slug>`.

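The `start` / `stop` sequence described above can be sketched as plain `docker` argv lists. This is a minimal sketch under stated assumptions, not the `DockerSSHGate` implementation: the function names, the network names, the staged-config path, and the in-container destination `/etc/ssh-gate/entrypoint.sh` are all hypothetical.

```python
from typing import List


def gate_container_name(slug: str) -> str:
    # Naming scheme taken from the PRD: claude-bottle-ssh-gate-<slug>.
    return f"claude-bottle-ssh-gate-{slug}"


def gate_start_commands(
    slug: str,
    image: str,
    internal_net: str,
    egress_net: str,
    staged_config: str,
    config_dest: str = "/etc/ssh-gate/entrypoint.sh",  # hypothetical path
) -> List[List[str]]:
    """Return the docker invocations for `start`, in order."""
    name = gate_container_name(slug)
    return [
        # 1. Create (not start) the container on the internal network.
        ["docker", "create", "--name", name, "--network", internal_net, image],
        # 2. Copy the rendered forwarder config into the created container.
        ["docker", "cp", staged_config, f"{name}:{config_dest}"],
        # 3. Attach the second (egress) network.
        ["docker", "network", "connect", egress_net, name],
        # 4. Start it only once both legs are wired up.
        ["docker", "start", name],
    ]


def gate_stop_command(slug: str) -> List[str]:
    """Return the docker invocation for `stop` (force-remove)."""
    return ["docker", "rm", "-f", gate_container_name(slug)]
```

A caller would pass each list to `subprocess.run(cmd, check=True)`. Returning argv lists rather than running them keeps the sequence easy to assert on in unit tests and easy to show in a dry-run.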
Forwarder image: `alpine/socat`, pinned by digest. Must be
self-sufficient at boot (no apk/apt pulls on first run) because
the gate's agent-facing leg sits on the `--internal` network and
has no internet at startup. One socat process per ssh entry,
multiplexed inside the same gate container via an entrypoint
script that backgrounds N socat invocations:

```
socat TCP-LISTEN:<port_i>,reuseaddr,fork TCP:<Hostname_i>:<Port_i>
```

Listen ports mirror the upstream port (entry `Port`, default 22).
That choice is load-bearing: OpenSSH treats a URL-supplied port
(e.g. `ssh://git@host:30009/repo.git`) as overriding the config's
`Port` directive, so the gate has to be reachable on the same port
the URL names; otherwise `git fetch` hits "connection refused" on
the URL's port even though the config block points elsewhere. Two
ssh entries sharing an upstream port are a config error and
rejected at prepare time. One container, N listeners, N upstreams.

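The rendering and the duplicate-port check can be sketched as below. This is a minimal sketch, not the code that shipped: `SSHEntry`, `render_socat_commands`, and `render_entrypoint` are hypothetical names, and the entry fields are a simplified stand-in for whatever `bottle.ssh` actually carries.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class SSHEntry:
    # Hypothetical, simplified view of one `bottle.ssh` entry.
    name: str
    hostname: str
    port: int = 22  # the PRD's default when the entry has no Port


def render_socat_commands(entries: Iterable[SSHEntry]) -> List[str]:
    """One socat line per entry; the listen port mirrors the upstream port."""
    seen: Dict[int, str] = {}
    commands: List[str] = []
    for entry in entries:
        # Two entries on the same upstream port would need the same
        # listen port, so reject that at prepare time.
        if entry.port in seen:
            raise ValueError(
                f"ssh entries {seen[entry.port]!r} and {entry.name!r} "
                f"share upstream port {entry.port}"
            )
        seen[entry.port] = entry.name
        commands.append(
            f"socat TCP-LISTEN:{entry.port},reuseaddr,fork "
            f"TCP:{entry.hostname}:{entry.port}"
        )
    return commands


def render_entrypoint(entries: Iterable[SSHEntry]) -> str:
    """Shell entrypoint that backgrounds every listener, then waits."""
    lines = ["#!/bin/sh", "set -eu"]
    lines += [f"{cmd} &" for cmd in render_socat_commands(entries)]
    lines.append("wait")
    return "\n".join(lines) + "\n"
```

Raising in the render step means a conflicting manifest fails before any container is created, which matches the "rejected at prepare time" rule above.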
### Existing code touched

- **`claude_bottle/backend/docker/provision/ssh.py`**: drop the
  `ProxyCommand socat - PROXY:...` plumbing and the
  `pipelock_proxy_host_port` import. The rendered `~/.ssh/config`
  block per entry becomes:
  ```
  Host <name>
      HostName <gate-container>
      User <user>
      Port <listen-port>
      IdentityAgent <public-socket>
  ```
  `known_hosts` entries are keyed off `<name>` and the new
  `[<gate-container>]:<listen-port>` form so OpenSSH's strict
  host-key checking still matches.
- **`claude_bottle/pipelock.py`**: delete
  `pipelock_bottle_ssh_hostnames`, `pipelock_bottle_ssh_trusted_domains`,
  `pipelock_bottle_ssh_ip_cidrs`, and the calls into them from
  `pipelock_effective_allowlist` and `pipelock_build_config`. The
  effective allowlist becomes baked-defaults ∪ `bottle.egress.allowlist`.
- **`claude_bottle/backend/docker/backend.py`**: instantiate
  `DockerSSHGate` alongside `DockerPipelockProxy`; thread its
  `prepare` / `start` / `stop` through `resolve_plan` / `launch`.
- **`claude_bottle/backend/docker/launch.py`**: add gate start /
  stop to the `ExitStack` in the right order: the gate must be up
  before `provision_ssh` runs so the agent can dial it on first
  boot.
- **`claude_bottle/backend/docker/bottle_plan.py`**: new
  `SSHGatePlan` field on `DockerBottlePlan`; preflight rendering
  surfaces the gate sidecar (name, per-entry listen ports,
  upstream `Hostname:Port` targets).
- **Tests**: update `tests/fixtures.py` callers; rewrite
  `tests/unit/test_pipelock_yaml.py::TestBuildConfig::test_ssh_shape`
  to assert pipelock no longer reflects ssh entries; add unit
  tests for `SSHGate.prepare` + render shape; add an integration
  test in `tests/integration/` for the `git fetch` round-trip.

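The per-entry `~/.ssh/config` block and the matching `known_hosts` host field described in the first bullet could be rendered roughly as follows. This is a sketch under assumptions, not the contents of `provision/ssh.py`: both function names are hypothetical, and the port-22 special case reflects the `known_hosts` format documented in sshd(8), where only non-standard ports use the bracketed `[host]:port` form.

```python
def render_ssh_config_block(
    name: str,
    gate_container: str,
    user: str,
    listen_port: int,
    identity_agent: str,
) -> str:
    """Render one Host block that points at the gate container."""
    return "\n".join(
        [
            f"Host {name}",
            f"    HostName {gate_container}",
            f"    User {user}",
            f"    Port {listen_port}",
            f"    IdentityAgent {identity_agent}",
        ]
    ) + "\n"


def known_hosts_names(name: str, gate_container: str, listen_port: int) -> str:
    """Comma-separated host field for a known_hosts line.

    Assumption: per sshd(8), a non-standard port is written as
    [host]:port, while the default port 22 uses the bare host name.
    """
    if listen_port == 22:
        gate = gate_container
    else:
        gate = f"[{gate_container}]:{listen_port}"
    return f"{name},{gate}"
```

For example, `known_hosts_names("gitea", "claude-bottle-ssh-gate-demo", 30009)` yields `gitea,[claude-bottle-ssh-gate-demo]:30009`, which would be followed by the upstream's key type and public key on the same line.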
### Data model changes

None. `bottle.ssh` schema is unchanged; one new internal plan
dataclass (`SSHGatePlan`) under `claude_bottle/ssh_gate.py`.

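As a rough illustration of what such a plan dataclass might hold, the sketch below carries the three things the preflight is said to surface: the sidecar name, the listen ports, and the upstream targets. Only the `SSHGatePlan` name comes from this PRD; the field names, `GateForward`, and `preflight_lines` are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GateForward:
    # Hypothetical: one listener and the upstream it forwards to.
    listen_port: int
    upstream_host: str
    upstream_port: int


@dataclass(frozen=True)
class SSHGatePlan:
    # Field names are assumptions; only the class name is from the PRD.
    container_name: str
    forwards: Tuple[GateForward, ...] = ()

    def preflight_lines(self) -> List[str]:
        """Human-readable lines for the y/N preflight or a dry-run."""
        lines = [f"ssh gate: {self.container_name}"]
        for fwd in self.forwards:
            lines.append(
                f"  :{fwd.listen_port} -> {fwd.upstream_host}:{fwd.upstream_port}"
            )
        return lines
```

Freezing the dataclass and using a tuple keeps the plan immutable once `prepare` has produced it, so the dry-run output and the later `start` see the same values.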
### External dependencies

- `alpine/socat` image, pinned by digest (declared next to the
  `PIPELOCK_IMAGE` constant). No new Python packages.

## Open questions

- Network topology: does the gate need its own per-agent egress
  bridge, or can it share pipelock's egress network? Sharing is
  simpler; per-gate isolates failure modes. Decide during
  implementation; default to "share pipelock's egress network"
  unless a concrete reason emerges.
- Socat container restart policy: a single socat that crashes
  takes one upstream offline; do we want a wrapper that restarts
  individual listeners, or just rely on `docker restart`? Default
  to no-restart for v1 (matches pipelock).
- Connection-level audit log: socat's `-v` mode logs every
  connect/close. Worth piping into the bottle's stderr stream, or
  is that noise? Default off, reconsider if debugging gets hard.
- ~~Docker DNS for the `<gate-container>` hostname inside the
  agent: works via Docker's embedded resolver on user-defined
  networks. Verify on the `--internal` network specifically before
  implementation.~~ **Resolved.** Spike confirmed: a container on
  a `--internal` user-defined network resolves another
  container's name via the embedded resolver at 127.0.0.11 and
  reaches it over TCP, while egress to the public internet
  remains blocked. The PRD's design assumption holds.

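A check along the lines of the resolved DNS item can be written with the standard library alone. This is not the spike that was actually run, just a sketch of an equivalent probe; `can_reach` is a hypothetical helper, and it would need to run inside a container on the `--internal` network to say anything about that topology.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if `host` resolves and accepts a TCP connection."""
    try:
        # Resolution step: inside a container on a user-defined network
        # this goes through Docker's embedded resolver.
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        # Reachability step: open and immediately close a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers resolution failures (socket.gaierror), refused
        # connections, and timeouts.
        return False
```

Run from the agent container, `can_reach("<gate-container>", <listen-port>)` returning `True` alongside a public host returning `False` would correspond to the two halves of the spike result.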
## References

- PRD 0001: per-agent egress proxy via pipelock. The parent
  topology this PRD slots into.
- PRD 0006: pipelock native TLS interception. The change that
  surfaced this regression by making pipelock incompatible with
  SSH-over-CONNECT.
- `claude_bottle/backend/docker/provision/ssh.py`: current SSH
  provisioning that this PRD rewrites.
- `claude_bottle/pipelock.py`: current pipelock config builder,
  which carries the `bottle.ssh`-derived fields this PRD removes.