# PRD 0008: Git gate
- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-12
## Summary
Per-bottle sidecar that fronts the agent's git remotes as a
transparent mirror. Push is gated: gitleaks scans incoming refs
via a `pre-receive` hook, and only clean refs get forwarded to
the real upstream. Fetch is mirrored: every `upload-pack` first
runs `git fetch origin --prune` against the upstream via the
daemon's `--access-hook`, so an agent fetch returns whatever the
upstream has *now* (fail-closed if the upstream is unreachable).
Upstream credentials live in the gate, not the agent — so a
misbehaving agent cannot push a secret-bearing commit past it
and cannot acquire push access by inspecting the agent's own
filesystem.
## Problem
Today the agent holds its own SSH identity for each `bottle.ssh`
entry and pushes straight at gitea/github with ssh-gate doing dumb
L4 forwarding. There is no boundary between "the agent thinks this
commit is fine" and "the secret hits an external remote." If a
compromised or careless agent stages a `.env`, slips a token into
a fixture, or commits the `BOT_BOTTLE_OAUTH_TOKEN` itself,
`git push` ships it.

Host-side pre-commit / pre-push hooks are the usual defense, but
they live on the agent's side of the trust boundary: an agent with
shell access can `git push --no-verify` past them, edit
`.githooks/`, or `git config core.hooksPath /dev/null`. Anything
the agent can disable is not a gate.
## Goals / Success Criteria
Two integration tests, both with the gate as the only git path
for a declared upstream:
1. **Push:** drop a synthetic high-entropy secret into a commit,
run `git push` from inside the agent, observe a non-zero exit
and a gitleaks finding in the response. Repeat with a clean
commit and observe exit 0 + the commit landing on the real
upstream.
2. **Fetch:** clone the upstream through the gate (`git clone`
against the gate URL), observe the upstream's content. Push
a new commit to the upstream out-of-band, refetch through the
gate, observe the new commit. The gate must never serve stale
data — every fetch refreshes from upstream first.
## Non-goals
- Pre-commit scanning. The gate is a `pre-receive` checkpoint
only; it does not run on `git commit`, does not block local
commits, and does not edit the agent's working tree.
- Git-protocol awareness beyond what `pre-receive` already gives
you. No bespoke pack inspection; gitleaks runs against the
incoming ref(s) in a bare repo, full stop.
- Per-user authentication on the agent → gate hop. The hop sits
inside a single bottle on an `--internal` Docker network; only
the bottle's agent can reach the gate. No additional ACLs.
- Subsuming ssh-gate or pipelock. Non-git SSH (if any) keeps
flowing through ssh-gate; HTTPS through pipelock. The git-gate
is git-only.
- Multi-tenant gate. One gate is provisioned per bottle, not
shared across bottles (same one-sidecar-per-agent posture as
pipelock / ssh-gate).
- Smolmachines / microVM colocation policy. Whether the future
smolmachines backend packs gates into one VM or runs them as
separate VMs is a backend decision, not a manifest or design
decision in this PRD. See "Future work."
## Scope
### In scope
- **Gate sidecar lifecycle.** New `GitGate` + `DockerGitGate`,
mirroring `DockerSSHGate` and `DockerPipelockProxy` in shape and
network-attachment story.
- **Manifest field.** `bottle.git` — a list of git remotes the
bottle is allowed to talk to, each with the credential the gate
uses to push upstream. The agent gets no parallel `bottle.ssh`
entry for those upstreams. Each entry may also carry an
`ExtraHosts: { hostname: ip }` map, surfaced to the gate as
`--add-host` so the gate can resolve upstreams whose public DNS
doesn't point at the reachable IP (e.g. Tailscale-only hosts).
The agent-side `insteadOf` rewrite keys off the original hostname,
so the manifest's `Upstream` URL stays human-readable.
- **Agent-side URL rewrite.** Provisioner emits `~/.gitconfig`
with `[url "<gate-url>"] insteadOf = <real-url>` so every git
operation against the declared upstream (push, fetch, clone,
pull, ls-remote) transparently hits the gate.
- **Pre-receive gitleaks hook.** Baked into the gate image. On a
hit the hook exits non-zero and the push fails; on clean it
shells out to `git push origin <ref>:<ref>` using the gate-resident
credential.
- **Access-hook upstream refresh.** `git daemon --access-hook` runs
`git fetch origin --prune` against the upstream before every
`upload-pack` request, so a fetch through the gate is observably
equivalent to a fetch against the real upstream. Failure to reach
the upstream is fail-closed: the access hook exits non-zero and
the agent's fetch fails.
- **Plan rendering / dry-run.** `bottle_plan.py` and the y/N
preflight surface the gate sidecar (name, listed upstreams,
which credential it holds per upstream).
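
The access-hook refresh in the list above could be sketched roughly as follows. This is a minimal illustration, not the gate's actual hook: `access_hook` and the injectable `run` callable are hypothetical names, and it relies only on `git daemon` passing the service name and repository path as the first two arguments to the hook.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical seam so the refresh command can be stubbed in tests.
Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(cmd), check=False).returncode


def access_hook(argv: Sequence[str], run: Runner = _run) -> int:
    """Return the status git daemon should see (0 allows the request)."""
    if len(argv) < 3:
        return 1  # malformed invocation: decline rather than guess
    service, repo_path = argv[1], argv[2]
    if service != "upload-pack":
        return 0  # pushes are gated by pre-receive, not refreshed here
    status = run(["git", "-C", repo_path, "fetch", "origin", "--prune"])
    if status != 0:
        # git daemon relays a line written to stdout to the client on decline.
        print("git-gate: upstream refresh failed; refusing to serve stale refs")
        return 1  # fail closed
    return 0

# As a script entry point one would call: sys.exit(access_hook(sys.argv))
```

Keeping the subprocess call behind `run` lets the fail-closed branch be unit-tested without a reachable upstream.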
### Out of scope
- Push policy beyond gitleaks. No commit-author allowlist, no
branch-name policy, no signed-commit enforcement. gitleaks is
the single rule for v1.
- Fetch caching / stale-while-revalidate. Every `upload-pack`
refresh is a synchronous round-trip to the upstream; there is
no TTL cache, no background refresh. If the upstream is slow,
the agent's fetch is slow.
- Quarantine / replay. A rejected push is discarded; we do not
stash it for the user to inspect.
- Non-Docker backends. Implementation lands for Docker only; the
`BottleBackend` abstraction gains the hook but other backends
are deferred.
- Bypass for trusted commits. No `[skip gitleaks]` trailer, no
allowlist by commit hash. If the gate is bypassable it isn't a
gate.
## Proposed Design
### New services / components
Mirror the existing sidecar layout:
- **`bot_bottle/git_gate.py`** (new): abstract `GitGate` +
`GitGatePlan` dataclass. `prepare` is host-side and
side-effect-free on docker; it renders the per-upstream config and
stages the push credentials under `stage_dir`.
- **`bot_bottle/backend/docker/git_gate.py`** (new):
`DockerGitGate` concrete subclass. `start` does `docker create`
on the internal network, copies in the bare-repo skeleton, the
hook script, and per-upstream credentials, then `docker start`.
`stop` is idempotent `docker rm -f`. Container name:
`bot-bottle-git-gate-<slug>`.

Gate image: `git-daemon` + `openssh-client` over a
`zricethezav/gitleaks` base (alpine + gitleaks), pinned by digest.

For each declared upstream the gate hosts a bare repo at
`/git/<name>.git` with `remote.origin.url` set to the real
upstream (via `git remote add --mirror=fetch`), `hooks/pre-receive`
wired to gitleaks-then-`git push origin`, and the bare repo's
config carrying per-upstream credential paths.

Inside the bottle, the agent's `.gitconfig` rewrites the real
upstream URL to the gate's `git://` URL via `insteadOf`. Every
git operation against the declared upstream therefore hits the
gate.

For pushes, the pre-receive hook gitleaks-scans the incoming
refs and, on clean, pushes each accepted ref to the real
upstream using the credential the gate holds.

For fetches (clone, pull, fetch, ls-remote), `git daemon`'s
`--access-hook=<path>` runs `git fetch origin --prune` against
the real upstream before the upload-pack service serves the
client. The bare repo therefore reflects the upstream's current
state at the moment the agent's fetch begins; if the upstream
is unreachable, the access hook exits non-zero and the agent's
fetch fails, the same observable behavior as if the agent were
talking to the upstream directly.

The agent never sees the upstream credential under either
operation.
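
The push path described above might look something like the sketch below. It is an illustration under stated assumptions, not the shipped hook: all function names are hypothetical, the `gitleaks detect ... --log-opts` invocation is an assumed gitleaks v8 command line, and forwarding ref deletions unscanned is a policy guess this PRD does not settle. Because refs are not yet updated while `pre-receive` runs, the sketch forwards the new object id (`<new-sha>:<ref>`) rather than the ref name.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List

ZERO_SHA = "0" * 40  # assumes SHA-1 object names


def _is_zero(sha: str) -> bool:
    return set(sha) == {"0"}


@dataclass(frozen=True)
class RefUpdate:
    old: str
    new: str
    ref: str


def parse_updates(stdin_text: str) -> List[RefUpdate]:
    """Parse the `<old> <new> <ref>` lines git feeds pre-receive on stdin."""
    updates = []
    for line in stdin_text.splitlines():
        if not line.strip():
            continue
        old, new, ref = line.split(" ", 2)
        updates.append(RefUpdate(old, new, ref))
    return updates


def log_range(update: RefUpdate) -> str:
    """Commit range to scan: only what this push adds to the repo."""
    if _is_zero(update.old):
        return f"{update.new} --not --all"  # brand-new ref
    return f"{update.old}..{update.new}"


def gitleaks_clean(update: RefUpdate) -> bool:
    """Assumed gitleaks invocation; non-zero exit is treated as a finding."""
    cmd = ["gitleaks", "detect", "--source", ".",
           f"--log-opts={log_range(update)}"]
    return subprocess.run(cmd, check=False).returncode == 0


def forward(update: RefUpdate) -> bool:
    """Push one accepted update to the real upstream."""
    src = "" if _is_zero(update.new) else update.new  # ":<ref>" deletes
    cmd = ["git", "push", "origin", f"{src}:{update.ref}"]
    return subprocess.run(cmd, check=False).returncode == 0


def pre_receive(stdin_text: str,
                scan: Callable[[RefUpdate], bool] = gitleaks_clean,
                push: Callable[[RefUpdate], bool] = forward) -> int:
    """Return the hook's exit status; non-zero rejects the whole push."""
    updates = parse_updates(stdin_text)
    # Scan everything before forwarding anything, so one dirty ref
    # means nothing reaches the upstream.
    for update in updates:
        if not _is_zero(update.new) and not scan(update):
            return 1
    for update in updates:
        if not push(update):
            return 1
    return 0
```

Scanning all refs before the first upstream push keeps the hook all-or-nothing, which matches how git treats a non-zero `pre-receive` exit.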
### Existing code touched
- **`bot_bottle/manifest.py`**: parse and validate the new
`bottle.git` block; reject `bottle.ssh` entries whose upstream
is also claimed by a `bottle.git` upstream (one path per
remote, no shadow route).
- **`bot_bottle/backend/docker/provision/git.py`** (new) or an
extension of the ssh provisioner: render the `insteadOf` config
and any extra `~/.gitconfig` plumbing.
- **`bot_bottle/backend/docker/backend.py`**: instantiate
`DockerGitGate` alongside `DockerPipelockProxy` and
`DockerSSHGate`; thread its `prepare` / `start` / `stop`
through `resolve_plan` / `launch`.
- **`bot_bottle/backend/docker/launch.py`**: add gate start /
stop to the `ExitStack` so the gate is up before any
provisioner that writes the agent's `~/.gitconfig`.
- **`bot_bottle/backend/docker/bottle_plan.py`**: new
`GitGatePlan` field on `DockerBottlePlan`; preflight rendering
surfaces the gate sidecar (name, per-upstream local paths,
upstream real URLs, which credential is in use).
- **Tests**: unit tests for `GitGate.prepare` and render shape;
manifest validator tests for the new field and the
no-shadow-route rule; an integration test in
`tests/integration/` for the push-with-secret (rejected) and
push-without-secret (forwarded) cases.
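
As an illustration of the `insteadOf` rendering the provisioner is responsible for, here is a minimal sketch. The function name `render_insteadof` and the example gate hostname `git-gate` are hypothetical; only the `[url "<gate-url>"] insteadOf = <real-url>` shape comes from this PRD.

```python
from typing import Mapping


def render_insteadof(rewrites: Mapping[str, str]) -> str:
    """Render gitconfig stanzas mapping real upstream URLs to gate URLs.

    `rewrites` maps each real upstream URL to the gate URL replacing it.
    """
    stanzas = []
    for real_url, gate_url in sorted(rewrites.items()):
        for value in (real_url, gate_url):
            # Refuse values that would break out of the config syntax.
            if '"' in value or "\n" in value:
                raise ValueError(f"unsafe URL for gitconfig: {value!r}")
        stanzas.append(f'[url "{gate_url}"]\n\tinsteadOf = {real_url}\n')
    return "\n".join(stanzas)


if __name__ == "__main__":
    # Hypothetical upstream and gate URL, for illustration only.
    print(render_insteadof(
        {"git@github.com:acme/widget.git": "git://git-gate/widget.git"}
    ))
```

Git applies `insteadOf` as a prefix match, so a rule keyed on the full repository URL rewrites only that upstream and leaves other remotes on the same host alone.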
### Data model changes
`Bottle` grows an optional `git: list[GitEntry]` field. A
`GitEntry` carries the upstream URL, the local name the gate
exposes it as, and the credential the gate uses to push upstream
(initial shape: `identity_file` + `known_host_key`, matching
`bottle.ssh`).
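
One possible shape for `GitEntry`, together with the no-shadow-route check from the manifest validator, is sketched below. The field names beyond `identity_file` and `known_host_key`, the helper names, and the choice to compare routes by hostname are all assumptions; the real `bottle.ssh` entry shape is not shown in this PRD, so it is represented here as a plain list of hostnames.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable, List
from urllib.parse import urlsplit


@dataclass(frozen=True)
class GitEntry:
    name: str            # local name the gate exposes (/git/<name>.git)
    upstream: str        # real upstream URL
    identity_file: str   # credential the gate uses to push upstream
    known_host_key: str
    extra_hosts: Dict[str, str] = field(default_factory=dict)


# scp-like syntax, e.g. git@host:path (no scheme, colon not followed by //)
_SCP_LIKE = re.compile(r"^(?:[^@/]+@)?(?P<host>[^:/]+):(?!//)")


def upstream_host(url: str) -> str:
    """Extract the lowercase hostname from a URL or scp-like remote."""
    if "://" in url:
        host = urlsplit(url).hostname
        if not host:
            raise ValueError(f"no host in upstream URL: {url!r}")
        return host.lower()
    match = _SCP_LIKE.match(url)
    if not match:
        raise ValueError(f"unrecognised upstream URL: {url!r}")
    return match.group("host").lower()


def find_shadow_routes(git_entries: Iterable[GitEntry],
                       ssh_hosts: Iterable[str]) -> List[str]:
    """Hosts reachable both via bottle.git and bottle.ssh (should be empty)."""
    claimed = {upstream_host(entry.upstream) for entry in git_entries}
    return sorted({host.lower() for host in ssh_hosts} & claimed)
```

A validator could reject the manifest whenever `find_shadow_routes` returns a non-empty list, naming the offending hosts in the error.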
### External dependencies
- `zricethezav/gitleaks` base image, pinned by digest. The base
ships gitleaks + git; the gate Dockerfile adds `git-daemon` and
`openssh-client` on top.
- No new Python packages.
## Future work
- **Smolmachines colocation.** The eventual smolmachines backend
may pack pipelock + ssh-gate + git-gate into a single microVM,
or split git-gate off because it holds push creds and the
others don't. That decision belongs to the backend; the shared
`BottleBackend` interface keeps sidecars independent so either
packing is possible without touching this PRD's design.
## Open questions
- Protocol on the agent → gate hop: SSH (`sshd` + `git-shell`
inside the gate) or HTTP smart protocol (`git-http-backend`
behind a tiny webserver)? SSH matches the existing ssh-gate
patterns and the user's existing `~/.ssh` muscle memory; HTTP
is lighter on image size and avoids an `authorized_keys`
story. Default: SSH unless image size becomes a problem.
- Where gitleaks runs: pre-receive hook against a checkout of the
incoming ref vs. a wrapper around `git-receive-pack` that
inspects the pack file directly. Hook is canonical; defer the
wrapper variant.
- Rejection signalling: gitleaks failures surface as a normal
pre-receive reject (the user sees gitleaks's report on
stderr). Worth a "redacted" mode that hides the matched bytes
from the rejection message? Default: show file + line, hide
the matched bytes.
- Credential reuse vs. duplication from `bottle.ssh`. If a user
lists the same identity for ssh-gate (read) and git-gate
(write), we can either reference by name or require two
copies. Default: inline copies; revisit when it gets annoying.
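
For the rejection-signalling question above, the "show file + line, hide the matched bytes" default could be approximated by post-processing a gitleaks JSON report. This is a sketch only: `redact_findings` is a hypothetical helper, and it assumes the report is a JSON array whose entries carry `File`, `StartLine`, and `RuleID` alongside the `Secret` and `Match` fields that get dropped.

```python
import json
from typing import List


def redact_findings(report_json: str) -> List[str]:
    """Summarise findings as `file:line: rule`, omitting matched bytes."""
    findings = json.loads(report_json) or []
    lines = []
    for finding in findings:
        path = finding.get("File", "?")
        line = finding.get("StartLine", "?")
        rule = finding.get("RuleID", "unknown-rule")
        # Deliberately never read finding["Secret"] or finding["Match"].
        lines.append(f"{path}:{line}: {rule}")
    return lines
```

The hook would then print these lines to stderr in place of the raw gitleaks output.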
## References
- PRD 0001: per-agent egress proxy via pipelock — sidecar
pattern this PRD reuses.
- PRD 0007: SSH egress gate — the L4 SSH forwarder this PRD
sits alongside; explicitly *not* the place to add
git-protocol awareness.
- `bot_bottle/ssh_gate.py` / `bot_bottle/pipelock.py`:
existing sidecar abstractions to mirror.
- gitleaks: <https://github.com/gitleaks/gitleaks>