# PRD 0008: Git gate

- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-12

## Summary

Per-bottle sidecar that fronts the agent's git remotes as a transparent mirror. Push is gated: gitleaks scans incoming refs via a `pre-receive` hook, and only clean refs get forwarded to the real upstream. Fetch is mirrored: every `upload-pack` first runs `git fetch origin --prune` against the upstream via the daemon's `--access-hook`, so an agent fetch returns whatever the upstream has *now* (fail-closed if the upstream is unreachable). Upstream credentials live in the gate, not the agent, so a misbehaving agent cannot push a secret-bearing commit past it and cannot acquire push access by inspecting the agent's own filesystem.

## Problem

Today the agent holds its own SSH identity for each `bottle.ssh` entry and pushes straight at gitea/github with ssh-gate doing dumb L4 forwarding. There is no boundary between "the agent thinks this commit is fine" and "the secret hits an external remote." If a compromised or careless agent stages a `.env`, slips a token into a fixture, or commits the `BOT_BOTTLE_OAUTH_TOKEN` itself, `git push` ships it.

Host-side pre-commit / pre-push hooks are the usual defense, but they live on the agent's side of the trust boundary: an agent with shell access can `git push --no-verify` past them, edit `.githooks/`, or `git config core.hooksPath /dev/null`. Anything the agent can disable is not a gate.

## Goals / Success Criteria

Two integration tests, both with the gate as the only git path for a declared upstream:

1. **Push:** drop a synthetic high-entropy secret into a commit, run `git push` from inside the agent, observe a non-zero exit and a gitleaks finding in the response. Repeat with a clean commit and observe exit 0 + the commit landing on the real upstream.
2. **Fetch:** clone the upstream through the gate (`git clone` against the gate URL), observe the upstream's content.
   Push a new commit to the upstream out-of-band, refetch through the gate, observe the new commit. The gate must never serve stale data: every fetch refreshes from upstream first.

## Non-goals

- Pre-commit scanning. The gate is a `pre-receive` checkpoint only; it does not run on `git commit`, does not block local commits, and does not edit the agent's working tree.
- Git-protocol awareness beyond what `pre-receive` already gives you. No bespoke pack inspection; gitleaks runs against the incoming ref(s) in a bare repo, full stop.
- Per-user authentication on the agent → gate hop. The hop sits inside a single bottle on an `--internal` Docker network; only the bottle's agent can reach the gate. No additional ACLs.
- Subsuming ssh-gate or pipelock. Non-git SSH (if any) keeps flowing through ssh-gate; HTTPS through pipelock. The git-gate is git-only.
- Multi-tenant gate. One gate is provisioned per bottle, not shared across bottles (same one-sidecar-per-agent posture as pipelock / ssh-gate).
- Smolmachines / microVM colocation policy. Whether the future smolmachines backend packs gates into one VM or runs them as separate VMs is a backend decision, not a manifest or design decision in this PRD. See "Future work."

## Scope

### In scope

- **Gate sidecar lifecycle.** New `GitGate` + `DockerGitGate`, mirroring `DockerSSHGate` and `DockerPipelockProxy` in shape and network-attachment story.
- **Manifest field.** `bottle.git`: a list of git remotes the bottle is allowed to talk to, each with the credential the gate uses to push upstream. The agent gets no parallel `bottle.ssh` entry for those upstreams. Each entry may also carry an `ExtraHosts: { hostname: ip }` map, surfaced to the gate as `--add-host` so the gate can resolve upstreams whose public DNS doesn't point at the reachable IP (e.g. Tailscale-only hosts). The agent-side `insteadOf` rewrite keys off the original hostname, so the manifest's `Upstream` URL stays human-readable.
- **Agent-side URL rewrite.** Provisioner emits `~/.gitconfig` with `[url ""] insteadOf = ` so every git operation against the declared upstream (push, fetch, clone, pull, ls-remote) transparently hits the gate.
- **Pre-receive gitleaks hook.** Baked into the gate image. On a hit the hook exits non-zero and the push fails; on clean it shells out to `git push origin :` using the gate-resident credential.
- **Access-hook upstream refresh.** `git daemon --access-hook` runs `git fetch origin --prune` against the upstream before every `upload-pack` request, so a fetch through the gate is observably equivalent to a fetch against the real upstream. Failure to reach the upstream is fail-closed: the access hook exits non-zero and the agent's fetch fails.
- **Plan rendering / dry-run.** `bottle_plan.py` and the y/N preflight surface the gate sidecar (name, listed upstreams, which credential it holds per upstream).

### Out of scope

- Push policy beyond gitleaks. No commit-author allowlist, no branch-name policy, no signed-commit enforcement. gitleaks is the single rule for v1.
- Fetch caching / stale-while-revalidate. Every `upload-pack` refresh is a synchronous round-trip to the upstream; there is no TTL cache, no background refresh. If the upstream is slow, the agent's fetch is slow.
- Quarantine / replay. A rejected push is discarded; we do not stash it for the user to inspect.
- Non-Docker backends. Implementation lands for Docker only; the `BottleBackend` abstraction gains the hook but other backends are deferred.
- Bypass for trusted commits. No `[skip gitleaks]` trailer, no allowlist by commit hash. If the gate is bypassable it isn't a gate.

## Proposed Design

### New services / components

Mirror the existing sidecar layout:

- **`bot_bottle/git_gate.py`** (new): abstract `GitGate` + `GitGatePlan` dataclass. `prepare` is host-side / side-effect-free on docker; renders the per-upstream config and stages the push credentials under `stage_dir`.
- **`bot_bottle/backend/docker/git_gate.py`** (new): `DockerGitGate` concrete subclass. `start` does `docker create` on the internal network, copies in the bare-repo skeleton, the hook script, and per-upstream credentials, then `docker start`. `stop` is idempotent `docker rm -f`. Container name: `bot-bottle-git-gate-`.

Gate image: `git-daemon` + `openssh-client` over a `zricethezav/gitleaks` base (alpine + gitleaks), pinned by digest. For each declared upstream the gate hosts a bare repo at `/git/.git` with `remote.origin.url` set to the real upstream (via `git remote add --mirror=fetch`), `hooks/pre-receive` wired to gitleaks-then-`git push origin`, and the bare repo's config carrying per-upstream credential paths.

Inside the bottle, the agent's `.gitconfig` rewrites the real upstream URL to the gate's `git://` URL via `insteadOf`. Every git operation against the declared upstream therefore hits the gate. For pushes, the pre-receive hook gitleaks-scans the incoming refs and, on clean, pushes each accepted ref to the real upstream using the credential the gate holds.

For fetches (clone, pull, fetch, ls-remote), `git daemon`'s `--access-hook=` runs `git fetch origin --prune` against the real upstream before the upload-pack service serves the client. The bare repo therefore reflects the upstream's current state at the moment the agent's fetch begins; if the upstream is unreachable, the access hook exits non-zero and the agent's fetch fails, which is the same observable behavior as if the agent were talking to the upstream directly. The agent never sees the upstream credential under either operation.

### Existing code touched

- **`bot_bottle/manifest.py`**: parse and validate the new `bottle.git` block; reject `bottle.ssh` entries whose upstream is also claimed by a `bottle.git` upstream (one path per remote, no shadow route).
- **`bot_bottle/backend/docker/provision/git.py`** (new) or an extension of the ssh provisioner: render the `insteadOf` config and any extra `~/.gitconfig` plumbing.
- **`bot_bottle/backend/docker/backend.py`**: instantiate `DockerGitGate` alongside `DockerPipelockProxy` and `DockerSSHGate`; thread its `prepare` / `start` / `stop` through `resolve_plan` / `launch`.
- **`bot_bottle/backend/docker/launch.py`**: add gate start / stop to the `ExitStack` so the gate is up before any provisioner that writes the agent's `~/.gitconfig`.
- **`bot_bottle/backend/docker/bottle_plan.py`**: new `GitGatePlan` field on `DockerBottlePlan`; preflight rendering surfaces the gate sidecar (name, per-upstream local paths, upstream real URLs, which credential is in use).
- **Tests**: unit tests for `GitGate.prepare` and render shape; manifest validator tests for the new field and the no-shadow-route rule; an integration test in `tests/integration/` for the push-with-secret (rejected) and push-without-secret (forwarded) cases.

### Data model changes

`Bottle` grows an optional `git: list[GitEntry]` field. A `GitEntry` carries the upstream URL, the local name the gate exposes it as, and the credential the gate uses to push upstream (initial shape: `identity_file` + `known_host_key`, matching `bottle.ssh`).

### External dependencies

- `zricethezav/gitleaks` base image, pinned by digest. The base ships gitleaks + git; the gate Dockerfile adds `git-daemon` and `openssh-client` on top.
- No new Python packages.

## Future work

- **Smolmachines colocation.** The eventual smolmachines backend may pack pipelock + ssh-gate + git-gate into a single microVM, or split git-gate off because it holds push creds and the others don't. That decision belongs to the backend; the shared `BottleBackend` interface keeps sidecars independent so either packing is possible without touching this PRD's design.
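
As one concrete reading of the `GitEntry` shape from "Data model changes" and the no-shadow-route rule planned for `manifest.py`, here is a minimal sketch. Only `identity_file` and `known_host_key` are named in this PRD; the `upstream` and `name` field names, the `upstream_host` and `shadowed_hosts` helpers, and the choice to compare remotes by hostname (with `bottle.ssh` entries reduced to a list of hostnames) are assumptions for illustration, not the actual validator.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class GitEntry:
    """One `bottle.git` entry. Field names other than identity_file and
    known_host_key are hypothetical."""

    upstream: str        # real upstream URL, e.g. ssh://git@host/org/repo.git
    name: str            # local name the gate exposes the repo as
    identity_file: str   # credential the gate uses to push upstream
    known_host_key: str  # pinned host key for the upstream


def upstream_host(url: str) -> str:
    """Return the lowercased hostname of an upstream URL.

    Accepts URL form (ssh://git@host:port/path) and scp-like form
    (git@host:path). Anything else is rejected.
    """
    if "://" in url:
        host = urlsplit(url).hostname
        if not host:
            raise ValueError(f"no host in upstream URL: {url!r}")
        return host.lower()
    # scp-like syntax: [user@]host:path
    head, sep, _path = url.partition(":")
    if not sep or not head:
        raise ValueError(f"unrecognized upstream URL: {url!r}")
    return head.rpartition("@")[2].lower()


def shadowed_hosts(git_entries: list[GitEntry], ssh_hosts: list[str]) -> list[str]:
    """Hosts claimed by both bottle.git and bottle.ssh (assumed comparison key).

    A non-empty result means the manifest declares a shadow route and
    should be rejected.
    """
    git_hosts = {upstream_host(entry.upstream) for entry in git_entries}
    return sorted(git_hosts & {host.lower() for host in ssh_hosts})
```

A validator built on this would raise when `shadowed_hosts(...)` is non-empty, so each declared remote has exactly one path out of the bottle. Comparing by hostname is stricter than comparing full URLs; whether that is the right granularity is left to the real implementation.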
## Open questions

- Protocol on the agent → gate hop: SSH (`sshd` + `git-shell` inside the gate) or HTTP smart protocol (`git-http-backend` behind a tiny webserver)? SSH matches the existing ssh-gate patterns and the user's existing `~/.ssh` muscle memory; HTTP is lighter on image size and avoids an `authorized_keys` story. Default: SSH unless image size becomes a problem.
- Where gitleaks runs: pre-receive hook against a checkout of the incoming ref vs. a wrapper around `git-receive-pack` that inspects the pack file directly. Hook is canonical; defer the wrapper variant.
- Rejection signalling: gitleaks failures surface as a normal pre-receive reject (the user sees gitleaks's report on stderr). Worth a "redacted" mode that hides the matched bytes from the rejection message? Default: show file + line, hide the matched bytes.
- Credential reuse vs. duplication from `bottle.ssh`. If a user lists the same identity for ssh-gate (read) and git-gate (write), we can either reference by name or require two copies. Default: inline copies; revisit when it gets annoying.

## References

- PRD 0001: per-agent egress proxy via pipelock; the sidecar pattern this PRD reuses.
- PRD 0007: SSH egress gate; the L4 SSH forwarder this PRD sits alongside, explicitly *not* the place to add git-protocol awareness.
- `bot_bottle/ssh_gate.py` / `bot_bottle/pipelock.py`: existing sidecar abstractions to mirror.
- gitleaks: