PRD 0008: Git gate

  • Status: Draft
  • Author: didericis
  • Created: 2026-05-12

Summary

Per-bottle sidecar that fronts the agent's git remotes as a transparent mirror. Push is gated: gitleaks scans incoming refs via a pre-receive hook, and only clean refs get forwarded to the real upstream. Fetch is mirrored: before serving any upload-pack request, the daemon's --access-hook runs git fetch origin --prune against the upstream, so an agent fetch returns whatever the upstream has now (fail-closed if the upstream is unreachable).

Upstream credentials live in the gate, not the agent. A misbehaving agent therefore cannot push a secret-bearing commit past the gate's scan, and cannot acquire push access by inspecting its own filesystem.

Problem

Today the agent holds its own SSH identity for each bottle.ssh entry and pushes straight at Gitea/GitHub, with ssh-gate doing dumb L4 forwarding. There is no boundary between "the agent thinks this commit is fine" and "the secret hits an external remote." If a compromised or careless agent stages a .env, slips a token into a fixture, or commits the BOT_BOTTLE_CLAUDE_OAUTH_TOKEN itself, git push ships it.

Client-side pre-commit / pre-push hooks are the usual defense, but they live on the agent's side of the trust boundary: an agent with shell access can git push --no-verify past them, edit .githooks/, or run git config core.hooksPath /dev/null. Anything the agent can disable is not a gate.

Goals / Success Criteria

Two integration tests, both with the gate as the only git path for a declared upstream:

  1. Push: drop a synthetic high-entropy secret into a commit, run git push from inside the agent, observe a non-zero exit and a gitleaks finding in the response. Repeat with a clean commit and observe exit 0 + the commit landing on the real upstream.
  2. Fetch: clone the upstream through the gate (git clone against the gate URL), observe the upstream's content. Push a new commit to the upstream out-of-band, refetch through the gate, observe the new commit. The gate must never serve stale data — every fetch refreshes from upstream first.

Non-goals

  • Pre-commit scanning. The gate is a pre-receive checkpoint only; it does not run on git commit, does not block local commits, and does not edit the agent's working tree.
  • Git-protocol awareness beyond what pre-receive already gives you. No bespoke pack inspection; gitleaks runs against the incoming ref(s) in a bare repo, full stop.
  • Per-user authentication on the agent → gate hop. The hop sits inside a single bottle on an --internal Docker network; only the bottle's agent can reach the gate. No additional ACLs.
  • Subsuming ssh-gate or pipelock. Non-git SSH (if any) keeps flowing through ssh-gate; HTTPS through pipelock. The git-gate is git-only.
  • Multi-tenant gate. One gate is provisioned per bottle, not shared across bottles (same one-sidecar-per-agent posture as pipelock / ssh-gate).
  • Smolmachines / microVM colocation policy. Whether the future smolmachines backend packs gates into one VM or runs them as separate VMs is a backend decision, not a manifest or design decision in this PRD. See "Future work."

Scope

In scope

  • Gate sidecar lifecycle. New GitGate + DockerGitGate, mirroring DockerSSHGate and DockerPipelockProxy in shape and network-attachment story.
  • Manifest field. bottle.git — a list of git remotes the bottle is allowed to talk to, each with the credential the gate uses to push upstream. The agent gets no parallel bottle.ssh entry for those upstreams. Each entry may also carry an ExtraHosts: { hostname: ip } map, surfaced to the gate as --add-host so the gate can resolve upstreams whose public DNS doesn't point at the reachable IP (e.g. Tailscale-only hosts). The agent-side insteadOf rewrite keys off the original hostname, so the manifest's Upstream URL stays human-readable.
  • Agent-side URL rewrite. Provisioner emits ~/.gitconfig with [url "<gate-url>"] insteadOf = <real-url> so every git operation against the declared upstream (push, fetch, clone, pull, ls-remote) transparently hits the gate.
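  • Pre-receive gitleaks hook. Baked into the gate image. On a hit, the hook exits non-zero and the push fails. If the scan is clean, the hook runs git push origin <new-sha>:<ref> for each incoming ref, using the gate-resident credential. While pre-receive runs, the gate's own ref still holds the old value, so the refspec must name the incoming object rather than <ref> itself.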
  • Access-hook upstream refresh. git daemon --access-hook runs git fetch origin --prune against the upstream before every upload-pack request, so a fetch through the gate is observably equivalent to a fetch against the real upstream. Failure to reach the upstream is fail-closed: the access hook exits non-zero and the agent's fetch fails.
  • Plan rendering / dry-run. bottle_plan.py and the y/N preflight surface the gate sidecar (name, listed upstreams, which credential it holds per upstream).
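
The ExtraHosts map described above could become gate container flags roughly as follows. This is a minimal sketch. It assumes the gate is created with plain docker CLI flags, using Docker's --add-host host:ip option. The function name add_host_args is hypothetical.

```python
import ipaddress


def add_host_args(extra_hosts: dict[str, str]) -> list[str]:
    """Render a bottle.git entry's ExtraHosts map as `docker create` flags.

    Hypothetical helper. Each {hostname: ip} pair becomes
    `--add-host hostname:ip`, which Docker writes into the gate container's
    /etc/hosts. Pairs are sorted so the rendered plan is stable across runs.
    """
    args: list[str] = []
    for hostname, ip in sorted(extra_hosts.items()):
        # Fail at validation time on anything that is not a literal IP;
        # ipaddress.ip_address raises ValueError for hostnames and typos.
        ipaddress.ip_address(ip)
        if not hostname or ":" in hostname or any(c.isspace() for c in hostname):
            raise ValueError(f"invalid ExtraHosts hostname: {hostname!r}")
        args += ["--add-host", f"{hostname}:{ip}"]
    return args
```

Only the gate needs these entries. After the insteadOf rewrite, the agent's git connects to the gate's URL, not to the upstream hostname.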

Out of scope

  • Push policy beyond gitleaks. No commit-author allowlist, no branch-name policy, no signed-commit enforcement. gitleaks is the single rule for v1.
  • Fetch caching / stale-while-revalidate. Every upload-pack refresh is a synchronous round-trip to the upstream; there is no TTL cache, no background refresh. If the upstream is slow, the agent's fetch is slow.
  • Quarantine / replay. A rejected push is discarded; we do not stash it for the user to inspect.
  • Non-Docker backends. Implementation lands for Docker only; the BottleBackend abstraction gains the hook but other backends are deferred.
  • Bypass for trusted commits. No [skip gitleaks] trailer, no allowlist by commit hash. If the gate is bypassable it isn't a gate.

Proposed Design

New services / components

Mirror the existing sidecar layout:

  • bot_bottle/git_gate.py (new): abstract GitGate + GitGatePlan dataclass. prepare runs host-side and has no Docker side effects: it renders the per-upstream config and stages the push credentials under stage_dir.
  • bot_bottle/backend/docker/git_gate.py (new): DockerGitGate concrete subclass. start does docker create on the internal network, copies in the bare-repo skeleton, the hook script, and per-upstream credentials, then docker start. stop is idempotent docker rm -f. Container name: bot-bottle-git-gate-<slug>.
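
Below is a sketch of the docker CLI sequence that DockerGitGate's start and stop could issue. It is written as argv builders so it can be unit-tested without a daemon. The class GitGateContainer, its fields, and the shape of the copy list are hypothetical. Only the container name pattern and the create / cp / start / rm -f order come from this PRD.

```python
import re
from dataclasses import dataclass

# Docker container names must match this pattern.
_CONTAINER_NAME = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_.-]+")


@dataclass(frozen=True)
class GitGateContainer:
    """Hypothetical helper: the docker CLI calls behind DockerGitGate."""

    slug: str     # per-bottle slug, as in bot-bottle-git-gate-<slug>
    network: str  # the bottle's --internal network (name is an assumption)
    image: str    # gate image, pinned by digest

    def __post_init__(self) -> None:
        if not _CONTAINER_NAME.fullmatch(self.name):
            raise ValueError(f"invalid container name: {self.name!r}")

    @property
    def name(self) -> str:
        return f"bot-bottle-git-gate-{self.slug}"

    def start_commands(self, copies: list[tuple[str, str]]) -> list[list[str]]:
        """create, then cp (repo skeleton, hook, credentials), then start."""
        cmds = [["docker", "create", "--name", self.name,
                 "--network", self.network, self.image]]
        # docker cp works on a created-but-not-started container, so the
        # daemon never runs without its hooks and credentials in place.
        cmds += [["docker", "cp", src, f"{self.name}:{dest}"] for src, dest in copies]
        cmds.append(["docker", "start", self.name])
        return cmds

    def stop_command(self) -> list[str]:
        # `rm -f` kills and removes in one step; the PRD uses it for stop.
        return ["docker", "rm", "-f", self.name]
```

Keeping create and start separate leaves a window in which credentials are copied into a container that is not yet running. That is the ordering the bullet above describes.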

Gate image: git-daemon + openssh-client over a zricethezav/gitleaks base (alpine + gitleaks), pinned by digest. For each declared upstream the gate hosts a bare repo at /git/<name>.git with remote.origin.url set to the real upstream (via git remote add --mirror=fetch), hooks/pre-receive wired to gitleaks-then-git push origin, and the bare repo's config carrying per-upstream credential paths.
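
For one upstream, the bare-repo setup described above might look like the command list below. This sketch makes several assumptions:

- The gate's push credential is an SSH key plus a known_hosts file written from known_host_key.
- The repo's config points at both through git's core.sshCommand.
- The helper bare_repo_setup and the /opt/git-gate/pre-receive path are hypothetical.

```python
import shlex


def bare_repo_setup(name: str, upstream: str, identity_file: str,
                    known_hosts_file: str) -> list[list[str]]:
    """Hypothetical: commands that build /git/<name>.git inside the gate."""
    repo = f"/git/{name}.git"
    # core.sshCommand is interpreted by the shell, so quote the paths.
    ssh_cmd = " ".join([
        "ssh", "-i", shlex.quote(identity_file),
        "-o", "IdentitiesOnly=yes",
        "-o", "UserKnownHostsFile=" + shlex.quote(known_hosts_file),
        "-o", "StrictHostKeyChecking=yes",
    ])
    return [
        ["git", "init", "--bare", repo],
        # --mirror=fetch maps the upstream's refs/* straight onto refs/*,
        # so `git fetch origin --prune` keeps the bare repo a full mirror.
        ["git", "-C", repo, "remote", "add", "--mirror=fetch", "origin", upstream],
        ["git", "-C", repo, "config", "core.sshCommand", ssh_cmd],
        # Hypothetical hook location inside the gate image.
        ["install", "-m", "0755", "/opt/git-gate/pre-receive", f"{repo}/hooks/pre-receive"],
        # git daemon only serves repos carrying this marker (or with --export-all).
        ["touch", f"{repo}/git-daemon-export-ok"],
    ]
```

Pinning StrictHostKeyChecking=yes against the staged host key means a gate that cannot verify the upstream refuses to connect. That matches the fail-closed posture of the fetch path.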

Inside the bottle, the agent's .gitconfig rewrites the real upstream URL to the gate's git:// URL via insteadOf. Every git operation against the declared upstream therefore hits the gate.
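
Here is a minimal sketch of the ~/.gitconfig the provisioner could emit. It assumes gate URLs of the form git://bot-bottle-git-gate-<slug>/<name>.git. That shape implies a daemon started with --base-path=/git, so that /<name>.git maps onto /git/<name>.git. Both the URL shape and render_gitconfig are hypothetical.

```python
# Characters that would need git-config quoting or escaping; rejecting them
# keeps the rendered file trivially valid.
_UNSAFE = set('"\\;#')


def render_gitconfig(rewrites: dict[str, str]) -> str:
    """Hypothetical: render insteadOf rules; maps real upstream URL -> gate URL."""
    out = []
    for real, gate in sorted(rewrites.items()):
        for url in (real, gate):
            if any(c in _UNSAFE or c.isspace() for c in url):
                raise ValueError(f"unsupported character in URL: {url!r}")
        # Any URL starting with `real` is rewritten to start with `gate`,
        # for push as well as fetch (no pushInsteadOf is set).
        out.append(f'[url "{gate}"]\n\tinsteadOf = {real}\n')
    return "".join(out)
```

git treats insteadOf as a prefix match and uses the longest matching rule. Each declared upstream therefore needs to be listed in exactly the form the agent's remotes use. A remote spelled differently, for example an https:// URL for the same repository, is not rewritten.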

For pushes, the pre-receive hook gitleaks-scans the incoming refs and, on clean, pushes each accepted ref to the real upstream using the credential the gate holds.
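
The pre-receive flow can be sketched in Python. The sketch rests on these assumptions:

- The hook is a Python script in the gate image.
- gitleaks is version 8.19 or later, providing its git subcommand with --log-opts and --redact.
- Deletions are forwarded unscanned.
- Non-fast-forward pushes are left to the upstream's own rules, since the refspec never carries a +.

All helper names are hypothetical.

```python
import os
import subprocess
import sys


def is_zero(sha: str) -> bool:
    # An all-zero object name means "ref absent" (SHA-1 or SHA-256 length).
    return set(sha) == {"0"}


def parse_updates(stdin_text: str) -> list[tuple[str, str, str]]:
    """pre-receive stdin: one '<old> <new> <ref>' line per updated ref."""
    updates = []
    for line in stdin_text.splitlines():
        if line.strip():
            old, new, ref = line.split(" ", 2)
            updates.append((old, new, ref))
    return updates


def log_opts(old: str, new: str) -> str:
    # New ref: scan commits not reachable from any existing ref.
    # Updated ref: scan only the commits this push adds.
    return f"{new} --not --all" if is_zero(old) else f"{old}..{new}"


def gitleaks_argv(old: str, new: str) -> list[str]:
    # Assumes gitleaks >= 8.19; it exits non-zero when it finds leaks.
    return ["gitleaks", "git", "--no-banner", "--redact",
            f"--log-opts={log_opts(old, new)}", "."]


def push_refspec(old: str, new: str, ref: str) -> str:
    # Forward the incoming object: the gate's own ref still holds the old
    # value while pre-receive runs. An all-zero new value is a deletion.
    return f":{ref}" if is_zero(new) else f"{new}:{ref}"


def main(stdin_text: str, run=subprocess.run) -> int:
    updates = parse_updates(stdin_text)
    for old, new, ref in updates:
        if is_zero(new):
            continue  # deletion: nothing new to scan (assumption: forward it)
        if run(gitleaks_argv(old, new)).returncode != 0:
            print(f"git-gate: gitleaks flagged {ref}; push rejected", file=sys.stderr)
            return 1
    if not updates:
        return 0
    specs = [push_refspec(old, new, ref) for old, new, ref in updates]
    return run(["git", "push", "origin", *specs]).returncode


# Hooks run with GIT_DIR set; do nothing when imported or run by hand.
if __name__ == "__main__" and "GIT_DIR" in os.environ:
    sys.exit(main(sys.stdin.read()))
```

The --redact flag keeps the matched bytes out of the rejection message while still reporting file and line. That matches the default proposed under Open questions.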

For fetches (clone, pull, fetch, ls-remote), git daemon's --access-hook=<path> runs git fetch origin --prune against the real upstream before the upload-pack service serves the client. The daemon invokes the hook for every service and passes the service name as its first argument, so the hook refreshes only for upload-pack and leaves pushes to the pre-receive path. The bare repo therefore reflects the upstream's current state at the moment the agent's fetch begins. If the upstream is unreachable, the access hook exits non-zero and the agent's fetch fails, the same observable behavior as if the agent were talking to the upstream directly.
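
The access hook described above could be as small as the Python below. It assumes the hook is a Python script in the gate image; access_hook is a hypothetical name. Two details come from git daemon's --access-hook contract: the argument order (service name, repository path, hostname, canonical hostname, IP, port) and the rule that a non-zero exit declines the request.

```python
import subprocess
import sys


def access_hook(argv: list[str], run=subprocess.run) -> int:
    """argv = [hook, service, path, host, canonical-host, ip, port]."""
    service, repo = argv[1], argv[2]
    if service != "upload-pack":
        return 0  # receive-pack is gated by pre-receive, not here
    # git daemon relays a declining hook's stdout line to the client, so keep
    # fetch output off stdout; git reports progress and errors on stderr.
    result = run(["git", "-C", repo, "fetch", "origin", "--prune"],
                 stdout=subprocess.DEVNULL)
    if result.returncode != 0:
        print("git-gate: upstream unreachable, refusing to serve a stale mirror")
        return 1
    return 0


# Only act when invoked by git daemon with service and path arguments.
if __name__ == "__main__" and len(sys.argv) >= 3:
    sys.exit(access_hook(sys.argv))
```

The daemon could then be started with something like git daemon --base-path=/git --enable=receive-pack --access-hook=/opt/git-gate/access-hook, where the hook path is hypothetical. git daemon disables receive-pack by default, so without --enable=receive-pack the gate would refuse every push.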

The agent never sees the upstream credential under either operation.

Existing code touched

  • bot_bottle/manifest.py: parse and validate the new bottle.git block; reject bottle.ssh entries whose upstream is also claimed by a bottle.git upstream (one path per remote, no shadow route).
  • bot_bottle/backend/docker/provision/git.py (new) or an extension of the ssh provisioner: render the insteadOf config and any extra ~/.gitconfig plumbing.
  • bot_bottle/backend/docker/backend.py: instantiate DockerGitGate alongside DockerPipelockProxy and DockerSSHGate; thread its prepare / start / stop through resolve_plan / launch.
  • bot_bottle/backend/docker/launch.py: add gate start / stop to the ExitStack so the gate is up before any provisioner that writes the agent's ~/.gitconfig.
  • bot_bottle/backend/docker/bottle_plan.py: new GitGatePlan field on DockerBottlePlan; preflight rendering surfaces the gate sidecar (name, per-upstream local paths, upstream real URLs, which credential is in use).
  • Tests: unit tests for GitGate.prepare and render shape; manifest validator tests for the new field and the no-shadow-route rule; integration tests in tests/integration/ covering push-with-secret (rejected), push-without-secret (forwarded), and the fetch-through-gate freshness case from Goals.
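
Here is a sketch of the no-shadow-route rule for manifest.py. It assumes each bottle.ssh entry can be reduced to the hostname ssh-gate forwards to, and it compares at host granularity. ssh-gate is a plain L4 forwarder, so it cannot tell one repository on a host from another. Any bottle.ssh route to a gated host would therefore be a way around the gate. Ports are ignored, which errs on the side of rejecting. upstream_host and check_no_shadow_routes are hypothetical names.

```python
import re
from urllib.parse import urlsplit

# scp-like syntax: [user@]host:path
_SCP_LIKE = re.compile(r"(?:[^@/:]+@)?([^:/]+):")


def upstream_host(url: str) -> str:
    """Hypothetical: the lower-cased hostname an upstream URL points at."""
    if "://" in url:
        host = urlsplit(url).hostname  # urllib already lower-cases this
    else:
        m = _SCP_LIKE.match(url)
        host = m.group(1).lower() if m else None
    if not host:
        raise ValueError(f"cannot determine host of upstream {url!r}")
    return host


def check_no_shadow_routes(git_upstreams: list[str], ssh_hosts: list[str]) -> None:
    """Reject bottle.ssh hosts that would give a second path to a gated upstream."""
    gated = {upstream_host(u) for u in git_upstreams}
    shadowed = sorted(gated & {h.lower() for h in ssh_hosts})
    if shadowed:
        raise ValueError(
            "bottle.ssh entries shadow bottle.git upstreams: " + ", ".join(shadowed))
```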

Data model changes

Bottle grows an optional git: list[GitEntry] field. A GitEntry carries four things: the upstream URL, the local name the gate exposes it as, the credential the gate uses to push upstream (initial shape: identity_file + known_host_key, matching bottle.ssh), and the optional ExtraHosts map described under Scope.
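
The data-model change might look like this in Python. Only Upstream and ExtraHosts are spelled out in this PRD. The Name, IdentityFile and KnownHostKey keys, GitEntry's field names, and parse_git_block are hypothetical. The sketch also assumes local names must be unique, since each one becomes /git/<name>.git in the gate.

```python
import re
from dataclasses import dataclass, field

# The local name becomes /git/<name>.git and part of the gate URL.
_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")
# Hypothetical manifest keys (only Upstream appears in the PRD text).
_REQUIRED = ("Upstream", "Name", "IdentityFile", "KnownHostKey")


@dataclass(frozen=True)
class GitEntry:
    upstream: str
    name: str
    identity_file: str
    known_host_key: str
    extra_hosts: dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_manifest(cls, raw: dict) -> "GitEntry":
        missing = [k for k in _REQUIRED if not raw.get(k)]
        if missing:
            raise ValueError(f"bottle.git entry missing: {', '.join(missing)}")
        if not _NAME.fullmatch(raw["Name"]):
            raise ValueError(f"invalid bottle.git Name: {raw['Name']!r}")
        extra = raw.get("ExtraHosts") or {}
        if not isinstance(extra, dict):
            raise ValueError("ExtraHosts must be a {hostname: ip} map")
        return cls(raw["Upstream"], raw["Name"], raw["IdentityFile"],
                   raw["KnownHostKey"], dict(extra))


def parse_git_block(entries: list[dict]) -> list[GitEntry]:
    """Hypothetical: parse bottle.git and reject duplicate local names."""
    parsed = [GitEntry.from_manifest(e) for e in entries]
    names = [e.name for e in parsed]
    dupes = sorted({n for n in names if names.count(n) > 1})
    if dupes:
        raise ValueError(f"duplicate bottle.git names: {', '.join(dupes)}")
    return parsed
```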

External dependencies

  • zricethezav/gitleaks base image, pinned by digest. The base ships gitleaks + git; the gate Dockerfile adds git-daemon and openssh-client on top.
  • No new Python packages.

Future work

  • Smolmachines colocation. The eventual smolmachines backend may pack pipelock + ssh-gate + git-gate into a single microVM, or split git-gate off because it holds push creds and the others don't. That decision belongs to the backend; the shared BottleBackend interface keeps sidecars independent so either packing is possible without touching this PRD's design.

Open questions

  • Protocol on the agent → gate hop: SSH (sshd + git-shell inside the gate) or HTTP smart protocol (git-http-backend behind a tiny webserver)? SSH matches the existing ssh-gate patterns and the user's existing ~/.ssh muscle memory; HTTP is lighter on image size and avoids an authorized_keys story. Default: SSH unless image size becomes a problem.
  • Where gitleaks runs: pre-receive hook against a checkout of the incoming ref vs. a wrapper around git-receive-pack that inspects the pack file directly. Hook is canonical; defer the wrapper variant.
  • Rejection signalling: gitleaks failures surface as a normal pre-receive reject (the user sees gitleaks's report on stderr). Worth a "redacted" mode that hides the matched bytes from the rejection message? Default: show file + line, hide the matched bytes.
  • Credential reuse vs. duplication from bottle.ssh. If a user lists the same identity for ssh-gate (read) and git-gate (write), we can either reference by name or require two copies. Default: inline copies; revisit when it gets annoying.

References

  • PRD 0001: per-agent egress proxy via pipelock — sidecar pattern this PRD reuses.
  • PRD 0007: SSH egress gate — the L4 SSH forwarder this PRD sits alongside; explicitly not the place to add git-protocol awareness.
  • bot_bottle/ssh_gate.py / bot_bottle/pipelock.py — existing sidecar abstractions to mirror.
  • gitleaks: https://github.com/gitleaks/gitleaks