PRD 0028: git-gate new-branch push scan scope #107

Merged
didericis-codex merged 3 commits from prd-0028-git-gate-new-branch-scan into main 2026-05-29 02:29:30 -04:00
3 changed files with 205 additions and 3 deletions
+10 -3
@@ -280,7 +280,14 @@ while IFS=' ' read -r old new ref; do
[ -z "$ref" ] && continue
[ "$new" = "$zero" ] && continue
if [ "$old" = "$zero" ]; then
-log_opts="$new"
+# New ref: scan only the commits this push introduces — those
+# reachable from $new but not from any ref the gate already has.
+# Everything already on the gate arrived via upstream mirror-fetch
+# or a previously gitleaks-scanned push, so it's already-upstream
+# or already-scanned; re-scanning it (the old `$new` full-ancestry
+# range) only resurfaces historical findings and blocks every new
+# branch. See PRD 0028 / issue #106.
+log_opts="$new --not --all"
else
log_opts="$old..$new"
fi
@@ -300,7 +307,7 @@ if [ ! -f "$hostsfile" ]; then
echo "git-gate: add KnownHostKey to the bottle.git entry and restart the bottle" >&2
exit 1
fi
-ssh_cmd="ssh -i $keyfile -o UserKnownHostsFile=$hostsfile -o StrictHostKeyChecking=yes -o IdentitiesOnly=yes"
+ssh_cmd="ssh -i $keyfile -o UserKnownHostsFile=$hostsfile -o StrictHostKeyChecking=yes -o IdentitiesOnly=yes -o BatchMode=yes -o ConnectTimeout=10"
while IFS=' ' read -r old new ref; do
[ -z "$ref" ] && continue
@@ -355,7 +362,7 @@ if [ -z "$keyfile" ] || [ ! -f "$hostsfile" ]; then
echo "git-gate: missing credentials for $repo_dir; refusing fetch" >&2
exit 1
fi
-ssh_cmd="ssh -i $keyfile -o UserKnownHostsFile=$hostsfile -o StrictHostKeyChecking=yes -o IdentitiesOnly=yes"
+ssh_cmd="ssh -i $keyfile -o UserKnownHostsFile=$hostsfile -o StrictHostKeyChecking=yes -o IdentitiesOnly=yes -o BatchMode=yes -o ConnectTimeout=10"
echo "git-gate: refreshing $repo_dir from upstream" >&2
if ! GIT_SSH_COMMAND="$ssh_cmd" git -C "$repo_dir" fetch origin --prune >&2; then
@@ -0,0 +1,170 @@
# PRD 0028: git-gate new-branch push scan scope
- **Status:** Active
- **Author:** didericis-claude
- **Created:** 2026-05-29
- **Issue:** #106
## Summary
git-gate's pre-receive hook scans the **entire ancestry** of a *new*
branch for secrets, so any pre-existing finding in repo history blocks
every new-branch push. Scope the scan to the commits a push actually
introduces (`$new --not --all`) so a push is gated on what *it* adds,
not on what's already on the gate/upstream. Also harden the forward
`ssh` against hangs. Net: new branches can be pushed through the gate
again, with no loss of leak-detection coverage.
## Problem
In `git_gate_render_hook()` (`bot_bottle/git_gate.py`) the pre-receive
hook chooses the gitleaks revision range per ref:
```sh
if [ "$old" = "$zero" ]; then
log_opts="$new" # new branch: `git log <new>` = the FULL ancestry
else
log_opts="$old..$new" # existing branch: just the pushed delta
fi
gitleaks git --log-opts="$log_opts" ... # exit 1 if ANY finding in range
```
For a **new** ref there is no `old` to diff against, so the hook passes
`$new`, which `git log` expands to every commit reachable from the new
tip. This repo's history contains 11 deliberately secret-shaped strings
(demo manifests, `docs/demo.tape`, and the pipelock/sandbox-escape
integration tests that exist *to exercise* the DLP). gitleaks reports
`438 commits scanned … leaks found: 11`, the hook `exit 1`s, and the
push is rejected. Confirmed live against issue #106's bottle: the
branch never lands in the bare repo and is never forwarded.
Consequence: **no new branch can ever be pushed through git-gate** as
long as a single historical finding exists — which is permanent.
Two adjacent problems surfaced while diagnosing #106:
- The rejection is **invisible to the client** — over the `git://` +
smolmachines forward it presented as a ~75s silent hang, not a
`remote: git-gate: gitleaks rejected …` message.
- The forward `ssh` lacks `BatchMode`/`ConnectTimeout`, so an
unreachable upstream or a prompt would hang the hook indefinitely.
(Not the cause of #106 — the forward itself works — but a latent
hang risk.)
## Goals / Success Criteria
- A new-branch push is scanned **only for the commits it introduces**
(reachable from `$new`, not from any ref the gate already has).
- A new branch that adds no new findings **pushes successfully**, even
though historical fixtures still trip a full-history scan.
- A new branch that *does* introduce a finding is **still rejected**.
- No reduction in leak coverage for the commits a push actually brings
to the upstream (see "Security analysis").
- Forward `ssh` fails fast (`BatchMode=yes` + `ConnectTimeout`) instead
of hanging on a prompt/unreachable upstream.
- Existing git-gate unit + integration tests stay green; new tests lock
the scoped-scan behaviour.
## Non-goals
- **Scrubbing the historical fixture findings.** They're intentional
test/demo inputs; scoping the scan resolves the practical problem
without rewriting history.
- **Relaxing the existing-branch path.** `$old..$new` already scans the
delta; this PRD only fixes the new-ref branch (and optionally unifies
on `--not --all`, see Open questions).
- **The client-visibility fix is investigation-gated.** Surfacing the
rejection over the `git://` + smolmachines path may need separate work
(sideband relay); tracked here but may land as a follow-up rather than
block the scan-scope fix.
## Design
### Scoped scan
Replace the new-ref range with one that excludes everything the gate
already knows:
```sh
if [ "$new" = "$zero" ]; then
continue # deletion: nothing to scan (unchanged)
elif [ "$old" = "$zero" ]; then
log_opts="$new --not --all" # new branch: only commits new to the gate
else
log_opts="$old..$new" # existing branch: the pushed delta
fi
```
`git log $new --not --all` = commits reachable from the pushed tip but
**not** reachable from any ref already in the gate's bare repo.
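
To make the range concrete, here is a minimal sketch in Python that models the commit graph as a parent map and computes the same set `git log $new --not --all` would list. The toy history, the commit names, and the helpers `reachable` and `commits_to_scan` are illustrative assumptions, not part of the hook.

```python
def reachable(parents, tips):
    """Return every commit reachable from any of `tips` (tips included)."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return seen


def commits_to_scan(parents, gate_refs, new_tip):
    """Model of `$new --not --all`: reachable from new_tip, minus anything
    reachable from a ref the gate already has."""
    return reachable(parents, [new_tip]) - reachable(parents, gate_refs.values())


# Toy history (hypothetical): a <- b <- c is on main; d and e belong to the
# new branch, which forked from b.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"], "e": ["d"]}
gate_refs = {"refs/heads/main": "c"}

full_ancestry = reachable(parents, ["e"])          # old behaviour: `$new`
scoped = commits_to_scan(parents, gate_refs, "e")  # new behaviour
print(sorted(full_ancestry))  # ['a', 'b', 'd', 'e']
print(sorted(scoped))         # ['d', 'e']
```

In this model a finding in `a` or `b` blocks the push under the old range but not under the scoped one, while a finding in `d` or `e` is scanned either way.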
### Security analysis (why excluding "already-on-the-gate" is safe)
Commits enter the gate's bare repo by exactly two paths:
1. **mirror-fetch from the upstream**: `origin` is added to the bare repo
with `git remote add --mirror=fetch`, and the access-hook fetches the
upstream before every upload-pack; and
2. **a push through the gate**, which is gitleaks-scanned before it is
forwarded.
Therefore every commit reachable from a gate ref is *already on the
upstream* or *was already scanned when pushed*. A commit excluded by
`--not --all` cannot be a new secret arriving at the upstream via this
push:
- if it's already upstream, re-scanning changes nothing — the content is
already there and blocking this branch wouldn't remove it; and
- if it arrived via an earlier push, it was already scanned.
The only commits that can carry a *new* secret upstream are the ones the
push introduces — exactly the set `$new --not --all` scans. An agent
cannot pre-seed a secret commit as "already known" to dodge the scan: it
can't write refs into the bare repo except by pushing (which scans), and
the mirror refs come only from the trusted upstream.
**Invariant this relies on:** the bare repo's refs are populated *only*
by upstream mirror-fetch and gitleaks-gated pushes. That holds in the
current design (nothing writes refs out-of-band); revisit if that
changes.
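
The argument above can be checked on a toy model. The sketch below is a simulation under stated assumptions, not the gate's code: `Gate`, `mirror_fetch`, and `push` are hypothetical names, commits are plain strings, and a "secret" is simply membership in a set. It shows that a finding already upstream does not block a clean new branch, while a finding introduced by the push is still rejected.

```python
class Gate:
    """Toy model of the gate's bare repo: refs plus a record of scanned commits."""

    def __init__(self, parents, secrets):
        self.parents = parents      # commit -> list of parent commits
        self.secrets = secrets      # commits a scanner would flag
        self.refs = {}              # ref name -> tip commit
        self.scanned = set()        # commits that passed a push-time scan

    def _reachable(self, tips):
        seen, stack = set(), list(tips)
        while stack:
            commit = stack.pop()
            if commit not in seen:
                seen.add(commit)
                stack.extend(self.parents.get(commit, ()))
        return seen

    def mirror_fetch(self, upstream_refs):
        # Path 1: refs copied from the trusted upstream, never scanned here.
        self.refs.update(upstream_refs)

    def push(self, ref, new_tip):
        # Path 2: scan only commits not reachable from refs the gate has.
        incoming = self._reachable([new_tip]) - self._reachable(self.refs.values())
        if incoming & self.secrets:
            return False            # rejected: the push itself adds a finding
        self.scanned |= incoming
        self.refs[ref] = new_tip
        return True


# Hypothetical history: b is a historical fixture finding, x is a new leak.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "x": ["c"]}
gate = Gate(parents, secrets={"b", "x"})
gate.mirror_fetch({"refs/heads/main": "c"})

clean_ok = gate.push("refs/heads/feature", "d")   # introduces only d
leaky_ok = gate.push("refs/heads/leaky", "x")     # introduces x, a flagged commit
print(clean_ok, leaky_ok)  # True False
```

The model only holds while `refs` is written by `mirror_fetch` and `push` alone, which is the same invariant the design relies on.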
### Forward ssh hardening
Add `-o BatchMode=yes -o ConnectTimeout=<n>` to the hook's `ssh_cmd` so
a prompt or unreachable upstream fails fast with a clear error instead
of hanging the receive-pack.
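
As a rough illustration of the hardened command line, the following Python sketch assembles the same options and quotes each argument. `build_ssh_cmd` and the example paths are hypothetical; the real hook builds this string in shell, and the default of 10 seconds mirrors the value used in the diff.

```python
import shlex


def build_ssh_cmd(keyfile, hostsfile, connect_timeout=10):
    """Build a non-interactive ssh command line with a bounded connect phase.

    Hypothetical helper for illustration only.
    """
    args = [
        "ssh",
        "-i", keyfile,
        "-o", f"UserKnownHostsFile={hostsfile}",
        "-o", "StrictHostKeyChecking=yes",
        "-o", "IdentitiesOnly=yes",
        "-o", "BatchMode=yes",                      # never prompt; fail instead
        "-o", f"ConnectTimeout={connect_timeout}",  # seconds allowed for connecting
    ]
    # shlex.join quotes each argument so paths with spaces survive the shell
    # that interprets GIT_SSH_COMMAND.
    return shlex.join(args)


# Example paths are made up; the first deliberately contains a space.
ssh_cmd = build_ssh_cmd("/run/keys/gate key", "/run/keys/known_hosts")
print(ssh_cmd)
```

Note that `ConnectTimeout` bounds the connection attempt; it is not a limit on how long an established session may run.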
## Implementation chunks
1. **PRD (this commit).**
2. **Hook scan scope + ssh hardening.** Edit `git_gate_render_hook()`:
the new-ref range → `$new --not --all`; add `BatchMode`/`ConnectTimeout`
to `ssh_cmd`. Unit tests in `test_git_gate.py` asserting the rendered
hook uses the scoped range for new refs and the hardened ssh flags.
3. **Integration coverage.** A new-branch push carrying no new finding
succeeds through a gate whose history contains a fixture finding; a
new-branch push that introduces a finding is still rejected.
4. **(Optional / follow-up) client visibility.** Make a gitleaks/forward
rejection reach the client as a `remote:` error over the git:// +
smolmachines path.
## Testing strategy
- **Unit (must):** rendered-hook assertions — new-ref uses
`$new --not --all`, existing-ref still `$old..$new`, deletion still
skipped; ssh_cmd carries `BatchMode=yes` + a `ConnectTimeout`.
- **Integration (should):** against a real gate seeded with a
fixture-bearing history, a clean new branch forwards to the upstream;
a new branch with a planted secret is rejected. Skips cleanly on hosts
that can't run the bundle (same shape as the existing git-gate
integration test).
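
The three unit cases can also be restated as a table-driven check. This is a sketch only: `choose_log_opts` is a hypothetical pure-Python mirror of the hook's branching, and `ZERO` assumes a SHA-1 repository where the all-zeros object id is 40 characters long.

```python
ZERO = "0" * 40  # assumed all-zeros object id meaning "no such ref"


def choose_log_opts(old, new):
    """Mirror the hook's per-ref branching; None means nothing is scanned."""
    if new == ZERO:
        return None                      # deletion: skipped
    if old == ZERO:
        return f"{new} --not --all"      # new ref: only commits new to the gate
    return f"{old}..{new}"               # existing ref: the pushed delta


# (old, new) -> expected range; the short ids are placeholders.
cases = [
    (("aaa", ZERO), None),
    ((ZERO, "bbb"), "bbb --not --all"),
    (("aaa", "bbb"), "aaa..bbb"),
]
for (old, new), expected in cases:
    assert choose_log_opts(old, new) == expected
```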
## Open questions
- **Unify both branches on `--not --all`?** It's also more robust than
`$old..$new` for non-fast-forward/force pushes (which can skip commits
off the direct path). Tempting to use it for the existing-ref case
too; deferred to keep this change tight, but worth a follow-up.
- **Client visibility mechanism.** Whether the silent hang is a git
daemon sideband-relay issue or specific to the smolmachines forward
needs a focused repro before committing to a fix.
+25
@@ -197,6 +197,24 @@ class TestHookRender(unittest.TestCase):
        # Stdin is buffered to a tempfile so both phases can re-read.
        self.assertIn("refs_file=$(mktemp)", hook)

    def test_new_ref_scan_scoped_to_incoming_commits(self):
        # A new branch (old=all-zeros) must scan only commits new to the
        # gate, not the full ancestry — otherwise historical findings
        # block every new-branch push (PRD 0028 / issue #106).
        hook = git_gate_render_hook()
        self.assertIn('log_opts="$new --not --all"', hook)
        # The old over-broad full-ancestry range must be gone.
        self.assertNotIn('log_opts="$new"', hook)
        # Existing-branch delta scan is unchanged.
        self.assertIn('log_opts="$old..$new"', hook)

    def test_forward_ssh_is_non_interactive_and_bounded(self):
        # No prompt (BatchMode) and a connect timeout, so an unreachable
        # upstream fails fast instead of hanging the receive-pack.
        hook = git_gate_render_hook()
        self.assertIn("BatchMode=yes", hook)
        self.assertIn("ConnectTimeout=", hook)


class TestAccessHookRender(unittest.TestCase):
    def test_access_hook_refreshes_origin_on_upload_pack(self):
@@ -216,6 +234,13 @@ class TestAccessHookRender(unittest.TestCase):
        self.assertIn("refusing to serve stale data", hook)
        self.assertIn("exit 1", hook)

    def test_access_hook_ssh_is_non_interactive_and_bounded(self):
        # Same hardening as the forward path: the fetch ssh must not
        # prompt and must time out rather than hang upload-pack.
        hook = git_gate_render_access_hook()
        self.assertIn("BatchMode=yes", hook)
        self.assertIn("ConnectTimeout=", hook)


class TestPrepare(unittest.TestCase):
    def setUp(self):