bot-bottle/docs/research/git-gate-commit-approval.md
didericis a74dd2b97f
docs: research on git-gate commit approval; link from PRD 0012
2026-05-24 23:39:17 -04:00


# Approving specific commits past git-gate
Research into (1) whether a dashboard or operator surface for the
git-gate (a.k.a. "gitlock", PRD 0008) already exists, and (2) what a
narrowly-scoped approval flow for false-positive gitleaks rejections
could look like without compromising the gate's "if it's bypassable it
isn't a gate" property.
Motivated by PRD 0012's open question: when an agent commits docs
containing intentionally-bogus tokens that the secret scanner
correctly flags, the rejection is correct in the literal sense and
wrong in the user-intent sense, and there is no way to say so.
## Summary
There is no dashboard for the git-gate today. The CLI ships
`init / list / info / start / edit / cleanup` for bottles; the gate is
visible only as a sidecar in `bottle_plan.py`'s preflight rendering.
No `gate` subcommand exists.
There is also no exception mechanism. The pre-receive hook calls
`gitleaks git --log-opts="$range" --no-banner --redact` with no
config path and no allowlist surface. PRD 0008 explicitly rejects
exceptions ("Bypass for trusted commits. No `[skip gitleaks]`
trailer, no allowlist by commit hash. If the gate is bypassable it
isn't a gate.").
That non-goal is correct under its own framing — *any path the agent
can take* invalidates the gate — but it conflates two distinct
questions: can the *agent* bypass the gate (must be no), and can the
*user* approve a narrowly-scoped exception (could be yes, under
constraints). PRD 0012's recovery flow is exactly the seam where a
user-side, out-of-band approval could live without giving the agent
any in-band bypass.
The design problem is therefore not "should there be exceptions" but
"how narrow does an exception have to be before the gate is still a
gate." This note surveys gitleaks's native allowlist primitives,
sketches three approval-scope designs, and recommends a direction.
## Question 1: Is there a dashboard / operator surface for git-gate?
No, in three senses:
- **No CLI subcommand.** `claude_bottle/cli/` has `_common, cleanup,
edit, info, init, list, start` and nothing gate-specific.
`claude-bottle list` shows bottles, not their gates' state or
recent rejections.
- **No gate-side log surface.** Rejections are written to the
pre-receive hook's stderr (`echo "git-gate: gitleaks rejected push
to $ref" >&2`); the agent sees the rejection in its `git push`
output, but nothing persists outside the container's logs.
- **No upstream UI.** gitleaks itself is a CLI and has no built-in
dashboard. The hosted secret-scanning UIs surveyed in
`git-secret-scanning-hardening.md` (ggshield, TruffleHog Enterprise)
are SaaS products that ship repo content to a vendor, which is
explicitly the wrong shape for a project whose premise is sandbox
isolation.
The PRD 0012 dashboard, when it exists, is the natural place for
git-gate operator surface to live: list pending change requests,
show recent rejections per bottle, render the diff of any
exception-approval request. There is no reason to build a separate
gate dashboard.
## Question 2: How could specific commits be approved?
### What gitleaks gives you natively
Gitleaks's TOML config supports an `[allowlist]` block (or
`[[rules.allowlists]]` per-rule) with four selectors that can be
combined inside a single entry, plus two fields that control how they
are applied. As documented for current gitleaks (v8):
- `paths`: list of regexes matched against file paths.
- `regexes`: list of regexes matched against the finding; on match,
the finding is suppressed. `regexTarget` chooses whether the regex
applies to the extracted secret (the default), the full rule match,
or the surrounding line.
- `stopwords`: substrings that, if present in the extracted secret,
suppress the finding. Cheaper than `regexes` for literal matches.
- `commits`: explicit commit SHAs to skip entirely.
- `condition`: `OR` (the default) or `AND` across the above selectors.
`AND` lets an entry require, e.g., both a path match *and* a content
match before suppressing; it has to be set explicitly.
`commits` is the bluntest tool and the easiest to misuse: a single
SHA can hide arbitrary content. `paths + regexes` (with AND) is the
narrowest scope: a finding is only suppressed if it lives at a
specific path *and* matches a specific byte pattern. That's the
shape that makes a per-finding exception still defensible.
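To make the difference between the two combinators concrete, here is a minimal Python model of an allowlist entry. It is an illustration of the semantics described above, not gitleaks's implementation; `Finding` and `AllowlistEntry` are hypothetical names, and the condition is always passed explicitly rather than relying on a default.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Finding:
    path: str    # file the finding was reported in
    secret: str  # the flagged content


@dataclass
class AllowlistEntry:
    # Hypothetical model of one allowlist entry; not gitleaks's own type.
    paths: list[str] = field(default_factory=list)
    regexes: list[str] = field(default_factory=list)
    condition: str = "AND"

    def suppresses(self, finding: Finding) -> bool:
        # Collect one boolean per selector that is actually configured.
        checks = []
        if self.paths:
            checks.append(any(re.search(p, finding.path) for p in self.paths))
        if self.regexes:
            checks.append(any(re.search(r, finding.secret) for r in self.regexes))
        if not checks:
            return False  # an empty entry suppresses nothing
        return all(checks) if self.condition == "AND" else any(checks)


narrow = AllowlistEntry(
    paths=[r"^docs/auth\.md$"],
    regexes=[r"^AKIAIOSFODNN7EXAMPLE$"],
    condition="AND",
)
loose = AllowlistEntry(paths=narrow.paths, regexes=narrow.regexes, condition="OR")
```

Under `AND`, different bytes on the approved path are still reported; under `OR`, the path match alone is enough to hide them, which is why the combinator matters as much as the selectors.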
The hook today does not pass `--config` to gitleaks. Adding it would
mean baking a config file into the gate image *or* mounting one in
at `start` time. The image is built per `DockerGitGate.start`, so
either is mechanically straightforward.
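As a rough sketch of what the change to the invocation amounts to, the helper below builds the argument list quoted earlier and appends `--config` only when a config file is supplied. The function name and the example mount path are assumptions for illustration; the real hook is a shell script and may be structured differently.

```python
from pathlib import Path
from typing import Optional


def gitleaks_argv(log_opts: str, config_path: Optional[Path] = None) -> list[str]:
    """Build the gitleaks command line for one pushed range.

    Hypothetical helper: mirrors the flags the hook is described as
    using today, plus an optional --config.
    """
    argv = ["gitleaks", "git", f"--log-opts={log_opts}", "--no-banner", "--redact"]
    if config_path is not None:
        # Only pass --config when an allowlist file was baked in or mounted.
        argv += ["--config", str(config_path)]
    return argv


# Example: the mount path here is an assumption, not the project's layout.
example = gitleaks_argv("main..feature", Path("/etc/git-gate/gitleaks.toml"))
```

Keeping `--config` optional means a gate with no approved exceptions runs exactly the command it runs today.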
### The design tension
PRD 0008's "no bypass for trusted commits" non-goal is load-bearing
*against the agent*. It is not load-bearing against the user, who
already has every privilege the gate is trying to deny the agent.
The risk of letting the user approve exceptions is not direct (the
user can already do whatever they want); it is indirect:
- **Prompt-injection laundering.** An attacker who has captured the
agent's prompt-stream can ask the agent to *request* an exception
that looks plausible ("I just need to commit the test fixture for
the new auth flow"). If the user rubber-stamps the request, the
attacker has used the user as a bypass channel. This is the same
risk as any human-in-the-loop control: it degrades to "no control"
if the human always says yes.
- **Scope creep of a granted exception.** A commit-SHA allowlist
entry approved because of one finding covers every other finding in
that commit too, since it is tied to the commit rather than to the
flagged content. This is why `commits` alone is unsafe;
`paths + regexes` is the form that survives content-substitution.
- **Persistence past intent.** An exception granted "just for this
commit" that stays in the gate's config indefinitely is no longer
a per-commit exception; it's a permanent allowlist entry. Without
TTL or a clean teardown, exceptions accrete.
These three risks shape the design constraints below.
### Three design options
**Option A — Reject and rotate.** Treat every gitleaks hit as
"rewrite the commit to not contain the literal token, then re-push."
For docs with fake tokens, use a sentinel string the repo's
gitleaks config recognizes as obviously not a real secret (e.g.
`AKIAIOSFODNN7EXAMPLE`, AWS's documented example key, or a project-
specific placeholder like `<aws-access-key-id>`).
- *Cost:* zero. No new code.
- *Property:* gate stays unbypassable in both senses.
- *Friction:* every author must know the placeholder convention. The
first time someone pastes a realistic-looking fake into a doc,
they get rejected and have to redo the commit. Probably fine for
the host repo; less fine for bottles authoring third-party content.
- *Verdict:* this should be the *default*. The exception flow exists
only for cases where Option A genuinely fails (e.g. the example is
specifically about a real-looking token format, or the upstream
doc requires the literal pattern).
**Option B — Per-finding narrow allowlist via PRD 0012 flow.** When
the agent's push is rejected, the agent invokes
`/request-gate-exception` (or `/request-bottle-change` with an
exception variant). The slash command POSTs to the cred-proxy
endpoint, carrying:
- the file path that triggered the finding
- the finding's matched-byte hash (not the bytes themselves, to keep
the request artifact non-secret on its own)
- the gitleaks rule ID
- a free-text justification ("docs example for AWS auth flow")
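The four fields above could be assembled along these lines. This is a sketch under assumptions: the class name, field names, and the choice of SHA-256 are illustrative, and the wire format of the cred-proxy endpoint is not specified in this note.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GateExceptionRequest:
    # Hypothetical request shape; field names are illustrative.
    path: str
    match_sha256: str
    rule_id: str
    justification: str


def build_request(path: str, matched_bytes: bytes, rule_id: str,
                  justification: str) -> GateExceptionRequest:
    # Hash the matched bytes so the request itself carries no secret.
    digest = hashlib.sha256(matched_bytes).hexdigest()
    return GateExceptionRequest(path, digest, rule_id, justification)


def to_json(request: GateExceptionRequest) -> str:
    # Stable key order keeps the artifact diff-friendly for review.
    return json.dumps(asdict(request), sort_keys=True)


request = build_request(
    "docs/auth.md",
    b"AKIAIOSFODNN7EXAMPLE",
    "aws-access-token",
    "docs example for AWS auth flow",
)
body = to_json(request)
```

The serialized body contains the digest but never the matched bytes, so the request can be logged or attached to a PR thread without itself tripping the scanner.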
The user reviews the request in the dashboard, sees the file and the
diff, and approves an entry of shape `{ paths: [<exact path>],
regexes: [<exact-match regex over matched bytes>], condition: AND }`.
The gate restarts with that config entry merged into its
`.gitleaks.toml`. A future commit on the same path with a *different*
finding still hits the gate and rejects.
- *Property:* approved exceptions are content-locked, not commit-
locked. Substituting bytes on the same path triggers a fresh
rejection.
- *Auditability:* the approval is a manifest diff; it lives in git
history and in the PR conversation thread per PRD 0012.
- *Open: TTL.* Should the entry expire? Plausible defaults: never
(it's content-locked anyway), or "until the next manifest version
bump." Lean "never" for v1; revisit if exception lists balloon.
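One way the approved entry might be materialized into the gate's config is sketched below. The function is hypothetical. The table header is passed in because the right one depends on the gitleaks version and on whether the entry is global or per-rule. The sketch also assumes that Python's `re.escape` output is acceptable to gitleaks's Go regex engine for the inputs in question, and that `regexTarget` is left at its default.

```python
import json
import re


def render_allowlist_entry(table: str, path: str, secret: str,
                           description: str) -> str:
    """Render one content-locked allowlist entry as TOML text.

    Hypothetical helper. No `commits` key is ever emitted.
    """
    patterns = {
        # Anchor both patterns so only the exact path and bytes match.
        "paths": "^" + re.escape(path) + "$",
        "regexes": "^" + re.escape(secret) + "$",
    }
    for pattern in patterns.values():
        if "'''" in pattern:
            # Cannot be embedded in a TOML multi-line literal string.
            raise ValueError("pattern cannot be written as a TOML literal string")
    lines = [
        table,
        # json.dumps gives TOML-compatible quoting for simple ASCII text.
        f"description = {json.dumps(description)}",
        'condition = "AND"',  # set explicitly; never rely on the default
        f"paths = ['''{patterns['paths']}''']",
        f"regexes = ['''{patterns['regexes']}''']",
    ]
    return "\n".join(lines) + "\n"


entry = render_allowlist_entry(
    "[[allowlists]]", "docs/auth.md", "AKIAIOSFODNN7EXAMPLE", "docs example"
)
```

Because both patterns are anchored and escaped, changing a single byte of the flagged content, or moving it to another file, falls outside the entry and is rejected again.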
**Option C — Pre-flight scan with author signoff.** Run gitleaks
client-side inside the bottle (as a non-gating advisory check) so
the agent sees findings *before* attempting the push. The slash
command then includes the pre-known findings; the dashboard shows
the user the finding inline rather than having to go look at the
rejection log. On approval, same Option-B-style allowlist entry
gets added.
- *Property:* identical end-state to Option B; better UX because
the agent stops before the rejected push, not after.
- *Cost:* one more place that needs gitleaks installed (the bottle
image), and an in-bottle advisory check that the agent can in
principle ignore. That's fine because it's *advisory* — the gate
still rejects; the in-bottle check just avoids one round-trip.
- *Verdict:* nice-to-have over Option B, not a substitute.
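The advisory step could hand its findings to the slash command roughly as follows. The sketch assumes the in-bottle scan writes a gitleaks JSON report without `--redact` (so the secret field is meaningful) and that the report is a JSON array whose objects carry `File`, `RuleID`, `StartLine`, and `Secret` keys. The function name and output shape are hypothetical.

```python
import hashlib
import json


def summarize_findings(report_json: str) -> list[dict]:
    """Turn a gitleaks JSON report into non-secret request summaries."""
    findings = json.loads(report_json) or []  # tolerate an empty or null report
    summaries = []
    for finding in findings:
        secret = finding.get("Secret", "")
        summaries.append({
            "path": finding["File"],
            "rule_id": finding["RuleID"],
            "line": finding.get("StartLine"),
            # Only a digest leaves the bottle, never the flagged bytes.
            "secret_sha256": hashlib.sha256(secret.encode("utf-8")).hexdigest(),
        })
    return summaries


sample_report = json.dumps([{
    "File": "docs/auth.md",
    "RuleID": "aws-access-token",
    "StartLine": 12,
    "Secret": "AKIAIOSFODNN7EXAMPLE",
}])
summaries = summarize_findings(sample_report)
```

Nothing here changes what the gate accepts; the summaries only let the dashboard show the finding next to the request instead of sending the user to the rejection log.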
### Recommendation
Default to Option A as the canonical answer ("rewrite to use a
placeholder"). Build Option B as the PRD 0012 exception path, scoped
narrowly: `paths + regexes` with AND, no `commits` selector exposed
to the approval flow. Defer Option C to a follow-up; it's an
ergonomic win, not a security property.
This puts the answer to PRD 0012's open question as:
- Same recovery shape (`/request-bottle-change`), distinguishable
request type. The dashboard renders an exception request
differently from a manifest-change request because the *diff*
being approved is to the gate's allowlist, not to the manifest.
- Exceptions are expressed as `(path, content-pattern)` pairs, not
commit SHAs. Re-pushing different bytes on the same path
re-triggers the gate.
- The approval is recorded twice for audit: in the PR thread (free-
text), and as a versioned diff to the gate's allowlist config (or
the manifest field that materializes into it).
## Cross-references
- PRD 0008 — git-gate design and "no bypass" non-goal.
- PRD 0010 — cred-proxy; the inbound endpoint PRD 0012 reuses for
exception requests.
- PRD 0012 — stuck-agent recovery flow; the open question this note
informs.
- `docs/research/git-secret-scanning-hardening.md` — prior research
on the secret-scanning tool landscape and why gitleaks is the fit.
## Sources
- [gitleaks configuration documentation](https://github.com/gitleaks/gitleaks#configuration)
— `[allowlist]` selectors (`paths`, `regexes`, `stopwords`,
`commits`, `regexTarget`, `condition`).
- [AWS example access key (`AKIAIOSFODNN7EXAMPLE`)](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_iam-quotas.html)
— documented placeholder safe to use in examples without
triggering most secret scanners.
- `claude_bottle/git_gate.py` — pre-receive hook implementation
(`gitleaks git --log-opts="$log_opts" --no-banner --redact`, no
`--config` argument today).