docs(research): survey gitleaks dashboards + add baseline-file primitive
@@ -13,86 +13,200 @@ wrong in the user-intent sense, and there is no way to say so.

## Summary

There is no dashboard for the git-gate today. The CLI ships
`init / list / info / start / edit / cleanup` for bottles; the gate is
visible only as a sidecar in `bottle_plan.py`'s preflight rendering.
No `gate` subcommand exists.

No off-the-shelf dashboard fits the shape claude-bottle needs
(per-bottle, host-local, integrated into a pre-receive rejection
with approval feeding back into the gate's own decision). Gitleaks
itself is a CLI with no UI and was declared **feature-complete** in
early 2026; the author's successor project **Betterleaks** is
explicitly "for the agentic era" but is also CLI-shaped and still
young. The closest open-source dashboard is **DefectDojo**, which
ingests gitleaks JSON but is post-hoc and org-scale — its "marked
as accepted" state does not feed back into the scanner. SaaS
dashboards (GitGuardian, TruffleHog Enterprise) ship repo content
to a vendor and were already disqualified by
`git-secret-scanning-hardening.md`.

There is also no exception mechanism. The pre-receive hook calls
`gitleaks git --log-opts="$range" --no-banner --redact` with no
config path and no allowlist surface. PRD 0008 explicitly rejects
exceptions ("Bypass for trusted commits. No `[skip gitleaks]`
trailer, no allowlist by commit hash. If the gate is bypassable it
isn't a gate.").

The git-gate ships no exception mechanism today: the pre-receive
hook calls `gitleaks git --log-opts="$range" --no-banner --redact`
with no `--config` and no `--baseline-path`, and PRD 0008
explicitly rejects exceptions ("Bypass for trusted commits. No
`[skip gitleaks]` trailer, no allowlist by commit hash. If the
gate is bypassable it isn't a gate.").

That non-goal is correct under its own framing — *any path the agent
can take* invalidates the gate — but it conflates two distinct
questions: can the *agent* bypass the gate (must be no), and can the
*user* approve a narrowly-scoped exception (could be yes, under
constraints). PRD 0012's recovery flow is exactly the seam where a
user-side, out-of-band approval could live without giving the agent
any in-band bypass.

That non-goal is correct against the *agent* but conflates two
questions: can the *agent* bypass the gate (must be no), and can
the *user* approve a narrowly-scoped exception out-of-band (could
be yes). PRD 0012's recovery flow is exactly the seam where the
user-side approval can live without giving the agent any in-band
bypass.

The design problem is therefore not "should there be exceptions" but
"how narrow does an exception have to be before the gate is still a
gate." This note surveys gitleaks's native allowlist primitives,
sketches three approval-scope designs, and recommends a direction.

Gitleaks does ship one native primitive that maps well to "approve
this specific finding" — the **baseline file** — which is
semantically a better fit for per-finding approval than the
allowlist config (a suppression *rule*). This note surveys the
dashboard landscape, the two native primitives (allowlist and
baseline), and recommends a direction.

## Question 1: Is there a dashboard / operator surface for git-gate?

## Question 1: Existing dashboards and control surfaces

No, in three senses:

### Inside claude-bottle today

- **No CLI subcommand.** `claude_bottle/cli/` has `_common, cleanup,
edit, info, init, list, start` and nothing gate-specific.
`claude-bottle list` shows bottles, not their gates' state or
recent rejections.
- **No gate-side log surface.** Rejections are written to the
pre-receive hook's stderr (`echo "git-gate: gitleaks rejected push
to $ref" >&2`); the agent sees the rejection in its `git push`
output, but nothing persists outside the container's logs.
- **No upstream UI for git-gate.** gitleaks itself is a CLI; it has
no built-in dashboard. The hosted secret-scanning UIs surveyed in
`git-secret-scanning-hardening.md` (ggshield, TruffleHog Enterprise)
are SaaS products that ship repo content to a vendor — explicitly
the wrong shape for a project whose premise is sandbox isolation.

`claude_bottle/cli/` has `_common, cleanup, edit, info, init, list,
start` — nothing gate-specific. The gate appears only as a sidecar
in `bottle_plan.py`'s preflight rendering. Rejections are written
to the pre-receive hook's stderr (`echo "git-gate: gitleaks
rejected push to $ref" >&2`) and surface only in the agent's
`git push` output — nothing persists outside the container's logs.

The PRD 0012 dashboard, when it exists, is the natural place for
git-gate operator surface to live: list pending change requests,
show recent rejections per bottle, render the diff of any
exception-approval request. There is no reason to build a separate
gate dashboard.

### Native gitleaks: CLI-only, and now feature-complete

Gitleaks has no built-in dashboard or web UI. As of early 2026 the
project has been declared **feature complete** — only security
patches will be merged going forward. The original maintainer
(Zachary Rice) has moved active work to Betterleaks (below), so
any dashboard built directly against gitleaks should treat the
gitleaks surface as frozen rather than evolving.

### Betterleaks: the same author's "agentic era" successor

Started February 2026 and explicitly framed for AI agents driving
the scanner: flag-based output for low-token-overhead consumption,
parallelized Git scanning, CEL-based filtering in place of the
TOML allowlist, and a roadmap that includes LLM-assisted
classification and automatic secret revocation via provider APIs.
Still CLI-shaped — no dashboard either.

Relevant to claude-bottle in two ways:

- The upstream direction of travel is *toward* agent-driven
scanners, which makes "the bottle invokes a scanner and reports
findings up" a supported pattern rather than a hack.
- CEL is a richer expression language for filter entries than
gitleaks's selector struct, which loosens the design space for
Option B (below). If claude-bottle ever swaps gitleaks for
Betterleaks, the approval-flow design should be expressible in
both.

### Output formats: SARIF + viewers

Both gitleaks and Betterleaks can emit SARIF. That plugs into
GitHub Advanced Security's Code Scanning tab (read-only viewer
with a dismiss-as-not-a-problem state) and assorted open-source
SARIF viewers (`sarif-web-component`, Microsoft's VS Code
extension). These render findings; they do not handle approval
state or feed back into the scanner. Useful for *seeing* findings;
not useful as the approval surface.

### Findings aggregators

[**DefectDojo**](https://defectdojo.com/integrations/gitleaks) is
the closest open-source thing to "a dashboard for gitleaks." It
ingests gitleaks JSON (and ~200 other scanners), aggregates and
deduplicates, lets you triage and mark findings as accepted or
false-positive in its UI, and tracks remediation state. Designed
for org-scale: one DefectDojo instance covers many repos and
scanners.

Shape mismatch for claude-bottle:

- DefectDojo's review state is *informational* — marking a finding
as accepted in DefectDojo does not write to gitleaks's allowlist
or baseline and does not change what the gate decides on the
next push.
- It expects findings as artifacts of CI runs, not as the
rejection-cause of an in-flight push.
- A single shared instance violates the one-sidecar-per-bottle
posture; per-bottle DefectDojo instances are absurd overhead.

Useful to know it exists, especially for long-term post-hoc
finding tracking. Not the v1 answer for the in-flight approval
flow PRD 0012 needs.

A separate [JupiterOne integration](https://github.com/gitleaks-findings/gitleaks)
exists but ships findings to JupiterOne's commercial platform and
has effectively zero public adoption (0 stars, 0 forks). Mentioned
only because its repo name suggests "the dashboard" and isn't.

### SaaS dashboards (disqualified by sandbox premise)

GitGuardian / ggshield and TruffleHog Enterprise both offer
incident-triage UIs with finding-level approval state. Both ship
repo content to a vendor; already disqualified in
`git-secret-scanning-hardening.md` for a project whose entire
premise is sandbox isolation.

### Bottom line

No off-the-shelf dashboard fits claude-bottle's shape: per-bottle,
host-local, integrated into a pre-receive rejection with the
approval feeding back into the gate's own decision-making. The
nearest open-source analogue (DefectDojo) is post-hoc and
org-scale; the nearest UX (GitGuardian) is SaaS. The PRD 0012
dashboard — sharing surface with the broader stuck-agent recovery
flow — remains the right place to build this.

## Question 2: How could specific commits be approved?

### What gitleaks gives you natively

Gitleaks's TOML config supports an `[allowlist]` block (or
`[[rules.allowlists]]` per-rule) with four selectors that can be
combined inside a single entry. The selectors observed in current
gitleaks (v8) are:

Two distinct primitives, and the distinction matters for designing
an approval flow.

**Allowlists** are *suppression rules* — config-level patterns that
say "ignore findings matching X." Gitleaks's TOML config supports
an `[allowlist]` block (or `[[rules.allowlists]]` per-rule) with
four selectors:

- `paths` — list of regex against file paths.
- `regexes` — list of regex matched against the *finding's* matched
bytes; on match, suppress the finding. `regexTarget` chooses
whether the regex applies to the matched bytes, the surrounding
line, or the secret group only.
- `stopwords` — substrings that, if present in the finding, suppress
it. Cheaper than `regexes` for literal matches.
- `regexes` — list of regex matched against the finding bytes;
`regexTarget` directs the regex at the extracted secret
(default), the entire regex match, or the whole line.
- `stopwords` — substrings that, if present, suppress the finding.
- `commits` — explicit commit SHAs to skip entirely.
- `condition` — `AND` (default) or `OR` across the above selectors,
letting an entry require, e.g., both a path match *and* a content
match before suppressing.

`commits` is the bluntest tool and the easiest to misuse: a single
SHA can hide arbitrary content. `paths + regexes` (with AND) is the
narrowest scope: a finding is only suppressed if it lives at a
specific path *and* matches a specific byte pattern. That's the
shape that makes a per-finding exception still defensible.

Selectors combine with `condition = "OR"` (default; suppress if any
selector matches) or `condition = "AND"` (suppress only if all
match). `commits` is the bluntest tool and the easiest to misuse:
a single SHA can hide arbitrary content. `paths + regexes` with
AND is the narrowest scope, and the form that makes a per-finding
exception still defensible.
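
The OR/AND combination described above can be pictured with a small Python model. This is a sketch only, not gitleaks's own implementation: `AllowlistEntry` and `suppresses` are hypothetical names, the path `docs/aws-auth.md` is made up for illustration, and `regexTarget` is simplified to "match against the secret".

```python
import re
from dataclasses import dataclass, field


@dataclass
class AllowlistEntry:
    """Toy model of one allowlist entry (hypothetical, not gitleaks's struct)."""
    paths: list[str] = field(default_factory=list)
    regexes: list[str] = field(default_factory=list)
    stopwords: list[str] = field(default_factory=list)
    commits: list[str] = field(default_factory=list)
    condition: str = "OR"  # the note describes OR as the default


def suppresses(entry: AllowlistEntry, *, path: str, secret: str, commit: str) -> bool:
    """Return True if this entry would suppress the finding in the toy model."""
    checks = []
    # Only selectors that are actually configured take part in the decision.
    if entry.paths:
        checks.append(any(re.search(p, path) for p in entry.paths))
    if entry.regexes:
        checks.append(any(re.search(r, secret) for r in entry.regexes))
    if entry.stopwords:
        checks.append(any(w in secret for w in entry.stopwords))
    if entry.commits:
        checks.append(commit in entry.commits)
    if not checks:
        return False
    return all(checks) if entry.condition == "AND" else any(checks)


# Narrow entry: one exact path AND one exact secret.
narrow = AllowlistEntry(
    paths=[r"^docs/aws-auth\.md$"],
    regexes=[r"^AKIAIOSFODNN7EXAMPLE$"],
    condition="AND",
)
# Blunt entry: anything in the named commit is suppressed.
blunt = AllowlistEntry(commits=["c1"])
```

In this model the narrow entry stops suppressing as soon as the bytes on that path change, while the blunt `commits` entry suppresses whatever the commit contains, which is the asymmetry the paragraph above relies on.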

The hook today does not pass `--config` to gitleaks. Adding it would
mean baking a config file into the gate image *or* mounting one in
at `start` time. The image is built per `DockerGitGate.start`, so
either is mechanically straightforward.

**Baselines** are a *known-findings list* — a JSON file of
previously detected findings that gitleaks's `IsNew` function
compares against on the next scan, so only new findings get
reported. The file is generated by saving a scan's JSON output and
fed back in via `--baseline-path`. The comparison checks RuleID,
description, file path, line numbers, secret content, commit, and
author/timestamp. When `--redact` is enabled, redacted Secret and
Match fields are ignored in the comparison so the baseline still
functions with redacted reports.
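
The comparison described above can be approximated in a few lines of Python. This is a sketch of the idea, not a port of gitleaks's `IsNew`: the function name `is_new` is hypothetical, the key names follow the gitleaks JSON report as best understood here, and the exact set of compared fields is an assumption taken from the prose.

```python
# Fields compared in every mode (assumed from the description above).
COMPARED_FIELDS = (
    "RuleID", "Description", "File", "StartLine", "EndLine",
    "Commit", "Author", "Date",
)
# Fields that are skipped when reports are redacted.
REDACTABLE_FIELDS = ("Secret", "Match")


def is_new(finding: dict, baseline: list[dict], redact: bool) -> bool:
    """Return True if no baseline entry matches the finding on the compared fields."""
    fields = COMPARED_FIELDS if redact else COMPARED_FIELDS + REDACTABLE_FIELDS
    for known in baseline:
        if all(finding.get(name) == known.get(name) for name in fields):
            return False  # already attested in the baseline
    return True
```

Under these assumptions a baseline written from a redacted report still matches a later scan of the same commit, but a finding that moves to a different line or a different commit counts as new again.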

Detection flow is: global allowlist → rule-specific allowlist →
baseline → reported finding. Allowlist suppressions therefore win
over baseline; baseline is the last gate before report.
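
As a rough illustration of that ordering, the stages can be modelled as predicates tried in sequence. The function `first_suppressor` and its callable parameters are hypothetical; the sketch only mirrors the order stated above and says nothing about how gitleaks structures this internally.

```python
from typing import Callable, Optional

Predicate = Callable[[dict], bool]


def first_suppressor(
    finding: dict,
    global_allow: Predicate,
    rule_allow: Predicate,
    in_baseline: Predicate,
) -> Optional[str]:
    """Return the name of the first stage that suppresses the finding, else None."""
    stages = (
        ("global allowlist", global_allow),
        ("rule allowlist", rule_allow),
        ("baseline", in_baseline),
    )
    for name, matches in stages:
        if matches(finding):
            return name
    return None  # nothing suppressed it, so the finding is reported
```

A finding that matches both an allowlist entry and a baseline entry is attributed to the allowlist here, which is what "allowlist suppressions win over baseline" means in practice.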

The hook today passes neither `--config` nor `--baseline-path`.
Wiring either in is mechanically straightforward: the gate image
is built per `DockerGitGate.start`, so the config / baseline can be
baked into the image *or* mounted in at start.
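
One way to picture the wiring is a helper that assembles the gitleaks command line and appends the two flags only when a path is supplied. This is a sketch under assumptions: `build_gitleaks_argv` is a hypothetical helper (the real hook is a shell script in `claude_bottle/git_gate.py`), and the `/gate/...` paths in the usage lines are invented mount points.

```python
from typing import Optional


def build_gitleaks_argv(
    log_range: str,
    config_path: Optional[str] = None,
    baseline_path: Optional[str] = None,
) -> list[str]:
    """Assemble the gitleaks invocation; optional flags are added only if set."""
    # Same flags the hook passes today.
    argv = ["gitleaks", "git", f"--log-opts={log_range}", "--no-banner", "--redact"]
    if config_path is not None:
        argv += ["--config", config_path]
    if baseline_path is not None:
        argv += ["--baseline-path", baseline_path]
    return argv


# Today's behaviour: neither flag.
today = build_gitleaks_argv("old..new")
# With a mounted baseline (hypothetical path).
with_baseline = build_gitleaks_argv("old..new", baseline_path="/gate/baseline.json")
```

Keeping both flags optional means a bottle with no approvals runs exactly the command the hook runs today.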

**Allowlist vs baseline for approval storage.** Both can express
"don't reject this finding," but they imply different things about
intent:

- An *allowlist* entry says "any future finding that matches this
pattern is fine." Generative: it covers findings that don't
exist yet on commits that haven't been made.
- A *baseline* entry says "this exact finding I've already seen is
fine." Specific: it pins to the bytes / location / rule of one
observed finding; a different finding on the same path on a
later commit re-triggers.

For a per-commit user approval, baseline is the better semantic
match: each approval is an attestation about one observed finding,
not a rule that pre-approves a pattern. Baseline entries can also
be diffed in PRs trivially (it's a JSON list) — they double as the
audit record.

### The design tension

@@ -141,40 +255,41 @@ specific placeholder like `<aws-access-key-id>`).
specifically about a real-looking token format, or the upstream
doc requires the literal pattern).

**Option B — Per-finding narrow allowlist via PRD 0012 flow.** When
the agent's push is rejected, the agent invokes
**Option B — Per-finding approval via PRD 0012 flow.** When the
agent's push is rejected, the agent invokes
`/request-gate-exception` (or `/request-bottle-change` with an
exception variant). The slash command POSTs to the cred-proxy
endpoint, carrying:
endpoint, carrying the gitleaks finding record (rule ID, file path,
line, redacted match) and a free-text justification ("docs example
for AWS auth flow").

- the file path that triggered the finding
- the finding's matched-byte hash (not the bytes themselves, to keep
the request artifact non-secret on its own)
- the gitleaks rule ID
- a free-text justification ("docs example for AWS auth flow")

The user reviews the request in the dashboard, sees the file and
the diff, and approves. The approval gets written into the gate's
**baseline file** — the JSON list of known-OK findings the gate
passes as `--baseline-path` to gitleaks. The gate restarts with
the new baseline.

The user reviews the request in the dashboard, sees the file and the
diff, and approves an entry of shape `{ paths: [<exact path>],
regexes: [<exact-match regex over matched bytes>], condition: AND }`.
The gate restarts with that config entry merged into its
`.gitleaks.toml`. A future commit on the same path with a *different*
finding still hits the gate and rejects.

- *Property:* approved exceptions are content-locked, not commit-
locked. Substituting bytes on the same path triggers a fresh
rejection.
- *Auditability:* the approval is a manifest diff; it lives in git
history and in the PR conversation thread per PRD 0012.
- *Open: TTL.* Should the entry expire? Plausible defaults: never
(it's content-locked anyway), or "until the next manifest version
bump." Lean "never" for v1; revisit if exception lists balloon.
- *Property:* approved findings are pinned to the specific
observed bytes / path / rule. A different secret on the same
path on a later commit re-triggers the gate.
- *Auditability:* baseline file is JSON in git history; each PR
approval becomes a diff to that file. The free-text
justification lives in the PR thread per PRD 0012.
- *Fallback to allowlist for canonical cases.* If a particular
fixture file should be permanently understood as "examples only,"
the user can promote a baseline entry to an `[allowlist]` rule
with `paths + regexes` AND — explicit generalization, opt-in by
the user, never by the agent.
- *Open: TTL.* Should baseline entries expire? Baseline is specific
by construction, so the case for expiration is weaker than for
allowlist. Lean "never" for v1; revisit if baselines balloon.
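
The "approval gets written into the baseline file" step might look roughly like the following. This is a sketch under assumptions: `approve_finding` is a hypothetical helper, the baseline is treated as a JSON array of finding records (the shape of a saved gitleaks JSON report), and restarting the gate is left out entirely.

```python
import json
from pathlib import Path


def approve_finding(baseline_path: Path, finding: dict) -> bool:
    """Append an approved finding to the baseline file.

    Returns True if the file changed, False if the finding was already present.
    """
    raw = baseline_path.read_text() if baseline_path.exists() else ""
    entries = json.loads(raw) if raw.strip() else []
    if finding in entries:
        return False  # idempotent: approving twice leaves the file untouched
    entries.append(finding)
    # Stable key order and indentation keep the git diff to one added record.
    baseline_path.write_text(json.dumps(entries, indent=2, sort_keys=True) + "\n")
    return True
```

Writing the file deterministically is a deliberate choice in this sketch: since the baseline doubles as the audit record, each approval should show up in review as a single appended JSON object rather than a reshuffled file.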

**Option C — Pre-flight scan with author signoff.** Run gitleaks
client-side inside the bottle (as a non-gating advisory check) so
the agent sees findings *before* attempting the push. The slash
command then includes the pre-known findings; the dashboard shows
the user the finding inline rather than having to go look at the
rejection log. On approval, same Option-B-style allowlist entry
rejection log. On approval, same Option-B-style baseline entry
gets added.

- *Property:* identical end-state to Option B; better UX because
@@ -188,23 +303,28 @@ gets added.
### Recommendation

Default to Option A as the canonical answer ("rewrite to use a
placeholder"). Build Option B as the PRD 0012 exception path, scoped
narrowly: `paths + regexes` with AND, no `commits` selector exposed
to the approval flow. Defer Option C to a follow-up; it's an
ergonomic win, not a security property.
placeholder"). Build Option B as the PRD 0012 exception path,
storing approvals in the gate's **baseline file** (not in an
allowlist rule). Baseline is the right primitive because each
approval is an attestation about one observed finding, not a
generative pattern. Allowlist promotion is a separate, user-
initiated escalation for cases that genuinely deserve patterning.
The `commits` selector is never exposed to the approval flow under
either path — it hides arbitrary content. Defer Option C to a
follow-up; it's an ergonomic win, not a security property.

This puts the answer to PRD 0012's open question as:

- Same recovery shape (`/request-bottle-change`), distinguishable
request type. The dashboard renders an exception request
differently from a manifest-change request because the *diff*
being approved is to the gate's allowlist, not to the manifest.
- Exceptions are expressed as `(path, content-pattern)` pairs, not
commit SHAs. Re-pushing different bytes on the same path
re-triggers the gate.
- The approval is recorded twice for audit: in the PR thread (free-
text), and as a versioned diff to the gate's allowlist config (or
the manifest field that materializes into it).
being approved is to the gate's baseline file, not to the
manifest.
- Exceptions are expressed as baseline-file entries — finding-
specific JSON records — not commit SHAs or regex patterns.
- The approval is recorded twice for audit: in the PR thread
(free-text justification), and as a versioned diff to the
baseline file (which is committed alongside the manifest).

## Cross-references

@@ -218,12 +338,32 @@ This puts the answer to PRD 0012's open question as:

## Sources

- [gitleaks configuration documentation](https://github.com/gitleaks/gitleaks#configuration)
— `[allowlist]` selectors (`paths`, `regexes`, `stopwords`,
`commits`, `regexTarget`, `condition`).
- [gitleaks repository](https://github.com/gitleaks/gitleaks) —
`[allowlist]` selectors (`paths`, `regexes`, `stopwords`,
`commits`, `regexTarget`, `condition`); also home of the
feature-complete notice.
- [Gitleaks allowlists & baselines (DeepWiki)](https://deepwiki.com/gitleaks/gitleaks/4.4-allowlists-and-baselines)
— detailed walk-through of the allowlist selector struct, the
baseline file format, the `IsNew` comparison logic, and the
global→rule→baseline detection order. Primary source for the
allowlist-vs-baseline distinction this note rests on.
- [Betterleaks (GitHub)](https://github.com/betterleaks/betterleaks)
— Zachary Rice's successor project; CEL filtering, agent-driven
output design, roadmap for LLM-assisted classification.
- [Help Net Security on Betterleaks](https://www.helpnetsecurity.com/2026/03/19/betterleaks-open-source-secrets-scanner/)
and [The New Stack](https://thenewstack.io/betterleaks-open-source-secret-scanner/)
— context on the "agentic era" framing and why gitleaks froze.
- [DefectDojo gitleaks parser](https://defectdojo.com/integrations/gitleaks)
— JSON ingest, finding triage UI, accept/false-positive state.
Open-source, generic, post-hoc; informational state only —
marking a finding as accepted does not feed back into the
scanner. Shape mismatch for in-flight per-bottle approval.
- [gitleaks-findings/gitleaks](https://github.com/gitleaks-findings/gitleaks)
— JupiterOne integration, not a dashboard. Listed because the
repo name is misleading.
- [AWS example access key (`AKIAIOSFODNN7EXAMPLE`)](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_iam-quotas.html)
— documented placeholder safe to use in examples without
triggering most secret scanners.
- `claude_bottle/git_gate.py` — pre-receive hook implementation
(`gitleaks git --log-opts="$log_opts" --no-banner --redact`, no
`--config` argument today).
- `claude_bottle/git_gate.py` — pre-receive hook implementation.
Today: `gitleaks git --log-opts="$log_opts" --no-banner
--redact`; no `--config`, no `--baseline-path`.