PRD 0012: Stuck-agent recovery flow #18

Merged
didericis merged 13 commits from agent-unstuck into main 2026-05-25 04:19:52 -04:00
3 changed files with 608 additions and 0 deletions
@@ -0,0 +1,73 @@
# PRD 0012: Stuck-agent recovery flow
- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-24
## Summary
When an agent running inside a claude-bottle container gets blocked, it invokes one of three MCP tool calls — `cred-proxy-block`, `pipelock-block`, or `capability-block` — passing a *proposed* config change (modified `routes.json`, modified pipelock allowlist, or modified agent Dockerfile) plus text describing why the change is justified. The supervisor sees the proposal in a host-side TUI, approves / modifies / rejects it, and the corresponding remediation runs: SIGHUP-reload cred-proxy with the new routes; restart pipelock with the new allowlist; rebuild the bottle from the new Dockerfile on the same branch. The agent's tool call blocks until the operator acts. The supervisor never opens a live channel into a running bottle; all signal flow goes through a per-bottle MCP sidecar on the existing internal network.
This PRD is the overview. Implementation is split across four follow-on PRDs (0013–0016); see *Implementation chunks* below.
## Problem
Running parallel agents in isolated bottles makes it cheap to spin up work in parallel, but expensive to recover when an agent gets stuck. Today, if a bottle is missing a permission or a tool the agent needs to make progress, the only options are to kill the container and start over (losing work) or open a live channel into the bottle to fix it in place (breaking the sandbox property that makes bottles trustworthy in the first place). The user feels this directly whenever a parallel run blocks on something the manifest didn't anticipate.
## Goals / Success Criteria
A real stuck agent recovers end-to-end in each of the three categories. A **cred-proxy block** is fixed by the operator approving the agent's proposed `routes.json`, cred-proxy SIGHUP-reloading, and the tool returning "approved, retry now". A **pipelock block** is fixed by the operator approving the proposed allowlist, pipelock restarting, and the tool returning "approved, retry now". A **capability block** triggers a bottle rebuild from the proposed Dockerfile, with the replacement agent picking up on the same branch. All three complete without anyone running `docker attach` or opening any live channel into the original container.
## Non-goals
- Live attach or in-place mutation of running containers. The whole design exists to avoid this.
- Agent-to-agent communication. Re-stated from the project's existing non-goals; the recovery flow is human→agent only.
- Auditing or forensic replay of agent runs. Git/forge history is the audit log; this PRD does not add a separate run log.
- Reducing time-to-unstuck below some target. Faster than kill-and-restart is implicit, but no specific SLO is in scope.
## Stuck categories
Three named categories, each with its own MCP tool. Ordered by remediation cost:
- **cred-proxy block.** Tool: `cred-proxy-block`. The agent's request was refused by cred-proxy — missing route, expired token, wrong scope. The agent reads the current `routes.json` from `/etc/claude-bottle/current-config/`, composes a modified version, and calls the tool with `{routes: <new file>, justification: "..."}`. The operator reviews the diff in the TUI; on approval, the supervisor writes the new `routes.json` and cred-proxy SIGHUP-reloads. In-flight connections are not dropped. The tool returns `{status: "approved", notes: "..."}` and the agent retries. Implementation: PRD 0014.
- **pipelock block.** Tool: `pipelock-block`. The agent's outbound request was refused by pipelock — host not in the allowlist, protocol not permitted. The agent reads the current allowlist, composes a modified version, and calls the tool with `{allowlist: <new file>, justification: "..."}`. On approval, the supervisor writes the new allowlist and restarts pipelock; in-flight outbound calls may drop and rely on retry. The tool returns the same approve/reject shape. Implementation: PRD 0015.
- **capability block.** Tool: `capability-block`. The bottle is missing a tool, skill, permission, or env var the agent needs — something that lives in the agent Dockerfile rather than in routes or the pipelock allowlist. The agent reads the current Dockerfile, composes a modified version, and calls the tool with `{dockerfile: <new file>, justification: "..."}`. On approval, the rebuild orchestrator tears down the bottle, builds from the new Dockerfile, and starts a replacement bottle on the same branch via the state-preservation helper. Because the current agent is about to be replaced, the tool's return is best-effort — the replacement agent inherits the approval record via the preserved transcript. Implementation: PRD 0016.
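All three tools share the same contract: the agent's call blocks until the operator approves or rejects, then returns `{status, notes}`. The sketch below shows that contract as a small in-process queue. It assumes a threaded sidecar. `Proposal`, `ProposalQueue`, `submit`, `decide` and the `"timeout"` status are hypothetical names, not part of PRD 0013, and a real sidecar would sit behind an MCP transport rather than direct method calls.

```python
import threading
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Proposal:
    # Hypothetical record for one blocked-agent proposal.
    tool: str            # "cred-proxy-block", "pipelock-block", or "capability-block"
    proposed_file: str   # modified routes.json / pipelock allowlist / Dockerfile
    justification: str
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: Optional[str] = None   # set by the operator: "approved" or "rejected"
    notes: str = ""
    _done: threading.Event = field(default_factory=threading.Event, repr=False)


class ProposalQueue:
    """Pending proposals keyed by id; submit() blocks until decide() is called."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: dict[str, Proposal] = {}

    def submit(self, proposal: Proposal, timeout: Optional[float] = None) -> dict:
        with self._lock:
            self._pending[proposal.id] = proposal
        # The agent's tool call waits here until the operator acts (or timeout expires).
        if not proposal._done.wait(timeout):
            with self._lock:
                self._pending.pop(proposal.id, None)
            return {"status": "timeout", "notes": ""}
        return {"status": proposal.status, "notes": proposal.notes}

    def pending(self) -> list[Proposal]:
        # What the operator's TUI would list for review.
        with self._lock:
            return list(self._pending.values())

    def decide(self, proposal_id: str, approved: bool, notes: str = "") -> None:
        # Raises KeyError if the proposal is unknown or already timed out.
        with self._lock:
            proposal = self._pending.pop(proposal_id)
        proposal.status = "approved" if approved else "rejected"
        proposal.notes = notes
        proposal._done.set()  # wakes the blocked submit()
```

A sidecar that owns this queue keeps the pending call alive even if the agent process restarts. That is the connection-lifecycle argument made in the next section.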
The three tools are dispatched by name, so the operator's TUI knows which remediation engine to wire to which proposal. The agent must choose the right tool for what failed: a 403 from a credentialed request is a `cred-proxy-block`; a connection refused at the egress is a `pipelock-block`; a "command not found" or missing-skill error is a `capability-block`.
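The category rule above is effectively a small decision function. The sketch assumes the agent has already reduced the failure to a few coarse signals. The `Failure` fields and `choose_tool` are hypothetical, and real errors are messier (see the miscategorization caveat under *Open questions*). Missing-skill errors are left out because their message format isn't specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Failure:
    # Hypothetical, already-digested view of what went wrong.
    http_status: Optional[int] = None          # status on a credentialed request
    connection_refused_at_egress: bool = False
    stderr: str = ""


def choose_tool(f: Failure) -> Optional[str]:
    """Map a failure to one of the three MCP tool names, or None if unclassified."""
    if f.connection_refused_at_egress:
        return "pipelock-block"
    if f.http_status == 403:
        return "cred-proxy-block"
    if "command not found" in f.stderr.lower():
        return "capability-block"
    return None
```

Returning `None` for anything unrecognized leaves the agent to reason about the failure itself, so it is never forced into a category.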
## Why the MCP server is a sidecar, not in-container
The MCP server could in principle run inside the agent container. It doesn't, for reasons that are individually soft but together argue for the sidecar shape.
The security boundary is the operator approving in the TUI plus the host-side config write, neither of which depends on where the MCP server runs. Even if a compromised agent forged a proposal end-to-end, a human still has to approve it before any change to the cred-proxy routes, the pipelock allowlist, or the Dockerfile takes effect. So in-container isn't a hard violation of the bottle wall.
It's still the wrong placement for five reasons:
1. **Pattern consistency.** cred-proxy and pipelock are sidecars because boundary services don't live in the agent's process space. An in-container MCP server would be the first exception, and the next reader has to ask why this one is different.
2. **Audit metadata trust.** Sidecar-generated timestamps and bottle metadata are trusted; in-container metadata is whatever the agent wrote. Same outcome (the operator is still the gate) but weaker provenance.
3. **Connection lifecycle.** A sidecar holds the tool-call connection independently of the agent process — agent crash or restart doesn't orphan a pending operator response.
4. **Future enforcement headroom.** If the MCP server ever needs to *enforce* something (rate limits, dedup, schema-strict rejection), it has to be a trusted process. Building it in-container now means re-architecting later.
5. **Pipelock cleanliness.** Sidecar-on-internal-network is the same egress shape pipelock already permits. In-container would need a loopback exception in the allowlist.
## Implementation chunks
- **PRD 0013 — Supervise plane foundation.** MCP sidecar shell, three tool definitions, proposal queue, read-only current-config mount, minimal TUI, audit log format. After 0013, an operator can see proposals and approve/reject them but no remediation actually runs (the approval handlers are no-ops).
- **PRD 0014 — cred-proxy block remediation.** cred-proxy SIGHUP reload, host-side write on approval, `routes edit <bottle>` TUI verb, cred-proxy audit log filled in. First end-to-end useful category.
- **PRD 0015 — pipelock block remediation.** pipelock restart wiring, host-side write on approval, `pipelock edit <bottle>` TUI verb, pipelock audit log filled in. Same shape as 0014 for a different sidecar.
- **PRD 0016 — capability block remediation.** Rebuild orchestrator, state-preservation helper, `capability-block` end-to-end wiring, bottle-lifecycle changes for orchestrated teardown + rebuild. Heaviest chunk, lands last.
0013 is a hard prerequisite for 0014–0016. The other three can in principle ship in any order, but the recommended sequence is cheapest-blast-radius first (0014 → 0015 → 0016) so cheaper wins land while the rebuild path is being designed.
## Open questions
- **Text-only vs. structured tools.** An earlier draft of this PRD used a text-only protocol (`/supervise/notify` returning `{text}`); this revision uses three structured MCP tools that carry the agent's proposed file. **Structured wins on:** richer triage signal (operator sees the diff up front, not just a description of it), cleaner audit (the agent's proposed shape is captured alongside the operator's action), and the agent does diff-authoring work the operator would otherwise have to do. **Structured costs:** larger wire surface, the agent has to know the file formats (`routes.json` schema, Dockerfile syntax, pipelock allowlist format), miscategorization is possible (e.g. a 403 the agent reads as a `cred-proxy-block` might actually be a pipelock issue at a different layer). **Text-only wins on:** smallest possible protocol, no schema burden on the agent, easy to extend (every new category is just another reason in prose). **Text-only costs:** operator does all the diff authoring, audit log loses the agent's proposed shape, no opportunity for the agent's understanding of the fix to be inspected. Worth re-litigating if the MCP sidecar grows complex relative to the value it produces.
- **Tool-denial auto-detection.** Should v1 also ship a denial hook that auto-invokes one of the three tools without the agent's reasoning step, or strictly the agent-initiated form? Currently deferred; agent-initiated is safer (the agent has the most context about *why* it needed the call that was denied).
## References
- PRD 0010 — cred-proxy (gains SIGHUP reload of `routes.json` in 0014).
- PRD 0013 — supervise plane foundation.
- PRD 0014 — cred-proxy block remediation.
- PRD 0015 — pipelock block remediation.
- PRD 0016 — capability block remediation.
- `CLAUDE.md` — project non-goal on agent-to-agent communication; this PRD stays on the human→agent side of that line.
@@ -0,0 +1,96 @@
# Built-in Supervisor Design
## Question
Can claude-bottle grow a built-in supervisor — TUI inventory plus PR-feedback routing — without breaking the per-bottle isolation model, and without departing from the bash-first, low-dependency posture?
## Context
claude-bottle today is a fleet *executor*: `./cli.py start <agent>` brings up one bottle (agent container + pipelock + optional git-gate + optional cred-proxy on a per-bottle internal network), and `cli.py` tears it down when the session ends. There is no inventory view, no idle-detection, no automated reaction to PR or CI events. In parallel use, a human is the supervisor — opening one terminal per bottle, switching between them, and watching upstream PR state by hand.
A separate survey of the broader ecosystem ([agent control dashboards research, mid-2026](https://gitea.dideric.is/didericis/consilium-research/src/branch/main/developer-workflow/agent-control-dashboards-2026-05-24.md)) sorts dashboards into five tiers (session managers, parallel runners, Kanban boards, mission-control SPAs, observability backends). The earlier first-pass conclusion was that a full SPA tier conflicts with claude-bottle's isolation model. This doc reconsiders the smaller question: a TUI supervisor in the existing Python CLI.
## What I got wrong the first time
The earlier framing treated "add a supervisor" as synonymous with "adopt something Composio-AO-shaped" — a Next.js SPA with plugins, dashboards, and a long-running web server. On that framing, the answer is correctly "no, that's too heavy and breaks isolation."
But the framing collapses two different costs that aren't actually coupled:
1. The runtime cost of *each bottle* (already paid: container + 1–3 sidecars + 2 networks).
2. The runtime cost of a *supervisor* that watches and controls bottles.
A supervisor doesn't have to be heavy. A TUI built into the existing Python CLI, reading `docker ps` and host-side log files, is closer in spirit to `tmux-agent-status` than to Mission Control. The trust analysis below is what actually matters.
## Proposed design
Three layers, each independently useful, in order of ambition:
### 1. `./cli.py status` — read-only inventory
Reads `docker ps` filtered by a bottle label and tails each bottle's session log. Reports per bottle: name, agent, uptime, last-activity timestamp, token spend if available, associated PR/branch if recorded.
No new daemons. No new ports. No new credentials. ~100 lines.
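The inventory half of `status` can be sketched with nothing but `subprocess` and `json`. This sketch assumes bottles carry a Docker label. The key `claude-bottle.agent` is hypothetical, because the doc doesn't name the label. `docker ps --filter label=… --format '{{json .}}'` prints one JSON object per container, and its `Labels` field is rendered as a `k=v,k=v` string.

```python
import json
import subprocess

# Hypothetical label key; the real bottle label is not specified in this doc.
BOTTLE_LABEL = "claude-bottle.agent"


def parse_labels(raw: str) -> dict[str, str]:
    # Labels arrive as "k1=v1,k2=v2" (values containing commas would break this).
    labels = {}
    for pair in filter(None, raw.split(",")):
        key, _, value = pair.partition("=")
        labels[key] = value
    return labels


def parse_ps_output(text: str) -> list[dict[str, str]]:
    """Turn one-JSON-object-per-line `docker ps` output into status rows."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        c = json.loads(line)
        rows.append({
            "name": c["Names"],
            "agent": parse_labels(c.get("Labels", "")).get(BOTTLE_LABEL, "?"),
            "uptime": c["Status"],  # e.g. "Up 2 hours"
        })
    return rows


def list_bottles() -> list[dict[str, str]]:
    out = subprocess.run(
        ["docker", "ps", "--filter", f"label={BOTTLE_LABEL}", "--format", "{{json .}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ps_output(out)
```

Session-log tailing, token spend, and PR/branch lookup are omitted. Each would add another column to the same row dict.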
### 2. `./cli.py watch` — TUI over the same data
Same data as `status`, rendered with auto-refresh and keyboard shortcuts that shell out to the existing `cli.py attach / stop / start` commands.
Library choice: prefer the stdlib `curses` module to stay bash-first-adjacent; fall back to `rich` or `textual` only if the curses path proves painful. Both are pure-Python deps with modest transitive footprints (`textual` itself builds on `rich`), but they are still new deps and, per the project conventions, warrant a deliberate decision.
This is the Claude Squad / tmux-agent-status pattern, applied to bottles instead of tmux sessions. The whole category exists *because* a TUI is the lightweight shape: it needs none of the long-running web-server machinery the SPA tier depends on.
### 3. `./cli.py supervise` — PR feedback router
The optional, more ambitious layer. The bottle manifest gains an optional field:
```yaml
pr_watch:
  upstream: gitea.dideric.is/didericis/myproject
  branch: agent/task-42
```
`./cli.py supervise` polls the named upstream for new review comments and CI failures on `branch`. When one fires, it surfaces as a desktop notification or a flash in the TUI. The human decides what to do with the feedback — there is no autonomous loop that feeds the comment back into a bottle's next prompt (see "Where to be conservative" for why).
The polling token is a **host** token (the same `GH_PAT` / Gitea token the host already keeps in shell env), not a bottle credential. The supervisor never holds bottle secrets.
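The notify-only rule shapes the poll loop: each new event goes to a human-facing `notify` callback, and nothing is written toward a bottle. The sketch below is minimal. `PrEvent`, `poll_once` and the event kinds are hypothetical. Fetching from the Gitea/GitHub API is injected as `fetch` and deliberately not shown.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class PrEvent:
    # Hypothetical normalized event; how it is fetched from the forge is out of scope.
    id: str       # e.g. review-comment id or CI run id
    kind: str     # "review_comment" or "ci_failure"
    summary: str


def poll_once(
    fetch: Callable[[], Iterable[PrEvent]],
    seen: set[str],
    notify: Callable[[PrEvent], None],
) -> int:
    """Surface each not-yet-seen event to the human; never forward it into a bottle."""
    count = 0
    for event in fetch():
        if event.id in seen:
            continue
        seen.add(event.id)
        notify(event)  # desktop notification or TUI flash; the human decides what next
        count += 1
    return count
```

The `seen` set is the kind of bookkeeping that would persist in the host-side state file, never inside a bottle.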
## Why this doesn't break the trust model
The load-bearing question is whether the supervisor introduces the privileged-channel-into-every-bottle problem that disqualifies the SPA tier. It does not, for four reasons:
| Concern | Mitigation |
|---|---|
| Reaching into running bottles | Supervisor reads `docker ps` and host-side log files. The host already sees both — Docker is the trust boundary, the supervisor is on the host side of it. |
| Holding bottle credentials | The polling token is a host token. The supervisor never receives `bottle.cred_proxy.routes` entries; it has no path to them. |
| Bridging between bottles | The supervisor does not relay state from bottle A to bottle B. It relays *upstream PR state* to the human operator (notify-only, never into a bottle's prompt), and only for bottles whose manifest opts in via `pr_watch`. |
| New attack surface | All "control" actions go through `./cli.py start <agent>`, which already enforces the manifest. The supervisor is an automated caller of the existing CLI, not a parallel control plane. |
The boundary stays at the bottle wall. The supervisor looks outward at git/PR state and downward at Docker; it does not look *inward* through pipelock.
This also doesn't conflict with the "lean on git history for auditing" non-goal. The supervisor is using git/PR state as the *input* to its loop, not constructing a separate audit log. Git history remains the source of truth for what happened.
## Where to be conservative
A few design defaults worth holding:
- **No auto-respawn.** The supervisor surfaces PR feedback to a human, never to the bottle's next prompt. The autonomous flow (review-comment → tear down → relaunch with the comment prepended) was considered and rejected: in a public-ish repo, any commenter could inject content that the next launch would treat as system instructions, with the agent's full bottle privileges. Available mitigations — commenter allowlists, prompt-injection regex screens, private-repo defaults — are all soft. The load-bearing defense is to keep the human between the review comment and any agent prompt. Notify-only is the only mode.
- **Idle detection is harder than it looks.** Last-log-line-age works ~80% of the time. Codeman's Ralph Loop tracker (watching for `<promise>` tags) is more accurate but adds complexity and tooling-coupling. Start with the dumb version; add heuristics only when actual confusion arises.
- **No web UI.** A browser UI reintroduces the privileged-channel problem: the browser talks to a server that talks to all bottles. A TUI sidesteps it because the supervisor runs in the user's own shell context, not as a long-running daemon serving multiple consumers.
- **State file in `~/.claude-bottle/`, not inside any bottle.** The mapping of bottle → PR → status lives next to the manifest. Nothing about the supervisor's bookkeeping enters a bottle.
- **No new credentials on bottles.** PR-watch is a host-side concern. A bottle's manifest *names* the upstream/branch to watch; it does not grant the bottle the ability to read PR state itself.
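The "dumb version" of idle detection from the list above is a log-age check. This sketch assumes the session log's modification time stands in for the timestamp of its last line. `is_idle` and the 300-second `IDLE_AFTER_SECONDS` threshold are hypothetical, since the doc names neither.

```python
import os
import time
from typing import Optional

# Hypothetical threshold; the doc does not specify one.
IDLE_AFTER_SECONDS = 300


def is_idle(log_path: str, now: Optional[float] = None,
            idle_after: float = IDLE_AFTER_SECONDS) -> bool:
    """Idle if the session log hasn't been written for longer than idle_after."""
    now = time.time() if now is None else now
    try:
        last_write = os.path.getmtime(log_path)  # proxy for "last log line" time
    except FileNotFoundError:
        return False  # no log yet: treat as starting up, not idle
    return now - last_write > idle_after
```

A long-running tool call with no log output will read as idle under this check. That is the kind of false positive that might eventually justify smarter heuristics.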
## Trust-model edge cases worth flagging
- **Cross-host supervisor.** If the supervisor ever grows to coordinate bottles on multiple hosts, the trust analysis changes — the polling token now has to travel, and the "host" boundary is no longer one machine. Out of scope for v1; flag in any future design doc that contemplates it.
- **Native Claude Code OpenTelemetry as an alternative observability path.** Setting `CLAUDE_CODE_ENABLE_TELEMETRY=1` inside a bottle would emit OTel data — but the bottle's pipelock allowlist must then include the OTel collector's host, and the trace data is generated by the (untrusted) agent rather than by the (trusted) host-side perimeter. The supervisor-via-Docker-plus-pipelock-logs path is the cleaner observability spine for this project's threat model; in-bottle OTel is an extra signal worth weighing case-by-case, not the default.
## Scope estimate
The full `status` / `watch` / `supervise` trio is plausibly ~500 lines of Python on top of the existing CLI, no new runtimes, no new daemons, no new ports, and (with `curses`) no new deps. That fits "Low dependencies by default. The project is bash-first; ask before adding new tools, runtimes, or package managers" without requiring an exception.
Phased: `status` first (purely additive, no design decisions), then `watch` (the design decisions are mostly UX, not architecture), then `supervise` (the only layer that introduces a new behavioral default and warrants a PRD of its own).
## Conclusion
A supervisor that respects the bottle wall is a small natural extension of what claude-bottle already is, not a category shift toward Mission Control / Codeman / Composio AO. The mistake in earlier framing was treating "supervisor" as synonymous with "dashboard SPA." The trust-model question that disqualifies the SPA tier (privileged channel into every bottle) does not apply to a TUI that reads host-side signals and shells out to the existing CLI.
Recommendation: build `status` and `watch` opportunistically when the pain is felt; treat `supervise` as a separate PRD before implementation, scoped to notify-only (no autonomous loop from review comment to next agent prompt — see "Where to be conservative").
@@ -0,0 +1,439 @@
# Approving specific commits past git-gate
Research into (1) whether a dashboard or operator surface for the
git-gate (a.k.a. "gitlock", PRD 0008) already exists, and (2) what a
narrowly-scoped approval flow for false-positive gitleaks rejections
could look like without compromising the gate's "if it's bypassable it
isn't a gate" property.
Motivated by PRD 0012's open question: when an agent commits docs
containing intentionally-bogus tokens that the secret scanner
correctly flags, the rejection is correct in the literal sense and
wrong in the user-intent sense, and there is no way to say so.
## Summary
No off-the-shelf dashboard fits the shape claude-bottle needs
(per-bottle, host-local, integrated into a pre-receive rejection
with approval feeding back into the gate's own decision). Gitleaks
itself is a CLI with no UI and was declared **feature-complete** in
early 2026; the author's successor project **Betterleaks** is
explicitly "for the agentic era" but is also CLI-shaped and still
young. The closest open-source dashboard is **DefectDojo**, which
ingests gitleaks JSON but is post-hoc and org-scale — its "marked
as accepted" state does not feed back into the scanner. SaaS
dashboards (GitGuardian, TruffleHog Enterprise) ship repo content
to a vendor and were already disqualified by
`git-secret-scanning-hardening.md`.
The git-gate ships no exception mechanism today: the pre-receive
hook calls `gitleaks git --log-opts="$range" --no-banner --redact`
with no `--config` and no `--baseline-path`, and PRD 0008
explicitly rejects exceptions ("Bypass for trusted commits. No
`[skip gitleaks]` trailer, no allowlist by commit hash. If the
gate is bypassable it isn't a gate.").
That non-goal is correct against the *agent* but conflates two
questions: can the *agent* bypass the gate (must be no), and can
the *user* approve a narrowly-scoped exception out-of-band (could
be yes). PRD 0012's recovery flow is exactly the seam where the
user-side approval can live without giving the agent any in-band
bypass.
Gitleaks does ship one native primitive that maps well to "approve
this specific finding" — the **baseline file** — which is
semantically a better fit for per-finding approval than the
allowlist config (a suppression *rule*). This note surveys the
dashboard landscape, the two native primitives (allowlist and
baseline), and recommends a direction.
## Question 1: Existing dashboards and control surfaces
### Inside claude-bottle today
`claude_bottle/cli/` has `_common, cleanup, edit, info, init, list,
start` — nothing gate-specific. The gate appears only as a sidecar
in `bottle_plan.py`'s preflight rendering. Rejections are written
to the pre-receive hook's stderr (`echo "git-gate: gitleaks
rejected push to $ref" >&2`) and surface only in the agent's
`git push` output — nothing persists outside the container's logs.
### Native gitleaks: CLI-only, and now feature-complete
Gitleaks has no built-in dashboard or web UI. As of early 2026 the
project has been declared **feature complete** — only security
patches will be merged going forward. The original maintainer
(Zachary Rice) has moved active work to Betterleaks (below), so
any dashboard built directly against gitleaks should treat the
gitleaks surface as frozen rather than evolving.
### Betterleaks: the same author's "agentic era" successor
Started February 2026 and explicitly framed for AI agents driving
the scanner: flag-based output for low-token-overhead consumption,
parallelized Git scanning, CEL-based filtering in place of the
TOML allowlist, and a roadmap that includes LLM-assisted
classification and automatic secret revocation via provider APIs.
Still CLI-shaped — no dashboard either.
Relevant to claude-bottle in two ways:
- The upstream direction of travel is *toward* agent-driven
scanners, which makes "the bottle invokes a scanner and reports
findings up" a supported pattern rather than a hack.
- CEL is a richer expression language for filter entries than
gitleaks's selector struct, which loosens the design space for
Option B (below). If claude-bottle ever swaps gitleaks for
Betterleaks, the approval-flow design should be expressible in
both.
### Output formats: SARIF + viewers
Both gitleaks and Betterleaks can emit SARIF. That plugs into
GitHub Advanced Security's Code Scanning tab (read-only viewer
with a dismiss-as-not-a-problem state) and assorted open-source
SARIF viewers (`sarif-web-component`, Microsoft's VS Code
extension). These render findings; they do not handle approval
state or feed back into the scanner. Useful for *seeing* findings;
not useful as the approval surface.
### Findings aggregators
[**DefectDojo**](https://defectdojo.com/integrations/gitleaks) is
the closest open-source thing to "a dashboard for gitleaks." It
ingests gitleaks JSON (and ~200 other scanners), aggregates and
deduplicates, lets you triage and mark findings as accepted or
false-positive in its UI, and tracks remediation state. Designed
for org-scale: one DefectDojo instance covers many repos and
scanners.
Shape mismatch for claude-bottle:
- DefectDojo's review state is *informational* — marking a finding
as accepted in DefectDojo does not write to gitleaks's allowlist
or baseline and does not change what the gate decides on the
next push.
- It expects findings as artifacts of CI runs, not as the
rejection-cause of an in-flight push.
- A single shared instance violates the one-sidecar-per-bottle
posture; per-bottle DefectDojo instances are absurd overhead.
Useful to know it exists, especially for long-term post-hoc
finding tracking. Not the v1 answer for the in-flight approval
flow PRD 0012 needs.
A separate [JupiterOne integration](https://github.com/gitleaks-findings/gitleaks)
exists but ships findings to JupiterOne's commercial platform and
has effectively zero public adoption (0 stars, 0 forks). Mentioned
only because its repo name suggests "the dashboard" and isn't.
### SaaS dashboards (disqualified by sandbox premise)
GitGuardian / ggshield and TruffleHog Enterprise both offer
incident-triage UIs with finding-level approval state. Both ship
repo content to a vendor; already disqualified in
`git-secret-scanning-hardening.md` for a project whose entire
premise is sandbox isolation.
### Bottom line
No off-the-shelf dashboard fits claude-bottle's shape: per-bottle,
host-local, integrated into a pre-receive rejection with the
approval feeding back into the gate's own decision-making. The
nearest open-source analogue (DefectDojo) is post-hoc and
org-scale; the nearest UX (GitGuardian) is SaaS. The PRD 0012
dashboard — sharing surface with the broader stuck-agent recovery
flow — remains the right place to build this.
## Question 2: How could specific commits be approved?
### What gitleaks gives you natively
Two distinct primitives, and the distinction matters for designing
an approval flow.
**Allowlists** are *suppression rules* — config-level patterns that
say "ignore findings matching X." Gitleaks's TOML config supports
an `[allowlist]` block (or `[[rules.allowlists]]` per-rule) with
four selectors:
- `paths` — list of regex against file paths.
- `regexes` — list of regex matched against the finding bytes;
`regexTarget` directs the regex at the extracted secret
(default), the entire regex match, or the whole line.
- `stopwords` — substrings that, if present, suppress the finding.
- `commits` — explicit commit SHAs to skip entirely.
Selectors combine with `condition = "OR"` (default; suppress if any
selector matches) or `condition = "AND"` (suppress only if all
match). `commits` is the bluntest tool and the easiest to misuse:
a single SHA can hide arbitrary content. `paths + regexes` with
AND is the narrowest scope, and the form that makes a per-finding
exception still defensible.
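The OR/AND behaviour, and why `paths + regexes` with AND is the narrow form, can be modelled in a few lines. This is a simplified model of the selector semantics described above, not gitleaks's code. Only configured selectors participate, stopwords are checked against the extracted secret, and the `match` regex target is left out. `Finding`, `Allowlist` and `suppressed` are hypothetical names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Finding:
    rule_id: str
    file: str
    secret: str
    line: str
    commit: str


@dataclass
class Allowlist:
    paths: list[str] = field(default_factory=list)     # regexes against the file path
    regexes: list[str] = field(default_factory=list)   # regexes against regex_target
    regex_target: str = "secret"                       # "secret" or "line" in this sketch
    stopwords: list[str] = field(default_factory=list)
    commits: list[str] = field(default_factory=list)
    condition: str = "OR"                              # "OR" or "AND"


def suppressed(f: Finding, a: Allowlist) -> bool:
    """Simplified model of allowlist selectors; not gitleaks's implementation."""
    target = f.secret if a.regex_target == "secret" else f.line
    checks = []  # one boolean per configured selector
    if a.commits:
        checks.append(f.commit in a.commits)
    if a.paths:
        checks.append(any(re.search(p, f.file) for p in a.paths))
    if a.regexes:
        checks.append(any(re.search(r, target) for r in a.regexes))
    if a.stopwords:
        checks.append(any(w.lower() in f.secret.lower() for w in a.stopwords))
    if not checks:
        return False
    return all(checks) if a.condition == "AND" else any(checks)
```

Under OR, a docs-path entry suppresses *any* secret in `docs/`. Under AND, the secret must also match the regex. A `commits` entry on its own suppresses everything in that commit.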
**Baselines** are a *known-findings list* — a JSON file of
previously detected findings that gitleaks's `IsNew` function
compares against on the next scan, so only new findings get
reported. The file is generated by saving a scan's JSON output and
fed back in via `--baseline-path`. The comparison checks RuleID,
description, file path, line numbers, secret content, commit, and
author/timestamp. When `--redact` is enabled, redacted Secret and
Match fields are ignored in the comparison so the baseline still
functions with redacted reports.
Detection flow is: global allowlist → rule-specific allowlist →
baseline → reported finding. Allowlist suppressions therefore win
over baseline; baseline is the last gate before report.
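The baseline comparison can be approximated as a set-membership test over the listed fields. This is a simplified model, not gitleaks's `IsNew`. The key names are those of gitleaks's JSON report (`RuleID`, `File`, `StartLine`, …), but which fields participate follows the description above, and `new_findings` / `load_baseline` are hypothetical helpers.

```python
import json

# Fields compared in this sketch, named as in gitleaks's JSON report.
COMPARED = ["RuleID", "Description", "File", "StartLine", "EndLine",
            "Commit", "Author", "Date", "Secret", "Match"]
REDACTED = {"Secret", "Match"}  # ignored when reports are redacted


def _key(finding: dict, redact: bool) -> tuple:
    fields = [k for k in COMPARED if not (redact and k in REDACTED)]
    return tuple(finding.get(k) for k in fields)


def new_findings(findings: list[dict], baseline: list[dict], redact: bool) -> list[dict]:
    """Keep only findings that do not already appear in the baseline."""
    known = {_key(b, redact) for b in baseline}
    return [f for f in findings if _key(f, redact) not in known]


def load_baseline(path: str) -> list[dict]:
    # A baseline is a saved JSON report: a list of finding objects.
    with open(path) as fh:
        return json.load(fh)
```

Because `Commit` is part of the key, the same bytes on a later commit count as a new finding. That is the "specific, not generative" property the allowlist-vs-baseline discussion relies on.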
The hook today passes neither `--config` nor `--baseline-path`.
Wiring either in is mechanically straightforward: the gate image
is built by `DockerGitGate.start`, so the config / baseline can be
baked into the image *or* mounted in at start.
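Wiring either file in only extends the existing command line. The hook itself is bash. This Python rendering shows the same argv, with `--config` and `--baseline-path` appended only when a file is supplied. `gitleaks_argv` and any paths passed to it are hypothetical.

```python
from typing import Optional


def gitleaks_argv(rev_range: str, config: Optional[str] = None,
                  baseline: Optional[str] = None) -> list[str]:
    """argv for the pre-receive scan, matching today's flags plus optional files."""
    argv = ["gitleaks", "git", f"--log-opts={rev_range}", "--no-banner", "--redact"]
    if config:
        argv += ["--config", config]           # allowlist rules live here
    if baseline:
        argv += ["--baseline-path", baseline]  # known-OK findings live here
    return argv
```

Passed to `subprocess.run`, a non-zero exit status still means "reject the push", so adding a baseline narrows what is reported without adding any bypass path.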
**Allowlist vs baseline for approval storage.** Both can express
"don't reject this finding," but they imply different things about
intent:
- An *allowlist* entry says "any future finding that matches this
pattern is fine." Generative: it covers findings that don't
exist yet on commits that haven't been made.
- A *baseline* entry says "this exact finding I've already seen is
fine." Specific: it pins to the bytes / location / rule of one
observed finding; a different finding on the same path on a
later commit re-triggers.
For a per-commit user approval, baseline is the better semantic
match: each approval is an attestation about one observed finding,
not a rule that pre-approves a pattern. Baseline entries can also
be diffed in PRs trivially (it's a JSON list) — they double as the
audit record.
### The design tension
PRD 0008's "no bypass for trusted commits" non-goal is load-bearing
*against the agent*. It is not load-bearing against the user, who
already has every privilege the gate is trying to deny the agent.
The risk of letting the user approve exceptions is not direct (the
user can already do whatever they want); it is indirect:
- **Prompt-injection laundering.** An attacker who has captured the
agent's prompt-stream can ask the agent to *request* an exception
that looks plausible ("I just need to commit the test fixture for
the new auth flow"). If the user rubber-stamps the request, the
attacker has used the user as a bypass channel. This is the same
risk as any human-in-the-loop control: it degrades to "no control"
if the human always says yes.
- **Scope creep of a granted exception.** A commit-SHA allowlist
approved for one commit could, in principle, be re-targeted at a
different commit if the allowlist isn't tied to the content. This
is why `commits` alone is unsafe; `paths + regexes` is the form
that survives content-substitution.
- **Persistence past intent.** An exception granted "just for this
commit" that stays in the gate's config indefinitely is no longer
a per-commit exception; it's a permanent allowlist entry. Without
TTL or a clean teardown, exceptions accrete.
These three risks shape the design constraints below.
### Three design options
**Option A — Reject and rotate.** Treat every gitleaks hit as
"rewrite the commit to not contain the literal token, then re-push."
For docs with fake tokens, use a sentinel string the repo's
gitleaks config recognizes as obviously not a real secret (e.g.
`AKIAIOSFODNN7EXAMPLE`, AWS's documented example key, or a
project-specific placeholder like `<aws-access-key-id>`).
- *Cost:* zero. No new code.
- *Property:* gate stays unbypassable in both senses.
- *Friction:* every author must know the placeholder convention. The
first time someone pastes a realistic-looking fake into a doc,
they get rejected and have to redo the commit. Probably fine for
the host repo; less fine for bottles authoring third-party content.
- *Verdict:* this should be the *default*. The exception flow exists
only for cases where Option A genuinely fails (e.g. the example is
specifically about a real-looking token format, or the upstream
doc requires the literal pattern).
**Option B — Per-finding approval via PRD 0012 flow.** When the
agent's push is rejected, the agent invokes
`/request-gate-exception` (or `/request-bottle-change` with an
exception variant). The slash command POSTs to the cred-proxy
endpoint, carrying the gitleaks finding record (rule ID, file path,
line, redacted match) and a free-text justification ("docs example
for AWS auth flow").
The user reviews the request in the dashboard, sees the file and
the diff, and approves. The approval gets written into the gate's
**baseline file** — the JSON list of known-OK findings the gate
passes as `--baseline-path` to gitleaks. The gate restarts with
the new baseline.
- *Property:* approved findings are pinned to the specific
observed bytes / path / rule. A different secret on the same
path on a later commit re-triggers the gate.
- *Auditability:* baseline file is JSON in git history; each PR
approval becomes a diff to that file. The free-text
justification lives in the PR thread per PRD 0012.
- *Fallback to allowlist for canonical cases.* If a particular
fixture file should be permanently understood as "examples only,"
the user can promote a baseline entry to an `[allowlist]` rule
that combines `paths` and `regexes` under `condition = "AND"`. That
is an explicit generalization, opted into by the user, never by
the agent.
- *Open: TTL.* Should baseline entries expire? Baseline is specific
by construction, so the case for expiration is weaker than for
allowlist. Lean "never" for v1; revisit if baselines balloon.
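The following is a simplified model of how an approval pins one finding. It assumes the baseline is a gitleaks JSON report, meaning a list of finding objects with keys such as `RuleID`, `File`, `Commit`, and `StartLine`. The `approve` and `is_new` helpers are hypothetical. `is_new` compares only a subset of fields, while gitleaks's own baseline check compares more of the finding record, so treat this as a model of the pinning property rather than a reimplementation. Writing with indentation and sorted keys keeps each approval a small, reviewable diff.

```python
import json
from pathlib import Path
from typing import Dict, List

Finding = Dict[str, object]

# Subset of gitleaks report fields used here to decide "same finding".
# Commit pins the exact bytes, since a commit SHA names one tree.
PIN_FIELDS = ("RuleID", "File", "Commit", "StartLine", "EndLine")


def _pin(finding: Finding) -> tuple:
    return tuple(finding.get(field) for field in PIN_FIELDS)


def is_new(finding: Finding, baseline: List[Finding]) -> bool:
    """True if no baseline entry pins this exact finding."""
    return all(_pin(finding) != _pin(entry) for entry in baseline)


def approve(baseline_path: Path, finding: Finding) -> None:
    """Record one approved finding in the baseline file, idempotently."""
    baseline: List[Finding] = (
        json.loads(baseline_path.read_text()) if baseline_path.exists() else []
    )
    if is_new(finding, baseline):
        baseline.append(finding)
        # Stable formatting keeps each approval a minimal, reviewable git diff.
        baseline_path.write_text(json.dumps(baseline, indent=2, sort_keys=True) + "\n")
```

In this model, approving a finding on one commit does nothing for the same rule and path on a later commit. That matches the re-trigger property described above.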
**Option C — Pre-flight scan with author signoff.** Run gitleaks
client-side inside the bottle (as a non-gating advisory check) so
the agent sees findings *before* attempting the push. The slash
command then includes the pre-known findings, so the dashboard can
show the user each finding inline instead of sending them to dig
through the rejection log. On approval, the same Option-B-style
baseline entry gets added.
- *Property:* identical end-state to Option B; better UX because
the agent stops before the rejected push, not after.
- *Cost:* one more place that needs gitleaks installed (the bottle
image), and an in-bottle advisory check that the agent can in
principle ignore. That's fine because it's *advisory* — the gate
still rejects; the in-bottle check just avoids one round-trip.
- *Verdict:* nice-to-have over Option B, not a substitute.
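This is a sketch of the Option C pre-flight as it might run inside the bottle. It uses gitleaks's existing `--report-format`, `--report-path`, and `--exit-code` flags, so the scan reports findings without failing. The helper names and the one-line summary format are hypothetical. If the scanner cannot run, the function returns `None` instead of blocking, which matches the advisory role described above.

```python
import json
import subprocess
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


def preflight_argv(log_opts: str, report_path: Path) -> List[str]:
    """Advisory scan over the same range the push will cover."""
    return [
        "gitleaks", "git",
        f"--log-opts={log_opts}",
        "--no-banner",
        "--redact",
        "--report-format=json",
        f"--report-path={report_path}",
        "--exit-code=0",  # advisory: findings never fail this step
    ]


def summarize(findings: List[Dict]) -> List[str]:
    """One line per finding, in a shape the slash command could forward."""
    return [f"{f['RuleID']} {f['File']}:{f['StartLine']}" for f in findings]


def preflight(log_opts: str) -> Optional[List[str]]:
    """Return finding summaries, or None if gitleaks could not run."""
    with tempfile.TemporaryDirectory() as tmp:
        report = Path(tmp) / "report.json"
        try:
            subprocess.run(preflight_argv(log_opts, report), check=True)
            findings = json.loads(report.read_text())
        except (OSError, subprocess.CalledProcessError, json.JSONDecodeError):
            return None  # advisory only: the server-side gate still decides
    return summarize(findings)
```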
### Recommendation
Default to Option A as the canonical answer ("rewrite to use a
placeholder"). Build Option B as the PRD 0012 exception path,
storing approvals in the gate's **baseline file** (not in an
allowlist rule). Baseline is the right primitive because each
approval is an attestation about one observed finding, not a
generative pattern. Allowlist promotion is a separate, user-
initiated escalation for cases that genuinely deserve patterning.
The `commits` selector is never exposed to the approval flow under
either path — it hides arbitrary content. Defer Option C to a
follow-up; it's an ergonomic win, not a security property.
This answers PRD 0012's open question as follows:
- Same recovery shape (`/request-bottle-change`), distinguishable
request type. The dashboard renders an exception request
differently from a manifest-change request because the *diff*
being approved is to the gate's baseline file, not to the
manifest.
- Exceptions are expressed as baseline-file entries — finding-
specific JSON records — not commit SHAs or regex patterns.
- The approval is recorded twice for audit: in the PR thread
(free-text justification), and as a versioned diff to the
baseline file (which is committed alongside the manifest).
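Here is one way the dashboard could keep the two request types apart, assuming each request carries a `kind` field. The kind strings, the `DIFF_TARGETS` paths, and the `diff_target` helper are hypothetical placeholders for whatever PRD 0012 settles on. The sketch rejects unknown kinds instead of falling back to a default, so an exception request can never be rendered as a manifest change.

```python
from enum import Enum


class RequestKind(Enum):
    MANIFEST_CHANGE = "manifest-change"
    GATE_EXCEPTION = "gate-exception"


# Hypothetical targets; the real layout is whatever the bottle repo uses.
DIFF_TARGETS = {
    RequestKind.MANIFEST_CHANGE: "bottle-manifest",
    RequestKind.GATE_EXCEPTION: "git-gate/baseline.json",
}


def diff_target(raw_kind: str) -> str:
    """Map a request's kind to the file whose diff the reviewer approves."""
    kind = RequestKind(raw_kind)  # raises ValueError on unknown kinds
    return DIFF_TARGETS[kind]
```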
## Should we switch to Betterleaks?
Not yet. Worth tracking. The wins are real on paper, but a
security gate is the wrong place to bet on a three-month-old
project, and the vendor-side benchmark numbers (98.6% recall vs
gitleaks's 70.4% on CredData) have not been independently
reproduced in published sources.
### What Betterleaks would add for claude-bottle
- **Detection coverage on encoded secrets.** Native handling of
doubly- and triply-encoded matches. This matters in the
threat model: an agent coaxed into base64-wrapping a token to
slip past the gate is a plausible attack, and gitleaks's
entropy-based approach misses many encodings.
- **CEL filters instead of the TOML allowlist struct.** More
expressive than `paths + regexes + condition`. Doesn't unlock
anything fundamental, but cleaner if exception rules ever need
conjunctive logic ("allow if path matches X *and* line contains
a documented placeholder string").
- **Agent-aware output.** Flag-based, low-token-overhead CLI
output designed for an AI agent (like one running inside a
bottle) to consume. Useful for the `/request-gate-exception`
slash command's parsing path; ergonomic win, not security-
load-bearing.
- **Avoids the frozen-upstream problem.** Gitleaks is declared
feature-complete and its author's new work goes into Betterleaks,
so a migration is likely forced eventually; the question is
whether to pay the cost now or later.
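The conjunctive rule quoted above is written here as a plain Python predicate to make the AND explicit. This is neither CEL nor gitleaks TOML. It models the logic that a gitleaks `[allowlist]` with both `paths` and `regexes` under `condition = "AND"`, or an equivalent Betterleaks CEL filter, would express. The path pattern and placeholder list are hypothetical examples.

```python
import re
from typing import Iterable

# Hypothetical rule: allow a finding only if BOTH conditions hold.
EXAMPLE_DOCS = re.compile(r"^docs/examples/.*\.md$")
DOCUMENTED_PLACEHOLDERS = ("AKIAIOSFODNN7EXAMPLE", "<aws-access-key-id>")


def allowed(path: str, line: str,
            placeholders: Iterable[str] = DOCUMENTED_PLACEHOLDERS) -> bool:
    """AND, not OR: a placeholder in src/ or a real key in docs/ still fails."""
    path_ok = EXAMPLE_DOCS.match(path) is not None
    line_ok = any(p in line for p in placeholders)
    return path_ok and line_ok
```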
### What it would cost
- The existing pre-receive hook calls `gitleaks git
--log-opts=<range> --no-banner --redact`. Betterleaks's CLI
surface is similar but not identical and was not designed as a
drop-in for that specific invocation. Some hook rewrite is
likely.
- Whether Betterleaks has a baseline-file equivalent (the
storage format Option B recommends) is unconfirmed at the time
of writing. If it does not, Option B's storage format would
have to be re-derived against whatever Betterleaks offers.
- A three-month-old project has fewer security audits, fewer
third-party integrations, and a smaller community than
gitleaks has accumulated since 2018. The gate is exactly where
that asymmetry matters most.
### Criteria to revisit
Revisit when at least two of the following are true:
- Betterleaks has accumulated ~12 months of stable releases and
at least one external security audit.
- The CredData benchmark numbers have been independently
reproduced.
- A baseline-file equivalent (or a clearly better primitive for
per-finding approval storage) is shipped and documented.
- A security issue surfaces in gitleaks that upstream will not
patch because it stems from a design choice rather than a bug,
i.e. the frozen status starts to bite.
### Forward-compatibility for the approval flow
Independent of the switching decision, Option B should treat the
choice of scanner as substitutable. Practically: the approval-
flow contract is "an approval is a finding-specific JSON record
stored alongside the manifest"; the *format* of that record
(gitleaks baseline schema today, something else later) is a
serialization concern downstream of the contract. Swapping
scanners then becomes a serialization migration, not a flow
redesign.
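This sketch shows that split and assumes Python's `typing.Protocol` as the serializer seam. `Approval`, `BaselineSerializer`, `GitleaksBaseline`, and `write_baseline` are hypothetical names. The gitleaks serializer writes each approved finding's original report record verbatim, because gitleaks compares new findings against the stored report records. A future scanner would get its own serializer over the same list of approvals.

```python
import json
from dataclasses import dataclass
from typing import List, Mapping, Protocol


@dataclass(frozen=True)
class Approval:
    """Scanner-neutral contract: one approval for one observed finding."""

    rule_id: str
    file: str
    commit: str
    raw: Mapping[str, object]  # the scanner's own finding record, verbatim


class BaselineSerializer(Protocol):
    """Anything that can turn approvals into a scanner's baseline format."""

    def dump(self, approvals: List[Approval]) -> str: ...


class GitleaksBaseline:
    """Writes approvals as a gitleaks JSON report: a list of finding records."""

    def dump(self, approvals: List[Approval]) -> str:
        # Emit the original records unchanged so the scanner can match them.
        records = [dict(a.raw) for a in approvals]
        return json.dumps(records, indent=2, sort_keys=True) + "\n"


def write_baseline(approvals: List[Approval], fmt: BaselineSerializer) -> str:
    # Swapping scanners means swapping `fmt`; the approval flow is unchanged.
    return fmt.dump(approvals)
```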
## Cross-references
- PRD 0008 — git-gate design and "no bypass" non-goal.
- PRD 0010 — cred-proxy; the inbound endpoint PRD 0012 reuses for
exception requests.
- PRD 0012 — stuck-agent recovery flow; the open question this note
informs.
- `docs/research/git-secret-scanning-hardening.md` — prior research
on the secret-scanning tool landscape and why gitleaks is the fit.
## Sources
- [gitleaks repository](https://github.com/gitleaks/gitleaks) —
`[allowlist]` selectors (`paths`, `regexes`, `stopwords`,
`commits`, `regexTarget`, `condition`); also home of the
feature-complete notice.
- [Gitleaks allowlists & baselines (DeepWiki)](https://deepwiki.com/gitleaks/gitleaks/4.4-allowlists-and-baselines)
— detailed walk-through of the allowlist selector struct, the
baseline file format, the `IsNew` comparison logic, and the
global→rule→baseline detection order. Primary source for the
allowlist-vs-baseline distinction this note rests on.
- [Betterleaks (GitHub)](https://github.com/betterleaks/betterleaks)
— Zachary Rice's successor project; CEL filtering, agent-driven
output design, roadmap for LLM-assisted classification.
- [Help Net Security on Betterleaks](https://www.helpnetsecurity.com/2026/03/19/betterleaks-open-source-secrets-scanner/)
and [The New Stack](https://thenewstack.io/betterleaks-open-source-secret-scanner/)
— context on the "agentic era" framing and why gitleaks froze.
- [DefectDojo gitleaks parser](https://defectdojo.com/integrations/gitleaks)
— JSON ingest, finding triage UI, accept/false-positive state.
Open-source, generic, post-hoc; informational state only —
marking a finding as accepted does not feed back into the
scanner. Shape mismatch for in-flight per-bottle approval.
- [gitleaks-findings/gitleaks](https://github.com/gitleaks-findings/gitleaks)
— JupiterOne integration, not a dashboard. Listed because the
repo name is misleading.
- [AWS example access key (`AKIAIOSFODNN7EXAMPLE`)](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_iam-quotas.html)
— documented placeholder safe to use in examples without
triggering most secret scanners.
- `claude_bottle/git_gate.py` — pre-receive hook implementation.
Today: `gitleaks git --log-opts="$log_opts" --no-banner
--redact`; no `--config`, no `--baseline-path`.