Addresses the five review comments on PR #318:
- Split PullRequest from Issue and add a dedicated read_pr method on Forge/ScopedForge/GiteaForge (a PR carries merge state an issue does not); is_pr_open now derives from read_pr.
- Replace the JSON-file forge state with a thin swappable CRUD interface (ForgeStateStore) backed by SQLite (SqliteForgeStateStore) at ~/.bot-bottle/bot-bottle.db.
- Remove the provenance footer (provenance.py + its test): a mutable, unsigned PR comment is not an audit record.
- Reword the PRD: provenance is exposed via an API, not surfaced in the PR; document the Issue/PullRequest split and the SQLite store.

pyright clean (whole repo), pylint 10/10, 38 forge/resume unit tests pass; no remaining refs to the removed provenance module or old JSON state API.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WL77TgFxKbs3cidGMG9dz7
PRD prd-new: Forge native integration
- Status: Draft
- Author: claude
- Created: 2026-06-29
- Issue: #317
Summary
Add a webhook-driven orchestration layer that lets Gitea issues and PR comments
drive bot-bottle sessions end-to-end with no operator in the loop for the happy
path. An issue assigned to a member of the configured agent org and labelled
with an agent name triggers a headless bottle launch; the bottle processes the
issue, opens a PR, and interacts with the forge through a forge sidecar —
the agent never touches the Gitea API or its credentials directly. The agent
calls signal_done(status, summary) on the sidecar when a work unit is
complete; the sidecar relays that to the orchestrator over a queue dir (the same
pattern as the supervise sidecar), so completion is an unambiguous in-band
signal rather than a comment the orchestrator has to parse. The orchestrator
freezes the bottle. Subsequent PR comments rehydrate the frozen bottle. The
bottle is destroyed when the PR closes.
The forge sidecar is backed by a Forge abstract class with per-provider
implementations (Gitea first), so the agent's prompts and the sidecar protocol
stay forge-agnostic. The sidecar logs forge operations semantically ("read PR
description", "posted comment", "signalled done"), giving richer provenance than
post-hoc egress-byte parsing, and enforces a read-anywhere / write-scoped
permission model: the agent may read for context but may only write to the
issue and PRs it was assigned.
Run provenance is exposed through a provenance API (the sidecar's structured operation log plus the run's metadata), not posted back into the forge. We do not surface a provenance footer in the PR — the audit record lives behind the API where it can be retained and queried, rather than as an editable comment.
The separation of concerns across the two layers: bot-bottle owns the headless
launch primitives, the forge sidecar + Forge abstraction, and forge state.
bot-bottle-orchestrator (separate binary) owns the webhook listener, bottle
lifecycle loop, and monitoring dashboard; it calls into bot-bottle via
./cli.py orchestrate, a thin wrapper command. This PRD covers bot-bottle's
side of that contract.
Problem
Today an operator must open the TUI, select an agent and bottle, confirm the
preflight, and type prompts interactively. This blocks "issue → PR" automation
and produces no durable audit record of what the agent did. The security model
already provides the right isolation and egress controls, and start --headless
(#315) already gives bot-bottle-orchestrator a non-interactive launch path.
The missing pieces are a headless resume counterpart for rehydrating frozen
bottles, a forge-interaction surface the agent uses to read context, post
comments, and signal completion, and a provenance trail that makes the audit
story legible through a provenance API rather than through PR comments.
That forge-interaction surface could be built two ways: (2) give the agent the
Gitea API directly with cred-proxy injecting the token, or (3) put a forge
sidecar between the agent and the forge. This PRD takes option 3. The
deciding factors: a sidecar signal_done call is an unambiguous completion
signal, whereas comment-parsing is a correctness risk that surfaces in production;
the sidecar produces a semantic audit trail rather than HTTP bytes, which is
load-bearing for provenance (the stated product priority); and the sidecar can
enforce scope tighter than repo-wide API-key permissions, reducing blast radius
for a prompt-injected agent. The costs — a second sidecar process per forge run,
a new failure mode if it crashes, and per-forge implementation cost — are
accepted as the price of those properties.
Goals / Success Criteria
- Headless launch already exists: ./cli.py start <agent> --headless --prompt (#315) runs non-interactively with no TUI selectors or y/N preflight. This PRD builds on it rather than re-introducing it. The remaining gap is a matching headless resume path (./cli.py resume --headless), since rehydrating a frozen bottle for a new prompt is required by the freeze / rehydrate loop and resume has no non-interactive entry point today.
- An issue assigned to a member of the configured org (FORGE_ORG, default bot-bottle) and labelled bot-bottle:<agent-name> is the trigger convention. Org membership is verified via the Gitea API at event time.
- Forge-targeted bottles run a forge sidecar that exposes a small, forge-agnostic API (comment/issue/PR CRUD plus signal_done) over the same queue-dir + HTTP/JSON-RPC machinery as the supervise sidecar. The agent calls the sidecar; it never sees the forge token or forge-specific endpoints.
- The sidecar is backed by a Forge abstract class. Gitea is the first concrete implementation; adding a forge means a new subclass, not changes to the agent prompt or sidecar protocol. The sidecar enforces a read-anywhere / write-scoped model: writes are limited to the assigned issue and its PRs; reads are unrestricted for context.
- The agent calls signal_done(status, summary) on the sidecar when a work unit is complete; the sidecar relays it to the orchestrator over a queue dir. This is the done signal: no comment parsing. A watchdog timeout (configurable, default 30 min) causes the orchestrator to treat the run as done-without-self-report if the agent exits without signalling.
- Run provenance (agent name, bottle name(s), slug, timing, exit code, gitleaks result, egress summary, and the sidecar's semantic operation log) is available through a provenance API. It is not surfaced as a PR footer or any other forge comment.
- Forge state (issue → slug, status) is persisted in a local SQLite database under ~/.bot-bottle/ and survives orchestrator restarts.
- ./cli.py orchestrate status lists active forge-managed bottles and their issue/PR URLs.
- Unit tests cover: label parsing, org-membership check path, forge state store CRUD (SQLite), headless launch arg construction, forge env var injection, sidecar request dispatch through the Forge abstraction, write-scope enforcement (reject writes outside the assigned issue/PRs), and signal_done queue relay.
Non-goals
- Webhook signature verification (HMAC-SHA256). Added as a follow-up.
- The bot-bottle-orchestrator binary itself: this PRD covers bot-bottle's side of the interface only. The orchestrator is a separate project.
- GitHub or GitLab support.
- Multiple simultaneous forge bottles per issue.
- Automatic retry on agent error exit.
- Bottle destruction on issue close (PR close only; issue close is ambiguous).
- Concurrent multi-issue handling (one blocking run per orchestrator process).
- A monitoring dashboard (orchestrator-side concern).
- Folding DeployKeyProvisioner into the Forge abstraction. Deploy-key provisioning runs at bottle-provision time on the host; the forge sidecar runs inside the bottle at agent time. The two have different lifecycles and actors, so coupling them into one class is deferred to a follow-up. This PRD only shares the Gitea HTTP client between them.
Design
Targeting convention
An issue is forge-targeted when both hold:
- At least one assignee is a member of the Gitea org named by FORGE_ORG (default bot-bottle). Checked via GET /api/v1/orgs/{org}/members/{user}.
- At least one label has the prefix bot-bottle:. The suffix names the agent manifest, e.g. bot-bottle:implementer → agent implementer.
FORGE_ORG is read at orchestrate-command startup. It is not embedded in
manifests or state files; the orchestrator stamps its value into log output for
auditability.
An optional label bot-bottle-bottle:<name> overrides bottle selection. When
absent the agent's default bottle is used.
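The label half of the targeting convention can be sketched as a small pure function. This is a minimal illustration, not the shipped parser: the names `Target` and `parse_labels` are hypothetical, and the choices that the first matching label wins and that an empty suffix is ignored are assumptions the PRD does not state.

```python
from __future__ import annotations

from dataclasses import dataclass

AGENT_PREFIX = "bot-bottle:"          # suffix names the agent manifest
BOTTLE_PREFIX = "bot-bottle-bottle:"  # optional bottle override


@dataclass(frozen=True)
class Target:
    agent: str
    bottle: str | None  # None means "use the agent's default bottle"


def parse_labels(labels: list[str]) -> Target | None:
    """Return the agent/bottle named by the labels, or None if untargeted."""
    agent: str | None = None
    bottle: str | None = None
    for label in labels:
        if label.startswith(BOTTLE_PREFIX):
            # Assumption: the first bottle label wins; empty suffixes are ignored.
            bottle = bottle or label[len(BOTTLE_PREFIX):] or None
        elif label.startswith(AGENT_PREFIX):
            # Assumption: the first agent label wins; empty suffixes are ignored.
            agent = agent or label[len(AGENT_PREFIX):] or None
    if agent is None:
        return None
    return Target(agent=agent, bottle=bottle)
```

The org-membership half of the check needs a forge call, so it is kept out of this function; a caller would combine both results before launching a bottle.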
./cli.py orchestrate — the thin wrapper
    ./cli.py orchestrate start --agent AGENT [--bottle BOTTLE ...] --prompt PROMPT
                               [--label LABEL] [--backend BACKEND]
    ./cli.py orchestrate resume --slug SLUG --prompt PROMPT [--backend BACKEND]
    ./cli.py orchestrate status
orchestrate start is a thin shim over the already-shipped start --headless
(#315): it forwards agent / bottle / label / prompt and adds the forge-specific
wiring (forge_env, sidecar launch). It does not re-implement headless launch.
The caller (bot-bottle-orchestrator) manages freeze, state, and the forge
sidecar's done signal around it.
orchestrate resume is the shim over the new resume --headless (below).
orchestrate status prints the forge state table.
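As a rough sketch of the argument surface described above, the three subcommands could be declared with `argparse` as follows. The function name `build_parser` is hypothetical, and reading `--bottle BOTTLE ...` as "one or more values after a single flag" is an assumption about the intended syntax.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the start / resume / status subcommands."""
    parser = argparse.ArgumentParser(prog="cli.py orchestrate")
    sub = parser.add_subparsers(dest="command", required=True)

    start = sub.add_parser("start")
    start.add_argument("--agent", required=True)
    # Assumption: "--bottle BOTTLE ..." takes one or more names after one flag.
    start.add_argument("--bottle", nargs="+", default=[])
    start.add_argument("--prompt", required=True)
    start.add_argument("--label")
    start.add_argument("--backend")

    resume = sub.add_parser("resume")
    resume.add_argument("--slug", required=True)
    resume.add_argument("--prompt", required=True)
    resume.add_argument("--backend")

    sub.add_parser("status")
    return parser
```

A handler for `start` would then forward the parsed values to the existing headless launch path rather than re-implementing it.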
Headless primitives — what exists vs. what's new
Headless start already shipped in #315 and this PRD reuses it as-is:
- ./cli.py start <agent> --headless --prompt TEXT: no TUI selectors, no y/N preflight. Internally _start_headless() calls the shared _launch_bottle() with assume_yes=True and headless_prompt_text=prompt.
- The prompt is delivered through AgentProvider.headless_prompt(prompt): claude -p, codex positional, pi -p. The orchestrator does not hand-roll agent args; it relies on this provider abstraction. (An earlier draft proposed start_headless / attach_agent_headless helpers that constructed --no-interactive / -p directly; those are dropped as redundant with, and divergent from, what #315 merged.)
Two additions are needed on top of #315:
1. A forge_env hook on the headless launch path. The orchestrator needs to
pass forge context + token through to the forge sidecar launched alongside the
agent. This is a parameter threaded into _launch_bottle (the same core
start --headless already uses), not a parallel launch function. The agent
process itself does not receive the token.
2. resume --headless — new in bot_bottle/cli/resume.py, mirroring the
--headless flag on start:
    ./cli.py resume <slug> --headless --prompt TEXT
It rehydrates a frozen bottle and runs one headless prompt via the same
assume_yes + headless_prompt path, returning the agent's exit code. resume
has no non-interactive entry point today, so this is genuinely new work rather
than a rename of an existing helper.
Forge sidecar
Forge-targeted bottles run a forge sidecar alongside the agent, mirroring the supervise sidecar: a per-bottle process that exposes an HTTP/JSON-RPC endpoint over a Unix socket and relays events to the orchestrator through a queue dir. The agent calls the sidecar; the sidecar holds the forge token and makes the actual forge API calls. The agent never receives the credential and never sees a forge-specific endpoint — swapping Gitea for another forge does not change the agent prompt or the sidecar protocol.
The sidecar is configured at launch from the forge context (owner, repo, issue, PR) and the token, supplied by the orchestrator — not baked into the agent manifest. Because the sidecar owns the token, forge traffic does not need a cred-proxy egress route on the agent; the agent's egress policy is unchanged by forge targeting.
Sidecar protocol (forge-agnostic; each method maps to a Forge call):
| Method | Scope | Purpose |
|---|---|---|
| read_issue(number) | read-anywhere | Read an issue body for context |
| read_pr(number) | read-anywhere | Read a PR (incl. merge state) for context |
| read_comments(number) | read-anywhere | Read a thread for context |
| post_comment(number, body) | write-scoped | Post to the assigned issue/PR |
| update_description(number, body) | write-scoped | Edit the assigned issue/PR body |
| signal_done(status, summary) | n/a | Relay completion to the orchestrator |
Issues and PRs are distinct domain objects (Issue vs PullRequest) read
through distinct methods; a PR carries merge state an issue does not.
Scope enforcement is read-anywhere / write-scoped: read methods accept any issue/PR number for context; write methods are rejected unless the target is the assigned issue or one of its PRs. This is tighter than Gitea's repo-wide API-key permissions and bounds the blast radius of a prompt-injected agent. Rejections are logged semantically (operation, target, reason) so the audit trail records attempted out-of-scope writes, not just allowed ones.
Semantic audit: every sidecar call is logged as a structured operation ("read PR #318 description", "posted comment to #317", "signalled done: success") rather than as opaque HTTP bytes. This log feeds provenance directly, with no post-hoc egress-log parsing.
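One way to picture the dispatch-plus-audit behaviour is a single function that calls the named method and always emits one structured log line, including for rejected writes. This is a sketch under assumptions: the function name `dispatch`, the log-entry fields, and the use of the built-in `PermissionError` as a stand-in for the scope error are all illustrative, not the sidecar's actual format.

```python
import json
import time
from typing import Any, Callable

READ_METHODS = {"read_issue", "read_pr", "read_comments"}
WRITE_METHODS = {"post_comment", "update_description"}


def dispatch(forge: Any, method: str, params: dict, log: Callable[[str], None]) -> Any:
    """Call one protocol method on `forge` and emit a structured log line."""
    if method not in READ_METHODS | WRITE_METHODS:
        raise ValueError(f"unknown method: {method}")
    # Hypothetical entry shape: operation, target number, outcome, reason.
    entry = {"ts": time.time(), "op": method, "target": params.get("number")}
    try:
        result = getattr(forge, method)(**params)
        entry["outcome"] = "ok"
        return result
    except PermissionError as exc:
        # Stand-in for an out-of-scope write; the attempt is still recorded.
        entry["outcome"] = "rejected"
        entry["reason"] = str(exc)
        raise
    finally:
        log(json.dumps(entry, sort_keys=True))
```

Logging in the `finally` clause is the design point: the audit trail gets an entry whether the call succeeded or was refused.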
Forge abstraction — bot_bottle/contrib/forge/
The sidecar dispatches to a Forge abstract class. Each provider implements the
operations behind the sidecar protocol:
    class Forge(abc.ABC):
        @abc.abstractmethod
        def read_issue(self, number: int) -> Issue: ...

        @abc.abstractmethod
        def read_pr(self, number: int) -> PullRequest: ...

        @abc.abstractmethod
        def read_comments(self, number: int) -> list[Comment]: ...

        @abc.abstractmethod
        def post_comment(self, number: int, body: str) -> None: ...

        @abc.abstractmethod
        def update_description(self, number: int, body: str) -> None: ...

        @abc.abstractmethod
        def is_org_member(self, org: str, username: str) -> bool: ...

        @abc.abstractmethod
        def get_pr_for_issue(self, number: int) -> int | None: ...

        @abc.abstractmethod
        def is_pr_open(self, number: int) -> bool: ...
Issue and PullRequest are separate frozen dataclasses — a PR adds merged.
ScopedForge wraps a concrete Forge to enforce the read-anywhere /
write-scoped model (post_comment / update_description raise ForgeScopeError
outside the assigned issue and PRs).
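A minimal sketch of the wrapper idea follows, assuming a duck-typed inner forge. Only `merged`, `ScopedForge`, and `ForgeScopeError` come from the text above; the other dataclass fields (`title`, `body`) and the constructor arguments are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


class ForgeScopeError(Exception):
    """Raised when a write targets something outside the assigned scope."""


@dataclass(frozen=True)
class Issue:
    number: int
    title: str  # assumed field
    body: str   # assumed field


@dataclass(frozen=True)
class PullRequest:
    number: int
    title: str  # assumed field
    body: str   # assumed field
    merged: bool  # merge state is what an Issue lacks


class ScopedForge:
    """Pass reads through; allow writes only to the assigned issue and PRs."""

    def __init__(self, inner, issue_number: int, pr_numbers: set[int]) -> None:
        self._inner = inner
        self._writable = {issue_number} | set(pr_numbers)

    def read_issue(self, number: int) -> Issue:
        return self._inner.read_issue(number)  # reads are unrestricted

    def read_pr(self, number: int) -> PullRequest:
        return self._inner.read_pr(number)  # reads are unrestricted

    def post_comment(self, number: int, body: str) -> None:
        self._check(number, "post_comment")
        self._inner.post_comment(number, body)

    def update_description(self, number: int, body: str) -> None:
        self._check(number, "update_description")
        self._inner.update_description(number, body)

    def _check(self, number: int, op: str) -> None:
        if number not in self._writable:
            raise ForgeScopeError(f"{op} on #{number} is outside the assigned scope")
```

Because the check happens before the inner call, a rejected write never reaches the forge API.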
GiteaForge is the first and only concrete implementation in this PRD. It wraps
the Gitea HTTP client (below). Adding GitHub or GitLab later is a new subclass;
the sidecar, protocol, and agent prompt are untouched.
Deferred: DeployKeyProvisioner is not folded into Forge here. Deploy-key provisioning runs on the host at provision time; the sidecar runs in the bottle at agent time. They have different lifecycles and actors, so a shared abstract base would couple two unrelated auth contexts. For now they only share the Gitea HTTP client; a later PRD can revisit unification.
Forge env vars
The orchestrator passes forge context to the sidecar (not the agent) at launch. The agent does not need owner/repo/issue env vars to construct API calls, since it only names issue/PR numbers to the sidecar:
| Var | Example | Purpose |
|---|---|---|
| FORGE_GITEA_API | https://gitea.dideric.is/api/v1 | Base URL the sidecar calls |
| FORGE_OWNER | didericis | Repo owner |
| FORGE_REPO | bot-bottle | Repo name |
| FORGE_ISSUE_NUMBER | 317 | Assigned issue (defines write scope) |
| FORGE_PR_NUMBER | 318 | Assigned PR (empty until PR exists) |
The agent's forge-specific prompt instructs it to call signal_done on the
sidecar when a work unit is complete, and to use the sidecar for any
comment/description writes. The instruction is forge-agnostic and is part of the
forge prompt overlay, not the base agent manifest, so non-forge runs are
unaffected.
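On the sidecar side, the variables in the table could be read into a typed context along these lines. The `ForgeContext` dataclass and `context_from_env` function are hypothetical names; treating an empty or missing FORGE_PR_NUMBER as "no PR yet" follows the table's note, while stripping a trailing slash from the base URL is an added assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ForgeContext:
    api_url: str
    owner: str
    repo: str
    issue_number: int
    pr_number: int | None  # None until a PR exists


def context_from_env(env: Mapping[str, str]) -> ForgeContext:
    """Build the sidecar's forge context from FORGE_* variables."""
    pr_raw = env.get("FORGE_PR_NUMBER", "").strip()
    return ForgeContext(
        api_url=env["FORGE_GITEA_API"].rstrip("/"),
        owner=env["FORGE_OWNER"],
        repo=env["FORGE_REPO"],
        issue_number=int(env["FORGE_ISSUE_NUMBER"]),
        pr_number=int(pr_raw) if pr_raw else None,
    )
```

Taking a mapping instead of reading `os.environ` directly keeps the function easy to unit-test, which matches the "forge env var injection" test goal.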
Done signal and watchdog
The agent calls signal_done(status, summary) on the sidecar when it finishes a
work unit. The sidecar writes the event to its queue dir; the orchestrator reads
it and:
- Reads the forge state for (owner, repo, issue_number).
- If status == "running", treats the event as the done signal: freezes the bottle and sets status = "frozen". Provenance is recorded via the provenance API; no comment is posted to the forge.
Because completion is an explicit signal_done call, the orchestrator does not
parse comment text to detect "done", and intermediate comments the agent posts
mid-run cannot be mistaken for completion.
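The queue-dir relay can be illustrated with one JSON file per event, written atomically so the reader never sees a partial file. This is a sketch only: the file naming, the event fields, and the `drain` helper are assumptions, and the real supervise-sidecar queue format may differ.

```python
from __future__ import annotations

import json
import os
import tempfile
import time
import uuid
from pathlib import Path


def signal_done(queue_dir: Path, status: str, summary: str) -> Path:
    """Write a done event into the queue dir and return its path."""
    queue_dir.mkdir(parents=True, exist_ok=True)
    event = {"type": "signal_done", "status": status, "summary": summary, "ts": time.time()}
    final = queue_dir / f"{time.time_ns()}-{uuid.uuid4().hex}.json"
    # Write to a temp file first, then rename, so readers only see whole files.
    fd, tmp = tempfile.mkstemp(dir=queue_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(event, handle)
    os.replace(tmp, final)
    return final


def drain(queue_dir: Path) -> list[dict]:
    """Orchestrator side: read and remove all pending events, oldest name first."""
    events = []
    for path in sorted(queue_dir.glob("*.json")):
        events.append(json.loads(path.read_text(encoding="utf-8")))
        path.unlink()
    return events
```

The reader globs only `*.json`, so in-progress `.tmp` files are ignored until the rename makes them visible.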
Watchdog: the orchestrator tracks last_checkin_at in forge state, updated
on each sidecar event. A background thread wakes every minute. If
now - last_checkin_at > FORGE_WATCHDOG_TIMEOUT (default 30 min, configurable
via env) and status == "running", the orchestrator treats the run as
done-without-self-report and freezes the bottle, flagging the run as incomplete
in the provenance record.
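The watchdog condition reduces to a small predicate, sketched here. The function names are hypothetical, and interpreting FORGE_WATCHDOG_TIMEOUT as an integer number of seconds is an assumption, since the text above does not give the unit.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Mapping

DEFAULT_TIMEOUT = timedelta(minutes=30)


def timeout_from_env(env: Mapping[str, str]) -> timedelta:
    """Read the watchdog timeout, falling back to the 30 minute default."""
    # Assumption: the variable holds an integer number of seconds.
    raw = env.get("FORGE_WATCHDOG_TIMEOUT", "").strip()
    return timedelta(seconds=int(raw)) if raw else DEFAULT_TIMEOUT


def watchdog_should_fire(
    status: str,
    last_checkin_at: datetime,
    now: datetime,
    timeout: timedelta = DEFAULT_TIMEOUT,
) -> bool:
    """True when a running bottle has been silent for longer than the timeout."""
    return status == "running" and now - last_checkin_at > timeout
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside, lets the background thread and the unit tests share the same logic.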
Sidecar-death failure mode: if the forge sidecar crashes mid-run the agent loses forge access while the bottle is otherwise healthy. The orchestrator detects a dead sidecar (socket/queue gone) the same way it detects a stalled agent and falls back to the watchdog path.
Forge state — bot_bottle/contrib/gitea/forge_state.py
State is stored in a local SQLite database at ~/.bot-bottle/bot-bottle.db.
Access goes through a thin CRUD interface, ForgeStateStore, so the storage
location/engine can be swapped without touching callers. SqliteForgeStateStore
is the first implementation.
The forge_state table is keyed by (owner, repo, issue_number) and carries:
slug, agent_name, bottle_names (JSON), backend_name, agent_git_user,
pr_number (nullable), status, last_checkin_at.
status: "running" | "frozen" | "destroyed".
Store interface:
    class ForgeStateStore(abc.ABC):
        def upsert(self, state: ForgeState) -> None: ...
        def get(self, owner: str, repo: str, issue_number: int) -> ForgeState | None: ...
        def delete(self, owner: str, repo: str, issue_number: int) -> None: ...
        def all(self) -> list[ForgeState]: ...

    class SqliteForgeStateStore(ForgeStateStore):
        def __init__(self, db_path: Path | None = None) -> None: ...
upsert uses INSERT OR REPLACE so a re-run for the same issue overwrites in
place. The schema is created on first open.
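For illustration, a store along these lines could be built on the standard `sqlite3` module. The column types, the float-epoch representation of `last_checkin_at`, the `ORDER BY` in `all()`, and the in-memory default path are assumptions for the sketch; the real store defaults to the database file under ~/.bot-bottle/.

```python
from __future__ import annotations

import json
import sqlite3
from dataclasses import dataclass


@dataclass
class ForgeState:
    owner: str
    repo: str
    issue_number: int
    slug: str
    agent_name: str
    bottle_names: list[str]
    backend_name: str
    agent_git_user: str
    pr_number: int | None
    status: str  # "running" | "frozen" | "destroyed"
    last_checkin_at: float  # assumed: seconds since the epoch


SCHEMA = """CREATE TABLE IF NOT EXISTS forge_state (
    owner TEXT NOT NULL, repo TEXT NOT NULL, issue_number INTEGER NOT NULL,
    slug TEXT NOT NULL, agent_name TEXT NOT NULL, bottle_names TEXT NOT NULL,
    backend_name TEXT NOT NULL, agent_git_user TEXT NOT NULL,
    pr_number INTEGER, status TEXT NOT NULL, last_checkin_at REAL NOT NULL,
    PRIMARY KEY (owner, repo, issue_number))"""


class SqliteForgeStateStore:
    def __init__(self, db_path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(SCHEMA)  # schema is created on first open

    def upsert(self, s: ForgeState) -> None:
        row = (s.owner, s.repo, s.issue_number, s.slug, s.agent_name,
               json.dumps(s.bottle_names), s.backend_name, s.agent_git_user,
               s.pr_number, s.status, s.last_checkin_at)
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO forge_state VALUES (?,?,?,?,?,?,?,?,?,?,?)", row)

    def get(self, owner: str, repo: str, issue_number: int) -> ForgeState | None:
        row = self._conn.execute(
            "SELECT * FROM forge_state WHERE owner=? AND repo=? AND issue_number=?",
            (owner, repo, issue_number)).fetchone()
        return self._to_state(row) if row else None

    def delete(self, owner: str, repo: str, issue_number: int) -> None:
        with self._conn:
            self._conn.execute(
                "DELETE FROM forge_state WHERE owner=? AND repo=? AND issue_number=?",
                (owner, repo, issue_number))

    def all(self) -> list[ForgeState]:
        rows = self._conn.execute(
            "SELECT * FROM forge_state ORDER BY owner, repo, issue_number").fetchall()
        return [self._to_state(r) for r in rows]

    @staticmethod
    def _to_state(row: tuple) -> ForgeState:
        # Column 5 holds bottle_names as JSON text; the rest map positionally.
        return ForgeState(*row[:5], json.loads(row[5]), *row[6:])
```

The composite primary key is what makes INSERT OR REPLACE behave as an upsert: a second insert for the same (owner, repo, issue_number) replaces the existing row instead of adding one.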
Provenance API
Run provenance — agent, bottle(s), slug, timing, exit code, gitleaks result, egress summary, watchdog-fired flag, and the sidecar's semantic operation log — is exposed through a provenance API, not posted into the forge. There is no provenance footer or run-summary comment.
The rationale (per the monetization positioning): a PR comment is mutable by any maintainer, unsigned, and per-PR, so it is worthless as an audit record and invites false trust. The authoritative record therefore lives behind the API, where it can be retained, queried, and (eventually) signed. Whether any projection of it ever appears in the forge is a separate, out-of-scope decision; this PR does not build one.
The API surface itself (schema, transport, signing, retention) is out of scope for this PRD and belongs with the orchestrator / control-plane work. bot-bottle here only produces the raw material: the sidecar's semantic operation log and the run metadata the orchestrator collects.
Gitea HTTP client — bot_bottle/contrib/gitea/client.py
GiteaForge (and the existing GiteaDeployKeyProvisioner) share one thin HTTP
client. Unlike the option-2 design, the token is held by the sidecar process and
passed to the client directly — there is no agent-side cred-proxy route to
inject it, because the agent never makes forge calls.
    class GiteaClient:
        def __init__(self, *, api_url: str, owner: str, repo: str, token: str) -> None: ...
        def is_org_member(self, org: str, username: str) -> bool: ...
        def get_issue(self, number: int) -> dict: ...
        def get_comments(self, number: int) -> list[dict]: ...
        def post_comment(self, number: int, body: str) -> None: ...
        def patch_issue_body(self, number: int, body: str) -> None: ...
        def get_pull(self, number: int) -> dict: ...
GiteaForge adapts this client to the Forge surface (mapping raw JSON to
Issue / PullRequest / Comment). Sharing only the HTTP client (not an
abstract base) is the deliberate boundary between the sidecar and the deploy-key
provisioner — see the deferral note under the Forge abstraction.
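The membership check with 404-as-false could look roughly like this when written against `urllib`. It is a standalone sketch rather than the client's method: the `opener` parameter exists only to make the function testable, treating any 2xx response as "member" is an assumption, and the `Authorization: token ...` header form should be confirmed against the Gitea API documentation.

```python
import urllib.error
import urllib.parse
import urllib.request


def is_org_member(api_url, token, org, username, opener=urllib.request.urlopen):
    """Return True if `username` is in `org`; a 404 response means False."""
    url = "{}/orgs/{}/members/{}".format(
        api_url.rstrip("/"),
        urllib.parse.quote(org, safe=""),
        urllib.parse.quote(username, safe=""),
    )
    request = urllib.request.Request(url, headers={"Authorization": f"token {token}"})
    try:
        with opener(request) as response:
            # Assumption: any 2xx status counts as a confirmed membership.
            return 200 <= response.status < 300
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return False  # not a member (or org/user unknown)
        raise  # other failures are real errors, not "not a member"
```

Re-raising non-404 errors matters for the trigger path: a forge outage should not be silently read as "assignee is not in the org".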
Implementation chunks
- Headless additions on top of #315: thread a forge_env parameter into the existing _launch_bottle core (the one start --headless already uses); add a --headless path to cli/resume.py reusing assume_yes + headless_prompt. No new start_headless / attach_agent_headless helpers. Tests: forge_env reaches the sidecar/guest_env; resume --headless skips the TUI and y/N preflight and returns the agent exit code.
- Forge state (contrib/gitea/forge_state.py): ForgeState dataclass, ForgeStateStore CRUD interface, SqliteForgeStateStore. Tests: round-trip, missing → None, INSERT OR REPLACE upsert, delete idempotent, all() ordering, persistence across store instances.
- Forge abstraction + Gitea client: contrib/forge/base.py (Forge ABC, ScopedForge, Issue / PullRequest / Comment) and contrib/gitea/client.py + GiteaForge, covering is_org_member, read_issue, read_pr, read_comments, post_comment, update_description, get_pr_for_issue, is_pr_open. Tests: mock urllib.request.urlopen, assert payloads and 404-as-false for membership; ScopedForge write-scope enforcement.
- Forge sidecar: sidecar process exposing the protocol over a Unix socket, queue-dir relay, write-scope enforcement, semantic op log, signal_done. Reuses the supervise sidecar bundle machinery. Tests: dispatch each method to the Forge, reject out-of-scope writes, signal_done writes a queue event, scope-rejection is logged.
- ./cli.py orchestrate: cli/orchestrate.py with start, resume, status subcommands wired into cli.py; start launches the forge sidecar alongside the agent for forge-targeted runs. Tests: arg parsing, start delegates to start --headless, resume delegates to resume --headless.
Provenance
Run provenance is captured (sidecar semantic operation log + run metadata) and
exposed through a provenance API. It is deliberately not surfaced in the
forge — no footer, no run-summary comment. A mutable, unsigned PR comment is not
an audit record; the authoritative record lives behind the API where it can be
retained and signed. The watchdog_fired flag marks runs where the agent did
not self-report completion so consumers of the API know the record may be
incomplete.
The provenance API's schema, transport, signing, and retention are out of scope for this PRD (control-plane work); bot-bottle here produces the raw material only.