didericis 4cb106b48d docs(prd): reconcile headless primitives with shipped start --headless

#315 already merged `start --headless` (assume_yes on _launch_bottle +
AgentProvider.headless_prompt). The PRD's proposed start_headless /
attach_agent_headless helpers were redundant with it, and the latter
diverged by hand-rolling --no-interactive/-p instead of using the
headless_prompt provider abstraction. Drop them.

Scope the remaining headless work to what's actually new: a forge_env
hook threaded into the existing _launch_bottle core, and a `resume
--headless` path (resume has no non-interactive entry point today).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WL77TgFxKbs3cidGMG9dz7

2026-06-30 17:46:59 -04:00

PRD prd-new: Forge native integration

Status: Draft
Author: claude
Created: 2026-06-29
Issue: #317

Summary

Add a webhook-driven orchestration layer that lets Gitea issues and PR comments drive bot-bottle sessions end-to-end with no operator in the loop for the happy path. An issue assigned to a member of the configured agent org and labelled with an agent name triggers a headless bottle launch; the bottle processes the issue, opens a PR, and interacts with the forge through a forge sidecar — the agent never touches the Gitea API or its credentials directly. The agent calls signal_done(status, summary) on the sidecar when a work unit is complete; the sidecar relays that to the orchestrator over a queue dir (the same pattern as the supervise sidecar), so completion is an unambiguous in-band signal rather than a comment the orchestrator has to parse. The orchestrator freezes the bottle and attaches a provenance footer. Subsequent PR comments rehydrate the frozen bottle. The bottle is destroyed when the PR closes.

The forge sidecar is backed by a Forge abstract class with per-provider implementations (Gitea first), so the agent's prompts and the sidecar protocol stay forge-agnostic. The sidecar logs forge operations semantically ("read PR description", "posted comment", "signalled done"), giving richer provenance than post-hoc egress-byte parsing, and enforces a read-anywhere / write-scoped permission model: the agent may read for context but may only write to the issue and PRs it was assigned.

The separation of concerns across the two layers: bot-bottle owns the headless launch primitives, the forge sidecar + Forge abstraction, forge state, and the provenance builder. bot-bottle-orchestrator (separate binary) owns the webhook listener, bottle lifecycle loop, and monitoring dashboard; it calls into bot-bottle via ./cli.py orchestrate, a thin wrapper command. This PRD covers bot-bottle's side of that contract.

Problem

Today an operator must open the TUI, select an agent and bottle, confirm the preflight, and type prompts interactively. This blocks "issue → PR" automation and produces no durable audit record of what the agent did. The security model already provides the right isolation and egress controls, and start --headless (#315) already gives bot-bottle-orchestrator a non-interactive launch path. The missing pieces are a headless resume counterpart for rehydrating frozen bottles, a forge-interaction surface the agent uses to read context, post comments, and signal completion, and the provenance trail that makes the audit story legible to reviewers on every PR.

That forge-interaction surface could be built two ways: (2) give the agent the Gitea API directly with cred-proxy injecting the token, or (3) put a forge sidecar between the agent and the forge. This PRD takes option 3. The deciding factors: a sidecar signal_done call is an unambiguous completion signal where comment-parsing is a correctness risk that surfaces in production; the sidecar produces a semantic audit trail rather than HTTP bytes, which is load-bearing for provenance (the stated product priority); and the sidecar can enforce scope tighter than repo-wide API-key permissions, reducing blast radius for a prompt-injected agent. The costs — a second sidecar process per forge run, a new failure mode if it crashes, and per-forge implementation cost — are accepted as the price of those properties.

Goals / Success Criteria

Headless launch already exists: ./cli.py start <agent> --headless --prompt TEXT (#315) runs non-interactively with no TUI selectors or y/N preflight. This PRD builds on it rather than re-introducing it. The remaining gap is a matching headless resume path (./cli.py resume <slug> --headless --prompt TEXT), since rehydrating a frozen bottle for a new prompt is required by the freeze / rehydrate loop and resume has no non-interactive entry point today.
An issue assigned to a member of the configured org (FORGE_ORG, default bot-bottle) and labelled bot-bottle:<agent-name> is the trigger convention. Org membership is verified via the Gitea API at event time.
Forge-targeted bottles run a forge sidecar that exposes a small, forge-agnostic API (comment/issue/PR CRUD plus signal_done) over the same queue-dir + HTTP/JSON-RPC machinery as the supervise sidecar. The agent calls the sidecar; it never sees the forge token or forge-specific endpoints.
The sidecar is backed by a Forge abstract class. Gitea is the first concrete implementation; adding a forge means a new subclass, not changes to the agent prompt or sidecar protocol. The sidecar enforces a read-anywhere / write-scoped model: writes are limited to the assigned issue and its PRs; reads are unrestricted for context.
The agent calls signal_done(status, summary) on the sidecar when a work unit is complete; the sidecar relays it to the orchestrator over a queue dir. This is the done signal; no comment parsing is involved. A watchdog timeout (configurable, default 30 min) causes the orchestrator to treat the run as done-without-self-report if the sidecar goes quiet for that long while the run is still marked running, for example because the agent exited or stalled without signalling.
Every orchestrator-posted comment ends with a provenance footer: agent name, bottle name(s), slug, start time, duration, exit code, gitleaks result, and egress summary.
Forge state (issue → slug, status) is persisted to disk and survives orchestrator restarts.
./cli.py orchestrate status lists active forge-managed bottles and their issue/PR URLs.
Unit tests cover: label parsing, org-membership check path, forge state read/write, provenance footer rendering, headless launch arg construction, forge env var injection, sidecar request dispatch through the Forge abstraction, write-scope enforcement (reject writes outside the assigned issue/PRs), and signal_done queue relay.

Non-goals

Webhook signature verification (HMAC-SHA256). Added as a follow-up.
The bot-bottle-orchestrator binary itself — this PRD covers bot-bottle's side of the interface only. The orchestrator is a separate project.
GitHub or GitLab support.
Multiple simultaneous forge bottles per issue.
Automatic retry on agent error exit.
Bottle destruction on issue close (PR close only; issue close is ambiguous).
Concurrent multi-issue handling (one blocking run per orchestrator process).
A monitoring dashboard (orchestrator-side concern).
Folding DeployKeyProvisioner into the Forge abstraction. Deploy-key provisioning runs at bottle-provision time on the host; the forge sidecar runs inside the bottle at agent time. The two have different lifecycles and actors, so coupling them into one class is deferred to a follow-up. This PRD only shares the Gitea HTTP client between them.

Design

Targeting convention

An issue is forge-targeted when both hold:

At least one assignee is a member of the Gitea org named by FORGE_ORG (default bot-bottle). Checked via GET /api/v1/orgs/{org}/members/{user}.
At least one label has the prefix bot-bottle:. The suffix names the agent manifest, e.g. bot-bottle:implementer → agent implementer.

FORGE_ORG is read at orchestrate-command startup. It is not embedded in manifests or state files; the orchestrator stamps its value into log output for auditability.

An optional label bot-bottle-bottle:<name> overrides bottle selection. When absent the agent's default bottle is used.
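
The label half of the targeting convention can be sketched as a small pure function. This is a minimal illustration, not the shipped parser: the function names are hypothetical, the "first matching label wins" tie-break is an assumption, and the org-membership check is passed in as a callable so the sketch stays independent of the Gitea client.

```python
from __future__ import annotations

from typing import Callable, Iterable

AGENT_PREFIX = "bot-bottle:"          # suffix names the agent manifest
BOTTLE_PREFIX = "bot-bottle-bottle:"  # optional bottle override


def parse_forge_labels(labels: Iterable[str]) -> tuple[str | None, str | None]:
    """Return (agent_name, bottle_override) from an issue's label names.

    Assumption: the first matching label of each kind wins, and a label
    with an empty suffix (e.g. "bot-bottle:") is ignored.
    """
    agent: str | None = None
    bottle: str | None = None
    for label in labels:
        if agent is None and label.startswith(AGENT_PREFIX):
            agent = label[len(AGENT_PREFIX):] or None
        elif bottle is None and label.startswith(BOTTLE_PREFIX):
            bottle = label[len(BOTTLE_PREFIX):] or None
    return agent, bottle


def is_forge_targeted(
    labels: Iterable[str],
    assignees: Iterable[str],
    is_org_member: Callable[[str], bool],
) -> bool:
    """Both conditions must hold: an agent label and a member assignee."""
    agent, _ = parse_forge_labels(labels)
    return agent is not None and any(is_org_member(user) for user in assignees)
```

Note that `bot-bottle-bottle:claude` does not start with `bot-bottle:`, so the two prefixes cannot be confused with each other.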

`./cli.py orchestrate` — the thin wrapper

./cli.py orchestrate start  --agent AGENT [--bottle BOTTLE ...] --prompt PROMPT
                            [--label LABEL] [--backend BACKEND]
./cli.py orchestrate resume --slug SLUG --prompt PROMPT [--backend BACKEND]
./cli.py orchestrate status

orchestrate start is a thin shim over the already-shipped start --headless (#315): it forwards agent / bottle / label / prompt and adds the forge-specific wiring (forge_env, sidecar launch). It does not re-implement headless launch. The caller (bot-bottle-orchestrator) manages freeze, state, and the forge sidecar's done signal around it.

orchestrate resume is the shim over the new resume --headless (below).

orchestrate status prints the forge state table.
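
For illustration, the three subcommands could be declared with `argparse` along these lines. This is a sketch under assumptions: `build_orchestrate_parser` is a hypothetical name, and reading `[--bottle BOTTLE ...]` as "one or more values after a single flag" is a guess at the intended semantics.

```python
import argparse


def build_orchestrate_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the orchestrate usage lines above."""
    parser = argparse.ArgumentParser(prog="cli.py orchestrate")
    sub = parser.add_subparsers(dest="command", required=True)

    start = sub.add_parser("start", help="headless launch plus forge wiring")
    start.add_argument("--agent", required=True)
    # Assumption: one --bottle flag followed by one or more names.
    start.add_argument("--bottle", nargs="+", default=[])
    start.add_argument("--prompt", required=True)
    start.add_argument("--label")
    start.add_argument("--backend")

    resume = sub.add_parser("resume", help="rehydrate a frozen bottle")
    resume.add_argument("--slug", required=True)
    resume.add_argument("--prompt", required=True)
    resume.add_argument("--backend")

    sub.add_parser("status", help="print the forge state table")
    return parser
```

Keeping the parser this thin matches the intent that `orchestrate` only forwards arguments and adds forge wiring, leaving launch behaviour to `start --headless` and `resume --headless`.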

Headless primitives — what exists vs. what's new

Headless start already shipped in #315 and this PRD reuses it as-is:

./cli.py start <agent> --headless --prompt TEXT — no TUI selectors, no y/N preflight. Internally _start_headless() calls the shared _launch_bottle() with assume_yes=True and headless_prompt_text=prompt.
The prompt is delivered through AgentProvider.headless_prompt(prompt) — claude -p, codex positional, pi -p. The orchestrator does not hand-roll agent args; it relies on this provider abstraction. (An earlier draft proposed start_headless / attach_agent_headless helpers that constructed --no-interactive/-p directly — those are dropped as redundant with, and divergent from, what #315 merged.)

Two additions are needed on top of #315:

1. A forge_env hook on the headless launch path. The orchestrator needs to pass forge context + token through to the forge sidecar launched alongside the agent. This is a parameter threaded into _launch_bottle (the same core start --headless already uses), not a parallel launch function. The agent process itself does not receive the token.

2. resume --headless — new in bot_bottle/cli/resume.py, mirroring the --headless flag on start:

./cli.py resume <slug> --headless --prompt TEXT

It rehydrates a frozen bottle and runs one headless prompt via the same assume_yes + headless_prompt path, returning the agent's exit code. resume has no non-interactive entry point today, so this is genuinely new work rather than a rename of an existing helper.

Forge sidecar

Forge-targeted bottles run a forge sidecar alongside the agent, mirroring the supervise sidecar: a per-bottle process that exposes an HTTP/JSON-RPC endpoint over a Unix socket and relays events to the orchestrator through a queue dir. The agent calls the sidecar; the sidecar holds the forge token and makes the actual forge API calls. The agent never receives the credential and never sees a forge-specific endpoint — swapping Gitea for another forge does not change the agent prompt or the sidecar protocol.

The sidecar is configured at launch from the forge context (owner, repo, issue, PR) and the token, supplied by the orchestrator — not baked into the agent manifest. Because the sidecar owns the token, forge traffic does not need a cred-proxy egress route on the agent; the agent's egress policy is unchanged by forge targeting.

Sidecar protocol (forge-agnostic; each method maps to a Forge call):

| Method | Scope | Purpose |
|---|---|---|
| `read_issue(number)` | read-anywhere | Read issue/PR body for context |
| `read_comments(number)` | read-anywhere | Read a thread for context |
| `post_comment(number, body)` | write-scoped | Post to the assigned issue/PR |
| `update_description(number, body)` | write-scoped | Edit the assigned issue/PR body |
| `signal_done(status, summary)` | n/a | Relay completion to the orchestrator |

Scope enforcement is read-anywhere / write-scoped: read methods accept any issue/PR number for context; write methods are rejected unless the target is the assigned issue or one of its PRs. This is tighter than Gitea's repo-wide API-key permissions and bounds the blast radius of a prompt-injected agent. Rejections are logged semantically (operation, target, reason) so the audit trail records attempted out-of-scope writes, not just allowed ones.
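
A minimal sketch of the read-anywhere / write-scoped check, assuming the sidecar knows the assigned issue number and the set of PR numbers attached to it. `WriteScope` and `WRITE_METHODS` are hypothetical names introduced for this example.

```python
from __future__ import annotations

from dataclasses import dataclass

# The two write-scoped methods from the sidecar protocol table.
WRITE_METHODS = frozenset({"post_comment", "update_description"})


@dataclass(frozen=True)
class WriteScope:
    """The issue a bottle was assigned, plus any PRs opened for it."""

    issue_number: int
    pr_numbers: frozenset[int] = frozenset()

    def allows(self, method: str, target: int) -> bool:
        # Methods outside WRITE_METHODS (the reads) may target any number.
        if method not in WRITE_METHODS:
            return True
        # Writes must target the assigned issue or one of its PRs.
        return target == self.issue_number or target in self.pr_numbers
```

For example, a bottle assigned issue 317 with PR 318 may read issue 999 for context, but a `post_comment` aimed at 999 is refused.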

Semantic audit: every sidecar call is logged as a structured operation ("read PR #318 description", "posted comment to #317", "signalled done: success") rather than as opaque HTTP bytes. This log feeds provenance directly, with no post-hoc egress-log parsing.
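
One way such a structured operation log could look is a JSON line per sidecar call. The field names below (`at`, `operation`, `target`, `outcome`, `reason`) are assumptions made for this sketch; the PRD does not fix a log schema.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone


def audit_record(
    operation: str,
    target: int | None,
    outcome: str,
    reason: str = "",
) -> str:
    """Serialize one sidecar operation as a single JSON line.

    `outcome` is "allowed" or "rejected" in this sketch; `reason` is only
    included when given, e.g. for out-of-scope write attempts.
    """
    record: dict[str, object] = {
        "at": datetime.now(timezone.utc).isoformat(),
        "operation": operation,
        "target": target,
        "outcome": outcome,
    }
    if reason:
        record["reason"] = reason
    return json.dumps(record, sort_keys=True)
```

A rejected write would then be recorded with the same shape as an allowed one, which keeps attempted out-of-scope writes visible to whatever renders the provenance footer.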

`Forge` abstraction — `bot_bottle/contrib/forge/`

The sidecar dispatches to a Forge abstract class. Each provider implements the operations behind the sidecar protocol:

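from __future__ import annotations  # keeps `int | None` and `list[Comment]` as plain annotations

import abc

# Issue and Comment are the provider-neutral value types returned to the
# sidecar; their fields are not specified in this PRD.


class Forge(abc.ABC):
    # Operations behind the sidecar protocol methods of the same name.
    @abc.abstractmethod
    def read_issue(self, number: int) -> Issue: ...
    @abc.abstractmethod
    def read_comments(self, number: int) -> list[Comment]: ...
    @abc.abstractmethod
    def post_comment(self, number: int, body: str) -> None: ...
    @abc.abstractmethod
    def update_description(self, number: int, body: str) -> None: ...

    # Helpers that are not exposed through the sidecar protocol
    # (targeting and PR lifecycle checks).
    @abc.abstractmethod
    def is_org_member(self, org: str, username: str) -> bool: ...
    @abc.abstractmethod
    def get_pr_for_issue(self, number: int) -> int | None: ...
    @abc.abstractmethod
    def is_pr_open(self, number: int) -> bool: ...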

GiteaForge is the first and only concrete implementation in this PRD. It wraps the Gitea HTTP client (below). Adding GitHub or GitLab later is a new subclass; the sidecar, protocol, and agent prompt are untouched.

Deferred: DeployKeyProvisioner is not folded into Forge here. Deploy-key provisioning runs on the host at provision time; the sidecar runs in the bottle at agent time. They have different lifecycles and actors, so a shared abstract base would couple two unrelated auth contexts. For now they only share the Gitea HTTP client; a later PRD can revisit unification.

Forge env vars

The orchestrator passes forge context to the sidecar (not the agent) at launch. The agent does not need owner/repo/issue env vars to construct API calls, since it only names issue/PR numbers to the sidecar:

| Var | Example | Purpose |
|---|---|---|
| `FORGE_GITEA_API` | `https://gitea.dideric.is/api/v1` | Base URL the sidecar calls |
| `FORGE_OWNER` | `didericis` | Repo owner |
| `FORGE_REPO` | `bot-bottle` | Repo name |
| `FORGE_ISSUE_NUMBER` | `317` | Assigned issue (defines write scope) |
| `FORGE_PR_NUMBER` | `318` | Assigned PR (empty until PR exists) |

The agent's forge-specific prompt instructs it to call signal_done on the sidecar when a work unit is complete, and to use the sidecar for any comment/description writes. The instruction is forge-agnostic and is part of the forge prompt overlay, not the base agent manifest, so non-forge runs are unaffected.
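
On the sidecar side, the variables in the table above could be loaded into a typed context roughly as follows. `ForgeContext` and `from_env` are hypothetical names; the only behaviour taken from the table is that `FORGE_PR_NUMBER` may be empty until a PR exists.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ForgeContext:
    """Forge context handed to the sidecar at launch (never to the agent)."""

    api_url: str
    owner: str
    repo: str
    issue_number: int
    pr_number: int | None

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "ForgeContext":
        # An empty or missing FORGE_PR_NUMBER means no PR has been opened yet.
        pr = env.get("FORGE_PR_NUMBER", "")
        return cls(
            api_url=env["FORGE_GITEA_API"].rstrip("/"),
            owner=env["FORGE_OWNER"],
            repo=env["FORGE_REPO"],
            issue_number=int(env["FORGE_ISSUE_NUMBER"]),
            pr_number=int(pr) if pr else None,
        )
```

Passing a mapping rather than reading `os.environ` directly keeps the loader easy to unit test, which lines up with the "forge env var injection" test goal.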

Done signal and watchdog

The agent calls signal_done(status, summary) on the sidecar when it finishes a work unit. The sidecar writes the event to its queue dir; the orchestrator reads it and:

Reads the forge state for (owner, repo, issue_number).
If status == "running", treats the event as the done signal: freezes the bottle, posts a summary comment with the provenance footer, sets status = "frozen".

Because completion is an explicit signal_done call, the orchestrator does not parse comment text to detect "done", and intermediate comments the agent posts mid-run cannot be mistaken for completion.
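
The orchestrator-side handling described above can be sketched as a function over the forge state. The names are hypothetical, the freeze and comment actions are injected as callables, and ignoring a signal that arrives when the state is not `"running"` is an assumption (the PRD only specifies the `"running"` case).

```python
from __future__ import annotations

from typing import Callable


def handle_done_event(
    state: dict,
    event: dict,
    freeze: Callable[[str], None],
    post_summary: Callable[[str], None],
) -> dict:
    """Apply a signal_done event to a forge state record.

    Returns the (possibly updated) state; the input dict is not mutated.
    """
    if state["status"] != "running":
        # Assumption: stale or duplicate signals are dropped silently.
        return state
    freeze(state["slug"])            # freeze the bottle
    post_summary(event["summary"])   # comment body; footer is appended elsewhere
    return {**state, "status": "frozen"}
```

Because the transition is keyed on the stored status rather than on comment text, a second `signal_done` for an already frozen bottle has no effect in this sketch.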

Watchdog: the orchestrator tracks last_checkin_at in forge state, updated on each sidecar event. A background thread wakes every minute. If now - last_checkin_at > FORGE_WATCHDOG_TIMEOUT (default 30 min, configurable via env) and status == "running", the orchestrator treats the run as done-without-self-report: it posts the provenance footer (with watchdog_fired set) and freezes the bottle.
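
The expiry test itself is a single comparison. A minimal sketch, assuming `last_checkin_at` is stored as an ISO 8601 string with a UTC offset (as in the forge state example) and that `now` is timezone-aware; the function name and the `timedelta` default are illustrative, and the format of `FORGE_WATCHDOG_TIMEOUT` is not modelled here.

```python
from datetime import datetime, timedelta

DEFAULT_WATCHDOG_TIMEOUT = timedelta(minutes=30)


def watchdog_expired(
    status: str,
    last_checkin_at: str,
    now: datetime,
    timeout: timedelta = DEFAULT_WATCHDOG_TIMEOUT,
) -> bool:
    """True when a running bottle has been silent for longer than `timeout`."""
    if status != "running":
        return False  # frozen or destroyed bottles are never timed out
    # Both datetimes carry an offset, so the subtraction is well defined.
    return now - datetime.fromisoformat(last_checkin_at) > timeout
```

With the example state above, a check at 12:40 against a last check-in of 12:04:12 is about 36 minutes of silence, so the watchdog would fire.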

Sidecar-death failure mode: if the forge sidecar crashes mid-run the agent loses forge access while the bottle is otherwise healthy. The orchestrator detects a dead sidecar (socket/queue gone) the same way it detects a stalled agent and falls back to the watchdog path, posting a footer that flags the incomplete run.

Forge state — `bot_bottle/contrib/gitea/forge_state.py`

~/.bot-bottle/forge/
    <owner>/
        <repo>/
            issue-<n>.json

Schema:

{
  "slug": "implementer-abc12",
  "pr_number": 42,
  "agent_name": "implementer",
  "bottle_names": ["claude"],
  "backend_name": "docker",
  "agent_git_user": "didericis-claude",
  "issue_number": 17,
  "owner": "didericis",
  "repo": "bot-bottle",
  "status": "frozen",
  "last_checkin_at": "2026-06-29T12:04:12-04:00"
}

status: "running" | "frozen" | "destroyed".

Public API:

def write_forge_state(state: ForgeState) -> None: ...
def read_forge_state(owner: str, repo: str, issue_number: int) -> ForgeState | None: ...
def delete_forge_state(owner: str, repo: str, issue_number: int) -> None: ...
def all_forge_states() -> list[ForgeState]: ...

Writes use atomic rename (os.replace) for crash safety.
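
A sketch of the write-then-rename pattern, using plain dicts in place of `ForgeState` and taking the state root as a parameter so it can be pointed at a temporary directory in tests. The helper names are hypothetical; `os.replace` is the atomic step named above.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path


def state_path(root: Path, owner: str, repo: str, issue_number: int) -> Path:
    """<root>/<owner>/<repo>/issue-<n>.json, matching the layout above."""
    return root / owner / repo / f"issue-{issue_number}.json"


def write_state(root: Path, state: dict) -> Path:
    path = state_path(root, state["owner"], state["repo"], state["issue_number"])
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory so the rename stays on one
    # filesystem, then swap it into place in a single step.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return path


def read_state(root: Path, owner: str, repo: str, issue_number: int) -> dict | None:
    """Return the stored state, or None when no file exists for the issue."""
    try:
        text = state_path(root, owner, repo, issue_number).read_text(encoding="utf-8")
    except FileNotFoundError:
        return None
    return json.loads(text)
```

A reader that opens the file during a write sees either the old complete JSON or the new complete JSON, never a partial file, which is what lets state survive an orchestrator crash mid-write.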

Provenance — `bot_bottle/contrib/gitea/provenance.py`

from __future__ import annotations  # keeps `tuple[str, ...]` and `Path | None` as plain annotations

from pathlib import Path


def build_provenance_footer(
    slug: str,
    *,
    agent_name: str,
    bottle_names: tuple[str, ...],
    started_at: str,
    finished_at: str,
    exit_code: int,
    watchdog_fired: bool = False,
    egress_log_path: Path | None = None,
) -> str:
    """Return a markdown string for appending to a Gitea comment body."""

Output (collapsed by default):

<details><summary>🔬 Run provenance</summary>

| Field | Value |
|---|---|
| agent | `implementer` |
| bottle | `claude` |
| slug | `implementer-abc12` |
| started | 2026-06-29T12:00:00-04:00 |
| duration | 4m 12s |
| exit | 0 ✓ |
| gitleaks | ✓ no secrets detected |
| done signal | sidecar `signal_done` *(or: watchdog — agent did not signal)* |

**Egress** (deny-by-default; 2 routes allowed)
- `api.anthropic.com` — Bearer auth
- `pypi.org` — unauthenticated

Forge traffic is not an agent egress route — the forge sidecar holds the token
and makes those calls out of band. The provenance footer's forge operations come
from the sidecar's semantic audit log.

</details>

The egress summary is read from ~/.bot-bottle/state/<slug>/egress/. When unavailable the section is omitted. watchdog_fired=True changes the "done signal" row to warn reviewers.
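
Two of the footer rows can be derived mechanically from the function's inputs. The helpers below are an illustrative sketch with hypothetical names; the exact wording of the watchdog variant of the "done signal" row is an assumption based on the example output above.

```python
from datetime import datetime


def format_duration(started_at: str, finished_at: str) -> str:
    """Render the gap between two ISO 8601 timestamps, e.g. '4m 12s'."""
    delta = datetime.fromisoformat(finished_at) - datetime.fromisoformat(started_at)
    seconds = max(int(delta.total_seconds()), 0)
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours}h {minutes}m {secs}s"
    if minutes:
        return f"{minutes}m {secs}s"
    return f"{secs}s"


def done_signal_row(watchdog_fired: bool) -> str:
    """Return the 'done signal' table row for the footer."""
    if watchdog_fired:
        value = "watchdog (agent did not signal)"
    else:
        value = "sidecar `signal_done`"
    return f"| done signal | {value} |"
```

For the example run, a start of 12:00:00 and a finish of 12:04:12 renders as `4m 12s`.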

Gitea HTTP client — `bot_bottle/contrib/gitea/client.py`

GiteaForge (and the existing GiteaDeployKeyProvisioner) share one thin HTTP client. Unlike the option-2 design, the token is held by the sidecar process and passed to the client directly — there is no agent-side cred-proxy route to inject it, because the agent never makes forge calls.

class GiteaClient:
    def __init__(self, *, api_url: str, owner: str, repo: str, token: str) -> None: ...
    def is_org_member(self, org: str, username: str) -> bool: ...
    def post_comment(self, issue_number: int, body: str) -> None: ...
    def update_description(self, issue_number: int, body: str) -> None: ...
    def get_pr_for_issue(self, issue_number: int) -> int | None: ...
    def is_pr_open(self, pr_number: int) -> bool: ...

Sharing only the HTTP client (not an abstract base) is the deliberate boundary between the sidecar and the deploy-key provisioner — see the deferral note under the Forge abstraction.
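
As an illustration of the client's membership check, here is a standalone sketch built on `urllib` that treats a 404 as "not a member", as the test plan in the implementation chunks calls for. It is written as a free function rather than a `GiteaClient` method, the `urlopen` parameter is a hypothetical test seam, and the `Authorization: token ...` header is Gitea's token scheme.

```python
from __future__ import annotations

import urllib.error
import urllib.parse
import urllib.request
from typing import Callable


def is_org_member(
    api_url: str,
    org: str,
    username: str,
    token: str,
    urlopen: Callable = urllib.request.urlopen,
) -> bool:
    """Check org membership via GET /orgs/{org}/members/{user}."""
    url = "{}/orgs/{}/members/{}".format(
        api_url.rstrip("/"),
        urllib.parse.quote(org, safe=""),
        urllib.parse.quote(username, safe=""),
    )
    request = urllib.request.Request(url, headers={"Authorization": f"token {token}"})
    try:
        with urlopen(request) as response:
            # A 204 response means the user is a member.
            return response.status == 204
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # not a member (or unknown user/org)
        raise  # auth and server errors should surface, not read as "no"
```

Re-raising everything except 404 is a deliberate choice in this sketch: an expired token should fail loudly rather than silently disable targeting.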

Implementation chunks

Headless additions on top of #315 — thread a forge_env parameter into the existing _launch_bottle core (the one start --headless already uses); add a --headless path to cli/resume.py reusing assume_yes + headless_prompt. No new start_headless/attach_agent_headless helpers. Tests: forge_env reaches the sidecar/guest_env; resume --headless skips the TUI and y/N preflight and returns the agent exit code.
Forge state — contrib/gitea/forge_state.py: ForgeState dataclass, read/write/delete/all helpers, atomic rename. Tests: round-trip JSON, missing file → None, atomic write.
Forge abstraction + Gitea client — contrib/forge/base.py (Forge ABC) and contrib/gitea/client.py + GiteaForge: is_org_member, read_issue, read_comments, post_comment, update_description, get_pr_for_issue, is_pr_open. Tests: mock urllib.request.urlopen, assert payloads and 404-as-false for membership.
Forge sidecar — sidecar process exposing the protocol over a Unix socket, queue-dir relay, write-scope enforcement, semantic op log, signal_done. Reuses the supervise sidecar bundle machinery. Tests: dispatch each method to the Forge, reject out-of-scope writes, signal_done writes a queue event, scope-rejection is logged.
Provenance — contrib/gitea/provenance.py: build_provenance_footer. Tests: required fields present, watchdog row text, egress omitted when log absent.
./cli.py orchestrate — cli/orchestrate.py with start, resume, status subcommands wired into cli.py; start launches the forge sidecar alongside the agent for forge-targeted runs. Tests: arg parsing, start delegates to start --headless, resume delegates to resume --headless.

Provenance as the product

Every orchestrator-posted comment ends with the provenance footer; it is mandatory and cannot be configured off. PRs that land without a footer were not produced by this integration. The watchdog_fired flag in the footer marks runs where the agent did not self-report completion, so reviewers know the audit trail may be incomplete.

The footer links to the bot-bottle repo pinned to the commit SHA active during the run (not main), so the policy that governed the run is permanently anchored in the PR history.
