docs: update forge PRD — orchestrator split, done signal, org targeting, forge env vars

2026-06-29 12:39:02 -04:00
parent b607d68a0e
commit 1789deaf73
+196 -222
@@ -8,84 +8,112 @@
## Summary
Add a webhook-driven orchestration layer that lets Gitea issues and PR comments
drive bot-bottle sessions end-to-end with no operator in the loop for the happy
path. An issue assigned to a member of the configured agent org and labelled
with an agent name triggers a headless bottle launch; the bottle processes the
issue, opens a PR, and posts a done-comment via the Gitea API (through
cred-proxy) before exiting. The orchestrator detects the done-comment, freezes
the bottle, and attaches a provenance footer. Subsequent PR comments rehydrate
the frozen bottle. The bottle is destroyed when the PR closes.

Responsibilities are split across two layers. bot-bottle owns the headless
launch primitives, forge state, Gitea client, and provenance builder.
`bot-bottle-orchestrator` (a separate binary) owns the webhook listener, bottle
lifecycle loop, and monitoring dashboard; it calls into bot-bottle through
`./cli.py orchestrate`, a thin wrapper command. This PRD covers bot-bottle's
side of that contract.
## Problem
Today an operator must open the TUI, select an agent and bottle, confirm the
preflight, and type prompts interactively. This blocks "issue → PR" automation
and produces no durable audit record of what the agent did. The security model
already provides the right isolation and egress controls; the missing pieces are
the headless launch primitive that `bot-bottle-orchestrator` can call, the
in-bottle Gitea API access the agent uses to signal completion, and the
provenance trail that makes the audit story legible to reviewers on every PR.
## Goals / Success Criteria
1. `./cli.py orchestrate start` and `./cli.py orchestrate resume` are the
non-interactive counterparts to `start` and `resume`. They accept agent,
bottle, and prompt via flags rather than TUI pickers, and exit when the
agent process exits.
2. An issue assigned to a member of the configured org (`FORGE_ORG`, default
`bot-bottle`) and labelled `bot-bottle:<agent-name>` is the trigger
convention. Org membership is verified via the Gitea API at event time.
3. Forge-targeted bottles receive a set of env vars at launch
   (`FORGE_GITEA_API`, `FORGE_OWNER`, `FORGE_REPO`, `FORGE_ISSUE_NUMBER`, and
   `FORGE_PR_NUMBER` once a PR exists) so the agent knows where to post its
   done-comment without hardcoding forge context in the agent manifest.
4. The agent's egress policy for forge runs includes `gitea.<host>` with Bearer
auth injected by cred-proxy, enabling direct Gitea API calls from inside the
bottle.
5. The done-comment the agent posts is the done signal. If the agent has not
   checked in within a watchdog timeout (configurable, default 30 min), the
   orchestrator posts the done-comment on the agent's behalf.
6. Every orchestrator-posted comment ends with a provenance footer: agent name,
bottle name(s), slug, start time, duration, exit code, gitleaks result, and
egress summary.
7. Forge state (issue → slug, status) is persisted to disk and survives
orchestrator restarts.
8. `./cli.py orchestrate status` lists active forge-managed bottles and their
issue/PR URLs.
9. Unit tests cover: label parsing, org-membership check path, forge state
read/write, provenance footer rendering, headless launch arg construction,
forge env var injection, echo-loop guard.
## Non-goals
- Webhook signature verification (HMAC-SHA256); deferred to a follow-up.
- The `bot-bottle-orchestrator` binary itself — this PRD covers bot-bottle's
side of the interface only. The orchestrator is a separate project.
- GitHub or GitLab support.
- Multiple simultaneous forge bottles per issue.
- Automatic retry on agent error exit.
- Bottle destruction on issue close (PR close only; issue close is ambiguous).
- Concurrent multi-issue handling (one blocking run per orchestrator process).
- A monitoring dashboard (orchestrator-side concern).
## Design
### Targeting convention
An issue is forge-targeted when **both** hold:
- At least one assignee is a member of the Gitea org named by `FORGE_ORG`
(default `bot-bottle`). Checked via `GET /api/v1/orgs/{org}/members/{user}`.
- At least one label has the prefix `bot-bottle:`. The suffix names the agent
manifest, e.g. `bot-bottle:implementer` → agent `implementer`.

If the label suffix matches no known agent, the orchestrator posts an error
comment on the issue and does not launch a bottle.

`FORGE_ORG` is read at orchestrate-command startup. It is not embedded in
manifests or state files; the orchestrator stamps its value into log output for
auditability.

An optional label `bot-bottle-bottle:<name>` overrides bottle selection. When
absent, the agent's default bottle is used.
### `./cli.py orchestrate` — the thin wrapper
```
./cli.py orchestrate start --agent AGENT [--bottle BOTTLE ...] --prompt PROMPT
                           [--label LABEL] [--backend BACKEND]
./cli.py orchestrate resume --slug SLUG --prompt PROMPT [--backend BACKEND]
./cli.py orchestrate status
```
`orchestrate start` is `start_headless` exposed as a subcommand. It prepares
the bottle non-interactively, launches the agent in print mode, and exits
with the agent's exit code. The caller (`bot-bottle-orchestrator`) manages
freeze, state, and Gitea comments around it.
`orchestrate resume` is `resume_headless` exposed as a subcommand.
`orchestrate status` prints the forge state table.
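
Here is one way the three subcommands could be parsed with `argparse`. It is a sketch, not the planned `cli/orchestrate.py`. It assumes `start_headless` and `resume_headless` are injected as callables, which keeps the wrapper thin and testable. The `manifest` argument that `start_headless` presumably takes is assumed to be bound beforehand, for example with `functools.partial`. `build_parser`, `run`, the repeatable `--bottle` flag, and the empty default for `--label` are all assumptions.

```python
import argparse
from typing import Callable, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cli.py orchestrate")
    sub = parser.add_subparsers(dest="command", required=True)

    start = sub.add_parser("start", help="launch a bottle headlessly")
    start.add_argument("--agent", required=True)
    start.add_argument("--bottle", action="append", default=[], dest="bottles")
    start.add_argument("--prompt", required=True)
    start.add_argument("--label", default="")
    start.add_argument("--backend", default=None)

    resume = sub.add_parser("resume", help="rehydrate a frozen bottle")
    resume.add_argument("--slug", required=True)
    resume.add_argument("--prompt", required=True)
    resume.add_argument("--backend", default=None)

    sub.add_parser("status", help="print the forge state table")
    return parser


def run(
    argv: Sequence[str],
    *,
    start_headless: Callable,
    resume_headless: Callable,
    print_status: Callable[[], None],
) -> int:
    """Dispatch one subcommand and return the process exit code."""
    args = build_parser().parse_args(argv)
    if args.command == "start":
        _slug, exit_code = start_headless(
            agent_name=args.agent,
            bottle_names=tuple(args.bottles),
            label=args.label,
            prompt=args.prompt,
            backend_name=args.backend,
        )
        return exit_code  # exit with the agent's own exit code
    if args.command == "resume":
        return resume_headless(args.slug, prompt=args.prompt, backend_name=args.backend)
    print_status()
    return 0
```

A `cli.py` entry point would then call `sys.exit(run(sys.argv[2:], ...))` with the real functions bound.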
### Headless primitives
**`attach_agent_headless`** — new function in `bot_bottle/cli/start.py`:
```python
def attach_agent_headless(
@@ -96,11 +124,6 @@ def attach_agent_headless(
    agent_provider_template: str = "claude",
    startup_args: tuple[str, ...] = (),
) -> int:
    """Run the provider CLI inside bottle in non-interactive print mode.
    Blocks until the agent exits; returns the exit code. No tty.
    resume=True adds --continue so the agent resumes its last session
    before processing prompt."""
    runtime = runtime_for(agent_provider_template)
    agent_args = list(runtime.bypass_args)  # --dangerously-skip-permissions
    agent_args.extend(startup_args)
@@ -111,15 +134,8 @@ def attach_agent_headless(
    return bottle.exec_agent(agent_args, tty=False)
```
The system prompt from the agent's manifest `.md` file is still applied via
`--append-system-prompt-file` in `startup_args` (provisioned by
`ClaudeAgentProvider.provision_prompt`). The `-p` arg is the user-visible
prompt the issue or comment supplies.
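
The argument vector can be built by a pure helper, which is what the "correct arg order" tests would exercise. This minimal sketch assumes that the elided lines append `--continue` when `resume=True` and place `-p <prompt>` last. `build_headless_args` is a hypothetical helper, and it takes `bypass_args` explicitly instead of reading them from `runtime_for`.

```python
def build_headless_args(
    bypass_args: tuple[str, ...],
    startup_args: tuple[str, ...],
    prompt: str,
    *,
    resume: bool = False,
) -> list[str]:
    """Assemble provider CLI args for a non-interactive print-mode run."""
    args = list(bypass_args)       # e.g. --dangerously-skip-permissions
    args.extend(startup_args)      # e.g. system-prompt provisioning flags
    if resume:
        args.append("--continue")  # resume the last session before the prompt
    args.extend(["-p", prompt])    # print mode with the issue/comment text
    return args
```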

**`start_headless`** — new function in `bot_bottle/cli/start.py` that mirrors
`_launch_bottle` without any TUI steps:
```python
def start_headless(
@@ -129,42 +145,80 @@ def start_headless(
    bottle_names: tuple[str, ...],
    label: str,
    prompt: str,
    forge_env: dict[str, str] | None = None,
    backend_name: str | None = None,
) -> tuple[str, int]:
    """Non-interactive bottle launch. Returns (slug, exit_code)."""
```
`start_headless`:
1. Builds a `BottleSpec` with `copy_cwd=False`, `color=""`.
2. Calls `backend.prepare` directly (no preflight render, no y/N prompt).
3. Enters `backend.launch(plan)` and calls `attach_agent_headless(bottle, prompt=prompt)`.
4. Captures session state and returns `(slug, exit_code)`.

`forge_env` is merged into the bottle's `guest_env` so the agent receives the
forge context as env vars (see below). The caller freezes the bottle after
`start_headless` returns, via `get_freezer(backend_name).commit_slug(slug)`.

**`resume_headless`** — new function in `bot_bottle/cli/resume.py`:
```python
def resume_headless(slug: str, *, prompt: str, backend_name: str | None = None) -> int:
    """Rehydrate a frozen bottle and run one headless prompt. Returns exit_code."""
```
### Forge env vars
The orchestrator builds this dict and passes it to `start_headless` as
`forge_env`:

| Var | Example | Purpose |
|---|---|---|
| `FORGE_GITEA_API` | `https://gitea.dideric.is/api/v1` | Base URL for Gitea API calls |
| `FORGE_OWNER` | `didericis` | Repo owner |
| `FORGE_REPO` | `bot-bottle` | Repo name |
| `FORGE_ISSUE_NUMBER` | `317` | Issue that triggered the run |
| `FORGE_PR_NUMBER` | `318` | PR to comment on (empty until PR exists) |

A forge-specific addition to the agent's system prompt instructs it to post a
comment to
`$FORGE_GITEA_API/repos/$FORGE_OWNER/$FORGE_REPO/issues/$FORGE_ISSUE_NUMBER/comments`
when it finishes a work unit. Because the instruction lives in the
forge-specific prompt rather than the base agent manifest, non-forge runs are
unaffected.
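
From inside the bottle, the done-comment is an ordinary Gitea "create an issue comment" call: `POST /repos/{owner}/{repo}/issues/{index}/comments` with a JSON `body`. The sketch below shows the request the agent would send. It assumes Python's standard library is available in the bottle and builds the request without sending it. No `Authorization` header is set, on the assumption that cred-proxy injects it. `build_done_comment_request` is hypothetical.

```python
import json
import os
import urllib.request
from typing import Mapping


def build_done_comment_request(
    body: str, env: Mapping[str, str] = os.environ
) -> urllib.request.Request:
    """Build (but do not send) the done-comment POST from the FORGE_* env vars."""
    url = (
        f"{env['FORGE_GITEA_API']}/repos/{env['FORGE_OWNER']}/{env['FORGE_REPO']}"
        f"/issues/{env['FORGE_ISSUE_NUMBER']}/comments"
    )
    return urllib.request.Request(
        url,
        data=json.dumps({"body": body}).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # no auth: cred-proxy adds it
        method="POST",
    )
```

To actually post the comment, the agent would pass the request to `urllib.request.urlopen(...)`.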
### Gitea egress for forge-targeted bottles
Forge-targeted bottles get an additional egress route, injected by the
orchestrator at launch time so operators do not have to add it to every agent
manifest. It is passed as an extra `EgressRoute` in the `BottleSpec` (or,
alternatively, via the forge env and bottle manifest):
```yaml
host: gitea.dideric.is
auth:
  scheme: Bearer
  token_env: GITEA_TOKEN
```
The cred-proxy injects the token; the agent never sees the raw credential.
### Done signal and watchdog
The agent posts a Gitea comment when it finishes a work unit. The orchestrator
webhook listener receives the `issue_comment` event and:
1. Verifies the commenter is a member of `FORGE_ORG`.
2. Reads the forge state for `(owner, repo, issue_number)`.
3. If `status == "running"`, treats the comment as the done signal: freezes the
bottle, appends the provenance footer to the same comment thread, sets
`status = "frozen"`.

**Watchdog**: the orchestrator tracks `last_checkin_at` in forge state, and a
background thread wakes every minute. If
`now - last_checkin_at > FORGE_WATCHDOG_TIMEOUT` (default 30 min, configurable
via env) and `status == "running"`, the orchestrator posts the provenance footer
comment on the agent's behalf and freezes the bottle.
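
The watchdog's decision on each tick is a pure function of forge state and the clock, so it can be tested without a background thread. This sketch assumes `last_checkin_at` is stored as an ISO 8601 timestamp with an offset, as in the forge state schema example. It also assumes `FORGE_WATCHDOG_TIMEOUT` holds seconds; the unit is not specified in this PRD. `watchdog_timeout` and `watchdog_due` are hypothetical.

```python
import os
from datetime import datetime, timedelta


def watchdog_timeout() -> timedelta:
    """Read FORGE_WATCHDOG_TIMEOUT (assumed to be seconds); default 30 minutes."""
    return timedelta(seconds=int(os.environ.get("FORGE_WATCHDOG_TIMEOUT", "1800")))


def watchdog_due(state: dict, now: datetime, timeout: timedelta) -> bool:
    """True when a running bottle has not checked in within the timeout.

    `now` must be timezone-aware, since last_checkin_at carries an offset.
    """
    if state.get("status") != "running":
        return False  # frozen/destroyed bottles are never watchdogged
    last = datetime.fromisoformat(state["last_checkin_at"])
    return now - last > timeout
```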

Echo-loop guard: only a comment whose `comment.user.login` equals
`agent_git_user` (read from forge state) is treated as a done signal. Comments
from other `FORGE_ORG` members are dispatched as resume triggers, never as done
signals.
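
Taken together, the done-signal steps and the echo-loop guard amount to a small dispatch decision for each `issue_comment` event. This sketch follows the rules stated above. It assumes org membership has already been checked via the Gitea API and is passed in as a boolean. The `"ignore"` outcome (for non-members, unknown or destroyed issues, and the agent's own comments while frozen) and `classify_comment` itself are illustrative assumptions.

```python
from typing import Literal, Optional

Action = Literal["done", "resume", "ignore"]


def classify_comment(state: Optional[dict], commenter: str, is_org_member: bool) -> Action:
    """Decide how the orchestrator should treat one issue_comment event."""
    if not is_org_member or state is None or state.get("status") == "destroyed":
        return "ignore"
    if commenter == state["agent_git_user"]:
        # Only the agent's own comment can be a done signal, and it never
        # resumes the bottle, which is what prevents echo loops.
        return "done" if state["status"] == "running" else "ignore"
    return "resume"  # any other org member's comment is a resume trigger
```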
### Forge state — `bot_bottle/contrib/gitea/forge_state.py`
```
~/.bot-bottle/forge/
@@ -182,14 +236,16 @@ Schema:
"agent_name": "implementer",
"bottle_names": ["claude"],
"backend_name": "docker",
"agent_git_user": "didericis-claude",
"issue_number": 17,
"owner": "didericis",
"repo": "bot-bottle",
"status": "frozen",
"last_checkin_at": "2026-06-29T12:04:12-04:00"
}
```
`status`: `"running"` | `"frozen"` | `"destroyed"`.
Public API:
@@ -200,23 +256,26 @@ def delete_forge_state(owner: str, repo: str, issue_number: int) -> None: ...
def all_forge_states() -> list[ForgeState]: ...
```
Writes use atomic rename (`os.replace`) for crash safety.
### Provenance — `bot_bottle/contrib/gitea/provenance.py`
```python
def build_provenance_footer(
    slug: str,
    *,
    agent_name: str,
    bottle_names: tuple[str, ...],
    started_at: str,
    finished_at: str,
    exit_code: int,
    watchdog_fired: bool = False,
    egress_log_path: Path | None = None,
) -> str:
    """Return a markdown string for appending to a Gitea comment body."""
```
Output (collapsed by default):
```markdown
<details><summary>🔬 Run provenance</summary>
@@ -230,149 +289,64 @@ Output format (collapsed by default via `<details>`):
| duration | 4m 12s |
| exit | 0 ✓ |
| gitleaks | ✓ no secrets detected |
| done signal | agent comment *(or: watchdog — agent did not check in)* |

**Egress** (deny-by-default; 3 routes allowed)
- `api.anthropic.com` — Bearer auth
- `gitea.dideric.is`Bearer auth
- `pypi.org` — unauthenticated
</details>
```
The egress summary is read from `~/.bot-bottle/state/<slug>/egress/`. When
unavailable the section is omitted. `watchdog_fired=True` changes the
"done signal" row to warn reviewers.
### Gitea client — `bot_bottle/contrib/gitea/client.py`
```python
class GiteaClient:
    def __init__(self, *, api_url: str) -> None: ...
    def is_org_member(self, org: str, username: str) -> bool: ...
    def post_comment(self, owner: str, repo: str, issue_number: int, body: str) -> None: ...
    def get_pr_for_issue(self, owner: str, repo: str, issue_number: int) -> int | None:
        """Return the PR number whose body references issue_number, or None."""
    def is_pr_open(self, owner: str, repo: str, pr_number: int) -> bool: ...
```
Auth is not configured in the client — the egress layer injects the token on
the way out, matching the existing `GiteaDeployKeyProvisioner` pattern.
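
This stdlib-only sketch implements two of the client's methods. It targets Gitea's `GET /orgs/{org}/members/{username}` endpoint, which per Gitea's API docs answers 204 for a member and 404 otherwise, and `POST /repos/{owner}/{repo}/issues/{index}/comments`. `GiteaClientSketch` is not the planned class. Its `opener` parameter is a hypothetical seam for tests that defaults to `urllib.request.urlopen`. No token is attached because the egress layer injects it.

```python
import json
import urllib.error
import urllib.request
from typing import Callable


class GiteaClientSketch:
    def __init__(self, *, api_url: str, opener: Callable = urllib.request.urlopen) -> None:
        self.api_url = api_url.rstrip("/")
        self._open = opener  # injectable so tests can avoid the network

    def is_org_member(self, org: str, username: str) -> bool:
        req = urllib.request.Request(f"{self.api_url}/orgs/{org}/members/{username}")
        try:
            with self._open(req) as resp:
                return resp.status == 204  # 204 No Content: user is a member
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return False  # not a member
            raise

    def post_comment(self, owner: str, repo: str, issue_number: int, body: str) -> None:
        req = urllib.request.Request(
            f"{self.api_url}/repos/{owner}/{repo}/issues/{issue_number}/comments",
            data=json.dumps({"body": body}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with self._open(req):
            pass  # urlopen raises HTTPError on a non-2xx response
```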
### Implementation chunks

1. **Headless primitives** — `attach_agent_headless` + `start_headless` (with
   `forge_env` param) in `cli/start.py`; `resume_headless` in `cli/resume.py`.
   Tests: no tty, correct arg order, `forge_env` appears in `guest_env`.
2. **Forge state** — `contrib/gitea/forge_state.py`: `ForgeState` dataclass,
   read/write/delete/all helpers, atomic rename. Tests: round-trip JSON, missing
   file → None, atomic write.
3. **Gitea client** — `contrib/gitea/client.py`: `is_org_member`,
   `post_comment`, `get_pr_for_issue`, `is_pr_open`. Tests: mock
   `urllib.request.urlopen`, assert payloads and 404-as-false for membership.
4. **Provenance** — `contrib/gitea/provenance.py`: `build_provenance_footer`.
   Tests: required fields present, watchdog row text, egress omitted when log
   absent.
5. **`./cli.py orchestrate`** — `cli/orchestrate.py` with `start`, `resume`,
   `status` subcommands wired into `cli.py`. Tests: arg parsing, `start`
   delegates to `start_headless`, `resume` delegates to `resume_headless`.
## Provenance as the product
Every orchestrator-posted comment ends with the provenance footer, which is
non-optional and cannot be configured off. PRs that land without a footer were
not produced by this integration. When `watchdog_fired` is set, the footer's
"done signal" row marks the run as one where the agent did not self-report
completion, so reviewers know the audit trail may be incomplete.

The footer links to the bot-bottle repo pinned to the commit SHA active during
the run (not `main`), so the policy that governed the run is permanently
anchored in the PR history.
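
Pinning amounts to building the link from the commit SHA instead of a branch name. This sketch assumes a Gitea-hosted repo, where Gitea serves the tree at a fixed commit under `/src/commit/<sha>`, and takes the SHA from `git rev-parse HEAD` in the bot-bottle checkout. `current_sha`, `pinned_repo_link`, and the link text format are hypothetical.

```python
import re
import subprocess


def current_sha(repo_dir: str) -> str:
    """Return the full commit SHA checked out in repo_dir."""
    return subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout.strip()


def pinned_repo_link(repo_url: str, sha: str) -> str:
    """Markdown link to the repo tree at an exact commit, never a branch."""
    # Accept SHA-1 (40 hex) or SHA-256 (64 hex) object names only.
    if not re.fullmatch(r"(?:[0-9a-f]{40}|[0-9a-f]{64})", sha):
        raise ValueError(f"expected a full commit SHA, got {sha!r}")
    return f"[bot-bottle@{sha[:10]}]({repo_url.rstrip('/')}/src/commit/{sha})"
```

Rejecting anything that is not a full SHA keeps a branch name like `main` from slipping into the footer.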