Part I: Forge native integration: PRD + forge library layer #318

Open
didericis-claude wants to merge 11 commits from forge-native-integration into main
Showing only changes of commit ebad90bfa9
+200 -82
@@ -11,17 +11,29 @@ Add a webhook-driven orchestration layer that lets Gitea issues and PR comments
drive bot-bottle sessions end-to-end with no operator in the loop for the happy drive bot-bottle sessions end-to-end with no operator in the loop for the happy
path. An issue assigned to a member of the configured agent org and labelled path. An issue assigned to a member of the configured agent org and labelled
with an agent name triggers a headless bottle launch; the bottle processes the with an agent name triggers a headless bottle launch; the bottle processes the
issue, opens a PR, and posts a done-comment via the Gitea API (through issue, opens a PR, and interacts with the forge through a **forge sidecar**;
cred-proxy) before exiting. The orchestrator detects the done-comment, freezes the agent never touches the Gitea API or its credentials directly. The agent
the bottle, and attaches a provenance footer. Subsequent PR comments rehydrate calls `signal_done(status, summary)` on the sidecar when a work unit is
the frozen bottle. The bottle is destroyed when the PR closes. complete; the sidecar relays that to the orchestrator over a queue dir (the same
pattern as the supervise sidecar), so completion is an unambiguous in-band
signal rather than a comment the orchestrator has to parse. The orchestrator
freezes the bottle and attaches a provenance footer. Subsequent PR comments
rehydrate the frozen bottle. The bottle is destroyed when the PR closes.
The forge sidecar is backed by a `Forge` abstract class with per-provider
implementations (Gitea first), so the agent's prompts and the sidecar protocol
stay forge-agnostic. The sidecar logs forge operations semantically ("read PR
description", "posted comment", "signalled done"), giving richer provenance than
post-hoc egress-byte parsing, and enforces a **read-anywhere / write-scoped**
permission model: the agent may read for context but may only write to the
issue and PRs it was assigned.
The separation of concerns across the two layers: bot-bottle owns the headless The separation of concerns across the two layers: bot-bottle owns the headless
launch primitives, forge state, Gitea client, and provenance builder. launch primitives, the forge sidecar + `Forge` abstraction, forge state, and the
`bot-bottle-orchestrator` (separate binary) owns the webhook listener, bottle provenance builder. `bot-bottle-orchestrator` (separate binary) owns the webhook
lifecycle loop, and monitoring dashboard; it calls into bot-bottle via listener, bottle lifecycle loop, and monitoring dashboard; it calls into
`./cli.py orchestrate`, a thin wrapper command. This PRD covers bot-bottle's bot-bottle via `./cli.py orchestrate`, a thin wrapper command. This PRD covers
side of that contract. bot-bottle's side of that contract.
## Problem ## Problem
@@ -29,9 +41,22 @@ Today an operator must open the TUI, select an agent and bottle, confirm the
preflight, and type prompts interactively. This blocks "issue → PR" automation preflight, and type prompts interactively. This blocks "issue → PR" automation
and produces no durable audit record of what the agent did. The security model and produces no durable audit record of what the agent did. The security model
already provides the right isolation and egress controls; the missing pieces are already provides the right isolation and egress controls; the missing pieces are
the headless launch primitive that `bot-bottle-orchestrator` can call, the the headless launch primitive that `bot-bottle-orchestrator` can call, a
in-bottle Gitea API access the agent uses to signal completion, and the forge-interaction surface the agent uses to read context, post comments, and
provenance trail that makes the audit story legible to reviewers on every PR. signal completion, and the provenance trail that makes the audit story legible
to reviewers on every PR.
That forge-interaction surface could be built two ways: (2) give the agent the
Gitea API directly with cred-proxy injecting the token, or (3) put a forge
sidecar between the agent and the forge. This PRD takes **option 3**. The
deciding factors: a sidecar `signal_done` call is an unambiguous completion
signal where comment-parsing is a correctness risk that surfaces in production;
the sidecar produces a semantic audit trail rather than HTTP bytes, which is
load-bearing for provenance (the stated product priority); and the sidecar can
enforce scope tighter than repo-wide API-key permissions, reducing blast radius
for a prompt-injected agent. The costs — a second sidecar process per forge run,
a new failure mode if it crashes, and per-forge implementation cost — are
accepted as the price of those properties.
## Goals / Success Criteria ## Goals / Success Criteria
@@ -42,16 +67,20 @@ provenance trail that makes the audit story legible to reviewers on every PR.
2. An issue assigned to a member of the configured org (`FORGE_ORG`, default 2. An issue assigned to a member of the configured org (`FORGE_ORG`, default
`bot-bottle`) and labelled `bot-bottle:<agent-name>` is the trigger `bot-bottle`) and labelled `bot-bottle:<agent-name>` is the trigger
convention. Org membership is verified via the Gitea API at event time. convention. Org membership is verified via the Gitea API at event time.
3. Forge-targeted bottles receive a set of env vars at launch 3. Forge-targeted bottles run a **forge sidecar** that exposes a small,
(`FORGE_GITEA_API`, `FORGE_OWNER`, `FORGE_REPO`, `FORGE_ISSUE_NUMBER`) so forge-agnostic API (comment/issue/PR CRUD plus `signal_done`) over the same
the agent knows where to post its done-comment without hardcoding forge queue-dir + HTTP/JSON-RPC machinery as the supervise sidecar. The agent calls
context in the agent manifest. the sidecar; it never sees the forge token or forge-specific endpoints.
4. The agent's egress policy for forge runs includes `gitea.<host>` with Bearer 4. The sidecar is backed by a `Forge` abstract class. Gitea is the first
auth injected by cred-proxy, enabling direct Gitea API calls from inside the concrete implementation; adding a forge means a new subclass, not changes to
bottle. the agent prompt or sidecar protocol. The sidecar enforces a read-anywhere /
5. The done-comment the agent posts is the done signal. A watchdog timeout write-scoped model: writes are limited to the assigned issue and its PRs;
(configurable, default 30 min) causes the orchestrator to post the reads are unrestricted for context.
done-comment on the agent's behalf if the agent exits without posting one. 5. The agent calls `signal_done(status, summary)` on the sidecar when a work
unit is complete; the sidecar relays it to the orchestrator over a queue dir.
This is the done signal — no comment parsing. A watchdog timeout
(configurable, default 30 min) causes the orchestrator to treat the run as
done-without-self-report if the agent exits without signalling.
6. Every orchestrator-posted comment ends with a provenance footer: agent name, 6. Every orchestrator-posted comment ends with a provenance footer: agent name,
bottle name(s), slug, start time, duration, exit code, gitleaks result, and bottle name(s), slug, start time, duration, exit code, gitleaks result, and
egress summary. egress summary.
@@ -61,7 +90,9 @@ provenance trail that makes the audit story legible to reviewers on every PR.
issue/PR URLs. issue/PR URLs.
9. Unit tests cover: label parsing, org-membership check path, forge state 9. Unit tests cover: label parsing, org-membership check path, forge state
read/write, provenance footer rendering, headless launch arg construction, read/write, provenance footer rendering, headless launch arg construction,
forge env var injection, echo-loop guard. forge env var injection, sidecar request dispatch through the `Forge`
abstraction, write-scope enforcement (reject writes outside the assigned
issue/PRs), and `signal_done` queue relay.
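
The trigger convention in goal 2 (a `bot-bottle:<agent-name>` label) can be sketched as a small parser. This is an illustration only, not the PR's implementation: the function name `parse_agent_label` and the set of characters allowed in an agent name are assumptions, since the PRD does not define either.

```python
from __future__ import annotations

import re

# Assumed agent-name alphabet; the PRD does not specify one.
_LABEL_RE = re.compile(r"^bot-bottle:([A-Za-z0-9][A-Za-z0-9_.-]*)$")


def parse_agent_label(labels: list[str]) -> str | None:
    """Return the agent name from the first `bot-bottle:<agent-name>` label."""
    for label in labels:
        match = _LABEL_RE.match(label.strip())
        if match:
            return match.group(1)
    # No label matched, so the issue is not a trigger.
    return None
```

Returning `None` for a bare `bot-bottle:` label keeps a half-typed label from launching a bottle with an empty agent name.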
## Non-goals ## Non-goals
@@ -74,6 +105,11 @@ provenance trail that makes the audit story legible to reviewers on every PR.
- Bottle destruction on issue close (PR close only; issue close is ambiguous). - Bottle destruction on issue close (PR close only; issue close is ambiguous).
- Concurrent multi-issue handling (one blocking run per orchestrator process). - Concurrent multi-issue handling (one blocking run per orchestrator process).
- A monitoring dashboard (orchestrator-side concern). - A monitoring dashboard (orchestrator-side concern).
- Folding `DeployKeyProvisioner` into the `Forge` abstraction. Deploy-key
provisioning runs at bottle-provision time on the host; the forge sidecar runs
inside the bottle at agent time. The two have different lifecycles and actors,
so coupling them into one class is deferred to a follow-up. This PRD only
shares the Gitea HTTP client between them.
## Design ## Design
@@ -151,9 +187,9 @@ def start_headless(
"""Non-interactive bottle launch. Returns (slug, exit_code).""" """Non-interactive bottle launch. Returns (slug, exit_code)."""
``` ```
`forge_env` is merged into the bottle's `guest_env` so the agent receives the `forge_env` carries the forge context and token to the forge sidecar launched
forge context as env vars (see below). The caller freezes the bottle after alongside the agent (see below); the agent process itself does not receive the
`start_headless` returns. token. The caller freezes the bottle after `start_headless` returns.
**`resume_headless`** — new function in `bot_bottle/cli/resume.py`: **`resume_headless`** — new function in `bot_bottle/cli/resume.py`:
@@ -162,61 +198,124 @@ def resume_headless(slug: str, *, prompt: str, backend_name: str | None = None)
"""Rehydrate a frozen bottle and run one headless prompt. Returns exit_code.""" """Rehydrate a frozen bottle and run one headless prompt. Returns exit_code."""
``` ```
### Forge sidecar
Forge-targeted bottles run a forge sidecar alongside the agent, mirroring the
supervise sidecar: a per-bottle process that exposes an HTTP/JSON-RPC endpoint
over a Unix socket and relays events to the orchestrator through a queue dir.
The agent calls the sidecar; the sidecar holds the forge token and makes the
actual forge API calls. The agent never receives the credential and never sees a
forge-specific endpoint — swapping Gitea for another forge does not change the
agent prompt or the sidecar protocol.
The sidecar is configured at launch from the forge context (owner, repo, issue,
PR) and the token, supplied by the orchestrator — not baked into the agent
manifest. Because the sidecar owns the token, forge traffic does not need a
cred-proxy egress route on the agent; the agent's egress policy is unchanged by
forge targeting.
**Sidecar protocol** (forge-agnostic; each method maps to a `Forge` call):
| Method | Scope | Purpose |
|---|---|---|
| `read_issue(number)` | read-anywhere | Read issue/PR body for context |
| `read_comments(number)` | read-anywhere | Read a thread for context |
| `post_comment(number, body)` | write-scoped | Post to the assigned issue/PR |
| `update_description(number, body)` | write-scoped | Edit the assigned issue/PR body |
| `signal_done(status, summary)` | — | Relay completion to the orchestrator |
**Scope enforcement** is read-anywhere / write-scoped: read methods accept any
issue/PR number for context; write methods are rejected unless the target is the
assigned issue or one of its PRs. This is tighter than Gitea's repo-wide API-key
permissions and bounds the blast radius of a prompt-injected agent. Rejections
are logged semantically (operation, target, reason) so the audit trail records
attempted out-of-scope writes, not just allowed ones.
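
A minimal sketch of the read-anywhere / write-scoped check, assuming the sidecar knows the assigned issue number and the PR numbers attached to it. `WriteScope`, `ScopeError`, and `check_scope` are hypothetical names for illustration; only the method names come from the protocol table.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class ScopeError(PermissionError):
    """Raised when a write targets something outside the assigned scope."""


@dataclass
class WriteScope:
    issue_number: int
    pr_numbers: set[int] = field(default_factory=set)

    def allows(self, number: int) -> bool:
        return number == self.issue_number or number in self.pr_numbers


# Write methods from the sidecar protocol table; everything else is a read.
WRITE_METHODS = {"post_comment", "update_description"}


def check_scope(scope: WriteScope, method: str, number: int) -> None:
    """Reads pass for any number; writes must target the assigned issue/PRs."""
    if method in WRITE_METHODS and not scope.allows(number):
        raise ScopeError(f"{method} on #{number} is outside the assigned scope")
```

With `WriteScope(317, {318})`, `check_scope(scope, "read_issue", 1)` passes while `check_scope(scope, "post_comment", 99)` raises, which is the point at which a rejection would be logged.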
**Semantic audit**: every sidecar call is logged as a structured operation
("read PR #318 description", "posted comment to #317", "signalled done:
success") rather than as opaque HTTP bytes. This log feeds provenance directly,
with no post-hoc egress-log parsing.
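
One way to picture a structured operation record is a JSON line per sidecar call. The field names below (`ts`, `operation`, `target`, `outcome`, `reason`) are assumptions chosen to mirror the "operation, target, reason" wording above; the PRD does not fix a log schema.

```python
from __future__ import annotations

import json
import time


def audit_record(operation: str, target: int | None, outcome: str,
                 reason: str | None = None) -> str:
    """Render one sidecar call as a single JSON line."""
    record = {
        "ts": time.time(),
        "operation": operation,
        "target": target,
        "outcome": outcome,
    }
    # Only rejected or failed calls carry a reason.
    if reason is not None:
        record["reason"] = reason
    return json.dumps(record, sort_keys=True)
```

A rejected write would then be recorded with something like `audit_record("post_comment", 99, "rejected", "out of scope")`, so the attempt stays visible in provenance.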
### `Forge` abstraction — `bot_bottle/contrib/forge/`
The sidecar dispatches to a `Forge` abstract class. Each provider implements the
operations behind the sidecar protocol:
```python
class Forge(abc.ABC):
@abc.abstractmethod
def read_issue(self, number: int) -> Issue: ...
@abc.abstractmethod
def read_comments(self, number: int) -> list[Comment]: ...
@abc.abstractmethod
def post_comment(self, number: int, body: str) -> None: ...
@abc.abstractmethod
def update_description(self, number: int, body: str) -> None: ...
@abc.abstractmethod
def is_org_member(self, org: str, username: str) -> bool: ...
@abc.abstractmethod
def get_pr_for_issue(self, number: int) -> int | None: ...
@abc.abstractmethod
def is_pr_open(self, number: int) -> bool: ...
```
`GiteaForge` is the first and only concrete implementation in this PRD. It wraps
the Gitea HTTP client (below). Adding GitHub or GitLab later is a new subclass;
the sidecar, protocol, and agent prompt are untouched.
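
The dispatch step can be sketched as a lookup from protocol method name to a method on a `Forge`-like object. This is an assumption-laden illustration: `dispatch`, `FORGE_METHODS`, and `InMemoryForge` are hypothetical, the stand-in provider implements only two operations, and `signal_done` is left out because it relays to the queue dir rather than calling the forge.

```python
from __future__ import annotations

from typing import Any

# Agent-facing methods that map one-to-one onto a Forge operation.
FORGE_METHODS = {"read_issue", "read_comments", "post_comment", "update_description"}


def dispatch(forge: Any, method: str, params: dict[str, Any]) -> Any:
    """Route one sidecar request to the matching method on a Forge-like object."""
    if method not in FORGE_METHODS:
        raise ValueError(f"unknown sidecar method: {method}")
    return getattr(forge, method)(**params)


class InMemoryForge:
    """Stand-in provider for tests; implements only what this sketch calls."""

    def __init__(self) -> None:
        self.comments: dict[int, list[str]] = {}

    def post_comment(self, number: int, body: str) -> None:
        self.comments.setdefault(number, []).append(body)

    def read_comments(self, number: int) -> list[str]:
        return list(self.comments.get(number, []))
```

Because the allow-list is explicit, orchestrator-only operations such as `is_org_member` are not reachable from the agent even though they live on the same class.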
> **Deferred:** `DeployKeyProvisioner` is *not* folded into `Forge` here.
> Deploy-key provisioning runs on the host at provision time; the sidecar runs
> in the bottle at agent time. They have different lifecycles and actors, so a
> shared abstract base would couple two unrelated auth contexts. For now they
> only share the Gitea HTTP client; a later PRD can revisit unification.
### Forge env vars ### Forge env vars
The orchestrator builds this dict and passes it to `start_headless` as The orchestrator passes forge context to the **sidecar** (not the agent) at
`forge_env`: launch. The agent does not need owner/repo/issue env vars to construct API
calls, since it only names issue/PR numbers to the sidecar:
| Var | Example | Purpose | | Var | Example | Purpose |
|---|---|---| |---|---|---|
| `FORGE_GITEA_API` | `https://gitea.dideric.is/api/v1` | Base URL for Gitea API calls | | `FORGE_GITEA_API` | `https://gitea.dideric.is/api/v1` | Base URL the sidecar calls |
| `FORGE_OWNER` | `didericis` | Repo owner | | `FORGE_OWNER` | `didericis` | Repo owner |
| `FORGE_REPO` | `bot-bottle` | Repo name | | `FORGE_REPO` | `bot-bottle` | Repo name |
| `FORGE_ISSUE_NUMBER` | `317` | Issue that triggered the run | | `FORGE_ISSUE_NUMBER` | `317` | Assigned issue (defines write scope) |
| `FORGE_PR_NUMBER` | `318` | PR to comment on (empty until PR exists) | | `FORGE_PR_NUMBER` | `318` | Assigned PR (empty until PR exists) |
The agent's system prompt (from the manifest) instructs it to post a comment to The agent's forge-specific prompt instructs it to call `signal_done` on the
`$FORGE_GITEA_API/repos/$FORGE_OWNER/$FORGE_REPO/issues/$FORGE_ISSUE_NUMBER/comments` sidecar when a work unit is complete, and to use the sidecar for any
when it finishes a work unit. The instruction is part of the forge-specific comment/description writes. The instruction is forge-agnostic and is part of the
agent prompt, not the base agent manifest, so non-forge runs are unaffected. forge prompt overlay, not the base agent manifest, so non-forge runs are
unaffected.
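
As a sketch of how the sidecar might read its context from the variables in the table, assuming they arrive in its process environment: `ForgeContext` and `load_forge_context` are hypothetical names, and the token is deliberately omitted because the PRD does not name the variable that carries it.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class ForgeContext:
    api_url: str
    owner: str
    repo: str
    issue_number: int
    pr_number: int | None  # None until the PR exists


def load_forge_context(env: Mapping[str, str] = os.environ) -> ForgeContext:
    """Build the sidecar's forge context from the FORGE_* variables."""
    pr_raw = env.get("FORGE_PR_NUMBER", "")
    return ForgeContext(
        api_url=env["FORGE_GITEA_API"].rstrip("/"),
        owner=env["FORGE_OWNER"],
        repo=env["FORGE_REPO"],
        issue_number=int(env["FORGE_ISSUE_NUMBER"]),
        # An empty value means the PR has not been opened yet.
        pr_number=int(pr_raw) if pr_raw else None,
    )
```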
### Gitea egress for forge-targeted bottles
Forge-targeted bottles get an additional egress route injected by the
orchestrator at launch time. This is passed as an extra `EgressRoute` in the
`BottleSpec` (or via the forge env and bottle manifest) rather than requiring
operators to add it to every agent manifest:
```yaml
host: gitea.dideric.is
auth:
scheme: Bearer
token_env: GITEA_TOKEN
```
The cred-proxy injects the token; the agent never sees the raw credential.
### Done signal and watchdog ### Done signal and watchdog
The agent posts a Gitea comment when it finishes a work unit. The orchestrator The agent calls `signal_done(status, summary)` on the sidecar when it finishes a
webhook listener receives the `issue_comment` event and: work unit. The sidecar writes the event to its queue dir; the orchestrator reads
it and:
1. Verifies the commenter is a member of `FORGE_ORG`. 1. Reads the forge state for `(owner, repo, issue_number)`.
2. Reads the forge state for `(owner, repo, issue_number)`. 2. If `status == "running"`, treats the event as the done signal: freezes the
3. If `status == "running"`, treats the comment as the done signal: freezes the bottle, posts a summary comment with the provenance footer, sets
bottle, appends the provenance footer to the same comment thread, sets
`status = "frozen"`. `status = "frozen"`.
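
The queue-dir relay can be sketched as one JSON file per event, written with an atomic rename so the orchestrator never reads a half-written file. The file naming, event fields, and function names here are assumptions; the supervise sidecar's actual queue format is not shown in this PRD.

```python
from __future__ import annotations

import json
import os
import tempfile
import time
import uuid
from pathlib import Path


def write_done_event(queue_dir: Path, status: str, summary: str) -> Path:
    """Write a signal_done event into the queue dir with an atomic rename."""
    queue_dir.mkdir(parents=True, exist_ok=True)
    event = {"type": "signal_done", "status": status, "summary": summary,
             "ts": time.time()}
    # Write to a temp file in the same directory, then rename it into place.
    fd, tmp_name = tempfile.mkstemp(dir=queue_dir, prefix=".tmp-")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(event, handle)
    final = queue_dir / f"{time.time_ns()}-{uuid.uuid4().hex}.json"
    os.replace(tmp_name, final)
    return final


def read_events(queue_dir: Path) -> list[dict]:
    """Return queued events in filename order; temp files lack the .json suffix."""
    return [json.loads(p.read_text(encoding="utf-8"))
            for p in sorted(queue_dir.glob("*.json"))]
```

The temp file and the final file share a directory so that `os.replace` stays on one filesystem, which is what makes the rename atomic on POSIX.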
**Watchdog**: the orchestrator tracks `last_checkin_at` in forge state. A Because completion is an explicit `signal_done` call, the orchestrator does not
background thread wakes every minute. If `now - last_checkin_at > FORGE_WATCHDOG_TIMEOUT` parse comment text to detect "done", and intermediate comments the agent posts
(default 30 min, configurable via env) and `status == "running"`, the mid-run cannot be mistaken for completion.
orchestrator posts the provenance footer comment on behalf of the agent and
freezes the bottle.
Echo-loop guard: comments from members of `FORGE_ORG` that are not the **Watchdog**: the orchestrator tracks `last_checkin_at` in forge state, updated
currently-running slug's agent user are still dispatched as resume triggers, not on each sidecar event. A background thread wakes every minute. If
as done signals. The comment-is-done-signal path checks that `now - last_checkin_at > FORGE_WATCHDOG_TIMEOUT` (default 30 min, configurable
`comment.user.login == agent_git_user` (read from forge state). via env) and `status == "running"`, the orchestrator treats the run as
done-without-self-report: it posts the provenance footer (with `watchdog_fired`
set) and freezes the bottle.
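
The watchdog condition reduces to a small pure function, which makes it easy to unit test without a background thread. This sketch assumes `FORGE_WATCHDOG_TIMEOUT` is expressed in seconds and that `last_checkin_at` is a Unix timestamp; the PRD states neither, and the function names are hypothetical.

```python
from __future__ import annotations

import os

DEFAULT_TIMEOUT_S = 30 * 60  # 30 min default from the PRD


def watchdog_timeout_s(env=os.environ) -> float:
    """Timeout in seconds; assumes FORGE_WATCHDOG_TIMEOUT is given in seconds."""
    return float(env.get("FORGE_WATCHDOG_TIMEOUT", DEFAULT_TIMEOUT_S))


def watchdog_should_fire(status: str, last_checkin_at: float, now: float,
                         timeout_s: float) -> bool:
    """True when a running bottle has been silent for longer than the timeout."""
    return status == "running" and (now - last_checkin_at) > timeout_s
```

The once-a-minute thread would call `watchdog_should_fire` with the current time and, on `True`, take the done-without-self-report path.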
**Sidecar-death failure mode**: if the forge sidecar crashes mid-run the agent
loses forge access while the bottle is otherwise healthy. The orchestrator
detects a dead sidecar (socket/queue gone) the same way it detects a stalled
agent and falls back to the watchdog path, posting a footer that flags the
incomplete run.
### Forge state — `bot_bottle/contrib/gitea/forge_state.py` ### Forge state — `bot_bottle/contrib/gitea/forge_state.py`
@@ -289,13 +388,16 @@ Output (collapsed by default):
| duration | 4m 12s | | duration | 4m 12s |
| exit | 0 ✓ | | exit | 0 ✓ |
| gitleaks | ✓ no secrets detected | | gitleaks | ✓ no secrets detected |
| done signal | agent comment *(or: watchdog — agent did not check in)* | | done signal | sidecar `signal_done` *(or: watchdog — agent did not signal)* |
**Egress** (deny-by-default; 3 routes allowed) **Egress** (deny-by-default; 2 routes allowed)
- `api.anthropic.com` — Bearer auth - `api.anthropic.com` — Bearer auth
- `gitea.dideric.is` — Bearer auth
- `pypi.org` — unauthenticated - `pypi.org` — unauthenticated
Forge traffic is not an agent egress route — the forge sidecar holds the token
and makes those calls out of band. The provenance footer's forge operations come
from the sidecar's semantic audit log.
</details> </details>
``` ```
@@ -303,19 +405,26 @@ The egress summary is read from `~/.bot-bottle/state/<slug>/egress/`. When
unavailable the section is omitted. `watchdog_fired=True` changes the unavailable the section is omitted. `watchdog_fired=True` changes the
"done signal" row to warn reviewers. "done signal" row to warn reviewers.
### Gitea client — `bot_bottle/contrib/gitea/client.py` ### Gitea HTTP client — `bot_bottle/contrib/gitea/client.py`
`GiteaForge` (and the existing `GiteaDeployKeyProvisioner`) share one thin HTTP
client. Unlike the option-2 design, the token is held by the sidecar process and
passed to the client directly — there is no agent-side cred-proxy route to
inject it, because the agent never makes forge calls.
```python ```python
class GiteaClient: class GiteaClient:
def __init__(self, *, api_url: str) -> None: ... def __init__(self, *, api_url: str, owner: str, repo: str, token: str) -> None: ...
def is_org_member(self, org: str, username: str) -> bool: ... def is_org_member(self, org: str, username: str) -> bool: ...
def post_comment(self, owner: str, repo: str, issue_number: int, body: str) -> None: ... def post_comment(self, issue_number: int, body: str) -> None: ...
def get_pr_for_issue(self, owner: str, repo: str, issue_number: int) -> int | None: ... def update_comment_body(self, issue_number: int, body: str) -> None: ...
def is_pr_open(self, owner: str, repo: str, pr_number: int) -> bool: ... def get_pr_for_issue(self, issue_number: int) -> int | None: ...
def is_pr_open(self, pr_number: int) -> bool: ...
``` ```
Auth is not configured in the client — the egress layer injects the token on Sharing only the HTTP client (not an abstract base) is the deliberate boundary
the way out, matching the existing `GiteaDeployKeyProvisioner` pattern. between the sidecar and the deploy-key provisioner — see the deferral note under
the `Forge` abstraction.
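
A sketch of the "404-as-false" membership check using only the standard library. It assumes Gitea's `GET /orgs/{org}/members/{username}` endpoint and a `token` authorization header. The standalone function shape and the injectable `opener` parameter are illustrative choices, not the `GiteaClient` method in this PR.

```python
from __future__ import annotations

import urllib.error
import urllib.parse
import urllib.request


def is_org_member(api_url: str, token: str, org: str, username: str,
                  opener=urllib.request.urlopen) -> bool:
    """Check membership; a 404 means 'not a member' rather than an error."""
    path = (f"/orgs/{urllib.parse.quote(org, safe='')}"
            f"/members/{urllib.parse.quote(username, safe='')}")
    request = urllib.request.Request(
        api_url.rstrip("/") + path,
        headers={"Authorization": f"token {token}"},
        method="GET",
    )
    try:
        with opener(request) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return False
        # Any other HTTP error is a real failure and should surface.
        raise
```

Passing `opener` in lets a test substitute a fake for `urllib.request.urlopen` and assert on the request it receives.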
### Implementation chunks ### Implementation chunks
@@ -327,16 +436,25 @@ the way out, matching the existing `GiteaDeployKeyProvisioner` pattern.
read/write/delete/all helpers, atomic rename. Tests: round-trip JSON, missing read/write/delete/all helpers, atomic rename. Tests: round-trip JSON, missing
file → None, atomic write. file → None, atomic write.
3. **Gitea client** — `contrib/gitea/client.py`: `is_org_member`, 3. **`Forge` abstraction + Gitea client** — `contrib/forge/base.py` (`Forge`
`post_comment`, `get_pr_for_issue`, `is_pr_open`. Tests: mock ABC) and `contrib/gitea/client.py` + `GiteaForge`: `is_org_member`,
`urllib.request.urlopen`, assert payloads and 404-as-false for membership. `read_issue`, `read_comments`, `post_comment`, `update_description`,
`get_pr_for_issue`, `is_pr_open`. Tests: mock `urllib.request.urlopen`,
assert payloads and 404-as-false for membership.
4. **Provenance** — `contrib/gitea/provenance.py`: `build_provenance_footer`. 4. **Forge sidecar** — sidecar process exposing the protocol over a Unix socket,
queue-dir relay, write-scope enforcement, semantic op log, `signal_done`.
Reuses the supervise sidecar bundle machinery. Tests: dispatch each method to
the `Forge`, reject out-of-scope writes, `signal_done` writes a queue event,
scope-rejection is logged.
Review

Reword this: we will have a provenance api, but we won’t surface it in the pr
5. **Provenance** — `contrib/gitea/provenance.py`: `build_provenance_footer`.
Tests: required fields present, watchdog row text, egress omitted when log Tests: required fields present, watchdog row text, egress omitted when log
absent. absent.
5. **`./cli.py orchestrate`** — `cli/orchestrate.py` with `start`, `resume`, 6. **`./cli.py orchestrate`** — `cli/orchestrate.py` with `start`, `resume`,
`status` subcommands wired into `cli.py`. Tests: arg parsing, `start` `status` subcommands wired into `cli.py`; `start` launches the forge sidecar
alongside the agent for forge-targeted runs. Tests: arg parsing, `start`
delegates to `start_headless`, `resume` delegates to `resume_headless`. delegates to `start_headless`, `resume` delegates to `resume_headless`.
## Provenance as the product ## Provenance as the product