refactor(forge): address PR #318 review — PR/Issue split, sqlite state, drop footer

Addresses the five review comments on PR #318:

- Split PullRequest from Issue and add a dedicated read_pr method on
  Forge/ScopedForge/GiteaForge (a PR carries merge state an issue does not);
  is_pr_open now derives from read_pr.
- Replace the JSON-file forge state with a thin swappable CRUD interface
  (ForgeStateStore) backed by SQLite (SqliteForgeStateStore) at
  ~/.bot-bottle/bot-bottle.db.
- Remove the provenance footer (provenance.py + its test): a mutable, unsigned
  PR comment is not an audit record.
- Reword the PRD: provenance is exposed via an API, not surfaced in the PR;
  document the Issue/PullRequest split and the SQLite store.

pyright clean (whole repo), pylint 10/10, 38 forge/resume unit tests pass; no
remaining refs to the removed provenance module or old JSON state API.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WL77TgFxKbs3cidGMG9dz7
2026-07-01 08:37:25 -04:00
parent f211ece6bf
commit 42004d37fd
9 changed files with 332 additions and 432 deletions
@@ -17,8 +17,8 @@ calls `signal_done(status, summary)` on the sidecar when a work unit is
 complete; the sidecar relays that to the orchestrator over a queue dir (the same
 pattern as the supervise sidecar), so completion is an unambiguous in-band
 signal rather than a comment the orchestrator has to parse. The orchestrator
-freezes the bottle and attaches a provenance footer. Subsequent PR comments
-rehydrate the frozen bottle. The bottle is destroyed when the PR closes.
+freezes the bottle. Subsequent PR comments rehydrate the frozen bottle. The
+bottle is destroyed when the PR closes.

 The forge sidecar is backed by a `Forge` abstract class with per-provider
 implementations (Gitea first), so the agent's prompts and the sidecar protocol
@@ -28,12 +28,17 @@ post-hoc egress-byte parsing, and enforces a **read-anywhere / write-scoped**
 permission model: the agent may read for context but may only write to the
 issue and PRs it was assigned.

+Run provenance is exposed through a **provenance API** (the sidecar's structured
+operation log plus the run's metadata), not posted back into the forge. We do
+not surface a provenance footer in the PR — the audit record lives behind the
+API where it can be retained and queried, rather than as an editable comment.
+
 The separation of concerns across the two layers: bot-bottle owns the headless
-launch primitives, the forge sidecar + `Forge` abstraction, forge state, and the
-provenance builder. `bot-bottle-orchestrator` (separate binary) owns the webhook
-listener, bottle lifecycle loop, and monitoring dashboard; it calls into
-bot-bottle via `./cli.py orchestrate`, a thin wrapper command. This PRD covers
-bot-bottle's side of that contract.
+launch primitives, the forge sidecar + `Forge` abstraction, and forge state.
+`bot-bottle-orchestrator` (separate binary) owns the webhook listener, bottle
+lifecycle loop, and monitoring dashboard; it calls into bot-bottle via
+`./cli.py orchestrate`, a thin wrapper command. This PRD covers bot-bottle's
+side of that contract.

 ## Problem

@@ -84,18 +89,19 @@ accepted as the price of those properties.
   This is the done signal — no comment parsing. A watchdog timeout
   (configurable, default 30 min) causes the orchestrator to treat the run as
   done-without-self-report if the agent exits without signalling.
-6. Every orchestrator-posted comment ends with a provenance footer: agent name,
-   bottle name(s), slug, start time, duration, exit code, gitleaks result, and
-   egress summary.
-7. Forge state (issue → slug, status) is persisted to disk and survives
-   orchestrator restarts.
+6. Run provenance (agent name, bottle name(s), slug, timing, exit code,
+   gitleaks result, egress summary, and the sidecar's semantic operation log)
+   is available through a provenance API. It is **not** surfaced as a PR footer
+   or any other forge comment.
+7. Forge state (issue → slug, status) is persisted in a local SQLite database
+   under `~/.bot-bottle/` and survives orchestrator restarts.
 8. `./cli.py orchestrate status` lists active forge-managed bottles and their
   issue/PR URLs.
 9. Unit tests cover: label parsing, org-membership check path, forge state
-   read/write, provenance footer rendering, headless launch arg construction,
-   forge env var injection, sidecar request dispatch through the `Forge`
-   abstraction, write-scope enforcement (reject writes outside the assigned
-   issue/PRs), and `signal_done` queue relay.
+   store CRUD (SQLite), headless launch arg construction, forge env var
+   injection, sidecar request dispatch through the `Forge` abstraction,
+   write-scope enforcement (reject writes outside the assigned issue/PRs), and
+   `signal_done` queue relay.

 ## Non-goals

@@ -205,12 +211,16 @@ forge targeting.

 | Method | Scope | Purpose |
 |---|---|---|
-| `read_issue(number)` | read-anywhere | Read issue/PR body for context |
+| `read_issue(number)` | read-anywhere | Read an issue body for context |
+| `read_pr(number)` | read-anywhere | Read a PR (incl. merge state) for context |
 | `read_comments(number)` | read-anywhere | Read a thread for context |
 | `post_comment(number, body)` | write-scoped | Post to the assigned issue/PR |
 | `update_description(number, body)` | write-scoped | Edit the assigned issue/PR body |
 | `signal_done(status, summary)` | — | Relay completion to the orchestrator |

+Issues and PRs are distinct domain objects (`Issue` vs `PullRequest`) read
+through distinct methods; a PR carries merge state an issue does not.
+
 **Scope enforcement** is read-anywhere / write-scoped: read methods accept any
 issue/PR number for context; write methods are rejected unless the target is the
 assigned issue or one of its PRs. This is tighter than Gitea's repo-wide API-key
@@ -233,6 +243,8 @@ class Forge(abc.ABC):
    @abc.abstractmethod
    def read_issue(self, number: int) -> Issue: ...
    @abc.abstractmethod
+    def read_pr(self, number: int) -> PullRequest: ...
+    @abc.abstractmethod
    def read_comments(self, number: int) -> list[Comment]: ...
    @abc.abstractmethod
    def post_comment(self, number: int, body: str) -> None: ...
@@ -246,6 +258,11 @@ class Forge(abc.ABC):
    def is_pr_open(self, number: int) -> bool: ...
 ```

+`Issue` and `PullRequest` are separate frozen dataclasses — a PR adds `merged`.
+`ScopedForge` wraps a concrete `Forge` to enforce the read-anywhere /
+write-scoped model (`post_comment` / `update_description` raise `ForgeScopeError`
+outside the assigned issue and PRs).
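
The write-scope wrapper described above could look roughly like the sketch below. It is illustrative only: the `ScopedForge` constructor arguments, the `body` field on `Issue` / `PullRequest`, the trimmed-down `Forge` surface, and the `InMemoryForge` stand-in are assumptions made for the example, not the code in this commit.

```python
from __future__ import annotations

import abc
from dataclasses import dataclass


class ForgeScopeError(Exception):
    """Raised when a write targets something outside the assigned scope."""


@dataclass(frozen=True)
class Issue:
    number: int
    body: str  # assumed field, for illustration


@dataclass(frozen=True)
class PullRequest:
    number: int
    body: str  # assumed field, for illustration
    merged: bool  # the merge state an Issue does not carry


class Forge(abc.ABC):
    """Trimmed-down surface: two reads and one write are enough to show scoping."""

    @abc.abstractmethod
    def read_issue(self, number: int) -> Issue: ...
    @abc.abstractmethod
    def read_pr(self, number: int) -> PullRequest: ...
    @abc.abstractmethod
    def post_comment(self, number: int, body: str) -> None: ...


class ScopedForge(Forge):
    """Wraps a concrete Forge: reads pass through, writes are scope-checked."""

    def __init__(self, inner: Forge, issue_number: int, pr_numbers: set[int]) -> None:
        # Hypothetical constructor shape; the real one may differ.
        self._inner = inner
        self._writable = {issue_number} | set(pr_numbers)

    def read_issue(self, number: int) -> Issue:
        return self._inner.read_issue(number)  # read-anywhere

    def read_pr(self, number: int) -> PullRequest:
        return self._inner.read_pr(number)  # read-anywhere

    def post_comment(self, number: int, body: str) -> None:
        if number not in self._writable:  # write-scoped
            raise ForgeScopeError(f"write to #{number} is outside the assigned scope")
        self._inner.post_comment(number, body)


class InMemoryForge(Forge):
    """Hypothetical stand-in for GiteaForge so the sketch runs offline."""

    def __init__(self) -> None:
        self.comments: list[tuple[int, str]] = []

    def read_issue(self, number: int) -> Issue:
        return Issue(number=number, body="")

    def read_pr(self, number: int) -> PullRequest:
        return PullRequest(number=number, body="", merged=False)

    def post_comment(self, number: int, body: str) -> None:
        self.comments.append((number, body))
```

With `ScopedForge(InMemoryForge(), issue_number=17, pr_numbers={42})`, reading issue 99 succeeds while posting a comment to 99 raises `ForgeScopeError`. Keeping the check in a wrapper means a new provider subclass does not have to reimplement it.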
+
 `GiteaForge` is the first and only concrete implementation in this PRD. It wraps
 the Gitea HTTP client (below). Adding GitHub or GitLab later is a new subclass;
 the sidecar, protocol, and agent prompt are untouched.
@@ -284,8 +301,8 @@ it and:

 1. Reads the forge state for `(owner, repo, issue_number)`.
 2. If `status == "running"`, treats the event as the done signal: freezes the
-   bottle, posts a summary comment with the provenance footer, sets
-   `status = "frozen"`.
+   bottle and sets `status = "frozen"`. Provenance is recorded via the
+   provenance API — no comment is posted to the forge.

 Because completion is an explicit `signal_done` call, the orchestrator does not
 parse comment text to detect "done", and intermediate comments the agent posts
@@ -295,102 +312,61 @@ mid-run cannot be mistaken for completion.
 on each sidecar event. A background thread wakes every minute. If
 `now - last_checkin_at > FORGE_WATCHDOG_TIMEOUT` (default 30 min, configurable
 via env) and `status == "running"`, the orchestrator treats the run as
-done-without-self-report: it posts the provenance footer (with `watchdog_fired`
-set) and freezes the bottle.
+done-without-self-report and freezes the bottle, flagging the run as incomplete
+in the provenance record.
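
The watchdog condition above can be written as a small pure function. This is a sketch only: the function name `watchdog_should_fire` and the use of timezone-aware `datetime` values are assumptions, and the real check runs inside the orchestrator's background thread.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Default from the text above; the orchestrator reads the real value from env.
FORGE_WATCHDOG_TIMEOUT = timedelta(minutes=30)


def watchdog_should_fire(
    status: str,
    last_checkin_at: datetime,
    now: datetime,
    timeout: timedelta = FORGE_WATCHDOG_TIMEOUT,
) -> bool:
    """Return True when a running bottle has not checked in within the timeout."""
    # Only running bottles can time out; frozen or destroyed ones are ignored.
    return status == "running" and (now - last_checkin_at) > timeout
```

Keeping the predicate free of I/O lets it be unit-tested without a live sidecar or a sleeping thread. The caller supplies `now` and the stored `last_checkin_at`.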

 **Sidecar-death failure mode**: if the forge sidecar crashes mid-run the agent
 loses forge access while the bottle is otherwise healthy. The orchestrator
 detects a dead sidecar (socket/queue gone) the same way it detects a stalled
-agent and falls back to the watchdog path, posting a footer that flags the
-incomplete run.
+agent and falls back to the watchdog path.

 ### Forge state — `bot_bottle/contrib/gitea/forge_state.py`

-```
-~/.bot-bottle/forge/
-    <owner>/
-        <repo>/
-            issue-<n>.json
-```
+State is stored in a local SQLite database at `~/.bot-bottle/bot-bottle.db`.
+Access goes through a thin CRUD interface, `ForgeStateStore`, so the storage
+location/engine can be swapped without touching callers. `SqliteForgeStateStore`
+is the first implementation.

-Schema:
-
-```json
-{
-  "slug": "implementer-abc12",
-  "pr_number": 42,
-  "agent_name": "implementer",
-  "bottle_names": ["claude"],
-  "backend_name": "docker",
-  "agent_git_user": "didericis-claude",
-  "issue_number": 17,
-  "owner": "didericis",
-  "repo": "bot-bottle",
-  "status": "frozen",
-  "last_checkin_at": "2026-06-29T12:04:12-04:00"
-}
-```
+The `forge_state` table is keyed by `(owner, repo, issue_number)` and carries:
+`slug`, `agent_name`, `bottle_names` (JSON), `backend_name`, `agent_git_user`,
+`pr_number` (nullable), `status`, `last_checkin_at`.

 `status`: `"running"` | `"frozen"` | `"destroyed"`.

-Public API:
+Store interface:

 ```python
-def write_forge_state(state: ForgeState) -> None: ...
-def read_forge_state(owner: str, repo: str, issue_number: int) -> ForgeState | None: ...
-def delete_forge_state(owner: str, repo: str, issue_number: int) -> None: ...
-def all_forge_states() -> list[ForgeState]: ...
+class ForgeStateStore(abc.ABC):
+    def upsert(self, state: ForgeState) -> None: ...
+    def get(self, owner: str, repo: str, issue_number: int) -> ForgeState | None: ...
+    def delete(self, owner: str, repo: str, issue_number: int) -> None: ...
+    def all(self) -> list[ForgeState]: ...
+
+class SqliteForgeStateStore(ForgeStateStore):
+    def __init__(self, db_path: Path | None = None) -> None: ...
 ```

-Writes use atomic rename (`os.replace`) for crash safety.
+`upsert` uses `INSERT OR REPLACE` so a re-run for the same issue overwrites in
+place. The schema is created on first open.
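
One way the SQLite-backed store could be shaped is sketched below. The column types, the `close()` helper, and the `ForgeState` field order are assumptions for illustration; only the key columns, the field names, the JSON-encoded `bottle_names`, and the `INSERT OR REPLACE` behaviour come from the description above. The abstract base is omitted to keep the sketch short.

```python
from __future__ import annotations

import json
import sqlite3
from dataclasses import dataclass
from pathlib import Path

_COLUMNS = (
    "owner", "repo", "issue_number", "slug", "agent_name", "bottle_names",
    "backend_name", "agent_git_user", "pr_number", "status", "last_checkin_at",
)

_SCHEMA = """
CREATE TABLE IF NOT EXISTS forge_state (
    owner TEXT NOT NULL, repo TEXT NOT NULL, issue_number INTEGER NOT NULL,
    slug TEXT NOT NULL, agent_name TEXT NOT NULL, bottle_names TEXT NOT NULL,
    backend_name TEXT NOT NULL, agent_git_user TEXT NOT NULL,
    pr_number INTEGER, status TEXT NOT NULL, last_checkin_at TEXT NOT NULL,
    PRIMARY KEY (owner, repo, issue_number)
)
"""


@dataclass(frozen=True)
class ForgeState:
    owner: str
    repo: str
    issue_number: int
    slug: str
    agent_name: str
    bottle_names: tuple[str, ...]
    backend_name: str
    agent_git_user: str
    pr_number: int | None
    status: str
    last_checkin_at: str


class SqliteForgeStateStore:
    def __init__(self, db_path: Path | None = None) -> None:
        path = db_path if db_path is not None else Path.home() / ".bot-bottle" / "bot-bottle.db"
        path.parent.mkdir(parents=True, exist_ok=True)
        self._conn = sqlite3.connect(str(path))
        with self._conn:  # schema is created on first open
            self._conn.execute(_SCHEMA)

    def upsert(self, state: ForgeState) -> None:
        row = [getattr(state, col) for col in _COLUMNS]
        row[_COLUMNS.index("bottle_names")] = json.dumps(list(state.bottle_names))
        marks = ", ".join("?" for _ in _COLUMNS)
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                f"INSERT OR REPLACE INTO forge_state ({', '.join(_COLUMNS)}) VALUES ({marks})",
                row,
            )

    def get(self, owner: str, repo: str, issue_number: int) -> ForgeState | None:
        row = self._conn.execute(
            f"SELECT {', '.join(_COLUMNS)} FROM forge_state "
            "WHERE owner = ? AND repo = ? AND issue_number = ?",
            (owner, repo, issue_number),
        ).fetchone()
        return None if row is None else self._from_row(row)

    def delete(self, owner: str, repo: str, issue_number: int) -> None:
        with self._conn:  # deleting a missing row is a no-op
            self._conn.execute(
                "DELETE FROM forge_state WHERE owner = ? AND repo = ? AND issue_number = ?",
                (owner, repo, issue_number),
            )

    def all(self) -> list[ForgeState]:
        rows = self._conn.execute(
            f"SELECT {', '.join(_COLUMNS)} FROM forge_state ORDER BY owner, repo, issue_number"
        ).fetchall()
        return [self._from_row(r) for r in rows]

    def close(self) -> None:  # hypothetical helper, not part of the PRD interface
        self._conn.close()

    @staticmethod
    def _from_row(row: tuple) -> ForgeState:
        data = dict(zip(_COLUMNS, row))
        data["bottle_names"] = tuple(json.loads(data["bottle_names"]))
        return ForgeState(**data)
```

Because the primary key is `(owner, repo, issue_number)`, a second `upsert` for the same issue replaces the existing row rather than adding one, which is the overwrite-in-place behaviour described above.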

-### Provenance — `bot_bottle/contrib/gitea/provenance.py`
+### Provenance API

-```python
-def build_provenance_footer(
-    slug: str,
-    *,
-    agent_name: str,
-    bottle_names: tuple[str, ...],
-    started_at: str,
-    finished_at: str,
-    exit_code: int,
-    watchdog_fired: bool = False,
-    egress_log_path: Path | None = None,
-) -> str:
-    """Return a markdown string for appending to a Gitea comment body."""
-```
+Run provenance — agent, bottle(s), slug, timing, exit code, gitleaks result,
+egress summary, watchdog-fired flag, and the sidecar's semantic operation log —
+is exposed through a **provenance API**, not posted into the forge. There is no
+provenance footer or run-summary comment.

-Output (collapsed by default):
+The rationale (per the monetization positioning): a PR comment is mutable by any
+maintainer, unsigned, and per-PR, so it is worthless as an audit record and
+invites false trust. The authoritative record therefore lives behind the API,
+where it can be retained, queried, and (eventually) signed. Whether any
+projection of it ever appears in the forge is a separate, out-of-scope decision;
+this PR does not build one.

-```markdown
-<details><summary>🔬 Run provenance</summary>
-
-| Field | Value |
-|---|---|
-| agent | `implementer` |
-| bottle | `claude` |
-| slug | `implementer-abc12` |
-| started | 2026-06-29T12:00:00-04:00 |
-| duration | 4m 12s |
-| exit | 0 ✓ |
-| gitleaks | ✓ no secrets detected |
-| done signal | sidecar `signal_done` *(or: watchdog — agent did not signal)* |
-
-**Egress** (deny-by-default; 2 routes allowed)
- `api.anthropic.com` — Bearer auth
- `pypi.org` — unauthenticated
-
-Forge traffic is not an agent egress route — the forge sidecar holds the token
-and makes those calls out of band. The provenance footer's forge operations come
-from the sidecar's semantic audit log.
-
-</details>
-```
-
-The egress summary is read from `~/.bot-bottle/state/<slug>/egress/`. When
-unavailable the section is omitted. `watchdog_fired=True` changes the
-"done signal" row to warn reviewers.
+The API surface itself (schema, transport, signing, retention) is **out of scope
+for this PRD** and belongs with the orchestrator / control-plane work. bot-bottle
+here only produces the raw material: the sidecar's semantic operation log and the
+run metadata the orchestrator collects.

 ### Gitea HTTP client — `bot_bottle/contrib/gitea/client.py`

@@ -403,15 +379,17 @@ inject it, because the agent never makes forge calls.
 class GiteaClient:
    def __init__(self, *, api_url: str, owner: str, repo: str, token: str) -> None: ...
    def is_org_member(self, org: str, username: str) -> bool: ...
-    def post_comment(self, issue_number: int, body: str) -> None: ...
-    def update_comment_body(self, issue_number: int, body: str) -> None: ...
-    def get_pr_for_issue(self, issue_number: int) -> int | None: ...
-    def is_pr_open(self, pr_number: int) -> bool: ...
+    def get_issue(self, number: int) -> dict: ...
+    def get_comments(self, number: int) -> list[dict]: ...
+    def post_comment(self, number: int, body: str) -> None: ...
+    def patch_issue_body(self, number: int, body: str) -> None: ...
+    def get_pull(self, number: int) -> dict: ...
 ```

-Sharing only the HTTP client (not an abstract base) is the deliberate boundary
-between the sidecar and the deploy-key provisioner — see the deferral note under
-the `Forge` abstraction.
+`GiteaForge` adapts this client to the `Forge` surface (mapping raw JSON to
+`Issue` / `PullRequest` / `Comment`). Sharing only the HTTP client (not an
+abstract base) is the deliberate boundary between the sidecar and the deploy-key
+provisioner — see the deferral note under the `Forge` abstraction.
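
As a rough illustration of that adapter step, the sketch below maps a raw pull-request dict to a `PullRequest` and derives `is_pr_open` from `read_pr`. The `state` field on `PullRequest`, the JSON keys read from the dict (`number`, `body`, `state`, `merged`), and the `FakeGiteaClient` stand-in are assumptions for the example; check them against the real client responses.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    number: int
    body: str
    state: str  # assumed field, e.g. "open" or "closed"
    merged: bool


class FakeGiteaClient:
    """Hypothetical stand-in for GiteaClient; returns canned JSON-like dicts."""

    def __init__(self, pulls: dict[int, dict]) -> None:
        self._pulls = pulls

    def get_pull(self, number: int) -> dict:
        return self._pulls[number]


class GiteaForge:
    """Only the PR-reading part of the adapter is shown."""

    def __init__(self, client: FakeGiteaClient) -> None:
        self._client = client

    def read_pr(self, number: int) -> PullRequest:
        raw = self._client.get_pull(number)
        # Map the raw dict to the domain object; key names are assumptions.
        return PullRequest(
            number=raw["number"],
            body=raw.get("body") or "",
            state=raw["state"],
            merged=bool(raw.get("merged", False)),
        )

    def is_pr_open(self, number: int) -> bool:
        # Derived from read_pr instead of a separate client call.
        return self.read_pr(number).state == "open"
```

Deriving `is_pr_open` from `read_pr` keeps a single place where raw forge JSON is interpreted, so a future GitHub or GitLab subclass only has to supply its own mapping.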

 ### Implementation chunks

@@ -423,14 +401,17 @@ the `Forge` abstraction.
   the TUI and y/N preflight and returns the agent exit code.

 2. **Forge state** — `contrib/gitea/forge_state.py`: `ForgeState` dataclass,
-   read/write/delete/all helpers, atomic rename. Tests: round-trip JSON, missing
-   file → None, atomic write.
+   `ForgeStateStore` CRUD interface, `SqliteForgeStateStore`. Tests: round-trip,
+   missing → None, `INSERT OR REPLACE` upsert, delete idempotent, `all()`
+   ordering, persistence across store instances.

 3. **`Forge` abstraction + Gitea client** — `contrib/forge/base.py` (`Forge`
-   ABC) and `contrib/gitea/client.py` + `GiteaForge`: `is_org_member`,
-   `read_issue`, `read_comments`, `post_comment`, `update_description`,
+   ABC, `ScopedForge`, `Issue` / `PullRequest` / `Comment`) and
+   `contrib/gitea/client.py` + `GiteaForge`: `is_org_member`, `read_issue`,
+   `read_pr`, `read_comments`, `post_comment`, `update_description`,
   `get_pr_for_issue`, `is_pr_open`. Tests: mock `urllib.request.urlopen`,
-   assert payloads and 404-as-false for membership.
+   assert payloads and 404-as-false for membership; `ScopedForge` write-scope
+   enforcement.

 4. **Forge sidecar** — sidecar process exposing the protocol over a Unix socket,
   queue-dir relay, write-scope enforcement, semantic op log, `signal_done`.
@@ -438,23 +419,21 @@ the `Forge` abstraction.
   the `Forge`, reject out-of-scope writes, `signal_done` writes a queue event,
   scope-rejection is logged.

-5. **Provenance** — `contrib/gitea/provenance.py`: `build_provenance_footer`.
-   Tests: required fields present, watchdog row text, egress omitted when log
-   absent.
-
-6. **`./cli.py orchestrate`** — `cli/orchestrate.py` with `start`, `resume`,
+5. **`./cli.py orchestrate`** — `cli/orchestrate.py` with `start`, `resume`,
   `status` subcommands wired into `cli.py`; `start` launches the forge sidecar
   alongside the agent for forge-targeted runs. Tests: arg parsing, `start`
   delegates to `start --headless`, `resume` delegates to `resume --headless`.

-## Provenance as the product
+## Provenance

-Every orchestrator-posted comment ends with the provenance footer — non-optional
-and not configurable off. PRs that land without a footer were not produced by
-this integration. The `watchdog_fired` flag in the footer flags runs where the
-agent did not self-report completion, so reviewers know the audit trail may be
+Run provenance is captured (sidecar semantic operation log + run metadata) and
+exposed through a provenance API. It is deliberately **not** surfaced in the
+forge — no footer, no run-summary comment. A mutable, unsigned PR comment is not
+an audit record; the authoritative record lives behind the API where it can be
+retained and signed. The `watchdog_fired` flag marks runs where the agent did
+not self-report completion so consumers of the API know the record may be
 incomplete.

-The footer links to the bot-bottle repo pinned to the commit SHA active during
-the run (not `main`), so the policy that governed the run is permanently
-anchored in the PR history.
+The provenance API's schema, transport, signing, and retention are out of scope
+for this PRD (control-plane work); bot-bottle here produces the raw material
+only.