Forge native integration #317

Open
opened 2026-06-29 11:58:31 -04:00 by didericis-claude · 7 comments
Collaborator

Enable agents to receive work from and report back to Gitea natively, without manual prompting.

Flow

  1. Issue is created on Gitea forge
  2. Issue is assigned to an agent user and tagged with the bot-bottle to use
  3. Webhook is caught by an orchestrator service
  4. Orchestrator spins up a new bot-bottle with the issue body as the prompt
  5. When work finishes, the bottle is frozen (suspended) until a comment comes in
  6. Bottle is rehydrated with the new comment as the prompt
  7. Bottle is destroyed when the PR is closed
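
The flow above reads as a small state machine. A minimal sketch follows, assuming three bottle states and illustrative event names (`issue_assigned`, `work_finished`, `comment_created`, `pr_closed`); none of these names come from the PRD.

```python
from enum import Enum


class BottleState(Enum):
    RUNNING = "running"
    FROZEN = "frozen"
    DESTROYED = "destroyed"


# Allowed transitions, keyed by (current state, event).
# Event names are hypothetical; None means "no bottle exists yet".
TRANSITIONS = {
    (None, "issue_assigned"): BottleState.RUNNING,                  # steps 1-4
    (BottleState.RUNNING, "work_finished"): BottleState.FROZEN,     # step 5
    (BottleState.FROZEN, "comment_created"): BottleState.RUNNING,   # step 6
    (BottleState.RUNNING, "pr_closed"): BottleState.DESTROYED,      # step 7
    (BottleState.FROZEN, "pr_closed"): BottleState.DESTROYED,       # step 7
}


def next_state(current, event):
    """Return the state after `event`, or raise if the flow does not allow it."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {current!r}") from None
```

Keeping the transitions in one table makes it easy to reject out-of-order webhooks, for example a comment arriving after the bottle was destroyed.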

Components

  • Webhook receiver - listens for Gitea issue/PR/comment events and routes them to the orchestrator
  • Orchestrator service - manages bottle lifecycle: spin up, freeze, rehydrate, destroy
  • Freeze/rehydrate protocol - mechanism for persisting and restoring a bottle state between prompts
  • Bot-bottle label/assignee convention - how an issue signals which bottle type to use
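
As a rough illustration of how the webhook receiver might hand events to the orchestrator, the sketch below maps an event type plus payload to a lifecycle action. The event type strings and `action` values are assumptions about what Gitea delivers and should be checked against its webhook documentation; the returned action names are hypothetical.

```python
def route_event(event_type, payload):
    """Map a webhook delivery to a lifecycle action name, or None to ignore it.

    `event_type` is assumed to come from the webhook's event header and
    `payload` is the decoded JSON body.
    """
    action = payload.get("action")
    if event_type == "issues" and action == "assigned":
        return "spin_up"      # new work unit for an agent
    if event_type == "issue_comment" and action == "created":
        return "rehydrate"    # wake a frozen bottle with the comment
    if event_type == "pull_request" and action == "closed":
        return "destroy"      # PR closed, tear the bottle down
    return None               # anything else is not agent-relevant
```

Freeze is deliberately absent here: in the flow above it is triggered by the agent finishing work, not by an incoming forge event.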

Open questions

  • What does freeze/rehydrate look like concretely? (snapshot filesystem + conversation state? just conversation transcript?)
  • Should the orchestrator be a standalone service or part of the bot-bottle runtime?
  • How does the agent signal it is done with a work unit (vs. stuck or errored)?
  • What label/assignee convention identifies an issue as agent-targeted?
didericis added the Kind/Feature and Priority Critical 1 labels 2026-06-29 12:04:17 -04:00
Owner

@didericis-claude

> What does freeze/rehydrate look like concretely? (snapshot filesystem + conversation state? just conversation transcript?)

Freeze should call the commit command from the cli

> Should the orchestrator be a standalone service or part of the bot-bottle runtime?

Good question: for the best of both worlds (convenience from a single wrapper and separation of dependencies/codebases), think it probably makes sense to have a bot-bottle orchestrate command in the bot-bottle cli that calls out to a separate orchestrator (probably just called bot-bottle-orchestrator) that will spin up bot-bottles on the host system. (Also might make sense to split out the supervisor...)

> How does the agent signal it is done with a work unit (vs. stuck or errored)?

Via a comment. Not sure how best to do that integration though: think having the orchestrator send the comment if the agent doesn't respond or check in probably makes sense, to account for the agent not being able to send info? Also probably want some sort of dashboard the orchestrator serves so users can monitor whether an agent is stuck or see what it's doing.

> What label/assignee convention identifies an issue as agent-targeted?

Should be assigned to a user who is a member of a bot-bottle org (or equivalent on different forges)

Author
Collaborator

Thanks for the clarifications — a few of these change the PRD design in #318 meaningfully, flagging the deltas:

Freeze → commit: Already what the PRD calls. ✓

Orchestrator as a separate binary: The PRD currently puts everything in bot_bottle/contrib/gitea/ and wires it under ./cli.py forge listen. The new shape would be:

  • bot-bottle-orchestrator — separate process/repo, owns bottle lifecycle + webhook listener + dashboard
  • ./cli.py orchestrate — thin shim that launches or delegates to the orchestrator

This is the right call for dependency isolation and lets the orchestrator evolve independently. The PRD needs to be updated to reflect this split — the contrib/gitea/ pieces stay in bot-bottle (Gitea API client, provenance builder, forge state), but the orchestration loop and webhook server live in the separate binary.

Done signal via comment + watchdog: Makes sense. The agent posts a comment when it finishes a work unit; the orchestrator detects this via webhook and triggers freeze. The watchdog (orchestrator posts on behalf of the agent if no check-in within a timeout) closes the stuck-agent gap. One question: how does the agent post the Gitea comment from inside the bottle? Options:

  1. The orchestrator exposes an MCP tool (forge-checkin) the agent calls — the orchestrator then posts on its behalf
  2. The agent has Gitea API egress + credentials injected and calls the API directly
  3. A new forge-sidecar (analogous to the supervise sidecar) that wraps the Gitea API

Option 1 seems cleanest — it keeps credentials out of the bottle and gives the orchestrator full visibility. But option 2 fits the existing cred-proxy pattern. Which do you prefer?

Org-based targeting: Makes sense — check assignee is a member of the bot-bottle org rather than matching a hardcoded username. Does the org name need to be configurable or is bot-bottle a fixed convention?

Author
Collaborator

Got it — direct Gitea API via cred-proxy. PRD implications:

  • The bottle manifest for forge-targeted runs gets a gitea.dideric.is egress route with Bearer auth injection (cred-proxy handles the token)
  • The orchestrator stamps forge context into the bottle as env vars at launch (FORGE_GITEA_API, FORGE_OWNER, FORGE_REPO, FORGE_ISSUE_NUMBER, FORGE_PR_NUMBER) so the agent knows where to post without it being hardcoded in the agent manifest
  • The agent prompt includes an instruction to post a done-comment when the work unit is complete; the orchestrator webhook listener detects that comment and triggers freeze
  • The watchdog fires if no comment arrives within a configurable timeout and posts on the agent's behalf

Agreed on the cred-proxy-over-MCP tradeoff: you get Gitea integration for free from the bottle's egress config, and adding a second forge later is just a new cred-proxy route rather than a new sidecar. Will update the PRD to reflect this.
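
A minimal sketch of the env-var stamping and the done-comment request described above. It assumes `FORGE_GITEA_API` holds the API base including `/api/v1`, and that the comment goes to Gitea's issue-comment endpoint; both function names are hypothetical. No auth header is built here because cred-proxy is expected to inject it.

```python
import json


def build_forge_env(api_base, owner, repo, issue_number, pr_number=None):
    """Forge context the orchestrator would stamp into the bottle at launch."""
    env = {
        "FORGE_GITEA_API": api_base.rstrip("/"),
        "FORGE_OWNER": owner,
        "FORGE_REPO": repo,
        "FORGE_ISSUE_NUMBER": str(issue_number),
    }
    if pr_number is not None:  # no PR exists yet on the first run
        env["FORGE_PR_NUMBER"] = str(pr_number)
    return env


def done_comment_request(env, summary):
    """Return (url, body_bytes) for the agent's done-comment POST."""
    url = (
        f"{env['FORGE_GITEA_API']}/repos/{env['FORGE_OWNER']}/{env['FORGE_REPO']}"
        f"/issues/{env['FORGE_ISSUE_NUMBER']}/comments"
    )
    body = json.dumps({"body": summary}).encode("utf-8")
    return url, body
```

For example, `done_comment_request(build_forge_env("https://gitea.dideric.is/api/v1", "didericis", "bot-bottle", 317), "Work unit complete")` yields a URL ending in `/repos/didericis/bot-bottle/issues/317/comments`.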

Owner

@didericis-claude

> Org-based targeting: Makes sense — check assignee is a member of the bot-bottle org rather than matching a hardcoded username. Does the org name need to be configurable or is bot-bottle a fixed convention?

Should be configurable, but don't expect the vast majority of users to change it. Can be done via an env var when starting up the orchestrator for now; may evolve to live in some orchestrator-level db when that comes into being later.

Author
Collaborator

Makes sense — FORGE_ORG env var, default bot-bottle. The PRD has been updated to reflect this along with all the other decisions from this thread:

  • Org membership check via GET /api/v1/orgs/{org}/members/{user} at event time; org name from FORGE_ORG (not embedded in manifests or state)
  • Orchestrator-as-separate-binary architecture; bot-bottle exposes ./cli.py orchestrate start/resume/status as the interface the orchestrator calls into
  • Done signal via agent Gitea comment (cred-proxy handles auth); watchdog fires after FORGE_WATCHDOG_TIMEOUT (default 30 min) if the agent exits without posting
  • Forge env vars (FORGE_GITEA_API, FORGE_OWNER, FORGE_REPO, FORGE_ISSUE_NUMBER) injected at launch so the agent knows where to post
  • gitea.<host> egress route with Bearer auth added to forge-targeted bottles at launch time
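
The org membership check in the first bullet could look roughly like this. `http_status` is a hypothetical callable that performs an authenticated GET and returns the HTTP status code; treating 204 as "member" and 404 as "not a member" reflects my understanding of the Gitea API and is worth confirming against its reference.

```python
import os
from urllib.parse import quote


def is_agent_assignee(username, http_status, environ=os.environ):
    """True if `username` belongs to the org named by FORGE_ORG (default bot-bottle)."""
    org = environ.get("FORGE_ORG", "bot-bottle")
    path = f"/api/v1/orgs/{quote(org, safe='')}/members/{quote(username, safe='')}"
    status = http_status(path)
    if status == 204:  # assumed: member
        return True
    if status == 404:  # assumed: not a member (or org not visible)
        return False
    # Anything else (auth failure, redirect, server error) should not be
    # silently read as "not an agent".
    raise RuntimeError(f"unexpected status {status} for {path}")
```

Raising on unexpected statuses keeps a misconfigured token from looking like "nobody is an agent".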

PRD: https://gitea.dideric.is/didericis/bot-bottle/src/branch/forge-native-integration/docs/prds/prd-new-forge-native-integration.md
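
For the watchdog rule listed above, a small sketch of the decision logic. It assumes `FORGE_WATCHDOG_TIMEOUT` is expressed in seconds, which the thread does not specify; the function names are hypothetical.

```python
import os

DEFAULT_TIMEOUT_S = 30 * 60  # the 30 minute default mentioned above


def watchdog_timeout(environ=os.environ):
    """Read the timeout in seconds (assumed unit), falling back to the default."""
    raw = environ.get("FORGE_WATCHDOG_TIMEOUT")
    return float(raw) if raw else float(DEFAULT_TIMEOUT_S)


def watchdog_should_fire(last_checkin, now, done_comment_seen, timeout_s):
    """Fire only when no done-comment arrived and the timeout has elapsed."""
    return (not done_comment_seen) and (now - last_checkin) >= timeout_s
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the rule easy to test.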

Owner

@didericis-claude

Thinking about this a little more... for better provenance and less risk of becoming too deeply coupled to a particular forge, I'm thinking option 3 might make more sense. RE option 1, am wary of adding too many MCP servers/ending up with context bloat, so am mostly deciding between 2 and 3.

Implementation sketch for option 3:

  • A Forge abstract class (should also subsume DeployKeyProvisioner/maybe have a factory method to create deploy key provisioners)
  • Method on the abstract class for what the agent needs to interact with forge (naive initial list includes crud methods for handling comments and issue/pr descriptions)
    • Maybe too restrictive, but limiting agent to interacting with a specific issue and set of PRs might be good, and orchestrator has wider permissions?
  • Sidecar which provides endpoints for interacting with forge that calls the abstract class
    • Can add hooks here for auditing.
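
A rough sketch of the shape described in this list, assuming a two-method `Forge` interface and a wrapper that restricts the agent to an allow-list of issue/PR numbers while recording every call. `ScopedForge`, `ScopeError`, and `InMemoryForge` are hypothetical names for illustration; a real sidecar would expose these operations over HTTP.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class Forge(ABC):
    """Forge operations the agent is allowed to ask for (naive initial list)."""

    @abstractmethod
    def get_description(self, number: int) -> str: ...

    @abstractmethod
    def create_comment(self, number: int, body: str) -> None: ...


class ScopeError(PermissionError):
    """Raised when the agent touches an issue/PR outside its assignment."""


@dataclass
class ScopedForge:
    """What the sidecar would call: scope check plus audit hook around a Forge."""

    forge: Forge
    allowed: frozenset
    audit_log: list = field(default_factory=list)

    def _check(self, op, number):
        ok = number in self.allowed
        self.audit_log.append({"op": op, "number": number, "allowed": ok})
        if not ok:
            raise ScopeError(f"{op} on #{number} is outside the assigned scope")

    def get_description(self, number):
        self._check("get_description", number)
        return self.forge.get_description(number)

    def create_comment(self, number, body):
        self._check("create_comment", number)
        self.forge.create_comment(number, body)


class InMemoryForge(Forge):
    """Stand-in backend for trying the wrapper without a real forge."""

    def __init__(self, descriptions):
        self.descriptions = dict(descriptions)
        self.comments = {}

    def get_description(self, number):
        return self.descriptions[number]

    def create_comment(self, number, body):
        self.comments.setdefault(number, []).append(body)
```

Because denied calls are logged before the exception is raised, the audit trail also captures attempts that fall outside the scope.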

Pros and cons of the above:

Pros:

  • Restricts agents beyond what API key permissions limit
  • Allows for uniform audit logs across different forge providers
  • Cleaner signal to orchestrator about what to do next
  • More explicit shape requirements for needed forge permissions and expected forge interactions

Cons:

  • Increased overhead per forge (inevitable due to webhooks/deploy keys regardless, but increases it)
  • Duplicate egress auditing/treats forge egress as "special"

Extend the pros and cons list for me/let me know if I'm missing anything.

Author
Collaborator

Extended pros/cons:

Additional pros:

  • Unambiguous done signal: With cred-proxy the orchestrator has to parse comments to detect completion — fragile if the agent posts intermediate comments before it is actually done. A sidecar signal_done(status, summary) call is unambiguous and can relay directly to the orchestrator via a queue dir (same pattern as the supervise sidecar), no comment-parsing heuristics needed.
  • Audit trail at the semantic level: The egress proxy sees HTTP bytes. The forge sidecar sees semantic operations — "read PR description", "posted comment", "signalled done" — and can log them in structured form. Provenance becomes richer without post-hoc log parsing.
  • Scope enforcement stronger than API key scoping: API key permissions are typically repo-wide. The sidecar can enforce that the agent only touches the specific issue/PRs it was assigned. That is a meaningful reduction in blast radius for a misbehaving or prompt-injected agent.
  • Forge-agnostic agent prompts: If the agent only calls the sidecar (not forge APIs directly), system prompts do not need to mention Gitea-specific endpoints, auth headers, or payload shapes. Swapping forges does not require prompt changes.
  • Composable with existing sidecar infrastructure: Supervise already uses queue dirs and HTTP+JSON-RPC per-bottle. A forge sidecar fits the same mental model for operators and reuses the same sidecar bundle machinery.
  • Rate-limit isolation: Forge API calls can be queued and throttled inside the sidecar without touching pipelock or cred-proxy for the rest of the agent's traffic.

Additional cons:

  • Yet another sidecar per forge run: Supervise is already one extra process per bottle. Forge adds a second for forge-targeted runs. Compose spec grows; more moving parts to start, monitor, and tear down.
  • Sidecar death is a new failure mode: If the forge sidecar crashes mid-run, the agent loses forge access even though the bottle is healthy. With cred-proxy, forge is just another egress route — same failure mode as api.anthropic.com going down.
  • Scope restriction may block legitimate cross-issue reads: If the agent needs to look at a related issue for context, the sidecar's scope enforcement becomes an obstacle. Either the scope rules grow complex (read-only for other issues, read-write only for assigned) or the agent hits a wall. Worth thinking about whether the scope should be read-anywhere/write-scoped rather than fully scoped.
  • DeployKeyProvisioner subsumption may be premature: Deploy key provisioning runs at bottle provision time on the host; forge sidecar runs inside the bottle at agent time. Folding them into one Forge abstract class couples two different auth contexts. Might be cleaner to keep them separate and just share the Gitea HTTP client.
  • Upfront implementation cost per forge: Every new operation needs a method on the abstract class, a sidecar endpoint, and N implementations. Cred-proxy largely gets new forges for free once auth headers are mapped.

My read: option 3 is the right call, primarily because the unambiguous done signal and semantic audit trail are load-bearing for provenance (the stated priority of the product). Comment-parsing heuristics for the done signal are a correctness risk that will surface in production. The scope-restriction con is real but solvable with a read-anywhere/write-scoped model. The DeployKeyProvisioner subsumption I would leave for a follow-up rather than coupling it into this PRD — the forge sidecar and deploy key provisioner have different lifecycles and actors.
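
To make the done-signal argument concrete, here is a minimal sketch of `signal_done(status, summary)` relaying through a queue dir. The record layout, file naming, and the `done`/`stuck`/`errored` status set are assumptions for illustration; the supervise sidecar's actual queue format is not described in this thread.

```python
import json
import os
import tempfile
import time
from pathlib import Path

VALID_STATUSES = {"done", "stuck", "errored"}


def signal_done(queue_dir, status, summary):
    """Write one signal record into the queue dir and return its path."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status {status!r}")
    queue = Path(queue_dir)
    queue.mkdir(parents=True, exist_ok=True)
    record = {"type": "signal_done", "status": status, "summary": summary}
    # Write to a temp file first, then rename, so a reader never sees a
    # half-written record.
    fd, tmp = tempfile.mkstemp(dir=str(queue), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(record, fh)
    final = queue / f"{time.time_ns()}.json"
    os.replace(tmp, final)
    return final


def read_signals(queue_dir):
    """Orchestrator side: load all complete records, oldest filename first."""
    paths = sorted(Path(queue_dir).glob("*.json"))
    return [json.loads(p.read_text(encoding="utf-8")) for p in paths]
```

The orchestrator reads an explicit status field instead of inferring completion from comment text, which is the property the first pro above depends on.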

Reference: didericis/bot-bottle#317