refactor(backend): hoist guest_home to BottlePlan base

Per PR review feedback (review #132): guest_home shouldn't be buried inside workspace_plan / read from a hardcoded literal in each provision module. It's a cross-cutting bottle property — the backend's prepare step knows it, and every downstream consumer (contrib providers, git provisioning, gitconfig path) should read it from one place. - Adds guest_home: str to BottlePlan base dataclass. - Both backends' prepare steps populate plan.guest_home. - contrib/{claude,codex}/agent_provider.py read plan.guest_home (was plan.workspace_plan.guest_home). - bot_bottle/backend/docker/provision/git.py reads plan.guest_home for the gitconfig destination (was hardcoded "/home/node"). - bot_bottle/backend/smolmachines/provision/git.py drops the _GUEST_HOME / _guest_home() helpers and reads plan.guest_home. - Tests that construct BottlePlan subclasses directly pass guest_home="/home/node" explicitly.
refactor(agent_provider): drop GUEST_HOME default, backend drives guest_home
2026-06-03 21:35:41 -04:00 · 2026-06-03 21:35:41 -04:00 · 2026-06-03 21:35:41 -04:00 · 2026-06-03 21:35:41 -04:00 · 2026-06-03 21:35:41 -04:00 · 2026-06-03 21:35:41 -04:00
3 changed files with 0 additions and 712 deletions
@@ -1,283 +0,0 @@
-# PRD 0049: Named / Labelled Agents
-
- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-06-03
- **Issue:** #171
-
-## Summary
-
-At agent launch time, prompt the operator for a short human-readable label
-(defaulting to the manifest agent key) and an optional color from the 16-color
-ANSI palette. Store both in the bottle's `metadata.json`. Display the label —
-rendered in the chosen color — in the dashboard's active-agents pane, replacing
-the bare manifest key. Inject the label and color into the in-container
-`claude.json` as `name` / `color` so Claude Code can surface them in its own
-harness when upstream support lands.
-
-## Problem
-
-The dashboard's agents pane identifies each running instance by its manifest
-agent key (e.g., `implementer`) plus a random slug suffix. When an operator
-runs three `implementer` bottles simultaneously — one each for three different
-repos — the pane shows:
-
-```
-  [docker] a3f9  implementer  started 14:02:11  [egress,pipelock]
-  [docker] b81c  implementer  started 14:03:45  [egress,pipelock]
-  [docker] d220  implementer  started 14:05:01  [egress,pipelock]
-```
-
-There is no way to tell which bottle is working on which task without attaching
-to each one in turn. The slug is opaque; the manifest key is shared. Operators
-working a multi-bottle session resort to keeping a mental map of slug→task,
-which breaks the moment they switch windows.
-
-## Goals / Success Criteria
-
-1. After the operator selects an agent name (dashboard picker or CLI argument),
-   they are prompted for a label. The prompt suggests the manifest key as the
-   default; pressing Enter (or providing no input) accepts it. The label may
-   contain any printable characters up to 64 bytes.
-2. After the label prompt, the operator is optionally prompted for a color from
-   the 16-color ANSI palette (names: `black`, `red`, `green`, `yellow`, `blue`,
-   `magenta`, `cyan`, `white`, `bright-black`, `bright-red`, `bright-green`,
-   `bright-yellow`, `bright-blue`, `bright-magenta`, `bright-cyan`,
-   `bright-white`). Pressing Enter without a selection skips color entirely.
-3. `label` and `color` are stored in `BottleMetadata` and written to the
-   bottle's `metadata.json`. Both fields default to `""` (empty / unset).
-4. `ActiveAgent` carries `label` and `color`; `enumerate_active()` reads them
-   from `metadata.json`.
-5. `_format_agent_row` uses the label when non-empty (falling back to
-   `agent_name`). If a non-empty color is set and the terminal supports it, the
-   label substring is rendered in that color.
-6. `BottleSpec` carries `label` and `color`; the docker backend's `prepare`
-   step copies them into `BottleMetadata`.
-7. `agent_provider.py` writes `label` → `"name"` and `color` → `"color"` into
-   the generated `claude.json`, alongside the existing fields. Fields are
-   omitted when empty.
-8. The dashboard's `_new_agent_flow` (PRD 0020) includes the label+color step
-   between agent selection and the backend picker.
-9. `cmd_start` (CLI) includes the label+color step after argument validation
-   and before prepare-with-preflight.
-10. All existing unit tests stay green; no new tests are required for this
-    change (the label/color fields are thin plumbing with no branching logic
-    worth unit-testing beyond the already-tested metadata read/write path).
-
-## Non-goals
-
- Showing the agent label inside the Claude Code TUI (status line, terminal
-  title, custom header). That requires upstream Claude Code / codex support.
-  Writing to `claude.json` is best-effort scaffolding for when that lands.
- Per-bottle color affecting anything outside the dashboard agents pane (e.g.,
-  proposal-pane highlights, log prefixes).
- Validating or constraining label content beyond the 64-byte printable cap.
- Persisting color-pair state across dashboard restarts (color pairs are
-  initialized fresh each session).
- Editing the label or color of an already-running bottle.
- Exposing label/color via `./cli.py list` (out of scope for v1; trivial to
-  add later since the field will be in metadata).
-
-## Design
-
-### Data flow
-
-```
-operator input
-     │
-     ▼
-BottleSpec.label, BottleSpec.color
-     │
-     ├─► docker/prepare.py → BottleMetadata.label / .color → metadata.json
-     │
-     └─► agent_provider.py → claude.json {"name": label, "color": color}
-                                              (omitted when empty)
-
-dashboard refresh
-     │
-     ▼
-enumerate_active() → read_metadata(slug) → ActiveAgent.label / .color
-     │
-     ▼
-_format_agent_row → label (colored) in the row string
-```
-
-### BottleSpec changes
-
-```python
-@dataclass(frozen=True)
-class BottleSpec:
-    manifest: Manifest
-    agent_name: str
-    copy_cwd: bool
-    user_cwd: str
-    identity: str = ""
-    label: str = ""   # operator-chosen display name; defaults to agent_name at render time
-    color: str = ""   # one of the 16 ANSI color names, or "" for terminal default
-```
-
-`label` and `color` default to `""` so all existing callers remain valid with
-no changes.
-
-### BottleMetadata changes
-
-Add two new fields with backward-compatible defaults:
-
-```python
-@dataclass
-class BottleMetadata:
-    identity: str
-    agent_name: str
-    cwd: str
-    copy_cwd: bool
-    started_at: str
-    compose_project: str
-    backend: str
-    label: str = ""
-    color: str = ""
-```
-
-`metadata.json` written by older bot-bottle versions won't have these keys;
-`read_metadata` already uses `dict.get` with defaults, so existing slugs load
-cleanly with `label=""`, `color=""`.
-
-### ActiveAgent changes
-
-```python
-@dataclass(frozen=True)
-class ActiveAgent:
-    backend_name: str
-    slug: str
-    agent_name: str
-    started_at: str
-    services: tuple[str, ...]
-    label: str = ""
-    color: str = ""
-```
-
-`enumerate_active()` copies `label` and `color` out of `BottleMetadata` when
-constructing each `ActiveAgent`. The smolmachines backend gets the same
-additions for symmetry; it reads from its own metadata path.
-
-### Dashboard row rendering
-
-`_format_agent_row` already falls through cleanly on missing fields. The
-change is:
-
-```python
-display_name = a.label if a.label else a.agent_name
-```
-
-Color rendering uses the existing `_try_init_green()` pattern as a model.
-A `_color_pair_for(color_name)` helper initialises a fresh curses color pair
-for the requested named color and returns its attr (or 0 on failure). Each
-unique color in the active agent list gets its own pair index. Color pairs are
-allocated lazily and cached in a `dict[str, int]` that lives for the duration
-of the dashboard session.
-
-The 16 ANSI color name → curses constant mapping:
-
-| Name | curses constant |
-|------|----------------|
-| `black` | `curses.COLOR_BLACK` |
-| `red` | `curses.COLOR_RED` |
-| `green` | `curses.COLOR_GREEN` |
-| `yellow` | `curses.COLOR_YELLOW` |
-| `blue` | `curses.COLOR_BLUE` |
-| `magenta` | `curses.COLOR_MAGENTA` |
-| `cyan` | `curses.COLOR_CYAN` |
-| `white` | `curses.COLOR_WHITE` |
-| `bright-*` | same constant + `curses.A_BOLD` |
-
-Terminals that don't support color fall back to plain text (the helper returns
-0, which ORed in is a no-op — same pattern as `_try_init_green`).
-
-### Label + color prompt — dashboard
-
-In `_new_agent_flow`, after `_picker_modal` returns a non-None name and before
-`_backend_picker_modal`:
-
-```python
-label, color = _label_color_modal(stdscr, default_label=picked)
-```
-
-`_label_color_modal` uses `curses.endwin()` → text-mode prompts → restore
-(the same drop-and-resume pattern as the existing editor flow and preflight
-Y/N). Two sequential prompts:
-
-```
-bot-bottle: agent label [implementer]: <operator types>
-bot-bottle: color (red/green/blue/… or Enter to skip): <operator types>
-```
-
-Invalid color names are silently ignored (treated as empty). The function
-returns `(label, color)` — both strings, both possibly `""`.
-
-### Label + color prompt — CLI
-
-In `cmd_start`, after argument parsing and before `_launch_bottle`:
-
-```python
-label = _text_prompt_label(args.name)
-color = _text_prompt_color()
-```
-
-`_text_prompt_label(default)` writes `"bot-bottle: agent label [{default}]: "`
-to stderr and returns the stripped input (or `default` if blank).
-`_text_prompt_color()` writes the color prompt and returns the stripped input
-(or `""` if blank or invalid).
-
-Both use `read_tty_line()` (already in `start.py`) for the read.
-
-### Claude Code config injection
-
-In `agent_provider.py`, where `claude_config.write_text(...)` is called,
-expand the JSON dict conditionally:
-
-```python
-payload = {
-    "hasCompletedOnboarding": True,
-    "theme": "dark",
-    "bypassPermissionsModeAccepted": True,
-    "projects": claude_projects,
-}
-if spec.label:
-    payload["name"] = spec.label
-if spec.color:
-    payload["color"] = spec.color
-claude_config.write_text(json.dumps(payload, indent=2) + "\n")
-```
-
-`spec` here is the `AgentProvisionSpec` (or equivalent) that `agent_provider`
-already receives; it needs `label` and `color` threaded in from `BottleSpec`
-through whatever plan/provision object the provider operates on.
-
-## Implementation chunks
-
-Two PRs, each independently mergeable.
-
-### Chunk 1 — schema + storage
-
- Add `label: str = ""` and `color: str = ""` to `BottleSpec`,
-  `BottleMetadata`, and `ActiveAgent`.
- `docker/prepare.py`: copy `spec.label` / `spec.color` into `BottleMetadata`.
- `docker/enumerate.py`: copy `metadata.label` / `metadata.color` into
-  `ActiveAgent`.
- `agent_provider.py` (or the plan object it reads): thread label/color through
-  to `claude.json` write.
- Smolmachines backend: parallel changes to metadata read/write and
-  `ActiveAgent` construction.
- No prompt changes; no UI changes. All existing behavior is identical.
-
-### Chunk 2 — prompts + display
-
- `start.py`: add `_text_prompt_label` and `_text_prompt_color`; call them in
-  `cmd_start` before `_launch_bottle`; pass `label` / `color` into `BottleSpec`.
- `dashboard.py`: add `_label_color_modal` (drop-and-resume); call it in
-  `_new_agent_flow`; pass label/color into `BottleSpec`; add
-  `_color_pair_for` helper; update `_format_agent_row` to use `a.label` with
-  color rendering.
-
-## Open questions
-
-None.
@@ -1,151 +0,0 @@
-# Gitea Webhook Agent Dispatch
-
-## Question
-
-How should bot-bottle spawn and manage agents in response to Gitea PR events — and how do we reuse the same agent (with its full session context) across every event in a PR's lifecycle?
-
-## Summary
-
-A lightweight webhook receiver maps Gitea PR events to `cli.py` invocations. Spawning is straightforward: the existing work on non-interactive run mode (see [host-dispatch-to-container-agents.md](host-dispatch-to-container-agents.md)) is the missing piece. Session continuity is harder: it requires tracking two identifiers per open PR — the **bottle identity** (bot-bottle's slug for the container state dir) and the **Claude session ID** (the UUID Claude writes to its JSONL transcript). The transcript snapshot mechanism already used by capability-block is the right foundation; it just needs a non-interactive path and a PR-keyed store.
-
-## Gitea Webhook Events for PR Lifecycle
-
-Gitea fires `X-Gitea-Event: pull_request` (with an `action` field) for most PR state changes. The payload always includes `pull_request.number`, which is the stable key for correlating events to a running agent.
-
-| `X-Gitea-Event` value | Relevant `action` values | When it fires |
-|---|---|---|
-| `pull_request` | `opened`, `reopened`, `closed`, `synchronized` | PR created, closed, or pushed to |
-| `pull_request_comment` | `created`, `edited` | Timeline comment posted |
-| `pull_request_review_approved` | — | Review submitted with approval |
-| `pull_request_review_rejected` | — | Review submitted requesting changes |
-| `pull_request_review_comment` | — | Inline code review comment |
-| `pull_request_sync` | — | New commits pushed to the PR branch |
-
-`pull_request` with `action: synchronized` and `pull_request_sync` both fire on push; they carry the same information but are separate subscriptions in the webhook config UI. Subscribe to `pull_request` and `pull_request_review` (the umbrella) plus `pull_request_comment` to cover the full lifecycle.
-
-The webhook receiver validates the `X-Gitea-Signature-256` HMAC header (SHA-256 of the raw body, keyed by the configured secret) before dispatching.
-
-## Spawning an Agent From a Webhook
-
-### What we need from bot-bottle
-
-The current `cli.py start` is interactive — it prompts y/N and attaches a tty. A webhook handler needs a non-interactive mode that:
-
-1. Starts the container for a named agent.
-2. Runs `claude -p "<task>" --output-format json --dangerously-skip-permissions` inside it (no tty, no session picker).
-3. Captures stdout as JSON, extracts `session_id`.
-4. Blocks until Claude exits, then tears down.
-
-The [host-dispatch-to-container-agents](host-dispatch-to-container-agents.md) research proposes `cli.py run <agent> <task>` for exactly this. That command is the prerequisite for everything below. It should return the Claude JSON output so callers can extract `session_id`.
-
-### Webhook receiver sketch
-
-The receiver is a small HTTP service (Flask, FastAPI, or a Go net/http handler) running alongside bot-bottle on the host. It:
-
-1. Validates the HMAC signature.
-2. Extracts `pull_request.number` and `X-Gitea-Event` / `action`.
-3. Looks up whether a bottle already exists for this PR number.
-4. Spawns or resumes accordingly (see next section).
-5. Optionally posts a comment back to the PR via Gitea API once Claude finishes.
-
-The receiver does not need to be async or queue-based for a single-repo bot, but should at minimum serialize events for the same PR number (a per-PR lock) to avoid two concurrent sessions clobbering each other's transcript.
-
-## Reusing the Same Agent Across a PR
-
-This is the harder problem. Two separate identities need to be tracked and connected:
-
-### Identity 1: bottle identity (bot-bottle slug)
-
-The slug is the per-bottle state directory name (`~/.bot-bottle/state/<slug>/`). It's what `cli.py resume <slug>` uses to relaunch a container and mount the preserved state — including the transcript snapshot. This already works for the capability-block flow.
-
-### Identity 2: Claude session ID
-
-Claude Code's `--output-format json` response includes a `session_id` UUID. Passing `--resume <session_id>` on a subsequent non-interactive run makes Claude continue from exactly that conversation, with full memory of prior tool calls. `--continue` (which maps to `resume_args` in `agent_provider.py`) only picks up the *most recent* session in the project directory — unsafe when multiple sessions may be running concurrently.
-
-The session JSONL lives at `~/.claude/projects/<encoded-cwd>/<session_id>.jsonl` inside the container guest. The transcript snapshot (`snapshot_transcript(slug)` in `capability_apply.py`) copies all of `~/.claude` out of the container before teardown, so the JSONL is preserved in `~/.bot-bottle/state/<slug>/transcript/.claude/`. When the bottle is relaunched and the transcript remounted, `claude --resume <session_id>` can find the JSONL at the right path.
-
-### Per-PR session registry
-
-The receiver needs a small persistent map:
-
-```
-PR number → { bottle_identity: str, claude_session_id: str, agent_name: str }
-```
-
-The simplest implementation is a JSON file at `~/.bot-bottle/pr-sessions.json`, written after each successful first-run and updated with each resume. A sqlite database is better if concurrent multi-repo support is needed.
-
-### Full lifecycle flow
-
-```
-PR opened
-  → webhook: action=opened
-  → no entry in pr-sessions.json
-  → cli.py run <agent> "Review PR #N: <title>\n<diff URL>"
-      → starts container, runs claude -p ... --output-format json
-      → on success: captures session_id from JSON output
-      → snapshot_transcript(slug)
-      → tears down container
-  → write pr-sessions.json: { pr: N, slug: <slug>, session_id: <uuid> }
-
-PR gets new commit
-  → webhook: action=synchronized OR pull_request_sync
-  → look up pr-sessions.json: found slug + session_id
-  → cli.py run-resume <slug> --claude-session <session_id> "New commits pushed. Review the diff."
-      → relaunches container with transcript snapshot mounted
-      → runs claude -p ... --resume <session_id> --output-format json
-      → captures new session_id (same or rotated)
-      → snapshot_transcript(slug) again
-  → update pr-sessions.json with latest session_id
-
-Comment @-mentions bot
-  → webhook: pull_request_comment, action=created
-  → extract comment body, check for bot mention
-  → same resume flow as above with comment as the prompt
-
-PR closed / merged
-  → webhook: action=closed
-  → cli.py cleanup <slug> (or equivalent)
-  → remove from pr-sessions.json
-```
-
-### What needs to be built
-
-| Piece | Status | Notes |
-|---|---|---|
-| `cli.py run <agent> <task>` | Missing | Non-interactive start; see host-dispatch research |
-| `cli.py run-resume <slug> --claude-session <id> <task>` | Missing | Like `resume` but non-interactive, passes `--resume <id>` to claude |
-| `snapshot_transcript` on clean exit | Exists (PRD 0012) | Already called from `start.py`'s session-end path |
-| Transcript remount on resume | Exists | `bottle_state.py::transcript_snapshot_dir` → docker cp in on launch |
-| PR session registry | Missing | Needs to be designed; `~/.bot-bottle/pr-sessions.json` is the simplest start |
-| Webhook receiver service | Missing | New service; needs to be a declared bottle or run as a host process |
-
-## Known Rough Edges
-
-**Session ID is not available from within the session.** The ID is only in the `--output-format json` result, readable after the process exits. There is no env var or hook that exposes it mid-session ([upstream issue #44607](https://github.com/anthropics/claude-code/issues/44607)). For the webhook bot this is fine — the outer receiver reads it from the subprocess result.
-
-**`--continue` vs `--resume <id>`:** The existing `resume_args = ("--continue",)` in `agent_provider.py` picks up the *most recent* session. For an interactive single-user resume this is fine. For a webhook bot that may have multiple open PRs, it is not safe — two PRs' transcripts would collide if they share a project directory encoding. Use `--resume <session_id>` explicitly.
-
-**Project directory encoding.** Claude stores sessions keyed by the absolute cwd, encoded as a path. Inside the container the cwd is always `/home/node` or a subdir. As long as every run for the same PR uses the same cwd, `--resume <session_id>` will find the right JSONL. The cwd should be pinned per PR entry in the session registry.
-
-**Concurrent events for the same PR.** If two webhooks arrive close together (e.g., push + CI comment), the receiver must serialize them. A per-PR asyncio lock or a simple file lock on the session registry entry is enough.
-
-**Context window growth.** Each resume appends to the same session. A PR with many round trips will eventually hit the context limit. Mitigation options: start a fresh Claude session (new `cli.py run`) periodically and carry forward a summary; or rely on Claude's built-in compaction. The session registry could include a turn count to trigger rotation.
-
-**Webhook delivery ordering.** Gitea does not guarantee ordered delivery or exactly-once delivery. The receiver should be idempotent (same PR event processed twice should not create two bottles) and should ignore events for closed PRs.
-
-## Relationship to Existing Bot-Bottle Infrastructure
-
-The transcript snapshot + bottle identity system (PRD 0012, `capability_apply.py`) was designed for the capability-block flow: an operator-triggered resume after a security event. The webhook flow is the same mechanism on a faster loop driven by Gitea events instead of operator action. The implementation delta is:
-
-1. Non-interactive run mode (the `cli.py run` gap already identified in host-dispatch research).
-2. Passing `--resume <session_id>` explicitly rather than `--continue`.
-3. A PR-keyed registry to connect PR numbers to bottle identities and session IDs.
-4. A webhook receiver to drive the loop.
-
-These are additive changes that sit on top of the existing transcript preservation machinery without altering it.
-
-## Recommendation
-
-Start with the non-interactive run mode (`cli.py run`) since everything else depends on it. Once that exists, the webhook receiver and session registry are straightforward glue. The receiver should run as a host process (not inside a bottle) since it needs to call `cli.py` and manage the session registry file. Serialize per-PR to avoid concurrency bugs. Use `--resume <session_id>` (not `--continue`) for all resume paths.
-
-The PR session registry is deliberately minimal to start — a JSON file is fine. If multi-repo or multi-agent scenarios appear, migrating to sqlite is a one-file change.
@@ -1,278 +0,0 @@
-# Local Ollama: Deployment Topology, Harness Selection, and Model Sizing
-
-Research notes on running Ollama locally for a bot-bottle coding agent workflow.
-Covers the native-vs-VM question, which harness integrates best with an agent loop,
-and which models make sense on an RTX 3070 (8 GB VRAM / 30 GB RAM) machine.
-
---
-
-## 1. Deployment topology: native, container, or VM?
-
-The core question is whether running Ollama in a VM significantly degrades inference
-performance. The short answer: a full KVM/QEMU VM with GPU passthrough adds roughly
-2–5% overhead, Docker on Linux adds roughly 1–2%, and LXC containers add sub-1%. None
-of these are significant for interactive coding use.
-
-### Native (bare metal)
-
-Zero overhead, immediate GPU access, simplest setup. The right default for a solo
-developer doing inference on their own workstation.
-
-### Docker containers on Linux + NVIDIA
-
-With `nvidia-container-toolkit` and `--gpus all`, containerized Ollama runs at
-essentially native speed (~1–2% overhead on Linux). The dramatic exception is macOS,
-where Docker Desktop runs a Linux VM with no access to Apple's Metal/GPU — inference
-is 5–6× slower. On Linux/Windows with NVIDIA hardware, Docker is fine.
-
-Common pitfall: if `docker exec ollama ollama ps` shows 0 GPU layers, the container
-fell back to CPU. Usual causes: stale VRAM allocation, missing `nvidia-container-toolkit`,
-or a host driver too old for the container's CUDA version.
-
-### KVM/QEMU VM with full PCIe passthrough
-
-Full GPU passthrough makes the GPU invisible to the host while the VM owns it. Overhead
-from the IOMMU translation layer and virtualized PCIe bus is ~2–5%. This is viable if
-you need VM-level isolation (snapshotting, migration, separate kernel). Setup complexity
-is non-trivial: BIOS IOMMU, IOMMU group management, VFIO driver binding. Once configured
-it is stable.
-
-**Critical gotcha:** set the VM's CPU type to `host`. If left at the default
-(`x86-64-v2-AES` / "QEMU Virtual CPU version 2.5+"), Ollama may silently disable GPU
-support even when drivers appear correct.
-
-### LXC containers (Proxmox et al.)
-
-The sweet spot for isolation without overhead. Sub-1% performance difference from bare
-metal because LXC shares the host kernel; GPU device files are bind-mounted into the
-container. The tradeoff is weaker isolation (shared kernel) and the requirement that
-host and container driver versions match. Not suitable if you need VM-level snapshots
-or live migration.
-
-### Summary
-
-| Topology | GPU overhead | Isolation | Complexity |
-|---|---|---|---|
-| Native | 0% | None | Low |
-| Docker (Linux) | ~1–2% | Process | Low |
-| LXC | <1% | Namespace | Medium |
-| KVM passthrough | 2–5% | Full VM | High |
-| VM no passthrough | CPU-only | Full VM | Medium |
-
-Running Ollama in a VM will **not** significantly slow inference as long as GPU passthrough
-is configured. Without passthrough (software rendering / CPU fallback) performance
-collapses — that is what the user is rightly worried about.
-
-### Local vs. remote server
-
-| Factor | Local machine | Remote server |
-|---|---|---|
-| Latency | Near-zero | Network round-trip; cumulative in agent loops |
-| Cost | Zero after hardware | Per-token or subscription |
-| Privacy | 100% on-device | Data leaves the machine |
-| Model size ceiling | VRAM-limited | No hard limit (671B+ feasible) |
-| Offline use | Yes | No |
-| Concurrency under load | Sequential by default | Scales horizontally |
-
-For agentic coding workflows making 20–50 tool calls per session, network latency
-accumulates quickly. Local inference eliminates this. A practical hybrid pattern:
-use the local GPU for routine coding loops; route only to a remote API for tasks
-requiring a 70B+ model or very long context (>128K tokens).
-
---
-
-## 2. Harness selection
-
-The landscape in 2026 has settled into three categories: IDE plugins, terminal agents,
-and chat UIs.
-
-### Continue.dev — recommended IDE plugin
-
-Open-source VS Code / JetBrains / Zed / Vim extension. Routes autocomplete, chat, and
-refactoring commands to any configured LLM backend (Ollama, cloud APIs). The recommended
-setup uses two models: a small FIM-capable model for inline autocomplete (Qwen2.5-Coder 7B)
-and a larger model for chat/edit. Handles inline completions, multi-file edits, and
-codebase-aware chat. No API key, no data leaving the machine.
-
-### Aider — recommended for git-native terminal workflows
-
-Terminal-based coding agent. Builds a codebase map before editing, makes changes
-directly, and auto-commits to git with readable messages. Every change is one
-`git revert` away. Supports 100+ languages; connects to any Ollama-served model
-via the OpenAI-compatible API. Best for terminal-first developers who want
-version-controlled agent interactions. Does not do inline autocomplete.
-
-### OpenCode — recommended for bot-bottle–style agent loops
-
-Terminal-based coding agent with 15 built-in tools (bash execution, file read/write/edit,
-grep, glob, web fetch, MCP support) and connections to 75+ model providers including
-local Ollama models. This is the closest open-source equivalent to a Claude Code–style
-plan → tool-call → execute → observe → loop. Native Ollama integration.
-
-**Critical setup note:** Ollama defaults to a 4096-token context window, which is
-completely insufficient for an agent loop carrying conversation history, tool schemas,
-a system prompt, and code simultaneously. Configure at least 64K tokens explicitly
-in the model's context settings.
-
-### Cline — agentic VS Code assistant
-
-VS Code extension that operates as an autonomous agent: plans, edits files, runs commands
-in a loop, connects to Ollama's local endpoint. Compared to OpenCode it lives inside the
-IDE rather than the terminal; compared to Continue.dev it is a full agent rather than a
-plugin. Its system prompt overhead is higher (~7,000–10,000 tokens) than minimal harnesses.
-
-### Open WebUI / Jan / LM Studio — chat UIs, not coding harnesses
-
-These are browser or desktop chat interfaces useful for ad-hoc conversations (explaining
-APIs, drafting documentation, exploring ideas) but without IDE integration, autocomplete,
-or git integration. LM Studio offers the smoothest onboarding (visual model browser with
-VRAM estimates). Jan is the most privacy-auditable (fully open-source, Apache 2.0, no
-telemetry). Neither is a replacement for a coding harness.
-
-### Harness comparison
-
-| Harness | Type | Autocomplete | Agent loop | Ollama | Git integration |
-|---|---|---|---|---|---|
-| Continue.dev | IDE plugin | Yes (FIM) | Basic | Native | No |
-| Aider | Terminal agent | No | Multi-turn | Via API | Auto-commit |
-| OpenCode | Terminal agent | No | Full tools | Native | Via bash |
-| Cline | IDE agent | No | Full tools | Via API | Via bash |
-| Open WebUI | Chat UI | No | No | Native | No |
-| Jan | Chat UI | No | No | Native | No |
-
-For a bot-bottle workflow (an isolated sandbox running an agentic loop with tool access),
-**OpenCode** is the closest open-source match. For an IDE-first developer who wants
-autocomplete + chat, **Continue.dev + Qwen2.5-Coder 7B** is the recommended pair.
-
---
-
-## 3. Model selection: RTX 3070 (8 GB VRAM / 30 GB RAM)
-
-### VRAM hard limits at Q4_K_M quantization
-
-| Model size | Approx. VRAM (Q4_K_M) | Fits in 8 GB? | Tokens/sec (RTX 3070) |
-|---|---|---|---|
-| 3–4B | 2.5–3.5 GB | Yes, with headroom | 60–90 |
-| 7–8B | 5–6 GB | Yes | 35–55 |
-| 12–14B | 7.5–9 GB | Edge / RAM offload | 8–18 |
-| 22B+ | 14+ GB | No | — |
-
-The RTX 3070 has high memory bandwidth for its VRAM tier and consistently outperforms
-the newer RTX 4060 Ti on token generation speed. Bandwidth matters more than raw compute
-for inference.
-
-### Does Gemma 4 exist?
-
-Yes. Google released **Gemma 4** on 2 April 2026 (Apache 2.0). The family includes
-E2B (2B), E4B (4B), a 26B MoE, and a 31B Dense. A 12B multimodal variant was announced
-2026-06-04. The 31B scores 80.0% on LiveCodeBench v6 — a major jump from Gemma 3 27B
-at 29.1%. However, only the E4B fits comfortably within 8 GB VRAM:
-
-| Variant | VRAM (approx.) | Fits? |
-|---|---|---|
-| Gemma 4 E2B | ~2 GB | Yes |
-| Gemma 4 E4B | ~5 GB | Yes |
-| Gemma 4 12B | ~8–9 GB (Q4) | Edge |
-| Gemma 4 26B MoE | 14–18 GB | No |
-| Gemma 4 31B Dense | ~20 GB | No |
-
-### Model-by-model evaluation
-
-**Qwen2.5-Coder 7B — primary recommendation**
-
-The strongest purpose-built coding model that fits fully within 8 GB VRAM. Leads
-HumanEval among 7–8B-class models. Strong on Python, JavaScript, TypeScript. Has
-FIM (fill-in-the-middle) support for inline autocomplete. 35–55 tok/sec on RTX 3070.
-
-```
-ollama pull qwen2.5-coder:7b
-```
-
-**Qwen2.5-Coder 14B — secondary, with RAM offloading**
-
-At Q4_K_M this needs ~8.7 GB, just over the 8 GB limit. With 30 GB system RAM, Ollama
-automatically offloads the overflow layers to CPU. Performance drops to ~8–18 tok/sec
-versus 35–55 tok/sec for the 7B fully in VRAM. Quality is noticeably better for complex
-multi-file reasoning. Viable for chat-based coding tasks where quality matters more than
-speed; too slow for live autocomplete. Keep context window at 8K tokens to minimize
-VRAM pressure during offloaded inference.
-
-```
-ollama pull qwen2.5-coder:14b
-```
-
-**Gemma 4 E4B (~5 GB VRAM)**
-
-Fits comfortably with 3 GB to spare. Strong on reasoning, multimodal, and general-purpose
-tasks. Less specialized for coding than Qwen2.5-Coder 7B. Good choice for one model that
-covers coding + general reasoning + image analysis. The E4B outperforms Gemma 3 equivalents
-significantly on coding benchmarks.
-
-```
-ollama pull gemma4:e4b
-```
-
-**Phi-4 Mini 3.8B (~3 GB VRAM)**
-
-Best reasoning-per-VRAM model; leaves ~5 GB free for other applications. Strong on math,
-logic, and structured output. Good for agentic sub-tasks requiring tight reasoning. Not the
-strongest at raw code synthesis but excellent for reasoning-heavy parts of a coding loop.
-Viable as the autocomplete model in a two-model Continue.dev setup.
-
-```
-ollama pull phi4-mini
-```
-
-**DeepSeek-R1 8B (~5–6 GB VRAM)**
-
-Strong reasoning model for logic-heavy code (algorithms, correctness proofs). The full
-DeepSeek-Coder-V2 (236B MoE) is impractical here — only the 8B distilled variants are
-relevant. Outperforms Gemma 4 E4B on reasoning-heavy benchmarks; weaker on raw code
-generation than Qwen2.5-Coder 7B.
-
-**Codestral — not viable at 8 GB**
-
-The top FIM autocomplete model on HumanEval-FIM benchmarks, but requires 12–16 GB VRAM
-minimum. Not an option here. Worth revisiting if upgrading to a 12 GB+ card (RTX 4070
-Super or newer).
-
-### RAM offloading: does 30 GB help?
-
-Yes, meaningfully. Ollama automatically splits layers between GPU and system RAM when
-VRAM is exceeded. With 30 GB RAM, models up to ~14B at Q4_K_M run with partial offloading.
-The tradeoff is a 2–5× throughput penalty (8–18 tok/sec vs 35–55 tok/sec). Acceptable
-for batch tasks (reviewing a PR, generating an algorithm); too slow for live autocomplete.
-
-### Recommended setup
-
-**Autocomplete (fast, always-in-VRAM):** `qwen2.5-coder:7b`
- Configure in Continue.dev as the tab-completion model
- FIM-capable; 35–55 tok/sec; fits with 2–3 GB VRAM to spare
-
-**Chat / agent loop (quality-first):** `qwen2.5-coder:14b` or `gemma4:e4b`
- 14B for strongest multi-file coding; expect 8–18 tok/sec with RAM offload
- Gemma 4 E4B if you want vision + general reasoning + coding in one model; ~60 tok/sec
-
-**Two-model Continue.dev config (lower VRAM pressure):**
-`phi4-mini` (autocomplete) + `qwen2.5-coder:7b` (chat) — both fit simultaneously with
-~1–2 GB to spare, keeping the OS and IDE from contending for VRAM.
-
---
-
-## Sources
-
- [Ollama on Proxmox: GPU Passthrough for LXC and VM AI Workloads](https://linuxprofessional.ie/article.php?slug=ollama-proxmox-gpu-passthrough-lxc-vm)
- [Run Ollama with NVIDIA GPU in Proxmox VMs and LXC containers](https://www.virtualizationhowto.com/2025/05/run-ollama-with-nvidia-gpu-in-proxmox-vms-and-lxc-containers/)
- [Ollama Performance Tuning: Getting Maximum Speed from Local LLMs](https://dasroot.net/posts/2026/01/ollama-performance-tuning-gpu-acceleration-model-quantization/)
- [Pros and Cons: Containerized Ollama vs. Local Setup](https://alain-airom.medium.com/pros-and-cons-using-containerized-ollama-vs-local-setup-d9bdf225bbb5)
- [Best Local Coding Models Ranked: Every VRAM Tier (2026)](https://insiderllm.com/guides/best-local-coding-models-2026/)
- [Best Local LLMs for RTX 4060, RTX 3070, and RTX 5060](https://aiagentskit.com/blog/best-local-llms-rtx-4060-3070-5060/)
- [Best Local LLMs for 8GB VRAM: Real Hardware Benchmarks (2026)](https://localllm.in/blog/best-local-llms-8gb-vram-2025)
- [Self-Hosted AI Coding Agent: Ollama + Continue + Open WebUI Setup in 2026](https://www.web3aiblog.com/blog/self-hosted-ai-coding-agent-ollama-continue-2026)
- [Best Local-First AI Coding Tools 2026: 14 Compared](https://nimbalyst.com/blog/best-local-first-ai-coding-tools-2026/)
- [OpenCode + Ollama: Private Local AI Coding Agent Setup](https://lushbinary.com/blog/opencode-ollama-local-ai-coding-privacy-guide/)
- [Gemma 4: Google DeepMind](https://deepmind.google/models/gemma/gemma-4/)
- [Running Gemma 4 Locally: VRAM Requirements](https://knightli.com/en/2026/05/01/gemma-4-local-vram-quantization-table/)
- [Phi-4 Mini vs. Gemma 3 vs. Qwen 2.5: Best SLM for Coding Tasks in 2026](https://botmonster.com/ai/phi-4-mini-vs-gemma-3-vs-qwen-25-best-slm-coding-2026/)
- [Qwen2.5-Coder 14B VRAM Requirements Guide](https://willitrunai.com/blog/qwen-2-5-coder-14b-vram-requirements)
- [Comparing AI Harnesses: OpenCode, Ollama, LM Studio, Claude Code, Open WebUI, and VS Code](https://jace.pro/blog/comparing-ai-harnesses-opencode-ollama-lm-studio-claude-code-open-webui-and-vs-code/)
Author	SHA1	Message	Date
didericis-claude	266013095e	refactor(backend): hoist guest_home to BottlePlan base test / unit (pull_request) Successful in 39s Details test / integration (pull_request) Successful in 49s Details Per PR review feedback (review #132): guest_home shouldn't be buried inside workspace_plan / read from a hardcoded literal in each provision module. It's a cross-cutting bottle property — the backend's prepare step knows it, and every downstream consumer (contrib providers, git provisioning, gitconfig path) should read it from one place. - Adds guest_home: str to BottlePlan base dataclass. - Both backends' prepare steps populate plan.guest_home. - contrib/{claude,codex}/agent_provider.py read plan.guest_home (was plan.workspace_plan.guest_home). - bot_bottle/backend/docker/provision/git.py reads plan.guest_home for the gitconfig destination (was hardcoded "/home/node"). - bot_bottle/backend/smolmachines/provision/git.py drops the _GUEST_HOME / _guest_home() helpers and reads plan.guest_home. - Tests that construct BottlePlan subclasses directly pass guest_home="/home/node" explicitly.	2026-06-03 21:35:41 -04:00
didericis-claude	df1091113c	refactor(agent_provider): drop GUEST_HOME default, backend drives guest_home Per PR review feedback (review #130): the GUEST_HOME = '/home/node' default in agent_provider.py was driving the wrong direction — the agent provider shouldn't ship its own opinion about the guest home, the backend should. - Removes the GUEST_HOME constant. - Makes guest_home a required kwarg on AgentProvider.provision_plan and the agent_provision_plan shim (no default). - Drops module-level _SKILLS_DIR / _PROMPT_PATH constants from contrib/{claude,codex}/agent_provider.py; both providers now derive the in-guest paths from plan.workspace_plan.guest_home at call time, which the backend's prepare step populated. - Updates tests/unit/test_agent_provider.py callers to pass guest_home explicitly. The backend prepare paths already pass it; no production-code call sites changed.	2026-06-03 21:35:41 -04:00
didericis-claude	df0f1ad980	refactor(contrib): inline provision steps per-provider, drop shared apply module Each AgentProvider now owns its skills / prompt / provision / supervise_mcp end-to-end. The base ABC declares all four as abstract; ClaudeAgentProvider and CodexAgentProvider each carry their own copy loop. Per PR review feedback (review #128): the shared _provision_apply.py abstraction was weak — Claude and Codex harnesses already diverge (codex's dummy-auth + login-status verify has no claude analogue) and forcing both onto one helper just postpones the split. Duplication is intentional. Deletes bot_bottle/_provision_apply.py and consolidates testing under tests/unit/test_contrib_{claude,codex}_provider.py (one file per provider, covering all four methods).	2026-06-03 21:35:41 -04:00
didericis-claude	970c5066d7	feat(agent_provider): migrate tests, drop guest-home/skills-dir env knobs, activate PRD 0050 - tests/unit/test_provision_apply.py covers the new shared apply helpers (apply_skills / apply_prompt / apply_provision) that replace the per-backend modules deleted in the prior commit. - tests/unit/test_contrib_supervise_mcp.py covers both providers' provision_supervise_mcp behavior — confirms the codex bottle now runs `codex mcp add` symmetrically with claude. - tests/unit/test_smolmachines_provision.py drops the four test classes whose subjects moved (TestProvisionPrompt / TestProvisionProviderAuth / TestProvisionSkills / TestProvisionSupervise); the backend-side CA / git / workspace classes stay. - tests/unit/test_docker_provision_provider_auth.py removed; its coverage now lives in tests/unit/test_provision_apply.py (apply_provision is backend-agnostic, one test file suffices). Drops the BOT_BOTTLE_CONTAINER_HOME, BOT_BOTTLE_GUEST_HOME, BOT_BOTTLE_CONTAINER_SKILLS_DIR, and BOT_BOTTLE_GUEST_SKILLS_DIR env knobs the deleted provision modules used to read. /home/node is hardcoded everywhere the knobs lived; the values were effectively constants today and removing them keeps the PRD-0050 surface area honest. Flips PRD 0050 Status: Draft → Active. Closes #177 on merge.	2026-06-03 21:35:41 -04:00
didericis-claude	8c45016aa2	refactor(backend): move per-provider provisioning onto AgentProvider BottleBackend.provision now resolves the provider plugin from the plan and dispatches prompt / skills / declarative-apply / supervise-mcp through it. The four hooks the docker + smolmachines backends used to override (provision_skills, provision_prompt, provision_provider_auth, provision_supervise) are gone — the duplicated 50-line implementations under backend/{docker,smolmachines}/provision/{skills,prompt, provider_auth,supervise}.py are deleted. Each backend gains a small supervise_mcp_url(plan) override so the provider plugin can run `claude mcp add` / `codex mcp add` against the right URL: docker returns http://{SUPERVISE_HOSTNAME}:{SUPERVISE_PORT}/ on the compose network alias; smolmachines returns plan.agent_supervise_url which launch.py already pins to a host-loopback port. Removes tests/unit/test_provision_supervise.py — the URL it asserted on now lives on the backend, with no equivalent standalone surface to test against (it's covered by the broader plan / launch integration tests).	2026-06-03 21:35:41 -04:00
didericis-claude	1443376268	refactor(agent_provider): introduce AgentProvider ABC + contrib plugins Lift the provider-specific blocks of agent_provision_plan into contrib/claude/agent_provider.py and contrib/codex/agent_provider.py, behind a new AgentProvider ABC and a lazy get_provider() registry (mirrors PRD 0048's contrib convention). agent_provision_plan and runtime_for stay as thin shims so existing callers in backend/{docker,smolmachines}/prepare.py and cli/start.py keep working without per-call edits — the shipping diff in this commit is purely 'who owns the producer'. Adds bot_bottle/_provision_apply.py — the backend-agnostic skills / prompt / declarative-plan apply loops the per-provider default methods will dispatch through in the next commit.	2026-06-03 21:35:41 -04:00
didericis-claude	2c6f248cda	docs(prd): draft PRD 0050 — move provider logic into contrib	2026-06-03 21:35:41 -04:00