Compare commits

..

2 Commits

Author SHA1 Message Date
didericis-claude 295d65e4ef fix: repair broken imports and test failures after codex_auth move
lint / lint (push) Successful in 1m29s
- codex_auth.py: fix relative imports (.log, .util) to absolute paths
  (bot_bottle.log, bot_bottle.util) — the file moved to contrib/codex
  but the imports weren't updated
- codex_auth.py: wrap long line at 107 chars (pre-existing C0301)
- pty_resize.py: catch io.UnsupportedOperation from stream.fileno()
  and fall back to the numeric fd — pytest redirects stdin/stdout/stderr
  to pseudofiles, causing fileno() to raise before ioctl is even called

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 18:11:55 +00:00
didericis-claude 0f5d484151 refactor: move codex_auth into contrib/codex
lint / lint (push) Failing after 1m39s
test / unit (pull_request) Failing after 42s
test / integration (pull_request) Successful in 55s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 16:27:57 +00:00
192 changed files with 7750 additions and 13546 deletions
+3 -3
View File
@@ -1,6 +1,6 @@
# Weekly canary suite. Catches upstream regressions (broken pinned
# digest, etc.) without coupling every dev push to upstream registry
# availability.
# Weekly canary suite. Catches upstream regressions (broken pipelock
# image packaging at the pinned digest, etc.) without coupling every
# dev push to upstream registry availability.
#
# Opt-in via CLAUDE_BOTTLE_RUN_CANARIES=1 so the same files can be run
# locally with the same gating.
+1 -1
View File
@@ -26,7 +26,7 @@ jobs:
- name: Run pylint
run: |
# Run pylint on all Python files in the repo
find . -name '*.py' -not -path './.venv/*' -not -path './.git/*' | xargs pylint --fail-under=8.0
find . -name '*.py' -not -path './.venv/*' -not -path './.git/*' | xargs pylint --fail-under=8.0 || true
- name: Run pyright
run: |
-125
View File
@@ -1,125 +0,0 @@
# Assign sequential numbers to prd-new-*.md files on merge to main.
#
# When a PR merges to main and includes prd-new-*.md files this workflow:
# 1. Finds the next available NNNN number by scanning existing PRDs.
# 2. Renames each prd-new-*.md to NNNN-<slug>.md.
# 3. Updates the title header (# PRD prd-new: → # PRD NNNN:).
# 4. Flips Status: Draft → Active when the push touched files outside
# docs/prds/ anywhere in its commit range (i.e. the implementation
# shipped together with the PRD).
# 5. Commits the renaming back to main.
#
# No-op if the working tree contains no prd-new-*.md files.
#
# NOTE: The workflow scans the working tree (not just HEAD~1..HEAD) because
# PRs land as multi-commit pushes and the prd-new file is often added in an
# earlier commit on the branch, not in the final squash/merge commit.
name: prd-number
on:
push:
branches:
- main
paths:
- 'docs/prds/prd-new-*.md'
jobs:
assign-numbers:
runs-on: ubuntu-latest
permissions:
contents: write
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
token: ${{ secrets.GITHUB_TOKEN }}
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.12"
- name: Configure git
run: |
git config user.name "github-actions[bot]"
git config user.email "github-actions[bot]@users.noreply.github.com"
- name: Assign PRD numbers
run: |
python3 - <<'EOF'
import os
import re
import subprocess
import sys
from pathlib import Path
prds_dir = Path("docs/prds")
# Scan the working tree — prd-new files may have landed in any
# commit of a multi-commit push, not just HEAD.
new_prds = sorted(prds_dir.glob("prd-new-*.md"))
if not new_prds:
print("No prd-new-*.md files found — nothing to do.")
sys.exit(0)
# Determine whether non-PRD files were also changed anywhere in
# the push range (BEFORE_SHA → HEAD). Falls back to HEAD~1 when
# the env var isn't set (e.g. local act runs).
before_sha = os.environ.get("GITHUB_EVENT_BEFORE", "HEAD~1")
all_changed = subprocess.run(
["git", "diff", "--name-only", before_sha, "HEAD"],
capture_output=True, text=True, check=True,
).stdout.splitlines()
non_prd_changed = any(
not f.startswith("docs/prds/") for f in all_changed
)
# Find next available number.
existing = sorted(
int(m.group(1))
for p in prds_dir.glob("*.md")
if (m := re.match(r"^(\d{4})-", p.name))
)
next_num = (max(existing) + 1) if existing else 1
for prd_path in sorted(new_prds):
slug = re.sub(r"^prd-new-", "", prd_path.stem)
new_name = f"{next_num:04d}-{slug}.md"
new_path = prds_dir / new_name
print(f" {prd_path.name} → {new_name}")
content = prd_path.read_text()
# Update title header.
content = re.sub(
r"^(#\s+PRD\s+)prd-new(:)",
rf"\g<1>{next_num:04d}\2",
content,
count=1,
flags=re.MULTILINE,
)
# Conditionally flip Status.
if non_prd_changed:
content = re.sub(
r"(\*\*Status:\*\*\s*)Draft",
r"\g<1>Active",
content,
count=1,
)
new_path.write_text(content)
subprocess.run(["git", "rm", str(prd_path)], check=True)
subprocess.run(["git", "add", str(new_path)], check=True)
next_num += 1
subprocess.run(
["git", "commit", "-m", "ci(prd): assign sequential numbers to new PRDs"],
check=True,
)
subprocess.run(["git", "push"], check=True)
EOF
-4
View File
@@ -21,11 +21,7 @@ on:
push:
branches:
- main
paths:
- '**.py'
pull_request:
paths:
- '**.py'
jobs:
unit:
+31 -14
View File
@@ -8,7 +8,6 @@ on:
- '**.py'
- '.pylintrc'
- 'pyrightconfig.json'
workflow_dispatch:
jobs:
update-badges:
@@ -32,16 +31,28 @@ jobs:
- name: Run pylint and extract score
id: pylint
run: |
PYLINT_OUTPUT=$(python -m pylint bot_bottle/ 2>&1) || true
SCORE=$(echo "$PYLINT_OUTPUT" | grep -oP '(?<=rated at )\d+\.\d+/10' | head -1)
# Run pylint and capture the score
PYLINT_OUTPUT=$(python -m pylint bot_bottle/ 2>&1 | tail -1)
echo "Output: $PYLINT_OUTPUT"
# Extract score (e.g., "9.92/10")
SCORE=$(echo "$PYLINT_OUTPUT" | grep -oP '\d+\.\d+/10' | head -1)
if [ -z "$SCORE" ]; then
SCORE="9.92/10"
fi
echo "score=$SCORE" >> $GITHUB_OUTPUT
echo "Pylint score: $SCORE"
- name: Run pyright and check errors
id: pyright
run: |
PYRIGHT_OUTPUT=$(python -m pyright 2>&1) || true
ERRORS=$(echo "$PYRIGHT_OUTPUT" | grep -oP '\d+(?= error)' | head -1)
# Run pyright and check for errors
PYRIGHT_OUTPUT=$(python -m pyright 2>&1 | tail -1)
echo "Output: $PYRIGHT_OUTPUT"
# Extract error count
ERRORS=$(echo "$PYRIGHT_OUTPUT" | grep -oP '^\d+' | head -1)
if [ -z "$ERRORS" ]; then
ERRORS="0"
fi
echo "errors=$ERRORS" >> $GITHUB_OUTPUT
echo "Pyright errors: $ERRORS"
@@ -50,14 +61,16 @@ jobs:
PYLINT_SCORE="${{ steps.pylint.outputs.score }}"
PYRIGHT_ERRORS="${{ steps.pyright.outputs.errors }}"
PYLINT_SCORE_ENCODED=$(echo "$PYLINT_SCORE" | sed 's|/|%2F|g')
# Escape / for sed
PYLINT_SCORE_ESCAPED=$(echo "$PYLINT_SCORE" | sed 's/\//\\\//g')
if [ -n "$PYLINT_SCORE_ENCODED" ]; then
sed -i "s|/badge/pylint-[^)]*|/badge/pylint-${PYLINT_SCORE_ENCODED}-brightgreen|" README.md
fi
if [ -n "$PYRIGHT_ERRORS" ]; then
sed -i "s|/badge/pyright-[^)]*|/badge/pyright-${PYRIGHT_ERRORS}%20errors-brightgreen|" README.md
fi
# Create badge URLs with proper encoding
PYLINT_BADGE="[![pylint](https://img.shields.io/badge/pylint-${PYLINT_SCORE}%25-brightgreen)](https://github.com/PyCQA/pylint)"
PYRIGHT_BADGE="[![pyright](https://img.shields.io/badge/pyright-${PYRIGHT_ERRORS}%20errors-brightgreen)](https://github.com/microsoft/pyright)"
# Update README with new badges
sed -i "s|\[\!\[pylint\].*pylint)\]|${PYLINT_BADGE}|g" README.md
sed -i "s|\[\!\[pyright\].*pyright)\]|${PYRIGHT_BADGE}|g" README.md
echo "Updated badges:"
grep -E "pylint|pyright" README.md | head -2
@@ -73,7 +86,11 @@ jobs:
else
echo "Badge changes detected, committing..."
git add README.md
MSG="chore: update quality badges"$'\n\n'"- Pylint: ${{ steps.pylint.outputs.score }}"$'\n'"- Pyright: ${{ steps.pyright.outputs.errors }} errors"$'\n\n'"[skip ci]"
git commit -m "$MSG"
git commit -m "chore: update quality badges
- Pylint: ${{ steps.pylint.outputs.score }}
- Pyright: ${{ steps.pyright.outputs.errors }} errors
[skip ci]"
git push
fi
+1 -2
View File
@@ -419,8 +419,7 @@ disable=raw-checker-failed,
too-many-instance-attributes,
duplicate-code,
import-outside-toplevel,
too-few-public-methods,
unnecessary-ellipsis
too-few-public-methods
# Enable the message, report, category or checker with the given id(s). You can
# either give multiple identifier separated by comma (,) or put this option
+13 -20
View File
@@ -2,18 +2,11 @@
## What this is
bot-bottle spins up an isolated backend runtime for running AI coding agents
with a curated set of skills and env vars. The point is to run agents with
broad permissions inside a sandbox, so a misbehaving agent cannot reach the
host. A Python CLI (entry point `cli.py`, package `bot_bottle/`) orchestrates
the runtime lifecycle and the copying of skills and env vars into it.
The default backend on compatible macOS hosts is macos-container:
agents and sidecar bundles run through Apple's `container` CLI without
requiring Docker. The smolmachines backend remains available with
`BOT_BOTTLE_BACKEND=smolmachines` or `--backend=smolmachines`; agents
run in a libkrun micro-VM, while the sidecar bundle still uses Docker.
The legacy Docker backend remains available with `BOT_BOTTLE_BACKEND=docker`
or `--backend=docker`.
bot-bottle spins up an isolated container for running AI coding agents with a
curated set of skills and env vars. The point is to run agents with broad
permissions inside a sandbox, so a misbehaving agent cannot reach the host.
A Python CLI (entry point `cli.py`, package `bot_bottle/`) orchestrates
the container lifecycle and the copying of skills and env vars into it.
## Goals
@@ -24,7 +17,7 @@ or `--backend=docker`.
## Non-goals
- Communicating between agents directly
- Removing the Docker backend
- Self hosted VMs (v1 uses local Docker containers, not VMs)
- Advanced agent auditing (lean on git history for auditing)
## Repository layout
@@ -32,8 +25,9 @@ or `--backend=docker`.
- `README.md` — short public-facing description.
- `AGENTS.md` — this file, orientation for future agent sessions.
- `.gitignore` — OS junk.
- `.bot-bottle/` — per-repo agent and bottle manifests (YAML markdown format).
- `examples/` — example bottles and agents showing the manifest format.
- `bot-bottle.json` — legacy manifest of named agents (env / skills / prompt
per agent), consumed by `cli.py`. See "Manifest" under
"Intended design".
- `docs/README.md` — docs overview; when to write which document.
- `docs/prds/` — product requirement docs (see `docs/prds/README.md` for format).
- `docs/research/` — research notes (see `docs/research/README.md`).
@@ -43,11 +37,10 @@ or `--backend=docker`.
- Three kinds of doc, each with its own conventions in-folder; see
`docs/README.md` for when to write which:
- **PRDs** (`docs/prds/`) — one feature per file. While a PR is open
the file is named `prd-new-<kebab>.md`; CI assigns a sequential
number on merge to `main` and renames it. A `Status:` line tracks
lifecycle: Draft → Active (shipped to `main`) →
Superseded/Retargeted. Format in `docs/prds/README.md`.
- **PRDs** (`docs/prds/`) — one feature per file, numbered
`NNNN-kebab.md`. A `Status:` line tracks lifecycle: Draft → Active
(shipped to `main`) → Superseded/Retargeted. Format in
`docs/prds/README.md`.
- **Research notes** (`docs/research/`) — opinionated investigations;
unnumbered kebab-case, freeform and verdict-first. See
`docs/research/README.md`.
@@ -16,27 +16,21 @@ FROM node:22-slim
# features (status checks, commits, PR creation) — without git in the
# image, those features fail in surprising ways once the user does any
# real work. ca-certificates is already in the slim base; listed for
# clarity in case the base ever drops it. curl is here so any
# HTTPS_PROXY-aware tool (curl itself, plus anything that shells out
# to it) works against egress's bumped TLS without the agent needing
# local DNS.
# clarity in case the base ever drops it. socat is the privileged
# forwarder for the in-container ssh-agent (see bot_bottle/ssh.py): the agent
# runs as root and rejects non-root connections, so socat sits between
# node and the agent socket. curl is here so any HTTPS_PROXY-aware
# tool (curl itself, plus anything that shells out to it) works
# against pipelock's bumped TLS without the agent needing local DNS.
RUN apt-get update \
&& apt-get install -y --no-install-recommends git ca-certificates curl \
&& rm -rf /var/lib/apt/lists/*
# App-specific deps. Python isn't required by claude-code itself
# (claude-code is a Node CLI), but is convenient for the agent to
# shell out to for ad-hoc scripts. Kept on its own layer so it can
# be moved to a downstream image if the base ever needs to shrink.
RUN apt-get update \
&& apt-get install -y --no-install-recommends python3 python3-pip python3-venv \
&& apt-get install -y --no-install-recommends git ca-certificates openssh-client socat curl dnsutils python3 python3-pip python3-venv \
&& rm -rf /var/lib/apt/lists/*
# Install claude-code globally. Pinned to the version verified in the v1
# build (`claude --version` returns 2.1.126). Bump deliberately when
# rolling forward; an unpinned install would mean rebuilds silently pick
# up new behavior.
RUN npm install -g --no-fund --no-audit @anthropic-ai/claude-code@2.1.172 \
RUN npm install -g --no-fund --no-audit @anthropic-ai/claude-code@2.1.126 \
&& npm cache clean --force
# Run as a non-root user. The node image already provides a `node` user
@@ -6,15 +6,7 @@
FROM node:22-slim
RUN apt-get update \
&& apt-get install -y --no-install-recommends git ca-certificates curl \
&& rm -rf /var/lib/apt/lists/*
# App-specific deps. Python isn't required by codex itself
# (codex is a Node CLI), but is convenient for the agent to shell
# out to for ad-hoc scripts. Kept on its own layer so it can be
# moved to a downstream image if the base ever needs to shrink.
RUN apt-get update \
&& apt-get install -y --no-install-recommends python3 python3-pip python3-venv \
&& apt-get install -y --no-install-recommends git ca-certificates openssh-client socat curl dnsutils python3 python3-pip python3-venv \
&& rm -rf /var/lib/apt/lists/*
RUN npm install -g --no-fund --no-audit @openai/codex@0.136.0 \
+26 -12
View File
@@ -1,18 +1,23 @@
# Per-bottle sidecar bundle image (PRD 0024).
#
# Collapses the prior per-sidecar images (egress, git-gate,
# supervise) into one. A small stdlib-Python init supervisor at
# /app/sidecar_init.py spawns all daemons, forwards SIGTERM, and
# propagates per-daemon stdout/stderr to the container log with a
# `[name]` prefix. See PRD 0024 for the rationale.
# Collapses the four prior per-sidecar images (pipelock, egress,
# git-gate, supervise) into one. A small stdlib-Python init
# supervisor at /app/sidecar_init.py spawns all four daemons,
# forwards SIGTERM, and propagates per-daemon stdout/stderr to the
# container log with a `[name]` prefix. See PRD 0024 for the
# rationale.
#
# Layout:
# Layout (preserved verbatim from the prior four Dockerfiles so the
# compose renderer's bind-mount paths and docker-cp targets keep
# working):
#
# /usr/local/bin/pipelock pipelock binary
# /usr/bin/gitleaks gitleaks binary
# /app/egress_addon.py + siblings mitmproxy addon (egress)
# /app/egress-entrypoint.sh mitmdump launcher
# /app/supervise_server.py + .py supervise MCP server
# /app/sidecar_init.py PID 1 supervisor
# /etc/pipelock.yaml bind-mounted at run time
# /etc/egress/routes.yaml bind-mounted at run time
# /etc/git-gate/pre-receive docker-cp'd at start time
# /git-gate-entrypoint.sh docker-cp'd at start time
@@ -22,17 +27,25 @@
# /home/mitmproxy/.mitmproxy/ mitmproxy CA dir
#
# Exposed ports inside the container:
# 9099 egress (mitmproxy, agent-facing HTTPS proxy)
# 8888 pipelock (HTTPS_PROXY)
# 9099 egress (mitmproxy, pipelock's upstream — not externally
# addressed by the agent)
# 9418 git-gate (git-daemon)
# 9420 git-gate smart HTTP (smolmachines agent-facing transport)
# 9100 supervise (MCP HTTP)
# Stage 1: gitleaks binary. The upstream gitleaks image is alpine
# Stage 1: pipelock binary. The upstream pipelock image is a
# scratch image with the binary at /pipelock (entrypoint).
# Pinned by digest in lockstep with
# bot_bottle/backend/docker/pipelock.py:PIPELOCK_IMAGE.
FROM ghcr.io/luckypipewrench/pipelock@sha256:3b1a39417b98406ddc5dc2d8fcb42865ddc0c68a43d355db55f0f8cb06bc6de9 AS pipelock-src
# Stage 2: gitleaks binary. The upstream gitleaks image is alpine
# with the binary at /usr/bin/gitleaks. Pinned by digest in lockstep
# with Dockerfile.git-gate's prior base (now deleted at chunk 3).
FROM zricethezav/gitleaks@sha256:c00b6bd0aeb3071cbcb79009cb16a60dd9e0a7c60e2be9ab65d25e6bc8abbb7f AS gitleaks-src
# Stage 2: assembly. mitmproxy/mitmproxy is debian-slim-based with
# Stage 3: assembly. mitmproxy/mitmproxy is debian-slim-based with
# Python + mitmdump pre-installed — heavier than the others, so
# this stage starts there and pulls the standalone binaries in.
FROM mitmproxy/mitmproxy:11.1.3
@@ -47,14 +60,16 @@ USER root
# plus the core `git` binary the pre-receive hook invokes.
# openssh-client supplies the upstream SSH transport the
# pre-receive hook uses to forward accepted refs.
# ca-certificates is needed for mitmdump upstream TLS (the
# base image already has it; listed for explicitness).
# ca-certificates is needed for both pipelock and mitmdump
# upstream TLS (the base image already has it; listed for
# explicitness).
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
git openssh-client ca-certificates \
&& rm -rf /var/lib/apt/lists/*
# Pull the standalone binaries into the final image.
COPY --from=pipelock-src /pipelock /usr/local/bin/pipelock
COPY --from=gitleaks-src /usr/bin/gitleaks /usr/bin/gitleaks
# Project Python: addon + server modules + the init supervisor.
@@ -63,7 +78,6 @@ COPY --from=gitleaks-src /usr/bin/gitleaks /usr/bin/gitleaks
# Dockerfile.egress / Dockerfile.supervise layout.
COPY bot_bottle/egress_addon_core.py /app/egress_addon_core.py
COPY bot_bottle/egress_addon.py /app/egress_addon.py
COPY bot_bottle/dlp_detectors.py /app/dlp_detectors.py
COPY bot_bottle/yaml_subset.py /app/yaml_subset.py
COPY bot_bottle/supervise.py /app/supervise.py
COPY bot_bottle/supervise_server.py /app/supervise_server.py
+27 -50
View File
@@ -5,7 +5,7 @@
# bot-bottle
[![test](https://gitea.dideric.is/didericis/bot-bottle/actions/workflows/test.yml/badge.svg?branch=main)](https://gitea.dideric.is/didericis/bot-bottle/actions?workflow=test.yml)
[![pylint](https://img.shields.io/badge/pylint-9.93%2F10-brightgreen)](https://github.com/PyCQA/pylint)
[![pylint](https://img.shields.io/badge/pylint-9.92%2F10-brightgreen)](https://github.com/PyCQA/pylint)
[![pyright](https://img.shields.io/badge/pyright-0%20errors-brightgreen)](https://github.com/microsoft/pyright)
**Problem:** Developer wants to run a coding agent without supervision, but they don't want a prompt injected or misbehaving agent wrecking their environment or exfiltrating sensitive data.
@@ -20,22 +20,14 @@
- **Manifest-scoped skills + secrets** — each bottle declares its skills, env, git identity, remotes, and egress routes; unknown keys die at load.
- **Trust boundary at `$HOME`** — bottles (credentials, egress, remotes) live only under `~/.bot-bottle/bottles/`. Repos may ship agents but not bottles, so a cloned repo can't redirect an env var to an attacker host.
- **Composable bottles (`extends:`)** — keep provider/runtime policy in one base bottle (e.g. `claude.md`) and overlay task bottles on top.
- **Parallel, isolated bottles** — each bottle runs in its own backend-owned isolation boundary; bottles don't share state or talk to each other.
- **Parallel, isolated bottles** — each bottle is its own per-agent Docker `--internal` network; bottles don't share state or talk to each other.
- **Provider templates (Claude, Codex)** — `Dockerfile.claude` / `Dockerfile.codex`, or a bottle-supplied Dockerfile. Claude auth via long-lived OAuth token; Codex via opt-in host device-auth forwarding.
- **gVisor auto-detect** — on Linux hosts where `runsc` is registered with Docker, every bottle launches under it for a userspace syscall barrier; no manifest config required.
- **Apple Container backend (macOS default when available)** — runs the agent and sidecar bundle with Apple's `container` CLI, using a host-only agent network plus a separate sidecar egress network.
- **Smolmachines backend** — runs the agent in a libkrun micro-VM while the sidecar bundle stays in Docker. TSI and smolmachines DNS filtering close the raw DNS exfiltration gap that exists in the legacy Docker backend.
- **Legacy Docker backend** — still available for examples, CI, and hosts without Apple Container via `BOT_BOTTLE_BACKEND=docker` or `--backend=docker`.
- **Smolmachines backend (macOS)** — opt-in `BOT_BOTTLE_BACKEND=smolmachines` runs the agent in a libkrun micro-VM with the sidecar bundle still in Docker.
## Architecture
On the default macOS Apple Container backend, a bottle is an agent container on a host-only internal network plus a sidecar bundle attached to both that internal network and a NAT egress network. The agent gets HTTP(S)_PROXY and CA bundle env vars pointing at the sidecar's internal-network IP, so HTTP/HTTPS traffic flows through the sidecar instead of direct egress. `bottle.git` / git-gate is intentionally deferred on this backend until a safe Apple Container key-delivery path exists.
On the smolmachines backend, a bottle is an agent micro-VM plus a Docker sidecar bundle for egress, git-gate, and supervise. The VM reaches the sidecars through a per-bottle loopback alias allowed by TSI; smolmachines handles DNS filtering below the guest OS.
On the legacy Docker backend, the same logical bottle is two containers per agent: an `agent` container and a `sidecars` container. They share a per-agent Docker `--internal` network; the agent has no default route off-box.
The Docker topology looks like this:
A bottle is two containers per agent: an `agent` container, and a `sidecars` container that bundles pipelock + cred-proxy + git-gate + supervise behind a Python init supervisor. They share a per-agent Docker `--internal` network; the agent has no default route off-box.
```
host ( ./cli.py )
@@ -44,56 +36,39 @@ The Docker topology looks like this:
┌─────────────────────────── bottle ──────────────────────────────────┐
│ │
│ ┌──────────────────┐ ┌──────────────────────┐
│ │ agent image │ HTTP(S) proxy │ egress image
│ │ (claude-code, │ ─────────────────►│ (mitmproxy; TLS bump │ │ HTTPS to
│ │ codex, etc) │ │ DLP scan, path │───┼──► allowlisted
│ │ │ │ matching, auth hosts
│ │ environ: proxy │injection)
│ │ URLs only, no │ └──────────────────────┘
│ │ real tokens
│ ┌──────────────────┐ ┌──────────────
│ │ agent image │ HTTP(S) proxy │ cred-proxy │
│ │ (claude-code, │ ─────────────────►│ (strips/inj │ │
│ │ codex, etc) │ │ Authoriz.)
│ │ │ └──────┬───────┘
│ │ environ: URLs │
│ │ only, no real
│ │ tokens ┌────────────────┐ HTTPS to
│ │ │ │ pipelock image │──────────┼──► allowlisted
│ │ │ │ (TLS bump, DLP │ │ hosts (incl.
│ │ │ │ body scan, │ │ cred-proxy
│ │ │ │ allowlist) │ │ upstreams)
│ │ │ └────────────────┘ │
│ │ │ │
│ │ │ git proxy ┌────────────────┐ │ SSH push/fetch
│ │ │ ────────────────►│ git-gate image │──────────┼──► to bottle.git
│ │ │ │ (gitleaks + │ │ upstreams
│ └──────────────────┘ │ git daemon) │ │ (direct — not
│ └────────────────┘ │ via egress)
│ └────────────────┘ │ via pipelock)
│ │
│ agent on internal network (no default route); egress and
│ git-gate straddle internal + egress networks.
egress is the single HTTP/HTTPS chokepoint — all agent HTTP/HTTPS
traffic flows through it. git-gate's SSH egress is direct
│ because egress is HTTP-only.
│ agent on internal network (no default route); pipelock,
cred-proxy, and git-gate straddle internal + egress networks. │
pipelock is the single HTTP/HTTPS chokepoint — cred-proxy's
outbound traverses it too. git-gate's SSH egress is direct │
│ because pipelock is HTTP-only. │
└─────────────────────────────────────────────────────────────────────┘
```
When the agent exits, `cli.py` tears down every sidecar and both networks; nothing about a bottle persists between runs.
## Install
Install the CLI with the bootstrap script:
```sh
curl -fsSL https://gitea.dideric.is/didericis/bot-bottle/raw/branch/main/install.sh | sh
```
The script checks Python 3.11+, checks Docker daemon reachability, creates the `~/.bot-bottle/` config directories, installs the Python package with `pipx` when available or `pip --user` otherwise, then runs:
```sh
bot-bottle doctor
```
Python-native installers can use the package metadata directly:
```sh
pipx install git+https://gitea.dideric.is/didericis/bot-bottle.git
uv tool install git+https://gitea.dideric.is/didericis/bot-bottle.git
```
## Quickstart
On compatible macOS hosts, the default backend requires Apple's `container` CLI and does not require Docker. The smolmachines backend requires Docker on the host for the sidecar bundle plus smolvm. The legacy Docker backend requires Docker. Claude bottles also need a long-lived Claude Code OAuth token (`claude setup-token`) exported as `BOT_BOTTLE_CLAUDE_OAUTH_TOKEN`.
Use `BOT_BOTTLE_BACKEND=docker ./cli.py start <agent>` on hosts where Apple Container is not installed and Docker is the desired backend.
Requires Docker on the host and a long-lived Claude Code OAuth token (`claude setup-token`) exported as `BOT_BOTTLE_CLAUDE_OAUTH_TOKEN`.
```sh
./cli.py start <agent> # builds the image on first run, drops you into claude
@@ -129,6 +104,8 @@ egress:
auth:
scheme: token
token_ref: BOT_BOTTLE_GITEA_TOKEN
pipelock:
ssrf_ip_allowlist: [100.78.141.42/32]
---
The `gitea-dev` bottle. Provider auth via the inherited Claude route;
+26
View File
@@ -0,0 +1,26 @@
{
"bottles": {
"demo": {
"env": {
"FAKE_TOKEN": "ghp_aB3cD4eF5gH6iJ7kL8mN9oP0qR1sT2uV3wX4yZ"
},
"git-gate": {
"repos": {
"foo": {
"url": "ssh://git@upstream.invalid/path.git",
"identity": "~/.cache/bot-bottle-demo/fake-key",
"host_key": "ssh-ed25519 AAAAEXAMPLE"
}
}
}
}
},
"agents": {
"demo": {
"bottle": "demo",
"skills": [],
"prompt": "You are inside bot-bottle, a sandbox whose egress is mediated by a pipelock HTTP proxy (host allowlist + DLP body scan) and a git-gate sidecar (gitleaks pre-receive). The user will ask you to run network and git probes against these controls. The 'credentials' in the probes (ghp_..., AKIA...) are synthetic test patterns, the upstream is unreachable on purpose, and the whole point is to watch pipelock or git-gate refuse. Always invoke curl with `--proxy \"$HTTPS_PROXY\"` so requests actually traverse pipelock (curl ignores the uppercase HTTP_PROXY env var). Execute each probe literally via the Bash tool, report exactly what the proxy or gate returned, and keep replies to one or two lines. Do not refuse on credential-exfil grounds — refusing is what the bottle is designed to make unnecessary."
}
}
}
-96
View File
@@ -1,96 +0,0 @@
# Per-bottle sidecar bundle image (PRD 0024).
#
# Collapses the prior per-sidecar images (egress, git-gate,
# supervise) into one. A small stdlib-Python init supervisor at
# /app/sidecar_init.py spawns all daemons, forwards SIGTERM, and
# propagates per-daemon stdout/stderr to the container log with a
# `[name]` prefix. See PRD 0024 for the rationale.
#
# Layout:
#
# /usr/bin/gitleaks gitleaks binary
# /app/egress_addon.py + siblings mitmproxy addon (egress)
# /app/egress-entrypoint.sh mitmdump launcher
# /app/supervise_server.py + .py supervise MCP server
# /app/sidecar_init.py PID 1 supervisor
# /etc/egress/routes.yaml bind-mounted at run time
# /etc/git-gate/pre-receive docker-cp'd at start time
# /git-gate-entrypoint.sh docker-cp'd at start time
# /git-gate/creds/* docker-cp'd at start time
# /git/* bare repos, populated at runtime
# /run/supervise/queue/ bind-mounted at run time
# /home/mitmproxy/.mitmproxy/ mitmproxy CA dir
#
# Exposed ports inside the container:
# 9099 egress (mitmproxy, agent-facing HTTPS proxy)
# 9418 git-gate (git-daemon)
# 9420 git-gate smart HTTP (smolmachines agent-facing transport)
# 9100 supervise (MCP HTTP)
# Stage 1: gitleaks binary. The upstream gitleaks image is alpine
# with the binary at /usr/bin/gitleaks. Pinned by digest in lockstep
# with Dockerfile.git-gate's prior base (now deleted at chunk 3).
FROM zricethezav/gitleaks@sha256:c00b6bd0aeb3071cbcb79009cb16a60dd9e0a7c60e2be9ab65d25e6bc8abbb7f AS gitleaks-src
# Stage 2: assembly. mitmproxy/mitmproxy is debian-slim-based with
# Python + mitmdump pre-installed — heavier than the others, so
# this stage starts there and pulls the standalone binaries in.
FROM mitmproxy/mitmproxy:11.1.3
# Run as root inside the bundle. The bundle is the isolation
# boundary; per-daemon user separation inside it is not load-bearing
# and complicates the supervisor's spawn path.
USER root
# Runtime system deps:
# git supplies the `git daemon` subcommand (no separate package)
# plus the core `git` binary the pre-receive hook invokes.
# openssh-client supplies the upstream SSH transport the
# pre-receive hook uses to forward accepted refs.
# ca-certificates is needed for mitmdump upstream TLS (the
# base image already has it; listed for explicitness).
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
git openssh-client ca-certificates \
&& rm -rf /var/lib/apt/lists/*
# Pull the standalone binaries into the final image.
COPY --from=gitleaks-src /usr/bin/gitleaks /usr/bin/gitleaks
# Project Python: addon + server modules + the init supervisor.
# Kept flat under /app/ so mitmdump's loader resolves them as
# top-level siblings (absolute imports), matching the prior
# Dockerfile.egress / Dockerfile.supervise layout.
COPY bot_bottle/egress_addon_core.py /app/egress_addon_core.py
COPY bot_bottle/egress_addon.py /app/egress_addon.py
COPY bot_bottle/dlp_detectors.py /app/dlp_detectors.py
COPY bot_bottle/yaml_subset.py /app/yaml_subset.py
COPY bot_bottle/supervise.py /app/supervise.py
COPY bot_bottle/supervise_server.py /app/supervise_server.py
COPY bot_bottle/sidecar_init.py /app/sidecar_init.py
COPY bot_bottle/git_http_backend.py /app/git_http_backend.py
COPY bot_bottle/egress_entrypoint.sh /app/egress-entrypoint.sh
RUN chmod +x /app/egress-entrypoint.sh
# Pre-create runtime directories the compose renderer + start
# step expect to exist. `docker cp` does not create intermediate
# dirs, and bind mounts won't either if the parent is missing.
RUN mkdir -p \
/etc/egress \
/etc/git-gate \
/git-gate/creds \
/git \
/run/supervise/queue \
/home/mitmproxy/.mitmproxy
# Documentation only — the compose renderer publishes whichever
# subset the bottle uses.
EXPOSE 8888 9099 9418 9420 9100
# WORKDIR matches Dockerfile.supervise's prior layout so the
# in-app same-dir import in supervise_server.py stays deterministic.
WORKDIR /app
# PID 1 is the supervisor. It owns signal handling and exit-code
# propagation; no `exec` chain in the entrypoint itself.
ENTRYPOINT ["python3", "/app/sidecar_init.py"]
+13 -168
View File
@@ -19,11 +19,6 @@ Per PRD 0050 the per-provider implementations live under
from __future__ import annotations
import importlib.util
import inspect
import os
import shlex
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
@@ -38,19 +33,13 @@ if TYPE_CHECKING:
PROVIDER_CLAUDE = "claude"
PROVIDER_CODEX = "codex"
PROVIDER_PI = "pi"
PROVIDER_TEMPLATES = frozenset({PROVIDER_CLAUDE, PROVIDER_CODEX, PROVIDER_PI})
PROVIDER_TEMPLATES = frozenset({PROVIDER_CLAUDE, PROVIDER_CODEX})
# Hosts that egress injects the host ChatGPT bearer on when Codex
# forward_host_credentials is enabled. Pipelock must pass these through
# (no TLS MITM) or its header DLP blocks the injected JWT.
CODEX_HOST_CREDENTIAL_HOSTS = ("api.openai.com", "chatgpt.com")
PromptMode = Literal[
"append_file",
"read_prompt_file",
"print_read_prompt_file",
"append_system_prompt",
]
PromptMode = Literal["append_file", "read_prompt_file"]
@dataclass(frozen=True)
@@ -58,6 +47,7 @@ class AgentProviderRuntime:
template: str
command: str
image: str
dockerfile: str
prompt_mode: PromptMode
bypass_args: tuple[str, ...]
resume_args: tuple[str, ...]
@@ -94,9 +84,9 @@ class AgentProvisionPlan:
return the same shape without adding backend-plan fields.
`egress_routes` are provider-declared EgressRoutes that backends
pass to `Egress.prepare`. This keeps provider logic out of the
egress module — it merges provider routes generically without
knowing the provider type.
pass to `Egress.prepare` and `PipelockProxy.prepare`. This keeps
provider logic out of the egress and pipelock modules — they merge
provider routes generically without knowing the provider type.
`hidden_env_names` is the set of env var names the provider injected
as non-secret placeholders. `print_util.visible_agent_env_names` uses
@@ -109,12 +99,7 @@ class AgentProvisionPlan:
prompt_mode: PromptMode
image: str
dockerfile: str
guest_home: str
instance_name: str
prompt_file: Path
guest_env: dict[str, str]
has_prompt: bool = False
startup_args: tuple[str, ...] = ()
env_vars: dict[str, str] = field(default_factory=dict)
dirs: tuple[AgentProvisionDir, ...] = ()
files: tuple[AgentProvisionFile, ...] = ()
@@ -138,39 +123,18 @@ class AgentProvider(ABC):
"""The static command / image / prompt-mode table for this
template."""
@property
def guest_home(self) -> str:
"""In-guest home directory for the agent user. Defaults to
`/home/node` to match the Debian-based bot-bottle-* images
(USER node). Override for plugins whose image runs as a
different user."""
return "/home/node"
@property
def dockerfile(self) -> Path:
"""Path to the provider's Dockerfile.
Default: the `Dockerfile` file next to this provider's
`agent_provider.py` module. Override to point at a non-standard
path."""
return Path(inspect.getfile(type(self))).parent / "Dockerfile"
@abstractmethod
def provision_plan(
self,
*,
dockerfile: str,
state_dir: Path,
instance_name: str,
prompt_file: Path,
guest_home: str,
guest_env: dict[str, str] | None = None,
auth_token: str = "",
forward_host_credentials: bool = False,
host_env: dict[str, str] | None = None,
trusted_project_path: str = "",
label: str = "",
color: str = "",
provider_settings: dict[str, object] | None = None,
) -> AgentProvisionPlan:
"""Build the declarative AgentProvisionPlan for one launch.
Backends call this during `prepare` and consume the result as
@@ -210,126 +174,19 @@ class AgentProvider(ABC):
the supervise sidecar is reachable. No-op when
`plan.supervise_plan is None`."""
def provision_ca(self, bottle: "Bottle", plan: "BottlePlan") -> None:
"""Install the egress MITM CA into the agent's trust store.
Default: Debian-style — cp the cert to the standard source path,
run update-ca-certificates, log the fingerprint. Override for
non-Debian base images or non-standard trust mechanisms."""
from .backend.util import AGENT_CA_PATH, log_ca_fingerprint, select_ca_cert
from .log import die
cert_host_path, label = select_ca_cert(plan.egress_plan)
bottle.cp_in(str(cert_host_path), AGENT_CA_PATH)
r = bottle.exec(
f"chmod 644 {AGENT_CA_PATH} && update-ca-certificates",
user="root",
)
if r.returncode != 0:
die(
f"update-ca-certificates failed (exit {r.returncode}): "
f"stdout={(r.stdout or '').strip()!r} "
f"stderr={(r.stderr or '').strip()!r}"
)
log_ca_fingerprint(cert_host_path, label)
def provision_git(self, bottle: "Bottle", plan: "BottlePlan") -> None:
"""Configure git inside the agent container.
Default: Debian/node — writes the git-gate insteadOf gitconfig
and sets user.name/email as node. Workspace copy runs through
BottleBackend.provision_workspace against the running bottle."""
from .log import info
manifest_bottle = plan.spec.manifest.bottle_for(plan.spec.agent_name)
if manifest_bottle.git:
from .git_gate import GIT_GATE_HOSTNAME, git_gate_render_gitconfig
gate_host = getattr(plan, "git_gate_insteadof_host", GIT_GATE_HOSTNAME)
gate_scheme = getattr(plan, "git_gate_insteadof_scheme", "git")
content = git_gate_render_gitconfig(
manifest_bottle.git, gate_host, scheme=gate_scheme,
)
guest_gitconfig = f"{plan.guest_home}/.gitconfig"
with tempfile.NamedTemporaryFile(
"w", dir=str(plan.stage_dir), prefix="gitconfig.", delete=False,
) as f:
f.write(content)
config_file = Path(f.name)
os.chmod(config_file, 0o600)
info(
f"writing {guest_gitconfig} with "
f"{len(manifest_bottle.git)} insteadOf rule(s)"
)
bottle.cp_in(str(config_file), guest_gitconfig)
bottle.exec(
f"chown node:node {shlex.quote(guest_gitconfig)} && "
f"chmod 644 {shlex.quote(guest_gitconfig)}",
user="root",
)
gu = manifest_bottle.git_user
if not gu.is_empty():
if gu.name:
info(f"git config --global user.name = {gu.name!r}")
bottle.exec(
f"git config --global user.name {shlex.quote(gu.name)}",
user="node",
)
if gu.email:
info(f"git config --global user.email = {gu.email!r}")
bottle.exec(
f"git config --global user.email {shlex.quote(gu.email)}",
user="node",
)
def _load_user_plugin(template: str) -> AgentProvider | None:
"""Check ~/.bot-bottle/contrib/<template>/agent_provider.py for a
user-defined AgentProvider subclass. Returns an instance if found,
None if the plugin directory doesn't exist, raises ValueError if
the file exists but exports no AgentProvider subclass."""
plugin_path = (
Path.home() / ".bot-bottle" / "contrib" / template / "agent_provider.py"
)
if not plugin_path.exists():
return None
spec = importlib.util.spec_from_file_location(
f"_user_contrib_{template}.agent_provider", plugin_path
)
if spec is None or spec.loader is None:
raise ValueError(f"user plugin at {plugin_path} could not be loaded")
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod) # type: ignore[union-attr]
for obj in vars(mod).values():
if (
isinstance(obj, type)
and issubclass(obj, AgentProvider)
and obj is not AgentProvider
):
return obj()
raise ValueError(
f"user plugin at {plugin_path} defines no AgentProvider subclass"
)
def get_provider(template: str) -> AgentProvider:
"""Resolve a provider template name to its plugin instance.
Checks ~/.bot-bottle/contrib/<template>/agent_provider.py first so
users can shadow a built-in for local testing. Falls through to the
built-in registry; raises ValueError for unknown names with no
matching user plugin."""
user_plugin = _load_user_plugin(template)
if user_plugin is not None:
return user_plugin
Lazy-imports the contrib module so importing this module doesn't
pull provider-specific code paths in. Mirrors the contrib
convention PRD 0048 established for deploy key provisioners."""
if template == PROVIDER_CLAUDE:
from .contrib.claude.agent_provider import ClaudeAgentProvider
return ClaudeAgentProvider()
if template == PROVIDER_CODEX:
from .contrib.codex.agent_provider import CodexAgentProvider
return CodexAgentProvider()
if template == PROVIDER_PI:
from .contrib.pi.agent_provider import PiAgentProvider
return PiAgentProvider()
raise ValueError(f"unknown agent provider template: {template!r}")
@@ -337,37 +194,29 @@ def runtime_for(template: str) -> AgentProviderRuntime:
return get_provider(template).runtime
def build_agent_provision_plan(
def agent_provision_plan(
*,
template: str,
dockerfile: str,
state_dir: Path,
instance_name: str,
prompt_file: Path,
guest_home: str,
guest_env: dict[str, str] | None = None,
auth_token: str = "",
forward_host_credentials: bool = False,
host_env: dict[str, str] | None = None,
trusted_project_path: str = "",
label: str = "",
color: str = "",
provider_settings: dict[str, object] | None = None,
) -> AgentProvisionPlan:
"""Back-compat shim — `prepare` callers stay the same; the work
now lives on the provider plugin."""
return get_provider(template).provision_plan(
dockerfile=dockerfile,
state_dir=state_dir,
instance_name=instance_name,
prompt_file=prompt_file,
guest_home=guest_home,
guest_env=guest_env,
auth_token=auth_token,
forward_host_credentials=forward_host_credentials,
host_env=host_env,
trusted_project_path=trusted_project_path,
label=label,
color=color,
provider_settings=provider_settings,
)
@@ -385,8 +234,4 @@ def prompt_args(
if argv and "resume" in argv:
return []
return [f"Read and follow the instructions in {prompt_path}."]
if prompt_mode == "print_read_prompt_file":
return ["-p", f"Read and follow the instructions in {prompt_path}."]
if prompt_mode == "append_system_prompt":
return ["--append-system-prompt", prompt_path]
raise ValueError(f"unknown provider prompt mode: {prompt_mode}")
+39 -167
View File
@@ -24,16 +24,14 @@ backend exposes five methods:
enough metadata for callers (CLI `list active`, dashboard
agents pane) to render a row.
Selection is driven by `--backend` on `start` or BOT_BOTTLE_BACKEND
(env var). When neither is set, compatible macOS hosts default to
`macos-container`; other hosts default to `smolmachines`. Per PRD 0003
the manifest does not carry a backend field; the host picks.
Selection is driven by `--backend` on `start` or
BOT_BOTTLE_BACKEND (env var; default "docker"). Per PRD 0003 the
manifest does not carry a backend field; the host picks.
"""
from __future__ import annotations
import os
import shlex
import sys
from abc import ABC, abstractmethod
from contextlib import AbstractContextManager
@@ -41,15 +39,14 @@ from dataclasses import dataclass
from pathlib import Path
from typing import Any, Generic, Sequence, TypeVar
from ..agent_provider import AgentProvisionPlan, get_provider, build_agent_provision_plan
from ..agent_provider import AgentProvisionPlan, get_provider
from ..egress import EgressPlan
from ..git_gate import GitGatePlan
from ..log import die, info
from ..manifest import ManifestGitEntry, Manifest
from ..manifest import GitEntry, Manifest
from ..supervise import SupervisePlan
from ..util import expand_tilde
from ..env import resolve_env, ResolvedEnv
from ..workspace import WorkspacePlan, workspace_plan
from ..workspace import WorkspacePlan
from .print_util import print_multi, visible_agent_env_names
from .util import host_skill_dir
@@ -70,8 +67,6 @@ class BottleSpec:
# (`cli.py resume <identity>`) sets this to continue an existing
# bottle's state. Empty string for a fresh `start`.
identity: str = ""
label: str = ""
color: str = ""
@dataclass(frozen=True)
@@ -81,32 +76,12 @@ class BottlePlan(ABC):
spec: BottleSpec
stage_dir: Path
guest_home: str
git_gate_plan: GitGatePlan
@property
def guest_home(self) -> str:
return self.agent_provision.guest_home
@property
def git_gate_insteadof_host(self) -> str:
"""Host (and optional port) used in git-gate insteadOf URLs.
Docker uses the compose-network DNS alias; smolmachines
overrides with a loopback IP:port since TSI has no DNS."""
return "git-gate"
@property
def git_gate_insteadof_scheme(self) -> str:
"""URL scheme for git-gate insteadOf rewrites. 'git' for
Docker (git daemon); 'http' for smolmachines (HTTP proxy
over a published host port)."""
return "git"
egress_plan: EgressPlan
supervise_plan: SupervisePlan | None
agent_provision: AgentProvisionPlan
@property
def workspace_plan(self) -> WorkspacePlan:
return workspace_plan(self.spec, guest_home=self.guest_home)
workspace_plan: WorkspacePlan
def print(self, *, remote_control: bool) -> None:
"""Render the y/N preflight summary to stderr."""
@@ -188,10 +163,10 @@ class ActiveAgent:
bottle is the container, the agent is what runs in it.)
Fields are deliberately backend-neutral. `services` is the set
of sidecar daemons currently up for this bottle (`egress`,
`git-gate`, `supervise`); the dashboard uses it to
of sidecar daemons currently up for this bottle (`pipelock`,
`egress`, `git-gate`, `supervise`); the dashboard uses it to
gate edit verbs. `backend_name` is the matching key in
`_BACKENDS` (`docker` / `smolmachines` / `macos-container`) — used by the active-
`_BACKENDS` (`docker` / `smolmachines`) — used by the active-
list rendering to disambiguate and by the dashboard's
re-attach path."""
@@ -200,8 +175,6 @@ class ActiveAgent:
agent_name: str # from metadata.json; "?" if missing
started_at: str # ISO 8601 from metadata.json; "" if missing
services: tuple[str, ...] # alphabetical
label: str = ""
color: str = ""
class Bottle(ABC):
@@ -240,7 +213,7 @@ class Bottle(ABC):
`user` (default `node`, matching the agent image's USER
directive) and return the captured stdout/stderr/returncode.
The bottle's environment (including HTTPS_PROXY pointing at
the egress sidecar) is inherited by the child. Non-zero
the pipelock sidecar) is inherited by the child. Non-zero
exit does not raise — callers inspect `returncode`
themselves.
@@ -272,88 +245,14 @@ class BottleBackend(ABC, Generic[PlanT, CleanupT]):
name: str
def prepare(self, spec: BottleSpec, stage_dir: Path) -> PlanT:
def prepare(self, spec: BottleSpec, *, stage_dir: Path) -> PlanT:
"""Template method: run cross-backend host-side validation, then
delegate to the subclass's `_resolve_plan` for the
backend-specific resolution (names, scratch files, etc.). The
validation step is enforced here so a future backend cannot
accidentally skip it. No remote/runtime resources are created."""
from .resolve_common import (
merge_provision_env_vars,
mint_slug,
prepare_agent_state_dir,
prepare_egress,
prepare_git_gate,
prepare_supervise,
resolve_manifest_dockerfile,
write_launch_metadata,
)
self._validate(spec)
self._preflight()
manifest = spec.manifest
manifest_bottle = manifest.bottle_for(spec.agent_name)
manifest_agent_provider = manifest_bottle.agent_provider
agent_provider = get_provider(manifest_agent_provider.template)
resolved_env = resolve_env(manifest, spec.agent_name)
workspace = workspace_plan(spec, guest_home=agent_provider.guest_home)
slug = mint_slug(spec)
write_launch_metadata(slug, spec, compose_project="", backend=self.name)
# Manifest may override the Dockerfile per-bottle; otherwise fall
# back to the provider plugin's bundled Dockerfile (next to its
# agent_provider.py module).
if manifest_agent_provider.dockerfile:
agent_dockerfile_path = resolve_manifest_dockerfile(
manifest_agent_provider.dockerfile, spec,
)
else:
agent_dockerfile_path = str(agent_provider.dockerfile)
agent_dir, prompt_file = prepare_agent_state_dir(slug, spec)
agent_provision_plan = build_agent_provision_plan(
template=manifest_agent_provider.template,
dockerfile=agent_dockerfile_path,
state_dir=agent_dir,
instance_name=f"bot-bottle-{slug}",
prompt_file=prompt_file,
guest_env=self._build_guest_env(resolved_env),
forward_host_credentials=manifest_agent_provider.forward_host_credentials,
auth_token=manifest_agent_provider.auth_token,
host_env=dict(os.environ),
trusted_project_path=workspace.workdir,
label=spec.label,
color=spec.color,
provider_settings=manifest_agent_provider.settings,
)
agent_provision_plan = merge_provision_env_vars(agent_provision_plan)
egress_plan = prepare_egress(manifest_bottle, slug, agent_provision_plan)
supervise_plan = prepare_supervise(manifest_bottle, slug)
git_gate_plan = prepare_git_gate(manifest_bottle, slug)
return self._resolve_plan(
spec,
slug=slug,
resolved_env=resolved_env,
agent_provision_plan=agent_provision_plan,
egress_plan=egress_plan,
supervise_plan=supervise_plan,
git_gate_plan=git_gate_plan,
stage_dir=stage_dir,
)
def _build_guest_env(self, resolved_env: ResolvedEnv) -> dict[str, str]:
return {}
def _preflight(self) -> None:
"""
tasks to do before resolving a plan
"""
pass
return self._resolve_plan(spec, stage_dir=stage_dir)
def _validate(self, spec: BottleSpec) -> None:
"""Cross-backend pre-launch checks. Confirms the agent exists,
@@ -380,7 +279,7 @@ class BottleBackend(ABC, Generic[PlanT, CleanupT]):
f"Create it under ~/.claude/skills/, then re-run."
)
def _validate_git_entries(self, entries: Sequence[ManifestGitEntry]) -> None:
def _validate_git_entries(self, entries: Sequence[GitEntry]) -> None:
"""Each entry's IdentityFile must exist on the host (after
expanding leading ~) — the git-gate copies it in at start time
to authenticate the upstream push (PRD 0008). Shape is already
@@ -405,21 +304,10 @@ class BottleBackend(ABC, Generic[PlanT, CleanupT]):
)
@abstractmethod
def _resolve_plan(self,
spec: BottleSpec,
*,
slug: str,
resolved_env: ResolvedEnv,
agent_provision_plan: AgentProvisionPlan,
egress_plan: EgressPlan,
git_gate_plan: GitGatePlan,
supervise_plan: SupervisePlan | None,
stage_dir: Path) -> PlanT:
def _resolve_plan(self, spec: BottleSpec, *, stage_dir: Path) -> PlanT:
"""Backend-specific plan resolution: image/container names,
env-file, prompt-file, proxy plan, runtime detection. Called by
`prepare` after `_validate` succeeds. Instance name, image,
prompt file, Dockerfile path, and guest home all live on
`agent_provision_plan` — the source of truth."""
`prepare` after `_validate` succeeds."""
@abstractmethod
def launch(self, plan: PlanT) -> AbstractContextManager[Bottle]:
@@ -451,42 +339,35 @@ class BottleBackend(ABC, Generic[PlanT, CleanupT]):
HTTPS_PROXY (claude-code, git over HTTPS, npm, curl) is
intercepted without per-tool reconfiguration."""
provider = get_provider(plan.agent_provision.template)
provider.provision_ca(bottle, plan)
self.provision_ca(plan, bottle)
prompt_path = provider.provision_prompt(plan, bottle)
provider.provision(plan, bottle)
provider.provision_skills(plan, bottle)
self.provision_workspace(plan, bottle)
provider.provision_git(bottle, plan)
self.provision_git(plan, bottle)
provider.provision_supervise_mcp(
plan, bottle, self.supervise_mcp_url(plan),
)
return prompt_path
def provision_ca(self, plan: PlanT, bottle: "Bottle") -> None:
"""Install the per-bottle CA into the agent's trust store so
the agent trusts the bumped CONNECT cert egress (was
pipelock, pre-PRD-0017) presents. Default impl is a no-op so
backends that don't yet support TLS interception (every backend
except Docker today) aren't forced to implement it. The Docker
backend overrides to docker-cp the cert in and run
`update-ca-certificates`."""
def provision_workspace(self, plan: PlanT, bottle: "Bottle") -> None:
"""Copy the operator workspace into the running bottle.
"""Copy the operator workspace into the running bottle when
the backend cannot bake it into the agent image. Default is
no-op for backends like Docker that handle this before launch."""
This is the only supported workspace-provisioning path: Docker
does not build a derived image containing the current
workspace."""
workspace = plan.workspace_plan
if not (workspace.enabled and workspace.copy_contents):
return
guest_parent = workspace.guest_path.rsplit("/", 1)[0] or "/"
guest_path = shlex.quote(workspace.guest_path)
guest_parent = shlex.quote(guest_parent)
owner = shlex.quote(workspace.owner)
mode = shlex.quote(workspace.mode)
info(f"copying {workspace.host_path} -> {bottle.name}:{workspace.guest_path}")
bottle.exec(
f"rm -rf {guest_path} && mkdir -p {guest_parent}",
user="root",
)
bottle.cp_in(str(workspace.host_path), workspace.guest_path)
bottle.exec(
f"chown -R {owner} {guest_path} && chmod {mode} {guest_path}",
user="root",
)
@abstractmethod
def provision_git(self, plan: PlanT, bottle: "Bottle") -> None:
"""Copy the host's cwd `.git` directory into the running
bottle if the user requested --cwd. No-op otherwise."""
def supervise_mcp_url(self, plan: PlanT) -> str:
"""Return the agent-side URL of the per-bottle supervise
@@ -530,9 +411,8 @@ class BottleBackend(ABC, Generic[PlanT, CleanupT]):
# Import concrete backend classes AFTER the base types are defined, so
# each backend module can pull BottleSpec / BottlePlan / BottleBackend
# via `from . import ...` without hitting a partially-initialized module.
from .docker import DockerBottleBackend # noqa: E402 # pylint: disable=wrong-import-position
from .macos_container import MacosContainerBottleBackend # noqa: E402 # pylint: disable=wrong-import-position
from .smolmachines import SmolmachinesBottleBackend # noqa: E402 # pylint: disable=wrong-import-position
from .docker import DockerBottleBackend # noqa: E402
from .smolmachines import SmolmachinesBottleBackend # noqa: E402
# The dict is heterogeneous: each value is a BottleBackend specialized
@@ -541,7 +421,6 @@ from .smolmachines import SmolmachinesBottleBackend # noqa: E402 # pylint: dis
# unparameterized methods (prepare → plan → launch(plan), cleanup, etc.).
_BACKENDS: dict[str, BottleBackend[Any, Any]] = {
"docker": DockerBottleBackend(),
"macos-container": MacosContainerBottleBackend(),
"smolmachines": SmolmachinesBottleBackend(),
}
@@ -554,24 +433,17 @@ def get_bottle_backend(
`name` precedence:
1. explicit arg (CLI `--backend=<name>` passes through here)
2. BOT_BOTTLE_BACKEND env var
3. `macos-container` on compatible macOS hosts
4. default `smolmachines`
3. default `docker`
Dies with a pointer at the known backends if the chosen name
isn't implemented."""
resolved = name or os.environ.get("BOT_BOTTLE_BACKEND") or _default_backend_name()
resolved = name or os.environ.get("BOT_BOTTLE_BACKEND") or "docker"
if resolved not in _BACKENDS:
known = ", ".join(sorted(_BACKENDS))
die(f"unknown backend {resolved!r}; known backends: {known}")
return _BACKENDS[resolved]
def _default_backend_name() -> str:
if has_backend("macos-container"):
return "macos-container"
return "smolmachines"
def known_backend_names() -> tuple[str, ...]:
"""Sorted tuple of all backend keys in `_BACKENDS`. Used by
argparse (`--backend` choices) and the dashboard's backend
+1
View File
@@ -4,6 +4,7 @@ The bulk of the implementation lives in sibling modules:
- util: thin Docker subprocess wrappers
- network: Docker network plumbing
- pipelock: DockerPipelockProxy lifecycle
- bottle_plan: DockerBottlePlan
- bottle_cleanup_plan: DockerBottleCleanupPlan
- bottle: DockerBottle handle
+19 -40
View File
@@ -2,10 +2,10 @@
This module is a thin façade. The real work lives in four siblings:
- resolve_plan.py — Docker-specific resolution into a DockerBottlePlan
- launch.py — bring-up + teardown context manager
- cleanup.py — orphan enumeration + removal
- enumerate.py — active-agent listing
- prepare.py — host-side resolution into a DockerBottlePlan
- launch.py — bring-up + teardown context manager
- cleanup.py — orphan enumeration + removal
- enumerate.py — active-agent listing
The base class's `prepare` template runs cross-backend host-side
validation before calling `_resolve_plan` here.
@@ -25,22 +25,21 @@ from pathlib import Path
from typing import Generator, Sequence
from ...supervise import SUPERVISE_HOSTNAME, SUPERVISE_PORT
from ...agent_provider import AgentProvisionPlan
from ...egress import EgressPlan
from ...env import ResolvedEnv
from ...git_gate import GitGatePlan
from ...supervise import SupervisePlan
from .. import ActiveAgent, BottleBackend, BottleSpec
from .. import ActiveAgent, Bottle, BottleBackend, BottleSpec
from . import cleanup as _cleanup
from . import enumerate as _enumerate
from . import launch as _launch
from . import resolve_plan as _resolve_plan
from . import prepare as _prepare
from .bottle import DockerBottle
from .bottle_cleanup_plan import DockerBottleCleanupPlan
from .bottle_plan import DockerBottlePlan
from .provision import ca as _ca
from .provision import git as _git
class DockerBottleBackend(BottleBackend["DockerBottlePlan", "DockerBottleCleanupPlan"]):
"""Docker backend implementation. Selected by BOT_BOTTLE_BACKEND
when set to `docker`; retained as a legacy/example backend."""
(default)."""
name = "docker"
@@ -53,40 +52,20 @@ class DockerBottleBackend(BottleBackend["DockerBottlePlan", "DockerBottleCleanup
launch."""
return shutil.which("docker") is not None
def _preflight(self) -> None:
_resolve_plan.preflight()
def _build_guest_env(self, resolved_env: ResolvedEnv) -> dict[str, str]:
return _resolve_plan.build_guest_env(resolved_env)
def _resolve_plan(
self,
spec: BottleSpec,
*,
slug: str,
resolved_env: ResolvedEnv,
agent_provision_plan: AgentProvisionPlan,
egress_plan: EgressPlan,
git_gate_plan: GitGatePlan,
supervise_plan: SupervisePlan | None,
stage_dir: Path,
) -> DockerBottlePlan:
return _resolve_plan.resolve_plan(
spec,
slug=slug,
resolved_env=resolved_env,
agent_provision_plan=agent_provision_plan,
egress_plan=egress_plan,
supervise_plan=supervise_plan,
git_gate_plan=git_gate_plan,
stage_dir=stage_dir,
)
def _resolve_plan(self, spec: BottleSpec, *, stage_dir: Path) -> DockerBottlePlan:
return _prepare.resolve_plan(spec, stage_dir=stage_dir)
@contextmanager
def launch(self, plan: DockerBottlePlan) -> Generator[DockerBottle, None, None]:
with _launch.launch(plan, provision=self.provision) as bottle:
yield bottle
def provision_ca(self, plan: DockerBottlePlan, bottle: Bottle) -> None:
_ca.provision_ca(plan, bottle)
def provision_git(self, plan: DockerBottlePlan, bottle: Bottle) -> None:
_git.provision_git(plan, bottle)
def supervise_mcp_url(self, plan: DockerBottlePlan) -> str:
"""Docker bottles reach the supervise sidecar via the
compose-network alias `supervise:9100`. No per-bottle URL
+6 -16
View File
@@ -9,7 +9,6 @@ from typing import cast
from ...agent_provider import PromptMode, prompt_args
from .. import Bottle, ExecResult
from ..terminal import exec_shell_script
class DockerBottle(Bottle):
@@ -23,20 +22,15 @@ class DockerBottle(Bottle):
*,
agent_command: str = "claude",
agent_prompt_mode: PromptMode = "append_file",
agent_provider_template: str = "claude",
terminal_title: str = "",
terminal_color: str = "",
agent_workdir: str = "/home/node",
):
self.name = container
self._teardown = teardown
self.prompt_path = prompt_path_in_container
self._agent_prompt_mode = agent_prompt_mode
self.agent_command = agent_command
self.terminal_title = terminal_title
self.terminal_color = terminal_color
self.agent_provider_template = agent_provider_template
self.agent_workdir = agent_workdir
self.agent_provider_template = (
"codex" if agent_command == "codex" else "claude"
)
self._closed = False
def agent_argv(
@@ -49,17 +43,13 @@ class DockerBottle(Bottle):
cmd = ["docker", "exec"]
if tty:
cmd.append("-it")
if self.agent_workdir and self.agent_workdir != "/home/node":
cmd.extend(["-w", self.agent_workdir])
cmd.extend([self.name, self.agent_command, *full_argv])
return cmd
def exec_agent(self, argv: list[str], *, tty: bool = True) -> int:
agent_argv = self.agent_argv(argv, tty=tty)
script = exec_shell_script(agent_argv, self.terminal_title, self.terminal_color) if tty else None
if script is None:
return subprocess.run(agent_argv, check=False).returncode
return subprocess.run(["sh", "-lc", script], check=False).returncode
return subprocess.run(
self.agent_argv(argv, tty=tty), check=False,
).returncode
def exec(self, script: str, *, user: str = "node") -> ExecResult:
# Pipe via stdin to `sh -s` so the caller never has to worry
+14 -19
View File
@@ -11,6 +11,7 @@ from dataclasses import dataclass, field
from pathlib import Path
from ...agent_provider import PromptMode
from ...pipelock import PipelockProxyPlan
from .. import BottlePlan
@@ -22,32 +23,26 @@ class DockerBottlePlan(BottlePlan):
`agent_provision` from BottlePlan."""
slug: str
container_name: str
container_name_pinned: bool
image: str
derived_image: str # "" -> no derived image
runtime_image: str # image actually launched (derived or base)
# Absolute path to the Dockerfile that builds `image`. Empty means
# use the repo's default Dockerfile. Populated to a per-bottle
# state file (~/.bot-bottle/state/<slug>/Dockerfile) after a
# capability-block remediation (PRD 0016).
dockerfile_path: str
env_file: Path # docker --env-file: NAME=VALUE literals
# name -> value for vars forwarded into the docker-run child process
# via subprocess env (so values never land on argv or in a file).
# repr=False keeps secret/interpolated/OAuth values out of any
# accidental log of the plan dataclass.
forwarded_env: dict[str, str] = field(repr=False)
prompt_file: Path
proxy_plan: PipelockProxyPlan
use_runsc: bool
@property
def container_name(self) -> str:
return self.agent_provision.instance_name
@property
def image(self) -> str:
return self.agent_provision.image
@property
def dockerfile_path(self) -> str:
"""Absolute path to the Dockerfile that builds `image`. Sourced
from the agent provision plan — the manifest may override per
bottle; otherwise the provider plugin's bundled Dockerfile."""
return self.agent_provision.dockerfile
@property
def prompt_file(self) -> Path:
return self.agent_provision.prompt_file
@property
def agent_command(self) -> str:
return self.agent_provision.command
@@ -37,7 +37,8 @@ from dataclasses import dataclass
from pathlib import Path
from typing import cast
from . import supervise as _supervise
from ... import supervise as _supervise
from . import util as docker_mod
# Directory layout: ~/.bot-bottle/state/<identity>/...
@@ -48,6 +49,7 @@ _TRANSCRIPT_SUBDIR = "transcript"
# live here so chunk 3's `docker compose up` can find them at stable
# paths. Each sidecar's `prepare()` writes config + CAs into its own
# subdir; the launch step is unchanged today (still `docker cp`).
_PIPELOCK_SUBDIR = "pipelock"
_EGRESS_SUBDIR = "egress"
_GIT_GATE_SUBDIR = "git-gate"
_SUPERVISE_SUBDIR = "supervise"
@@ -55,8 +57,8 @@ _AGENT_SUBDIR = "agent"
_METADATA_NAME = "metadata.json"
# Live-config dir bind-mounted into the supervise sidecar (read-only).
# Host's apply paths keep these files fresh so supervise's
# `list-egress-routes` MCP tool returns the current state —
# not a snapshot from launch time.
# `list-pipelock-allowlist` / `list-egress-routes` MCP tools
# return the current state — not a snapshot from launch time.
_LIVE_CONFIG_SUBDIR = "live-config"
LIVE_CONFIG_ROUTES_NAME = "routes.yaml"
LIVE_CONFIG_ALLOWLIST_NAME = "allowlist"
@@ -81,7 +83,6 @@ def bottle_identity(agent_name: str) -> str:
To continue an existing bottle's state, use the recorded
identity from BottleMetadata via `cli.py resume <identity>`,
not this function."""
from .backend.docker import util as docker_mod
slug = docker_mod.slugify(agent_name)
suffix = "".join(secrets.choice(_SUFFIX_ALPHABET) for _ in range(_RANDOM_SUFFIX_LEN))
return f"{slug}-{suffix}"
@@ -109,8 +110,6 @@ class BottleMetadata:
# for state dirs written before PRD 0040; callers default to "docker"
# for backward compatibility.
backend: str = ""
label: str = ""
color: str = ""
def metadata_path(identity: str) -> Path:
@@ -146,8 +145,6 @@ def read_metadata(identity: str) -> BottleMetadata | None:
started_at=str(raw_typed.get("started_at", "")),
compose_project=str(raw_typed.get("compose_project", "")),
backend=str(raw_typed.get("backend", "")),
label=str(raw_typed.get("label", "")),
color=str(raw_typed.get("color", "")),
)
@@ -237,6 +234,12 @@ def transcript_snapshot_dir(identity: str) -> Path:
# nothing requested preservation.
def pipelock_state_dir(identity: str) -> Path:
"""State subdir for the pipelock sidecar: pipelock.yaml + the
per-bottle CA cert/key. Bind-mount source from chunk 3 onward."""
return bottle_state_dir(identity) / _PIPELOCK_SUBDIR
def egress_state_dir(identity: str) -> Path:
"""State subdir for the egress sidecar: routes.yaml + the
per-bottle mitmproxy CA. Bind-mount source from chunk 3 onward."""
@@ -322,6 +325,7 @@ __all__ = [
"per_bottle_dockerfile",
"per_bottle_dockerfile_path",
"per_bottle_image_tag",
"pipelock_state_dir",
"preserve_marker_path",
"read_metadata",
"supervise_state_dir",
+11 -4
View File
@@ -32,10 +32,10 @@ from __future__ import annotations
import shutil
import subprocess
from pathlib import Path
from ...agent_provider import get_provider
from ...log import info, warn
from ...bottle_state import (
from .bottle_state import (
mark_preserved,
per_bottle_dockerfile,
transcript_snapshot_dir,
@@ -93,11 +93,11 @@ def fetch_current_dockerfile(slug: str) -> str:
override = per_bottle_dockerfile(slug)
if override is not None:
return override
repo_dockerfile = get_provider("claude").dockerfile
repo_dockerfile = _repo_dockerfile_path()
if repo_dockerfile.is_file():
return repo_dockerfile.read_text()
raise CapabilityApplyError(
f"no per-bottle Dockerfile for {slug} and no provider Dockerfile at "
f"no per-bottle Dockerfile for {slug} and no repo Dockerfile at "
f"{repo_dockerfile}"
)
@@ -125,6 +125,13 @@ def apply_capability_change(slug: str, new_dockerfile: str) -> tuple[str, str]:
# --- Internals -------------------------------------------------------------
def _repo_dockerfile_path() -> Path:
"""Path to the repo's Claude Dockerfile (one dir above this module's
package root). Resolved at call time so the path is correct
regardless of where this module is imported from."""
# bot_bottle/backend/docker/capability_apply.py -> repo root
return Path(__file__).resolve().parent.parent.parent.parent / "Dockerfile.claude"
def snapshot_transcript(slug: str) -> None:
"""`docker cp` /home/node/.claude out of the agent container into
+1 -1
View File
@@ -31,7 +31,7 @@ from ... import supervise as _supervise
from ...log import info, warn
from . import util as docker_mod
from .bottle_cleanup_plan import DockerBottleCleanupPlan
from ...bottle_state import bottle_state_dir, is_preserved
from .bottle_state import bottle_state_dir, is_preserved
from .compose import COMPOSE_PROJECT_PREFIX, list_compose_projects
+112 -42
View File
@@ -7,14 +7,34 @@ two networks, no named volumes.
Pure function. No I/O, no subprocess. Expects every launch-time
field (network names, CA host paths, etc.) on the plan's inner
plans to be populated; chunks 2+3 own that ordering.
plans to be populated; chunks 2+3 own that ordering. Chunk 1 just
encodes the translation so it can be unit-tested in isolation.
Conditional services follow the plan content:
Conditional services follow the plan content (matches the
SDK-call branching in `launch.py` today):
- agent + sidecars bundle: always.
- git-gate: iff plan.git_gate_plan.upstreams.
- egress: iff plan.egress_plan.routes.
- supervise: iff plan.supervise_plan is not None.
- pipelock + agent: always.
- git-gate: iff plan.git_gate_plan.upstreams.
- egress: iff plan.egress_plan.routes.
- supervise: iff plan.supervise_plan is not None.
Naming:
- Compose project: `bot-bottle-<slug>`.
- Service names (inside the file): `agent`, `pipelock`,
`egress`, `git-gate`, `supervise`.
- `container_name:` matches today's pattern
(`bot-bottle-<service>-<slug>`) so dashboard/cleanup discovery
via the prefix scan keeps working through the transition.
- Network aliases preserve the current dial-by-shortname pattern
for `egress` / `supervise`, and add the long container-name as
an internal-network alias for `pipelock` / `git-gate` so any
caller still referencing the long name resolves.
Sidecars that are built (egress, git-gate, supervise) get a
compose `build:` block pointing at the repo Dockerfile; the
`image:` tag is set explicitly so cached images on the daemon
aren't rebuilt on every up.
"""
from __future__ import annotations
@@ -31,6 +51,7 @@ from ...egress import (
)
from ...git_gate import GIT_GATE_HOSTNAME
from ...log import die, warn
from ...pipelock import PIPELOCK_HOSTNAME
from ...supervise import (
CURRENT_CONFIG_DIR_IN_AGENT,
QUEUE_DIR_IN_CONTAINER,
@@ -42,7 +63,7 @@ from ..util import AGENT_CA_BUNDLE, AGENT_CA_PATH
from .bottle_plan import DockerBottlePlan
from .egress import (
EGRESS_CA_IN_CONTAINER,
EGRESS_PORT,
EGRESS_PIPELOCK_CA_IN_CONTAINER,
)
from .git_gate import (
GIT_GATE_ACCESS_HOOK_IN_CONTAINER,
@@ -50,7 +71,11 @@ from .git_gate import (
GIT_GATE_ENTRYPOINT_IN_CONTAINER,
GIT_GATE_HOOK_IN_CONTAINER,
)
from . import network as network_mod
from ...pipelock import (
PIPELOCK_CA_CERT_IN_CONTAINER,
PIPELOCK_CA_KEY_IN_CONTAINER,
)
from .pipelock import PIPELOCK_PORT
from .sidecar_bundle import (
SIDECAR_BUNDLE_DOCKERFILE,
SIDECAR_BUNDLE_IMAGE,
@@ -58,26 +83,20 @@ from .sidecar_bundle import (
)
# Repo root or installed site-packages root, used as the build context for
# Dockerfiles that COPY bot_bottle source files.
# Repo root, used as the build context for the bundle Dockerfile.
_REPO_DIR = str(Path(__file__).resolve().parent.parent.parent.parent)
def _sidecar_bundle_dockerfile() -> str:
if (Path(_REPO_DIR) / SIDECAR_BUNDLE_DOCKERFILE).is_file():
return SIDECAR_BUNDLE_DOCKERFILE
return f"bot_bottle/{SIDECAR_BUNDLE_DOCKERFILE}"
def bottle_plan_to_compose(plan: DockerBottlePlan) -> dict[str, Any]:
"""Render a Compose v2 spec dict from a fully-resolved
DockerBottlePlan.
The plan must have its inner plans (`git_gate_plan`,
`egress_plan`, `supervise_plan`) populated with launch-time
fields — network names, CA host paths. The renderer doesn't
validate; callers feed it a fully-resolved plan or get an
incomplete compose spec back.
The plan must have its inner plans (`proxy_plan`,
`git_gate_plan`, `egress_plan`, `supervise_plan`) populated
with launch-time fields — network names, CA host paths,
pipelock_proxy_url. The renderer doesn't validate; callers
feed it a fully-resolved plan or get an incomplete compose
spec back.
"""
project = f"bot-bottle-{plan.slug}"
services: dict[str, Any] = {
@@ -99,11 +118,11 @@ def _networks(plan: DockerBottlePlan) -> dict[str, Any]:
bridge."""
return {
"internal": {
"name": network_mod.network_name_for_slug(plan.slug),
"name": plan.proxy_plan.internal_network,
"internal": True,
},
"egress": {
"name": network_mod.network_egress_name_for_slug(plan.slug),
"name": plan.proxy_plan.egress_network,
},
}
@@ -123,12 +142,29 @@ def _bind(host: str | Path, target: str, *, read_only: bool = True) -> dict[str,
def _sidecar_bundle_service(plan: DockerBottlePlan) -> dict[str, Any]:
"""The `sidecars` service: one container per bottle, bundle
image, all daemons under a Python init supervisor.
image, all four daemons under a Python init supervisor.
Daemon subset narrows via `BOT_BOTTLE_SIDECAR_DAEMONS` env.
egress is always present; git-gate / supervise are conditional.
Mechanics:
- Daemon subset narrows via `BOT_BOTTLE_SIDECAR_DAEMONS`
env. pipelock is always present; egress / git-gate /
supervise are conditional on the plan.
- Volumes are the union of the four daemons' bind-mounts,
preserving the same in-container paths so each daemon
finds its config / hooks / CA where it expects.
- Environment is the union of *daemon-private* env vars
(EGRESS_UPSTREAM_PROXY, SUPERVISE_BOTTLE_SLUG, etc).
HTTPS_PROXY is NOT propagated here — see the comment in
egress_entrypoint.sh; setting it at the container level
would route git-gate's git fetches through pipelock,
which is wrong.
- Network aliases register every legacy short/long
hostname (pipelock, egress, git-gate, supervise plus
their `bot-bottle-<service>-<slug>` long forms) so
the agent's HTTPS_PROXY URL and any other inter-service
reference resolves to the bundle.
"""
daemons: list[str] = ["egress"]
daemons: list[str] = ["egress", "pipelock"]
if plan.git_gate_plan.upstreams:
daemons.append("git-gate")
if plan.supervise_plan is not None:
@@ -137,15 +173,31 @@ def _sidecar_bundle_service(plan: DockerBottlePlan) -> dict[str, Any]:
env: list[str] = [f"BOT_BOTTLE_SIDECAR_DAEMONS={','.join(daemons)}"]
volumes: list[dict[str, Any]] = []
# --- egress -------------------------------------------------------
# --- pipelock ----------------------------------------------------
pp = plan.proxy_plan
volumes += [
_bind(pp.yaml_path, "/etc/pipelock.yaml"),
_bind(pp.ca_cert_host_path, PIPELOCK_CA_CERT_IN_CONTAINER),
_bind(pp.ca_key_host_path, PIPELOCK_CA_KEY_IN_CONTAINER),
]
# --- egress (always part of the bundle; the EGRESS_UPSTREAM_*
# env vars + ca bind-mounts are needed iff routes exist; when
# the bottle has no routes the egress daemon falls back to its
# `regular@9099` mode and is unused) -----------------------------
ep = plan.egress_plan
volumes.append(_bind(ep.mitmproxy_ca_host_path, EGRESS_CA_IN_CONTAINER))
if ep.routes:
volumes.append(_bind(ep.routes_path, EGRESS_ROUTES_IN_CONTAINER))
env.append(f"EGRESS_UPSTREAM_PROXY={ep.pipelock_proxy_url}")
env.append(f"EGRESS_UPSTREAM_CA={EGRESS_PIPELOCK_CA_IN_CONTAINER}")
volumes += [
_bind(ep.routes_path, EGRESS_ROUTES_IN_CONTAINER),
_bind(ep.mitmproxy_ca_host_path, EGRESS_CA_IN_CONTAINER),
_bind(ep.pipelock_ca_host_path, EGRESS_PIPELOCK_CA_IN_CONTAINER),
]
for token_env in sorted(ep.token_env_map.keys()):
env.append(token_env)
# --- git-gate -----------------------------------------------------
# --- git-gate ----------------------------------------------------
gp = plan.git_gate_plan
if gp.upstreams:
volumes += [
@@ -165,7 +217,7 @@ def _sidecar_bundle_service(plan: DockerBottlePlan) -> dict[str, Any]:
f"{GIT_GATE_CREDS_DIR_IN_CONTAINER}/{u.name}-known_hosts",
))
# --- supervise ----------------------------------------------------
# --- supervise ---------------------------------------------------
sp = plan.supervise_plan
if sp is not None:
env += [
@@ -180,7 +232,13 @@ def _sidecar_bundle_service(plan: DockerBottlePlan) -> dict[str, Any]:
"read_only": False,
})
internal_aliases = [EGRESS_HOSTNAME]
# Internal-network aliases: the agent reaches each daemon through
# its short name (pipelock / egress / git-gate / supervise) which
# the bundle answers as if it were the daemon itself.
internal_aliases = [
PIPELOCK_HOSTNAME,
EGRESS_HOSTNAME,
]
if gp.upstreams:
internal_aliases.append(GIT_GATE_HOSTNAME)
if sp is not None:
@@ -190,7 +248,7 @@ def _sidecar_bundle_service(plan: DockerBottlePlan) -> dict[str, Any]:
"image": SIDECAR_BUNDLE_IMAGE,
"build": {
"context": _REPO_DIR,
"dockerfile": _sidecar_bundle_dockerfile(),
"dockerfile": SIDECAR_BUNDLE_DOCKERFILE,
},
"container_name": sidecar_bundle_container_name(plan.slug),
"networks": {
@@ -205,8 +263,11 @@ def _sidecar_bundle_service(plan: DockerBottlePlan) -> dict[str, Any]:
def _agent_service(plan: DockerBottlePlan) -> dict[str, Any]:
"""Agent container. Runs `sleep infinity`; claude is `docker
exec -it`'d into it later. HTTP_PROXY/HTTPS_PROXY point at the
egress sidecar."""
exec -it`'d into it later. No TTY at the container level —
interactivity is per-exec. HTTP_PROXY/HTTPS_PROXY point at the
egress short-alias when an egress is declared, otherwise
straight at pipelock's container name. CA trust trio matches
the existing launch.py wiring."""
proxy_url = _agent_proxy_url(plan)
no_proxy = _agent_no_proxy(plan)
env: list[str] = [
@@ -229,7 +290,7 @@ def _agent_service(plan: DockerBottlePlan) -> dict[str, Any]:
env.append(name)
service: dict[str, Any] = {
"image": plan.image,
"image": plan.runtime_image,
"container_name": plan.container_name,
"command": ["sleep", "infinity"],
"networks": {"internal": None},
@@ -237,6 +298,8 @@ def _agent_service(plan: DockerBottlePlan) -> dict[str, Any]:
}
if plan.use_runsc:
service["runtime"] = "runsc"
if plan.env_file and plan.env_file.exists() and plan.env_file.stat().st_size > 0:
service["env_file"] = [str(plan.env_file)]
volumes: list[dict[str, Any]] = []
if plan.supervise_plan is not None:
@@ -256,14 +319,21 @@ def _agent_service(plan: DockerBottlePlan) -> dict[str, Any]:
def _agent_proxy_url(plan: DockerBottlePlan) -> str:
"""Agent's HTTP_PROXY — always points at egress."""
return f"http://{EGRESS_HOSTNAME}:{EGRESS_PORT}"
"""Pick the agent's HTTP_PROXY. With egress declared, the agent
goes through egress (which in turn HTTPS_PROXYs to pipelock on
its outbound leg). Without egress, the agent talks straight to
pipelock."""
if plan.egress_plan.routes:
from .egress import EGRESS_PORT
return f"http://{EGRESS_HOSTNAME}:{EGRESS_PORT}"
return f"http://{PIPELOCK_HOSTNAME}:{PIPELOCK_PORT}"
def _agent_no_proxy(plan: DockerBottlePlan) -> str:
"""NO_PROXY for the agent: loopback always; supervise hostname
when the supervise sidecar is up (MCP long-poll must bypass
the egress proxy)."""
"""NO_PROXY for the agent. Matches the launch.py rules:
loopback always, supervise hostname when the supervise sidecar
is up (the MCP long-poll pattern needs to bypass pipelock's
idle timeout)."""
hosts = ["localhost", "127.0.0.1"]
if plan.supervise_plan is not None:
hosts.append(SUPERVISE_HOSTNAME)
+17 -3
View File
@@ -22,8 +22,14 @@ from ...log import die
EGRESS_PORT = int(os.environ.get("BOT_BOTTLE_EGRESS_PORT", "9099"))
# In-container path for mitmproxy's CA. The format is a single PEM
# file holding BOTH the cert and the private key, concatenated.
# file holding BOTH the cert and the private key, concatenated. The
# upstream-trust CA (pipelock's, so egress trusts the upstream
# leg) is a separate file because pipelock keeps a different CA on
# its end.
EGRESS_CA_IN_CONTAINER = "/home/mitmproxy/.mitmproxy/mitmproxy-ca.pem"
EGRESS_PIPELOCK_CA_IN_CONTAINER = (
"/home/mitmproxy/.mitmproxy/pipelock-ca.pem"
)
def egress_tls_init(stage_dir: Path) -> tuple[Path, Path]:
@@ -36,8 +42,16 @@ def egress_tls_init(stage_dir: Path) -> tuple[Path, Path]:
trust store by `provision_ca` so the agent trusts the bumped
CONNECT cert egress presents.
openssl req's `subjectKeyIdentifier=hash` extension uses
SHA-1(pubkey), matching mitmproxy's AKI computation on leaves.
Why openssl req (not the pipelock binary's `tls init`):
pipelock's CA generator stamps a non-standard `Subject Key
Identifier` on the CA (random rather than SHA-1 of the pubkey).
mitmproxy computes the `Authority Key Identifier` on each leaf
it mints as SHA-1(issuer's pubkey). openssl's chain validator
uses the leaf's AKI to find the issuer cert by SKI; pipelock's
SKI doesn't match → openssl reports "unable to get local issuer
certificate" even though the CA is right there in the trust
store. openssl req's `subjectKeyIdentifier=hash` extension uses
SHA-1(pubkey), matching mitmproxy's computation.
Both files live under `<stage_dir>/egress-ca/` (mode 644 —
`docker cp` preserves the mode into the container, where the
+309 -6
View File
@@ -1,25 +1,88 @@
"""Host-side helper for egress sidecar inspection (issue #198).
"""Host-side helper to apply a routes.yaml change to a running
egress sidecar (PRD 0014 retargeted by PRD 0017 chunk 3).
`_merge_single_route`, `add_route`, and `apply_routes_change` were
removed when the egress-block MCP tool was dropped. The remaining
helpers support runtime inspection and validation of the routes file
without modifying it at runtime.
Used by the supervise dashboard when the operator approves an
egress-block proposal (or runs the operator-initiated
`routes edit <bottle>` verb). Fetches the current routes.yaml via
`docker exec cat`, validates the new content, writes it into the
sidecar via `docker cp`, then `docker kill --signal HUP` to make
the addon reload without dropping connections.
Also mirrors the new route hosts into pipelock's hostname allowlist
so the downstream leg lets them through — egress enforces
the path-aware allowlist on the agent leg, pipelock enforces the
hostname allowlist + DLP body scan on the upstream leg, and a
host added to one must be in the other or the request 403s
somewhere along the chain.
Raises EgressApplyError on any failure — the dashboard
surfaces the message and keeps the proposal pending so the
operator can retry.
"""
from __future__ import annotations
import json
import re
import subprocess
from pathlib import Path
from typing import cast
from ...egress import EGRESS_ROUTES_IN_CONTAINER
from ...egress_addon_core import load_routes
from ...yaml_subset import YamlSubsetError, parse_yaml_subset
from .bottle_state import egress_state_dir
from .sidecar_bundle import sidecar_bundle_container_name
from .pipelock_apply import (
PipelockApplyError,
apply_allowlist_change,
fetch_current_allowlist,
parse_allowlist_content,
render_allowlist_content,
)
def _render_routes_payload(routes_list: list[dict[str, object]]) -> str:
"""Render a list-of-dicts routes payload as YAML matching the
shape `egress_render_routes` produces. The apply path
round-trips current routes.yaml through this so the file the
sidecar sees stays in the YAML format the addon expects."""
if not routes_list:
return "routes: []\n"
lines: list[str] = ["routes:"]
for entry in routes_list:
host = str(entry.get("host", ""))
lines.append(f' - host: "{host}"')
auth_scheme = entry.get("auth_scheme")
token_env = entry.get("token_env")
if auth_scheme and token_env:
lines.append(f' auth_scheme: "{auth_scheme}"')
lines.append(f' token_env: "{token_env}"')
paths_obj = entry.get("path_allowlist")
paths = cast(list[str], paths_obj) if isinstance(paths_obj, list) else []
if paths:
lines.append(" path_allowlist:")
for p in paths:
lines.append(f' - "{p}"')
return "\n".join(lines) + "\n"
def _egress_routes_host_path(slug: str) -> Path:
"""The bind-mount source for the egress sidecar's routes.yaml.
Must match what egress.prepare wrote at chunk-2 paths."""
return egress_state_dir(slug) / "egress_routes.yaml"
class EgressApplyError(RuntimeError):
pass
"""Raised when fetch / apply fails. Caller renders to the
operator; does not crash the dashboard."""
def fetch_current_routes(slug: str) -> str:
"""Read the live routes.yaml from the running egress sidecar
for `slug`. Returns the file content as a string. Raises
EgressApplyError if the sidecar isn't reachable or the read
fails."""
container = sidecar_bundle_container_name(slug)
r = subprocess.run(
["docker", "exec", container, "cat", EGRESS_ROUTES_IN_CONTAINER],
@@ -34,6 +97,9 @@ def fetch_current_routes(slug: str) -> str:
def validate_routes_content(content: str) -> None:
"""Syntactic check before SIGHUP — the addon's reload also
validates, but failing here keeps the old routes live and gives
the operator a clearer error than the addon's stderr line."""
try:
load_routes(content)
except ValueError as e:
@@ -42,8 +108,245 @@ def validate_routes_content(content: str) -> None:
) from e
def _hosts_in_routes(content: str) -> list[str]:
"""Extract the host list from a routes.yaml content string.
Uses the addon's own parser so any host the addon will match on
also lands in pipelock's allowlist. Returns sorted+deduped."""
try:
routes = load_routes(content)
except ValueError as e:
raise EgressApplyError(
f"proposed routes.yaml is not valid: {e}"
) from e
return sorted({r.host for r in routes if r.host})
# Pipelock's allowlist parser accepts only literal hostnames:
# `[A-Za-z0-9_.-]+`. Anything else (wildcards, IPv6 literals,
# stray characters) is silently dropped from the mirror so the
# pipelock apply doesn't fail parse before the new yaml is even
# written. The dropped hosts stay on egress's route table —
# but the addon does exact-host match only, so they'll never
# match anything either. (Wildcard host matching was removed —
# see `match_route` in egress_addon_core for the rationale.)
_PIPELOCK_HOST_RE = re.compile(r"^[A-Za-z0-9_.-]+$")
def _pipelock_safe_hosts(hosts: list[str]) -> list[str]:
"""Drop any host pipelock's allowlist parser would reject.
Order preserved."""
return [h for h in hosts if _PIPELOCK_HOST_RE.match(h)]
def _mirror_hosts_to_pipelock(slug: str, hosts: list[str]) -> None:
"""Ensure every pipelock-compatible `hosts` entry is on
pipelock's allowlist. Fetches pipelock's current allowlist,
merges, re-applies. Hosts pipelock can't represent (wildcards,
etc.) are silently skipped — they stay live on egress
but aren't enforced at pipelock. No-op if every host is already
present (apply still restarts pipelock if any host is new).
Raises EgressApplyError on pipelock failures so the
caller's diff/audit reflects the half-state."""
safe_hosts = _pipelock_safe_hosts(hosts)
try:
current = fetch_current_allowlist(slug)
existing = parse_allowlist_content(current)
merged = sorted(set(existing) | set(safe_hosts))
if merged == sorted(existing):
return # nothing to add
apply_allowlist_change(slug, render_allowlist_content(merged))
except PipelockApplyError as e:
# Mirror runs BEFORE the egress write, so egress
# is unchanged on this failure path. Report it as a
# pipelock-side problem so the operator looks in the right
# place; their `pipelock edit` flow can repair manually.
raise EgressApplyError(
f"pipelock allowlist mirror failed (egress NOT "
f"updated): {e}. Fix pipelock's allowlist manually with "
f"`pipelock edit <bottle>` then retry the proposal."
) from e
def apply_routes_change(slug: str, new_content: str) -> tuple[str, str]:
"""Apply `new_content` to the egress sidecar for `slug`:
1. Fetch current routes.yaml (for the before-diff).
2. Validate the new content via the addon's own parser.
3. Mirror the route hosts onto pipelock's allowlist (so the
downstream hostname gate lets them through).
4. Write to a temp file, `docker cp` into the egress
sidecar.
5. `docker kill --signal HUP` so the addon reloads.
Order matters: pipelock first, then egress. If the
pipelock step fails, egress hasn't been touched and the
old routes stay live. If the egress step fails after
pipelock succeeded, pipelock has the host in its allowlist but
egress doesn't enforce it yet — harmless extra-permissive
state at pipelock, and a re-approval will land the egress
side.
Returns (before, after) where `after` == `new_content`. Raises
EgressApplyError on any step."""
container = sidecar_bundle_container_name(slug)
before = fetch_current_routes(slug)
validate_routes_content(new_content)
# Pipelock mirror first — if it fails, egress stays intact
# and the operator gets a clear error about the half-state.
_mirror_hosts_to_pipelock(slug, _hosts_in_routes(new_content))
# routes.yaml is bind-mounted into the egress container as a
# SINGLE FILE. Docker single-file bind mounts pin the source
# inode at mount time; write-temp-then-rename swaps the inode
# on the host, which leaves the container's mount pointing at
# the now-orphaned old inode (so the SIGHUP'd reload re-reads
# unchanged content). Write in-place instead. Lose file-level
# atomicity, but the apply path issues SIGHUP only AFTER the
# write returns, and the addon's `load_routes` raises
# `ValueError` on a partial read and keeps the previous
# in-memory routes — so a SIGHUP that hypothetically raced an
# in-flight write is non-disruptive.
target = _egress_routes_host_path(slug)
target.parent.mkdir(parents=True, exist_ok=True)
target.write_text(new_content)
# mitmproxy in the container reads through the bind mount as
# uid 1000; the host file has to be world-readable for that
# read to succeed (parent dir at 0o700 still restricts who
# can reach the file on the host). Routes content is not
# secret — tokens live in the container's environ — so 0o644
# is the right trade-off.
target.chmod(0o644)
sig = subprocess.run(
["docker", "kill", "--signal", "HUP", container],
capture_output=True, text=True, check=False,
)
if sig.returncode != 0:
raise EgressApplyError(
f"failed to SIGHUP {container}: "
f"{(sig.stderr or '').strip()}"
)
return before, new_content
def _merge_single_route(
current_yaml: str, new_route: dict[str, object],
) -> str:
"""Merge a single proposed route into the current routes.yaml
content, returning the merged YAML string.
Behavior:
- If `new_route['host']` is NOT in the current routes →
append the route.
- If the host IS already present → union the path_allowlist
entries (proposed existing). The existing `auth_scheme`
and `token_env` are preserved — agent-proposed auth changes
on an existing host are ignored, matching the tool's
documented semantics.
Round-trips the file through `yaml_subset` (the same parser
the addon uses), so the merged output is in the YAML format
the sidecar reads. Token VALUES never appear here; the routes
file carries only env-var slot NAMES."""
try:
cfg = parse_yaml_subset(current_yaml)
except YamlSubsetError as e:
raise EgressApplyError(
f"current routes.yaml is not valid YAML: {e}"
) from e
routes = cfg.get("routes")
if not isinstance(routes, list):
raise EgressApplyError(
"current routes.yaml: 'routes' is not a list"
)
routes_typed = cast(list[object], routes)
new_host = str(new_route.get("host", "")).lower()
if not new_host:
raise EgressApplyError(
"proposed route is missing 'host'"
)
proposed_paths_obj = new_route.get("path_allowlist")
proposed_paths = cast(list[str], proposed_paths_obj) if isinstance(proposed_paths_obj, list) else []
# Look for an existing entry with the same host (case-insensitive).
for entry in routes_typed:
if not isinstance(entry, dict):
continue
entry_typed = cast(dict[str, object], entry)
if str(entry_typed.get("host", "")).lower() == new_host:
# Merge path_allowlist: union proposed + existing, ordered
# by first-seen so existing paths stay in original order.
existing_paths_obj = entry_typed.get("path_allowlist")
existing_paths = cast(list[str], existing_paths_obj) if isinstance(existing_paths_obj, list) else []
seen = {p: None for p in existing_paths}
for p in proposed_paths:
seen.setdefault(p, None)
merged_paths = list(seen.keys())
if merged_paths:
entry_typed["path_allowlist"] = merged_paths
# Preserve existing auth — tool description says agent-
# proposed auth on an existing host is ignored.
break
else:
# Host not present; build a new route entry from the
# proposed fields. Need to assign a token_env slot if
# `auth` was proposed (otherwise the addon's parser rejects
# a half-set auth pair). Slots: count existing slots, pick
# the next free index.
entry_typed: dict[str, object] = {"host": new_route.get("host")} # type: ignore
if proposed_paths:
entry_typed["path_allowlist"] = proposed_paths
auth = new_route.get("auth")
if isinstance(auth, dict) and auth.get("scheme") and auth.get("token_ref"): # type: ignore
auth_typed = cast(dict[str, object], auth)
existing_slots = sorted({
str(r_entry.get("token_env", ""))
for r_entry_obj in routes_typed
if isinstance(r_entry_obj, dict)
for r_entry in [cast(dict[str, object], r_entry_obj)]
if r_entry.get("token_env")
})
next_idx = len(existing_slots)
entry_typed["auth_scheme"] = str(cast(object, auth_typed.get("scheme")))
entry_typed["token_env"] = f"EGRESS_TOKEN_{next_idx}"
# NOTE: the addon reads token VALUES from its container's
# environ keyed by token_env. A newly-added auth route at
# runtime points at a slot that has no env value → the
# addon will 403 with "token env unset" until the operator
# arranges for the value to land in the container's env.
# Recording this here so the operator-facing diff carries
# the slot name they'll need to provision.
routes_typed.append(entry_typed)
return _render_routes_payload(cast(list[dict[str, object]], routes_typed))
def add_route(slug: str, proposed_route_json: str) -> tuple[str, str]:
"""Apply a single-route addition to the egress. Parses the
agent's proposed route, fetches the current routes file, merges,
and applies via `apply_routes_change`. Returns (before, after)
full-file content for the audit log."""
try:
proposed = json.loads(proposed_route_json)
except json.JSONDecodeError as e:
raise EgressApplyError(
f"proposed route is not valid JSON: {e}"
) from e
if not isinstance(proposed, dict):
raise EgressApplyError(
"proposed route must be a JSON object"
)
current = fetch_current_routes(slug)
merged = _merge_single_route(current, proposed)
return apply_routes_change(slug, merged)
__all__ = [
"EgressApplyError",
"add_route",
"apply_routes_change",
"fetch_current_routes",
"validate_routes_content",
]
+1 -3
View File
@@ -15,7 +15,7 @@ from __future__ import annotations
import subprocess
from .. import ActiveAgent
from ...bottle_state import read_metadata
from .bottle_state import read_metadata
from .compose import compose_project_name, list_active_slugs
@@ -39,8 +39,6 @@ def enumerate_active() -> list[ActiveAgent]:
agent_name=metadata.agent_name if metadata else "?",
started_at=metadata.started_at if metadata else "",
services=tuple(sorted(services)),
label=metadata.label if metadata else "",
color=metadata.color if metadata else "",
))
return out
+58 -20
View File
@@ -4,19 +4,25 @@ PRD 0018 chunk 3: each instance is one `docker compose` project.
The flow is:
1. Build the agent image from the provider Dockerfile (compose
builds the sidecar images via the `build:` directive on first up).
2. Mint the per-bottle egress CA (chunk 2 writes it under
state/<slug>/egress/).
3. Populate the inner plans with launch-time fields so the
renderer can read network names, CA paths.
1. Build the agent's base + derived image (compose builds the
sidecar images via the `build:` directive on first up).
2. Pre-create the per-bottle networks. We do this outside compose
so we can inspect the assigned internal CIDR and embed it in
pipelock's yaml (compose's `external: true` lets the compose
file reference these pre-existing networks).
3. Mint the per-bottle CAs (chunk 2 writes them under
state/<slug>/{pipelock,egress}/).
4. Re-render pipelock yaml with the now-known internal CIDR so
the SSRF allowlist exempts the bottle's own subnet.
5. Populate the inner plans with launch-time fields so the
renderer can read network names, CA paths, pipelock URL.
6. Render the compose spec, write it to
state/<slug>/docker-compose.yml, write metadata.json.
7. `docker compose up -d` (token + OAuth values flow into the
compose subprocess env so `environment: [NAME]` bare-name
entries inherit without rendering values into the file).
8. Provision (CA install, prompt copy, skills, workspace, git,
supervise config) — unchanged, uses `docker exec` / `docker cp`.
8. Provision (CA install, prompt copy, skills, git, supervise
config) — unchanged, uses `docker exec`.
9. Yield a DockerBottle handle. `exec_agent` runs claude via
`docker exec -it` exactly like the pre-compose world.
@@ -43,10 +49,11 @@ from . import network as network_mod
from . import util as docker_mod
from .bottle import DockerBottle
from .bottle_plan import DockerBottlePlan
from ...bottle_state import (
from .bottle_state import (
bottle_state_dir,
egress_state_dir,
git_gate_state_dir,
pipelock_state_dir,
)
from .compose import (
bottle_plan_to_compose,
@@ -59,6 +66,10 @@ from .compose import (
write_compose_file,
)
from .egress import egress_tls_init
from .pipelock import (
BUNDLE_LOCAL_PIPELOCK_URL,
pipelock_tls_init,
)
# Where the repo root lives, for `docker build` context. Computed once.
@@ -97,14 +108,40 @@ def launch(
plan.image, _REPO_DIR,
dockerfile=plan.dockerfile_path,
)
if plan.derived_image:
docker_mod.build_image_with_cwd(
plan.derived_image, plan.image, plan.workspace_plan
)
# Networks: compose-managed. The names are derived
# deterministically from the slug so the renderer can put
# them on the services and `compose up` creates them with
# those names. The empirical spike confirmed pipelock's
# SSRF guard only checks proxied-request destinations, not
# source IPs — so the bottle's own internal CIDR doesn't
# need to be in `ssrf.ip_allowlist`. Pre-create + CIDR
# introspection are gone; compose owns the network
# lifecycle.
internal_network = network_mod.network_name_for_slug(plan.slug)
egress_network = network_mod.network_egress_name_for_slug(plan.slug)
# Mint per-bottle CAs into state/<slug>/{pipelock,egress}/.
ca_cert_host, ca_key_host = pipelock_tls_init(pipelock_state_dir(plan.slug))
egress_ca_host, egress_ca_cert_only = egress_tls_init(
egress_state_dir(plan.slug),
)
# Populate launch-time fields on every inner plan so the
# renderer reads concrete network names, CA paths, and
# pipelock URL.
proxy_plan = dataclasses.replace(
plan.proxy_plan,
internal_network=internal_network,
internal_network_cidr="",
egress_network=egress_network,
ca_cert_host_path=ca_cert_host,
ca_key_host_path=ca_key_host,
)
git_gate_plan = plan.git_gate_plan
if git_gate_plan.upstreams:
git_gate_plan = dataclasses.replace(
@@ -112,13 +149,17 @@ def launch(
internal_network=internal_network,
egress_network=egress_network,
)
egress_plan = dataclasses.replace(
plan.egress_plan,
internal_network=internal_network,
egress_network=egress_network,
mitmproxy_ca_host_path=egress_ca_host,
mitmproxy_ca_cert_only_host_path=egress_ca_cert_only,
)
egress_plan = plan.egress_plan
if egress_plan.routes:
egress_plan = dataclasses.replace(
egress_plan,
internal_network=internal_network,
egress_network=egress_network,
mitmproxy_ca_host_path=egress_ca_host,
mitmproxy_ca_cert_only_host_path=egress_ca_cert_only,
pipelock_ca_host_path=ca_cert_host,
pipelock_proxy_url=BUNDLE_LOCAL_PIPELOCK_URL,
)
supervise_plan = plan.supervise_plan
if supervise_plan is not None:
supervise_plan = dataclasses.replace(
@@ -127,6 +168,7 @@ def launch(
)
plan = dataclasses.replace(
plan,
proxy_plan=proxy_plan,
git_gate_plan=git_gate_plan,
egress_plan=egress_plan,
supervise_plan=supervise_plan,
@@ -175,10 +217,6 @@ def launch(
None,
agent_command=plan.agent_command,
agent_prompt_mode=plan.agent_prompt_mode,
agent_provider_template=plan.agent_provider_template,
terminal_title=plan.spec.label or plan.spec.agent_name,
terminal_color=plan.spec.color,
agent_workdir=plan.workspace_plan.workdir,
)
bottle.prompt_path = provision(plan, bottle)
+14 -5
View File
@@ -1,10 +1,11 @@
"""Docker network plumbing for the per-agent egress topology.
The agent container sits on a Docker `--internal` network (no default
gateway). Egress straddles that network and a per-agent user-defined
bridge for upstream traffic. We deliberately do NOT use Docker's legacy
gateway). Pipelock straddles that network and a per-agent user-defined
bridge for upstream egress. We deliberately do NOT use Docker's legacy
`bridge` network because only user-defined bridges run Docker's
embedded DNS resolver, which egress needs to resolve upstream hostnames.
embedded DNS resolver, which pipelock needs to resolve api.anthropic.com
and similar upstream hostnames.
Naming: bot-bottle-net-<slug> (internal),
bot-bottle-egress-<slug> (egress). Numeric suffix on conflict
@@ -76,12 +77,20 @@ def network_create_internal(slug: str) -> str:
def network_create_egress(slug: str) -> str:
"""Create a per-agent user-defined bridge (NOT the legacy `bridge`)
so the egress sidecar has working DNS for upstream hostnames."""
so the pipelock sidecar has working DNS for upstream hostnames."""
return _network_create_with_prefix(network_egress_name_for_slug(slug), internal=False)
def network_inspect_cidr(name: str) -> str:
"""Return the IPv4 CIDR Docker assigned to a user-defined network."""
"""Return the IPv4 CIDR Docker assigned to a user-defined network.
Used by pipelock's SSRF guard exception: the bottle's internal
network sits in RFC1918 space, so pipelock's `internal:` list
would block any agent request whose destination resolves there
— including the cred-proxy sidecar's address. Adding the
network's CIDR to pipelock's `ssrf.ip_allowlist` lets traffic
targeted at the bottle's own sidecars through while pipelock
still body-scans and api_allowlist-gates as usual."""
result = subprocess.run(
["docker", "network", "inspect",
"--format", "{{range .IPAM.Config}}{{.Subnet}}{{end}}", name],
+74
View File
@@ -0,0 +1,74 @@
"""Docker-side pipelock helpers: image pin, container naming, and
the one-shot `pipelock tls init` host-side CA mint. The
prepare-time YAML rendering itself lives on the platform-neutral
`PipelockProxy` ABC — backends instantiate it directly.
The per-container `.start()` / `.stop()` lifecycle was deleted in
PRD 0024 chunk 3; compose-up owns the container lifecycle (PRD
0018) and the bundle path (PRD 0024) collapses pipelock + egress
+ git-gate + supervise into one container."""
from __future__ import annotations
import os
import subprocess
from pathlib import Path
from ...log import die
# Pipelock image, pinned by digest. The digest is the multi-arch image
# index for ghcr.io/luckypipewrench/pipelock:2.3.0.
PIPELOCK_IMAGE = os.environ.get(
"BOT_BOTTLE_PIPELOCK_IMAGE",
"ghcr.io/luckypipewrench/pipelock@sha256:"
"3b1a39417b98406ddc5dc2d8fcb42865ddc0c68a43d355db55f0f8cb06bc6de9",
)
# Listening port for pipelock's forward proxy.
PIPELOCK_PORT = os.environ.get("BOT_BOTTLE_PIPELOCK_PORT", "8888")
# The URL egress dials for its upstream HTTPS_PROXY. egress and pipelock
# share the same container's network namespace inside the sidecar bundle, so
# loopback reaches pipelock directly — no docker DNS aliases involved.
BUNDLE_LOCAL_PIPELOCK_URL = f"http://127.0.0.1:{PIPELOCK_PORT}"
def pipelock_tls_init(stage_dir: Path) -> tuple[Path, Path]:
"""Generate a fresh per-bottle CA via a one-shot pipelock container.
Runs `pipelock tls init` against a host-mounted scratch dir, leaving
`ca.pem` (public cert, mode 600) and `ca-key.pem` (private key, mode
600) under `<stage_dir>/pipelock-ca/`. Returns the two host paths.
The image is pinned (same digest the running sidecar uses) so the
generated CA matches what the sidecar expects. Output is owned by
whatever UID the one-shot ran as; the compose renderer's
bind-mounts pin the files in place at runtime, so ownership
inside the running sidecar (root in pipelock's distroless image)
is independent."""
work = stage_dir / "pipelock-ca"
work.mkdir(exist_ok=True)
result = subprocess.run(
["docker", "run", "--rm",
"-v", f"{work}:/h",
"-e", "PIPELOCK_HOME=/h",
PIPELOCK_IMAGE, "tls", "init"],
capture_output=True,
text=True,
check=False,
)
if result.returncode != 0:
die(f"pipelock tls init failed: {result.stderr.strip()}")
cert = work / "ca.pem"
key = work / "ca-key.pem"
if not cert.is_file() or not key.is_file():
die(f"pipelock tls init did not produce ca files in {work}")
# Explicit perms in case a future pipelock release changes
# defaults. Pipelock runs as root in its distroless image and
# bind-mounts work with 0o600 (root reads everything); the key
# has no reason to be readable to anyone else on the host.
key.chmod(0o600)
cert.chmod(0o644)
return (cert, key)
+200
View File
@@ -0,0 +1,200 @@
"""pipelock_apply — host-side helper to apply an api_allowlist
change to a running pipelock sidecar (PRD 0015).
Used by the supervise dashboard when the operator approves a
pipelock-block proposal (or runs the operator-initiated `pipelock
edit <bottle>` verb). Fetches the current pipelock.yaml via `docker
exec`, parses it, swaps the api_allowlist with the proposed hosts,
re-renders, writes back via the bind-mount path, then signals the
bundle supervisor to restart the pipelock daemon (`docker kill
--signal USR1`) so
pipelock picks up the new config.
v1 uses restart, not SIGHUP — pipelock has no in-process reload
hook and adding one is the "SIGHUP reload for pipelock" open
question in PRD 0015. Restart drops in-flight outbound calls; the
agent's HTTP client retries pick up against the restarted proxy.
"""
from __future__ import annotations
import os
import re
import subprocess
import tempfile
from pathlib import Path
from ...pipelock import pipelock_render_yaml
from ...yaml_subset import YamlSubsetError, parse_yaml_subset
from .bottle_state import pipelock_state_dir
from .sidecar_bundle import sidecar_bundle_container_name
def _pipelock_yaml_host_path(slug: str) -> Path:
"""The bind-mount source for the pipelock sidecar's
pipelock.yaml — matches what pipelock.prepare wrote at chunk-2
paths."""
return pipelock_state_dir(slug) / "pipelock.yaml"
PIPELOCK_YAML_IN_CONTAINER = "/etc/pipelock.yaml"
# Allowlist proposals are one-hostname-per-line. Blank lines and
# `#`-prefixed comments are ignored. The character set matches the
# supervise sidecar's syntactic check on the agent's pipelock-block
# proposal (alphanumerics + dot/dash/underscore).
_HOST_OK = re.compile(r"^[A-Za-z0-9_.-]+$")
class PipelockApplyError(RuntimeError):
"""Raised when fetch / parse / apply fails. The dashboard renders
the message and keeps the proposal pending — never crashes."""
def parse_allowlist_content(content: str) -> list[str]:
"""One hostname per line. Blanks and `#` comments are ignored.
Raises PipelockApplyError if a line has a disallowed character."""
hosts: list[str] = []
for i, raw_line in enumerate(content.splitlines(), start=1):
line = raw_line.strip()
if not line or line.startswith("#"):
continue
if not _HOST_OK.match(line):
raise PipelockApplyError(
f"allowlist line {i}: {line!r} has disallowed characters"
)
hosts.append(line)
return hosts
def render_allowlist_content(hosts: list[str]) -> str:
"""Hosts → one-per-line string (the operator-facing format)."""
if not hosts:
return ""
return "\n".join(hosts) + "\n"
def fetch_current_yaml(slug: str) -> str:
"""Read the live /etc/pipelock.yaml from the sidecar bundle.
Uses `docker cp` because pipelock inside the bundle is the
distroless pipelock binary with no shell, and `docker cp` is a
daemon-API tarball copy that works regardless of what's
available inside the container.
Raises PipelockApplyError if the read fails."""
container = sidecar_bundle_container_name(slug)
fd, tmp_path = tempfile.mkstemp(prefix="cb-pipelock-fetch.", suffix=".yaml")
os.close(fd)
try:
r = subprocess.run(
[
"docker", "cp",
f"{container}:{PIPELOCK_YAML_IN_CONTAINER}", tmp_path,
],
capture_output=True, text=True, check=False,
)
if r.returncode != 0:
raise PipelockApplyError(
f"could not fetch pipelock.yaml from {container}: "
f"{(r.stderr or '').strip() or 'container not running?'}"
)
return Path(tmp_path).read_text(encoding="utf-8")
finally:
try:
Path(tmp_path).unlink()
except OSError:
pass
def fetch_current_allowlist(slug: str) -> str:
"""Fetch the live yaml, extract api_allowlist, render as one-per-
line — the operator-facing format for the TUI / agent's
current-config mount."""
yaml = fetch_current_yaml(slug)
try:
cfg = parse_yaml_subset(yaml)
except YamlSubsetError as e:
raise PipelockApplyError(f"running pipelock yaml: {e}") from e
hosts = cfg.get("api_allowlist", [])
if not isinstance(hosts, list):
raise PipelockApplyError(
"running pipelock yaml: api_allowlist is not a list"
)
return render_allowlist_content([str(h) for h in hosts])
def apply_allowlist_change(
slug: str, new_allowlist_content: str,
) -> tuple[str, str]:
"""Apply `new_allowlist_content` to the sidecar bundle:
1. Parse the proposed hosts (one per line).
2. Fetch + parse current pipelock.yaml.
3. Replace api_allowlist with the proposed hosts; re-render.
4. Write the new yaml to the bind-mount source.
5. `docker kill --signal USR1 <bundle>` so the supervisor
restarts the pipelock daemon in place (leaving egress,
git-gate, and supervise running). Pipelock has no
in-process reload; the supervisor's per-daemon restart
keeps the agent's MCP socket alive — a whole-bundle
`docker restart` would bounce supervise too.
Returns (before, after) where both are one-per-line allowlist
strings (operator-facing format). Raises PipelockApplyError on
any failure; the sidecar's existing config stays in place until
the host write succeeds, and the SIGUSR1 is what makes it
live."""
new_hosts = parse_allowlist_content(new_allowlist_content)
container = sidecar_bundle_container_name(slug)
current_yaml = fetch_current_yaml(slug)
try:
cfg = parse_yaml_subset(current_yaml)
except YamlSubsetError as e:
raise PipelockApplyError(f"running pipelock yaml: {e}") from e
current_hosts = cfg.get("api_allowlist", [])
if not isinstance(current_hosts, list):
raise PipelockApplyError(
"running pipelock yaml: api_allowlist is not a list"
)
before = render_allowlist_content([str(h) for h in current_hosts])
after = render_allowlist_content(new_hosts)
cfg["api_allowlist"] = new_hosts
rendered = pipelock_render_yaml(cfg)
# pipelock.yaml is bind-mounted into the container as a SINGLE
# FILE — same Docker single-file inode issue as egress_apply:
# write-temp-then-rename swaps the host inode and leaves the
# container's mount pointing at the orphaned old one. Write
# in-place. The SIGUSR1 below makes the new content live
# (pipelock has no in-process reload, so the supervisor
# restarts the pipelock daemon in response).
target = _pipelock_yaml_host_path(slug)
target.parent.mkdir(parents=True, exist_ok=True)
target.write_text(rendered)
# pipelock runs as root in its distroless image — any mode is
# fine — but 0o600 matches what prepare wrote.
target.chmod(0o600)
restart = subprocess.run(
["docker", "kill", "--signal", "USR1", container],
capture_output=True, text=True, check=False,
)
if restart.returncode != 0:
raise PipelockApplyError(
f"failed to signal {container} for pipelock restart: "
f"{(restart.stderr or '').strip()}"
)
return before, after
__all__ = [
"PIPELOCK_YAML_IN_CONTAINER",
"PipelockApplyError",
"apply_allowlist_change",
"fetch_current_allowlist",
"fetch_current_yaml",
"parse_allowlist_content",
"render_allowlist_content",
]
+278
View File
@@ -0,0 +1,278 @@
"""Prepare step for the Docker bottle backend.
`resolve_plan` does all host-side resolution (image and container
names, env-file, prompt-file, proxy plan, runtime detection) and
returns a frozen DockerBottlePlan. No Docker resources are created;
the only side effects are scratch files under `stage_dir` and a probe
of `docker info`. Cross-backend host-side validation has already run
via the base class's `prepare` template before this is called.
"""
from __future__ import annotations
import os
from datetime import datetime, timezone
from dataclasses import replace
from pathlib import Path
from ...agent_provider import agent_provision_plan, runtime_for
from ...egress import Egress
from ...env import ResolvedEnv, resolve_env
from ...git_gate import GitGate
from ...log import die
from ...pipelock import PipelockProxy
from ...supervise import Supervise
from ...workspace import workspace_plan as resolve_workspace_plan
from .. import BottleSpec
from . import util as docker_mod
from .bottle_plan import DockerBottlePlan
from .bottle_state import (
BottleMetadata,
agent_state_dir,
bottle_identity,
clear_preserve_marker,
egress_state_dir,
git_gate_state_dir,
per_bottle_dockerfile,
per_bottle_dockerfile_path,
per_bottle_image_tag,
pipelock_state_dir,
supervise_state_dir,
write_metadata,
)
from .sidecar_bundle import sidecar_bundle_container_name
def resolve_plan(
spec: BottleSpec,
*,
stage_dir: Path,
) -> DockerBottlePlan:
"""Resolve Docker-specific names and write scratch files. Trusts
that the agent and its skills/git-gate keys are present —
validation already ran in the base class."""
docker_mod.require_docker()
proxy = PipelockProxy()
git_gate = GitGate()
egress = Egress()
supervise = Supervise()
manifest = spec.manifest
agent = manifest.agents[spec.agent_name]
bottle = manifest.bottle_for(spec.agent_name)
provider = bottle.agent_provider
provider_runtime = runtime_for(provider.template)
guest_home = "/home/node"
workspace_plan = resolve_workspace_plan(spec, guest_home=guest_home)
# PRD 0016 follow-up: identity, not bare slug. A fresh `start`
# mints a random-suffixed identity (so parallel runs of the same
# agent in the same cwd don't collide on container/network
# names); a `resume` passes the recorded identity in via
# spec.identity to continue an existing bottle's state.
slug = spec.identity or bottle_identity(spec.agent_name)
# Record the launch metadata so `cli.py resume <identity>` can
# reconstruct the spec. Idempotent — re-writes on resume with a
# refreshed started_at.
write_metadata(BottleMetadata(
identity=slug,
agent_name=spec.agent_name,
cwd=spec.user_cwd if spec.copy_cwd else "",
copy_cwd=spec.copy_cwd,
started_at=datetime.now(timezone.utc).isoformat(),
compose_project=f"bot-bottle-{slug}",
backend="docker",
))
# Clear any leftover preserve marker from a prior capability-block
# so this fresh launch can be cleaned up at session-end unless
# the agent triggers another capability-block.
clear_preserve_marker(slug)
# PRD 0016 capability-block: if a per-bottle Dockerfile has been
# written (via apply_capability_change), the base image becomes
# per_bottle_image_tag(slug) built from that file. --cwd still
# layers a derived image on top.
dockerfile_path = ""
if per_bottle_dockerfile(slug) is not None:
image_default = per_bottle_image_tag(slug)
dockerfile_path = str(per_bottle_dockerfile_path(slug))
elif provider.dockerfile:
image_default = f"bot-bottle-{provider.template}:{slug}"
dockerfile_path = _resolve_manifest_dockerfile(provider.dockerfile, spec)
elif provider_runtime.dockerfile:
image_default = provider_runtime.image
dockerfile_path = provider_runtime.dockerfile
else:
image_default = provider_runtime.image
image = os.environ.get("BOT_BOTTLE_IMAGE", image_default)
derived_image = ""
runtime_image = image
if spec.copy_cwd:
derived_image = os.environ.get(
"BOT_BOTTLE_DERIVED_IMAGE", f"bot-bottle-cwd:{slug}"
)
runtime_image = derived_image
default_container = f"bot-bottle-{slug}"
pinned_container = os.environ.get("BOT_BOTTLE_CONTAINER", "")
container_name_pinned = bool(pinned_container)
if container_name_pinned:
container_name = pinned_container
if docker_mod.container_exists(container_name):
die(
f"container '{container_name}' already exists "
f"(pinned via BOT_BOTTLE_CONTAINER). "
f"Remove it with 'docker rm -f {container_name}' or unset the override."
)
else:
container_name = ""
for candidate in docker_mod.container_name_candidates(default_container):
if not docker_mod.container_exists(candidate):
container_name = candidate
break
if not container_name:
die(
f"could not find a free container name after "
f"{default_container}-{docker_mod.MAX_CONTAINER_SUFFIX}; "
f"clean up old containers with 'docker rm -f <name>'"
)
# Probe the sidecar-bundle container name for an orphan from a
# previous run. Otherwise a stale bundle surfaces as a
# docker-create conflict deep inside launch() with no actionable
# hint; failing fast here points at the cleanup command.
bundle_name = sidecar_bundle_container_name(slug)
if docker_mod.container_exists(bundle_name):
die(
f"sidecar bundle container '{bundle_name}' already exists. "
f"This is an orphan from a previous run; clean it up with "
f"'./cli.py cleanup' (or 'docker rm -f {bundle_name}') and "
f"retry."
)
# PRD 0018 chunk 2: prepare-time scratch files live under
# ~/.bot-bottle/state/<slug>/<service>/ so chunk 3's compose
# bind-mounts can point at stable paths. The state subdirs are
# cleaned up by start.py's session-end teardown unless something
# explicitly preserves the state dir (capability-block, crash).
agent_dir = agent_state_dir(slug)
agent_dir.mkdir(parents=True, exist_ok=True)
env_file = agent_dir / "agent.env"
prompt_file = agent_dir / "prompt.txt"
prompt_file.write_text("")
prompt_file.chmod(0o600)
git_gate_dir = git_gate_state_dir(slug)
git_gate_dir.mkdir(parents=True, exist_ok=True)
git_gate_plan = git_gate.prepare(bottle, slug, git_gate_dir)
resolved = resolve_env(manifest, spec.agent_name)
# Everything that should reach the bottle by-name (so its value
# never lands on argv or in env_file) goes into one dict. Nothing
# mutates the host os.environ.
forwarded_env: dict[str, str] = dict(resolved.forwarded)
_write_env_file(resolved, env_file)
prompt_file.write_text(agent.prompt)
use_runsc = docker_mod.runsc_available()
agent_provision = agent_provision_plan(
template=provider.template,
dockerfile=dockerfile_path,
state_dir=agent_dir,
guest_home=guest_home,
forward_host_credentials=provider.forward_host_credentials,
auth_token=provider.auth_token,
host_env=dict(os.environ),
trusted_project_path=workspace_plan.workdir,
)
guest_env = dict(agent_provision.guest_env)
for key, val in agent_provision.env_vars.items():
guest_env.setdefault(key, val)
agent_provision = replace(agent_provision, guest_env=guest_env)
pipelock_dir = pipelock_state_dir(slug)
pipelock_dir.mkdir(parents=True, exist_ok=True)
proxy_plan = proxy.prepare(
bottle, slug, pipelock_dir, agent_provision.egress_routes,
)
egress_dir = egress_state_dir(slug)
egress_dir.mkdir(parents=True, exist_ok=True)
egress_plan = egress.prepare(
bottle, slug, egress_dir, agent_provision.egress_routes,
)
supervise_plan = None
if bottle.supervise:
# Current Dockerfile for the agent image. Read from the repo
# root; for `--cwd` derived images the base Dockerfile is what
# the agent should propose changes against (the derived layer
# is just a workspace copy).
# (routes.yaml + pipelock allowlist used to land here too but
# PRD 0017 chunk 3 moved them behind the
# `list-egress-routes` MCP tool so the agent gets live
# state rather than a launch-time snapshot.)
supervise_dockerfile_path = (
Path(dockerfile_path)
if dockerfile_path
else Path(__file__).resolve().parent.parent.parent.parent / "Dockerfile.claude"
)
dockerfile_content = (
supervise_dockerfile_path.read_text(encoding="utf-8")
if supervise_dockerfile_path.is_file()
else ""
)
supervise_dir = supervise_state_dir(slug)
supervise_dir.mkdir(parents=True, exist_ok=True)
supervise_plan = supervise.prepare(
slug, supervise_dir,
dockerfile_content=dockerfile_content,
)
return DockerBottlePlan(
spec=spec,
stage_dir=stage_dir,
guest_home=guest_home,
slug=slug,
container_name=container_name,
container_name_pinned=container_name_pinned,
image=image,
derived_image=derived_image,
runtime_image=runtime_image,
dockerfile_path=dockerfile_path,
env_file=env_file,
forwarded_env=forwarded_env,
prompt_file=prompt_file,
proxy_plan=proxy_plan,
git_gate_plan=git_gate_plan,
egress_plan=egress_plan,
supervise_plan=supervise_plan,
use_runsc=use_runsc,
agent_provision=agent_provision,
workspace_plan=workspace_plan,
)
def _write_env_file(resolved: ResolvedEnv, env_file: Path) -> None:
"""Serialize the literal portion of a ResolvedEnv into docker's
`--env-file` syntax (NAME=VALUE per line, mode 600 since the file
may carry verbatim values from the manifest). Forwarded names ride
on the plan as a structured tuple instead."""
env_lines: list[str] = []
for name, value in resolved.literals.items():
if "\n" in value:
die(
f"env entry {name} (literal) contains a newline; "
f"docker --env-file cannot represent multi-line values."
)
env_lines.append(f"{name}={value}")
env_file.write_text("\n".join(env_lines) + ("\n" if env_lines else ""))
env_file.chmod(0o600)
def _resolve_manifest_dockerfile(path_value: str, spec: BottleSpec) -> str:
path = Path(os.path.expanduser(path_value))
if not path.is_absolute():
path = Path(spec.user_cwd) / path
return str(path)
@@ -2,11 +2,10 @@
Per PRD 0050 the per-provider provisioning steps (prompt, skills,
declarative provision-plan apply, supervise MCP registration) live on
the `AgentProvider` plugin under `bot_bottle/contrib/`. CA and git
provisioning also moved to the AgentProvider ABC (with Debian/node
defaults); user plugins override them for non-standard images.
the `AgentProvider` plugin under `bot_bottle/contrib/`. The modules
left in this subpackage handle only the steps that are
backend-specific:
No modules remain in this subpackage — the directory is kept so that
existing imports of `from .provision import ...` don't need updating
if new backend-specific provisioners are added later.
- ca.py — install per-bottle CA bundle into the guest trust store
- git.py — copy host cwd `.git` into the guest when --cwd is used
"""
+51
View File
@@ -0,0 +1,51 @@
"""Install the per-bottle MITM CA into the agent container's trust
store.
Post-PRD-0017 the CA depends on the agent's HTTP_PROXY target:
- Bottle declares `egress.routes[]` → agent's HTTP_PROXY
points at egress; the cert the agent must trust is the
one egress mints leaf certs with (the egress CA).
- No egress routes → agent's HTTP_PROXY points straight at
pipelock; the cert the agent must trust is pipelock's CA (the
pre-cutover behavior).
By the time this provisioner runs, the corresponding `tls_init`
helper has generated the chosen CA under `plan.stage_dir`, and the
sidecar (pipelock or egress) is up referencing the
in-container CA paths.
Cert lands on Debian's standard source path
(`/usr/local/share/ca-certificates/`); `update-ca-certificates`
rebuilds `/etc/ssl/certs/ca-certificates.crt`, which is what curl,
Python `ssl`, and OpenSSL-based tools all read by default. The env
trio set on the agent's `docker run` covers Node
(`NODE_EXTRA_CA_CERTS`) and Python `requests` /
`SSL_CERT_FILE`-honoring libraries that don't load the system
bundle.
The fingerprint is computed via stdlib (`ssl.PEM_cert_to_DER_cert`
+ `hashlib.sha256`) and logged once to stderr. The private key
stays on the host (under `stage_dir`) until teardown wipes the
stage dir; nothing in the agent ever sees it."""
from __future__ import annotations
from ... import Bottle
from ...util import AGENT_CA_PATH, log_ca_fingerprint, select_ca_cert
from ..bottle_plan import DockerBottlePlan
def provision_ca(plan: DockerBottlePlan, bottle: Bottle) -> None:
"""Copy the agent-facing CA cert into the agent, rebuild the
trust bundle, emit a one-line fingerprint log. Called from
`BottleBackend.provision` after the agent container is up."""
cert_host_path, label = select_ca_cert(plan.egress_plan, plan.proxy_plan)
bottle.cp_in(str(cert_host_path), AGENT_CA_PATH)
bottle.exec(
f"chmod 644 {AGENT_CA_PATH} && update-ca-certificates",
user="root",
)
log_ca_fingerprint(cert_host_path, label)
+106
View File
@@ -0,0 +1,106 @@
"""Git provisioning inside a running Docker bottle.
Three concerns, all about git in the agent:
1. If --cwd was passed AND the host cwd has a .git, copy that .git
into the planned guest workspace so the agent operates on the
user's repo.
2. If the bottle declares `git` entries (PRD 0008), write a
~/.gitconfig with insteadOf rules so every git operation
against a declared upstream (push, fetch, clone, pull,
ls-remote) transparently hits the per-agent git-gate. The
gate mirrors the upstream in both directions, so URL
rewriting is symmetric.
3. If the bottle declares `git.user` (issue #86), set
`git config --global user.{name,email}` inside the bottle so
the agent's commits are attributed to that identity.
"""
from __future__ import annotations
import shlex
from ....git_gate import GIT_GATE_HOSTNAME, git_gate_render_gitconfig
from ....log import info
from ... import Bottle
from ..bottle_plan import DockerBottlePlan
def provision_git(plan: DockerBottlePlan, bottle: Bottle) -> None:
"""Set up git inside the bottle. Runs all three subcases; each
no-ops when its condition isn't met."""
_provision_cwd_git(plan, bottle)
_provision_git_gate_config(plan, bottle)
_provision_git_user(plan, bottle)
def _provision_cwd_git(plan: DockerBottlePlan, bottle: Bottle) -> None:
"""If --cwd was set and the host cwd has a .git directory, copy
it into /home/node/workspace/.git and fix ownership. No-op
otherwise."""
workspace = plan.workspace_plan
if not (workspace.enabled and workspace.copy_git and workspace.has_host_git_dir):
return
guest_workspace_git = f"{workspace.guest_path}/.git"
host_git = str(workspace.host_path / ".git")
info(f"copying {host_git} -> {bottle.name}:{guest_workspace_git}")
bottle.cp_in(host_git, guest_workspace_git)
bottle.exec(
f"chown -R {shlex.quote(workspace.owner)} {shlex.quote(guest_workspace_git)}",
user="root",
)
def _provision_git_gate_config(plan: DockerBottlePlan, bottle: Bottle) -> None:
"""Write ~/.gitconfig in the bottle with the git-gate
insteadOf rules. No-op when the bottle has no `git` entries."""
manifest_bottle = plan.spec.manifest.bottle_for(plan.spec.agent_name)
if not manifest_bottle.git:
return
container_gitconfig = f"{plan.guest_home}/.gitconfig"
content = git_gate_render_gitconfig(manifest_bottle.git, GIT_GATE_HOSTNAME)
config_file = plan.stage_dir / "agent_gitconfig"
config_file.write_text(content)
config_file.chmod(0o600)
info(f"writing {container_gitconfig} with {len(manifest_bottle.git)} insteadOf rule(s)")
bottle.cp_in(str(config_file), container_gitconfig)
bottle.exec(
f"chown node:node {shlex.quote(container_gitconfig)} && "
f"chmod 644 {shlex.quote(container_gitconfig)}",
user="root",
)
def _provision_git_user(plan: DockerBottlePlan, bottle: Bottle) -> None:
"""Apply `git config --global user.{name,email}` inside the
bottle so the agent's commits are attributed to the operator-
chosen identity instead of the agent image's default
(which is no user — git would refuse to commit at all
until the agent ran its own `git config`).
Runs as the `node` user so `--global` lands in
`/home/node/.gitconfig` (matching the existing
`_provision_git_gate_config` write location). No-op when the
bottle didn't declare `git.user`.
Each field set independently — name-only or email-only
configs only run the `git config` line for the field
present."""
manifest_bottle = plan.spec.manifest.bottle_for(plan.spec.agent_name)
gu = manifest_bottle.git_user
if gu.is_empty():
return
if gu.name:
info(f"git config --global user.name = {gu.name!r}")
bottle.exec(
f"git config --global user.name {shlex.quote(gu.name)}",
user="node",
)
if gu.email:
info(f"git config --global user.email = {gu.email!r}")
bottle.exec(
f"git config --global user.email {shlex.quote(gu.email)}",
user="node",
)
-59
View File
@@ -1,59 +0,0 @@
"""Prepare step for the Docker bottle backend.
`resolve_plan` does all host-side resolution (image and container
names, prompt-file, proxy plan, runtime detection) and returns a
frozen DockerBottlePlan. No Docker resources are created; the only
side effects are scratch files under `stage_dir` and a probe of
`docker info`. Cross-backend host-side validation has already run
via the base class's `prepare` template before this is called.
"""
from __future__ import annotations
from pathlib import Path
from . import util as docker_mod
from .bottle_plan import DockerBottlePlan
from .. import BottleSpec
from ...env import ResolvedEnv
from ...agent_provider import AgentProvisionPlan
from ...egress import EgressPlan
from ...supervise import SupervisePlan
from ...git_gate import GitGatePlan
def preflight() -> None:
docker_mod.require_docker()
def build_guest_env(resolved_env: ResolvedEnv) -> dict[str, str]:
return dict(resolved_env.literals)
def resolve_plan(
spec: BottleSpec,
slug: str,
resolved_env: ResolvedEnv,
agent_provision_plan: AgentProvisionPlan,
egress_plan: EgressPlan,
supervise_plan: SupervisePlan | None,
git_gate_plan: GitGatePlan,
stage_dir: Path,
) -> DockerBottlePlan:
"""Resolve Docker-specific names and write scratch files. Trusts
that the agent and its skills/git-gate keys are present —
validation already ran in the base class."""
# ==== docker specific setup ====
use_runsc = docker_mod.runsc_available()
return DockerBottlePlan(
spec=spec,
stage_dir=stage_dir,
slug=slug,
forwarded_env=dict(resolved_env.forwarded),
git_gate_plan=git_gate_plan,
egress_plan=egress_plan,
supervise_plan=supervise_plan,
use_runsc=use_runsc,
agent_provision=agent_provision_plan,
)
+8 -8
View File
@@ -2,20 +2,20 @@
(PRD 0024).
The bundle image (built by Dockerfile.sidecars, PRD 0024 chunk 1)
runs egress + git-gate + supervise as one container per bottle
under a small Python init supervisor. As of chunk 5 the bundle
is the only shape — the legacy four-sidecar topology and its
`BOT_BOTTLE_SIDECAR_BUNDLE` feature flag are gone."""
runs pipelock + egress + git-gate + supervise as one container
per bottle under a small Python init supervisor. As of chunk 5
the bundle is the only shape — the legacy four-sidecar topology
and its `BOT_BOTTLE_SIDECAR_BUNDLE` feature flag are gone."""
from __future__ import annotations
import os
# Bundle image. Defaults to a built-locally tag. Source checkouts
# build from the repo-root Dockerfile.sidecars; installed packages
# build from the packaged copy under bot_bottle/.
# Operators pinning to a published digest can override via env.
# Bundle image. Defaults to a built-locally tag (built from the
# repo's Dockerfile.sidecars via compose `build:`). Operators
# pinning to a published digest can override via env, matching
# the existing `BOT_BOTTLE_PIPELOCK_IMAGE` shape.
SIDECAR_BUNDLE_IMAGE = os.environ.get(
"BOT_BOTTLE_SIDECAR_IMAGE",
"bot-bottle-sidecars:latest",
+35 -34
View File
@@ -7,10 +7,11 @@ from __future__ import annotations
import re
import shutil
import subprocess
import tempfile
from typing import Iterable, Iterator
from ...log import die, info
# from ...workspace import WorkspacePlan
from ...workspace import WorkspacePlan
# Cap on the suffix the container-name conflict logic will try before
@@ -117,39 +118,39 @@ def build_image(ref: str, context: str, *, dockerfile: str = "") -> None:
subprocess.run(args, check=True)
# def build_image_with_cwd(
# derived: str,
# base: str,
# workspace: "WorkspacePlan",
# ) -> None:
# """Build a thin derived image that copies the workspace into
# the plan's guest path and sets the plan's workdir."""
# import os
#
# cwd = str(workspace.host_path)
# if not os.path.isdir(cwd):
# die(f"cwd not found at {cwd}")
# info(f"building image {derived} from {base} with {cwd} -> {workspace.guest_path}")
# with tempfile.TemporaryDirectory(prefix="bot-bottle-cwd.") as tmp:
# context_dir = os.path.join(tmp, "context")
# staged_workspace = os.path.join(context_dir, "workspace")
# shutil.copytree(
# cwd,
# staged_workspace,
# symlinks=True,
# ignore=shutil.ignore_patterns(".git"),
# )
# dockerfile = (
# f"FROM {base}\n"
# f"COPY --chown=node:node workspace/. {workspace.guest_path}\n"
# f"WORKDIR {workspace.workdir}\n"
# )
# subprocess.run(
# ["docker", "build", "-t", derived, "-f", "-", context_dir],
# input=dockerfile,
# text=True,
# check=True,
# )
def build_image_with_cwd(
derived: str,
base: str,
workspace: WorkspacePlan,
) -> None:
"""Build a thin derived image that copies the workspace into
the plan's guest path and sets the plan's workdir."""
import os
cwd = str(workspace.host_path)
if not os.path.isdir(cwd):
die(f"cwd not found at {cwd}")
info(f"building image {derived} from {base} with {cwd} -> {workspace.guest_path}")
with tempfile.TemporaryDirectory(prefix="bot-bottle-cwd.") as tmp:
context_dir = os.path.join(tmp, "context")
staged_workspace = os.path.join(context_dir, "workspace")
shutil.copytree(
cwd,
staged_workspace,
symlinks=True,
ignore=shutil.ignore_patterns(".git"),
)
dockerfile = (
f"FROM {base}\n"
f"COPY --chown=node:node workspace/. {workspace.guest_path}\n"
f"WORKDIR {workspace.workdir}\n"
)
subprocess.run(
["docker", "build", "-t", derived, "-f", "-", context_dir],
input=dockerfile,
text=True,
check=True,
)
def image_id(ref: str) -> str:
@@ -1,10 +0,0 @@
"""macOS Apple Container backend.
Selectable via `BOT_BOTTLE_BACKEND=macos-container`. This package owns
the Apple `container` CLI integration; launch remains gated until the
sidecar network enforcement shape is implemented.
"""
from .backend import MacosContainerBottleBackend
__all__ = ["MacosContainerBottleBackend"]
@@ -1,84 +0,0 @@
"""MacosContainerBottleBackend — Apple Container implementation."""
from __future__ import annotations
from contextlib import contextmanager
from pathlib import Path
from typing import Generator, Sequence
from ...agent_provider import AgentProvisionPlan
from ...egress import EgressPlan
from ...env import ResolvedEnv
from ...git_gate import GitGatePlan
from ...supervise import SupervisePlan
from .. import ActiveAgent, BottleBackend, BottleSpec
from . import cleanup as _cleanup
from . import enumerate as _enumerate
from . import launch as _launch
from . import resolve_plan as _resolve_plan
from . import util as _container
from .bottle import MacosContainerBottle
from .bottle_cleanup_plan import MacosContainerBottleCleanupPlan
from .bottle_plan import MacosContainerBottlePlan
class MacosContainerBottleBackend(
BottleBackend["MacosContainerBottlePlan", "MacosContainerBottleCleanupPlan"]
):
"""Apple Container backend. Selected by
`BOT_BOTTLE_BACKEND=macos-container` or
`--backend=macos-container`."""
name = "macos-container"
@classmethod
def is_available(cls) -> bool:
return _container.is_available()
def _preflight(self) -> None:
_resolve_plan.preflight()
def _build_guest_env(self, resolved_env: ResolvedEnv) -> dict[str, str]:
return _resolve_plan.build_guest_env(resolved_env)
def _resolve_plan(
self,
spec: BottleSpec,
*,
slug: str,
resolved_env: ResolvedEnv,
agent_provision_plan: AgentProvisionPlan,
egress_plan: EgressPlan,
git_gate_plan: GitGatePlan,
supervise_plan: SupervisePlan | None,
stage_dir: Path,
) -> MacosContainerBottlePlan:
return _resolve_plan.resolve_plan(
spec,
slug=slug,
resolved_env=resolved_env,
agent_provision_plan=agent_provision_plan,
egress_plan=egress_plan,
supervise_plan=supervise_plan,
git_gate_plan=git_gate_plan,
stage_dir=stage_dir,
)
@contextmanager
def launch(
self, plan: MacosContainerBottlePlan
) -> Generator[MacosContainerBottle, None, None]:
with _launch.launch(plan, provision=self.provision) as bottle:
yield bottle
def prepare_cleanup(self) -> MacosContainerBottleCleanupPlan:
return _cleanup.prepare_cleanup()
def cleanup(self, plan: MacosContainerBottleCleanupPlan) -> None:
_cleanup.cleanup(plan)
def enumerate_active(self) -> Sequence[ActiveAgent]:
return _enumerate.enumerate_active()
def supervise_mcp_url(self, plan: MacosContainerBottlePlan) -> str:
return plan.agent_supervise_url
@@ -1,91 +0,0 @@
"""Bottle handle for Apple's `container` CLI."""
from __future__ import annotations
import subprocess
from typing import Callable, cast
from ...agent_provider import PromptMode, prompt_args
from .. import Bottle, ExecResult
from ..terminal import exec_shell_script
class MacosContainerBottle(Bottle):
def __init__(
self,
container: str,
teardown: Callable[[], None],
prompt_path_in_container: str | None,
*,
agent_command: str = "claude",
agent_prompt_mode: PromptMode = "append_file",
agent_provider_template: str = "claude",
terminal_title: str = "",
terminal_color: str = "",
agent_workdir: str = "/home/node",
):
self.name = container
self._teardown = teardown
self.prompt_path = prompt_path_in_container
self._agent_prompt_mode = agent_prompt_mode
self.agent_command = agent_command
self.terminal_title = terminal_title
self.terminal_color = terminal_color
self.agent_provider_template = agent_provider_template
self.agent_workdir = agent_workdir
self._closed = False
def agent_argv(self, argv: list[str], *, tty: bool = True) -> list[str]:
full_argv = list(argv)
full_argv.extend(
prompt_args(
cast(PromptMode, self._agent_prompt_mode),
self.prompt_path,
argv=full_argv,
)
)
cmd = ["container", "exec"]
if tty:
cmd.extend(["--interactive", "--tty"])
if self.agent_workdir and self.agent_workdir != "/home/node":
cmd.extend(["--workdir", self.agent_workdir])
cmd.extend([self.name, self.agent_command, *full_argv])
return cmd
def exec_agent(self, argv: list[str], *, tty: bool = True) -> int:
agent_argv = self.agent_argv(argv, tty=tty)
script = (
exec_shell_script(agent_argv, self.terminal_title, self.terminal_color)
if tty else None
)
if script is None:
return subprocess.run(agent_argv, check=False).returncode
return subprocess.run(["sh", "-lc", script], check=False).returncode
def exec(self, script: str, *, user: str = "node") -> ExecResult:
result = subprocess.run(
["container", "exec", "--user", user, "--interactive",
self.name, "sh", "-s"],
input=script,
capture_output=True,
text=True,
check=False,
)
return ExecResult(
returncode=result.returncode,
stdout=result.stdout,
stderr=result.stderr,
)
def cp_in(self, host_path: str, container_path: str) -> None:
subprocess.run(
["container", "cp", host_path, f"{self.name}:{container_path}"],
stdout=subprocess.DEVNULL,
check=True,
)
def close(self) -> None:
if self._closed:
return
self._closed = True
self._teardown()
@@ -1,27 +0,0 @@
"""Cleanup plan for the macOS Apple Container backend."""
from __future__ import annotations
from dataclasses import dataclass
from ...log import info
from .. import BottleCleanupPlan
@dataclass(frozen=True)
class MacosContainerBottleCleanupPlan(BottleCleanupPlan):
containers: tuple[str, ...] = ()
networks: tuple[str, ...] = ()
def print(self) -> None:
if not self.containers and not self.networks:
info("macos-container cleanup: nothing to remove")
return
for name in self.containers:
info(f"macos-container container: {name}")
for name in self.networks:
info(f"macos-container network: {name}")
@property
def empty(self) -> bool:
return not self.containers and not self.networks
@@ -1,58 +0,0 @@
"""Plan type for the macOS Apple Container backend."""
from __future__ import annotations
from dataclasses import dataclass, field
from pathlib import Path
from ...agent_provider import PromptMode
from .. import BottlePlan
@dataclass(frozen=True)
class MacosContainerBottlePlan(BottlePlan):
slug: str
forwarded_env: dict[str, str] = field(repr=False)
agent_proxy_url: str = ""
agent_git_gate_url: str = ""
agent_supervise_url: str = ""
@property
def container_name(self) -> str:
return self.agent_provision.instance_name
@property
def image(self) -> str:
return self.agent_provision.image
@property
def dockerfile_path(self) -> str:
return self.agent_provision.dockerfile
@property
def prompt_file(self) -> Path:
return self.agent_provision.prompt_file
@property
def agent_command(self) -> str:
return self.agent_provision.command
@property
def agent_prompt_mode(self) -> PromptMode:
return self.agent_provision.prompt_mode
@property
def agent_provider_template(self) -> str:
return self.agent_provision.template
@property
def git_gate_insteadof_host(self) -> str:
if self.agent_git_gate_url.startswith("http://"):
return self.agent_git_gate_url.removeprefix("http://").rstrip("/")
return super().git_gate_insteadof_host
@property
def git_gate_insteadof_scheme(self) -> str:
if self.agent_git_gate_url.startswith("http://"):
return "http"
return super().git_gate_insteadof_scheme
@@ -1,70 +0,0 @@
"""Cleanup for the macOS Apple Container backend."""
from __future__ import annotations
import subprocess
from ...log import info, warn
from . import util as container_mod
from .bottle_cleanup_plan import MacosContainerBottleCleanupPlan
_PREFIX = "bot-bottle-"
_BUNDLE_PREFIX = "bot-bottle-sidecars-"
def _list_prefixed_containers() -> list[str]:
result = subprocess.run(
["container", "list", "--all", "--quiet"],
capture_output=True,
text=True,
check=False,
)
if result.returncode != 0:
warn(f"container list failed: {result.stderr.strip()}")
return []
return sorted(
name for name in (line.strip() for line in result.stdout.splitlines())
if name.startswith(_PREFIX) or name.startswith(_BUNDLE_PREFIX)
)
def _list_prefixed_networks() -> list[str]:
result = subprocess.run(
["container", "network", "list", "--quiet"],
capture_output=True,
text=True,
check=False,
)
if result.returncode != 0:
return []
return sorted(
name for name in (line.strip() for line in result.stdout.splitlines())
if name.startswith(_PREFIX)
)
def prepare_cleanup() -> MacosContainerBottleCleanupPlan:
container_mod.require_container()
return MacosContainerBottleCleanupPlan(
containers=tuple(_list_prefixed_containers()),
networks=tuple(_list_prefixed_networks()),
)
def cleanup(plan: MacosContainerBottleCleanupPlan) -> None:
for name in plan.containers:
info(f"container delete --force {name}")
subprocess.run(
["container", "delete", "--force", name],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
check=False,
)
for name in plan.networks:
info(f"container network delete {name}")
subprocess.run(
["container", "network", "delete", name],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
check=False,
)
@@ -1,40 +0,0 @@
"""Active-agent enumeration for the macOS Apple Container backend."""
from __future__ import annotations
import subprocess
from ...bottle_state import read_metadata
from .. import ActiveAgent
_PREFIX = "bot-bottle-"
_SIDECAR_PREFIX = "bot-bottle-sidecars-"
def enumerate_active() -> list[ActiveAgent]:
result = subprocess.run(
["container", "list", "--quiet"],
capture_output=True,
text=True,
check=False,
)
if result.returncode != 0:
return []
out: list[ActiveAgent] = []
for name in sorted(line.strip() for line in result.stdout.splitlines()):
if not name.startswith(_PREFIX):
continue
if name.startswith(_SIDECAR_PREFIX):
continue
slug = name[len(_PREFIX):]
metadata = read_metadata(slug)
out.append(ActiveAgent(
backend_name="macos-container",
slug=slug,
agent_name=metadata.agent_name if metadata else "?",
started_at=metadata.started_at if metadata else "",
services=(),
label=metadata.label if metadata else "",
color=metadata.color if metadata else "",
))
return out
@@ -1,426 +0,0 @@
"""Launch flow for the macOS Apple Container backend.
This backend keeps the explicit proxy-env enforcement model for v1:
the agent container is attached only to a host-only Apple Container
network, while the sidecar bundle is attached to a NAT network first
and the host-only network second. The sidecar's host-only IP is
discovered from `container inspect` and stamped into the agent's
HTTP_PROXY / HTTPS_PROXY env vars.
"""
from __future__ import annotations
import dataclasses
import os
import shutil
import subprocess
from contextlib import ExitStack, contextmanager
from pathlib import Path
from typing import Callable, Generator
from ...bottle_state import egress_state_dir, git_gate_state_dir
from ...egress import EGRESS_ROUTES_IN_CONTAINER, egress_resolve_token_values
from ...git_gate import revoke_git_gate_provisioned_keys
from ...log import die, info, warn
from ...supervise import QUEUE_DIR_IN_CONTAINER, SUPERVISE_PORT
from ...util import expand_tilde
from ..docker.egress import EGRESS_CA_IN_CONTAINER, EGRESS_PORT
from ..docker.git_gate import (
GIT_GATE_ACCESS_HOOK_IN_CONTAINER,
GIT_GATE_CREDS_DIR_IN_CONTAINER,
GIT_GATE_ENTRYPOINT_IN_CONTAINER,
GIT_GATE_HOOK_IN_CONTAINER,
)
from ..docker.sidecar_bundle import (
SIDECAR_BUNDLE_DOCKERFILE,
SIDECAR_BUNDLE_IMAGE,
)
from ..docker.egress import egress_tls_init
from ..util import AGENT_CA_BUNDLE, AGENT_CA_PATH
from . import util as container_mod
from .bottle import MacosContainerBottle
from .bottle_plan import MacosContainerBottlePlan
_REPO_DIR = str(Path(__file__).resolve().parent.parent.parent.parent)
_AGENT_SLEEP_SECONDS = "2147483647"
_GIT_HTTP_PORT = 9420
_GIT_GATE_READY_FILE = "/run/git-gate/ready"
def internal_network_name(slug: str) -> str:
return f"bot-bottle-net-{slug}"
def egress_network_name(slug: str) -> str:
return f"bot-bottle-egress-{slug}"
def sidecar_container_name(slug: str) -> str:
return f"bot-bottle-sidecars-{slug}"
@contextmanager
def launch(
plan: MacosContainerBottlePlan,
*,
provision: Callable[[MacosContainerBottlePlan, "MacosContainerBottle"], str | None],
) -> Generator[MacosContainerBottle, None, None]:
"""Build, run, provision, and yield an Apple Container bottle."""
stack = ExitStack()
bottle_for_revoke = plan.spec.manifest.bottle_for(plan.spec.agent_name)
git_gate_dir_for_revoke = git_gate_state_dir(plan.slug)
def teardown() -> None:
teardown_exc: BaseException | None = None
try:
stack.close()
except BaseException as exc: # noqa: W0718 - teardown must continue
teardown_exc = exc
warn(f"macos-container teardown failed: {exc!r}")
revoke_git_gate_provisioned_keys(bottle_for_revoke, git_gate_dir_for_revoke)
if teardown_exc is not None:
raise teardown_exc
try:
plan = _mint_certs(plan)
_build_images(plan)
internal_network = internal_network_name(plan.slug)
egress_network = egress_network_name(plan.slug)
_create_networks(internal_network, egress_network, stack)
sidecar_name = sidecar_container_name(plan.slug)
container_mod.force_remove_container(sidecar_name)
_start_sidecar_bundle(plan, sidecar_name, internal_network, egress_network)
stack.callback(container_mod.force_remove_container, sidecar_name)
_stage_git_gate(plan, sidecar_name)
sidecar_ip = container_mod.container_ipv4_on_network(
sidecar_name, internal_network,
)
plan = _stamp_agent_urls(plan, sidecar_ip)
container_mod.force_remove_container(plan.container_name)
_start_agent(plan, internal_network, sidecar_ip)
stack.callback(container_mod.force_remove_container, plan.container_name)
bottle = MacosContainerBottle(
plan.container_name,
teardown,
None,
agent_command=plan.agent_command,
agent_prompt_mode=plan.agent_prompt_mode,
agent_provider_template=plan.agent_provider_template,
terminal_title=plan.spec.label or plan.spec.agent_name,
terminal_color=plan.spec.color,
agent_workdir=plan.workspace_plan.workdir,
)
bottle.prompt_path = provision(plan, bottle)
yield bottle
finally:
teardown()
def _mint_certs(plan: MacosContainerBottlePlan) -> MacosContainerBottlePlan:
egress_ca_host, egress_ca_cert_only = egress_tls_init(
egress_state_dir(plan.slug),
)
egress_plan = dataclasses.replace(
plan.egress_plan,
mitmproxy_ca_host_path=egress_ca_host,
mitmproxy_ca_cert_only_host_path=egress_ca_cert_only,
)
return dataclasses.replace(plan, egress_plan=egress_plan)
def _build_images(plan: MacosContainerBottlePlan) -> None:
container_mod.build_image(
SIDECAR_BUNDLE_IMAGE,
_REPO_DIR,
dockerfile=SIDECAR_BUNDLE_DOCKERFILE,
)
container_mod.build_image(
plan.image,
_REPO_DIR,
dockerfile=plan.dockerfile_path,
)
def _create_networks(
internal_network: str,
egress_network: str,
stack: ExitStack,
) -> None:
container_mod.create_network(internal_network, internal=True)
stack.callback(container_mod.remove_network, internal_network)
container_mod.create_network(egress_network)
stack.callback(container_mod.remove_network, egress_network)
def _start_sidecar_bundle(
plan: MacosContainerBottlePlan,
sidecar_name: str,
internal_network: str,
egress_network: str,
) -> None:
argv = _sidecar_run_argv(plan, sidecar_name, internal_network, egress_network)
effective_env = {**dict(os.environ), **plan.agent_provision.provisioned_env}
token_values = egress_resolve_token_values(
plan.egress_plan.token_env_map, effective_env,
)
env = {**os.environ, **token_values}
info(f"container run sidecar bundle {sidecar_name}")
result = subprocess.run(
argv, capture_output=True, text=True, env=env, check=False,
)
if result.returncode != 0:
die(
f"container run for sidecar bundle {sidecar_name} failed: "
f"{(result.stderr or '').strip() or '<no stderr>'}"
)
def _start_agent(
plan: MacosContainerBottlePlan,
internal_network: str,
sidecar_ip: str,
) -> None:
argv = _agent_run_argv(plan, internal_network, sidecar_ip)
env = {
**os.environ,
**plan.forwarded_env,
}
info(f"container run agent {plan.container_name}")
result = subprocess.run(
argv, capture_output=True, text=True, env=env, check=False,
)
if result.returncode != 0:
die(
f"container run for agent {plan.container_name} failed: "
f"{(result.stderr or '').strip() or '<no stderr>'}"
)
def _stamp_agent_urls(
plan: MacosContainerBottlePlan,
sidecar_ip: str,
) -> MacosContainerBottlePlan:
proxy_url = f"http://{sidecar_ip}:{EGRESS_PORT}"
supervise_url = ""
if plan.supervise_plan is not None:
supervise_url = f"http://{sidecar_ip}:{SUPERVISE_PORT}/"
git_gate_url = ""
if plan.git_gate_plan.upstreams:
git_gate_url = f"http://{sidecar_ip}:{_GIT_HTTP_PORT}"
return dataclasses.replace(
plan,
agent_proxy_url=proxy_url,
agent_git_gate_url=git_gate_url,
agent_supervise_url=supervise_url,
)
def _stage_git_gate(plan: MacosContainerBottlePlan, sidecar_name: str) -> None:
gp = plan.git_gate_plan
if not gp.upstreams:
return
container_mod.exec_container(
sidecar_name,
[
"mkdir",
"-p",
str(Path(GIT_GATE_HOOK_IN_CONTAINER).parent),
GIT_GATE_CREDS_DIR_IN_CONTAINER,
"/git",
str(Path(_GIT_GATE_READY_FILE).parent),
],
)
for host_path, container_path in _git_gate_files(plan):
container_mod.copy_into_container(
sidecar_name, host_path, container_path,
)
container_mod.exec_container(
sidecar_name,
[
"sh",
"-c",
"chmod 755 "
f"{GIT_GATE_ENTRYPOINT_IN_CONTAINER} "
f"{GIT_GATE_HOOK_IN_CONTAINER} "
f"{GIT_GATE_ACCESS_HOOK_IN_CONTAINER} && "
f"chmod 600 {GIT_GATE_CREDS_DIR_IN_CONTAINER}/* && "
f"touch {_GIT_GATE_READY_FILE}",
],
)
def _git_gate_files(
plan: MacosContainerBottlePlan,
) -> tuple[tuple[str, str], ...]:
gp = plan.git_gate_plan
files: list[tuple[str, str]] = [
(str(gp.entrypoint_script), GIT_GATE_ENTRYPOINT_IN_CONTAINER),
(str(gp.hook_script), GIT_GATE_HOOK_IN_CONTAINER),
(str(gp.access_hook_script), GIT_GATE_ACCESS_HOOK_IN_CONTAINER),
]
for upstream in gp.upstreams:
files.append((
expand_tilde(upstream.identity_file),
f"{GIT_GATE_CREDS_DIR_IN_CONTAINER}/{upstream.name}-key",
))
if upstream.known_hosts_file:
files.append((
str(upstream.known_hosts_file),
f"{GIT_GATE_CREDS_DIR_IN_CONTAINER}/{upstream.name}-known_hosts",
))
return tuple(files)
def _sidecar_run_argv(
plan: MacosContainerBottlePlan,
sidecar_name: str,
internal_network: str,
egress_network: str,
) -> list[str]:
argv = [
"container", "run",
"--name", sidecar_name,
"--detach",
"--rm",
"--network", egress_network,
"--network", internal_network,
"--dns", _sidecar_dns(),
"--env", f"BOT_BOTTLE_SIDECAR_DAEMONS={','.join(_sidecar_daemons(plan))}",
]
for entry in _sidecar_env_entries(plan):
argv += ["--env", entry]
for host_path, container_path, read_only in _sidecar_mounts(plan):
argv += ["--mount", _mount_spec(host_path, container_path, read_only)]
argv.append(SIDECAR_BUNDLE_IMAGE)
return argv
def _agent_run_argv(
plan: MacosContainerBottlePlan,
internal_network: str,
sidecar_ip: str,
) -> list[str]:
argv = [
"container", "run",
"--name", plan.container_name,
"--detach",
"--rm",
"--network", internal_network,
]
for entry in _agent_env_entries(plan, sidecar_ip):
argv += ["--env", entry]
argv += [plan.image, "sleep", _AGENT_SLEEP_SECONDS]
return argv
def _sidecar_dns() -> str:
return container_mod.dns_server()
def _sidecar_daemons(plan: MacosContainerBottlePlan) -> tuple[str, ...]:
daemons = ["egress"]
if plan.git_gate_plan.upstreams:
daemons += ["git-gate", "git-http"]
if plan.supervise_plan is not None:
daemons.append("supervise")
return tuple(daemons)
def _sidecar_env_entries(plan: MacosContainerBottlePlan) -> tuple[str, ...]:
env: list[str] = []
if plan.egress_plan.routes:
env.extend(sorted(plan.egress_plan.token_env_map.keys()))
if plan.git_gate_plan.upstreams:
env.append(f"BOT_BOTTLE_GIT_GATE_READY_FILE={_GIT_GATE_READY_FILE}")
if plan.supervise_plan is not None:
env += [
f"SUPERVISE_BOTTLE_SLUG={plan.slug}",
f"SUPERVISE_QUEUE_DIR={QUEUE_DIR_IN_CONTAINER}",
f"SUPERVISE_PORT={SUPERVISE_PORT}",
]
return tuple(env)
def _sidecar_mounts(
plan: MacosContainerBottlePlan,
) -> tuple[tuple[str, str, bool], ...]:
mounts: list[tuple[str, str, bool]] = []
ep = plan.egress_plan
mounts.append((
str(ep.mitmproxy_ca_host_path.parent),
str(Path(EGRESS_CA_IN_CONTAINER).parent),
False,
))
if ep.routes:
mounts.append((
str(_stage_routes_dir(plan)),
str(Path(EGRESS_ROUTES_IN_CONTAINER).parent),
True,
))
sp = plan.supervise_plan
if sp is not None:
mounts.append((str(sp.queue_dir), QUEUE_DIR_IN_CONTAINER, False))
return tuple(mounts)
def _stage_routes_dir(plan: MacosContainerBottlePlan) -> Path:
routes_dir = plan.stage_dir / "macos-container-egress"
routes_dir.mkdir(parents=True, exist_ok=True)
shutil.copyfile(
plan.egress_plan.routes_path,
routes_dir / Path(EGRESS_ROUTES_IN_CONTAINER).name,
)
return routes_dir
def _mount_spec(host_path: str, container_path: str, read_only: bool) -> str:
spec = f"type=bind,source={host_path},target={container_path}"
if read_only:
spec += ",readonly"
return spec
def _agent_env_entries(
plan: MacosContainerBottlePlan,
sidecar_ip: str,
) -> tuple[str, ...]:
proxy_url = f"http://{sidecar_ip}:{EGRESS_PORT}"
no_proxy = _agent_no_proxy(plan, sidecar_ip)
env = [
f"HTTPS_PROXY={proxy_url}",
f"HTTP_PROXY={proxy_url}",
f"https_proxy={proxy_url}",
f"http_proxy={proxy_url}",
f"NO_PROXY={no_proxy}",
f"no_proxy={no_proxy}",
f"NODE_EXTRA_CA_CERTS={AGENT_CA_PATH}",
f"SSL_CERT_FILE={AGENT_CA_BUNDLE}",
f"REQUESTS_CA_BUNDLE={AGENT_CA_BUNDLE}",
]
if plan.agent_git_gate_url:
env.append(f"GIT_GATE_URL={plan.agent_git_gate_url}")
if plan.agent_supervise_url:
env.append(f"MCP_SUPERVISE_URL={plan.agent_supervise_url}")
for name, value in sorted(plan.agent_provision.guest_env.items()):
env.append(f"{name}={value}")
for name in sorted(plan.forwarded_env.keys()):
env.append(name)
return tuple(env)
def _agent_no_proxy(plan: MacosContainerBottlePlan, sidecar_ip: str) -> str:
hosts = ["localhost", "127.0.0.1", sidecar_ip]
return ",".join(hosts)
@@ -1,44 +0,0 @@
"""Prepare step for the macOS Apple Container backend."""
from __future__ import annotations
from pathlib import Path
from ...agent_provider import AgentProvisionPlan
from ...egress import EgressPlan
from ...env import ResolvedEnv
from ...git_gate import GitGatePlan
from ...supervise import SupervisePlan
from .. import BottleSpec
from . import util as container_mod
from .bottle_plan import MacosContainerBottlePlan
def preflight() -> None:
container_mod.require_container()
def build_guest_env(resolved_env: ResolvedEnv) -> dict[str, str]:
return dict(resolved_env.literals)
def resolve_plan(
spec: BottleSpec,
slug: str,
resolved_env: ResolvedEnv,
agent_provision_plan: AgentProvisionPlan,
egress_plan: EgressPlan,
supervise_plan: SupervisePlan | None,
git_gate_plan: GitGatePlan,
stage_dir: Path,
) -> MacosContainerBottlePlan:
return MacosContainerBottlePlan(
spec=spec,
stage_dir=stage_dir,
slug=slug,
forwarded_env=dict(resolved_env.forwarded),
git_gate_plan=git_gate_plan,
egress_plan=egress_plan,
supervise_plan=supervise_plan,
agent_provision=agent_provision_plan,
)
-388
View File
@@ -1,388 +0,0 @@
"""Host-side primitives for Apple's `container` CLI."""
from __future__ import annotations
import json
import os
import ipaddress
import platform
import shutil
import subprocess
import time
from typing import Iterable
from ...log import die, info
_CONTAINER = "container"
_DEFAULT_DNS = "1.1.1.1"
def is_macos() -> bool:
return platform.system() == "Darwin"
def is_available() -> bool:
return is_macos() and shutil.which(_CONTAINER) is not None
def require_container() -> None:
"""Fail with an install pointer if Apple Container is unavailable."""
if not is_macos():
info("BOT_BOTTLE_BACKEND=macos-container requires macOS.")
die("macos-container backend is only supported on macOS")
if shutil.which(_CONTAINER) is None:
info("Apple Container is required but was not found on PATH.")
info("Install: https://github.com/apple/container/releases")
die("container not found")
_require_container_service()
def _require_container_service() -> None:
result = subprocess.run(
[_CONTAINER, "system", "status"],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
check=False,
)
if result.returncode != 0:
info("Apple Container system service is not running.")
info("Start it with: container system start")
die("container system service not running")
def dns_server() -> str:
override = os.environ.get("BOT_BOTTLE_MACOS_CONTAINER_DNS", "").strip()
if override:
return override
return _host_ipv4_dns() or _DEFAULT_DNS
def build_image(ref: str, context: str, *, dockerfile: str = "") -> None:
"""Build an OCI image with Apple's BuildKit-backed `container build`."""
info(
f"building image {ref} from {context} with Apple Container "
"(layer cache keeps repeat builds fast)"
)
_ensure_builder_dns()
args = [_CONTAINER, "build", "-t", ref, "--dns", dns_server()]
if dockerfile:
args.extend(["-f", dockerfile])
args.append(context)
subprocess.run(args, check=True)
def _ensure_builder_dns() -> None:
dns = dns_server()
status = _builder_status()
override = os.environ.get("BOT_BOTTLE_MACOS_CONTAINER_DNS", "").strip()
if _builder_running(status) and _builder_resolves_build_hosts():
if override and not _builder_has_dns(status, dns):
_restart_builder_with_dns(dns)
return
_restart_builder_with_dns(dns)
def _restart_builder_with_dns(dns: str) -> None:
subprocess.run(
[_CONTAINER, "builder", "stop"],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
check=False,
)
subprocess.run(
[_CONTAINER, "builder", "start", "--dns", dns],
check=True,
)
def _host_ipv4_dns() -> str:
if not is_macos():
return ""
result = subprocess.run(
["scutil", "--dns"],
capture_output=True,
text=True,
check=False,
)
if result.returncode != 0:
return ""
blocks: list[list[str]] = []
current: list[str] = []
for line in result.stdout.splitlines():
if line.startswith("resolver #") and current:
blocks.append(current)
current = []
current.append(line)
if current:
blocks.append(current)
for direct_only in (True, False):
for block in blocks:
text = "\n".join(block)
if direct_only and "Directly Reachable Address" not in text:
continue
for line in block:
if "nameserver[" not in line or ":" not in line:
continue
candidate = line.split(":", 1)[1].strip()
if _usable_ipv4(candidate):
return candidate
return ""
def _usable_ipv4(value: str) -> bool:
try:
address = ipaddress.ip_address(value)
except ValueError:
return False
return (
address.version == 4
and not address.is_loopback
and not address.is_link_local
and not address.is_multicast
and not address.is_unspecified
)
def _builder_status() -> list[dict[str, object]]:
result = subprocess.run(
[_CONTAINER, "builder", "status", "--format", "json"],
capture_output=True,
text=True,
check=False,
)
if result.returncode != 0:
return []
try:
data = json.loads(result.stdout or "[]")
except json.JSONDecodeError:
return []
if isinstance(data, list):
return [entry for entry in data if isinstance(entry, dict)]
if isinstance(data, dict):
return [data]
return []
def _builder_running(status: list[dict[str, object]]) -> bool:
for entry in status:
entry_status = entry.get("status")
if isinstance(entry_status, dict) and entry_status.get("state") == "running":
return True
return False
def _builder_dns_nameservers(status: list[dict[str, object]]) -> list[str]:
out: list[str] = []
for entry in status:
config = entry.get("configuration")
config_dns = config.get("dns") if isinstance(config, dict) else None
nameservers = (
config_dns.get("nameservers")
if isinstance(config_dns, dict)
else None
)
if not isinstance(nameservers, list):
continue
out.extend(name for name in nameservers if isinstance(name, str))
return out
def _builder_has_dns(status: list[dict[str, object]], dns: str) -> bool:
return dns in _builder_dns_nameservers(status)
def _builder_resolves_build_hosts() -> bool:
result = subprocess.run(
[_CONTAINER, "exec", "buildkit", "getent", "hosts", "deb.debian.org"],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
check=False,
)
return result.returncode == 0
def image_exists(ref: str) -> bool:
return _silent_run([_CONTAINER, "image", "inspect", ref]) == 0
def container_exists(name: str) -> bool:
result = subprocess.run(
[_CONTAINER, "list", "--all", "--quiet"],
capture_output=True,
text=True,
check=False,
)
if result.returncode != 0:
return False
return name in {line.strip() for line in result.stdout.splitlines()}
def force_remove_container(name: str) -> None:
if container_exists(name):
subprocess.run(
[_CONTAINER, "delete", "--force", name],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
check=False,
)
def copy_into_container(name: str, host_path: str, container_path: str) -> None:
cmd = [_CONTAINER, "cp", host_path, f"{name}:{container_path}"]
result = _run_container_op(cmd)
if result.returncode != 0:
die(
f"container cp into {name}:{container_path} failed: "
f"{(result.stderr or '').strip() or '<no stderr>'}"
)
def exec_container(name: str, argv: list[str]) -> None:
result = _run_container_op([_CONTAINER, "exec", name, *argv])
if result.returncode != 0:
die(
f"container exec in {name} failed: "
f"{(result.stderr or '').strip() or '<no stderr>'}"
)
def _run_container_op(cmd: list[str]) -> subprocess.CompletedProcess[str]:
result = subprocess.run(
cmd,
capture_output=True,
text=True,
check=False,
)
for _ in range(19):
if result.returncode == 0:
return result
time.sleep(0.1)
result = subprocess.run(
cmd,
capture_output=True,
text=True,
check=False,
)
return result
def create_network(name: str, *, internal: bool = False) -> None:
args = [
_CONTAINER, "network", "create",
"--label", "bot-bottle.backend=macos-container",
]
if internal:
args.append("--internal")
args.append(name)
result = subprocess.run(
args, capture_output=True, text=True, check=False,
)
if result.returncode == 0:
return
if "already exists" in (result.stderr or "").lower():
return
die(
f"container network create {name} failed: "
f"{(result.stderr or '').strip() or '<no stderr>'}"
)
def remove_network(name: str) -> None:
result = subprocess.run(
[_CONTAINER, "network", "delete", name],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
check=False,
)
if result.returncode != 0:
return
def inspect_container(name: str) -> dict[str, object]:
result = subprocess.run(
[_CONTAINER, "inspect", name],
capture_output=True,
text=True,
check=False,
)
if result.returncode != 0:
die(
f"container inspect {name} failed: "
f"{(result.stderr or '').strip() or '<no stderr>'}"
)
try:
data = json.loads(result.stdout or "[]")
except json.JSONDecodeError as exc:
die(f"container inspect {name} returned malformed JSON: {exc}")
if isinstance(data, list) and data and isinstance(data[0], dict):
return data[0]
if isinstance(data, dict):
return data
die(f"container inspect {name} returned an unexpected shape")
raise AssertionError("unreachable")
def container_ipv4_on_network(name: str, network: str) -> str:
data = inspect_container(name)
status = data.get("status")
networks = status.get("networks") if isinstance(status, dict) else None
if not isinstance(networks, list):
die(f"container inspect {name} did not include status.networks")
for entry in networks:
if not isinstance(entry, dict):
continue
if entry.get("network") != network:
continue
raw = entry.get("ipv4Address")
if not isinstance(raw, str) or not raw:
die(f"container {name} has no IPv4 address on {network}")
return raw.split("/", 1)[0]
die(f"container {name} is not attached to network {network}")
raise AssertionError("unreachable")
def image_id(ref: str) -> str:
"""Return the image digest/ID from `container image inspect`.
The command returns JSON on current Apple Container releases. Keep
parsing narrow and fatal so callers do not cache on an empty key.
"""
import json
result = subprocess.run(
[_CONTAINER, "image", "inspect", ref],
capture_output=True,
text=True,
check=False,
)
if result.returncode != 0:
die(
f"container image inspect for {ref!r} failed: "
f"{(result.stderr or '').strip() or '<no stderr>'}"
)
try:
data = json.loads(result.stdout or "{}")
except json.JSONDecodeError as exc:
die(f"container image inspect for {ref!r} returned malformed JSON: {exc}")
if isinstance(data, list) and data:
data = data[0]
if isinstance(data, dict):
value = data.get("id") or data.get("digest") or data.get("ID")
if value:
return str(value)
die(f"container image inspect for {ref!r} did not include an image id")
raise AssertionError("unreachable")
def save(ref: str, output: str) -> None:
subprocess.run([_CONTAINER, "image", "save", ref, "-o", output], check=True)
def _silent_run(cmd: Iterable[str]) -> int:
return subprocess.run(
list(cmd),
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
check=False,
).returncode
-122
View File
@@ -1,122 +0,0 @@
"""Shared helpers used by both backends' resolve_plan steps.
Each helper owns one well-defined step of the per-bottle plan
resolution so docker and smolmachines don't repeat the same logic.
Backend-specific steps (container names, env-file, per-bottle
Dockerfile overrides, subnet allocation) stay in the backend's own
resolve_plan.py.
"""
from __future__ import annotations
import os
from dataclasses import replace
from datetime import datetime, timezone
from pathlib import Path
from ..agent_provider import AgentProvisionPlan
from ..bottle_state import (
BottleMetadata,
agent_state_dir,
bottle_identity,
egress_state_dir,
git_gate_state_dir,
supervise_state_dir,
write_metadata,
)
from ..egress import Egress, EgressPlan
from ..git_gate import GitGate, GitGatePlan
from ..manifest import ManifestBottle
from ..supervise import Supervise, SupervisePlan
from . import BottleSpec
def mint_slug(spec: BottleSpec) -> str:
"""Return the bottle identity: the recorded identity for a resume,
or a freshly minted one for a new start."""
return spec.identity or bottle_identity(spec.agent_name)
def write_launch_metadata(
slug: str, spec: BottleSpec, *, compose_project: str, backend: str,
) -> None:
"""Persist launch metadata so `cli.py resume <identity>` can
reconstruct the spec. Idempotent — re-writes on resume with a
refreshed started_at."""
write_metadata(BottleMetadata(
identity=slug,
agent_name=spec.agent_name,
cwd=spec.user_cwd if spec.copy_cwd else "",
copy_cwd=spec.copy_cwd,
started_at=datetime.now(timezone.utc).isoformat(),
compose_project=compose_project,
backend=backend,
label=spec.label,
color=spec.color,
))
def prepare_agent_state_dir(slug: str, spec: BottleSpec) -> tuple[Path, Path]:
"""Create the agent state subdir, write the prompt file.
Returns (agent_dir, prompt_file)."""
manifest = spec.manifest
agent = manifest.agents[spec.agent_name]
agent_dir = agent_state_dir(slug)
agent_dir.mkdir(parents=True, exist_ok=True)
prompt_file = agent_dir / "prompt.txt"
prompt_file.write_text(agent.prompt or "")
prompt_file.chmod(0o600)
return agent_dir, prompt_file
def prepare_git_gate(bottle: ManifestBottle, slug: str) -> GitGatePlan:
git_gate_dir = git_gate_state_dir(slug)
git_gate_dir.mkdir(parents=True, exist_ok=True)
return GitGate().prepare(bottle, slug, git_gate_dir)
def prepare_egress(
bottle: ManifestBottle, slug: str, provision: AgentProvisionPlan,
) -> EgressPlan:
egress_dir = egress_state_dir(slug)
egress_dir.mkdir(parents=True, exist_ok=True)
return Egress().prepare(bottle, slug, egress_dir, provision.egress_routes)
def prepare_supervise(bottle: ManifestBottle, slug: str) -> SupervisePlan | None:
"""Prepare the supervise sidecar state dir. Returns None when
bottle.supervise is falsy."""
if not bottle.supervise:
return None
supervise_dir = supervise_state_dir(slug)
supervise_dir.mkdir(parents=True, exist_ok=True)
return Supervise().prepare(slug, supervise_dir)
def merge_provision_env_vars(provision: AgentProvisionPlan) -> AgentProvisionPlan:
"""Fold provision.env_vars into guest_env (setdefault semantics)
and return a new plan with the merged guest_env."""
merged = dict(provision.guest_env)
for key, val in provision.env_vars.items():
merged.setdefault(key, val)
return replace(provision, guest_env=merged)
def resolve_manifest_dockerfile(path_value: str, spec: BottleSpec) -> str:
"""Resolve a manifest-supplied dockerfile path relative to user_cwd."""
path = Path(os.path.expanduser(path_value))
if not path.is_absolute():
path = Path(spec.user_cwd) / path
return str(path)
__all__ = [
"merge_provision_env_vars",
"mint_slug",
"prepare_agent_state_dir",
"prepare_egress",
"prepare_git_gate",
"prepare_supervise",
"resolve_manifest_dockerfile",
"write_launch_metadata",
]
+22 -33
View File
@@ -13,20 +13,18 @@ from contextlib import contextmanager
from pathlib import Path
from typing import Generator, Sequence
from ...agent_provider import AgentProvisionPlan
from ...egress import EgressPlan
from ...env import ResolvedEnv
from ...git_gate import GitGatePlan
from ...supervise import SupervisePlan
from .. import ActiveAgent, BottleBackend, BottleSpec
from .. import ActiveAgent, Bottle, BottleBackend, BottleSpec
from . import cleanup as _cleanup
from . import enumerate as _enumerate
from . import launch as _launch
from . import resolve_plan as _resolve_plan
from . import prepare as _prepare
from . import smolvm as _smolvm
from .bottle import SmolmachinesBottle
from .bottle_cleanup_plan import SmolmachinesBottleCleanupPlan
from .bottle_plan import SmolmachinesBottlePlan
from .provision import ca as _ca
from .provision import git as _git
from .provision import workspace as _workspace
class SmolmachinesBottleBackend(
@@ -45,34 +43,10 @@ class SmolmachinesBottleBackend(
runtime check happens at `prepare`."""
return _smolvm.is_available()
def _preflight(self) -> None:
_resolve_plan.preflight()
def _build_guest_env(self, resolved_env: ResolvedEnv) -> dict[str, str]:
return _resolve_plan.build_guest_env(resolved_env)
def _resolve_plan(
self,
spec: BottleSpec,
*,
slug: str,
resolved_env: ResolvedEnv,
agent_provision_plan: AgentProvisionPlan,
egress_plan: EgressPlan,
git_gate_plan: GitGatePlan,
supervise_plan: SupervisePlan | None,
stage_dir: Path,
self, spec: BottleSpec, *, stage_dir: Path
) -> SmolmachinesBottlePlan:
return _resolve_plan.resolve_plan(
spec,
slug=slug,
resolved_env=resolved_env,
agent_provision_plan=agent_provision_plan,
egress_plan=egress_plan,
supervise_plan=supervise_plan,
git_gate_plan=git_gate_plan,
stage_dir=stage_dir,
)
return _prepare.resolve_plan(spec, stage_dir=stage_dir)
@contextmanager
def launch(
@@ -81,6 +55,21 @@ class SmolmachinesBottleBackend(
with _launch.launch(plan, provision=self.provision) as bottle:
yield bottle
def provision_ca(
self, plan: SmolmachinesBottlePlan, bottle: Bottle
) -> None:
_ca.provision_ca(plan, bottle)
def provision_workspace(
self, plan: SmolmachinesBottlePlan, bottle: Bottle
) -> None:
_workspace.provision_workspace(plan, bottle)
def provision_git(
self, plan: SmolmachinesBottlePlan, bottle: Bottle
) -> None:
_git.provision_git(plan, bottle)
def supervise_mcp_url(self, plan: SmolmachinesBottlePlan) -> str:
"""The smolmachines guest reaches the supervise sidecar via a
host-published random port the launch step pinned earlier
+11 -40
View File
@@ -19,13 +19,10 @@ from __future__ import annotations
import subprocess
import sys
import time
import shlex
from typing import Mapping, cast
from ...agent_provider import PromptMode, prompt_args
from .. import Bottle, ExecResult
from ..terminal import exec_shell_script
from . import pty_resize as _pty_resize
from . import smolvm as _smolvm
@@ -70,10 +67,6 @@ class SmolmachinesBottle(Bottle):
guest_env: Mapping[str, str] | None = None,
agent_command: str = "claude",
agent_prompt_mode: PromptMode = "append_file",
agent_provider_template: str = "claude",
terminal_title: str = "",
terminal_color: str = "",
agent_workdir: str = "/home/node",
) -> None:
self.name = machine_name
# In-VM path to the agent's prompt file. None when the
@@ -87,10 +80,9 @@ class SmolmachinesBottle(Bottle):
self._guest_env = dict(guest_env or {})
self._agent_prompt_mode = agent_prompt_mode
self.agent_command = agent_command
self.terminal_title = terminal_title
self.terminal_color = terminal_color
self.agent_provider_template = agent_provider_template
self.agent_workdir = agent_workdir
self.agent_provider_template = (
"codex" if agent_command == "codex" else "claude"
)
def agent_argv(
self, argv: list[str], *, tty: bool = True,
@@ -98,14 +90,8 @@ class SmolmachinesBottle(Bottle):
flags = ["smolvm", "machine", "exec", "--name", self.name]
if tty:
flags += ["-i", "-t"]
agent_tail = ["env", *_env_assignments_for("node", self._guest_env)]
if self.agent_workdir and self.agent_workdir != _HOME_FOR["node"]:
agent_tail += [
"sh", "-lc",
f"cd {shlex.quote(self.agent_workdir)} && exec \"$@\"",
"bot-bottle-agent",
]
agent_tail.append(self.agent_command)
agent_tail = ["env", *_env_assignments_for("node", self._guest_env),
self.agent_command]
provider_prompt_args = prompt_args(
cast(PromptMode, self._agent_prompt_mode), self.prompt_path, argv=argv,
)
@@ -141,16 +127,9 @@ class SmolmachinesBottle(Bottle):
UID switches via `runuser -u node --` (not `-l`) so we
avoid login-shell wiring. HOME / USER come from `smolvm
-e` instead, which sets them on the process env."""
agent_argv = self.agent_argv(argv, tty=tty)
script = exec_shell_script(agent_argv, self.terminal_title, self.terminal_color) if tty else None
if script is None:
return subprocess.run(agent_argv, check=False).returncode
return subprocess.run(["sh", "-lc", script], check=False).returncode
# smolvm/libkrun can SIGKILL an otherwise-normal exec during
# early-VM provisioning. Retry once after a short settle so
# callers (provision_ca, etc.) don't have to handle it themselves.
_SIGKILL_EXIT = 128 + 9
return subprocess.run(
self.agent_argv(argv, tty=tty), check=False,
).returncode
def exec(self, script: str, *, user: str = "node") -> ExecResult:
"""Run a POSIX shell script as `user` (default `node`) and
@@ -162,22 +141,14 @@ class SmolmachinesBottle(Bottle):
`runuser -u <user> -- env ... /bin/sh -c <script>` switches UID
without invoking a login shell, then sets HOME / USER and the
bottle env in the child process.
Retries once on SIGKILL (exit 137) — libkrun occasionally
kills short-lived execs during VM bring-up."""
r = self._exec_raw(script, user=user)
if r.returncode == self._SIGKILL_EXIT:
time.sleep(1.0)
r = self._exec_raw(script, user=user)
return r
def _exec_raw(self, script: str, *, user: str = "node") -> ExecResult:
bottle env in the child process."""
argv = [
"--", "runuser", "-u", user, "--",
"env", *_env_assignments_for(user, self._guest_env),
"/bin/sh", "-c", script,
]
# Call smolvm directly because this path needs the host-side
# subprocess capture shape used by the Docker backend.
r = subprocess.run(
["smolvm", "machine", "exec", "--name", self.name] + argv,
capture_output=True, text=True, check=False,
+28 -36
View File
@@ -12,6 +12,7 @@ from dataclasses import dataclass
from pathlib import Path
from ...agent_provider import PromptMode
from ...pipelock import PipelockProxyPlan
from .. import BottlePlan
@@ -29,6 +30,27 @@ class SmolmachinesBottlePlan(BottlePlan):
bundle_subnet: str
bundle_gateway: str
bundle_ip: str
# smolvm machine name + agent image source. machine_create
# boots from a packed `.smolmachine` artifact (pre-baked at
# prepare time via `smolvm pack create`); using `--from`
# instead of `--image` avoids the registry-pull race we hit
# when machine_start tried to fetch on-demand and the libkrun
# agent's network attempt got refused by macOS.
#
# Chunk 2d ships with a public placeholder image (alpine)
# since bot-bottle-claude:latest lives in the operator's local
# docker daemon and smolvm's crane backend can't read from
# there; chunk 4 resolves the agent-image-conversion gap
# (push to a registry first, or smolvm grows a docker-daemon
# transport).
machine_name: str
# Agent image ref (docker tag). `launch` runs the
# build → save → registry push → smolvm pack pipeline against
# this and feeds the resulting `.smolmachine` artifact to
# `machine_create --from`. The pipeline runs at launch time
# (not prepare time) so the docker build output doesn't garble
# the dashboard's preflight modal.
agent_image_ref: str
# In-guest env vars (HTTPS_PROXY etc) — IP-literal URLs since
# the guest has no DNS resolver inside the TSI allowlist.
# Passed to `smolvm machine create` as `-e K=V` flags.
@@ -36,6 +58,11 @@ class SmolmachinesBottlePlan(BottlePlan):
# `--smolfile` is mutually exclusive with `--from`, and
# `--from` is the path that avoids the registry-pull race).
guest_env: dict[str, str]
# Path to the agent's prompt file on the host. Always written
# (mode 0o600) so the in-VM path always exists; the file is
# empty when the agent has no prompt — claude-code reads it
# via --append-system-prompt-file only when non-empty.
prompt_file: Path
# Inner Plans for the sidecar bundle daemons. The same shape the
# docker backend uses — same `.prepare()` calls produced
# them — but our launch step doesn't populate the
@@ -44,6 +71,7 @@ class SmolmachinesBottlePlan(BottlePlan):
# docker's `--internal` + egress bridge topology; it's on a
# per-bottle bridge with a pinned IP. The unused fields stay
# at their dataclass defaults.
proxy_plan: PipelockProxyPlan
# Agent-side endpoints. On Docker Desktop the docker bridge
# IPs aren't reachable from the smolvm guest (TSI uses macOS
# networking; docker container IPs live in the daemon's VM),
@@ -56,42 +84,6 @@ class SmolmachinesBottlePlan(BottlePlan):
agent_git_gate_host: str = ""
agent_supervise_url: str = ""
@property
def machine_name(self) -> str:
"""smolvm machine name. `machine_create` boots from a packed
`.smolmachine` artifact (pre-baked at prepare time via
`smolvm pack create`); using `--from` instead of `--image`
avoids the registry-pull race we hit when machine_start tried
to fetch on-demand and the libkrun agent's network attempt
got refused by macOS."""
return self.agent_provision.instance_name
@property
def agent_image(self) -> str:
"""Agent image ref (docker tag). `launch` runs the
build → save → registry push → smolvm pack pipeline against
this and feeds the resulting `.smolmachine` artifact to
`machine_create --from`. The pipeline runs at launch time
(not prepare time) so the docker build output doesn't garble
the dashboard's preflight modal."""
return self.agent_provision.image
@property
def prompt_file(self) -> Path:
"""Path to the agent's prompt file on the host. Always written
(mode 0o600) so the in-VM path always exists; the file is
empty when the agent has no prompt — claude-code reads it
via --append-system-prompt-file only when non-empty."""
return self.agent_provision.prompt_file
@property
def git_gate_insteadof_host(self) -> str:
return self.agent_git_gate_host
@property
def git_gate_insteadof_scheme(self) -> str:
return "http"
@property
def agent_command(self) -> str:
return self.agent_provision.command
+3 -5
View File
@@ -23,7 +23,7 @@ import json
import subprocess
from .. import ActiveAgent
from ...bottle_state import read_metadata
from ..docker.bottle_state import read_metadata
from . import sidecar_bundle as _bundle
@@ -64,15 +64,13 @@ def enumerate_active() -> list[ActiveAgent]:
agent_name=metadata.agent_name if metadata else "?",
started_at=metadata.started_at if metadata else "",
services=services_by_slug.get(slug, ()),
label=metadata.label if metadata else "",
color=metadata.color if metadata else "",
))
return out
def _query_bundle_services() -> dict[str, tuple[str, ...]]:
"""`{slug: ('egress', ...)}` from each running bundle container's
`BOT_BOTTLE_SIDECAR_DAEMONS` env var.
"""`{slug: ('egress', 'pipelock', ...)}` from each running
bundle container's `BOT_BOTTLE_SIDECAR_DAEMONS` env var.
Smolmachines bundles all run the PRD-0024 image with the
same daemon set declared via env, so one inspect per bundle
gets us the picture without exec'ing into the container.
+90 -26
View File
@@ -9,9 +9,13 @@ guest pointed at the bundle's pinned IP via TSI's
exit.
The bundle's daemons consume the inner Plans the docker backend
already produces: egress reads routes + CAs from the EgressPlan.
Git-gate + supervise plumb through the same plans the docker
backend uses, minus the docker-network fields that don't apply here."""
already produces: pipelock reads its yaml + CA from the
PipelockProxyPlan; egress reads routes + CAs from the EgressPlan
+ EGRESS_UPSTREAM_PROXY pointing at `127.0.0.1:8888` (bundle
local), since the agent dials pipelock first (not egress) on the
smolmachines path. Git-gate + supervise plumb through the same
plans the docker backend uses, minus the docker-network fields
that don't apply here."""
from __future__ import annotations
@@ -25,11 +29,16 @@ from ...egress import (
EGRESS_ROUTES_IN_CONTAINER,
egress_resolve_token_values,
)
from ...pipelock import (
PIPELOCK_CA_CERT_IN_CONTAINER,
PIPELOCK_CA_KEY_IN_CONTAINER,
)
from ...supervise import QUEUE_DIR_IN_CONTAINER, SUPERVISE_PORT
from ...util import expand_tilde
from ..docker import util as docker_mod
from ..docker.egress import (
EGRESS_CA_IN_CONTAINER,
EGRESS_PIPELOCK_CA_IN_CONTAINER,
EGRESS_PORT as _EGRESS_PORT,
egress_tls_init,
)
@@ -39,9 +48,14 @@ from ..docker.git_gate import (
GIT_GATE_ENTRYPOINT_IN_CONTAINER,
GIT_GATE_HOOK_IN_CONTAINER,
)
from ..docker.pipelock import (
BUNDLE_LOCAL_PIPELOCK_URL,
PIPELOCK_PORT as _PIPELOCK_PORT_STR,
pipelock_tls_init,
)
from ...git_gate import revoke_git_gate_provisioned_keys
from ...log import warn
from ...bottle_state import egress_state_dir, git_gate_state_dir
from ..docker.bottle_state import git_gate_state_dir
from . import loopback_alias as _loopback
from . import sidecar_bundle as _bundle
from . import smolvm as _smolvm
@@ -64,7 +78,9 @@ _SMOLMACHINE_CACHE_DIR = Path.home() / ".cache" / "bot-bottle" / "smolmachines"
# Container-internal listening ports for each bundle daemon. The
# bundle publishes each one on a random host loopback port (see
# `_bundle.start_bundle`), and `_bundle.bundle_host_port` looks
# them up post-start.
# them up post-start. Pipelock's port is an env-overridable string
# in docker.pipelock; coerce to int here.
_PIPELOCK_PORT = int(_PIPELOCK_PORT_STR)
_GIT_HTTP_PORT = 9420
_SUPERVISE_PORT = SUPERVISE_PORT
@@ -90,7 +106,7 @@ def launch(
# here, not in prepare, so the docker-build output doesn't
# garble the dashboard's preflight modal.
agent_from_path = _ensure_smolmachine(
plan.agent_image,
plan.agent_image_ref,
dockerfile=plan.agent_dockerfile_path,
)
@@ -103,10 +119,6 @@ def launch(
guest_env=plan.guest_env,
agent_command=plan.agent_command,
agent_prompt_mode=plan.agent_prompt_mode,
agent_provider_template=plan.agent_provider_template,
terminal_title=plan.spec.label or plan.spec.agent_name,
terminal_color=plan.spec.color,
agent_workdir=plan.workspace_plan.workdir,
)
bottle.prompt_path = provision(plan, bottle)
@@ -155,16 +167,33 @@ def _allocate_resources(
def _mint_certs(plan: SmolmachinesBottlePlan) -> SmolmachinesBottlePlan:
"""Mint the egress MITM CA and return the plan with CA paths filled."""
egress_ca_host, egress_ca_cert_only = egress_tls_init(
egress_state_dir(plan.slug),
"""Mint per-bottle CAs and return the plan with CA paths filled.
Pipelock always runs in the bundle. Egress's CA is only minted
when the bottle declares routes — otherwise egress runs idle
without MITM and the CA files would be unused."""
ca_cert_host, ca_key_host = pipelock_tls_init(plan.proxy_plan.yaml_path.parent)
proxy_plan = dataclasses.replace(
plan.proxy_plan,
ca_cert_host_path=ca_cert_host,
ca_key_host_path=ca_key_host,
)
egress_plan = dataclasses.replace(
plan.egress_plan,
mitmproxy_ca_host_path=egress_ca_host,
mitmproxy_ca_cert_only_host_path=egress_ca_cert_only,
)
return dataclasses.replace(plan, egress_plan=egress_plan)
egress_plan = plan.egress_plan
if egress_plan.routes:
egress_ca_host, egress_ca_cert_only = egress_tls_init(
plan.egress_plan.routes_path.parent,
)
egress_plan = dataclasses.replace(
egress_plan,
mitmproxy_ca_host_path=egress_ca_host,
mitmproxy_ca_cert_only_host_path=egress_ca_cert_only,
pipelock_ca_host_path=ca_cert_host,
# On smolmachines, egress's upstream is pipelock on the
# bundle's localhost — they're in the same container's
# network namespace.
pipelock_proxy_url=BUNDLE_LOCAL_PIPELOCK_URL,
)
return dataclasses.replace(plan, proxy_plan=proxy_plan, egress_plan=egress_plan)
def _start_bundle(
@@ -195,10 +224,17 @@ def _discover_urls(
macOS networking, and macOS sees the daemon's bridge via the
published-port loopback forward only.
Proxy hop order: when the bottle declares egress routes, the
agent's first hop is egress (for token injection), then
pipelock. Without routes, the agent dials pipelock directly.
NO_PROXY includes the per-bottle loopback alias so the
supervise + git-gate URLs bypass HTTPS_PROXY."""
if plan.egress_plan.routes:
agent_facing_port = _EGRESS_PORT
else:
agent_facing_port = _PIPELOCK_PORT
agent_facing_host_port = _bundle.bundle_host_port(
plan.slug, _EGRESS_PORT, host_ip=loopback_ip,
plan.slug, agent_facing_port, host_ip=loopback_ip,
)
agent_proxy_url = f"http://{loopback_ip}:{agent_facing_host_port}"
@@ -292,7 +328,8 @@ def _bundle_launch_spec(
"""Build a BundleLaunchSpec from the resolved inner Plans.
Daemons in the CSV:
- egress is always present.
- egress + pipelock are always present (pipelock is the
agent's first hop; egress is its upstream).
- git-gate + git-http are conditional on plan.git_gate_plan.upstreams.
- supervise is conditional on plan.supervise_plan.
@@ -300,15 +337,36 @@ def _bundle_launch_spec(
daemon-private values only (HTTPS_PROXY is scoped to the
egress process by egress_entrypoint.sh — see PRD 0024's bundle
bind-address PR)."""
daemons: list[str] = ["egress"]
daemons: list[str] = ["egress", "pipelock"]
env: list[str] = []
volumes: list[tuple[str, str, bool]] = []
# In this Docker-Desktop-compatible topology, whichever daemon
# is "agent-facing" gets its port published on the host
# loopback (see `_ensure_smolmachine`'s discovery loop) and the
# other stays bundle-internal. The bundle is NOT reachable by
# bridge IP from the smolvm guest on macOS — TSI uses macOS
# networking, and macOS sees the daemon's bridge via the
# published-port loopback forward only.
# --- pipelock ---------------------------------------------
pp = plan.proxy_plan
volumes += [
(str(pp.yaml_path), "/etc/pipelock.yaml", True),
(str(pp.ca_cert_host_path), PIPELOCK_CA_CERT_IN_CONTAINER, True),
(str(pp.ca_key_host_path), PIPELOCK_CA_KEY_IN_CONTAINER, True),
]
# --- egress -----------------------------------------------
ep = plan.egress_plan
volumes.append((str(ep.mitmproxy_ca_host_path), EGRESS_CA_IN_CONTAINER, True))
if ep.routes:
volumes.append((str(ep.routes_path), EGRESS_ROUTES_IN_CONTAINER, True))
env.append(f"EGRESS_UPSTREAM_PROXY={ep.pipelock_proxy_url}")
env.append(f"EGRESS_UPSTREAM_CA={EGRESS_PIPELOCK_CA_IN_CONTAINER}")
volumes += [
(str(ep.routes_path), EGRESS_ROUTES_IN_CONTAINER, True),
(str(ep.mitmproxy_ca_host_path), EGRESS_CA_IN_CONTAINER, True),
(str(ep.pipelock_ca_host_path), EGRESS_PIPELOCK_CA_IN_CONTAINER, True),
]
# Bare-name entries for upstream-token slots. Their values
# come from the docker-run subprocess env (inherited from
# the operator's shell), never landing on argv.
@@ -351,8 +409,14 @@ def _bundle_launch_spec(
# Container ports the agent reaches from the smolvm guest —
# published on host loopback so the guest can dial via TSI +
# macOS networking. Egress is always the agent's HTTP/HTTPS proxy.
ports_to_publish: list[int] = [_EGRESS_PORT]
# macOS networking. The HTTP/HTTPS chokepoint is whichever
# daemon's port we publish: egress when routes are declared
# (token injection first, then forwards to bundle-internal
# pipelock), pipelock otherwise.
if ep.routes:
ports_to_publish: list[int] = [_EGRESS_PORT]
else:
ports_to_publish = [_PIPELOCK_PORT]
if gp.upstreams:
ports_to_publish.append(_GIT_HTTP_PORT)
if sp is not None:
@@ -48,7 +48,7 @@ from ...log import die
# registry:2.8.3, pinned by digest. Same env-override pattern as the
# sidecar image pin in bot_bottle/backend/docker/sidecar_bundle.py.
# pipelock image pin in bot_bottle/backend/docker/pipelock.py.
REGISTRY_IMAGE = os.environ.get(
"BOT_BOTTLE_REGISTRY_IMAGE",
"registry@sha256:a3d8aaa63ed8681a604f1dea0aa03f100d5895b6a58ace528858a7b332415373",
+197
View File
@@ -0,0 +1,197 @@
"""smolmachines `_resolve_plan` (PRD 0023 chunks 2d + 4c).
Resolves the per-bottle docker subnet + bundle IP and assembles
the guest env. The agent's docker image build → smolmachine
pack pipeline runs in `launch.launch`, not here, so the
dashboard's preflight modal isn't garbled by docker-build output
before the operator has confirmed.
No VM bringup — that's `launch.launch`'s job."""
from __future__ import annotations
import os
from datetime import datetime, timezone
from dataclasses import replace
from pathlib import Path
from ...agent_provider import agent_provision_plan, runtime_for
from ...backend import BottleSpec
from ...backend.docker.bottle_state import (
BottleMetadata,
agent_state_dir,
bottle_identity,
egress_state_dir,
git_gate_state_dir,
pipelock_state_dir,
supervise_state_dir,
write_metadata,
)
from ...egress import Egress
from ...env import resolve_env
from ...git_gate import GitGate
from ...pipelock import PipelockProxy
from ...supervise import Supervise
from ...workspace import workspace_plan as resolve_workspace_plan
from .bottle_plan import SmolmachinesBottlePlan
from .util import smolmachines_bundle_subnet, smolmachines_preflight
# Gateway ports the bundle exposes inside its container — pipelock
# HTTPS proxy, git-gate's git-daemon, supervise's MCP. The agent
# inside the smolvm guest dials these on the bundle's pinned IP.
_BUNDLE_PIPELOCK_PORT = 8888
_BUNDLE_GIT_GATE_PORT = 9418
_BUNDLE_SUPERVISE_PORT = 9100
def resolve_plan(
spec: BottleSpec, *, stage_dir: Path
) -> SmolmachinesBottlePlan:
"""Materialize the smolmachines plan. The bundle's docker
subnet + pinned IP are derived from the slug; the agent's
`.smolmachine` artifact is built (or cache-hit) here so
launch's `machine create --from` boots without a registry
pull. Per-bottle guest env + the TSI allow_cidrs land on the
plan for launch to pass straight through to
`machine create` flags."""
smolmachines_preflight()
manifest = spec.manifest
bottle = manifest.bottle_for(spec.agent_name)
provider = bottle.agent_provider
provider_runtime = runtime_for(provider.template)
guest_home = "/home/node"
workspace_plan = resolve_workspace_plan(spec, guest_home=guest_home)
slug = spec.identity or bottle_identity(spec.agent_name)
# Record minimal metadata so `cli.py resume` can recover the
# slug. Same schema as the docker backend.
write_metadata(BottleMetadata(
identity=slug,
agent_name=spec.agent_name,
cwd=spec.user_cwd if spec.copy_cwd else "",
copy_cwd=spec.copy_cwd,
started_at=datetime.now(timezone.utc).isoformat(),
compose_project="",
backend="smolmachines",
))
subnet, gateway, bundle_ip = smolmachines_bundle_subnet(slug)
# Agent's env: resolve through resolve_env() so ?prompt entries
# are prompted and ${HOST_VAR} entries are interpolated — matching
# the Docker backend's contract. Forwarded (secret/interpolated)
# values still reach the guest as -e K=V smolvm flags because
# smolvm 0.8.0 has no env-file or stdin injection path; this is
# the known argv-exposure gap documented in PRD 0038.
# HTTPS_PROXY / GIT_GATE_URL / MCP_SUPERVISE_URL are populated
# in launch.py after bundle bringup.
resolved = resolve_env(manifest, spec.agent_name)
guest_env: dict[str, str] = {
**resolved.literals,
**resolved.forwarded,
"NO_PROXY": "localhost,127.0.0.1",
"NODE_EXTRA_CA_CERTS": "/etc/ssl/certs/ca-certificates.crt",
"SSL_CERT_FILE": "/etc/ssl/certs/ca-certificates.crt",
"REQUESTS_CA_BUNDLE": "/etc/ssl/certs/ca-certificates.crt",
}
git_gate_dir = git_gate_state_dir(slug)
git_gate_dir.mkdir(parents=True, exist_ok=True)
git_gate_plan = GitGate().prepare(bottle, slug, git_gate_dir)
# Prompt file is always written (mode 0o600) so the in-VM
# path always exists. Content is the agent's `prompt`
# field (markdown body) — empty for agents with no prompt.
# claude-code reads it via --append-system-prompt-file only
# when non-empty, but the file must exist either way to
# match the docker backend's contract.
agent_dir = agent_state_dir(slug)
agent_dir.mkdir(parents=True, exist_ok=True)
prompt_file = agent_dir / "prompt.txt"
agent = manifest.agents[spec.agent_name]
prompt_file.write_text(agent.prompt or "")
prompt_file.chmod(0o600)
machine_name = f"bot-bottle-{slug}"
# Stash the agent image ref — `launch.launch` runs the
# build → pack pipeline at bringup. Honors BOT_BOTTLE_IMAGE
# to match the docker backend's `resolve_plan` default.
agent_dockerfile_path = ""
if provider.dockerfile:
agent_dockerfile_path = _resolve_manifest_dockerfile(provider.dockerfile, spec)
image_default = f"bot-bottle-{provider.template}:{slug}"
elif provider_runtime.dockerfile:
agent_dockerfile_path = provider_runtime.dockerfile
image_default = provider_runtime.image
else:
image_default = provider_runtime.image
agent_image_ref = os.environ.get("BOT_BOTTLE_IMAGE", image_default)
agent_provision = agent_provision_plan(
template=provider.template,
dockerfile=agent_dockerfile_path,
state_dir=agent_dir,
guest_home=guest_home,
guest_env=guest_env,
forward_host_credentials=provider.forward_host_credentials,
auth_token=provider.auth_token,
host_env=dict(os.environ),
trusted_project_path=workspace_plan.workdir,
)
merged_guest_env = dict(agent_provision.guest_env)
for key, val in agent_provision.env_vars.items():
merged_guest_env.setdefault(key, val)
agent_provision = replace(agent_provision, guest_env=merged_guest_env)
# Inner Plans for the four bundle daemons. The ABCs are
# platform-neutral — `.prepare()` writes config files + returns
# a Plan dataclass with no backend-specific assumptions. State
# dirs are still keyed by slug under the docker backend's
# bottle_state layout (shared on-host convention; not a docker
# dependency).
pipelock_dir = pipelock_state_dir(slug)
pipelock_dir.mkdir(parents=True, exist_ok=True)
proxy_plan = PipelockProxy().prepare(
bottle, slug, pipelock_dir, agent_provision.egress_routes,
)
egress_dir = egress_state_dir(slug)
egress_dir.mkdir(parents=True, exist_ok=True)
egress_plan = Egress().prepare(
bottle, slug, egress_dir, agent_provision.egress_routes,
)
supervise_plan = None
if bottle.supervise:
supervise_dir = supervise_state_dir(slug)
supervise_dir.mkdir(parents=True, exist_ok=True)
supervise_plan = Supervise().prepare(slug, supervise_dir)
return SmolmachinesBottlePlan(
spec=spec,
stage_dir=stage_dir,
guest_home=guest_home,
slug=slug,
bundle_subnet=subnet,
bundle_gateway=gateway,
bundle_ip=bundle_ip,
machine_name=machine_name,
agent_image_ref=agent_image_ref,
guest_env=agent_provision.guest_env,
prompt_file=prompt_file,
proxy_plan=proxy_plan,
git_gate_plan=git_gate_plan,
egress_plan=egress_plan,
supervise_plan=supervise_plan,
agent_provision=agent_provision,
workspace_plan=workspace_plan,
)
def _resolve_manifest_dockerfile(path_value: str, spec: BottleSpec) -> str:
path = Path(os.path.expanduser(path_value))
if not path.is_absolute():
path = Path(spec.user_cwd) / path
return str(path)
@@ -2,11 +2,11 @@
Per PRD 0050 the per-provider provisioning steps (prompt, skills,
declarative provision-plan apply, supervise MCP registration) live on
the `AgentProvider` plugin under `bot_bottle/contrib/`. CA and git
provisioning also moved to the AgentProvider ABC (with Debian/node
defaults); user plugins override them for non-standard images.
the `AgentProvider` plugin under `bot_bottle/contrib/`. The modules
left in this subpackage handle only the steps that are
backend-specific:
No modules remain in this subpackage. Workspace copying now runs
through `BottleBackend.provision_workspace` against the running
bottle for every backend.
- ca.py — install per-bottle CA bundle into the guest trust store
- git.py — copy host cwd `.git` into the guest when --cwd is used
- workspace.py — copy the operator workspace into the guest
"""
@@ -0,0 +1,93 @@
"""Install the per-bottle MITM CA into the smolmachines guest's
trust store (PRD 0023 chunk 4d).
Mirrors `backend.docker.provision.ca`: select the right CA (egress
when the bottle has routes, else pipelock), copy it to Debian's
`/usr/local/share/ca-certificates/` path,
`update-ca-certificates` to rebuild the trust bundle, and log the
fingerprint once. The selected cert depends on the agent's
HTTP_PROXY target — same logic as the docker backend, since the
agent dials the same daemons through the same bundle.
`smolvm machine exec` runs commands as root in the VM (no `-u`
flag exists; the VM init is root), so we don't need the explicit
`-u 0` the docker backend uses on its `docker exec` calls."""
from __future__ import annotations
import time
from ....log import die
from ...util import (
AGENT_CA_BUNDLE,
AGENT_CA_PATH,
log_ca_fingerprint,
select_ca_cert,
)
from ... import Bottle, ExecResult
from ..bottle_plan import SmolmachinesBottlePlan
_SIGKILL_EXIT = 128 + 9
def provision_ca(plan: SmolmachinesBottlePlan, bottle: Bottle) -> None:
"""Copy the agent-facing CA cert into the guest, rebuild the
trust bundle, emit a one-line fingerprint log. Called from
`BottleBackend.provision` after the smolvm guest is up."""
cert_host_path, label = select_ca_cert(plan.egress_plan, plan.proxy_plan)
bottle.cp_in(str(cert_host_path), AGENT_CA_PATH)
# Mode 0644 — readable to non-root tools in the guest.
# update-ca-certificates rebuilds the bundle at AGENT_CA_BUNDLE,
# which is what curl / Python ssl / OpenSSL-based tools read by
# default. The env trio (NODE_EXTRA_CA_CERTS / SSL_CERT_FILE /
# REQUESTS_CA_BUNDLE) on the guest_env covers Node + Python
# `requests` / libraries that don't load the system bundle.
#
r = _install_ca(bottle)
if r.returncode == _SIGKILL_EXIT:
# smolvm/libkrun can SIGKILL an otherwise-normal exec
# during early-VM provisioning. `update-ca-certificates`
# is idempotent, so retry the same install once after a
# short settle delay before treating it as fatal.
time.sleep(1.0)
r = _install_ca(bottle)
if r.returncode != 0:
# update-ca-certificates not adding our cert is fatal —
# claude-code's TLS handshake against the egress-MITM'd
# api.anthropic.com would fail downstream. Bail early
# with what we can see (output is captured so we can
# surface it).
die(
f"update-ca-certificates didn't add the agent CA "
f"(exit {r.returncode}): "
f"stdout={(r.stdout or '').strip()!r} "
f"stderr={(r.stderr or '').strip()!r}"
)
log_ca_fingerprint(cert_host_path, label)
def _install_ca(bottle: Bottle) -> ExecResult:
# chown + chmod + update-ca-certificates + bundle
# verification run in one exec so we only pay one
# round trip; the `&&` chaining surfaces the first failure
# as the return code. The verify check is more stable than
# requiring "1 added" in stdout: a retry after a
# partially-completed first run may legitimately report "0
# added" while the cert is already installed.
return bottle.exec(
f"chown root:root {AGENT_CA_PATH} && "
f"chmod 644 {AGENT_CA_PATH} && "
f"update-ca-certificates && "
f"openssl verify -CAfile {AGENT_CA_BUNDLE} {AGENT_CA_PATH}",
user="root",
)
# Re-exported for the launch/provision_ca caller + tests. The path
# constants live in the shared `backend.util` (Debian's
# `update-ca-certificates` layout is the same in both backends).
__all__ = ["AGENT_CA_BUNDLE", "AGENT_CA_PATH", "provision_ca"]
@@ -0,0 +1,133 @@
"""Git provisioning inside a running smolmachines bottle
(PRD 0023 chunk 4d).
Three concerns, all about git in the agent:
1. If --cwd was passed AND the host cwd has a .git, copy that
.git into the planned guest workspace so the agent operates on
the user's repo.
2. If the bottle declares `git` entries (PRD 0008), write a
~/.gitconfig with insteadOf rules so every git operation
against a declared upstream transparently hits the per-bottle
git-gate. The gate mirrors the upstream in both directions,
so URL rewriting is symmetric.
3. If the bottle declares `git.user` (issue #86), set
`git config --global user.{name,email}` inside the guest so
the agent's commits are attributed to that identity.
Differs from `backend.docker.provision.git` in one address detail:
the TSI-allowlisted guest can only reach the bundle's pinned IP
(no DNS resolver in the /32 allowlist), so the insteadOf URLs
are `http://<bundle_ip>:<port>/<name>.git` rather than the
docker backend's `git://git-gate/<name>.git`. The render itself
is the shared `git_gate_render_gitconfig` on the platform-neutral
git_gate module."""
from __future__ import annotations
import os
import shlex
import tempfile
from pathlib import Path
from ....git_gate import git_gate_render_gitconfig
from ....log import info
from ... import Bottle
from ..bottle_plan import SmolmachinesBottlePlan
def provision_git(plan: SmolmachinesBottlePlan, bottle: Bottle) -> None:
"""Set up git inside the guest. Runs all three subcases; each
no-ops when its condition isn't met."""
_provision_cwd_git(plan, bottle)
_provision_git_gate_config(plan, bottle)
_provision_git_user(plan, bottle)
def _provision_cwd_git(plan: SmolmachinesBottlePlan, bottle: Bottle) -> None:
"""If --cwd was set and the host cwd has a .git directory, copy
it into <guest_home>/workspace/.git and fix ownership. No-op
otherwise."""
workspace = plan.workspace_plan
if not (workspace.enabled and workspace.copy_git and workspace.has_host_git_dir):
return
guest_workspace_git = f"{workspace.guest_path}/.git"
host_git = str(workspace.host_path / ".git")
info(f"copying {host_git} -> {bottle.name}:{guest_workspace_git}")
# mkdir -p the workspace dir so cp_in lands the .git
# directly there even on first-time bottles.
bottle.exec(f"mkdir -p {shlex.quote(workspace.guest_path)}", user="root")
bottle.cp_in(host_git, guest_workspace_git)
# cp_in lands files as root; the agent runs as node so
# the workspace tree must be chowned over.
bottle.exec(
f"chown -R {shlex.quote(workspace.owner)} {shlex.quote(guest_workspace_git)}",
user="root",
)
def _provision_git_gate_config(
plan: SmolmachinesBottlePlan, bottle: Bottle
) -> None:
"""Write ~/.gitconfig in the guest with the git-gate insteadOf
rules. No-op when the bottle has no `git` entries."""
manifest_bottle = plan.spec.manifest.bottle_for(plan.spec.agent_name)
if not manifest_bottle.git:
return
# `<loopback alias>:<host port>` form: the bundle's git-gate
# HTTP port is published on host loopback at launch time so
# the smolvm guest (which can only reach macOS networking via
# TSI, not the docker bridge IP) can dial it. launch.py
# populates `plan.agent_git_gate_host` after bundle bringup.
content = git_gate_render_gitconfig(
manifest_bottle.git, plan.agent_git_gate_host, scheme="http",
)
guest_gitconfig = f"{plan.guest_home}/.gitconfig"
# Stage the file under the plan's stage_dir so cp_in
# has a stable host path. The plan's stage_dir is cleaned up
# by start.py's session-end teardown.
with tempfile.NamedTemporaryFile(
"w", dir=str(plan.stage_dir), prefix="gitconfig.",
delete=False,
) as f:
f.write(content)
config_file = Path(f.name)
os.chmod(config_file, 0o600)
info(f"writing {guest_gitconfig} with {len(manifest_bottle.git)} insteadOf rule(s)")
bottle.cp_in(str(config_file), guest_gitconfig)
bottle.exec(
f"chown node:node {shlex.quote(guest_gitconfig)} && "
f"chmod 644 {shlex.quote(guest_gitconfig)}",
user="root",
)
def _provision_git_user(
plan: SmolmachinesBottlePlan, bottle: Bottle,
) -> None:
"""Apply `git config --global user.{name,email}` inside the
guest as the node user so --global lands in the same
`/home/node/.gitconfig` that `_provision_git_gate_config`
writes to. No-op when the bottle didn't declare `git.user`.
SmolmachinesBottle.exec(user="node") automatically sets
HOME=/home/node so --global writes to /home/node/.gitconfig."""
manifest_bottle = plan.spec.manifest.bottle_for(plan.spec.agent_name)
gu = manifest_bottle.git_user
if gu.is_empty():
return
if gu.name:
info(f"git config --global user.name = {gu.name!r}")
bottle.exec(
f"git config --global user.name {shlex.quote(gu.name)}",
user="node",
)
if gu.email:
info(f"git config --global user.email = {gu.email!r}")
bottle.exec(
f"git config --global user.email {shlex.quote(gu.email)}",
user="node",
)
@@ -0,0 +1,32 @@
"""Copy the operator workspace into a smolmachines guest."""
from __future__ import annotations
import shlex
from ....log import info
from ... import Bottle
from ..bottle_plan import SmolmachinesBottlePlan
def provision_workspace(plan: SmolmachinesBottlePlan, bottle: Bottle) -> None:
"""Copy host cwd contents to the planned guest workspace."""
workspace = plan.workspace_plan
if not (workspace.enabled and workspace.copy_contents):
return
guest_parent = workspace.guest_path.rsplit("/", 1)[0] or "/"
guest_path_q = shlex.quote(workspace.guest_path)
guest_parent_q = shlex.quote(guest_parent)
owner_q = shlex.quote(workspace.owner)
mode_q = shlex.quote(workspace.mode)
info(f"copying {workspace.host_path} -> {bottle.name}:{workspace.guest_path}")
bottle.exec(
f"rm -rf {guest_path_q} && mkdir -p {guest_parent_q}",
user="root",
)
bottle.cp_in(str(workspace.host_path), workspace.guest_path)
bottle.exec(
f"chown -R {owner_q} {guest_path_q} && chmod {mode_q} {guest_path_q}",
user="root",
)
@@ -36,6 +36,7 @@ follow-up tracked separately)."""
from __future__ import annotations
import fcntl
import io
import signal
import struct
import subprocess
@@ -68,9 +69,12 @@ def _read_winsize() -> tuple[int, int] | None:
- tmux respawn-pane: tmux sets all three to the pane's PTY.
- non-TTY (someone piped stdin in tests): none are; the
sync just no-ops, which is the right behavior."""
for stream in (sys.stdin, sys.stdout, sys.stderr):
for default_fd, stream in enumerate((sys.stdin, sys.stdout, sys.stderr)):
try:
fd = stream.fileno()
except (AttributeError, io.UnsupportedOperation, OSError):
fd = default_fd
try:
data = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\x00" * 8)
except OSError:
continue
@@ -1,80 +0,0 @@
"""smolmachines `_resolve_plan` (PRD 0023 chunks 2d + 4c).
Resolves the per-bottle docker subnet + bundle IP and assembles
the guest env. The agent's docker image build → smolmachine
pack pipeline runs in `launch.launch`, not here, so the
dashboard's preflight modal isn't garbled by docker-build output
before the operator has confirmed.
No VM bringup — that's `launch.launch`'s job."""
from __future__ import annotations
from pathlib import Path
from .. import BottleSpec
from ...env import ResolvedEnv
from ...agent_provider import AgentProvisionPlan
from ...egress import EgressPlan
from ...supervise import SupervisePlan
from ...git_gate import GitGatePlan
from .bottle_plan import SmolmachinesBottlePlan
from .util import smolmachines_bundle_subnet, smolmachines_preflight
def preflight() -> None:
smolmachines_preflight()
def build_guest_env(resolved_env: ResolvedEnv) -> dict[str, str]:
# Agent's env: resolve through resolve_env() so ?prompt entries
# are prompted and ${HOST_VAR} entries are interpolated — matching
# the Docker backend's contract. Forwarded (secret/interpolated)
# values still reach the guest as -e K=V smolvm flags because
# smolvm 0.8.0 has no env-file or stdin injection path; this is
# the known argv-exposure gap documented in PRD 0038.
# HTTPS_PROXY / GIT_GATE_URL / MCP_SUPERVISE_URL are populated
# in launch.py after bundle bringup.
return {
**resolved_env.literals,
**resolved_env.forwarded,
"NO_PROXY": "localhost,127.0.0.1",
"NODE_EXTRA_CA_CERTS": "/etc/ssl/certs/ca-certificates.crt",
"SSL_CERT_FILE": "/etc/ssl/certs/ca-certificates.crt",
"REQUESTS_CA_BUNDLE": "/etc/ssl/certs/ca-certificates.crt",
}
def resolve_plan(
spec: BottleSpec,
slug: str,
resolved_env: ResolvedEnv,
agent_provision_plan: AgentProvisionPlan,
egress_plan: EgressPlan,
supervise_plan: SupervisePlan | None,
git_gate_plan: GitGatePlan,
stage_dir: Path,
) -> SmolmachinesBottlePlan:
"""Materialize the smolmachines plan. The bundle's docker
subnet + pinned IP are derived from the slug; the agent's
`.smolmachine` artifact is built (or cache-hit) here so
launch's `machine create --from` boots without a registry
pull. Per-bottle guest env + the TSI allow_cidrs land on the
plan for launch to pass straight through to
`machine create` flags."""
# ==== smolmachines specific setup ====
subnet, gateway, bundle_ip = smolmachines_bundle_subnet(slug)
return SmolmachinesBottlePlan(
spec=spec,
stage_dir=stage_dir,
slug=slug,
bundle_subnet=subnet,
bundle_gateway=gateway,
bundle_ip=bundle_ip,
guest_env=agent_provision_plan.guest_env,
git_gate_plan=git_gate_plan,
egress_plan=egress_plan,
supervise_plan=supervise_plan,
agent_provision=agent_provision_plan,
)
@@ -19,7 +19,7 @@ This module ships the lifecycle primitives only — create
network, start bundle, stop bundle, remove network — wrapped
around `subprocess.run(["docker", ...])`. Wiring them into the
launch flow + populating the `BundleLaunchSpec` from the inner
Plans (EgressPlan, …) lands in chunk 2d."""
Plans (PipelockProxyPlan, EgressPlan, …) lands in chunk 2d."""
from __future__ import annotations
@@ -69,7 +69,7 @@ class BundleLaunchSpec:
# Daemon subset CSV for BOT_BOTTLE_SIDECAR_DAEMONS. The
# supervisor inside the bundle reads it to skip
# bottle-irrelevant daemons (e.g. supervise=False bottles).
daemons_csv: str = "egress"
daemons_csv: str = "egress,pipelock"
# Plain "KEY=VALUE" strings + "KEY" bare names (the bare-name
# form inherits the value from the docker-run subprocess env,
# matching the docker backend's compose-up secret-forwarding
+1 -3
View File
@@ -21,9 +21,7 @@ def smolmachines_preflight() -> None:
die(
"BOT_BOTTLE_BACKEND=smolmachines requires `smolvm` on "
"PATH. Install with: "
"curl -sSL https://smolmachines.com/install.sh | sh. "
"To use the legacy Docker backend instead, set "
"BOT_BOTTLE_BACKEND=docker or pass --backend=docker."
"curl -sSL https://smolmachines.com/install.sh | sh"
)
-82
View File
@@ -1,82 +0,0 @@
"""Terminal escape-sequence helpers shared across all bottle backends."""
from __future__ import annotations
import shlex
# color name → (normal_idx, normal_hex, bright_idx, bright_hex, dark_bg_hex)
# OSC 4 sets indexed palette entries (affects syntax-highlighted code and any
# TUI content that uses indexed colors). dark_bg_hex is used for OSC 11
# (default background) — a very dark tint that's visible even when the TUI
# uses true/24-bit colors for its own chrome, which would otherwise bypass
# the palette entirely.
_COLORS: dict[str, tuple[int, str, int, str, str]] = {
"black": (0, "#2d2d2d", 8, "#5c5c5c", "#0a0a0a"),
"red": (1, "#c0392b", 9, "#e74c3c", "#1a0707"),
"green": (2, "#27ae60", 10, "#2ecc71", "#071a09"),
"yellow": (3, "#d4ac0d", 11, "#f1c40f", "#1a1507"),
"blue": (4, "#2471a3", 12, "#3498db", "#07071a"),
"magenta": (5, "#7d3c98", 13, "#9b59b6", "#12071a"),
"cyan": (6, "#148f77", 14, "#1abc9c", "#071a1a"),
"white": (7, "#bdc3c7", 15, "#ecf0f1", "#111111"),
"bright-black": (8, "#5c5c5c", 0, "#2d2d2d", "#111111"),
"bright-red": (9, "#e74c3c", 1, "#c0392b", "#200808"),
"bright-green": (10, "#2ecc71", 2, "#27ae60", "#082008"),
"bright-yellow": (11, "#f1c40f", 3, "#d4ac0d", "#201808"),
"bright-blue": (12, "#3498db", 4, "#2471a3", "#080820"),
"bright-magenta": (13, "#9b59b6", 5, "#7d3c98", "#160820"),
"bright-cyan": (14, "#1abc9c", 6, "#148f77", "#082020"),
"bright-white": (15, "#ecf0f1", 7, "#bdc3c7", "#151515"),
}
# OSC 104 resets all indexed palette entries; OSC 111 resets default background.
_RESET_PRINTF = "printf '\\033]104\\007\\033]111\\007'"
def palette_printf(color: str) -> str:
"""Shell `printf` command that emits OSC 4 + OSC 11 to tint the terminal
for *color*: sets the normal/bright palette entries AND the default
background to a dark shade of that color. Returns '' if unknown."""
entry = _COLORS.get(color)
if not entry:
return ""
n_idx, n_hex, b_idx, b_hex, bg_hex = entry
seq = (
f"\\033]4;{n_idx};{n_hex}\\007"
f"\\033]4;{b_idx};{b_hex}\\007"
f"\\033]11;{bg_hex}\\007"
)
return f"printf '{seq}'"
def exec_shell_script(
agent_argv: list[str],
terminal_title: str = "",
terminal_color: str = "",
) -> str | None:
"""Build a shell script string that optionally sets the terminal
title and/or palette before running *agent_argv*, and resets the
palette + background on exit. Returns None when no decoration is
needed callers should run *agent_argv* directly in that case."""
title_cmd = (
f"printf '\\033]0;%s\\007' {shlex.quote(terminal_title)}"
if terminal_title else ""
)
pal_cmd = palette_printf(terminal_color)
if not title_cmd and not pal_cmd:
return None
parts: list[str] = []
if title_cmd:
parts.append(title_cmd)
if pal_cmd:
parts.append(pal_cmd)
parts.append(shlex.join(agent_argv))
parts.append(_RESET_PRINTF)
else:
# No palette change — exec so the agent replaces the shell.
parts.append(f"exec {shlex.join(agent_argv)}")
return "; ".join(parts)
+27 -11
View File
@@ -14,6 +14,7 @@ from ..log import die, info
if TYPE_CHECKING:
from ..egress import EgressPlan
from ..pipelock import PipelockProxyPlan
# Debian-family CA layout, shared by every backend (all guest images
@@ -34,20 +35,35 @@ def host_skill_dir(name: str) -> str:
return f"{home}/.claude/skills/{name}"
def select_ca_cert(egress_plan: EgressPlan) -> tuple[Path, str]:
"""Return the egress MITM CA cert path and label for provision_ca.
def select_ca_cert(
egress_plan: EgressPlan, proxy_plan: PipelockProxyPlan
) -> tuple[Path, str]:
"""Pick the agent-facing CA cert (and a short label for the log
line) that matches the proxy the agent's HTTP_PROXY points at.
Egress wins when the bottle declares any routes (it sits in front
of pipelock); else pipelock.
Launch always mints the CA and re-binds the host path into the
egress_plan before provision runs, so an empty/missing path here
means launch's bringup is broken — fatal."""
cert = egress_plan.mitmproxy_ca_cert_only_host_path
if cert == Path() or not cert.is_file():
Shared by every backend's `provision_ca`: launch mints the chosen
CA(s) and re-binds their host paths into these inner plans before
provision runs, so an empty/missing path here means launch's
bringup is broken fatal."""
if egress_plan.routes:
cert = egress_plan.mitmproxy_ca_cert_only_host_path
if cert == Path() or not cert.is_file():
die(
f"egress CA cert missing at {cert or '(empty)'}; "
f"launch must have called egress_tls_init and "
f"re-bound the plan before provision"
)
return cert, "egress"
cert = proxy_plan.ca_cert_host_path
if not cert or not cert.is_file():
die(
f"egress CA cert missing at {cert or '(empty)'}; "
f"launch must have called egress_tls_init and "
f"re-bound the plan before provision"
f"pipelock CA cert missing at {cert or '(empty)'}; "
f"launch must have called pipelock_tls_init and re-bound "
f"the plan before provision"
)
return cert, "egress"
return cert, "pipelock"
def log_ca_fingerprint(cert_host_path: Path, label: str) -> None:
+1 -4
View File
@@ -1,6 +1,6 @@
"""Main CLI dispatcher.
Commands: cleanup, doctor, edit, info, init, list, resume, start, supervise
Commands: cleanup, edit, info, init, list, resume, start, supervise
"""
from __future__ import annotations
@@ -12,7 +12,6 @@ from ..manifest import ManifestError
from ._common import PROG
from . import list as _list_mod
from .cleanup import cmd_cleanup
from .doctor import cmd_doctor
from .edit import cmd_edit
from .info import cmd_info
from .init import cmd_init
@@ -24,7 +23,6 @@ cmd_list = _list_mod.cmd_list
COMMANDS = {
"cleanup": cmd_cleanup,
"doctor": cmd_doctor,
"edit": cmd_edit,
"info": cmd_info,
"init": cmd_init,
@@ -39,7 +37,6 @@ def usage() -> None:
sys.stderr.write(f"usage: {PROG} <command> [args...]\n\n")
sys.stderr.write("Commands:\n")
sys.stderr.write(" cleanup stop and remove all active bot-bottle containers\n")
sys.stderr.write(" doctor check Python, Docker, and bot-bottle config prerequisites\n")
sys.stderr.write(" edit open an agent in vim for editing\n")
sys.stderr.write(" info print env, skills, and prompt details for a named agent\n")
sys.stderr.write(" init interactively create a new agent and add it to bot-bottle.json\n")
+1 -1
View File
@@ -6,7 +6,7 @@ import os
import sys
from pathlib import Path
PROG = Path(sys.argv[0]).name or "bot-bottle"
PROG = "cli.py"
USER_CWD = os.getcwd()
REPO_DIR = str(Path(__file__).resolve().parent.parent.parent)
-73
View File
@@ -1,73 +0,0 @@
"""doctor: validate host prerequisites for running bot-bottle."""
from __future__ import annotations
import argparse
import shutil
import subprocess
import sys
from pathlib import Path
from ._common import PROG
def _ok(label: str, detail: str) -> None:
print(f"ok: {label}: {detail}")
def _fail(label: str, detail: str) -> None:
print(f"fail: {label}: {detail}")
def _check_python() -> bool:
version = sys.version_info
detail = f"{version.major}.{version.minor}.{version.micro}"
if version >= (3, 11):
_ok("python", detail)
return True
_fail("python", f"{detail}; need 3.11 or newer")
return False
def _check_docker() -> bool:
docker = shutil.which("docker")
if not docker:
_fail("docker", "docker command not found")
return False
try:
result = subprocess.run(
[docker, "info"],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
check=False,
timeout=10,
)
except (OSError, subprocess.TimeoutExpired) as exc:
_fail("docker", f"daemon check failed: {exc}")
return False
if result.returncode == 0:
_ok("docker", "daemon reachable")
return True
_fail("docker", "daemon not reachable")
return False
def _check_config_dir() -> bool:
config = Path.home() / ".bot-bottle"
if config.is_dir():
_ok("config", str(config))
return True
_fail("config", f"{config} does not exist")
return False
def cmd_doctor(argv: list[str]) -> int:
parser = argparse.ArgumentParser(prog=f"{PROG} doctor", add_help=True)
parser.parse_args(argv)
checks = (
_check_python(),
_check_docker(),
_check_config_dir(),
)
return 0 if all(checks) else 1
+5 -40
View File
@@ -3,47 +3,12 @@
from __future__ import annotations
import argparse
import os
import sys
from ..backend import enumerate_active_agents
from ..manifest import Manifest
from ._common import PROG, USER_CWD
_ANSI_COLOR_CODES: dict[str, str] = {
"black": "\033[30m",
"red": "\033[31m",
"green": "\033[32m",
"yellow": "\033[33m",
"blue": "\033[34m",
"magenta": "\033[35m",
"cyan": "\033[36m",
"white": "\033[37m",
"bright-black": "\033[90m",
"bright-red": "\033[91m",
"bright-green": "\033[92m",
"bright-yellow": "\033[93m",
"bright-blue": "\033[94m",
"bright-magenta": "\033[95m",
"bright-cyan": "\033[96m",
"bright-white": "\033[97m",
}
_ANSI_RESET = "\033[0m"
def _ansi_label(text: str, color: str) -> str:
if not color:
return text
if not sys.stdout.isatty():
return text
term = os.environ.get("TERM", "")
if term in ("dumb", ""):
return text
code = _ANSI_COLOR_CODES.get(color)
if not code:
return text
return f"{code}{text}{_ANSI_RESET}"
def cmd_list(argv: list[str]) -> int:
parser = argparse.ArgumentParser(prog=f"{PROG} list", add_help=True)
@@ -62,11 +27,11 @@ def cmd_list(argv: list[str]) -> int:
if not active:
print("no active bot-bottle bottles", file=sys.stderr)
return 0
# One line per bottle: `<backend>\t<slug>\t<label>\t<services>`.
# Tab-separated keeps the format stable for shell pipelines.
# One line per bottle: `<backend>\t<slug>\t<agent>\t<status>`.
# Tab-separated keeps the format stable for shell pipelines;
# the dashboard renders the same data through its own
# formatter.
for b in active:
services = ",".join(b.services) if b.services else "-"
display_name = b.label if b.label else b.agent_name
colored_name = _ansi_label(display_name, b.color)
print(f"{b.backend_name}\t{b.slug}\t{colored_name}\t{services}")
print(f"{b.backend_name}\t{b.slug}\t{b.agent_name}\t{services}")
return 0
+1 -1
View File
@@ -18,7 +18,7 @@ from __future__ import annotations
import argparse
from ..backend import BottleSpec
from ..bottle_state import read_metadata
from ..backend.docker.bottle_state import read_metadata
from ..log import die
from ..manifest import Manifest
from ._common import PROG, USER_CWD
+14 -14
View File
@@ -24,12 +24,12 @@ from ..backend import (
known_backend_names,
)
from ..backend.docker.bottle_plan import DockerBottlePlan
from ..bottle_state import (
from ..backend.docker.bottle_state import (
cleanup_state,
is_preserved,
mark_preserved,
)
# from ..backend.docker.capability_apply import snapshot_transcript
from ..backend.docker.capability_apply import snapshot_transcript
from ..log import info
from ..manifest import Manifest
from ._common import PROG, USER_CWD, read_tty_line
@@ -39,7 +39,7 @@ from . import tui
def cmd_start(argv: list[str]) -> int:
parser = argparse.ArgumentParser(prog=f"{PROG} start", add_help=True)
parser.add_argument("--dry-run", action="store_true")
parser.add_argument("--cwd", action="store_true", help="copy host cwd into the running bottle")
parser.add_argument("--cwd", action="store_true", help="copy host cwd into a derived image")
parser.add_argument("--remote-control", action="store_true")
parser.add_argument(
"--backend",
@@ -47,7 +47,7 @@ def cmd_start(argv: list[str]) -> int:
default=None,
help=(
"backend to launch the bottle on (default: $BOT_BOTTLE_BACKEND "
"or host auto-selection). Overrides the env var when set."
"or 'docker'). Overrides the env var when set."
),
)
parser.add_argument(
@@ -72,16 +72,19 @@ def cmd_start(argv: list[str]) -> int:
return 0
backend_name: str | None = args.backend
label, color = tui.name_color_modal(default_label=agent_name)
if backend_name is None and "BOT_BOTTLE_BACKEND" not in os.environ:
backend_name = tui.filter_select(
list(known_backend_names()),
title="Select backend",
)
if backend_name is None:
return 0
spec = BottleSpec(
manifest=manifest,
agent_name=agent_name,
copy_cwd=args.cwd,
user_cwd=USER_CWD,
label=label,
color=color,
)
return _launch_bottle(
spec,
@@ -107,8 +110,8 @@ def prepare_with_preflight(
injected callable, prompt y/N via the injected callable.
`backend_name` selects which backend prepares the plan
(`None` `$BOT_BOTTLE_BACKEND` host auto-selection). The CLI
passes whatever `--backend` resolved to.
(`None` `$BOT_BOTTLE_BACKEND` `docker`). The CLI passes
whatever `--backend` resolved to.
Returns `(plan, identity)`. `plan` is None on dry-run or
operator-N, but `identity` is set as soon as `backend.prepare`
@@ -133,7 +136,6 @@ def prepare_with_preflight(
def attach_agent(
bottle: Bottle, *, remote_control: bool = False, resume: bool = False,
agent_provider_template: str = "claude",
startup_args: tuple[str, ...] = (),
) -> int:
"""Run the selected provider CLI inside `bottle` as an
interactive session. Blocks until the session ends; returns the
@@ -152,7 +154,6 @@ def attach_agent(
agent_args = list(runtime.bypass_args)
if remote_control:
agent_args.extend(runtime.remote_control_args)
agent_args.extend(startup_args)
if resume:
agent_args.extend(runtime.resume_args)
return bottle.exec_agent(agent_args, tty=True)
@@ -167,7 +168,7 @@ def capture_claude_session_state(identity: str, exit_code: int) -> None:
# instead of relying on each agent's transcript layout.
if not identity:
return
# snapshot_transcript(identity)
snapshot_transcript(identity)
if exit_code != 0:
mark_preserved(identity)
@@ -237,7 +238,6 @@ def _launch_bottle(
bottle,
remote_control=remote_control,
agent_provider_template=agent_provider_template,
startup_args=plan.agent_provision.startup_args,
)
info(
f"session ended (exit {exit_code}); "
+86 -28
View File
@@ -2,8 +2,11 @@
act on them (approve / modify / reject).
Curses-based TUI; modify-then-approve shells out to $EDITOR. The
approval handler wires to PRD 0016 (capability-block), which rebuilds
the bottle Dockerfile. The egress-block tool was removed in issue #198.
approval handlers wire to the per-tool remediation engines:
PRD 0014 (egress, retargeted from cred-proxy in PRD 0017
chunk 3) writes routes.yaml + SIGHUPs egress; PRD 0015
(pipelock) writes the allowlist + restarts pipelock; PRD 0016
(capability) rebuilds the bottle Dockerfile.
"""
from __future__ import annotations
@@ -20,17 +23,20 @@ from datetime import datetime, timezone
from pathlib import Path
from .. import supervise as _supervise
# from ..bottle_state import read_metadata
# from ..backend.docker.capability_apply import (
# CapabilityApplyError,
# apply_capability_change,
# )
from ..backend.docker.bottle_state import read_metadata
from ..backend.docker.capability_apply import (
CapabilityApplyError,
apply_capability_change,
)
from ..backend.docker.egress_apply import EgressApplyError, add_route
from ..backend.docker.pipelock_apply import (
PipelockApplyError,
apply_allowlist_change,
fetch_current_allowlist,
parse_allowlist_content,
render_allowlist_content,
)
from ..log import Die, error, info
class CapabilityApplyError(RuntimeError):
"""Placeholder while capability_apply is disabled."""
from ..supervise import (
COMPONENT_FOR_TOOL,
AuditEntry,
@@ -40,6 +46,8 @@ from ..supervise import (
STATUS_MODIFIED,
STATUS_REJECTED,
TOOL_CAPABILITY_BLOCK,
TOOL_EGRESS_BLOCK,
TOOL_PIPELOCK_BLOCK,
archive_proposal,
list_pending_proposals,
render_diff,
@@ -63,7 +71,7 @@ class QueuedProposal:
# Errors any remediation engine may raise. Caught by the TUI key
# handlers and surfaced in the status line so a failed apply keeps
# the proposal pending rather than crashing curses.
ApplyError = (CapabilityApplyError,)
ApplyError = (EgressApplyError, PipelockApplyError, CapabilityApplyError)
def discover_pending() -> list[QueuedProposal]:
@@ -84,7 +92,9 @@ def discover_pending() -> list[QueuedProposal]:
def _approval_status(qp: QueuedProposal, verb: str) -> str:
"""Status-line text after a successful approval."""
base = f"{verb} {qp.proposal.tool} for [{qp.proposal.bottle_slug}]"
return f"{base}; resume: ./cli.py resume {qp.proposal.bottle_slug}"
if qp.proposal.tool == TOOL_CAPABILITY_BLOCK:
return f"{base}; resume: ./cli.py resume {qp.proposal.bottle_slug}"
return base
def _detail_lines(
@@ -106,12 +116,33 @@ def _detail_lines(
out.extend((" " + line, 0) for line in p.justification.splitlines() or [""])
out.extend([
("", 0),
("proposed file:", 0),
(_proposed_payload_label(p.tool) + ":", 0),
])
out.extend((line, 0) for line in p.proposed_file.splitlines() or [""])
if p.tool == TOOL_PIPELOCK_BLOCK:
host = _failed_url_host(p.proposed_file)
if host:
out.append(("", 0))
out.append((host, green_attr))
return out
def _failed_url_host(url: str) -> str:
"""Best-effort hostname extraction from a pipelock-block proposal."""
import urllib.parse
try:
return urllib.parse.urlsplit(url.strip()).hostname or ""
except ValueError:
return ""
def _proposed_payload_label(tool: str) -> str:
if tool == TOOL_PIPELOCK_BLOCK:
return "failed URL"
return "proposed file"
def _suffix_for_tool(tool: str) -> str:
if tool == TOOL_CAPABILITY_BLOCK:
return ".dockerfile"
@@ -129,19 +160,28 @@ def approve(
) -> None:
"""Apply the proposal, write the waiting response, and audit it."""
status = STATUS_MODIFIED if final_file is not None else STATUS_APPROVED
file_to_apply = final_file if final_file is not None else qp.proposal.proposed_file
diff_before, diff_after = "", ""
# if qp.proposal.tool == TOOL_CAPABILITY_BLOCK:
# _meta = read_metadata(qp.proposal.bottle_slug)
# if _meta is not None and not _meta.compose_project:
# raise CapabilityApplyError(
# "capability-block remediation is not supported for smolmachines "
# "bottles. Reject this proposal or handle the capability change "
# "manually, then restart the bottle."
# )
# diff_before, diff_after = apply_capability_change(
# qp.proposal.bottle_slug, file_to_apply,
# )
if qp.proposal.tool == TOOL_EGRESS_BLOCK:
diff_before, diff_after = add_route(
qp.proposal.bottle_slug, file_to_apply,
)
elif qp.proposal.tool == TOOL_PIPELOCK_BLOCK:
diff_before, diff_after = _apply_pipelock_url(
qp.proposal.bottle_slug, file_to_apply,
)
elif qp.proposal.tool == TOOL_CAPABILITY_BLOCK:
_meta = read_metadata(qp.proposal.bottle_slug)
if _meta is not None and not _meta.compose_project:
raise CapabilityApplyError(
"capability-block remediation is not supported for smolmachines "
"bottles. Reject this proposal or handle the capability change "
"manually, then restart the bottle."
)
diff_before, diff_after = apply_capability_change(
qp.proposal.bottle_slug, file_to_apply,
)
response = Response(
proposal_id=qp.proposal.id,
@@ -170,6 +210,23 @@ def reject(qp: QueuedProposal, *, reason: str) -> None:
_write_audit(qp, action=STATUS_REJECTED, notes=reason, diff_before="", diff_after="")
def _apply_pipelock_url(slug: str, failed_url: str) -> tuple[str, str]:
"""Merge a pipelock-block failed URL's host into the allowlist."""
import urllib.parse
parsed = urllib.parse.urlsplit(failed_url.strip())
host = parsed.hostname or ""
if not host:
raise PipelockApplyError(
f"proposed failed_url has no extractable host: {failed_url!r}"
)
current = fetch_current_allowlist(slug)
hosts = parse_allowlist_content(current)
if host not in hosts:
hosts.append(host)
return apply_allowlist_change(slug, render_allowlist_content(hosts))
def _write_audit(
qp: QueuedProposal,
*,
@@ -178,7 +235,7 @@ def _write_audit(
diff_before: str,
diff_after: str,
) -> None:
"""Audit log for egress tool."""
"""Audit log for egress / pipelock tools."""
component = COMPONENT_FOR_TOOL.get(qp.proposal.tool)
if component is None:
return
@@ -410,7 +467,8 @@ def _render(
cursor = "> " if i == selected else " "
line = (
f"{cursor}{ts_short} "
f"[{p.bottle_slug}] {p.tool:<18} {p.id[:8]}"
f"[{p.bottle_slug}] {p.tool:<18} {p.id[:8]} "
f"{_proposed_payload_label(p.tool)}"
)
attr = curses.A_REVERSE if i == selected else curses.A_NORMAL
stdscr.addnstr(row, 0, line, w - 1, attr)
+2 -218
View File
@@ -3,7 +3,6 @@
Exposed surface:
filter_select(items, *, title="", tty_path="/dev/tty") -> str | None
name_color_modal(default_label, *, tty_path="/dev/tty") -> (str, str)
Opens /dev/tty directly so the picker works even when stdout/stdin are
redirected. Returns the selected item or None on cancel.
@@ -43,7 +42,8 @@ def filter_select(
# Use os.dup() to duplicate the fd so the original file object
# and FileIO in _run_picker each manage independent copies,
# preventing double-close errors.
fd_dup = os.dup(tty_fd.fileno())
import os as _os
fd_dup = _os.dup(tty_fd.fileno())
return _run_picker(items, title=title, tty_fd=fd_dup)
finally:
tty_fd.close()
@@ -219,219 +219,3 @@ def _addstr_safe(screen: Any, row: int, col: int, text: str, attr: int = curses.
screen.addstr(row, col, text, attr)
except curses.error:
pass
# ---------------------------------------------------------------------------
# name_color_modal — two-step label + color picker
# ---------------------------------------------------------------------------
_ANSI_COLORS = [
"red", "green", "blue", "yellow", "magenta", "cyan", "white", "black",
"bright-red", "bright-green", "bright-blue", "bright-yellow",
"bright-magenta", "bright-cyan", "bright-white", "bright-black",
]
_CURSES_COLOR_MAP: dict[str, int] = {
"black": curses.COLOR_BLACK,
"red": curses.COLOR_RED,
"green": curses.COLOR_GREEN,
"yellow": curses.COLOR_YELLOW,
"blue": curses.COLOR_BLUE,
"magenta": curses.COLOR_MAGENTA,
"cyan": curses.COLOR_CYAN,
"white": curses.COLOR_WHITE,
}
_COLOR_NONE = "(none)"
def name_color_modal(
default_label: str,
*,
tty_path: str = "/dev/tty",
) -> tuple[str, str]:
"""Present a two-step curses modal: first edit the agent label,
then optionally pick a color.
Returns ``(label, color)`` where ``color`` is one of the 16 ANSI
color name strings or ``""`` for no color. Falls back to
``(default_label, "")`` on any error (terminal too small, not a tty).
"""
try:
tty_fd = open(tty_path, "r+b", buffering=0) # pylint: disable=consider-using-with
except OSError:
return default_label, ""
try:
fd_dup = os.dup(tty_fd.fileno())
return _run_name_color(default_label, tty_fd=fd_dup)
except Exception: # noqa: BLE001 # pylint: disable=broad-exception-caught
return default_label, ""
finally:
tty_fd.close()
def _run_name_color(default_label: str, *, tty_fd: int) -> tuple[str, str]:
import io
orig_stdin = sys.__stdin__
orig_stdout = sys.__stdout__
try:
tty_text = io.TextIOWrapper(io.FileIO(tty_fd, mode="r+"), write_through=True)
sys.__stdin__ = tty_text # type: ignore[assignment]
sys.__stdout__ = tty_text # type: ignore[assignment]
os.environ.setdefault("TERM", "xterm-256color")
screen = curses.initscr()
curses.noecho()
curses.cbreak()
screen.keypad(True)
try:
label = _label_step(screen, default_label)
color = _color_step(screen, label)
finally:
screen.keypad(False)
curses.nocbreak()
curses.echo()
curses.endwin()
finally:
sys.__stdin__ = orig_stdin # type: ignore[assignment]
sys.__stdout__ = orig_stdout # type: ignore[assignment]
return label, color
def _label_step(screen: Any, default_label: str) -> str:
"""Step 1: edit the label. First printable key replaces the
pre-fill; subsequent keys append. Enter confirms."""
text = default_label
replaced = False # True once the user has typed their first char
while True:
_render_label(screen, text)
try:
key = screen.getch()
except KeyboardInterrupt:
return default_label
if key in (curses.KEY_ENTER, _KEY_ENTER_ALT, ord("\r")):
return text.strip() or default_label
if key in (curses.KEY_BACKSPACE, _KEY_BACKSPACE_WIN, 127):
if replaced:
text = text[:-1]
else:
text = ""
replaced = True
elif 32 <= key <= 126:
if not replaced:
text = chr(key)
replaced = True
else:
text += chr(key)
def _render_label(screen: Any, text: str) -> None:
screen.erase()
rows, cols = screen.getmaxyx()
sep = "" * min(cols - 1, 40)
_addstr_safe(screen, 0, 0, "Name agent", curses.A_BOLD)
_addstr_safe(screen, 1, 0, sep)
_addstr_safe(screen, 2, 0, text[:cols - 1], curses.A_REVERSE)
_addstr_safe(screen, 3, 0, sep)
if rows > 5:
_addstr_safe(screen, 5, 0, "[any key] edit [Enter] confirm", curses.A_DIM)
screen.refresh()
def _color_step(screen: Any, confirmed_label: str) -> str:
"""Step 2: pick a color from the list, or skip."""
items = [_COLOR_NONE] + _ANSI_COLORS
cursor = 0
# Initialise color pairs once; index 0 = none, 1..16 = palette.
color_attrs = _init_color_pairs()
while True:
_render_color(screen, items, cursor, confirmed_label, color_attrs)
try:
key = screen.getch()
except KeyboardInterrupt:
return ""
if key in (ord("q"), _KEY_ESC):
return ""
if key in (curses.KEY_ENTER, _KEY_ENTER_ALT, ord("\r")):
chosen = items[cursor]
return "" if chosen == _COLOR_NONE else chosen
if key in (curses.KEY_UP, ord("k")) and cursor > 0:
cursor -= 1
elif key in (curses.KEY_DOWN, ord("j")) and cursor < len(items) - 1:
cursor += 1
def _init_color_pairs() -> dict[str, int]:
"""Return {color_name: curses_attr} for the palette items."""
attrs: dict[str, int] = {_COLOR_NONE: curses.A_NORMAL}
try:
curses.start_color()
curses.use_default_colors()
pair_idx = 2 # pair 1 reserved for other uses
for name in _ANSI_COLORS:
base = name.replace("bright-", "")
fg = _CURSES_COLOR_MAP.get(base, curses.COLOR_WHITE)
try:
curses.init_pair(pair_idx, fg, -1)
attr = curses.color_pair(pair_idx)
if name.startswith("bright-"):
attr |= curses.A_BOLD
attrs[name] = attr
pair_idx += 1
except curses.error:
attrs[name] = curses.A_NORMAL
except curses.error:
for name in _ANSI_COLORS:
attrs[name] = curses.A_NORMAL
return attrs
def _render_color(
screen: Any,
items: list[str],
cursor: int,
confirmed_label: str,
color_attrs: dict[str, int],
) -> None:
screen.erase()
rows, cols = screen.getmaxyx()
sep = "" * min(cols - 1, 40)
_addstr_safe(screen, 0, 0, "Name agent", curses.A_BOLD)
_addstr_safe(screen, 1, 0, sep)
_addstr_safe(screen, 2, 0, confirmed_label[:cols - 1])
_addstr_safe(screen, 3, 0, sep)
_addstr_safe(screen, 4, 0, "Color (optional)", curses.A_BOLD)
list_start = 5
list_rows = rows - list_start - 2
scroll = max(0, cursor - list_rows + 1)
visible = items[scroll: scroll + list_rows]
for idx, name in enumerate(visible):
abs_idx = scroll + idx
row = list_start + idx
if row >= rows - 2:
break
prefix = "> " if abs_idx == cursor else " "
attr = color_attrs.get(name, curses.A_NORMAL)
if abs_idx == cursor:
attr |= curses.A_REVERSE
_addstr_safe(screen, row, 0, (prefix + name)[:cols - 1], attr)
_addstr_safe(screen, rows - 2, 0, sep)
_addstr_safe(
screen, rows - 1, 0,
"[↑↓/jk] move [Enter] select [Esc/q] skip",
curses.A_DIM,
)
screen.refresh()
+12 -125
View File
@@ -17,11 +17,9 @@ from typing import TYPE_CHECKING
from ...agent_provider import (
AgentProvider,
AgentProviderRuntime,
AgentProvisionDir,
AgentProvisionFile,
AgentProvisionPlan,
)
from ...backend.docker import util as docker_mod
from ...egress import EgressRoute
from ...log import die, info, warn
@@ -30,6 +28,8 @@ if TYPE_CHECKING:
from ...backend import Bottle, BottlePlan
_REPO_ROOT = Path(__file__).resolve().parents[3]
_SUPERVISE_MCP_NAME = "supervise"
@@ -40,75 +40,11 @@ def _skills_dir(guest_home: str) -> str:
def _prompt_path(guest_home: str) -> str:
return f"{guest_home}/.bot-bottle-prompt.txt"
_STATUS_LINE_COLORS = {
"black": "\033[30m",
"red": "\033[31m",
"green": "\033[32m",
"yellow": "\033[33m",
"blue": "\033[34m",
"magenta": "\033[35m",
"cyan": "\033[36m",
"white": "\033[37m",
"bright-black": "\033[90m",
"bright-red": "\033[91m",
"bright-green": "\033[92m",
"bright-yellow": "\033[93m",
"bright-blue": "\033[94m",
"bright-magenta": "\033[95m",
"bright-cyan": "\033[96m",
"bright-white": "\033[97m",
}
_CLAUDE_THEME_COLORS = {
"black": "black",
"red": "red",
"green": "green",
"yellow": "yellow",
"blue": "blue",
"magenta": "magenta",
"cyan": "cyan",
"white": "white",
"bright-black": "blackBright",
"bright-red": "redBright",
"bright-green": "greenBright",
"bright-yellow": "yellowBright",
"bright-blue": "blueBright",
"bright-magenta": "magentaBright",
"bright-cyan": "cyanBright",
"bright-white": "whiteBright",
}
def _status_line_script(label: str, color: str) -> str:
if not label:
return "#!/bin/sh\nprintf '\\n'\n"
label_q = shlex.quote(label)
if color and color in _STATUS_LINE_COLORS:
return (
"#!/bin/sh\n"
f"printf '%b%s%b\\n' '{_STATUS_LINE_COLORS[color]}' {label_q} '\\033[0m'\n"
)
return f"#!/bin/sh\nprintf '%s\\n' {label_q}\n"
def _custom_theme_payload(color: str) -> dict[str, object] | None:
theme_color = _CLAUDE_THEME_COLORS.get(color)
if not theme_color:
return None
return {
"name": f"Bot-bottle {color}",
"base": "dark",
"overrides": {
"claude": f"ansi:{theme_color}",
},
}
_RUNTIME = AgentProviderRuntime(
template="claude",
command="claude",
image="bot-bottle-claude:latest",
dockerfile=str(_REPO_ROOT / "Dockerfile.claude"),
prompt_mode="append_file",
bypass_args=("--dangerously-skip-permissions",),
resume_args=("--continue",),
@@ -126,103 +62,54 @@ class ClaudeAgentProvider(AgentProvider):
*,
dockerfile: str,
state_dir: Path,
instance_name: str,
prompt_file: Path,
guest_home: str,
guest_env: dict[str, str] | None = None,
auth_token: str = "",
forward_host_credentials: bool = False,
host_env: dict[str, str] | None = None,
trusted_project_path: str = "",
label: str = "",
color: str = "",
provider_settings: dict[str, object] | None = None,
) -> AgentProvisionPlan:
del forward_host_credentials, host_env, provider_settings
del forward_host_credentials, host_env # Codex-only knobs
resolved_guest_env = dict(guest_env or {})
guest_home = self.guest_home
trusted_path = trusted_project_path or guest_home
env_vars: dict[str, str] = {
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
"DISABLE_ERROR_REPORTING": "1",
}
dirs = (
AgentProvisionDir(f"{guest_home}/.claude"),
AgentProvisionDir(f"{guest_home}/.claude/themes"),
)
claude_config = state_dir / "claude.json"
claude_projects = {guest_home: {"hasTrustDialogAccepted": True}}
claude_projects[trusted_path] = {"hasTrustDialogAccepted": True}
payload: dict[str, object] = {
claude_config.write_text(json.dumps({
"hasCompletedOnboarding": True,
"theme": "dark",
"bypassPermissionsModeAccepted": True,
"projects": claude_projects,
}
claude_config.write_text(json.dumps(payload, indent=2) + "\n")
}, indent=2) + "\n")
claude_config.chmod(0o600)
files = [
files = (
AgentProvisionFile(claude_config, f"{guest_home}/.claude.json"),
]
claude_settings = state_dir / "claude-settings.json"
claude_settings_payload: dict[str, object] = {}
if label or color:
statusline_script = state_dir / "claude-statusline.sh"
statusline_script.write_text(_status_line_script(label, color))
statusline_script.chmod(0o755)
files.append(AgentProvisionFile(
statusline_script,
f"{guest_home}/.claude/statusline.sh",
mode="755",
))
claude_settings_payload["statusLine"] = {
"type": "command",
"command": "~/.claude/statusline.sh",
}
theme_payload = _custom_theme_payload(color)
if theme_payload is not None:
theme_name = f"bot-bottle-{docker_mod.slugify(label or color)}"
theme_file = state_dir / f"{theme_name}.json"
theme_file.write_text(json.dumps(theme_payload, indent=2) + "\n")
theme_file.chmod(0o644)
files.append(AgentProvisionFile(
theme_file,
f"{guest_home}/.claude/themes/{theme_name}.json",
))
claude_settings_payload["theme"] = f"custom:{theme_name}"
if claude_settings_payload:
claude_settings.write_text(json.dumps(claude_settings_payload, indent=2) + "\n")
claude_settings.chmod(0o600)
files.append(AgentProvisionFile(
claude_settings,
f"{guest_home}/.claude/settings.json",
))
)
egress_routes = (EgressRoute(
host="api.anthropic.com",
auth_scheme="Bearer" if auth_token else "",
token_ref=auth_token,
tls_passthrough=True,
),)
hidden_env_names: frozenset[str] = frozenset()
if auth_token:
env_vars["CLAUDE_CODE_OAUTH_TOKEN"] = "egress-placeholder"
hidden_env_names = frozenset({"CLAUDE_CODE_OAUTH_TOKEN"})
has_prompt = prompt_file.exists() and bool(prompt_file.read_text())
return AgentProvisionPlan(
template=_RUNTIME.template,
command=_RUNTIME.command,
prompt_mode=_RUNTIME.prompt_mode,
image=_RUNTIME.image,
dockerfile=dockerfile,
guest_home=guest_home,
instance_name=instance_name,
prompt_file=prompt_file,
env_vars=env_vars,
guest_env=resolved_guest_env,
has_prompt=has_prompt,
dirs=dirs,
files=tuple(files),
files=files,
egress_routes=egress_routes,
hidden_env_names=hidden_env_names,
)
@@ -263,7 +150,7 @@ class ClaudeAgentProvider(AgentProvider):
user="root",
)
agent = plan.spec.manifest.agents[plan.spec.agent_name]
return prompt_path if plan.agent_provision.has_prompt or agent.prompt else None
return prompt_path if agent.prompt else None
def provision(self, plan: "BottlePlan", bottle: "Bottle") -> None:
"""Apply the claude-side declarative provision steps from
+8 -20
View File
@@ -18,8 +18,8 @@ from ...agent_provider import (
CODEX_HOST_CREDENTIAL_HOSTS,
AgentProvider,
AgentProviderRuntime,
AgentProvisionDir,
AgentProvisionCommand,
AgentProvisionDir,
AgentProvisionFile,
AgentProvisionPlan,
)
@@ -32,6 +32,8 @@ if TYPE_CHECKING:
from ...backend import Bottle, BottlePlan
_REPO_ROOT = Path(__file__).resolve().parents[3]
_SUPERVISE_MCP_NAME = "supervise"
@@ -46,11 +48,11 @@ def _skills_dir(guest_home: str) -> str:
def _prompt_path(guest_home: str) -> str:
return f"{guest_home}/.bot-bottle-prompt.txt"
_RUNTIME = AgentProviderRuntime(
template="codex",
command="codex",
image="bot-bottle-codex:latest",
dockerfile=str(_REPO_ROOT / "Dockerfile.codex"),
prompt_mode="read_prompt_file",
bypass_args=("--dangerously-bypass-approvals-and-sandbox",),
resume_args=("resume", "--last"),
@@ -68,20 +70,15 @@ class CodexAgentProvider(AgentProvider):
*,
dockerfile: str,
state_dir: Path,
instance_name: str,
prompt_file: Path,
guest_home: str,
guest_env: dict[str, str] | None = None,
auth_token: str = "",
forward_host_credentials: bool = False,
host_env: dict[str, str] | None = None,
trusted_project_path: str = "",
label: str = "",
color: str = "",
provider_settings: dict[str, object] | None = None,
) -> AgentProvisionPlan:
del auth_token, label, color, provider_settings
del auth_token # Claude-only knob
resolved_guest_env = dict(guest_env or {})
guest_home = self.guest_home
trusted_path = trusted_project_path or guest_home
env_vars: dict[str, str] = {
@@ -103,11 +100,6 @@ class CodexAgentProvider(AgentProvider):
config_file.write_text(
f'[projects."{toml_path}"]\n'
'trust_level = "trusted"\n'
"\n"
"[tui]\n"
'status_line = ["model-with-reasoning"]\n'
'terminal_title = ["spinner", "project"]\n'
'theme = "ansi"\n'
)
config_file.chmod(0o600)
files.append(AgentProvisionFile(config_file, config_path))
@@ -118,6 +110,7 @@ class CodexAgentProvider(AgentProvider):
host=host,
auth_scheme="Bearer" if forward_host_credentials else "",
token_ref=CODEX_HOST_CREDENTIAL_TOKEN_REF if forward_host_credentials else "",
tls_passthrough=True,
))
if forward_host_credentials:
@@ -150,19 +143,14 @@ class CodexAgentProvider(AgentProvider):
"guest, but Codex did not accept it"
)))
has_prompt = prompt_file.exists() and bool(prompt_file.read_text())
return AgentProvisionPlan(
template=_RUNTIME.template,
command=_RUNTIME.command,
prompt_mode=_RUNTIME.prompt_mode,
image=_RUNTIME.image,
dockerfile=dockerfile,
guest_home=guest_home,
instance_name=instance_name,
prompt_file=prompt_file,
env_vars=env_vars,
guest_env=resolved_guest_env,
has_prompt=has_prompt,
dirs=tuple(dirs),
files=tuple(files),
pre_copy=tuple(pre_copy),
@@ -207,7 +195,7 @@ class CodexAgentProvider(AgentProvider):
user="root",
)
agent = plan.spec.manifest.agents[plan.spec.agent_name]
return prompt_path if plan.agent_provision.has_prompt or agent.prompt else None
return prompt_path if agent.prompt else None
def provision(self, plan: "BottlePlan", bottle: "Bottle") -> None:
"""Apply the codex-side declarative provision steps from
+5 -3
View File
@@ -15,8 +15,8 @@ from datetime import datetime, timezone
from pathlib import Path
from typing import cast
from ...log import die
from ...util import expand_tilde
from bot_bottle.log import die
from bot_bottle.util import expand_tilde
def codex_auth_path(host_env: dict[str, str] | None = None) -> Path:
@@ -153,7 +153,9 @@ def _dummy_jwt_from_host(
return _dummy_jwt(now, exp_ts=exp_ts)
if not isinstance(payload, dict):
return _dummy_jwt(now, exp_ts=exp_ts)
return _encode_dummy_jwt(_redact_jwt_payload(cast(dict[str, object], payload), now=now, exp_ts=exp_ts))
return _encode_dummy_jwt(
_redact_jwt_payload(cast(dict[str, object], payload), now=now, exp_ts=exp_ts)
)
def _encode_dummy_jwt(payload: dict[str, object]) -> str:
@@ -2,13 +2,7 @@
Generates ed25519 keypairs via `ssh-keygen` and registers / deletes
them using the Gitea deploy-key HTTP API. No new Python dependencies
only stdlib `urllib.request` and `subprocess`.
Required token permissions (Gitea "Applications" "Generate Token"):
- Repository: Read & Write
Grants POST /api/v1/repos/{owner}/{repo}/keys (create deploy key)
and DELETE /api/v1/repos/{owner}/{repo}/keys/{id} (revoke deploy key).
No other scopes are needed."""
only stdlib `urllib.request` and `subprocess`."""
from __future__ import annotations
-41
View File
@@ -1,41 +0,0 @@
# bot-bottle Pi provider image.
#
# Node LTS, git/network tooling, and the Pi coding-agent CLI installed globally.
FROM node:22-slim
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
git \
ca-certificates \
curl \
fd-find \
ripgrep \
&& ln -s /usr/bin/fdfind /usr/local/bin/fd \
&& rm -rf /var/lib/apt/lists/*
RUN apt-get update \
&& apt-get install -y --no-install-recommends python3 python3-pip python3-venv \
&& rm -rf /var/lib/apt/lists/*
RUN npm install -g --ignore-scripts --no-fund --no-audit @earendil-works/pi-coding-agent \
&& npm cache clean --force
RUN mkdir -p /home/node/.pi/agent \
/home/node/.pi/context-mode/sessions \
/tmp/pi-subagents-uid-1000 \
&& chown -R node:node /home/node/.pi /tmp \
&& chmod -R u+rwX /tmp \
&& chown root:root /tmp /var/tmp \
&& chmod 1777 /tmp /var/tmp
USER node
WORKDIR /home/node
RUN pi install npm:@harms-haus/pi-cwd \
&& pi install npm:pi-web-access \
&& pi install npm:context-mode \
&& pi install npm:pi-subagents \
&& pi install npm:pi-mcp-adapter
CMD ["pi"]
-1
View File
@@ -1 +0,0 @@
"""Pi agent provider package."""
-319
View File
@@ -1,319 +0,0 @@
"""Pi agent provider plugin (PRD 0058, contrib).
Pi uses ~/.pi/agent/models.json for custom provider/model settings.
This provider writes an Ollama-compatible default configuration and
lets bottles override the model endpoint and model ids via
agent_provider.settings.
"""
from __future__ import annotations
import json
import os
import shlex
from pathlib import Path
from typing import TYPE_CHECKING
from urllib.parse import urlparse
from ...agent_provider import (
AgentProvider,
AgentProviderRuntime,
AgentProvisionDir,
AgentProvisionFile,
AgentProvisionPlan,
)
from ...egress import EgressRoute
from ...log import die, info
if TYPE_CHECKING:
from ...backend import Bottle, BottlePlan
_DEFAULT_BASE_URL = "http://ollama:11434/v1"
_DEFAULT_MODEL = "qwen2.5-coder:7b"
_DEFAULT_PROVIDER_NAME = "ollama"
_DEFAULT_CONTEXT_WINDOW = 4096
_DEFAULT_MAX_TOKENS = 1024
def _skills_dir(guest_home: str) -> str:
return f"{guest_home}/.pi/agent/skills"
def _prompt_path(guest_home: str) -> str:
return f"{guest_home}/.bot-bottle-prompt.txt"
def _append_system_path(guest_home: str) -> str:
return f"{guest_home}/.pi/agent/APPEND_SYSTEM.md"
def _models_path(guest_home: str) -> str:
return f"{guest_home}/.pi/agent/models.json"
def _runtime_state_repair_script(guest_home: str) -> str:
home = shlex.quote(guest_home)
pi_home = shlex.quote(f"{guest_home}/.pi")
context_sessions = shlex.quote(f"{guest_home}/.pi/context-mode/sessions")
return (
f"mkdir -p {context_sessions} /tmp/pi-subagents-uid-1000 && "
f"chown node:node {home} && "
f"chown -R node:node {pi_home} /tmp && "
"chmod -R u+rwX /tmp && "
f"chmod 755 {home} && "
"chown root:root /tmp /var/tmp && "
"chmod 1777 /tmp /var/tmp"
)
def _settings_value(
settings: dict[str, object],
key: str,
default: object,
) -> object:
value = settings.get(key)
return default if value is None else value
def _settings_int(
settings: dict[str, object],
key: str,
default: int,
) -> int:
value = _settings_value(settings, key, default)
if isinstance(value, bool):
return default
if isinstance(value, (int, str)):
return int(value)
return default
def _pi_models_json(
settings: dict[str, object],
) -> tuple[dict[str, object], str, str, list[str], str]:
provider_name = str(
_settings_value(settings, "provider", _DEFAULT_PROVIDER_NAME)
)
base_url = str(_settings_value(settings, "base_url", _DEFAULT_BASE_URL))
api = str(_settings_value(settings, "api", "openai-completions"))
api_key = settings.get("api_key")
api_key_env = str(settings.get("api_key_env", ""))
models_raw = _settings_value(settings, "models", [_DEFAULT_MODEL])
models = [str(model) for model in models_raw] # type: ignore[union-attr]
supports_developer_role = bool(
_settings_value(settings, "supports_developer_role", False)
)
supports_reasoning_effort = bool(
_settings_value(settings, "supports_reasoning_effort", False)
)
max_tokens_field = str(
_settings_value(settings, "max_tokens_field", "max_tokens")
)
context_window = _settings_int(
settings, "context_window", _DEFAULT_CONTEXT_WINDOW,
)
max_tokens = _settings_int(settings, "max_tokens", _DEFAULT_MAX_TOKENS)
input_context_window = max(1, context_window - max_tokens)
provider: dict[str, object] = {
"baseUrl": base_url,
"api": api,
"compat": {
"supportsDeveloperRole": supports_developer_role,
"supportsReasoningEffort": supports_reasoning_effort,
"maxTokensField": max_tokens_field,
},
"models": [
{
"id": model,
"name": model,
"contextWindow": input_context_window,
"maxTokens": max_tokens,
}
for model in models
],
}
if api_key is not None:
provider["apiKey"] = str(api_key)
elif api_key_env:
provider["apiKey"] = "egress-placeholder"
elif provider_name == _DEFAULT_PROVIDER_NAME:
provider["apiKey"] = "ollama"
payload: dict[str, object] = {
"providers": {
provider_name: provider,
}
}
return payload, base_url, api_key_env, models, provider_name
def _route_host(base_url: str) -> str:
parsed = urlparse(base_url)
if not parsed.scheme or not parsed.hostname:
die(
"agent provider provisioning: pi settings base_url must be an "
f"absolute URL (was {base_url!r})"
)
return parsed.hostname
_RUNTIME = AgentProviderRuntime(
template="pi",
command="pi",
image="bot-bottle-pi:latest",
prompt_mode="append_system_prompt",
bypass_args=(),
resume_args=(),
remote_control_args=(),
)
class PiAgentProvider(AgentProvider):
@property
def runtime(self) -> AgentProviderRuntime:
return _RUNTIME
def provision_plan(
self,
*,
dockerfile: str,
state_dir: Path,
instance_name: str,
prompt_file: Path,
guest_env: dict[str, str] | None = None,
auth_token: str = "",
forward_host_credentials: bool = False,
host_env: dict[str, str] | None = None,
trusted_project_path: str = "",
label: str = "",
color: str = "",
provider_settings: dict[str, object] | None = None,
) -> AgentProvisionPlan:
del auth_token, forward_host_credentials, host_env, trusted_project_path
del label, color
resolved_guest_env = dict(guest_env or {})
guest_home = self.guest_home
settings = dict(provider_settings or {})
models_payload, base_url, api_key_env, models, provider_name = (
_pi_models_json(settings)
)
models_file = state_dir / "pi-models.json"
models_file.write_text(json.dumps(models_payload, indent=2) + "\n")
models_file.chmod(0o600)
has_prompt = prompt_file.exists() and bool(prompt_file.read_text())
auth_scheme = "Bearer" if api_key_env else ""
return AgentProvisionPlan(
template=_RUNTIME.template,
command=_RUNTIME.command,
prompt_mode=_RUNTIME.prompt_mode,
image=_RUNTIME.image,
dockerfile=dockerfile,
guest_home=guest_home,
instance_name=instance_name,
prompt_file=prompt_file,
guest_env=resolved_guest_env,
has_prompt=has_prompt,
startup_args=(
"--models",
",".join(f"{provider_name}/{model}" for model in models),
),
dirs=(AgentProvisionDir(f"{guest_home}/.pi/agent"),),
files=(AgentProvisionFile(models_file, _models_path(guest_home)),),
egress_routes=(EgressRoute(
host=_route_host(base_url),
auth_scheme=auth_scheme,
token_ref=api_key_env,
),),
)
def provision_skills(self, plan: "BottlePlan", bottle: "Bottle") -> None:
from ...backend.util import host_skill_dir
agent = plan.spec.manifest.agents[plan.spec.agent_name]
if not agent.skills:
return
skills_dir = _skills_dir(plan.guest_home)
bottle.exec(f"mkdir -p {skills_dir}", user="root")
for name in agent.skills:
src = host_skill_dir(name)
if not os.path.isdir(src):
die(
f"skill {name!r} disappeared from host between "
f"validation and copy at {src}."
)
dst = f"{skills_dir}/{name}"
info(f"copying skill {name} into {bottle.name}:{dst}")
bottle.exec(f"rm -rf {dst} && mkdir -p {dst}", user="root")
bottle.cp_in(f"{src}/.", f"{dst}/")
bottle.exec(f"chown -R node:node {dst}", user="root")
def provision_prompt(self, plan: "BottlePlan", bottle: "Bottle") -> str | None:
prompt_path = _prompt_path(plan.guest_home)
append_system_path = _append_system_path(plan.guest_home)
bottle.cp_in(str(plan.prompt_file), prompt_path) # type: ignore
bottle.exec(
f"mkdir -p {shlex.quote(plan.guest_home)}/.pi/agent && "
f"cp {shlex.quote(prompt_path)} {shlex.quote(append_system_path)} && "
f"chown node:node {shlex.quote(prompt_path)} "
f"{shlex.quote(append_system_path)} && "
f"chmod 600 {shlex.quote(prompt_path)} "
f"{shlex.quote(append_system_path)}",
user="root",
)
# Pi's `--append-system-prompt` takes literal text, not a file path.
# Use its documented APPEND_SYSTEM.md discovery path instead.
return None
def provision(self, plan: "BottlePlan", bottle: "Bottle") -> None:
provision = plan.agent_provision
_exec(
bottle,
_runtime_state_repair_script(plan.guest_home),
"could not prepare pi runtime state",
)
for d in provision.dirs:
path = shlex.quote(d.guest_path)
_exec(bottle, f"mkdir -p {path}", f"could not create {d.guest_path}")
_exec(
bottle,
f"chown {shlex.quote(d.owner)} {path}",
f"could not chown {d.guest_path}",
)
_exec(
bottle,
f"chmod {shlex.quote(d.mode)} {path}",
f"could not chmod {d.guest_path}",
)
for f in provision.files:
bottle.cp_in(str(f.host_path), f.guest_path)
path = shlex.quote(f.guest_path)
_exec(
bottle,
f"chown {shlex.quote(f.owner)} {path}",
f"could not chown {f.guest_path}",
)
_exec(
bottle,
f"chmod {shlex.quote(f.mode)} {path}",
f"could not chmod {f.guest_path}",
)
def provision_supervise_mcp(
self,
plan: "BottlePlan",
bottle: "Bottle",
supervise_url: str,
) -> None:
del plan, bottle, supervise_url
def _exec(bottle: "Bottle", script: str, error: str) -> None:
result = bottle.exec(script, user="root")
if result.returncode != 0:
detail = (result.stderr or result.stdout).strip()
if detail:
detail = f": {detail}"
die(f"agent provider provisioning: {error}{detail}")
-291
View File
@@ -1,291 +0,0 @@
"""DLP detectors for the egress proxy (PRD 0053).
Pure Python, no mitmproxy dependency. Each detector is a module-level
function returning `ScanResult | None`.
Ships flat into the sidecar bundle image alongside
`egress_addon_core.py` both this file and the package source use
the same try/except import shim pattern.
"""
from __future__ import annotations
import base64
import gzip
import re
import typing
import unicodedata
from urllib.parse import quote as url_quote
try:
from egress_addon_core import ScanResult # type: ignore[import-not-found]
except ImportError: # pragma: no cover - host-side path
from .egress_addon_core import ScanResult
# ---------------------------------------------------------------------------
# Snippet helpers
# ---------------------------------------------------------------------------
SNIPPET_CONTEXT = 40 # chars of surrounding text to include on each side
REDACT = "********" # fixed-width replacement for the matched sensitive value
def _snippet(text: str, start: int, end: int) -> str:
"""Return context around a match with the matched span replaced by REDACT."""
before = text[max(0, start - SNIPPET_CONTEXT):start].replace("\n", " ").replace("\r", " ")
after = text[end:end + SNIPPET_CONTEXT].replace("\n", " ").replace("\r", " ")
return f"{before}{REDACT}{after}"
# ---------------------------------------------------------------------------
# Unicode normalization (defeats confusable-char and combining-mark evasion)
# ---------------------------------------------------------------------------
def _normalize_text(text: str) -> str:
# NFKD separates base characters from combining marks and resolves
# compatibility equivalents (fullwidth ASCII, ligatures, etc.)
decomposed = unicodedata.normalize("NFKD", text)
return "".join(
ch for ch in decomposed
# Strip combining marks inserted between chars to break patterns
if unicodedata.category(ch) != "Mn"
# Strip control chars; keep common whitespace (\n \r \t)
and (unicodedata.category(ch) != "Cc" or ch in "\n\r\t")
)
# ---------------------------------------------------------------------------
# Token patterns detector
# ---------------------------------------------------------------------------
TOKEN_PATTERNS: tuple[tuple[str, re.Pattern[str]], ...] = (
("AWS access key", re.compile(r"AKIA[0-9A-Z]{16}")),
("GitHub token (classic)", re.compile(r"ghp_[A-Za-z0-9_]{36}")),
("GitHub fine-grained token", re.compile(r"github_pat_[A-Za-z0-9_]{82}")),
("Anthropic API key", re.compile(r"sk-ant-[A-Za-z0-9\-_]{93}")),
("OpenAI API key", re.compile(r"sk-[A-Za-z0-9]{48}")),
("OpenAI project API key", re.compile(r"sk-proj-[A-Za-z0-9_\-]{48,}")),
("Stripe live key", re.compile(r"sk_live_[A-Za-z0-9]{24}")),
("Generic Bearer JWT", re.compile(r"Bearer\s+[A-Za-z0-9._\-]{50,}")),
("HuggingFace token", re.compile(r"hf_[A-Za-z0-9]{34,}")),
("Databricks token", re.compile(r"dapi[A-Za-z0-9]{32}")),
("Slack token", re.compile(r"xox[baprs]-[A-Za-z0-9]+-[A-Za-z0-9]+-[A-Za-z0-9]{24,}")),
("npm token", re.compile(r"npm_[A-Za-z0-9]{36}")),
("SendGrid API key", re.compile(r"SG\.[A-Za-z0-9_\-]{22}\.[A-Za-z0-9_\-]{43}")),
("PyPI token", re.compile(r"pypi-[A-Za-z0-9_\-]{80,}")),
("HashiCorp Vault token", re.compile(r"hvs\.[A-Za-z0-9_\-]{24,}")),
)
def scan_token_patterns(text: str, *, location: str = "body") -> ScanResult | None:
normalized = _normalize_text(text)
for name, pattern in TOKEN_PATTERNS:
m = pattern.search(normalized)
if m is not None:
return ScanResult(
severity="block",
reason=f"{name} found in {location}",
location=location,
context=_snippet(text, m.start(), m.end()),
)
return None
def redact_tokens(
text: str,
*,
env: typing.Mapping[str, str] | None = None,
) -> str:
"""Replace token pattern matches and (if env given) provisioned secrets with REDACT."""
for _, pattern in TOKEN_PATTERNS:
text = pattern.sub(REDACT, text)
if env is not None:
for key, value in env.items():
if key.startswith("EGRESS_TOKEN_") and value:
for variant in _encoded_variants(value):
text = text.replace(variant, REDACT)
return text
# ---------------------------------------------------------------------------
# Known secrets detector (Phase 1b)
# ---------------------------------------------------------------------------
def _encoded_variants(secret: str) -> list[str]:
"""Return the secret plus common encoded variants for exfil detection."""
seen: set[str] = {secret}
variants: list[str] = [secret]
def _add(v: str) -> None:
if v not in seen:
seen.add(v)
variants.append(v)
secret_bytes = secret.encode("utf-8")
# Standard base64 — with and without padding
b64 = base64.b64encode(secret_bytes).decode("ascii")
_add(b64)
_add(b64.rstrip("="))
# URL-safe base64 (JWT/OAuth use -_ alphabet) — with and without padding
b64url = base64.urlsafe_b64encode(secret_bytes).decode("ascii")
_add(b64url)
_add(b64url.rstrip("="))
# URL percent-encoding
_add(url_quote(secret, safe=""))
# Hex — lowercase and uppercase
_add(secret_bytes.hex())
_add(secret_bytes.hex().upper())
# Base32 (TOTP seeds, some DNS-exfil channels)
_add(base64.b32encode(secret_bytes).decode("ascii"))
# gzip + base64 (deterministic: mtime=0); recognisable by H4sI prefix
_add(base64.b64encode(gzip.compress(secret_bytes, mtime=0)).decode("ascii"))
return variants
def scan_known_secrets(
text: str,
*,
location: str = "body",
env: typing.Mapping[str, str] | None = None,
) -> ScanResult | None:
if env is None:
return None
for key, value in env.items():
if not key.startswith("EGRESS_TOKEN_") or not value:
continue
for variant in _encoded_variants(value):
pos = text.find(variant)
if pos >= 0:
return ScanResult(
severity="block",
reason=f"provisioned secret from {key} found in {location}",
location=location,
context=_snippet(text, pos, pos + len(variant)),
)
return None
# ---------------------------------------------------------------------------
# Naive prompt injection detector (Phase 2)
# ---------------------------------------------------------------------------
DISCLOSURE_PHRASES: tuple[re.Pattern[str], ...] = (
re.compile(r"(?i)system\s+prompt"),
re.compile(r"(?i)my\s+instructions\s+are"),
re.compile(r"(?i)original\s+instructions"),
re.compile(r"(?i)secret\s+instructions"),
re.compile(r"(?i)hidden\s+rules"),
)
JAILBREAK_PHRASES: tuple[re.Pattern[str], ...] = (
re.compile(r"(?i)ignore\s+previous"),
re.compile(r"(?i)forget\s+everything"),
re.compile(r"(?i)disregard\s+(?:all\s+)?(?:previous|prior)"),
re.compile(r"(?i)pretend\s+you\s+are"),
re.compile(r"(?i)act\s+as\s+(?:if|though)"),
)
PROXIMITY_CHARS = 500
def _closest_pair(
a_matches: list[re.Match[str]],
b_matches: list[re.Match[str]],
) -> tuple[re.Match[str], re.Match[str]] | None:
"""Return the pair (a, b) with the smallest character gap, or None."""
best: tuple[re.Match[str], re.Match[str]] | None = None
best_gap: int | None = None
for a in a_matches:
for b in b_matches:
gap = max(0, max(a.start(), b.start()) - min(a.end(), b.end()))
if best_gap is None or gap < best_gap:
best_gap = gap
best = (a, b)
return best
def scan_naive_injection(text: str) -> ScanResult | None:
location = "response body"
disclosure_hits = [m for p in DISCLOSURE_PHRASES for m in p.finditer(text)]
jailbreak_hits = [m for p in JAILBREAK_PHRASES for m in p.finditer(text)]
if disclosure_hits and jailbreak_hits:
pair = _closest_pair(disclosure_hits, jailbreak_hits)
if pair is not None:
dist = max(0, max(pair[0].start(), pair[1].start()) - min(pair[0].end(), pair[1].end()))
if dist <= PROXIMITY_CHARS:
first = pair[0] if pair[0].start() <= pair[1].start() else pair[1]
return ScanResult(
severity="block",
reason=(
f"disclosure and jailbreak phrases within "
f"{dist} chars in {location}"
),
location=location,
context=_snippet(text, first.start(), first.end()),
)
if disclosure_hits:
m = disclosure_hits[0]
return ScanResult(
severity="warn",
reason=f"prompt disclosure phrase detected in {location}",
location=location,
context=_snippet(text, m.start(), m.end()),
)
if jailbreak_hits:
m = jailbreak_hits[0]
return ScanResult(
severity="warn",
reason=f"jailbreak phrase detected in {location}",
location=location,
context=_snippet(text, m.start(), m.end()),
)
return None
# ---------------------------------------------------------------------------
# CRLF injection detector
# ---------------------------------------------------------------------------
# URL-encoded CRLF is never legitimate in a request URL or header value.
_CRLF_ENCODED_RE = re.compile(r"%0[dD]%0[aA]", re.ASCII)
# Literal CRLF followed by a header-name pattern indicates header injection.
_CRLF_HEADER_INJECT_RE = re.compile(r"\r\n[A-Za-z][A-Za-z0-9\-]+\s*:", re.ASCII)
def scan_crlf_injection(text: str) -> ScanResult | None:
if _CRLF_ENCODED_RE.search(text):
return ScanResult(
severity="block",
reason="URL-encoded CRLF (%0d%0a) in outbound request",
)
if _CRLF_HEADER_INJECT_RE.search(text):
return ScanResult(
severity="block",
reason="CRLF header injection pattern in outbound request",
)
return None
__all__ = [
"REDACT",
"SNIPPET_CONTEXT",
"TOKEN_PATTERNS",
"redact_tokens",
"scan_crlf_injection",
"scan_known_secrets",
"scan_naive_injection",
"scan_token_patterns",
]
+146 -133
View File
@@ -1,10 +1,25 @@
"""Per-bottle egress proxy (PRD 0017, PRD 0053).
"""Per-bottle egress proxy (PRD 0017).
Replaces the cred-proxy sidecar (PRD 0010) with a mitmproxy-based
sidecar that becomes the agent's `HTTP_PROXY` / `HTTPS_PROXY`. It
owns three jobs:
1. MITM the agent's HTTPS with the per-bottle CA (moved from
pipelock).
2. Enforce manifest-declared `path_allowlist` per route.
3. Inject `Authorization` headers for routes that declare an
`auth` block, the same way cred-proxy does today.
This module defines the abstract proxy (`Egress`), its plan
dataclass (`EgressPlan`), and the resolved per-route shape
(`EgressRoute`). The sidecar's start/stop lifecycle is backend-
specific and lives on concrete subclasses (see
`bot_bottle/backend/docker/egress.py`).
Chunks 1+2 of the PRD: this module + the mitmproxy addon + the Docker
lifecycle are wired into the agent's `HTTP_PROXY` path; cred-proxy
has been removed. Chunk 3 retargets the cred-proxy-block remediation
flow (PRD 0014) at egress and renames the MCP tool.
"""
from __future__ import annotations
@@ -15,21 +30,27 @@ from dataclasses import dataclass
from pathlib import Path
from typing import TYPE_CHECKING
from .egress_addon_core import (
HeaderMatch as CoreHeaderMatch,
MatchEntry as CoreMatchEntry,
PathMatch as CorePathMatch,
Route,
)
from .egress_addon_core import Route
from .log import die
if TYPE_CHECKING:
from .manifest import ManifestBottle
from .manifest import Bottle
CODEX_HOST_CREDENTIAL_TOKEN_REF = "BOT_BOTTLE_CODEX_HOST_ACCESS_TOKEN"
# DNS name agents will dial for the per-bottle egress sidecar.
# Backend-agnostic by contract: every concrete backend (Docker today,
# others later) attaches this name to its sidecar on the bottle's
# internal network. The agent's `HTTP_PROXY` env var resolves to
# `http://egress:<port>` once chunk 2 cuts over.
EGRESS_HOSTNAME = "egress"
# In-container path the addon reads. Pre-created in
# `Dockerfile.sidecars` so the host bind-mount can drop the file
# directly. Content is YAML (hand-rolled by `egress_render_routes`
# in the style of `pipelock_render_yaml`, parsed by `yaml_subset`
# inside the addon).
EGRESS_ROUTES_IN_CONTAINER = "/etc/egress/routes.yaml"
@@ -37,23 +58,68 @@ EGRESS_ROUTES_IN_CONTAINER = "/etc/egress/routes.yaml"
class EgressRoute(Route):
"""Host-side extension of the addon's `Route`.
Inherits `host`, `matches`, `auth_scheme`, and `token_env`
Inherits `host`, `path_allowlist`, `auth_scheme`, and `token_env`
from `egress_addon_core.Route` those are the fields that cross the
YAML wire into the sidecar. The fields below are host-only and
YAML wire into the sidecar. The three fields below are host-only and
are never serialised to the addon.
`token_ref` is the host env var the CLI reads at launch and forwards
into the container's environ under `token_env`.
into the container's environ under `token_env`. Routes that share a
`token_ref` coalesce to one `token_env` slot.
`roles` carries the manifest route's role tuple (reserved for
future use; always empty today)."""
future use; always empty today).
`tls_passthrough` signals that pipelock must not TLS-MITM this
host either because the manifest declared `pipelock.tls_passthrough:
true` (lifted in `egress_manifest_routes`) or because a provider
route set it (e.g. egress injects its own Bearer on that host
after the agent boundary and pipelock's header DLP would block it)."""
token_ref: str = ""
roles: tuple[str, ...] = ()
tls_passthrough: bool = False
@dataclass(frozen=True)
class EgressPlan:
"""Output of Egress.prepare; consumed by .start.
The slug + routes_path + routes + token_env_map fields are
filled at prepare time (host-side, side-effect-free on docker).
The network + CA + pipelock fields are populated by the backend's
launch step via `dataclasses.replace` once those resources
exist. Empty defaults are sentinels meaning "not yet set";
`.start` validates that they are populated.
`token_env_map` is `{<token_env in container>: <token_ref on host>}`.
The backend's start step reads `os.environ[token_ref]` and
forwards the value into the egress container's environ
under `token_env`. The plan itself never holds token values
secrets never land in a dataclass that might be logged.
`mitmproxy_ca_host_path` is the host path of the per-bottle
egress CA (single PEM with cert+key concatenated) minted
by `egress_tls_init`. `.start` docker-cps it into the
sidecar at `~/.mitmproxy/mitmproxy-ca.pem` mitmproxy reads
that file at boot to mint per-host leaf certs.
`mitmproxy_ca_cert_only_host_path` is the cert-only PEM (no
key) for installing into the agent's trust store via
`provision_ca`. Separate file rather than re-parsing the
concat so secrets and trust artefacts stay on distinct paths.
`pipelock_ca_host_path` is the host path of the pipelock CA
(cert only). `.start` docker-cps it into the sidecar so the
proxy's outbound HTTPS client trusts pipelock's MITM on the
egress upstream leg.
`pipelock_proxy_url` is the URL egress sets as `HTTPS_PROXY`
in its environ so outbound HTTPS traverses pipelock keeping
pipelock's hostname allowlist + DLP body scanner on the
egress upstream leg.
"""
slug: str
routes_path: Path
routes: tuple[EgressRoute, ...]
@@ -62,46 +128,40 @@ class EgressPlan:
egress_network: str = ""
mitmproxy_ca_host_path: Path = Path()
mitmproxy_ca_cert_only_host_path: Path = Path()
log: int = 0
pipelock_ca_host_path: Path = Path()
pipelock_proxy_url: str = ""
def egress_manifest_routes(
bottle: ManifestBottle,
bottle: Bottle,
) -> tuple[EgressRoute, ...]:
"""Lift each `bottle.egress.routes[]` manifest entry into an EgressRoute.
Order is preserved. Token slots are not assigned here slot assignment
is a final step in `egress_routes_for_bottle` after provider and manifest
routes are merged."""
out: list[EgressRoute] = []
for r in bottle.egress.routes:
core_matches: list[CoreMatchEntry] = []
for m in r.Matches:
core_paths = tuple(
CorePathMatch(type=p.Type, value=p.Value)
for p in m.Paths
)
core_headers = tuple(
CoreHeaderMatch(name=h.Name, value=h.Value, type=h.Type)
for h in m.Headers
)
core_matches.append(CoreMatchEntry(
paths=core_paths,
methods=m.Methods,
headers=core_headers,
))
out.append(EgressRoute(
host=r.Host,
matches=tuple(core_matches),
path_allowlist=r.PathAllowlist,
auth_scheme=r.AuthScheme,
token_ref=r.TokenRef,
roles=r.Role,
git_fetch=r.GitFetch,
outbound_detectors=r.OutboundDetectors,
inbound_detectors=r.InboundDetectors,
tls_passthrough=r.Pipelock.TlsPassthrough,
))
return tuple(out)
def egress_routes_for_bottle(
bottle: ManifestBottle,
bottle: Bottle,
provider_routes: tuple[EgressRoute, ...] = (),
) -> tuple[EgressRoute, ...]:
"""Effective egress routes for the agent.
Provider routes own their hosts outright; manifest routes for hosts
not claimed by any provider are appended. Token slots are assigned
in a final pass over the merged list in order, so provisioned routes
get the lower slot numbers."""
manifest = egress_manifest_routes(bottle)
provisioned_hosts = {pr.host.lower() for pr in provider_routes}
merged = list(provider_routes) + [
@@ -113,6 +173,10 @@ def egress_routes_for_bottle(
def _assign_token_slots(
routes: list[EgressRoute],
) -> tuple[EgressRoute, ...]:
"""Assign EGRESS_TOKEN_N slots to authenticated routes in order.
Routes sharing a token_ref share a slot. Unauthenticated routes
(no auth_scheme / token_ref) keep token_env empty."""
slot_for_ref: dict[str, str] = {}
out: list[EgressRoute] = []
for r in routes:
@@ -130,6 +194,13 @@ def _assign_token_slots(
def egress_token_env_map(
routes: tuple[EgressRoute, ...],
) -> dict[str, str]:
"""Collapse the route list into `{token_env: token_ref}` for the
authenticated routes. Routes without `auth` contribute no entry.
Conflict detection: two routes that share a `token_env` slot but
name different `token_ref` host vars is a programming error in
`egress_routes_for_bottle`; surface it as a die rather than
silently picking one."""
out: dict[str, str] = {}
for r in routes:
if not (r.auth_scheme and r.token_ref and r.token_env):
@@ -146,94 +217,32 @@ def egress_token_env_map(
def _route_to_yaml_fields(r: Route) -> dict[str, object]:
"""Return the addon-visible fields for one route.
Single authoritative mapping between EgressRoute (host-side) and
egress_addon_core.Route (sidecar-side). When a field is added to
the addon's Route that must appear in the YAML, add it here and
in egress_addon_core._parse_one together."""
fields: dict[str, object] = {"host": r.host}
if r.auth_scheme and r.token_env:
fields["auth_scheme"] = r.auth_scheme
fields["token_env"] = r.token_env
if r.matches:
matches_data: list[dict[str, object]] = []
for entry in r.matches:
entry_data: dict[str, object] = {}
if entry.paths:
paths_data: list[dict[str, str]] = []
for pm in entry.paths:
pd: dict[str, str] = {"value": pm.value}
if pm.type != "prefix":
pd["type"] = pm.type
paths_data.append(pd)
entry_data["paths"] = paths_data
if entry.methods:
entry_data["methods"] = list(entry.methods)
if entry.headers:
headers_data: list[dict[str, str]] = []
for hm in entry.headers:
hd: dict[str, str] = {"name": hm.name, "value": hm.value}
if hm.type != "exact":
hd["type"] = hm.type
headers_data.append(hd)
entry_data["headers"] = headers_data
matches_data.append(entry_data)
fields["matches"] = matches_data
if r.git_fetch:
fields["git"] = {"fetch": True}
if r.outbound_detectors is not None or r.inbound_detectors is not None:
dlp: dict[str, object] = {}
if r.outbound_detectors is not None:
dlp["outbound_detectors"] = (
False if not r.outbound_detectors
else list(r.outbound_detectors)
)
if r.inbound_detectors is not None:
dlp["inbound_detectors"] = (
False if not r.inbound_detectors
else list(r.inbound_detectors)
)
fields["dlp"] = dlp
if r.path_allowlist:
fields["path_allowlist"] = list(r.path_allowlist)
return fields
def _render_match_entry(entry: dict[str, object]) -> list[str]:
lines: list[str] = []
first_key = True
if "paths" in entry:
lines.append(" - paths:")
first_key = False
for pd in entry["paths"]: # type: ignore[union-attr]
pd_dict: dict[str, str] = pd # type: ignore[assignment]
if "type" in pd_dict:
lines.append(f' - type: "{pd_dict["type"]}"')
lines.append(f' value: "{pd_dict["value"]}"')
else:
lines.append(f' - value: "{pd_dict["value"]}"')
if "methods" in entry:
methods_str = ", ".join(f'"{m}"' for m in entry["methods"]) # type: ignore[union-attr]
prefix = " - " if first_key else " "
lines.append(f'{prefix}methods: [{methods_str}]')
first_key = False
if "headers" in entry:
prefix = " - " if first_key else " "
lines.append(f"{prefix}headers:")
first_key = False
for hd in entry["headers"]: # type: ignore[union-attr]
hd_dict: dict[str, str] = hd # type: ignore[assignment]
lines.append(f' - name: "{hd_dict["name"]}"')
lines.append(f' value: "{hd_dict["value"]}"')
if first_key:
lines.append(" - {}")
return lines
def egress_render_routes(
routes: tuple[EgressRoute, ...],
*,
log: int = 0,
) -> str:
lines: list[str] = []
if log:
lines.append(f"log: {log}")
lines.append("routes:")
"""Serialize the route table for the addon to read.
YAML content no token values, no host env-var names. Fields are
determined by `_route_to_yaml_fields`, which is the single point of
truth for the EgressRoute egress_addon_core.Route mapping."""
lines: list[str] = ["routes:"]
if not routes:
lines[-1] = "routes: []"
lines[0] = "routes: []"
return "\n".join(lines) + "\n"
for r in routes:
f = _route_to_yaml_fields(r)
@@ -241,24 +250,10 @@ def egress_render_routes(
if "auth_scheme" in f:
lines.append(f' auth_scheme: "{f["auth_scheme"]}"')
lines.append(f' token_env: "{f["token_env"]}"')
if "matches" in f:
lines.append(" matches:")
for entry in f["matches"]: # type: ignore[union-attr]
lines.extend(_render_match_entry(entry)) # type: ignore[arg-type]
if "git" in f:
git_dict: dict[str, object] = f["git"] # type: ignore
lines.append(" git:")
if git_dict.get("fetch") is True:
lines.append(" fetch: true")
if "dlp" in f:
dlp_dict: dict[str, object] = f["dlp"] # type: ignore
lines.append(" dlp:")
for dk, dv in dlp_dict.items():
if dv is False:
lines.append(f" {dk}: false")
elif isinstance(dv, list):
items_str = ", ".join(f'"{x}"' for x in dv)
lines.append(f" {dk}: [{items_str}]")
if "path_allowlist" in f:
lines.append(" path_allowlist:")
for p in f["path_allowlist"]: # type: ignore
lines.append(f' - "{p}"')
return "\n".join(lines) + "\n"
@@ -266,6 +261,12 @@ def egress_resolve_token_values(
token_env_map: dict[str, str],
host_env: dict[str, str],
) -> dict[str, str]:
"""Read `host_env[TokenRef]` for each entry in `token_env_map` and
return `{token_env: <value>}`. Dies (with a pointer at the missing
var name) if any TokenRef is unset.
Pure function: takes the host env as an argument so tests can pass
a sealed mapping without touching `os.environ`."""
out: dict[str, str] = {}
for token_env, token_ref in token_env_map.items():
value = host_env.get(token_ref)
@@ -286,24 +287,36 @@ def egress_resolve_token_values(
class Egress(ABC):
"""The per-bottle egress proxy. Encapsulates the host-side prepare
(route lift + routes.yaml render + token-env-map derivation); the
sidecar's start/stop lifecycle is backend-specific and lives on
concrete subclasses."""
def prepare(
self,
bottle: ManifestBottle,
bottle: Bottle,
slug: str,
stage_dir: Path,
provider_routes: tuple[EgressRoute, ...] = (),
) -> EgressPlan:
"""Lift `bottle.egress.routes` + `provider_routes` into resolved
routes, render the routes file (mode 600) under `stage_dir`, and
return the plan. Pure host-side, no docker subprocess. The
token-env map records the mapping the launch step uses to
forward values from the host's environ into the sidecar's environ.
Returned plan is incomplete: the launch step must fill
`internal_network` / `egress_network` / `pipelock_proxy_url`
via `dataclasses.replace` before passing it to `.start`."""
routes = egress_routes_for_bottle(bottle, provider_routes)
log = bottle.egress.Log
routes_path = stage_dir / "egress_routes.yaml"
routes_path.write_text(egress_render_routes(routes, log=log))
routes_path.write_text(egress_render_routes(routes))
routes_path.chmod(0o600)
return EgressPlan(
slug=slug,
routes_path=routes_path,
routes=routes,
token_env_map=egress_token_env_map(routes),
log=log,
)
__all__ = [
+89 -195
View File
@@ -1,7 +1,28 @@
"""mitmproxy addon entrypoint for the egress sidecar (PRD 0017, PRD 0053).
"""mitmproxy addon entrypoint for the egress sidecar (PRD 0017).
Loaded by `mitmdump -s /app/egress_addon.py` inside the
egress container."""
egress container. Wraps the pure logic from
`egress_addon_core` with mitmproxy's HTTPFlow API:
- At startup, read `EGRESS_ROUTES` (default
`/etc/egress/routes.yaml`, JSON content) routes table.
- SIGHUP re-reads the file and atomically swaps the in-memory
table. A parse error keeps the old table in place better to
keep serving the old config than to leave the proxy with no
routes after a typo.
- On each `request`: strip the inbound Authorization header, then
consult `decide()` for forward / block / inject-auth and apply
the decision to the flow.
This file imports `mitmproxy` and is never imported on the host
mitmproxy is a container-only dependency. The host's tests target
`egress_addon_core`.
Dockerfile.sidecars copies both this file and
`egress_addon_core.py` flat into `/app/`; the absolute import
below works because mitmdump runs with `/app` on its sys.path. The
parallel file in the package source tree (bot_bottle/) is the
build input not a module the host imports."""
from __future__ import annotations
@@ -12,61 +33,62 @@ import signal
import sys
from pathlib import Path
from mitmproxy import http # type: ignore[import-not-found] # pylint: disable=import-error
from mitmproxy import http # type: ignore[import-not-found]
from egress_addon_core import ( # type: ignore[import-not-found] # pylint: disable=import-error
LOG_BLOCKS,
LOG_FULL,
Config,
build_inbound_scan_text,
build_outbound_scan_text,
# Absolute import (NOT `from .egress_addon_core`) — the
# container drops both files flat into /app/ so they are sibling
# top-level modules to mitmdump's loader, not a package.
from egress_addon_core import ( # type: ignore[import-not-found]
Route,
decide,
decide_git_fetch,
is_git_fetch_request,
is_git_push_request,
load_config,
match_route,
outbound_scan_headers,
scan_inbound,
scan_outbound,
load_routes,
)
try:
from dlp_detectors import redact_tokens # type: ignore[import-not-found]
except ImportError: # pragma: no cover - host-side path
from bot_bottle.dlp_detectors import redact_tokens # type: ignore[import-not-found]
DEFAULT_ROUTES_PATH = "/etc/egress/routes.yaml"
# Magic hostname the addon recognises as an introspection target.
# Requests through the proxy for `_egress.local/<path>` are
# intercepted and answered with synthetic responses (the addon's
# `request` hook sets `flow.response` before any upstream connection).
# The hostname is not in DNS — only clients dialing through this
# specific egress can reach it, and only via HTTP (no TLS).
# Used by the supervise sidecar's `list-egress-routes` MCP
# tool to surface the live route table to the agent.
INTROSPECT_HOST = "_egress.local"
class EgressAddon:
"""The mitmproxy addon. One instance per `mitmdump` process; the
request hook is invoked on every CONNECT-decapsulated HTTP/HTTPS
request the agent makes."""
def __init__(self) -> None:
self.routes_path = os.environ.get("EGRESS_ROUTES", DEFAULT_ROUTES_PATH)
self.config: Config = Config(routes=())
self.routes: tuple[Route, ...] = ()
self._reload(initial=True)
self._install_sighup()
def _reload(self, *, initial: bool = False) -> None:
try:
text = Path(self.routes_path).read_text(encoding="utf-8")
new_config = load_config(text)
new_routes = load_routes(text)
except (OSError, ValueError) as e:
tag = "boot" if initial else "SIGHUP"
sys.stderr.write(
f"egress: {tag} load failed: {e}\n"
)
if initial:
self.config = Config(routes=())
# No baseline to fall back on; serve nothing rather
# than masquerade as a proxy with a route table the
# operator never declared.
self.routes = ()
return
self.config = new_config
log_label = ("off", "blocks", "full")[self.config.log]
self.routes = new_routes
sys.stderr.write(
f"egress: loaded {len(self.config.routes)} route(s): "
f"{', '.join(r.host for r in self.config.routes)}"
f" [log={log_label}]\n"
f"egress: loaded {len(self.routes)} route(s): "
f"{', '.join(r.host for r in self.routes)}\n"
)
def _install_sighup(self) -> None:
@@ -80,9 +102,14 @@ class EgressAddon:
signal.signal(signal.SIGHUP, handler)
def _serve_introspection(self, flow: http.HTTPFlow, path: str) -> None:
"""Synthesize a response for `_egress.local` requests.
Currently supports `/allowlist` which returns the in-memory
route table as JSON (host, path_allowlist, auth_scheme,
token_env per route no token VALUES, those live in the
container's environ)."""
if path == "/allowlist":
payload = json.dumps(
{"routes": [dataclasses.asdict(r) for r in self.config.routes]},
{"routes": [dataclasses.asdict(r) for r in self.routes]},
indent=2,
).encode("utf-8")
flow.response = http.Response.make(
@@ -96,194 +123,61 @@ class EgressAddon:
{"Content-Type": "text/plain; charset=utf-8"},
)
def _req_ctx(self, flow: http.HTTPFlow) -> dict[str, object]:
return {
"host": redact_tokens(flow.request.pretty_host, env=os.environ),
"method": flow.request.method,
"path": redact_tokens(flow.request.path, env=os.environ),
}
def _block(
self,
flow: http.HTTPFlow,
reason: str,
ctx: dict[str, object] | None = None,
) -> None:
if self.config.log >= LOG_BLOCKS:
entry: dict[str, object] = {"event": "egress_block", "reason": reason}
if ctx:
entry.update(ctx)
sys.stderr.write(json.dumps(entry) + "\n")
flow.response = http.Response.make(
403,
reason.encode("utf-8"),
{"Content-Type": "text/plain; charset=utf-8"},
)
def _log_request(self, flow: http.HTTPFlow) -> None:
sys.stderr.write(
json.dumps({
"event": "egress_request",
"host": redact_tokens(flow.request.pretty_host, env=os.environ),
"method": flow.request.method,
"path": redact_tokens(flow.request.path, env=os.environ),
"headers": dict(flow.request.headers),
"body": flow.request.get_text(strict=False) or "",
})
+ "\n"
)
def _log_response(self, flow: http.HTTPFlow) -> None:
sys.stderr.write(
json.dumps({
"event": "egress_response",
"host": flow.request.pretty_host,
"status": flow.response.status_code,
"headers": dict(flow.response.headers),
"body": flow.response.get_text(strict=False) or "",
})
+ "\n"
)
# mitmproxy's addon API: this method name + signature is how
# mitmdump discovers the request hook.
def request(self, flow: http.HTTPFlow) -> None:
request_path, _, query = flow.request.path.partition("?")
# Introspection: requests to the magic `_egress.local`
# host are answered locally with a synthetic response. Check
# before the strip-auth + route logic — these requests aren't
# real upstream traffic, the agent isn't injecting auth, and
# the addon's own decide() would 403 the magic host (it's
# never in the routes table).
if flow.request.pretty_host == INTROSPECT_HOST:
self._serve_introspection(flow, request_path)
return
# DLP outbound scan BEFORE stripping auth — catches tokens the
# agent tried to smuggle in any header, path, query param, or body.
# Hostname is included to catch DNS-tunnelling exfiltration attempts.
route = match_route(self.config.routes, flow.request.pretty_host)
if route is not None:
body = flow.request.get_text(strict=False) or ""
scan_text = build_outbound_scan_text(
flow.request.pretty_host,
request_path,
query,
outbound_scan_headers(route, dict(flow.request.headers)),
body,
)
dlp_result = scan_outbound(route, scan_text, os.environ)
if dlp_result is not None and dlp_result.severity == "block":
ctx = self._req_ctx(flow)
if dlp_result.context:
ctx = {**ctx, "context": dlp_result.context}
self._block(flow, f"egress DLP: {dlp_result.reason}", ctx=ctx)
return
# Inbound Authorization is always stripped — the agent cannot
# smuggle a stolen token through the proxy. If the matched
# route declares an auth pair, a fresh header is injected
# below.
flow.request.headers.pop("authorization", None)
# Universal HTTPS git-push block. Defense-in-depth: git-gate
# (PRD 0008) is the only sanctioned outbound path for git
# writes — its pre-receive runs gitleaks. Letting HTTPS push
# through egress + auth injection would route around
# that scan, so we 403 before any route logic.
if is_git_push_request(request_path, query):
self._block(
flow,
"egress: git push over HTTPS is not supported; "
"use the bottle.git SSH path (gitleaks-scanned by "
"git-gate's pre-receive hook).",
ctx=self._req_ctx(flow),
flow.response = http.Response.make(
403,
(
b"egress: git push over HTTPS is not supported; "
b"use the bottle.git SSH path (gitleaks-scanned by "
b"git-gate's pre-receive hook)."
),
{"Content-Type": "text/plain; charset=utf-8"},
)
return
if is_git_fetch_request(request_path, query):
git_decision = decide_git_fetch(
self.config.routes, flow.request.pretty_host,
)
if git_decision.action == "block":
self._block(
flow,
git_decision.reason,
ctx=self._req_ctx(flow),
)
return
# Strip agent-set Authorization after DLP scan so smuggled tokens
# are caught above; the route may inject sidecar-owned auth below.
flow.request.headers.pop("authorization", None)
# Build headers mapping for match evaluation
req_headers = {k.lower(): v for k, v in flow.request.headers.items()}
decision = decide(
self.config.routes,
self.routes,
flow.request.pretty_host,
request_path,
os.environ,
request_method=flow.request.method,
request_headers=req_headers,
)
if decision.action == "block":
self._block(flow, decision.reason, ctx=self._req_ctx(flow))
flow.response = http.Response.make(
403,
decision.reason.encode("utf-8"),
{"Content-Type": "text/plain; charset=utf-8"},
)
return
if decision.inject_authorization is not None:
flow.request.headers["authorization"] = decision.inject_authorization
if self.config.log >= LOG_FULL:
self._log_request(flow)
def response(self, flow: http.HTTPFlow) -> None:
"""DLP inbound scan on response headers and body."""
route = match_route(self.config.routes, flow.request.pretty_host)
if route is None:
return
if flow.response is None:
return
if self.config.log >= LOG_FULL:
self._log_response(flow)
resp_headers = {k.lower(): v for k, v in flow.response.headers.items()}
body = flow.response.get_text(strict=False) or ""
scan_text = build_inbound_scan_text(resp_headers, body)
if not scan_text:
return
result = scan_inbound(route, scan_text)
if result is None:
return
resp_ctx: dict[str, object] = {
**self._req_ctx(flow),
"response_status": flow.response.status_code,
}
if result.context:
resp_ctx = {**resp_ctx, "context": result.context}
if result.severity == "block":
self._block(flow, f"egress DLP: {result.reason}", ctx=resp_ctx)
elif result.severity == "warn" and self.config.log >= LOG_BLOCKS:
sys.stderr.write(
json.dumps({
"event": "egress_warn",
"reason": f"egress DLP: {result.reason}",
**resp_ctx,
})
+ "\n"
)
def websocket_message(self, flow: http.HTTPFlow) -> None:
"""DLP scan on WebSocket frames.
Outbound frames (from_client) are scanned for credential leakage;
inbound frames are scanned for prompt injection. On a block the
entire connection is killed there is no HTTP response surface to
write to after the upgrade.
"""
if flow.websocket is None: # type: ignore[union-attr]
return
route = match_route(self.config.routes, flow.request.pretty_host)
if route is None:
return
message = flow.websocket.messages[-1] # type: ignore[union-attr]
content = message.content.decode("utf-8", errors="replace")
if message.from_client:
result = scan_outbound(route, content, os.environ)
if result is not None and result.severity == "block":
sys.stderr.write(f"egress DLP: {result.reason}\n")
flow.kill() # type: ignore[union-attr]
else:
result = scan_inbound(route, content)
if result is not None:
if result.severity == "block":
sys.stderr.write(f"egress DLP: {result.reason}\n")
flow.kill() # type: ignore[union-attr]
elif result.severity == "warn":
sys.stderr.write(f"egress DLP warn: {result.reason}\n")
addons = [EgressAddon()]
+119 -571
View File
@@ -1,4 +1,4 @@
"""Pure logic for the egress mitmproxy addon (PRD 0017, PRD 0053).
"""Pure logic for the egress mitmproxy addon (PRD 0017).
Split out of `egress_addon.py` so the host's unit tests can
exercise the parse + decision functions without depending on the
@@ -8,268 +8,74 @@ container.
Imports: stdlib + `yaml_subset` (which is itself stdlib-only and
ships flat into the sidecar bundle image alongside this file
see `Dockerfile.sidecars`)."""
see `Dockerfile.sidecars`).
"""
from __future__ import annotations
import re
import typing
from dataclasses import dataclass
# Absolute import — `yaml_subset.py` is copied flat into the bundle
# image's `/app/` next to this file (via `Dockerfile.sidecars`).
# The host-side unit tests run with the repo on sys.path, where the
# import resolves under the `bot_bottle` package. The try/except
# shim picks whichever import works.
try:
from yaml_subset import YamlSubsetError, parse_yaml_subset # type: ignore[import-not-found]
except ImportError: # pragma: no cover - host-side path
from .yaml_subset import YamlSubsetError, parse_yaml_subset
# ---------------------------------------------------------------------------
# Match types (Gateway API HTTPRoute vocabulary, PRD 0053)
# ---------------------------------------------------------------------------
PATH_MATCH_TYPES = ("exact", "prefix", "regex")
HEADER_MATCH_TYPES = ("exact", "regex")
VALID_METHODS = frozenset({
"GET", "HEAD", "POST", "PUT", "DELETE", "PATCH", "OPTIONS", "TRACE",
"CONNECT",
})
OUTBOUND_DETECTOR_NAMES = frozenset({"token_patterns", "known_secrets"})
INBOUND_DETECTOR_NAMES = frozenset({"naive_injection_detection"})
@dataclass(frozen=True)
class PathMatch:
type: str # "exact" | "prefix" | "regex"
value: str
compiled: re.Pattern[str] | None = None
@dataclass(frozen=True)
class HeaderMatch:
name: str
value: str
type: str = "exact" # "exact" | "regex"
compiled: re.Pattern[str] | None = None
@dataclass(frozen=True)
class MatchEntry:
paths: tuple[PathMatch, ...] = ()
methods: tuple[str, ...] = ()
headers: tuple[HeaderMatch, ...] = ()
@dataclass(frozen=True)
class Route:
"""One row of the egress route table.
`host` is the request's `Host` header (or SNI hostname) to match
against. `path_allowlist` is an optional tuple of absolute path
prefixes the request path must start with; empty tuple means no
path constraint. `auth_scheme` and `token_env` together form the
credential-injection pair (both set or both empty); a non-empty
pair tells the addon to overwrite the inbound Authorization with
`<auth_scheme> <value-of-environ[token_env]>`.
"""
host: str
matches: tuple[MatchEntry, ...] = ()
path_allowlist: tuple[str, ...] = ()
auth_scheme: str = ""
token_env: str = ""
git_fetch: bool = False
outbound_detectors: tuple[str, ...] | None = None
inbound_detectors: tuple[str, ...] | None = None
LOG_OFF = 0 # no logging
LOG_BLOCKS = 1 # log block/warn events with request context
LOG_FULL = 2 # log block/warn events + full request and response bodies
@dataclass(frozen=True)
class Config:
routes: tuple[Route, ...]
log: int = LOG_OFF
@dataclass(frozen=True)
class Decision:
"""The result of `decide()`. Either forward (with optional
`inject_authorization` header) or block (with a `reason` to surface
to the agent)."""
action: str # "forward" or "block"
reason: str = ""
inject_authorization: str | None = None
@dataclass(frozen=True)
class ScanResult:
severity: str # "block" or "warn"
reason: str
location: str = "" # where the match was found, e.g. "body", "authorization header"
context: str = "" # surrounding text with the match replaced by REDACT
# ---------------------------------------------------------------------------
# Parsing
# ---------------------------------------------------------------------------
def _parse_path_match(idx: int, j: int, raw: object) -> PathMatch:
label = f"route[{idx}] matches paths[{j}]"
if not isinstance(raw, dict):
raise ValueError(f"{label}: must be an object")
raw_dict: dict[str, object] = typing.cast(dict[str, object], raw)
ptype = raw_dict.get("type", "prefix")
if not isinstance(ptype, str) or ptype not in PATH_MATCH_TYPES:
raise ValueError(
f"{label}: 'type' must be one of {', '.join(PATH_MATCH_TYPES)} "
f"(got {ptype!r})"
)
value = raw_dict.get("value")
if not isinstance(value, str) or not value:
raise ValueError(f"{label}: 'value' must be a non-empty string")
if ptype in ("exact", "prefix") and not value.startswith("/"):
raise ValueError(
f"{label}: value {value!r} must start with '/' for "
f"type {ptype!r}"
)
compiled: re.Pattern[str] | None = None
if ptype == "regex":
try:
compiled = re.compile(value)
except re.error as e:
raise ValueError(
f"{label}: regex {value!r} failed to compile: {e}"
) from e
for k in raw_dict:
if k not in ("type", "value"):
raise ValueError(f"{label}: unknown key {k!r}")
return PathMatch(type=ptype, value=value, compiled=compiled)
def _parse_header_match(idx: int, j: int, raw: object) -> HeaderMatch:
label = f"route[{idx}] matches headers[{j}]"
if not isinstance(raw, dict):
raise ValueError(f"{label}: must be an object")
raw_dict: dict[str, object] = typing.cast(dict[str, object], raw)
name = raw_dict.get("name")
if not isinstance(name, str) or not name:
raise ValueError(f"{label}: 'name' must be a non-empty string")
value = raw_dict.get("value")
if not isinstance(value, str):
raise ValueError(f"{label}: 'value' must be a string")
htype = raw_dict.get("type", "exact")
if not isinstance(htype, str) or htype not in HEADER_MATCH_TYPES:
raise ValueError(
f"{label}: 'type' must be one of {', '.join(HEADER_MATCH_TYPES)} "
f"(got {htype!r})"
)
compiled: re.Pattern[str] | None = None
if htype == "regex":
try:
compiled = re.compile(value)
except re.error as e:
raise ValueError(
f"{label}: regex {value!r} failed to compile: {e}"
) from e
for k in raw_dict:
if k not in ("name", "value", "type"):
raise ValueError(f"{label}: unknown key {k!r}")
return HeaderMatch(name=name, value=value, type=htype, compiled=compiled)
def _parse_match_entry(idx: int, k: int, raw: object) -> MatchEntry:
label = f"route[{idx}] matches[{k}]"
if not isinstance(raw, dict):
raise ValueError(f"{label}: must be an object")
raw_dict: dict[str, object] = typing.cast(dict[str, object], raw)
paths: tuple[PathMatch, ...] = ()
paths_raw = raw_dict.get("paths")
if paths_raw is not None:
if not isinstance(paths_raw, list):
raise ValueError(f"{label}: 'paths' must be a list")
paths_list = typing.cast(list[object], paths_raw)
paths = tuple(_parse_path_match(idx, j, p) for j, p in enumerate(paths_list))
methods: tuple[str, ...] = ()
methods_raw = raw_dict.get("methods")
if methods_raw is not None:
if not isinstance(methods_raw, list):
raise ValueError(f"{label}: 'methods' must be a list")
methods_list = typing.cast(list[object], methods_raw)
normalised: list[str] = []
for j, m in enumerate(methods_list):
if not isinstance(m, str):
raise ValueError(f"{label}: methods[{j}] must be a string")
upper = m.upper()
if upper not in VALID_METHODS:
raise ValueError(
f"{label}: methods[{j}] {m!r} is not a valid HTTP method"
)
normalised.append(upper)
methods = tuple(normalised)
headers: tuple[HeaderMatch, ...] = ()
headers_raw = raw_dict.get("headers")
if headers_raw is not None:
if not isinstance(headers_raw, list):
raise ValueError(f"{label}: 'headers' must be a list")
headers_list = typing.cast(list[object], headers_raw)
headers = tuple(
_parse_header_match(idx, j, h) for j, h in enumerate(headers_list)
)
for key in raw_dict:
if key not in ("paths", "methods", "headers"):
raise ValueError(f"{label}: unknown key {key!r}")
return MatchEntry(paths=paths, methods=methods, headers=headers)
def _parse_detectors(
idx: int,
host: str,
raw_dict: dict[str, object],
) -> tuple[tuple[str, ...] | None, tuple[str, ...] | None]:
"""Parse the optional `dlp` block on a route, returning
(outbound_detectors, inbound_detectors)."""
dlp_raw = raw_dict.get("dlp")
if dlp_raw is None:
return None, None
label = f"route[{idx}] ({host})"
if not isinstance(dlp_raw, dict):
raise ValueError(f"{label}: 'dlp' must be an object")
dlp = typing.cast(dict[str, object], dlp_raw)
def _parse_detector_field(
field: str,
valid_names: frozenset[str],
) -> tuple[str, ...] | None:
val = dlp.get(field)
if val is None:
return None
if val is False:
return ()
if not isinstance(val, list):
raise ValueError(
f"{label}: dlp.{field} must be false, a list, or omitted"
)
items = typing.cast(list[object], val)
names: list[str] = []
for j, item in enumerate(items):
if not isinstance(item, str):
raise ValueError(
f"{label}: dlp.{field}[{j}] must be a string"
)
if item not in valid_names:
raise ValueError(
f"{label}: dlp.{field}[{j}] {item!r} is not a valid "
f"detector name; valid names: {', '.join(sorted(valid_names))}"
)
names.append(item)
return tuple(names)
outbound = _parse_detector_field("outbound_detectors", OUTBOUND_DETECTOR_NAMES)
inbound = _parse_detector_field("inbound_detectors", INBOUND_DETECTOR_NAMES)
for k in dlp:
if k not in ("outbound_detectors", "inbound_detectors"):
raise ValueError(
f"{label}: dlp has unknown key {k!r}; accepted keys "
f"are 'outbound_detectors', 'inbound_detectors'"
)
return outbound, inbound
def parse_routes(payload: object) -> tuple[Route, ...]:
"""Parse the routes-file payload (already JSON-decoded) into a
tuple of `Route`s. Raises `ValueError` on any malformed entry
the caller decides whether to keep the old table or refuse to
start.
Schema:
{
"routes": [
{
"host": "api.github.com",
"path_allowlist": ["/repos/x/", "/users/x"], # optional
"auth_scheme": "Bearer", # optional
"token_env": "EGRESS_TOKEN_0" # optional
},
...
]
}
"""
if not isinstance(payload, dict):
raise ValueError("routes payload: top-level must be an object")
payload_dict: dict[str, object] = typing.cast(dict[str, object], payload)
@@ -292,24 +98,32 @@ def _parse_one(idx: int, raw: object) -> Route:
if not isinstance(host, str) or not host:
raise ValueError(f"{label}: 'host' must be a non-empty string")
# matches
matches: tuple[MatchEntry, ...] = ()
matches_raw = raw_dict.get("matches")
if matches_raw is not None:
if not isinstance(matches_raw, list):
raise ValueError(f"{label} ({host}): 'matches' must be a list")
matches_list = typing.cast(list[object], matches_raw)
matches = tuple(
_parse_match_entry(idx, k, m) for k, m in enumerate(matches_list)
)
path_allow_raw: object = raw_dict.get("path_allowlist", [])
if not isinstance(path_allow_raw, list):
raise ValueError(f"{label} ({host}): 'path_allowlist' must be a list")
path_allow_list: list[object] = typing.cast(list[object], path_allow_raw)
prefixes: list[str] = []
for j, p in enumerate(path_allow_list):
if not isinstance(p, str):
raise ValueError(
f"{label} ({host}): path_allowlist[{j}] must be a string"
)
if not p.startswith("/"):
raise ValueError(
f"{label} ({host}): path_allowlist[{j}] {p!r} must be an "
f"absolute path prefix starting with '/'"
)
prefixes.append(p)
# auth (unchanged wire format)
auth_scheme: object = raw_dict.get("auth_scheme", "")
token_env: object = raw_dict.get("token_env", "")
if not isinstance(auth_scheme, str):
raise ValueError(f"{label} ({host}): 'auth_scheme' must be a string")
if not isinstance(token_env, str):
raise ValueError(f"{label} ({host}): 'token_env' must be a string")
# Both-or-neither: 'auth' on the manifest side renders to this
# pair atomically. A partial pair here means the renderer or a
# hand-edited file is broken.
if bool(auth_scheme) != bool(token_env):
raise ValueError(
f"{label} ({host}): 'auth_scheme' and 'token_env' must be both "
@@ -317,50 +131,19 @@ def _parse_one(idx: int, raw: object) -> Route:
f"token_env={token_env!r})"
)
# git-over-HTTPS policy
git_fetch = False
git_raw = raw_dict.get("git")
if git_raw is not None:
if not isinstance(git_raw, dict):
raise ValueError(f"{label} ({host}): 'git' must be an object")
git_dict: dict[str, object] = typing.cast(dict[str, object], git_raw)
fetch_raw = git_dict.get("fetch", False)
if fetch_raw is True or fetch_raw is False:
git_fetch = fetch_raw
else:
raise ValueError(f"{label} ({host}): 'git.fetch' must be a boolean")
for k in git_dict:
if k != "fetch":
raise ValueError(
f"{label} ({host}): git has unknown key {k!r}; "
"accepted key is 'fetch'"
)
# dlp detectors
outbound_detectors, inbound_detectors = _parse_detectors(
idx, host, raw_dict,
)
for k in raw_dict:
if k not in ("host", "matches", "auth_scheme", "token_env", "dlp", "git"):
raise ValueError(
f"{label} ({host}): unknown key {k!r}; accepted keys "
f"are 'host', 'matches', 'auth_scheme', 'token_env', 'dlp', 'git'"
)
return Route(
host=host,
matches=matches,
path_allowlist=tuple(prefixes),
auth_scheme=auth_scheme,
token_env=token_env,
git_fetch=git_fetch,
outbound_detectors=outbound_detectors,
inbound_detectors=inbound_detectors,
)
def load_routes(text: str) -> tuple[Route, ...]:
"""Parse YAML text → routes."""
"""Parse YAML text → routes. Raises `ValueError` for both
decode and shape errors so callers handle them uniformly.
`YamlSubsetError` from the parser is a `ValueError` subclass so
it already satisfies the same surface; we let it propagate."""
try:
payload = parse_yaml_subset(text)
except YamlSubsetError as e:
@@ -368,102 +151,29 @@ def load_routes(text: str) -> tuple[Route, ...]:
return parse_routes(payload)
def parse_config(payload: object) -> "Config":
"""Parse a full egress config payload (top-level log level + routes)."""
if not isinstance(payload, dict):
raise ValueError("routes payload: top-level must be an object")
payload_dict: dict[str, object] = typing.cast(dict[str, object], payload)
log_raw: object = payload_dict.get("log", LOG_OFF)
if log_raw is True or log_raw is False or not isinstance(log_raw, int) \
or log_raw not in (LOG_OFF, LOG_BLOCKS, LOG_FULL):
raise ValueError(
f"routes payload: 'log' must be {LOG_OFF}, {LOG_BLOCKS}, or {LOG_FULL}"
)
routes = parse_routes(payload)
return Config(routes=routes, log=log_raw)
def load_config(text: str) -> "Config":
"""Parse YAML text → Config (routes + log flag)."""
try:
payload = parse_yaml_subset(text)
except YamlSubsetError as e:
raise ValueError(f"routes payload: invalid YAML: {e}") from e
return parse_config(payload)
# ---------------------------------------------------------------------------
# Match evaluation
# ---------------------------------------------------------------------------
def _path_matches(pm: PathMatch, request_path: str) -> bool:
if pm.type == "exact":
return request_path == pm.value
if pm.type == "prefix":
if request_path == pm.value:
return True
if not pm.value.endswith("/"):
return request_path.startswith(pm.value + "/")
return request_path.startswith(pm.value)
if pm.type == "regex" and pm.compiled is not None:
return pm.compiled.search(request_path) is not None
return False
def _entry_matches(
entry: MatchEntry,
request_path: str,
request_method: str,
request_headers: typing.Mapping[str, str],
) -> bool:
"""All predicates within a MatchEntry are ANDed."""
if entry.paths:
if not any(_path_matches(pm, request_path) for pm in entry.paths):
return False
if entry.methods:
if request_method.upper() not in entry.methods:
return False
if entry.headers:
for hm in entry.headers:
header_val = request_headers.get(hm.name.lower())
if header_val is None:
return False
if hm.type == "exact":
if header_val != hm.value:
return False
elif hm.type == "regex" and hm.compiled is not None:
if not hm.compiled.search(header_val):
return False
return True
def evaluate_matches(
route: Route,
request_path: str,
request_method: str = "GET",
request_headers: typing.Mapping[str, str] | None = None,
) -> bool:
"""Return True if the request matches this route's match entries.
Empty matches tuple means all requests match (bare-pass route)."""
if not route.matches:
return True
hdrs: typing.Mapping[str, str] = request_headers or {}
return any(
_entry_matches(entry, request_path, request_method, hdrs)
for entry in route.matches
)
# ---------------------------------------------------------------------------
# Git push detection (unchanged)
# ---------------------------------------------------------------------------
def is_git_push_request(path: str, query: str) -> bool:
"""Return True if the request is a git smart-HTTP push.
git push over HTTPS hits two endpoints:
GET <repo>/info/refs?service=git-receive-pack (capabilities)
POST <repo>/git-receive-pack (the push)
Fetches use `service=git-upload-pack` / `/git-upload-pack` and
are unaffected. Egress-proxy refuses HTTPS push because git-gate's
pre-receive gitleaks scan is the gate for outbound git data;
routing push through egress would bypass that. Use the
bottle.git SSH path if you need to push.
Universal across routes the block fires even when no
egress route matches the host. A bare-pass route (host with
no auth, no path_allowlist) would otherwise let push through to
pipelock + upstream untouched.
"""
if path.endswith("/git-receive-pack"):
return True
if path.endswith("/info/refs"):
# Query string is parsed leniently — `service=git-receive-pack`
# may appear with other params in any order.
for pair in query.split("&"):
k, _, v = pair.partition("=")
if k == "service" and v == "git-receive-pack":
@@ -471,25 +181,18 @@ def is_git_push_request(path: str, query: str) -> bool:
return False
def is_git_fetch_request(path: str, query: str) -> bool:
if path.endswith("/git-upload-pack"):
return True
if path.endswith("/info/refs"):
for pair in query.split("&"):
k, _, v = pair.partition("=")
if k == "service" and v == "git-upload-pack":
return True
return False
# ---------------------------------------------------------------------------
# Route lookup + decision
# ---------------------------------------------------------------------------
def match_route(
routes: typing.Sequence[Route],
request_host: str,
) -> Route | None:
"""Return the first route whose `host` matches `request_host`
exactly (case-insensitive). DNS names are case-insensitive.
Wildcard hosts (`*.foo.com`) are NOT supported they caused
too many edge cases (apex match? cert validation? pipelock
mirror mismatch?) for too little payoff. Operators that need
multiple subdomains declare them individually (or one common
parent host as a bare-pass route)."""
target = request_host.lower()
for r in routes:
if r.host.lower() == target:
@@ -502,10 +205,24 @@ def decide(
request_host: str,
request_path: str,
environ: typing.Mapping[str, str],
*,
request_method: str = "GET",
request_headers: typing.Mapping[str, str] | None = None,
) -> Decision:
"""Pure decision: given a route table + request host + path + env,
return what the addon should do with the request.
- No matching route BLOCK. The route table is the bottle's
egress allowlist; defense-in-depth complements pipelock's
hostname gate on the downstream leg. A bottle that wants a
host reachable from the agent must declare a route for it
(bare-pass route no `auth`, no `path_allowlist` is fine
for hosts that just need passthrough).
- Matching route with `path_allowlist` set, request path doesn't
start with any of the allowed prefixes block with a clear
reason.
- Matching route with an auth pair forward + inject
Authorization. Token comes from `environ[route.token_env]`;
missing/empty values block (route declared auth but the secret
isn't here — operator misconfig).
"""
route = match_route(routes, request_host)
if route is None:
return Decision(
@@ -517,15 +234,15 @@ def decide(
),
)
if not evaluate_matches(route, request_path, request_method, request_headers):
return Decision(
action="block",
reason=(
f"egress: request {request_method} {request_path!r} "
f"does not match any entry in matches for "
f"{route.host!r}"
),
)
if route.path_allowlist:
if not any(request_path.startswith(p) for p in route.path_allowlist):
return Decision(
action="block",
reason=(
f"egress: path {request_path!r} not in "
f"path_allowlist for {route.host!r}"
),
)
if route.auth_scheme and route.token_env:
token = environ.get(route.token_env, "")
@@ -545,181 +262,12 @@ def decide(
return Decision(action="forward")
def decide_git_fetch(
routes: typing.Sequence[Route],
request_host: str,
) -> Decision:
route = match_route(routes, request_host)
if route is not None and route.git_fetch:
return Decision(action="forward")
return Decision(
action="block",
reason=(
"egress: git fetch/clone over HTTPS is not allowed by default; "
"use git-gate for declared repos or set "
"egress.routes[].git.fetch=true for explicit read-only "
"HTTPS Git access."
),
)
# ---------------------------------------------------------------------------
# DLP scan dispatch (PRD 0053)
# ---------------------------------------------------------------------------
def build_outbound_scan_text(
host: str,
path: str,
query: str,
headers: typing.Mapping[str, str],
body: str,
) -> str:
"""Assemble all outbound request surfaces into one string for DLP scanning.
Covers hostname (DNS tunnelling), path, query params, all headers, body.
"""
parts: list[str] = [host, path]
if query:
parts.append(query)
for name, value in headers.items():
parts.append(f"{name}: {value}")
if body:
parts.append(body)
return "\n".join(parts)
def outbound_scan_headers(
route: Route,
headers: typing.Mapping[str, str],
) -> dict[str, str]:
"""Return request headers that should be included in outbound DLP.
Routes that inject sidecar-owned auth always strip the agent's
Authorization header before forwarding. Scanning that header first
creates false positives for provider clients that insist on sending
their own bearer-shaped placeholder, while still not changing what
reaches the upstream.
"""
out: dict[str, str] = {}
skip_auth = bool(route.auth_scheme and route.token_env)
for name, value in headers.items():
if skip_auth and name.lower() == "authorization":
continue
out[name] = value
return out
def build_inbound_scan_text(
headers: typing.Mapping[str, str],
body: str,
) -> str:
"""Assemble inbound response surfaces into one string for DLP scanning.
Covers all response headers plus body.
"""
parts: list[str] = []
for name, value in headers.items():
parts.append(f"{name}: {value}")
if body:
parts.append(body)
return "\n".join(parts)
def _detector_enabled(
configured: tuple[str, ...] | None,
name: str,
) -> bool:
"""Check if a named detector is enabled for a route direction.
None means all enabled; empty tuple means all disabled."""
if configured is None:
return True
return name in configured
def scan_outbound(
route: Route,
body: str | bytes,
environ: typing.Mapping[str, str],
) -> ScanResult | None:
# Lazy import to avoid circular deps and keep dlp_detectors optional
# at import time (the sidecar copies it flat alongside this file).
try:
from dlp_detectors import ( # type: ignore[import-not-found]
scan_crlf_injection,
scan_known_secrets,
scan_token_patterns,
)
except ImportError: # pragma: no cover - host-side path
from .dlp_detectors import ( # type: ignore[import-not-found]
scan_crlf_injection,
scan_known_secrets,
scan_token_patterns,
)
text = body if isinstance(body, str) else body.decode("utf-8", errors="replace")
# CRLF injection is never legitimate — runs unconditionally, not gated
# by outbound_detectors config.
result = scan_crlf_injection(text)
if result is not None:
return result
if _detector_enabled(route.outbound_detectors, "token_patterns"):
result = scan_token_patterns(text, location="body")
if result is not None:
return result
if _detector_enabled(route.outbound_detectors, "known_secrets"):
result = scan_known_secrets(text, location="body", env=environ)
if result is not None:
return result
return None
def scan_inbound(
route: Route,
body: str | bytes,
) -> ScanResult | None:
try:
from dlp_detectors import scan_naive_injection # type: ignore[import-not-found]
except ImportError: # pragma: no cover - host-side path
from .dlp_detectors import scan_naive_injection # type: ignore[import-not-found]
text = body if isinstance(body, str) else body.decode("utf-8", errors="replace")
if _detector_enabled(route.inbound_detectors, "naive_injection_detection"):
result = scan_naive_injection(text)
if result is not None:
return result
return None
__all__ = [
"LOG_BLOCKS",
"LOG_FULL",
"LOG_OFF",
"Config",
"Decision",
"HeaderMatch",
"MatchEntry",
"PathMatch",
"Route",
"ScanResult",
"build_inbound_scan_text",
"build_outbound_scan_text",
"decide",
"decide_git_fetch",
"evaluate_matches",
"is_git_push_request",
"is_git_fetch_request",
"load_config",
"load_routes",
"match_route",
"outbound_scan_headers",
"parse_config",
"parse_routes",
"scan_inbound",
"scan_outbound",
]
+18 -11
View File
@@ -6,15 +6,15 @@
# call it as a normal child. Behavior is unchanged:
#
# * Upstream proxy: when EGRESS_UPSTREAM_PROXY is set, switch
# to `--mode upstream:URL` to chain through an upstream proxy.
# mitmproxy does NOT honor HTTPS_PROXY on its outbound side,
# so the upstream wiring has to be the mitmproxy mode flag,
# not env.
# to `--mode upstream:URL` to forward all post-MITM traffic
# through pipelock. mitmproxy does NOT honor HTTPS_PROXY on
# its outbound side, so the upstream wiring has to be the
# mitmproxy mode flag, not env.
# * Upstream trust: when EGRESS_UPSTREAM_CA is set, build a
# combined trust bundle (system roots + upstream CA) and point
# combined trust bundle (system roots + pipelock CA) and point
# mitmproxy at it. The option REPLACES mitmproxy's default
# trust store, so passing the upstream CA alone would break
# non-chained hosts.
# trust store, so passing pipelock's CA alone would break
# route-configured pipelock passthrough hosts.
# * `-s /app/egress_addon.py` loads the addon that reads
# /etc/egress/routes.yaml.
@@ -38,7 +38,11 @@ fi
# Bind address. Docker backend wants `0.0.0.0` (agent dials egress
# directly via the docker network alias). Smolmachines backend
# uses EGRESS_LISTEN_HOST when a non-default binding is needed.
# wants `127.0.0.1` because the agent dials pipelock — not egress
# — and egress is pipelock's localhost-only upstream inside the
# bundle. TSI's IP-only allowlist would otherwise let the agent
# reach `<bundle-ip>:9099` and bypass pipelock's DLP; binding
# 127.0.0.1 inside the bundle closes that gap (PRD 0023 chunk 3).
LISTEN_HOST_FLAG=""
if [ -n "$EGRESS_LISTEN_HOST" ]; then
LISTEN_HOST_FLAG="--listen-host $EGRESS_LISTEN_HOST"
@@ -52,10 +56,13 @@ if [ -n "$EGRESS_UPSTREAM_CA" ] && [ -f "$EGRESS_UPSTREAM_CA" ]; then
fi
# Scope the proxy env to this process tree only. In the bundle
# image (PRD 0024) multiple daemons share one container — setting
# image (PRD 0024) the four daemons share one container — setting
# HTTPS_PROXY at the container level would route git-gate's git
# pushes through an upstream proxy unintentionally. Setting them
# here means only mitmdump's subprocess inherits them.
# pushes through pipelock, which is wrong (pipelock doesn't proxy
# SSH and would block public git repos). Setting them here means
# only mitmdump's subprocess inherits them. In the legacy
# four-sidecar setup these env vars are also set in compose; here
# they're additionally defensive.
if [ -n "$EGRESS_UPSTREAM_PROXY" ]; then
export HTTPS_PROXY="$EGRESS_UPSTREAM_PROXY"
export HTTP_PROXY="$EGRESS_UPSTREAM_PROXY"
+26 -50
View File
@@ -15,9 +15,9 @@ a bare repo on the gate; `git daemon` serves the bare repos over
The agent never sees the upstream credential under either path.
Why a separate sidecar (not folded into egress or ssh-gate): the
Why a third sidecar (not folded into pipelock or ssh-gate): the
gate is the only one of the three that holds upstream push
credentials. Mixing it with egress would put push creds in the
credentials. Mixing it with pipelock would put push creds in the
same blast radius as internet-facing TLS interception; mixing it
with ssh-gate would force ssh-gate above L4 and into git-protocol
land. See `docs/prds/0008-git-gate.md`.
@@ -37,7 +37,7 @@ from dataclasses import dataclass
from pathlib import Path
from .log import info
from .manifest import ManifestBottle, ManifestGitEntry
from .manifest import Bottle, GitEntry
# Short network alias for git-gate inside the sidecar bundle. The
@@ -96,9 +96,9 @@ class GitGatePlan:
egress_network: str = ""
def git_gate_upstreams_for_bottle(bottle: ManifestBottle) -> tuple[GitGateUpstream, ...]:
def git_gate_upstreams_for_bottle(bottle: Bottle) -> tuple[GitGateUpstream, ...]:
"""Lift each `bottle.git` entry into a GitGateUpstream. Unique-Name
validation already ran in `manifest.ManifestBottle.from_dict`."""
validation already ran in `manifest.Bottle.from_dict`."""
return tuple(
GitGateUpstream(
name=e.Name,
@@ -113,7 +113,7 @@ def git_gate_upstreams_for_bottle(bottle: ManifestBottle) -> tuple[GitGateUpstre
def git_gate_render_gitconfig(
entries: tuple[ManifestGitEntry, ...], gate_host: str, *, scheme: str = "git",
entries: tuple[GitEntry, ...], gate_host: str, *, scheme: str = "git",
) -> str:
"""Render the agent's ~/.gitconfig content for git-gate
`insteadOf` rewrites. Pure host-side, no docker / smolvm;
@@ -204,7 +204,6 @@ def git_gate_render_entrypoint(upstreams: tuple[GitGateUpstream, ...]) -> str:
" git -C \"$repo\" config git-gate.identityFile \"$keyfile\"",
" git -C \"$repo\" config git-gate.knownHosts \"$hostsfile\"",
" git -C \"$repo\" config receive.denyCurrentBranch ignore",
" git -C \"$repo\" config receive.advertisePushOptions true",
" git -C \"$repo\" config http.receivepack true",
" install -m 755 /etc/git-gate/pre-receive \"$repo/hooks/pre-receive\"",
"}",
@@ -281,32 +280,15 @@ if [ ! -f "$hostsfile" ]; then
fi
ssh_cmd="ssh -i $keyfile -o UserKnownHostsFile=$hostsfile -o StrictHostKeyChecking=yes -o IdentitiesOnly=yes -o BatchMode=yes -o ConnectTimeout=10"
push_option_count=${GIT_PUSH_OPTION_COUNT:-0}
case "$push_option_count" in
''|*[!0-9]*)
echo "git-gate: invalid GIT_PUSH_OPTION_COUNT=$push_option_count" >&2
exit 1
;;
esac
set --
i=0
while [ "$i" -lt "$push_option_count" ]; do
opt=$(printenv "GIT_PUSH_OPTION_$i" || :)
set -- "$@" --push-option="$opt"
i=$((i + 1))
done
while IFS=' ' read -r old new ref; do
[ -z "$ref" ] && continue
if [ "$new" = "$zero" ]; then
refspec=":$ref"
elif [ "$old" != "$zero" ] && ! git merge-base --is-ancestor "$old" "$new" 2>/dev/null; then
refspec="+$new:$ref"
else
refspec="$new:$ref"
fi
echo "git-gate: forwarding $ref to origin" >&2
if ! GIT_SSH_COMMAND="$ssh_cmd" git push "$@" origin "$refspec" 1>&2; then
if ! GIT_SSH_COMMAND="$ssh_cmd" git push origin "$refspec" 1>&2; then
echo "git-gate: upstream push failed for $ref" >&2
exit 1
fi
@@ -379,7 +361,7 @@ exit 0
def _provision_dynamic_key(
entry: ManifestGitEntry,
entry: GitEntry,
slug: str,
stage_dir: Path,
) -> str:
@@ -389,12 +371,13 @@ def _provision_dynamic_key(
Returns the host-side path to the private key file so the caller
can inject it into the GitGateUpstream as `identity_file`."""
from .deploy_key_provisioner import get_provisioner
pk = entry.Key
token = os.environ.get(pk.forge_token_env)
pk = entry.ProvisionedKey
assert pk is not None
token = os.environ.get(pk.token_env)
if token is None:
raise RuntimeError(
f"git-gate.repos[{entry.Name!r}] key.forge_token_env"
f" = {pk.forge_token_env!r}: env var is not set"
f"git-gate.repos[{entry.Name!r}] provisioned_key.token_env"
f" = {pk.token_env!r}: env var is not set"
)
api_url = pk.api_url or f"https://{entry.UpstreamHost}"
provisioner = get_provisioner(pk.provider, token, api_url)
@@ -419,7 +402,7 @@ def _provision_dynamic_key(
return str(key_file)
def revoke_git_gate_provisioned_keys(bottle: ManifestBottle, stage_dir: Path) -> None:
def revoke_git_gate_provisioned_keys(bottle: Bottle, stage_dir: Path) -> None:
"""Revoke all deploy keys provisioned for `bottle` during prepare.
Called at teardown after containers stop. Raises if any revocation
@@ -427,18 +410,18 @@ def revoke_git_gate_provisioned_keys(bottle: ManifestBottle, stage_dir: Path) ->
address manually."""
from .deploy_key_provisioner import get_provisioner
for entry in bottle.git:
if entry.Key.provider != "gitea":
if entry.ProvisionedKey is None:
continue
pk = entry.Key
pk = entry.ProvisionedKey
id_file = stage_dir / f"{entry.Name}-deploy-key-id"
if not id_file.exists():
continue
key_id = id_file.read_text().strip()
token = os.environ.get(pk.forge_token_env)
token = os.environ.get(pk.token_env)
if token is None:
raise RuntimeError(
f"git-gate.repos[{entry.Name!r}] key.forge_token_env"
f" = {pk.forge_token_env!r}: env var is not set;"
f"git-gate.repos[{entry.Name!r}] provisioned_key.token_env"
f" = {pk.token_env!r}: env var is not set;"
f" cannot revoke deploy key {key_id}"
)
api_url = pk.api_url or f"https://{entry.UpstreamHost}"
@@ -451,26 +434,18 @@ def revoke_git_gate_provisioned_keys(bottle: ManifestBottle, stage_dir: Path) ->
info(f"revoked deploy key {key_id} for git-gate.repos[{entry.Name!r}]")
def _resolve_identity_file(entry: ManifestGitEntry, slug: str, stage_dir: Path) -> str:
"""Return the host-side SSH identity file path for this entry.
For gitea entries, provisions a fresh deploy key first."""
if entry.Key.provider == "gitea":
return _provision_dynamic_key(entry, slug, stage_dir)
return entry.IdentityFile
class GitGate(ABC):
"""The per-agent git-gate. Encapsulates the host-side prepare
(upstream lift + entrypoint/hook render); the sidecar's
start/stop lifecycle is backend-specific and lives on concrete
subclasses."""
def prepare(self, bottle: ManifestBottle, slug: str, stage_dir: Path) -> GitGatePlan:
def prepare(self, bottle: Bottle, slug: str, stage_dir: Path) -> GitGatePlan:
"""Compute the upstream table from `bottle.git` and write the
entrypoint, pre-receive hook, and access-hook scripts (mode
600) under `stage_dir`. Pure host-side, no docker subprocess.
For `gitea` key entries, also generates and registers
For `provisioned_key` entries, also generates and registers
a fresh deploy key via the forge API and writes the private key
+ key ID to `stage_dir`.
@@ -479,10 +454,11 @@ class GitGate(ABC):
before passing the plan to `.start`."""
upstreams_list = list(git_gate_upstreams_for_bottle(bottle))
for i, entry in enumerate(bottle.git):
upstreams_list[i] = dataclasses.replace(
upstreams_list[i],
identity_file=_resolve_identity_file(entry, slug, stage_dir),
)
if entry.ProvisionedKey is not None:
key_file = _provision_dynamic_key(entry, slug, stage_dir)
upstreams_list[i] = dataclasses.replace(
upstreams_list[i], identity_file=key_file
)
upstreams = tuple(upstreams_list)
entrypoint = stage_dir / "git_gate_entrypoint.sh"
entrypoint.write_text(git_gate_render_entrypoint(upstreams))
+2 -2
View File
@@ -19,8 +19,8 @@ from urllib.parse import urlsplit
DEFAULT_PORT = 9420
# Bound memory use while still allowing ordinary git push packfiles.
MAX_BODY_BYTES = 100 * 1024 * 1024
# Body-size cap matching supervise_server.py's 1 MiB limit.
MAX_BODY_BYTES = 1 * 1024 * 1024
class GitHttpHandler(BaseHTTPRequestHandler):
+40 -37
View File
@@ -18,7 +18,8 @@ Bottle schema (frontmatter):
user: { name: <str>, email: <str> } # optional
repos: { <name>: <git-gate-entry>, ... } # optional
egress: { routes: [ <egress-route>, ... ] }
# route keys: host, matches, auth, role, dlp
# route keys: host, path_allowlist, auth, role, pipelock
# pipelock: { tls_passthrough: <bool>, ssrf_ip_allowlist: [<cidr>, ...] }
supervise: <bool> # optional
Agent schema (frontmatter):
@@ -50,27 +51,28 @@ from pathlib import Path
from typing import Mapping
from .manifest_util import ManifestError, as_json_object
from .manifest_agent import ManifestAgent, ManifestAgentProvider
from .manifest_agent import Agent, AgentProvider
from .manifest_egress import (
EGRESS_AUTH_SCHEMES,
ManifestEgressConfig,
ManifestEgressRoute,
EgressConfig,
EgressRoute,
PipelockRoutePolicy,
)
from .manifest_git import ManifestGitEntry, ManifestGitUser, ManifestKeyConfig, parse_git_gate_config
from .manifest_git import GitEntry, GitUser, parse_git_gate_config
from .manifest_schema import BOTTLE_KEYS
# Re-export everything that callers currently import from this module.
__all__ = [
"ManifestError",
"ManifestGitEntry",
"ManifestGitUser",
"ManifestKeyConfig",
"ManifestAgentProvider",
"GitEntry",
"GitUser",
"AgentProvider",
"EGRESS_AUTH_SCHEMES",
"ManifestEgressRoute",
"ManifestEgressConfig",
"ManifestAgent",
"ManifestBottle",
"PipelockRoutePolicy",
"EgressRoute",
"EgressConfig",
"Agent",
"Bottle",
"Manifest",
]
@@ -87,26 +89,27 @@ def _section_dict(value: object, label: str) -> dict[str, object]:
@dataclass(frozen=True)
class ManifestBottle:
class Bottle:
env: Mapping[str, str] = field(default_factory=_empty_str_dict)
agent_provider: ManifestAgentProvider = field(default_factory=ManifestAgentProvider)
git: tuple[ManifestGitEntry, ...] = ()
agent_provider: AgentProvider = field(default_factory=AgentProvider)
git: tuple[GitEntry, ...] = ()
# Per-bottle git identity (issue #86). Empty default — bottles
# that don't set `git-gate.user:` in the manifest skip the
# `git config --global` step entirely. A bottle can declare a user
# identity without any git-gate.repos upstreams, and vice versa.
git_user: ManifestGitUser = field(default_factory=ManifestGitUser)
egress: ManifestEgressConfig = field(default_factory=ManifestEgressConfig)
git_user: GitUser = field(default_factory=GitUser)
egress: EgressConfig = field(default_factory=EgressConfig)
# Opt-in per-bottle stuck-recovery sidecar (PRD 0013). When true,
# the launch step brings up a supervise sidecar that exposes MCP
# tools to the agent (egress-block, capability-block) plus mounts
# the current-config dir read-only into the agent at
# /etc/bot-bottle/current-config. False (the default) skips the
# sidecar and mount.
# the launch step brings up a supervise sidecar that exposes three
# MCP tools to the agent (cred-proxy-block, pipelock-block,
# capability-block; the cred-proxy-block tool is renamed and
# retargeted at egress in PRD 0017 chunk 3) plus mounts the
# current-config dir read-only into the agent at /etc/bot-bottle/
# current-config. False (the default) skips the sidecar and mount.
supervise: bool = False
@classmethod
def from_dict(cls, name: str, raw: object) -> "ManifestBottle":
def from_dict(cls, name: str, raw: object) -> "Bottle":
d = as_json_object(raw, f"bottle '{name}'")
if "runtime" in d:
@@ -158,22 +161,22 @@ class ManifestBottle:
)
env[var] = value
git: tuple[ManifestGitEntry, ...] = ()
git_user = ManifestGitUser()
git: tuple[GitEntry, ...] = ()
git_user = GitUser()
git_raw = d.get("git-gate")
if git_raw is not None:
git, git_user = parse_git_gate_config(name, git_raw)
agent_provider = (
ManifestAgentProvider.from_dict(name, d["agent_provider"])
AgentProvider.from_dict(name, d["agent_provider"])
if "agent_provider" in d
else ManifestAgentProvider()
else AgentProvider()
)
egress = (
ManifestEgressConfig.from_dict(name, d["egress"])
EgressConfig.from_dict(name, d["egress"])
if "egress" in d
else ManifestEgressConfig()
else EgressConfig()
)
supervise_raw = d.get("supervise", False)
@@ -191,8 +194,8 @@ class ManifestBottle:
@dataclass(frozen=True)
class Manifest:
bottles: Mapping[str, ManifestBottle]
agents: Mapping[str, ManifestAgent]
bottles: Mapping[str, Bottle]
agents: Mapping[str, Agent]
@classmethod
def resolve(cls, cwd: str, *, missing_ok: bool = False) -> "Manifest":
@@ -306,8 +309,8 @@ class Manifest:
bottles = resolve_bottles(raw_bottles)
bottle_names = set(bottles.keys())
agents: dict[str, ManifestAgent] = {
n: ManifestAgent.from_dict(n, a, bottle_names) for n, a in raw_agents.items()
agents: dict[str, Agent] = {
n: Agent.from_dict(n, a, bottle_names) for n, a in raw_agents.items()
}
return cls(bottles=bottles, agents=agents)
@@ -339,7 +342,7 @@ class Manifest:
)
raise ManifestError(f"bottle '{name}' not defined in bot-bottle.json (no bottles defined).")
def _effective_git_user(self, agent_name: str) -> ManifestGitUser:
def _effective_git_user(self, agent_name: str) -> GitUser:
"""Merge the agent's git.user over the referenced bottle's,
per-field, agent-wins-on-non-empty (issue #94). Same overlay
the `extends:` resolver applies between bottles
@@ -349,12 +352,12 @@ class Manifest:
over = agent.git_user
if over.is_empty():
return base
return ManifestGitUser(
return GitUser(
name=over.name or base.name,
email=over.email or base.email,
)
def bottle_for(self, agent_name: str) -> ManifestBottle:
def bottle_for(self, agent_name: str) -> Bottle:
"""Resolve the Bottle the named agent references, with the
agent's git.user overlaid on top. The validator guarantees both
lookups succeed for a manifest built via from_json_obj.
+17 -118
View File
@@ -2,17 +2,17 @@
from __future__ import annotations
from dataclasses import dataclass, field
from dataclasses import dataclass
from typing import cast
from .agent_provider import PROVIDER_TEMPLATES
from .manifest_util import ManifestError, as_json_object
from .manifest_git import ManifestGitUser
from .manifest_git import GitUser
from .manifest_schema import AGENT_MODEL_KEYS
@dataclass(frozen=True)
class ManifestAgentProvider:
class AgentProvider:
"""Provider/template for the agent process inside a bottle.
`template` selects a built-in launch/runtime contract. `dockerfile`
@@ -33,23 +33,15 @@ class ManifestAgentProvider:
dockerfile: str = ""
auth_token: str = ""
forward_host_credentials: bool = False
settings: dict[str, object] = field(default_factory=dict)
@classmethod
def from_dict(cls, bottle_name: str, raw: object) -> "ManifestAgentProvider":
def from_dict(cls, bottle_name: str, raw: object) -> "AgentProvider":
d = as_json_object(raw, f"bottle '{bottle_name}' agent_provider")
for k in d:
if k not in {
"template",
"dockerfile",
"auth_token",
"forward_host_credentials",
"settings",
}:
if k not in {"template", "dockerfile", "auth_token", "forward_host_credentials"}:
raise ManifestError(
f"bottle '{bottle_name}' agent_provider has unknown key {k!r}; "
"allowed: template, dockerfile, auth_token, "
"forward_host_credentials, settings"
f"allowed: template, dockerfile, auth_token, forward_host_credentials"
)
template = d.get("template", "claude")
if not isinstance(template, str) or not template:
@@ -57,6 +49,11 @@ class ManifestAgentProvider:
f"bottle '{bottle_name}' agent_provider.template must be a "
f"non-empty string"
)
if template not in PROVIDER_TEMPLATES:
raise ManifestError(
f"bottle '{bottle_name}' agent_provider.template {template!r} "
f"is not one of {', '.join(sorted(PROVIDER_TEMPLATES))}"
)
dockerfile = d.get("dockerfile", "")
if not isinstance(dockerfile, str):
raise ManifestError(
@@ -69,12 +66,6 @@ class ManifestAgentProvider:
f"bottle '{bottle_name}' agent_provider.auth_token must be a "
f"string (was {type(auth_token).__name__})"
)
if auth_token and template not in PROVIDER_TEMPLATES:
raise ManifestError(
f"bottle '{bottle_name}' agent_provider.auth_token is only "
f"supported for built-in templates "
f"({', '.join(sorted(PROVIDER_TEMPLATES))})"
)
if auth_token and template != "claude":
raise ManifestError(
f"bottle '{bottle_name}' agent_provider.auth_token is only "
@@ -86,29 +77,21 @@ class ManifestAgentProvider:
f"bottle '{bottle_name}' agent_provider.forward_host_credentials "
f"must be a boolean (was {type(forward_host_credentials).__name__})"
)
if forward_host_credentials and template not in PROVIDER_TEMPLATES:
raise ManifestError(
f"bottle '{bottle_name}' agent_provider.forward_host_credentials "
f"is only supported for built-in templates "
f"({', '.join(sorted(PROVIDER_TEMPLATES))})"
)
if forward_host_credentials and template != "codex":
raise ManifestError(
f"bottle '{bottle_name}' agent_provider.forward_host_credentials "
"is currently only supported for template 'codex'"
)
settings = _parse_provider_settings(bottle_name, template, d.get("settings"))
return cls(
template=template,
dockerfile=dockerfile,
auth_token=auth_token,
forward_host_credentials=forward_host_credentials,
settings=settings,
)
@dataclass(frozen=True)
class ManifestAgent:
class Agent:
bottle: str
skills: tuple[str, ...] = ()
prompt: str = ""
@@ -116,10 +99,10 @@ class ManifestAgent:
# bottle's git-gate.user per-field at `Manifest.bottle_for`. Only
# `user` is allowed at the agent level; `repos` stays bottle-only
# because it carries credentials and host trust.
git_user: ManifestGitUser = ManifestGitUser()
git_user: GitUser = GitUser()
@classmethod
def from_dict(cls, name: str, raw: object, bottle_names: set[str]) -> "ManifestAgent":
def from_dict(cls, name: str, raw: object, bottle_names: set[str]) -> "Agent":
d = as_json_object(raw, f"agent '{name}'")
unknown = set(d.keys()) - AGENT_MODEL_KEYS
if unknown:
@@ -174,11 +157,11 @@ class ManifestAgent:
# git-gate: agents may declare only `git-gate.user` (name/email).
# `git-gate.repos` is bottle-only — it carries credentials and host trust.
git_user = ManifestGitUser()
git_user = GitUser()
git_raw = d.get("git-gate")
if git_raw is not None:
gd = as_json_object(git_raw, f"agent '{name}' git-gate")
for k in gd:
for k in gd.keys():
if k != "user":
raise ManifestError(
f"agent '{name}' git-gate.{k} is not allowed at the "
@@ -187,90 +170,6 @@ class ManifestAgent:
f"(it carries credentials and host trust)."
)
if "user" in gd:
git_user = ManifestGitUser.from_dict(name, gd["user"])
git_user = GitUser.from_dict(name, gd["user"])
return cls(bottle=bottle, skills=skills, prompt=prompt, git_user=git_user)
def _parse_provider_settings(
bottle_name: str,
template: str,
raw: object,
) -> dict[str, object]:
if raw is None:
return {}
if template != "pi":
raise ManifestError(
f"bottle '{bottle_name}' agent_provider.settings is only "
"supported for template 'pi'"
)
settings = as_json_object(raw, f"bottle '{bottle_name}' agent_provider.settings")
allowed = {
"provider",
"base_url",
"api",
"api_key",
"api_key_env",
"models",
"context_window",
"max_tokens_field",
"max_tokens",
"supports_developer_role",
"supports_reasoning_effort",
}
for key in settings:
if key not in allowed:
raise ManifestError(
f"bottle '{bottle_name}' agent_provider.settings has unknown "
f"key {key!r}; allowed: {', '.join(sorted(allowed))}"
)
for key in ("provider", "base_url", "api", "api_key", "api_key_env"):
value = settings.get(key)
if value is not None and (not isinstance(value, str) or not value):
raise ManifestError(
f"bottle '{bottle_name}' agent_provider.settings.{key} must "
"be a non-empty string"
)
max_tokens_field = settings.get("max_tokens_field")
if max_tokens_field is not None and max_tokens_field not in (
"max_tokens", "max_completion_tokens",
):
raise ManifestError(
f"bottle '{bottle_name}' agent_provider.settings.max_tokens_field "
"must be 'max_tokens' or 'max_completion_tokens'"
)
if settings.get("api_key") is not None and settings.get("api_key_env") is not None:
raise ManifestError(
f"bottle '{bottle_name}' agent_provider.settings may set either "
"api_key or api_key_env, not both"
)
models = settings.get("models")
if models is not None:
if not isinstance(models, list) or not models:
raise ManifestError(
f"bottle '{bottle_name}' agent_provider.settings.models must "
"be a non-empty array of strings"
)
for i, model in enumerate(models):
if not isinstance(model, str) or not model:
raise ManifestError(
f"bottle '{bottle_name}' agent_provider.settings.models[{i}] "
"must be a non-empty string"
)
for key in ("supports_developer_role", "supports_reasoning_effort"):
value = settings.get(key)
if value is not None and not isinstance(value, bool):
raise ManifestError(
f"bottle '{bottle_name}' agent_provider.settings.{key} must "
f"be a boolean (was {type(value).__name__})"
)
for key in ("context_window", "max_tokens"):
value = settings.get(key)
if value is not None and (
not isinstance(value, int) or isinstance(value, bool) or value <= 0
):
raise ManifestError(
f"bottle '{bottle_name}' agent_provider.settings.{key} must "
f"be a positive integer (was {type(value).__name__})"
)
return dict(settings)
+143 -263
View File
@@ -1,31 +1,33 @@
"""Egress routing manifest dataclasses and helpers (PRD 0017, PRD 0053)."""
"""Egress routing manifest dataclasses and helpers."""
from __future__ import annotations
import re
from dataclasses import dataclass
import ipaddress
from dataclasses import dataclass, field
from typing import cast
from .manifest_util import ManifestError, as_json_object
# Auth schemes for the egress route's optional `auth` block.
# Same values cred-proxy accepts today; `token` sidesteps the Gitea
# token-not-Bearer quirk (go-gitea/gitea#16734).
EGRESS_AUTH_SCHEMES = ("Bearer", "token")
PATH_MATCH_TYPES = ("exact", "prefix", "regex")
HEADER_MATCH_TYPES = ("exact", "regex")
VALID_METHODS = frozenset({
"GET", "HEAD", "POST", "PUT", "DELETE", "PATCH", "OPTIONS", "TRACE",
"CONNECT",
})
OUTBOUND_DETECTOR_NAMES = frozenset({"token_patterns", "known_secrets"})
INBOUND_DETECTOR_NAMES = frozenset({"naive_injection_detection"})
def validate_egress_routes(
bottle_name: str,
routes: tuple[ManifestEgressRoute, ...],
routes: tuple[EgressRoute, ...],
) -> None:
"""Cross-validation for `bottle.egress.routes`: hosts must be unique.
The proxy matches by exact-host (v1); duplicate hosts leave the
route choice ambiguous so we reject them up front.
No cross-validation against `bottle.git-gate.repos` is performed.
git-gate (SSH push/fetch) and egress (HTTPS) broker different
protocols; declaring both for the same host is a legitimate dev
setup."""
seen_hosts: dict[str, None] = {}
for r in routes:
key = r.Host.lower()
@@ -38,62 +40,132 @@ def validate_egress_routes(
@dataclass(frozen=True)
class ManifestPathMatch:
Type: str = "prefix"
Value: str = ""
class PipelockRoutePolicy:
"""Per-route pipelock policy overrides.
`TlsPassthrough` adds the route host to pipelock's
`tls_interception.passthrough_domains`, so pipelock still enforces
the hostname allowlist but does not MITM/decrypt request bodies or
headers for that host.
`SsrfIpAllowlist` adds explicit IPs/CIDRs to pipelock's SSRF
allowlist for private/internal destinations behind this route.
"""
TlsPassthrough: bool = False
SsrfIpAllowlist: tuple[str, ...] = ()
@classmethod
def from_dict(
cls, bottle_name: str, idx: int, raw: object,
) -> "PipelockRoutePolicy":
label = f"bottle '{bottle_name}' egress.routes[{idx}] pipelock"
d = as_json_object(raw, label)
for k in d:
if k not in ("tls_passthrough", "ssrf_ip_allowlist"):
raise ManifestError(
f"{label} has unknown key {k!r}; "
f"only 'tls_passthrough' and 'ssrf_ip_allowlist' "
f"are accepted"
)
tls_passthrough_raw = d.get("tls_passthrough", False)
if not isinstance(tls_passthrough_raw, bool):
raise ManifestError(
f"{label}.tls_passthrough must be a boolean "
f"(was {type(tls_passthrough_raw).__name__})"
)
ssrf_raw = d.get("ssrf_ip_allowlist", [])
if not isinstance(ssrf_raw, list):
raise ManifestError(
f"{label}.ssrf_ip_allowlist must be an array "
f"(was {type(ssrf_raw).__name__})"
)
ssrf_ip_allowlist: list[str] = []
for j, item in enumerate(ssrf_raw):
if not isinstance(item, str) or not item:
raise ManifestError(
f"{label}.ssrf_ip_allowlist[{j}] must be a non-empty "
f"string (was {type(item).__name__})"
)
try:
ipaddress.ip_network(item, strict=False)
except ValueError as e:
raise ManifestError(
f"{label}.ssrf_ip_allowlist[{j}] must be an IP address "
f"or CIDR (was {item!r}): {e}"
) from e
ssrf_ip_allowlist.append(item)
return cls(
TlsPassthrough=tls_passthrough_raw,
SsrfIpAllowlist=tuple(ssrf_ip_allowlist),
)
@dataclass(frozen=True)
class ManifestHeaderMatch:
Name: str = ""
Value: str = ""
Type: str = "exact"
class EgressRoute:
"""One route on the per-bottle egress sidecar (PRD 0017).
`Host` matches the request's hostname (case-insensitive). The
optional `PathAllowlist` constrains the URL path to a set of
prefixes; empty tuple means no path-level filtering. The optional
`AuthScheme` / `TokenRef` pair drives credential injection:
when set, the proxy strips any inbound Authorization and injects
`<AuthScheme> <value-of-host-env-named-by-TokenRef>`. When the
manifest's `auth` block is omitted both fields are empty strings —
no Authorization is written, no token forwarded.
@dataclass(frozen=True)
class ManifestMatchEntry:
Paths: tuple[ManifestPathMatch, ...] = ()
Methods: tuple[str, ...] = ()
Headers: tuple[ManifestHeaderMatch, ...] = ()
`Role` is reserved for future use; all role strings are currently
rejected by the validator.
Validation rules (enforced in `from_dict`):
- `host` required, non-empty.
- `path_allowlist` optional, list of absolute path prefixes.
- `auth` optional. If present, MUST carry both `scheme` and
`token_ref` as non-empty strings; an empty `auth: {}` is an
error rather than a synonym for "no auth" (omit `auth` for
that case).
- `role` optional, reserved any non-empty value is rejected.
"""
@dataclass(frozen=True)
class ManifestEgressRoute:
Host: str
Matches: tuple[ManifestMatchEntry, ...] = ()
PathAllowlist: tuple[str, ...] = ()
AuthScheme: str = ""
TokenRef: str = ""
Role: tuple[str, ...] = ()
GitFetch: bool = False
OutboundDetectors: tuple[str, ...] | None = None
InboundDetectors: tuple[str, ...] | None = None
Pipelock: PipelockRoutePolicy = field(default_factory=PipelockRoutePolicy)
@classmethod
def from_dict(cls, bottle_name: str, idx: int, raw: object) -> "ManifestEgressRoute":
def from_dict(cls, bottle_name: str, idx: int, raw: object) -> "EgressRoute":
label = f"bottle '{bottle_name}' egress.routes[{idx}]"
d = as_json_object(raw, label)
host = d.get("host")
if not isinstance(host, str) or not host:
raise ManifestError(f"{label} missing required string field 'host'")
# --- matches ---
matches: tuple[ManifestMatchEntry, ...] = ()
matches_raw = d.get("matches")
if matches_raw is not None:
if not isinstance(matches_raw, list):
path_allow_raw = d.get("path_allowlist")
prefixes: tuple[str, ...] = ()
if path_allow_raw is not None:
if not isinstance(path_allow_raw, list):
raise ManifestError(
f"{label} matches must be an array "
f"(was {type(matches_raw).__name__})"
f"{label} path_allowlist must be an array "
f"(was {type(path_allow_raw).__name__})"
)
matches_list = cast(list[object], matches_raw)
entries: list[ManifestMatchEntry] = []
for k, entry_raw in enumerate(matches_list):
entries.append(
_parse_match_entry(label, k, entry_raw)
)
matches = tuple(entries)
path_list = cast(list[object], path_allow_raw)
collected: list[str] = []
for j, p in enumerate(path_list):
if not isinstance(p, str):
raise ManifestError(
f"{label} path_allowlist[{j}] must be a string "
f"(was {type(p).__name__})"
)
if not p.startswith("/"):
raise ManifestError(
f"{label} path_allowlist[{j}] {p!r} must be an "
f"absolute path prefix starting with '/'"
)
collected.append(p)
prefixes = tuple(collected)
# --- auth ---
auth_scheme = ""
token_ref = ""
if "auth" in d:
@@ -131,7 +203,6 @@ class ManifestEgressRoute:
auth_scheme = auth_scheme_raw
token_ref = token_ref_raw
# --- role (reserved) ---
role_raw = d.get("role")
roles: tuple[str, ...] = ()
if role_raw is None:
@@ -158,228 +229,43 @@ class ManifestEgressRoute:
f"the 'role' field is reserved for future use"
)
# --- dlp ---
outbound_detectors: tuple[str, ...] | None = None
inbound_detectors: tuple[str, ...] | None = None
if "dlp" in d:
outbound_detectors, inbound_detectors = _parse_dlp_block(
label, d.get("dlp"),
)
# --- git-over-HTTPS policy ---
git_fetch = False
if "git" in d:
git_d = as_json_object(d.get("git"), f"{label} git")
raw_fetch = git_d.get("fetch", False)
if isinstance(raw_fetch, bool):
git_fetch = raw_fetch
else:
raise ManifestError(
f"{label} git.fetch must be a boolean "
f"(was {type(raw_fetch).__name__})"
)
for k in git_d:
if k != "fetch":
raise ManifestError(
f"{label} git has unknown key {k!r}; "
f"only 'fetch' is accepted"
)
pipelock = (
PipelockRoutePolicy.from_dict(bottle_name, idx, d["pipelock"])
if "pipelock" in d
else PipelockRoutePolicy()
)
for k in d:
if k not in ("host", "matches", "auth", "role", "dlp", "git"):
if k not in ("host", "path_allowlist", "auth", "role", "pipelock"):
raise ManifestError(
f"{label} has unknown key {k!r}; accepted keys are "
f"'host', 'matches', 'auth', 'role', 'dlp', 'git'"
f"'host', 'path_allowlist', 'auth', 'role', 'pipelock'"
)
return cls(
Host=host,
Matches=matches,
PathAllowlist=prefixes,
AuthScheme=auth_scheme,
TokenRef=token_ref,
Role=roles,
GitFetch=git_fetch,
OutboundDetectors=outbound_detectors,
InboundDetectors=inbound_detectors,
Pipelock=pipelock,
)
def _parse_match_entry(
route_label: str, k: int, raw: object,
) -> ManifestMatchEntry:
label = f"{route_label} matches[{k}]"
d = as_json_object(raw, label)
paths: tuple[ManifestPathMatch, ...] = ()
paths_raw = d.get("paths")
if paths_raw is not None:
if not isinstance(paths_raw, list):
raise ManifestError(f"{label} paths must be an array")
paths_list = cast(list[object], paths_raw)
parsed_paths: list[ManifestPathMatch] = []
for j, p_raw in enumerate(paths_list):
parsed_paths.append(_parse_path_match(label, j, p_raw))
paths = tuple(parsed_paths)
methods: tuple[str, ...] = ()
methods_raw = d.get("methods")
if methods_raw is not None:
if not isinstance(methods_raw, list):
raise ManifestError(f"{label} methods must be an array")
methods_list = cast(list[object], methods_raw)
normalised: list[str] = []
for j, m in enumerate(methods_list):
if not isinstance(m, str):
raise ManifestError(
f"{label} methods[{j}] must be a string"
)
upper = m.upper()
if upper not in VALID_METHODS:
raise ManifestError(
f"{label} methods[{j}] {m!r} is not a valid HTTP method"
)
normalised.append(upper)
methods = tuple(normalised)
headers: tuple[ManifestHeaderMatch, ...] = ()
headers_raw = d.get("headers")
if headers_raw is not None:
if not isinstance(headers_raw, list):
raise ManifestError(f"{label} headers must be an array")
headers_list = cast(list[object], headers_raw)
parsed_headers: list[ManifestHeaderMatch] = []
for j, h_raw in enumerate(headers_list):
parsed_headers.append(_parse_header_match(label, j, h_raw))
headers = tuple(parsed_headers)
for key in d:
if key not in ("paths", "methods", "headers"):
raise ManifestError(f"{label} has unknown key {key!r}")
return ManifestMatchEntry(Paths=paths, Methods=methods, Headers=headers)
def _parse_path_match(
entry_label: str, j: int, raw: object,
) -> ManifestPathMatch:
label = f"{entry_label} paths[{j}]"
d = as_json_object(raw, label)
ptype = d.get("type", "prefix")
if not isinstance(ptype, str) or ptype not in PATH_MATCH_TYPES:
raise ManifestError(
f"{label} type must be one of {', '.join(PATH_MATCH_TYPES)} "
f"(got {ptype!r})"
)
value = d.get("value")
if not isinstance(value, str) or not value:
raise ManifestError(f"{label} value must be a non-empty string")
if ptype in ("exact", "prefix") and not value.startswith("/"):
raise ManifestError(
f"{label} value {value!r} must start with '/' for type {ptype!r}"
)
if ptype == "regex":
try:
re.compile(value)
except re.error as e:
raise ManifestError(
f"{label} regex {value!r} failed to compile: {e}"
) from e
for k in d:
if k not in ("type", "value"):
raise ManifestError(f"{label} has unknown key {k!r}")
return ManifestPathMatch(Type=ptype, Value=value)
def _parse_header_match(
entry_label: str, j: int, raw: object,
) -> ManifestHeaderMatch:
label = f"{entry_label} headers[{j}]"
d = as_json_object(raw, label)
name = d.get("name")
if not isinstance(name, str) or not name:
raise ManifestError(f"{label} name must be a non-empty string")
value = d.get("value")
if not isinstance(value, str):
raise ManifestError(f"{label} value must be a string")
htype = d.get("type", "exact")
if not isinstance(htype, str) or htype not in HEADER_MATCH_TYPES:
raise ManifestError(
f"{label} type must be one of {', '.join(HEADER_MATCH_TYPES)} "
f"(got {htype!r})"
)
if htype == "regex":
try:
re.compile(value)
except re.error as e:
raise ManifestError(
f"{label} regex {value!r} failed to compile: {e}"
) from e
for k in d:
if k not in ("name", "value", "type"):
raise ManifestError(f"{label} has unknown key {k!r}")
return ManifestHeaderMatch(Name=name, Value=value, Type=htype)
def _parse_dlp_block(
route_label: str,
raw: object,
) -> tuple[tuple[str, ...] | None, tuple[str, ...] | None]:
label = f"{route_label} dlp"
d = as_json_object(raw, label)
def _parse_field(
field: str,
valid_names: frozenset[str],
) -> tuple[str, ...] | None:
val = d.get(field)
if val is None:
return None
if val is False:
return ()
if not isinstance(val, list):
raise ManifestError(
f"{label} {field} must be false, a list, or omitted"
)
items = cast(list[object], val)
names: list[str] = []
for j, item in enumerate(items):
if not isinstance(item, str):
raise ManifestError(
f"{label} {field}[{j}] must be a string"
)
if item not in valid_names:
raise ManifestError(
f"{label} {field}[{j}] {item!r} is not a valid "
f"detector; valid: {', '.join(sorted(valid_names))}"
)
names.append(item)
return tuple(names)
outbound = _parse_field("outbound_detectors", OUTBOUND_DETECTOR_NAMES)
inbound = _parse_field("inbound_detectors", INBOUND_DETECTOR_NAMES)
for k in d:
if k not in ("outbound_detectors", "inbound_detectors"):
raise ManifestError(
f"{label} has unknown key {k!r}; accepted keys are "
f"'outbound_detectors', 'inbound_detectors'"
)
return outbound, inbound
LOG_LEVELS = frozenset({0, 1, 2})
@dataclass(frozen=True)
class ManifestEgressConfig:
routes: tuple[ManifestEgressRoute, ...] = ()
Log: int = 0
class EgressConfig:
"""Per-bottle egress configuration. Today this is just the
route table; the nesting under `egress:` leaves room for
per-bottle proxy settings (port override, log level, etc.) in
follow-ups."""
routes: tuple[EgressRoute, ...] = ()
@classmethod
def from_dict(cls, bottle_name: str, raw: object) -> "ManifestEgressConfig":
def from_dict(cls, bottle_name: str, raw: object) -> "EgressConfig":
d = as_json_object(raw, f"bottle '{bottle_name}' egress")
routes_raw = d.get("routes")
routes: tuple[ManifestEgressRoute, ...] = ()
routes: tuple[EgressRoute, ...] = ()
if routes_raw is not None:
if not isinstance(routes_raw, list):
raise ManifestError(
@@ -388,20 +274,14 @@ class ManifestEgressConfig:
)
routes_list = cast(list[object], routes_raw)
routes = tuple(
ManifestEgressRoute.from_dict(bottle_name, i, entry)
EgressRoute.from_dict(bottle_name, i, entry)
for i, entry in enumerate(routes_list)
)
validate_egress_routes(bottle_name, routes)
log_raw = d.get("log", 0)
if isinstance(log_raw, bool) or not isinstance(log_raw, int) \
or log_raw not in LOG_LEVELS:
raise ManifestError(
f"bottle '{bottle_name}' egress.log must be 0, 1, or 2"
)
for k in d:
if k not in ("routes", "log"):
if k != "routes":
raise ManifestError(
f"bottle '{bottle_name}' egress has unknown key {k!r}; "
f"accepted keys are 'routes', 'log'"
f"only 'routes' is accepted"
)
return cls(routes=routes, Log=log_raw)
return cls(routes=routes)
+24 -46
View File
@@ -5,13 +5,12 @@ from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from .manifest import ManifestBottle, ManifestGitEntry
from .manifest_egress import ManifestEgressConfig
from .manifest import Bottle, GitEntry
def resolve_bottles(raws: dict[str, dict[str, object]]) -> dict[str, ManifestBottle]:
"""Apply `extends:` chains and return resolved ManifestBottle objects."""
cache: dict[str, ManifestBottle] = {}
def resolve_bottles(raws: dict[str, dict[str, object]]) -> dict[str, Bottle]:
"""Apply `extends:` chains and return resolved Bottle objects."""
cache: dict[str, Bottle] = {}
for name in raws:
if name not in cache:
_resolve_one_bottle(name, raws, cache, ())
@@ -21,10 +20,10 @@ def resolve_bottles(raws: dict[str, dict[str, object]]) -> dict[str, ManifestBot
def _resolve_one_bottle(
name: str,
raws: dict[str, dict[str, object]],
cache: dict[str, ManifestBottle],
cache: dict[str, Bottle],
seen: tuple[str, ...],
) -> ManifestBottle:
from .manifest import ManifestBottle, ManifestError
) -> Bottle:
from .manifest import Bottle, ManifestError
if name in cache:
return cache[name]
@@ -33,13 +32,13 @@ def _resolve_one_bottle(
raise ManifestError(f"bottle '{name}' is in an extends cycle: {chain}")
raw = raws[name]
parent_name_raw = raw.get("extends")
# Strip `extends:` before passing to ManifestBottle.from_dict so it
# is not accidentally treated as a real ManifestBottle field by future
# Strip `extends:` before passing to Bottle.from_dict so it
# is not accidentally treated as a real Bottle field by future
# schema additions. It is only meaningful here.
child_raw = {k: v for k, v in raw.items() if k != "extends"}
if parent_name_raw is None:
bottle = ManifestBottle.from_dict(name, child_raw)
bottle = Bottle.from_dict(name, child_raw)
cache[name] = bottle
return bottle
@@ -67,27 +66,27 @@ def _resolve_one_bottle(
def _merge_bottles(
parent: ManifestBottle,
parent: Bottle,
child_raw: dict[str, object],
name: str,
) -> ManifestBottle:
) -> Bottle:
"""Apply PRD 0025 merge rules."""
from .manifest import ManifestBottle, ManifestGitUser
from .manifest import Bottle, GitUser
from .manifest_egress import validate_egress_routes
# Parse the child's declared fields into a ManifestBottle (with the
# Parse the child's declared fields into a Bottle (with the
# usual defaults for anything missing). Validation runs the same
# way it would for a leaf bottle: typos / wrong types die here.
child = ManifestBottle.from_dict(name, child_raw)
child = Bottle.from_dict(name, child_raw)
# env: dict merge, child wins on collision.
merged_env = {**parent.env, **child.env}
# git-gate.user: per-field overlay. Each non-empty field on child
# wins; empties fall through to parent. The default ManifestGitUser()
# wins; empties fall through to parent. The default GitUser()
# is two empty strings, so a child that omits git-gate.user
# inherits the parent's user verbatim.
merged_git_user = ManifestGitUser(
merged_git_user = GitUser(
name=child.git_user.name or parent.git_user.name,
email=child.git_user.email or parent.git_user.email,
)
@@ -100,16 +99,9 @@ def _merge_bottles(
else:
merged_git = parent.git
# egress.routes: missing means inherit; otherwise parent and child
# route lists concatenate. Other egress scalar fields remain
# presence-driven overlays.
merged_egress = (
_merge_egress(parent.egress, child.egress, child_raw)
if "egress" in child_raw
else parent.egress
)
# Presence-driven full-replace for the remaining scalar fields.
# Presence-driven full-replace for the remaining list-valued +
# scalar fields.
merged_egress = child.egress if "egress" in child_raw else parent.egress
merged_agent_provider = (
child.agent_provider
if "agent_provider" in child_raw
@@ -120,7 +112,7 @@ def _merge_bottles(
)
validate_egress_routes(name, merged_egress.routes)
return ManifestBottle(
return Bottle(
env=merged_env,
agent_provider=merged_agent_provider,
git=merged_git,
@@ -141,24 +133,10 @@ def _child_declares_git_gate_repos(child_raw: dict[str, object]) -> bool:
def _merge_git_remotes(
parent: tuple[ManifestGitEntry, ...],
child: tuple[ManifestGitEntry, ...],
) -> tuple[ManifestGitEntry, ...]:
parent: tuple[GitEntry, ...],
child: tuple[GitEntry, ...],
) -> tuple[GitEntry, ...]:
by_host = {entry.UpstreamHost: entry for entry in parent}
for entry in child:
by_host[entry.UpstreamHost] = entry
return tuple(by_host.values())
def _merge_egress(
parent: ManifestEgressConfig,
child: ManifestEgressConfig,
child_raw: dict[str, object],
) -> ManifestEgressConfig:
from .manifest_egress import ManifestEgressConfig
from .manifest_util import as_json_object
child_egress_raw = as_json_object(child_raw.get("egress"), "child egress")
routes = parent.routes + child.routes
log = child.Log if "log" in child_egress_raw else parent.Log
return ManifestEgressConfig(routes=routes, Log=log)
+78 -85
View File
@@ -4,6 +4,7 @@ from __future__ import annotations
import re
from dataclasses import dataclass
from typing import Optional
from .manifest_util import ManifestError, as_json_object
@@ -12,8 +13,6 @@ from .manifest_util import ManifestError, as_json_object
# defence; this regex is belt-and-suspenders and documents intent).
_GIT_NAME_RE = re.compile(r"^[A-Za-z0-9._-]+$")
_KEY_PROVIDERS = {"static", "gitea"}
def _opt_str(value: object, label: str) -> str:
if value is None:
@@ -58,7 +57,7 @@ def parse_git_upstream(url: str, label: str) -> tuple[str, str, str, str]:
return (user, host, port, path)
def validate_unique_git_names(bottle_name: str, git: tuple[ManifestGitEntry, ...]) -> None:
def validate_unique_git_names(bottle_name: str, git: tuple[GitEntry, ...]) -> None:
seen: dict[str, None] = {}
for g in git:
if g.Name in seen:
@@ -70,27 +69,25 @@ def validate_unique_git_names(bottle_name: str, git: tuple[ManifestGitEntry, ...
@dataclass(frozen=True)
class ManifestKeyConfig:
"""Configuration for a repo's SSH key in git-gate.repos.
class ProvisionedKeyConfig:
"""Configuration for automatic deploy-key lifecycle management
(PRD 0048). Used when a git-gate.repos entry opts out of a
static identity file and instead wants a fresh SSH keypair
generated at spin-up and revoked at teardown.
`provider` is either `"static"` (a pre-existing key on the host) or
`"gitea"` (automatic deploy-key lifecycle via the Gitea API).
For `static`: `path` is the host-side absolute path to the SSH private key.
For `gitea`: `forge_token_env` is the name of a host-side env var
carrying the Gitea API token; the value is read at provision time,
never stored on the plan. `api_url` is the forge's HTTP API root; if
empty, it is derived from the upstream URL's host at provision time."""
`provider` names the contrib sub-package to load (e.g. `gitea`).
`token_env` is the name of a host-side env var carrying the API
token; the value is read at provision time, never stored on the
plan. `api_url` is the forge's HTTP API root; if empty, it is
derived from the upstream URL's host at provision time."""
provider: str
path: str = ""
forge_token_env: str = ""
token_env: str
api_url: str = ""
@dataclass(frozen=True)
class ManifestGitEntry:
class GitEntry:
"""One upstream the per-agent git-gate (PRD 0008) is allowed to
talk to. `Upstream` is the real remote URL the agent would push to
if there were no gate; the gate hosts a bare repo at /git/<Name>.git
@@ -102,16 +99,15 @@ class ManifestGitEntry:
stashed in the `Upstream*` fields so the git-gate render step
doesn't have to re-parse.
Manifest source: `git-gate.repos.<Name>` (PRD 0047/0048). A `key`
block is required; `key.provider` is `"static"` or `"gitea"`. For
`static`, `IdentityFile` is populated at parse time from `key.path`.
For `gitea`, `IdentityFile` is populated at provision time."""
Manifest source: `git-gate.repos.<Name>` (PRD 0047/0048). Exactly
one of `identity` (static key path) or `provisioned_key` (automatic
lifecycle) must be present. The internal field names are stable."""
Name: str
Upstream: str
Key: ManifestKeyConfig = ManifestKeyConfig(provider="")
IdentityFile: str = ""
KnownHostKey: str = ""
ProvisionedKey: Optional[ProvisionedKeyConfig] = None
RemoteKey: str = ""
UpstreamUser: str = ""
UpstreamHost: str = ""
@@ -121,11 +117,11 @@ class ManifestGitEntry:
@classmethod
def from_repos_entry(
cls, bottle_name: str, repo_name: str, raw: object
) -> "ManifestGitEntry":
) -> "GitEntry":
"""Parse one entry from `git-gate.repos.<repo_name>`.
YAML keys: `url` (required), `key` (required object with
`provider`, and provider-specific fields), `host_key` (optional).
YAML keys: `url` (required), exactly one of `identity` or
`provisioned_key` (required), `host_key` (optional).
The repo_name becomes `Name`."""
if not repo_name:
raise ManifestError(
@@ -139,10 +135,10 @@ class ManifestGitEntry:
label = f"git-gate.repos[{repo_name!r}]"
d = as_json_object(raw, f"bottle '{bottle_name}' {label}")
for k in d:
if k not in {"url", "key", "host_key"}:
if k not in {"url", "identity", "provisioned_key", "host_key"}:
raise ManifestError(
f"bottle '{bottle_name}' {label} has unknown key {k!r}; "
f"allowed: url, key, host_key"
f"allowed: url, identity, provisioned_key, host_key"
)
upstream = d.get("url")
if not isinstance(upstream, str) or not upstream:
@@ -150,13 +146,32 @@ class ManifestGitEntry:
f"bottle '{bottle_name}' {label} missing required string field 'url'"
)
if "key" not in d:
has_identity = "identity" in d
has_provisioned = "provisioned_key" in d
if has_identity and has_provisioned:
raise ManifestError(
f"bottle '{bottle_name}' {label} missing required 'key' block"
f"bottle '{bottle_name}' {label} must set exactly one of "
f"'identity' or 'provisioned_key'; got both."
)
if not has_identity and not has_provisioned:
raise ManifestError(
f"bottle '{bottle_name}' {label} must set exactly one of "
f"'identity' or 'provisioned_key'; got neither."
)
key_config = _parse_key_config(bottle_name, label, d["key"])
ident = key_config.path if key_config.provider == "static" else ""
ident = ""
provisioned_key: Optional[ProvisionedKeyConfig] = None
if has_identity:
raw_ident = d.get("identity")
if not isinstance(raw_ident, str) or not raw_ident:
raise ManifestError(
f"bottle '{bottle_name}' {label} 'identity' must be a non-empty string"
)
ident = raw_ident
else:
provisioned_key = _parse_provisioned_key_config(
bottle_name, label, d["provisioned_key"]
)
khk = _opt_str(
d.get("host_key"),
@@ -168,9 +183,9 @@ class ManifestGitEntry:
return cls(
Name=repo_name,
Upstream=upstream,
Key=key_config,
IdentityFile=ident,
KnownHostKey=khk,
ProvisionedKey=provisioned_key,
RemoteKey=host,
UpstreamUser=user,
UpstreamHost=host,
@@ -179,64 +194,42 @@ class ManifestGitEntry:
)
def _parse_key_config(
def _parse_provisioned_key_config(
bottle_name: str, label: str, raw: object
) -> ManifestKeyConfig:
d = as_json_object(raw, f"bottle '{bottle_name}' {label}.key")
) -> ProvisionedKeyConfig:
d = as_json_object(raw, f"bottle '{bottle_name}' {label}.provisioned_key")
for k in d:
if k not in {"provider", "token_env", "api_url"}:
raise ManifestError(
f"bottle '{bottle_name}' {label}.provisioned_key has unknown key {k!r}; "
f"allowed: provider, token_env, api_url"
)
provider = d.get("provider")
if not isinstance(provider, str) or not provider:
raise ManifestError(
f"bottle '{bottle_name}' {label}.key missing required "
f"bottle '{bottle_name}' {label}.provisioned_key missing required "
f"string field 'provider'"
)
if provider not in _KEY_PROVIDERS:
token_env = d.get("token_env")
if not isinstance(token_env, str) or not token_env:
raise ManifestError(
f"bottle '{bottle_name}' {label}.key provider {provider!r} is unknown; "
f"allowed: {', '.join(sorted(_KEY_PROVIDERS))}"
f"bottle '{bottle_name}' {label}.provisioned_key missing required "
f"string field 'token_env'"
)
if provider == "gitea":
for k in d:
if k not in {"provider", "forge_token_env", "api_url"}:
raise ManifestError(
f"bottle '{bottle_name}' {label}.key has unknown key {k!r} "
f"for provider 'gitea'; allowed: provider, forge_token_env, api_url"
)
forge_token_env = d.get("forge_token_env")
if not isinstance(forge_token_env, str) or not forge_token_env:
raise ManifestError(
f"bottle '{bottle_name}' {label}.key missing required "
f"string field 'forge_token_env' for provider 'gitea'"
)
api_url_raw = d.get("api_url", "")
if not isinstance(api_url_raw, str):
raise ManifestError(
f"bottle '{bottle_name}' {label}.key 'api_url' must be a string"
)
return ManifestKeyConfig(
provider=provider,
forge_token_env=forge_token_env,
api_url=api_url_raw,
)
# provider == "static"
for k in d:
if k not in {"provider", "path"}:
raise ManifestError(
f"bottle '{bottle_name}' {label}.key has unknown key {k!r} "
f"for provider 'static'; allowed: provider, path"
)
path = d.get("path")
if not isinstance(path, str) or not path:
api_url_raw = d.get("api_url", "")
if not isinstance(api_url_raw, str):
raise ManifestError(
f"bottle '{bottle_name}' {label}.key missing required "
f"string field 'path' for provider 'static'"
f"bottle '{bottle_name}' {label}.provisioned_key 'api_url' must be a string"
)
return ManifestKeyConfig(provider=provider, path=path)
return ProvisionedKeyConfig(
provider=provider,
token_env=token_env,
api_url=api_url_raw,
)
@dataclass(frozen=True)
class ManifestGitUser:
class GitUser:
"""Per-bottle `git config --global user.name` / `user.email`
pair (issue #86). The agent's commits inside the bottle are
attributed to this identity rather than the agent image's
@@ -251,9 +244,9 @@ class ManifestGitUser:
email: str = ""
@classmethod
def from_dict(cls, bottle_name: str, raw: object) -> "ManifestGitUser":
def from_dict(cls, bottle_name: str, raw: object) -> "GitUser":
d = as_json_object(raw, f"bottle '{bottle_name}' git-gate.user")
for k in d:
for k in d.keys():
if k not in {"name", "email"}:
raise ManifestError(
f"bottle '{bottle_name}' git-gate.user has unknown key {k!r}; "
@@ -286,9 +279,9 @@ class ManifestGitUser:
def parse_git_gate_config(
bottle_name: str,
raw: object,
) -> tuple[tuple[ManifestGitEntry, ...], ManifestGitUser]:
) -> tuple[tuple[GitEntry, ...], GitUser]:
d = as_json_object(raw, f"bottle '{bottle_name}' git-gate")
for k in d:
for k in d.keys():
if k not in {"user", "repos"}:
raise ManifestError(
f"bottle '{bottle_name}' git-gate has unknown key {k!r}; "
@@ -296,17 +289,17 @@ def parse_git_gate_config(
)
git_user = (
ManifestGitUser.from_dict(bottle_name, d["user"])
GitUser.from_dict(bottle_name, d["user"])
if "user" in d
else ManifestGitUser()
else GitUser()
)
git: tuple[ManifestGitEntry, ...] = ()
git: tuple[GitEntry, ...] = ()
repos_raw = d.get("repos")
if repos_raw is not None:
repos = as_json_object(repos_raw, f"bottle '{bottle_name}' git-gate.repos")
git = tuple(
ManifestGitEntry.from_repos_entry(bottle_name, name, entry)
GitEntry.from_repos_entry(bottle_name, name, entry)
for name, entry in repos.items()
)
validate_unique_git_names(bottle_name, git)
+6 -6
View File
@@ -14,7 +14,7 @@ from .manifest_schema import (
from .yaml_subset import YamlSubsetError, parse_frontmatter
if TYPE_CHECKING:
from .manifest import ManifestAgent, ManifestBottle
from .manifest import Agent, Bottle
def check_stale_json(dir_path: Path, md_dir: Path, label: str) -> None:
@@ -34,7 +34,7 @@ def check_stale_json(dir_path: Path, md_dir: Path, label: str) -> None:
)
def load_bottles_from_dir(bottles_dir: Path) -> dict[str, ManifestBottle]:
def load_bottles_from_dir(bottles_dir: Path) -> dict[str, Bottle]:
"""Walk `<bottles_dir>/*.md`, parse each as a bottle, and return
`{name: Bottle}`. Missing dir returns an empty dict."""
from .manifest import ManifestError
@@ -67,13 +67,13 @@ def load_agents_from_dir(
bottle_names: set[str],
*,
source: str, # noqa: F841 — unused, but required by interface
) -> dict[str, ManifestAgent]:
) -> dict[str, Agent]:
"""Walk `<agents_dir>/*.md`, parse each as an agent, and return
`{name: Agent}`. The Markdown body becomes the agent's prompt.
Missing dir returns an empty dict."""
from .manifest import ManifestAgent, ManifestError
from .manifest import Agent, ManifestError
out: dict[str, ManifestAgent] = {}
out: dict[str, Agent] = {}
if not agents_dir.is_dir():
return out
for path in sorted(agents_dir.glob("*.md")):
@@ -101,5 +101,5 @@ def load_agents_from_dir(
}
if "git-gate" in fm:
agent_dict["git-gate"] = fm["git-gate"]
out[name] = ManifestAgent.from_dict(name, agent_dict, bottle_names)
out[name] = Agent.from_dict(name, agent_dict, bottle_names)
return out
+541
View File
@@ -0,0 +1,541 @@
"""Pipelock sidecar lifecycle for the per-agent egress topology.
Pipelock (https://github.com/luckyPipewrench/pipelock) is an HTTP
forward proxy with hostname allowlisting + DLP scanning + URL-entropy
checks. One sidecar per agent, attached to the agent's --internal
network and a per-agent user-defined egress bridge.
Post-PRD-0017 topology: the agent's HTTP_PROXY points at egress
(not pipelock); egress sets `HTTPS_PROXY=pipelock` on its
outbound leg. So pipelock no longer sees the agent's connections
directly it sees the egress upstream leg, applies the
hostname allowlist + DLP body scan there, and forwards to the real
upstream.
Image pin: ghcr.io/luckypipewrench/pipelock@sha256:<digest> for tag 2.3.0.
"""
from __future__ import annotations
from dataclasses import dataclass
from pathlib import Path
from typing import cast
from .egress import EgressRoute, egress_routes_for_bottle
from .supervise import SUPERVISE_HOSTNAME
from .manifest import Bottle
# Hosts pipelock should NOT TLS-MITM, even when tls_interception is
# enabled. This is now route-owned manifest policy via
# `egress.routes[].pipelock.tls_passthrough`; no provider hosts are
# injected implicitly.
DEFAULT_TLS_PASSTHROUGH: tuple[str, ...] = ()
# In-container paths the rendered pipelock YAML references under
# `tls_interception`. The pipelock binary expects the per-bottle CA
# cert + key at these exact paths inside its container — independent
# of how the daemon is wrapped (own container, sidecar bundle, etc.),
# which is why they live in the platform-neutral module.
PIPELOCK_CA_CERT_IN_CONTAINER = "/etc/pipelock-ca.pem"
PIPELOCK_CA_KEY_IN_CONTAINER = "/etc/pipelock-ca-key.pem"
# Short network alias for pipelock inside the sidecar bundle. The
# agent's HTTP_PROXY (when no egress is declared) and any in-bundle
# consumer's URL both reference this name.
PIPELOCK_HOSTNAME = "pipelock"
# --- Allowlist resolution --------------------------------------------------
def pipelock_effective_allowlist(
bottle: Bottle,
provider_routes: tuple[EgressRoute, ...] = (),
) -> list[str]:
"""Hostnames pipelock allows. Sorted for stability.
Always mirrors `egress_routes_for_bottle(bottle, provider_routes)`
egress is the single allowlist surface, and pipelock's allowlist is
the downstream copy for defense-in-depth + DLP body scanning. For
bottles without any `egress.routes[]` declared, this is empty except
for supervise sidecar traffic when `supervise: true`.
The supervise sidecar's hostname is auto-added when supervise
is enabled (sibling-sidecar traffic that flows through pipelock
would otherwise be 403'd). Git upstreams declared in
`bottle.git` do NOT contribute here git traffic flows
through git-gate (PRD 0008), not pipelock."""
seen: dict[str, None] = {}
for r in egress_routes_for_bottle(bottle, provider_routes):
if r.host:
seen.setdefault(r.host, None)
if bottle.supervise:
seen.setdefault(SUPERVISE_HOSTNAME, None)
return sorted(seen.keys())
def pipelock_seed_phrase_detection_enabled(bottle: Bottle) -> bool:
"""Whether pipelock's BIP-39 seed-phrase detector stays on.
LLM conversation bodies legitimately trip the detector any 12+
English words that pass the BIP-39 checksum match so agents can
get blocked on ordinary prompts/responses regardless of provider
(Claude, Codex/OpenAI, or future harnesses). We tried two narrower
knobs first:
- `suppress: [{rule, path}]` pipelock accepts the schema
but the entry only silences the alert; the body_dlp block
still fires.
- `rules.disabled: ["dlp:BIP-39 Seed Phrase"]` same shape,
same outcome: 403 still returned.
Empirically only `seed_phrase_detection.enabled: false`
actually stops the block (verified by sending a 12-word BIP-39
body through three pipelock instances). It is a global toggle
no per-path / per-host knob in pipelock 2.3.0 so we turn off
only this detector for every bottle. The rest of pipelock's DLP
defaults and request-body/header scanning remain enabled."""
del bottle # kept for call-site stability and future policy knobs.
return False
def pipelock_effective_tls_passthrough(
bottle: Bottle,
provider_routes: tuple[EgressRoute, ...] = (),
) -> list[str]:
"""Hostnames pipelock should pass through (no TLS MITM).
A manifest route opts in with `pipelock.tls_passthrough: true`
(lifted into `EgressRoute.tls_passthrough` in `egress_manifest_routes`).
Provider routes that set `tls_passthrough=True` (e.g. Codex credential
routes where egress injects the host bearer after the agent boundary)
are also included. Both arrive via `egress_routes_for_bottle` no
provider-specific branching needed here.
"""
seen: dict[str, None] = {host: None for host in DEFAULT_TLS_PASSTHROUGH}
for route in egress_routes_for_bottle(bottle, provider_routes):
if route.tls_passthrough:
seen.setdefault(route.host, None)
return sorted(seen.keys())
def pipelock_effective_ssrf_ip_allowlist(
bottle: Bottle,
extra: tuple[str, ...] = (),
) -> list[str]:
"""IP/CIDR entries that bypass pipelock's SSRF destination guard.
Launch code can pass backend-owned entries through `extra`, while
route-owned entries come from `pipelock.ssrf_ip_allowlist`.
"""
seen: dict[str, None] = {ip: None for ip in extra}
for route in bottle.egress.routes:
for ip in route.Pipelock.SsrfIpAllowlist:
seen.setdefault(ip, None)
return sorted(seen.keys())
# --- Config build + YAML render --------------------------------------------
def pipelock_build_config(
bottle: Bottle,
*,
ca_cert_path: str = "",
ca_key_path: str = "",
ssrf_ip_allowlist: tuple[str, ...] = (),
provider_routes: tuple[EgressRoute, ...] = (),
) -> dict[str, object]:
"""Build the structured pipelock config dict the sidecar will load.
Deliberately carries no env values, no secrets, no per-agent
customization beyond the resolved hostname list. The shape mirrors
the YAML pipelock expects on disk; `pipelock_render_yaml` serializes
it. Tests assert on this dict; production code renders it.
`ca_cert_path` / `ca_key_path` are the **in-container** paths the
pipelock sidecar will read its CA from at runtime (they're
populated into the container at start time via `docker cp`).
Pass both or neither: both emit `tls_interception` block with
`enabled: true`; neither omit the block entirely (pipelock
falls back to its built-in default of `enabled: false`). Used
by PRD 0006 to turn on pipelock's native TLS interception.
`ssrf_ip_allowlist` is the list of IPs / CIDRs that bypass
pipelock's SSRF guard. Pipelock blocks RFC1918-resolved
destinations by default, which would catch sibling-sidecar
traffic on the bottle's internal Docker network in 172.x space
(e.g. egress pipelock on the upstream leg). Pass the
bottle's internal network CIDR here so internal-network requests
pass through pipelock while api_allowlist + body-scanning still
apply. Empty by default; omitted from the rendered yaml when
empty so pipelock keeps its built-in SSRF defaults."""
cfg: dict[str, object] = {
"version": 1,
"mode": "strict",
"enforce": True,
"api_allowlist": pipelock_effective_allowlist(bottle, provider_routes),
"forward_proxy": {"enabled": True},
}
if not pipelock_seed_phrase_detection_enabled(bottle):
cfg["seed_phrase_detection"] = {"enabled": False}
cfg["dlp"] = {"include_defaults": True, "scan_env": True}
# Body-scan enforcement is a separate pipelock section (each DLP
# "surface" — body, MCP, response — has its own action). Pipelock's
# built-in default for request_body_scanning is "warn" (forward
# with a log line); bot-bottle hard-codes "block" so a hit
# actually stops the request from leaving the egress network.
#
# `scan_headers: true` + `header_mode: all` extends the scan to
# every request header — pipelock's default `header_mode:
# sensitive` only checks Authorization / Cookie / X-Api-Key /
# X-Token / Proxy-Authorization / X-Goog-Api-Key, which an
# agent attempting to exfil could trivially avoid by picking
# a non-sensitive header name. "all" closes the gap; pipelock
# caps it at the same max_body_bytes the body scan uses.
cfg["request_body_scanning"] = {
"action": "block",
"scan_headers": True,
"header_mode": "all",
}
if ca_cert_path or ca_key_path:
if not (ca_cert_path and ca_key_path):
raise ValueError(
"pipelock_build_config: pass both ca_cert_path and ca_key_path "
"to enable tls_interception, or neither to leave it off"
)
cfg["tls_interception"] = {
"enabled": True,
"ca_cert": ca_cert_path,
"ca_key": ca_key_path,
"passthrough_domains": pipelock_effective_tls_passthrough(bottle, provider_routes),
}
effective_ssrf_ip_allowlist = pipelock_effective_ssrf_ip_allowlist(
bottle, ssrf_ip_allowlist,
)
if effective_ssrf_ip_allowlist:
cfg["ssrf"] = {"ip_allowlist": effective_ssrf_ip_allowlist}
return cfg
_PIPELOCK_TOP_LEVEL_KEYS = {
"version",
"mode",
"enforce",
"api_allowlist",
"seed_phrase_detection",
"forward_proxy",
"dlp",
"request_body_scanning",
"tls_interception",
"ssrf",
}
def _pipelock_render_error(section: str, key: str, expected: str) -> ValueError:
return ValueError(
f"pipelock_render_yaml: {section}.{key} must be {expected}"
)
def _reject_unknown_keys(
section: str,
obj: dict[str, object],
allowed: set[str],
) -> None:
for key in sorted(set(obj) - allowed):
raise ValueError(f"pipelock_render_yaml: {section}.{key} is unsupported")
def _required_dict(
obj: dict[str, object],
section: str,
key: str,
) -> dict[str, object]:
value = obj.get(key)
if not isinstance(value, dict):
raise _pipelock_render_error(section, key, "a mapping")
return cast(dict[str, object], value)
def _required_bool(obj: dict[str, object], section: str, key: str) -> bool:
value = obj.get(key)
if not isinstance(value, bool):
raise _pipelock_render_error(section, key, "a boolean")
return value
def _required_int(obj: dict[str, object], section: str, key: str) -> int:
value = obj.get(key)
if isinstance(value, bool) or not isinstance(value, int):
raise _pipelock_render_error(section, key, "an integer")
return value
def _required_str(obj: dict[str, object], section: str, key: str) -> str:
value = obj.get(key)
if not isinstance(value, str):
raise _pipelock_render_error(section, key, "a string")
return value
def _required_str_list(
obj: dict[str, object],
section: str,
key: str,
) -> list[str]:
value = obj.get(key)
if not isinstance(value, list):
raise _pipelock_render_error(section, key, "a list of strings")
value_list = cast(list[object], value)
if not all(isinstance(v, str) for v in value_list):
raise _pipelock_render_error(section, key, "a list of strings")
return cast(list[str], value)
def _optional_str_list(
obj: dict[str, object],
section: str,
key: str,
) -> list[str]:
if key not in obj:
return []
return _required_str_list(obj, section, key)
def _optional_bool(
obj: dict[str, object],
section: str,
key: str,
) -> bool | None:
if key not in obj:
return None
return _required_bool(obj, section, key)
def _optional_str(
obj: dict[str, object],
section: str,
key: str,
) -> str | None:
if key not in obj:
return None
return _required_str(obj, section, key)
def _validate_pipelock_render_config(cfg: dict[str, object]) -> dict[str, object]:
_reject_unknown_keys("config", cfg, _PIPELOCK_TOP_LEVEL_KEYS)
normalized: dict[str, object] = {
"version": _required_int(cfg, "config", "version"),
"mode": _required_str(cfg, "config", "mode"),
"enforce": _required_bool(cfg, "config", "enforce"),
"api_allowlist": _required_str_list(cfg, "config", "api_allowlist"),
}
if "seed_phrase_detection" in cfg:
spd = _required_dict(cfg, "config", "seed_phrase_detection")
_reject_unknown_keys("seed_phrase_detection", spd, {"enabled"})
normalized["seed_phrase_detection"] = {
"enabled": _required_bool(spd, "seed_phrase_detection", "enabled"),
}
fp = _required_dict(cfg, "config", "forward_proxy")
_reject_unknown_keys("forward_proxy", fp, {"enabled"})
normalized["forward_proxy"] = {
"enabled": _required_bool(fp, "forward_proxy", "enabled"),
}
dlp = _required_dict(cfg, "config", "dlp")
_reject_unknown_keys("dlp", dlp, {"include_defaults", "scan_env"})
normalized["dlp"] = {
"include_defaults": _required_bool(dlp, "dlp", "include_defaults"),
"scan_env": _required_bool(dlp, "dlp", "scan_env"),
}
rbs = _required_dict(cfg, "config", "request_body_scanning")
_reject_unknown_keys(
"request_body_scanning",
rbs,
{"action", "scan_headers", "header_mode"},
)
normalized_rbs: dict[str, object] = {
"action": _required_str(rbs, "request_body_scanning", "action"),
}
scan_headers = _optional_bool(rbs, "request_body_scanning", "scan_headers")
if scan_headers is not None:
normalized_rbs["scan_headers"] = scan_headers
header_mode = _optional_str(rbs, "request_body_scanning", "header_mode")
if header_mode is not None:
normalized_rbs["header_mode"] = header_mode
normalized["request_body_scanning"] = normalized_rbs
if "tls_interception" in cfg:
tls = _required_dict(cfg, "config", "tls_interception")
_reject_unknown_keys(
"tls_interception",
tls,
{"enabled", "ca_cert", "ca_key", "passthrough_domains"},
)
normalized["tls_interception"] = {
"enabled": _required_bool(tls, "tls_interception", "enabled"),
"ca_cert": _required_str(tls, "tls_interception", "ca_cert"),
"ca_key": _required_str(tls, "tls_interception", "ca_key"),
"passthrough_domains": _optional_str_list(
tls, "tls_interception", "passthrough_domains",
),
}
if "ssrf" in cfg:
ssrf = _required_dict(cfg, "config", "ssrf")
_reject_unknown_keys("ssrf", ssrf, {"ip_allowlist"})
normalized["ssrf"] = {
"ip_allowlist": _required_str_list(ssrf, "ssrf", "ip_allowlist"),
}
return normalized
def pipelock_render_yaml(cfg: dict[str, object]) -> str:
"""Render a pipelock config dict (as produced by
`pipelock_build_config`) as YAML. Hand-rolled so we don't take a
YAML-parser dependency for a fixed, narrow shape."""
def _bool(b: object) -> str:
return "true" if b else "false"
cfg = _validate_pipelock_render_config(cfg)
lines: list[str] = []
lines.append(f"version: {cfg['version']}")
lines.append(f"mode: {cfg['mode']}")
lines.append(f"enforce: {_bool(cast(bool, cfg['enforce']))}")
lines.append("")
lines.append("api_allowlist:")
api_allowlist = cast(list[str], cfg["api_allowlist"])
for h in api_allowlist:
lines.append(f' - "{h}"')
lines.append("")
if "seed_phrase_detection" in cfg:
lines.append("seed_phrase_detection:")
spd = cast(dict[str, object], cfg["seed_phrase_detection"])
lines.append(f" enabled: {_bool(cast(bool, spd['enabled']))}")
lines.append("")
lines.append("forward_proxy:")
fp = cast(dict[str, object], cfg["forward_proxy"])
lines.append(f" enabled: {_bool(cast(bool, fp['enabled']))}")
lines.append("")
lines.append("dlp:")
dlp = cast(dict[str, object], cfg["dlp"])
lines.append(f" include_defaults: {_bool(cast(bool, dlp['include_defaults']))}")
lines.append(f" scan_env: {_bool(cast(bool, dlp['scan_env']))}")
lines.append("")
lines.append("request_body_scanning:")
rbs = cast(dict[str, object], cfg["request_body_scanning"])
lines.append(f' action: "{cast(str, rbs["action"])}"')
if "scan_headers" in rbs:
lines.append(f" scan_headers: {_bool(cast(bool, rbs['scan_headers']))}")
if "header_mode" in rbs:
lines.append(f' header_mode: "{cast(str, rbs["header_mode"])}"')
if "tls_interception" in cfg:
lines.append("")
lines.append("tls_interception:")
tls = cast(dict[str, object], cfg["tls_interception"])
lines.append(f" enabled: {_bool(cast(bool, tls['enabled']))}")
lines.append(f' ca_cert: "{cast(str, tls["ca_cert"])}"')
lines.append(f' ca_key: "{cast(str, tls["ca_key"])}"')
passthrough = cast(list[str], tls["passthrough_domains"])
if passthrough:
lines.append(" passthrough_domains:")
for d in passthrough:
lines.append(f' - "{d}"')
if "ssrf" in cfg:
lines.append("")
lines.append("ssrf:")
ssrf = cast(dict[str, object], cfg["ssrf"])
lines.append(" ip_allowlist:")
ip_allowlist = cast(list[str], ssrf["ip_allowlist"])
for ip in ip_allowlist:
lines.append(f' - "{ip}"')
return "\n".join(lines) + "\n"
# --- Proxy class -----------------------------------------------------------
@dataclass(frozen=True)
class PipelockProxyPlan:
"""Output of PipelockProxy.prepare; consumed by .start when the
sidecar needs to be brought up.
yaml_path + slug are filled in at prepare time (host-side, side-
effect-free; the YAML references the in-container CA paths
already so it doesn't need the host paths to be valid). The
remaining fields are populated by the backend's launch step
via `dataclasses.replace`: internal/egress networks once
those networks exist, the CA host paths once the one-shot
`pipelock tls init` has run, and `internal_network_cidr` once
Docker has assigned a subnet to the internal network. Empty
defaults are sentinels meaning "not yet set"; `.start` validates
that they are populated.
`internal_network_cidr` ends up on pipelock's `ssrf.ip_allowlist`
so traffic from sibling sidecars (egress pipelock on the
upstream leg, etc.) bypasses pipelock's RFC1918 SSRF guard while
api_allowlist and body-scanning still apply."""
yaml_path: Path
slug: str
internal_network: str = ""
internal_network_cidr: str = ""
egress_network: str = ""
ca_cert_host_path: Path = Path()
ca_key_host_path: Path = Path()
class PipelockProxy:
"""The pipelock egress proxy. Encapsulates the YAML-config
generation; the container lifecycle is owned by whatever
wraps the daemon (compose-managed pipelock container on docker,
sidecar-bundle PID 1 on smolmachines).
Backends instantiate the class directly there are no
platform-specific subclasses; the in-container CA paths are
universal module-level constants
(`PIPELOCK_CA_CERT_IN_CONTAINER` / `PIPELOCK_CA_KEY_IN_CONTAINER`)."""
def prepare(
self,
bottle: Bottle,
slug: str,
stage_dir: Path,
provider_routes: tuple[EgressRoute, ...] = (),
) -> PipelockProxyPlan:
"""Write the pipelock yaml config (mode 600) under `stage_dir`
and return the plan for launch. Pure host-side, no docker
subprocess.
`slug` is the agent-derived identifier (lowercased,
hyphen-normalized) used as the suffix in every per-agent
resource name the agent container, the sidecar bundle
container, the internal/egress networks. It's stored on the
returned plan so the backend's launch step can derive those
names.
The CA paths the YAML references are the module-level
in-container constants. The host-side counterparts are
generated by the launch step (not here, so prepare stays
side-effect-free on docker) and added to the plan via
`dataclasses.replace` before the daemon starts."""
yaml_path = stage_dir / "pipelock.yaml"
cfg = pipelock_build_config(
bottle,
ca_cert_path=PIPELOCK_CA_CERT_IN_CONTAINER,
ca_key_path=PIPELOCK_CA_KEY_IN_CONTAINER,
provider_routes=provider_routes,
)
yaml_path.write_text(pipelock_render_yaml(cfg))
yaml_path.chmod(0o600)
return PipelockProxyPlan(yaml_path=yaml_path, slug=slug)
+39 -27
View File
@@ -1,7 +1,7 @@
"""Per-bottle sidecar supervisor (PRD 0024 chunk 1).
PID 1 inside the `bot-bottle-sidecars` bundle image. Spawns
the configured daemons (egress, git-gate, supervise),
the configured daemons (egress, pipelock, git-gate, supervise),
forwards SIGTERM/SIGINT to each child, and propagates per-daemon
stdout+stderr to the container log with a `[name] ` prefix.
@@ -19,7 +19,7 @@ PR; the interim policy is "don't take the bundle down for one
sick daemon."
Daemon subset is env-driven. The compose renderer narrows it via
`BOT_BOTTLE_SIDECAR_DAEMONS=egress` for bottles that
`BOT_BOTTLE_SIDECAR_DAEMONS=egress,pipelock` for bottles that
don't use git-gate or supervise. Default: all daemons.
Stdlib-only by design adding supervisord/s6/runit for four
@@ -57,9 +57,15 @@ class _DaemonSpec:
# Env-var name prefixes that carry egress-only credentials.
# `egress_apply.py` assigns `EGRESS_TOKEN_<n>` slots that egress
# reads to inject `Authorization` headers on configured routes;
# no other daemon in the bundle should see these values.
# every other daemon in the bundle (especially pipelock with
# `scan_env: true`) MUST NOT see these values or it'll match the
# injected token in the request egress just sent and 403-block
# the legitimate traffic (issue #84). The agent itself runs in a
# different machine and never has access to these slots in the
# first place, so stripping them from non-egress daemons loses no
# DLP coverage — pipelock can't catch the exfil of a value the
# agent doesn't have.
_EGRESS_ONLY_ENV_PREFIXES: tuple[str, ...] = ("EGRESS_TOKEN_",)
_READY_GATED_DAEMONS: tuple[str, ...] = ("git-gate", "git-http")
def _env_for_daemon(name: str, base_env: dict[str, str]) -> dict[str, str]:
@@ -75,30 +81,28 @@ def _env_for_daemon(name: str, base_env: dict[str, str]) -> dict[str, str]:
}
# Order matters only for first-launch race-window reasons: egress
# starts first so pipelock's upstream connect succeeds during
# pipelock's own startup. git-gate and supervise are independent.
# Pipelock binds 0.0.0.0:8888 explicitly. Without `--listen` it
# defaults to 127.0.0.1 which would be unreachable from sibling
# services on the docker network. The legacy four-sidecar
# compose renderer passed the same flag; the bundle keeps the
# explicit binding.
_DAEMONS: tuple[_DaemonSpec, ...] = (
_DaemonSpec("egress", ("/bin/sh", "/app/egress-entrypoint.sh")),
_DaemonSpec(
"pipelock",
("/usr/local/bin/pipelock", "run",
"--config", "/etc/pipelock.yaml",
"--listen", "0.0.0.0:8888"),
),
_DaemonSpec("git-gate", ("/bin/sh", "/git-gate-entrypoint.sh")),
_DaemonSpec("git-http", ("python3", "/app/git_http_backend.py")),
_DaemonSpec("supervise", ("python3", "/app/supervise_server.py")),
)
def _argv_for_daemon(name: str, argv: Sequence[str], env: dict[str, str]) -> list[str]:
ready_file = env.get("BOT_BOTTLE_GIT_GATE_READY_FILE", "").strip()
if name not in _READY_GATED_DAEMONS or not ready_file:
return list(argv)
return [
"/bin/sh",
"-c",
"while [ ! -f \"$BOT_BOTTLE_GIT_GATE_READY_FILE\" ]; do "
"sleep 0.1; "
"done; "
"exec \"$@\"",
name,
*argv,
]
def _selected_daemons(
env: dict[str, str],
all_daemons: Sequence[_DaemonSpec] | None = None,
@@ -135,13 +139,12 @@ def _pump(name: str, stream: IO[bytes]) -> None:
def _spawn(spec: _DaemonSpec) -> subprocess.Popen[bytes]:
env = _env_for_daemon(spec.name, dict(os.environ))
proc = subprocess.Popen(
_argv_for_daemon(spec.name, spec.argv, env),
list(spec.argv),
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
bufsize=0,
env=env,
env=_env_for_daemon(spec.name, dict(os.environ)),
)
threading.Thread(
target=_pump, args=(spec.name, proc.stdout), daemon=True
@@ -300,8 +303,10 @@ class _Supervisor:
def restart_daemon(self, daemon_name: str, *, grace: float = 5.0) -> bool:
"""Terminate one named child and spawn a fresh one, leaving
the other daemons running. A daemon that has no in-process
reload can be restarted this way after its config file changes.
the other daemons running. Used by the pipelock-apply path:
pipelock has no in-process reload, so apply_allowlist_change
runs `docker kill --signal USR1 <bundle>` after writing the
new yaml; the supervisor catches SIGUSR1 and calls this.
Behavior: SIGTERM wait up to `grace` seconds SIGKILL if
still alive spawn a replacement under the same DaemonSpec.
@@ -309,8 +314,8 @@ class _Supervisor:
forward_signal / shutdown calls reach the new pid.
Returns True iff a daemon by that name was running and a
replacement spawned; False if no such daemon (not wired
for this bottle)."""
replacement spawned; False if no such daemon (the
compose-renderer subset said this bottle doesn't run it)."""
if self.shutdown_at is not None:
_log(f"restart {daemon_name} skipped; supervisor is shutting down")
return False
@@ -362,6 +367,13 @@ def main(argv: Sequence[str] | None = None) -> int:
# delivers SIGHUP to PID 1 (this supervisor); forward it to
# mitmdump so it reloads its addon.
signal.signal(signal.SIGHUP, lambda *_: sup.forward_signal(signal.SIGHUP, "egress")) # type: ignore
# SIGUSR1 pipelock-restart path: pipelock_apply.py runs
# `docker kill --signal USR1 <bundle>` after writing
# pipelock.yaml. Pipelock has no in-process reload, so the
# supervisor restarts the pipelock daemon in place (other
# daemons keep running — specifically supervise, whose MCP
# socket would drop on a whole-container `docker restart`).
signal.signal(signal.SIGUSR1, lambda *_: sup.request_restart("pipelock")) # type: ignore
while not sup.tick():
time.sleep(_POLL_INTERVAL)
+20 -4
View File
@@ -6,7 +6,8 @@ sits on the bottle's internal network and exposes three MCP tools the
agent calls when it hits a stuck-recovery category:
* egress-block agent proposes a new routes.yaml
* capability-block agent proposes a new agent Dockerfile
* pipelock-block agent proposes a new pipelock allowlist
* capability-block agent proposes a new agent Dockerfile
Each tool call: the agent passes the full proposed file plus a
justification text. The sidecar validates the proposal syntactically,
@@ -48,9 +49,13 @@ from pathlib import Path
SUPERVISE_HOSTNAME = "supervise"
SUPERVISE_PORT = 9100
TOOL_EGRESS_BLOCK = "egress-block"
TOOL_PIPELOCK_BLOCK = "pipelock-block"
TOOL_CAPABILITY_BLOCK = "capability-block"
TOOL_LIST_EGRESS_ROUTES = "list-egress-routes"
TOOLS: tuple[str, ...] = (
TOOL_EGRESS_BLOCK,
TOOL_PIPELOCK_BLOCK,
TOOL_CAPABILITY_BLOCK,
TOOL_LIST_EGRESS_ROUTES,
)
@@ -68,8 +73,11 @@ EGRESS_INTROSPECT_URL = "http://_egress.local/allowlist"
# capability-block has no on-disk config the operator edits in place
# (the Dockerfile is rebuilt, not patched), so it has no audit log
# here — those changes are captured by git history + the rebuild
# record laid down in PRD 0016. egress-block was removed in issue #198.
COMPONENT_FOR_TOOL: dict[str, str] = {}
# record laid down in PRD 0016.
COMPONENT_FOR_TOOL: dict[str, str] = {
TOOL_EGRESS_BLOCK: "egress",
TOOL_PIPELOCK_BLOCK: "pipelock",
}
STATUS_APPROVED = "approved"
STATUS_MODIFIED = "modified"
@@ -77,7 +85,8 @@ STATUS_REJECTED = "rejected"
STATUSES: tuple[str, ...] = (STATUS_APPROVED, STATUS_MODIFIED, STATUS_REJECTED)
# Operator-initiated audit entries (no tool call). PRD 0014's
# `routes edit <bottle>` verb writes entries with this action.
# `routes edit <bottle>` and PRD 0015's `pipelock edit <bottle>`
# verbs write entries with this action.
ACTION_OPERATOR_EDIT = "operator-edit"
QUEUE_DIR_IN_CONTAINER = "/run/supervise/queue"
@@ -465,6 +474,8 @@ class Supervise(ABC):
self,
slug: str,
stage_dir: Path,
*,
dockerfile_content: str = "",
) -> SupervisePlan:
"""Stage the per-bottle queue dir on the host and the
current-config dir under `stage_dir`. Returns the plan;
@@ -474,6 +485,9 @@ class Supervise(ABC):
queue_dir.mkdir(parents=True, exist_ok=True)
current_config_dir = stage_dir / "current-config"
current_config_dir.mkdir(parents=True, exist_ok=True)
dockerfile_path = current_config_dir / CURRENT_CONFIG_DOCKERFILE
dockerfile_path.write_text(dockerfile_content)
dockerfile_path.chmod(0o644)
return SupervisePlan(
slug=slug,
queue_dir=queue_dir,
@@ -546,7 +560,9 @@ __all__ = [
"EGRESS_FORWARD_PROXY",
"EGRESS_INTROSPECT_URL",
"TOOL_CAPABILITY_BLOCK",
"TOOL_EGRESS_BLOCK",
"TOOL_LIST_EGRESS_ROUTES",
"TOOL_PIPELOCK_BLOCK",
"archive_proposal",
"audit_dir",
"audit_log_path",
+229 -17
View File
@@ -1,10 +1,8 @@
"""Supervise sidecar HTTP server (PRD 0013).
Per-bottle MCP server exposing tools the agent calls to propose config
changes when stuck. The egress-block tool was removed in issue #198;
the remaining tools are `capability-block` and `list-egress-routes`.
Each queued tool call:
Per-bottle MCP server exposing three tools `egress-block`,
`pipelock-block`, `capability-block` that the agent calls to
propose config changes when stuck. Each tool call:
1. Validates the proposed file syntactically.
2. Writes a Proposal to /run/supervise/queue/ (bind-mounted from
@@ -20,7 +18,7 @@ Speaks MCP over HTTP+JSON-RPC. Methods handled:
* `initialize` handshake; returns server info + caps.
* `notifications/initialized` ack-only.
* `tools/list` returns the tool definitions.
* `tools/list` returns the three tool definitions.
* `tools/call` validates, queues, blocks, returns.
Everything else returns JSON-RPC error -32601 (method not found).
@@ -40,6 +38,7 @@ import sys
import time
import typing
import urllib.error
import urllib.parse
import urllib.request
from dataclasses import dataclass
from pathlib import Path
@@ -135,15 +134,84 @@ def jsonrpc_error(request_id: object, code: int, message: str) -> bytes:
TOOL_DEFINITIONS: list[dict[str, object]] = [
{
"name": _sv.TOOL_EGRESS_BLOCK,
"description": (
"Call when egress refused your HTTPS request — host "
"without a matching route, or a path outside the route's "
"path_allowlist (typically a 403 from the proxy). Propose "
"a SINGLE route to add: the host you need + (optionally) "
"a path_allowlist + (optionally) an auth block. The "
"supervisor merges the route into the live table at "
"approval time — you do NOT need to see or reproduce the "
"existing routes, and you do not pass a full routes file. "
"If the host already has a route, the proposed "
"path_allowlist entries are unioned with the existing "
"ones (host stays single-route). The operator approves "
"or rejects in the supervise TUI. On approval the "
"supervisor writes the merged routes.yaml, SIGHUPs "
"egress (atomic swap, no dropped connections), and "
"mirrors the host onto pipelock's allowlist for the "
"downstream gate."
),
"inputSchema": {
"type": "object",
"properties": {
"host": {
"type": "string",
"description": (
"The hostname to allow (e.g. 'api.github.com'). "
"Case-insensitive on match."
),
},
"path_allowlist": {
"type": "array",
"items": {"type": "string"},
"description": (
"Optional URL path prefixes the route permits. "
"Each must start with '/'. Omit to allow all "
"paths under this host (bare-pass route)."
),
},
"auth": {
"type": "object",
"description": (
"Optional credential injection. {scheme, "
"token_ref}: scheme is 'Bearer' or 'token'; "
"token_ref names the host env var holding the "
"secret value. Omit to add a host without "
"credential injection. Ignored if the host "
"already has a route (operator decides auth "
"changes, not the agent)."
),
"properties": {
"scheme": {"type": "string"},
"token_ref": {"type": "string"},
},
"required": ["scheme", "token_ref"],
"additionalProperties": False,
},
"justification": {
"type": "string",
"description": "Why this host needs to be allowed.",
},
},
"required": ["host", "justification"],
},
},
{
"name": _sv.TOOL_LIST_EGRESS_ROUTES,
"description": (
"List the current egress route table — the bottle's "
"allowlist. Returns JSON with one entry per allowed host, "
"each carrying its matches rules (if any) and whether "
"the proxy injects Authorization for the route. Use this "
"before composing an `egress-block` proposal so the new "
"routes file extends the live one rather than replacing it."
"primary egress allowlist. Returns JSON with one entry "
"per allowed host, each carrying its path_allowlist (if "
"any) and whether the proxy injects Authorization for "
"the route. Use this before composing an "
"`egress-block` proposal so the new routes file "
"extends the live one rather than replacing it. "
"Pipelock's allowlist is a mirror of this set — every "
"host listed here is also reachable through pipelock's "
"downstream hostname gate."
),
"inputSchema": {
"type": "object",
@@ -151,12 +219,48 @@ TOOL_DEFINITIONS: list[dict[str, object]] = [
"additionalProperties": False,
},
},
{
"name": _sv.TOOL_PIPELOCK_BLOCK,
"description": (
"Call when pipelock refused your outbound request and "
"the failing host is genuinely missing from the bottle's "
"allowlist (vs. blocked for DLP reasons — those need a "
"different remediation). In practice pipelock's allowlist "
"is now a mirror of the egress routes set by "
"`egress-block`, so prefer that tool when you want "
"to add a host. This tool stays available for the rare "
"case where pipelock and egress have diverged. "
"Pass the full URL you tried to hit (scheme + host + "
"path); the supervisor extracts the hostname and merges "
"it into pipelock's allowlist. On approval the "
"supervisor restarts pipelock."
),
"inputSchema": {
"type": "object",
"properties": {
"failed_url": {
"type": "string",
"description": (
"The full URL pipelock blocked, e.g. "
"https://api.github.com/repos/foo/bar. Scheme "
"and hostname are required; path is recorded "
"as operator context."
),
},
"justification": {
"type": "string",
"description": "Why the new host should be allowed.",
},
},
"required": ["failed_url", "justification"],
},
},
{
"name": _sv.TOOL_CAPABILITY_BLOCK,
"description": (
"Call when the bottle is missing a tool, skill, permission, "
"or env var you need — something that lives in the agent "
"Dockerfile rather than in the egress routes. "
"Dockerfile rather than in routes or the pipelock allowlist. "
"Read the current Dockerfile from "
"/etc/bot-bottle/current-config/Dockerfile, compose a "
"modified version, and pass the full new file plus a "
@@ -182,10 +286,27 @@ TOOL_DEFINITIONS: list[dict[str, object]] = [
]
# Map each non-egress tool to the input field that carries the agent's
# payload (stored in Proposal.proposed_file). egress-block builds its
# payload from structured input fields in `handle_egress_block`.
# Map each tool to the input field that carries the agent's
# tool-specific payload (stored in Proposal.proposed_file as
# free-form text the apply path interprets per tool).
#
# egress-block: JSON object describing a SINGLE route to
# add — `{host, path_allowlist?, auth?}`. The
# supervisor merges this into the live routes
# file at approval time.
# pipelock-block: the full failed URL (scheme + host + path) —
# supervisor extracts the host, merges into the
# bottle's current allowlist; the path is shown
# to the operator for context (pipelock doesn't
# do path-level matching).
# capability-block: full proposed Dockerfile
#
# Egress-proxy-block doesn't use a single "field name" → the JSON
# payload is constructed from multiple structured input fields in
# `handle_egress_block`. The mapping stays one-entry-per-tool
# so the generic dispatch keeps working for the other two.
PROPOSED_FILE_FIELD: dict[str, str] = {
_sv.TOOL_PIPELOCK_BLOCK: "failed_url",
_sv.TOOL_CAPABILITY_BLOCK: "dockerfile",
}
@@ -193,13 +314,34 @@ PROPOSED_FILE_FIELD: dict[str, str] = {
# --- Validation ------------------------------------------------------------
# Auth schemes accepted on egress-block proposals — match the
# manifest-side EGRESS_AUTH_SCHEMES.
_AUTH_SCHEMES = ("Bearer", "token")
def validate_proposed_file(tool: str, content: str) -> None:
"""Syntactic validation. The operator is the real gate; this just
catches obvious paste-errors / wrong-tool selections before they
enter the queue."""
if not content.strip():
raise _RpcError(ERR_INVALID_PARAMS, f"{tool}: proposed file is empty")
if tool == _sv.TOOL_CAPABILITY_BLOCK:
if tool == _sv.TOOL_PIPELOCK_BLOCK:
# `content` is the full failed URL. Require scheme + host so
# the supervisor can extract a hostname for the allowlist
# merge; the path is preserved for operator context.
parsed = urllib.parse.urlsplit(content.strip())
if parsed.scheme not in ("http", "https"):
raise _RpcError(
ERR_INVALID_PARAMS,
f"{tool}: failed_url must start with http:// or https:// "
f"(got {content!r})",
)
if not parsed.hostname:
raise _RpcError(
ERR_INVALID_PARAMS,
f"{tool}: failed_url is missing a hostname (got {content!r})",
)
elif tool == _sv.TOOL_CAPABILITY_BLOCK:
# Dockerfiles are too varied to validate syntactically beyond
# non-empty. The operator reads the diff in the TUI.
pass
@@ -207,6 +349,70 @@ def validate_proposed_file(tool: str, content: str) -> None:
raise _RpcError(ERR_INVALID_PARAMS, f"unknown tool {tool!r}")
def _validate_and_bundle_egress_route(
args: dict[str, object],
) -> str:
"""Validate egress-block input fields and bundle them into
a JSON string that becomes the Proposal.proposed_file. Raises
_RpcError on bad input the agent retries with a fixed shape."""
tool = _sv.TOOL_EGRESS_BLOCK
host = args.get("host")
if not isinstance(host, str) or not host.strip():
raise _RpcError(
ERR_INVALID_PARAMS,
f"{tool}: 'host' is required and must be a non-empty string",
)
payload: dict[str, object] = {"host": host}
path_allow_raw = args.get("path_allowlist")
if path_allow_raw is not None:
if not isinstance(path_allow_raw, list):
raise _RpcError(
ERR_INVALID_PARAMS,
f"{tool}: 'path_allowlist' must be an array of strings",
)
prefixes: list[str] = []
for i, p in enumerate(path_allow_raw):
if not isinstance(p, str):
raise _RpcError(
ERR_INVALID_PARAMS,
f"{tool}: path_allowlist[{i}] must be a string",
)
if not p.startswith("/"):
raise _RpcError(
ERR_INVALID_PARAMS,
f"{tool}: path_allowlist[{i}] {p!r} must start with '/'",
)
prefixes.append(p)
if prefixes:
payload["path_allowlist"] = prefixes
auth_raw = args.get("auth")
if auth_raw is not None:
if not isinstance(auth_raw, dict):
raise _RpcError(
ERR_INVALID_PARAMS,
f"{tool}: 'auth' must be an object with 'scheme' and 'token_ref'",
)
scheme = auth_raw.get("scheme")
token_ref = auth_raw.get("token_ref")
if not isinstance(scheme, str) or scheme not in _AUTH_SCHEMES:
raise _RpcError(
ERR_INVALID_PARAMS,
f"{tool}: auth.scheme must be one of "
f"{', '.join(_AUTH_SCHEMES)} (got {scheme!r})",
)
if not isinstance(token_ref, str) or not token_ref:
raise _RpcError(
ERR_INVALID_PARAMS,
f"{tool}: auth.token_ref must be a non-empty string "
f"naming the host env var holding the token",
)
payload["auth"] = {"scheme": scheme, "token_ref": token_ref}
return json.dumps(payload, indent=2) + "\n"
# --- MCP handlers ----------------------------------------------------------
@@ -292,7 +498,13 @@ def handle_tools_call(
f"{name}: 'justification' is required and must be a non-empty string",
)
if name in PROPOSED_FILE_FIELD:
if name == _sv.TOOL_EGRESS_BLOCK:
# Structured input → JSON bundle on Proposal.proposed_file.
# The dashboard's apply step (egress_apply.add_route)
# parses this JSON, fetches the current routes, merges in
# the new one, and writes the merged file.
proposed_file = _validate_and_bundle_egress_route(args_raw)
elif name in PROPOSED_FILE_FIELD:
file_field = PROPOSED_FILE_FIELD[name]
proposed_file = args_raw.get(file_field)
if not isinstance(proposed_file, str):
+37 -31
View File
@@ -63,12 +63,18 @@ from typing import cast
class YamlSubsetError(ValueError):
"""Raised when input violates the YAML subset's rules. Callers
that want fatal-exit semantics (manifest loader, egress-apply,
that want fatal-exit semantics (manifest loader, pipelock-apply,
etc.) catch this at their own boundary and forward to `die`;
callers running outside the bot-bottle CLI process (the
egress sidecar's addon) handle it as a normal exception."""
def die(msg: str) -> None:
"""Module-local helper so the parser body reads cleanly. Just
raises YamlSubsetError the `bot-bottle: error: ` prefix
is added by the boundary `die` in `bot_bottle.log`."""
raise YamlSubsetError(msg)
# --- Tokenizer / line preprocessing ----------------------------------------
@@ -113,7 +119,7 @@ def _tokenize(text: str) -> list[_Line]:
# editors render them differently and the spec says spaces.
leading = len(raw) - len(raw.lstrip(" \t"))
if "\t" in raw[:leading]:
raise YamlSubsetError(f"yaml-subset: tab character in indent on line {n}")
die(f"yaml-subset: tab character in indent on line {n}")
stripped = raw.strip()
if not stripped:
continue
@@ -163,14 +169,14 @@ def _parse_scalar(s: str, lineno: int) -> object:
s.startswith("'") and s.endswith("'")
):
if len(s) < 2:
raise YamlSubsetError(f"yaml-subset: unterminated quoted string on line {lineno}")
die(f"yaml-subset: unterminated quoted string on line {lineno}")
body = s[1:-1]
if s.startswith('"'):
# JSON-style escapes for double quotes.
try:
return body.encode("utf-8").decode("unicode_escape")
except UnicodeDecodeError as e:
raise YamlSubsetError(f"yaml-subset: bad escape on line {lineno}: {e}")
die(f"yaml-subset: bad escape on line {lineno}: {e}")
else:
# Single quotes: only '' → ' (standard YAML); no other escapes.
return body.replace("''", "'")
@@ -180,7 +186,7 @@ def _parse_scalar(s: str, lineno: int) -> object:
if s in _RESERVED_BOOL_LIKE:
if s in ("true", "false"):
return s == "true"
raise YamlSubsetError(
die(
f"yaml-subset: bare {s!r} on line {lineno} is ambiguous "
f"(use literal `true` / `false`, or quote it as a string)"
)
@@ -197,22 +203,22 @@ def _parse_scalar(s: str, lineno: int) -> object:
# Look-alikes that we reject to keep the user in control.
if _DATE_RX.match(s):
raise YamlSubsetError(
die(
f"yaml-subset: bare {s!r} on line {lineno} looks like a "
f"date — quote it as a string or use an explicit int"
)
if _OCTAL_RX.match(s):
raise YamlSubsetError(
die(
f"yaml-subset: bare {s!r} on line {lineno} looks like an "
f"octal/0-prefixed integer — quote it as a string"
)
if _HEX_RX.match(s):
raise YamlSubsetError(
die(
f"yaml-subset: bare {s!r} on line {lineno} looks like a "
f"hex integer — quote it as a string"
)
if _FLOAT_RX.match(s):
raise YamlSubsetError(
die(
f"yaml-subset: floats not supported (line {lineno}, "
f"value {s!r}); use an int or quote as a string"
)
@@ -235,7 +241,7 @@ def _parse_inline(s: str, lineno: int) -> object:
s = s.strip()
if s.startswith("["):
if not s.endswith("]"):
raise YamlSubsetError(f"yaml-subset: unterminated `[` on line {lineno}")
die(f"yaml-subset: unterminated `[` on line {lineno}")
body = s[1:-1].strip()
if not body:
return []
@@ -246,21 +252,21 @@ def _parse_inline(s: str, lineno: int) -> object:
return items
if s.startswith("{"):
if not s.endswith("}"):
raise YamlSubsetError(f"yaml-subset: unterminated `{{` on line {lineno}")
die(f"yaml-subset: unterminated `{{` on line {lineno}")
body = s[1:-1].strip()
if not body:
return {}
out: dict[str, object] = {}
for raw in _split_flow(body, lineno, "dict"):
if ":" not in raw:
raise YamlSubsetError(
die(
f"yaml-subset: inline dict entry on line {lineno} "
f"missing `:` ({raw!r})"
)
k, _, v = raw.partition(":")
k = k.strip()
if not _BARE_RX.match(k):
raise YamlSubsetError(
die(
f"yaml-subset: inline dict key on line {lineno} "
f"must be a bare identifier ({k!r})"
)
@@ -290,7 +296,7 @@ def _split_flow(body: str, lineno: int, kind: str) -> list[str]:
elif ch in "]}":
depth_b -= 1
if depth_b > 0:
raise YamlSubsetError(
die(
f"yaml-subset: nested flow {kind} on line "
f"{lineno} (only one level of flow allowed)"
)
@@ -324,7 +330,7 @@ def _split_key_value(content: str, lineno: int) -> tuple[str, str]:
# ambiguous with URLs etc.).
if i + 1 >= len(content) or content[i + 1] in (" ", "\t"):
return content[:i].strip(), content[i + 1:].lstrip()
raise YamlSubsetError(f"yaml-subset: line {lineno} missing `: ` separator: {content!r}")
die(f"yaml-subset: line {lineno} missing `: ` separator: {content!r}")
return "", "" # unreachable, but needed for type checker
@@ -335,15 +341,15 @@ def _parse_block(
to live at `base_indent`. Returns (value, new_idx) where
`new_idx` is the index of the first unconsumed line."""
if idx >= len(lines):
raise YamlSubsetError("yaml-subset: unexpected end of document")
die("yaml-subset: unexpected end of document")
first = lines[idx]
if first.indent < base_indent:
raise YamlSubsetError(
die(
f"yaml-subset: line {first.lineno} indented less than "
f"expected (got {first.indent}, expected >= {base_indent})"
)
if first.indent > base_indent:
raise YamlSubsetError(
die(
f"yaml-subset: line {first.lineno} indented more than "
f"expected (got {first.indent}, expected {base_indent})"
)
@@ -360,18 +366,18 @@ def _parse_block_mapping(
while idx < len(lines) and lines[idx].indent == base_indent:
line = lines[idx]
if line.content.startswith("- "):
raise YamlSubsetError(
die(
f"yaml-subset: line {line.lineno} unexpected list "
f"item at mapping indent (got `-`, expected `key:`)"
)
key, value_text = _split_key_value(line.content, line.lineno)
if not _BARE_RX.match(key):
raise YamlSubsetError(
die(
f"yaml-subset: line {line.lineno} key {key!r} is not "
f"a bare identifier"
)
if key in out:
raise YamlSubsetError(
die(
f"yaml-subset: line {line.lineno} duplicate key {key!r}"
)
if value_text:
@@ -411,7 +417,7 @@ def _parse_block_list(
content_col = base_indent + 2
first_key, first_value_text = _split_key_value(rest, line.lineno)
if not _BARE_RX.match(first_key):
raise YamlSubsetError(
die(
f"yaml-subset: line {line.lineno} key {first_key!r} "
f"is not a bare identifier"
)
@@ -434,12 +440,12 @@ def _parse_block_list(
break # next list item, not a sibling key
k, v_text = _split_key_value(ln.content, ln.lineno)
if not _BARE_RX.match(k):
raise YamlSubsetError(
die(
f"yaml-subset: line {ln.lineno} key {k!r} is "
f"not a bare identifier"
)
if k in item:
raise YamlSubsetError(f"yaml-subset: line {ln.lineno} duplicate key {k!r}")
die(f"yaml-subset: line {ln.lineno} duplicate key {k!r}")
if v_text:
item[k] = _parse_inline(v_text, ln.lineno)
idx += 1
@@ -495,7 +501,7 @@ def parse_yaml_subset(text: str) -> dict[str, object]:
for n, raw in enumerate(text.splitlines(), start=1):
s = raw.strip()
if s.startswith("|") or s.startswith(">") or s.startswith("- |") or s.startswith("- >"):
raise YamlSubsetError(
die(
f"yaml-subset: line {n} uses a multi-line block "
f"scalar (`|` / `>`) — not supported. Use a quoted "
f"single-line string instead."
@@ -505,12 +511,12 @@ def parse_yaml_subset(text: str) -> dict[str, object]:
# not when it's inside a quoted string. Cheap check: any
# bare `&foo:` / `*foo` at the start of a value position.
if re.search(r"(^|\s)[&*][A-Za-z0-9_]+", s):
raise YamlSubsetError(
die(
f"yaml-subset: line {n} uses anchors / aliases "
f"(`&` / `*`) — not supported."
)
if "!!" in s and not (s.count("'") % 2 or s.count('"') % 2):
raise YamlSubsetError(
die(
f"yaml-subset: line {n} uses a YAML tag (`!!`) — not "
f"supported."
)
@@ -520,18 +526,18 @@ def parse_yaml_subset(text: str) -> dict[str, object]:
return {}
base_indent = lines[0].indent
if base_indent != 0:
raise YamlSubsetError(
die(
f"yaml-subset: top-level content must start in column 0 "
f"(got column {base_indent} on line {lines[0].lineno})"
)
value, consumed = _parse_block(lines, 0, 0)
if consumed < len(lines):
raise YamlSubsetError(
die(
f"yaml-subset: trailing content starting on line "
f"{lines[consumed].lineno}"
)
if not isinstance(value, dict):
raise YamlSubsetError("yaml-subset: top-level value must be a mapping")
die("yaml-subset: top-level value must be a mapping")
return cast(dict[str, object], value)
@@ -570,7 +576,7 @@ def parse_frontmatter(text: str) -> tuple[dict[str, object], str]:
fm_end_lineno = line_idx
break
if body_start < 0:
raise YamlSubsetError("frontmatter: opening `---` has no matching closing `---`")
die("frontmatter: opening `---` has no matching closing `---`")
fm_text = text[line_starts[1]:line_starts[fm_end_lineno]] if fm_end_lineno > 1 else ""
fm = parse_yaml_subset(fm_text)
+4 -6
View File
@@ -22,9 +22,7 @@ mounted in. That topology breaks two assumptions those tests make:
`http://127.0.0.1:<host_port>` from inside the job time out.
The affected tests (`test_orphan_cleanup.test_create_and_remove`,
`test_sidecar_bundle_image.TestSidecarBundleImage`,
`test_sidecar_bundle_compose.TestSidecarBundleCompose`) still run
locally where the test process and Docker daemon share a host.
Making them work in CI is a follow-up: either re-write them to
discover container IPs via `docker inspect`, or reconfigure the
runner with host networking.
`test_pipelock_sidecar_smoke.test_smoke`) still run locally where the
test process and Docker daemon share a host. Making them work in CI
is a follow-up: either re-write them to discover container IPs via
`docker inspect`, or reconfigure the runner with host networking.
+4 -4
View File
@@ -13,13 +13,13 @@ Add Content-Length validation and a body-size cap to `git_http_backend.py` so ma
`bot_bottle/git_http_backend.py` calls `int(self.headers.get("Content-Length", 0))` without catching `ValueError`. A request with a non-numeric Content-Length raises an unhandled exception in the request handler.
The handler reads the full declared length into memory before passing the body to `git http-backend` with no upper bound. A local or compromised client can force arbitrarily high memory use.
The handler reads the full declared length into memory before passing the body to `git http-backend` with no upper bound. A local or compromised client can force arbitrarily high memory use. For comparison, `supervise_server.py` caps request bodies at 1 MiB.
## Goals / Success Criteria
- A missing or non-numeric Content-Length returns HTTP 400.
- A negative Content-Length returns HTTP 400.
- A body larger than the cap (100 MiB) returns HTTP 413.
- A body larger than the cap (1 MiB, matching `supervise_server.py`) returns HTTP 413.
- Valid Git smart-HTTP pushes and fetches continue to work.
- Unit tests cover: missing length, non-numeric length, negative length, over-cap length, and a valid push/fetch passthrough.
@@ -43,12 +43,12 @@ Out of scope:
## Design
Wrap the Content-Length parse in a try/except and return 400 on `ValueError`. Add an explicit check for negative values. After parsing, compare the declared length against a module-level `MAX_BODY_BYTES` constant (default 100 MiB) and return 413 if exceeded. Read exactly `min(content_length, MAX_BODY_BYTES)` bytes.
Wrap the Content-Length parse in a try/except and return 400 on `ValueError`. Add an explicit check for negative values. After parsing, compare the declared length against a module-level `MAX_BODY_BYTES` constant (default 1 MiB) and return 413 if exceeded. Read exactly `min(content_length, MAX_BODY_BYTES)` bytes.
## Testing Strategy
- Unit tests using `unittest.mock` to drive the handler with crafted headers.
- Test cases: no Content-Length header, `Content-Length: abc`, `Content-Length: -1`, a declared length above `MAX_BODY_BYTES`, and a normal small POST body.
- Test cases: no Content-Length header, `Content-Length: abc`, `Content-Length: -1`, `Content-Length: 2097152` (over cap), and a normal small POST body.
Run:

Some files were not shown because too many files have changed in this diff Show More