Commit Graph

7 Commits

Author SHA1 Message Date
didericis 5b9ceaaaee fix(sidecars): per-daemon pipelock restart keeps supervise socket alive
test / unit (pull_request) Successful in 21s
test / integration (pull_request) Successful in 43s
`apply_allowlist_change` used `docker restart <bundle>` to make
pipelock reload, which bounced ALL four daemons — including
supervise, whose MCP socket the agent's claude-code client had
open. That dropped the connection. A second apply works because
supervise has come back up by then.

Fix: per-daemon restart via SIGUSR1.

- New `_Supervisor.restart_daemon(name)` terminates one named
  child and spawns a replacement in place. Other daemons keep
  running.
- main() wires SIGUSR1 → `restart_daemon("pipelock")`. Pipelock
  has no in-process reload, so this is its analog of egress's
  SIGHUP-reload-addon path. Pipelock is the only daemon that
  currently needs hot-config reload via restart; if others
  acquire the need, add a new signal.
- `apply_allowlist_change` now `docker kill --signal USR1
  <bundle>` instead of `docker restart`. Supervise / egress /
  git-gate keep running across the apply.

Tests:
- New `_Supervisor.restart_daemon` cases: replaces in place
  (different pid post-restart, sibling daemon unchanged),
  unknown name is a no-op, restart-during-shutdown is a no-op.
- `test_pipelock_apply` rewritten to bring up the bundle image
  with `CLAUDE_BOTTLE_SIDECAR_DAEMONS=pipelock` so the
  supervisor is PID 1 and handles SIGUSR1. The previous
  standalone-pipelock setup wouldn't survive SIGUSR1 (pipelock
  default disposition is terminate). Test builds the bundle
  image in setUpClass (cached layers make repeat runs fast).

531 tests passing locally (unit + integration).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 02:12:37 -04:00
didericis 62f6f8db34 refactor(sidecars): bundle is the only shape (PRD 0024 chunk 5)
test / unit (pull_request) Successful in 21s
test / integration (pull_request) Successful in 43s
The CLAUDE_BOTTLE_SIDECAR_BUNDLE feature flag is gone. Every
bottle ships with the agent + bundle pair — no opt-in, no legacy
four-sidecar fallback.

Changes:

- Renderer (compose.py): bottle_plan_to_compose unconditionally
  emits {agent, sidecars}. Deleted _pipelock_service,
  _git_gate_service, _egress_service, _supervise_service helpers.
  _agent_service.depends_on collapses to ["sidecars"].

- sidecar_bundle.py: deleted sidecar_bundle_enabled (the flag
  parser). SIDECAR_BUNDLE_IMAGE + container-name helper stay.

- pipelock_apply.py: docker cp + docker restart now target
  sidecar_bundle_container_name(slug). Bundle restart bounces
  all four daemons together (per-daemon reload is the eventual
  feature, not v1).

- Per-sidecar modules trimmed:
  - egress.py: dropped EGRESS_IMAGE, EGRESS_DOCKERFILE,
    build_egress_image, egress_url. Kept EGRESS_PORT, CA paths,
    egress_container_name (still used by the renderer's network
    aliases).
  - git_gate.py: dropped GIT_GATE_IMAGE, GIT_GATE_DOCKERFILE,
    build_git_gate_image. Kept git_gate_host + GIT_GATE_PORT.
  - supervise.py: dropped SUPERVISE_IMAGE, SUPERVISE_DOCKERFILE,
    build_supervise_image, supervise_url.

- Deleted Dockerfile.{egress,git-gate,supervise}. The bundle's
  Dockerfile.sidecars is the only sidecar image now.

- test_compose.py: deleted TestPipelockAlwaysPresent,
  TestConditionalGitGate, TestConditionalEgress,
  TestConditionalSupervise, TestFullMatrix (legacy-shape only),
  TestSidecarBundleFlag (flag is gone). TestSidecarBundleShape
  drops its patch.dict wrapper. TestAgentAlwaysPresent's
  depends_on cases collapse to one.

- test_pipelock_apply.py: bringup container name uses
  sidecar_bundle_container_name(slug) to match the production
  target.

- README.md Architecture section rewritten to describe the
  agent + bundle pair.

Net: -626 lines.

Test status: 498 unit + 27 integration + 1 skipped (chunk-4
pending — superseded by this chunk's rewrite). Locally verified
end-to-end bottle launch produces exactly 2 containers
(claude-bottle-<slug> + claude-bottle-sidecars-<slug>).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 01:37:21 -04:00
didericis 3c2585cb98 fix(apply): write routes/pipelock yaml in place, not via rename
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m6s
PRD 0018 chunk 3's atomicity fix used write-temp-then-rename to
update bind-mounted config files. POSIX rename atomically swaps
the inode at the host path — but Docker single-file bind mounts
on Linux pin the source inode at mount time, so post-rename the
container's mount points at the now-orphaned old inode and never
sees the new content. The egress sidecar's SIGHUP-driven reload
re-reads the same stale file → "egress route updates aren't
updatable via the supervisor anymore".

Switch egress_apply + pipelock_apply to write in place (same
inode, truncated + rewritten). Lose file-level POSIX atomicity,
but:

  - egress: SIGHUP fires only AFTER the write returns; the
    addon's `load_routes` raises `ValueError` on a partial read
    and keeps the previous in-memory routes, so the in-process
    race window (already narrow) is non-disruptive.
  - pipelock: applies via `docker restart` rather than SIGHUP;
    restart serializes after the host write completes, so the
    container reads the fully-written file on next boot.

macOS Docker Desktop's file-sharing layer (virtiofs / osxfs)
silently re-resolves the path on rename, which is why this bug
didn't surface in dev tests on macOS. Linux native Docker is
the strict reading; the fix works on both.
2026-05-26 02:31:46 -04:00
didericis c9825cf701 refactor(egress): write routes.yaml as actual YAML, not JSON-in-yml
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m7s
`egress_render_routes` now emits hand-rolled YAML in the same style
as `pipelock_render_yaml`. The egress addon parses it via
`yaml_subset.parse_yaml_subset` — the same parser the manifest
loader + pipelock_apply use.

Why bother: routes.yaml is bind-mounted into the egress sidecar
AND surfaced to operators through `routes edit` (PRD 0019). JSON-
in-yml renders ugly in $EDITOR and signals "this is data" rather
than "this is config you can read at a glance". Real YAML reads
cleanly.

Mechanics:

  - `yaml_subset.py` drops its `claude_bottle.log` dependency.
    Errors now raise `YamlSubsetError` (a `ValueError`); the
    manifest loader + pipelock_apply catch it at the boundary
    and forward to `die` / `PipelockApplyError` so callers see
    the same behavior they did before.
  - `Dockerfile.egress` adds one COPY line for `yaml_subset.py`
    so it sits flat in `/app/` next to the addon. The addon
    uses an absolute-import-with-fallback shim so the same file
    works inside the container AND from the host's unit tests.
  - `egress_apply._merge_single_route` round-trips current
    routes.yaml through `parse_yaml_subset` + a new
    `_render_routes_payload` helper instead of `json.loads` +
    `json.dumps`.

End-to-end: rebuilt the egress image, ran `./cli.py start` to a
full bring-up, confirmed the addon's boot log shows `egress:
loaded 9 route(s)` — i.e., the YAML parses inside the container.
453 unit + 3 integration tests pass.
2026-05-26 02:17:42 -04:00
didericis cefdc8c6e9 feat(launch): switch start to docker compose project per bottle
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m5s
PRD 0018 chunk 3. Each instance is now one `docker compose` project:

  - launch.py renders the compose spec via chunk-1's
    bottle_plan_to_compose, writes it to state/<slug>/docker-compose.yml,
    `docker compose up -d`s, and (on teardown) dumps
    `docker compose logs --no-color --timestamps` to
    state/<slug>/compose.log before `docker compose down`.
  - Networks are pre-created (`docker network create --internal` +
    user-defined bridge) so pipelock yaml can know the internal CIDR
    before compose-up. Compose references them with `external: true`;
    the launch step's ExitStack still owns network removal.
  - Agent still runs `sleep infinity`; claude reaches it via
    `docker exec -it` exactly like before (per the PRD's resolved
    TTY question).
  - metadata.json grows a `compose_project` field so dashboard /
    cleanup tooling can derive compose invocations without
    re-deriving the slug.

Security follow-ups from chunk-2 review:

  (b) CA private keys: pipelock + egress ca-key.pem land at 0o600
      explicitly. The mitmproxy cert+key concat stays 0o644 because
      the egress container's uid-1000 user reads it through the
      bind mount; parent dir at 0o700 still restricts host-side
      reach.
  (c) Apply atomicity: egress_apply + pipelock_apply switch from
      `docker cp` to host-side write-temp-then-rename on the
      bind-mount source. POSIX rename is atomic on the same
      filesystem, so a sidecar SIGHUP racing the apply can't see
      a half-written routes.yaml / pipelock.yaml.

Per-sidecar Docker{Sidecar}.start/stop methods stay in place — the
integration test suite drives them directly to validate each image
in isolation, which is still useful. launch.py no longer calls
them; a follow-up chunk can prune if the integration tests move to
the compose lifecycle.

git-gate entrypoint's chmod 600 on the keyfile + known_hosts now
tolerates EROFS (`|| true`) — the host SSH key is already 0600
(SSH refuses to load otherwise), so the inside-container chmod
was already a no-op in the docker-cp path and now just needs to
not error on the read-only bind mount.

422 unit tests pass; supervise integration test passes; end-to-end
`./cli.py start implementer` brings up the project, attaches,
captures full merged logs on teardown, and reaps all containers +
networks.
2026-05-25 23:16:40 -04:00
didericis 4fada1651b test(pipelock): integration test for apply_allowlist_change (PRD 0015)
test / unit (pull_request) Successful in 16s
test / integration (pull_request) Successful in 1m8s
Phase 4 of PRD 0015. End-to-end test against real Docker:

- Brings up a real pipelock sidecar via the production
  DockerPipelockProxy bring-up + pipelock_tls_init.
- Calls apply_allowlist_change to add a new host.
- Polls the live /etc/pipelock.yaml until the new host shows up
  (bridging the docker-restart window).
- Verifies api_allowlist contains both old + new hosts and
  tls_interception block is preserved.
- Smaller cases: invalid hostname raises, missing sidecar raises,
  fetch_current_allowlist returns one-per-line format.

Skipped under GITEA_ACTIONS because pipelock_tls_init bind-mounts a
host path that doesn't share fs in the runner, matching the
existing pipelock smoke test's skip pattern.

Drive-by fix: fetch_current_yaml now uses `docker cp` (daemon-API
tarball copy) instead of `docker exec cat` because the pipelock
image is distroless and has no shell utilities.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 05:07:26 -04:00
didericis c05457fbef feat(pipelock): host-side apply_allowlist_change helper (PRD 0015)
Phase 1 of PRD 0015. New module
claude_bottle/backend/docker/pipelock_apply.py:

- fetch_current_yaml(slug): docker exec cat of the live
  /etc/pipelock.yaml.
- fetch_current_allowlist(slug): parses the yaml, extracts
  api_allowlist, renders as one-per-line for the operator/agent.
- parse_allowlist_content / render_allowlist_content: one-per-line
  with `#` comments + blank-line tolerance, conservative hostname
  validation.
- apply_allowlist_change(slug, new): parses new hosts, fetches +
  parses current yaml, swaps api_allowlist, re-renders via
  pipelock_render_yaml, docker cp into sidecar, docker restart.
  Returns (before, after) as one-per-line strings for the audit diff.
- PipelockApplyError: caller surfaces to operator without crashing
  the dashboard.

v1 uses restart, not SIGHUP — pipelock has no in-process reload
hook; adding one is the PRD's open question. Restart drops in-flight
outbound calls and the agent retries pick up the restarted proxy.

Yaml roundtrip is covered by tests: parse(render(cfg)) preserves
all fields pipelock_render_yaml emits, including tls_interception
+ passthrough_domains.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:59:13 -04:00