fix(sidecars): per-daemon pipelock restart keeps supervise socket alive
test / unit (pull_request) Successful in 21s
test / integration (pull_request) Successful in 43s

`apply_allowlist_change` used `docker restart <bundle>` to make
pipelock reload, which bounced ALL four daemons — including
supervise, whose MCP socket the agent's claude-code client had
open. That dropped the connection. A second apply works because
supervise has come back up by then.

Fix: per-daemon restart via SIGUSR1.

- New `_Supervisor.restart_daemon(name)` terminates one named
  child and spawns a replacement in place. Other daemons keep
  running.
- main() wires SIGUSR1 → `restart_daemon("pipelock")`. Pipelock
  has no in-process reload, so this is its analog of egress's
  SIGHUP-reload-addon path. Pipelock is the only daemon that
  currently needs hot-config reload via restart; if others
  acquire the need, add a new signal.
- `apply_allowlist_change` now `docker kill --signal USR1
  <bundle>` instead of `docker restart`. Supervise / egress /
  git-gate keep running across the apply.

Tests:
- New `_Supervisor.restart_daemon` cases: replaces in place
  (different pid post-restart, sibling daemon unchanged),
  unknown name is a no-op, restart-during-shutdown is a no-op.
- `test_pipelock_apply` rewritten to bring up the bundle image
  with `CLAUDE_BOTTLE_SIDECAR_DAEMONS=pipelock` so the
  supervisor is PID 1 and handles SIGUSR1. The previous
  standalone-pipelock setup wouldn't survive SIGUSR1 (pipelock
  default disposition is terminate). Test builds the bundle
  image in setUpClass (cached layers make repeat runs fast).

531 tests passing locally (unit + integration).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-05-27 02:12:37 -04:00
parent c48f791d7d
commit 5b9ceaaaee
4 changed files with 140 additions and 42 deletions
+15 -15
View File
@@ -5,7 +5,9 @@ Used by the supervise dashboard when the operator approves a
pipelock-block proposal (or runs the operator-initiated `pipelock
edit <bottle>` verb). Fetches the current pipelock.yaml via `docker
exec`, parses it, swaps the api_allowlist with the proposed hosts,
re-renders, writes back via `docker cp`, then `docker restart` so
re-renders, writes back via the bind-mount path, then signals the
bundle supervisor to restart the pipelock daemon (`docker kill
--signal USR1`) so
pipelock picks up the new config.
v1 uses restart, not SIGHUP — pipelock has no in-process reload
@@ -130,19 +132,17 @@ def apply_allowlist_change(
2. Fetch + parse current pipelock.yaml.
3. Replace api_allowlist with the proposed hosts; re-render.
4. Write the new yaml to the bind-mount source.
5. `docker restart` the bundle so pipelock reloads.
The restart bounces ALL four daemons inside the bundle, not
just pipelock — pipelock has no in-process reload and the
bundle init re-spawns the four daemons on container restart.
Per-daemon reload would need a supervisor IPC channel (PRD
0024 open question 1's "eventually" path); the bundle-wide
restart is the v1 trade-off.
5. `docker kill --signal USR1 <bundle>` so the supervisor
restarts the pipelock daemon in place (leaving egress,
git-gate, and supervise running). Pipelock has no
in-process reload; the supervisor's per-daemon restart
keeps the agent's MCP socket alive — a whole-bundle
`docker restart` would bounce supervise too.
Returns (before, after) where both are one-per-line allowlist
strings (operator-facing format). Raises PipelockApplyError on
any failure; the sidecar's existing config stays in place until
the host write succeeds, and the restart is what makes it
the host write succeeds, and the SIGUSR1 is what makes it
live."""
new_hosts = parse_allowlist_content(new_allowlist_content)
container = sidecar_bundle_container_name(slug)
@@ -167,9 +167,9 @@ def apply_allowlist_change(
# FILE — same Docker single-file inode issue as egress_apply:
# write-temp-then-rename swaps the host inode and leaves the
# container's mount pointing at the orphaned old one. Write
# in-place. `docker restart` below picks up the new content
# (and pipelock has no in-process reload anyway, so the
# restart is what makes it live regardless of write atomicity).
# in-place. The SIGUSR1 below makes the new content live
# (pipelock has no in-process reload, so the supervisor
# restarts the pipelock daemon in response).
target = _pipelock_yaml_host_path(slug)
target.parent.mkdir(parents=True, exist_ok=True)
target.write_text(rendered)
@@ -177,12 +177,12 @@ def apply_allowlist_change(
# fine — but 0o600 matches what prepare wrote.
target.chmod(0o600)
restart = subprocess.run(
["docker", "restart", container],
["docker", "kill", "--signal", "USR1", container],
capture_output=True, text=True, check=False,
)
if restart.returncode != 0:
raise PipelockApplyError(
f"failed to restart {container}: "
f"failed to signal {container} for pipelock restart: "
f"{(restart.stderr or '').strip()}"
)