fix(sidecars): per-daemon pipelock restart keeps supervise socket alive #61

Merged
didericis merged 1 commit from fix-pipelock-restart-keeps-bundle-up into main 2026-05-27 02:14:34 -04:00

Summary

Fixes the MCP-socket-drops-on-route-apply behavior: applying a route change made the agent's MCP client lose its connection until a second apply. Cause: apply_allowlist_change used docker restart <bundle>, which bounced all four daemons, including supervise. That is the whole-bundle restart that PR #59's notes called out as a v1 trade-off; the user's bug report shows the trade-off is unworkable in practice.

Fix is per-daemon restart via SIGUSR1.
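
On the host side, the change amounts to which Docker command gets issued. A minimal sketch, assuming hypothetical helper names (`reload_command`, `signal_bundle_reload`) and an injectable `run` callable that are not part of this PR; only the `docker kill --signal USR1 <bundle>` command line itself comes from the description:

```python
import subprocess


def reload_command(bundle: str) -> list[str]:
    """Build the argv that sends SIGUSR1 to the bundle container's PID 1."""
    return ["docker", "kill", "--signal", "USR1", bundle]


def signal_bundle_reload(bundle: str, run=subprocess.run) -> None:
    """Hypothetical helper: ask the bundle to restart pipelock only.

    Unlike `docker restart <bundle>`, the container itself keeps running,
    so the other daemons inside it are not bounced.
    """
    run(reload_command(bundle), check=True)
```

Passing `run` as a parameter is only there so the command can be checked without a Docker daemon; the real function may be structured differently.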

What changed

  • _Supervisor.restart_daemon(name) — terminates one named child (SIGTERM → SIGKILL grace), spawns a replacement under the same DaemonSpec. Updates self.procs[idx] in place so subsequent forward_signal / request_shutdown calls reach the new pid.
  • SIGUSR1 wiring in main() — signal.signal(signal.SIGUSR1, lambda *_: sup.restart_daemon("pipelock")). Pipelock has no in-process reload (per the existing comment); SIGUSR1 is its analog of the SIGHUP-reload-addon path egress uses. Pipelock is the only daemon that needs this today.
  • apply_allowlist_change — docker kill --signal USR1 <bundle> instead of docker restart <bundle>. Supervise / egress / git-gate keep running across the apply. The MCP socket stays open.
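
The restart-in-place behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's `_Supervisor`: the class name `MiniSupervisor`, the shape of `DaemonSpec`, the `grace` parameter, and the `SLEEPER` placeholder command are all hypothetical.

```python
import subprocess
import sys
from dataclasses import dataclass

# Placeholder child command for experimenting; a real spec would launch a daemon.
SLEEPER = (sys.executable, "-c", "import time; time.sleep(60)")


@dataclass(frozen=True)
class DaemonSpec:
    """Hypothetical stand-in: a daemon name plus the argv that launches it."""
    name: str
    argv: tuple


class MiniSupervisor:
    """Illustrative only; not the project's _Supervisor."""

    def __init__(self, specs, grace=5.0):
        self.specs = list(specs)
        self.grace = grace
        self.shutting_down = False
        self.procs = [subprocess.Popen(spec.argv) for spec in self.specs]

    def _stop(self, proc):
        # SIGTERM first; escalate to SIGKILL if the child outlives the grace period.
        proc.terminate()
        try:
            proc.wait(timeout=self.grace)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()

    def restart_daemon(self, name):
        """Replace one named child in place. Returns True if a restart happened."""
        if self.shutting_down:
            return False  # restart-during-shutdown is a no-op
        for idx, spec in enumerate(self.specs):
            if spec.name == name:
                self._stop(self.procs[idx])
                # Same slot, new process: later signal forwarding sees the new pid.
                self.procs[idx] = subprocess.Popen(spec.argv)
                return True
        return False  # unknown name is a no-op

    def request_shutdown(self):
        self.shutting_down = True
        for proc in self.procs:
            self._stop(proc)
```

With a supervisor like this as `sup`, the wiring quoted in the description, `signal.signal(signal.SIGUSR1, lambda *_: sup.restart_daemon("pipelock"))`, would route SIGUSR1 to a single-daemon restart while every other entry in `procs` is left untouched.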

Tests

  • 3 new _Supervisor.restart_daemon cases: replaces in place (different pid post-restart, sibling daemon unchanged), unknown name is a no-op, restart-during-shutdown is a no-op.
  • test_pipelock_apply rewritten: brings up the bundle image with CLAUDE_BOTTLE_SIDECAR_DAEMONS=pipelock so the supervisor is PID 1 and handles SIGUSR1. The previous standalone-pipelock setup wouldn't survive SIGUSR1 (pipelock default disposition is terminate). Bundle image is built in setUpClass; cached layers make repeats fast.
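
The reason the test needs a process that handles SIGUSR1 can be shown with ordinary child processes: on POSIX, a process that installs no handler is terminated by SIGUSR1. A small sketch, assuming a POSIX host; the helper name `returncode_after_sigusr1` and both child scripts are hypothetical, and neither pipelock nor the bundle image is involved:

```python
import signal
import subprocess
import sys

# Child that leaves SIGUSR1 at its default disposition.
NO_HANDLER = 'import time; print("ready", flush=True); time.sleep(30)'

# Child that installs a handler, as a supervising process would.
WITH_HANDLER = """
import signal, time
got = []
signal.signal(signal.SIGUSR1, lambda *_: got.append(1))
print("ready", flush=True)
while not got:
    time.sleep(0.05)
"""


def returncode_after_sigusr1(child_code, timeout=10.0):
    """Start a child, wait until it reports ready, send SIGUSR1, return its exit status."""
    proc = subprocess.Popen([sys.executable, "-c", child_code], stdout=subprocess.PIPE)
    try:
        proc.stdout.readline()  # block until the child has printed "ready"
        proc.send_signal(signal.SIGUSR1)
        return proc.wait(timeout=timeout)
    finally:
        if proc.poll() is None:
            proc.kill()
            proc.wait()
        proc.stdout.close()
```

A negative return code from `subprocess` means the child was killed by that signal number, so the first script reports `-signal.SIGUSR1` while the second exits normally after its handler runs.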

Test status

531 tests passing locally (unit + integration), 1 skipped (the existing GITEA_ACTIONS guard).

didericis added 1 commit 2026-05-27 02:12:53 -04:00
fix(sidecars): per-daemon pipelock restart keeps supervise socket alive
test / unit (pull_request) Successful in 21s
test / integration (pull_request) Successful in 43s
5b9ceaaaee
`apply_allowlist_change` used `docker restart <bundle>` to make
pipelock reload, which bounced ALL four daemons — including
supervise, whose MCP socket the agent's claude-code client had
open. That dropped the connection. A second apply works because
supervise has come back up by then.

Fix: per-daemon restart via SIGUSR1.

- New `_Supervisor.restart_daemon(name)` terminates one named
  child and spawns a replacement in place. Other daemons keep
  running.
- main() wires SIGUSR1 → `restart_daemon("pipelock")`. Pipelock
  has no in-process reload, so this is its analog of egress's
  SIGHUP-reload-addon path. Pipelock is the only daemon that
  currently needs hot-config reload via restart; if others
  acquire the need, add a new signal.
- `apply_allowlist_change` now runs
  `docker kill --signal USR1 <bundle>` instead of
  `docker restart`. Supervise / egress / git-gate keep running
  across the apply.

Tests:
- New `_Supervisor.restart_daemon` cases: replaces in place
  (different pid post-restart, sibling daemon unchanged),
  unknown name is a no-op, restart-during-shutdown is a no-op.
- `test_pipelock_apply` rewritten to bring up the bundle image
  with `CLAUDE_BOTTLE_SIDECAR_DAEMONS=pipelock` so the
  supervisor is PID 1 and handles SIGUSR1. The previous
  standalone-pipelock setup wouldn't survive SIGUSR1 (pipelock
  default disposition is terminate). Test builds the bundle
  image in setUpClass (cached layers make repeat runs fast).

531 tests passing locally (unit + integration).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
didericis merged commit a7ed571cf9 into main 2026-05-27 02:14:34 -04:00