fix(sidecar): queue restart signals

2026-06-02 07:52:19 +00:00
parent 1b34b1df85
commit 31708abfad
3 changed files with 135 additions and 33 deletions
@@ -100,11 +100,11 @@ loop should continue to call `tick()` and sleep on `_POLL_INTERVAL`; `tick()`
 then performs the actual `restart_daemon("pipelock")` work while normal Python
 control flow is in the supervisor loop.

-Repeated restart requests should not overlap. Either coalescing or FIFO
-serialization is acceptable, but the PRD prefers coalescing by daemon name: if
-three SIGUSR1 signals arrive before the next loop turn, one pipelock restart is
-enough because each restart rereads the latest `pipelock.yaml` from disk.
-Document this because it is a semantic choice.
+Repeated restart requests should not overlap. Restart requests coalesce by
+daemon name: if three SIGUSR1 signals arrive before the next loop turn, one
+pipelock restart is enough because each restart rereads the latest
+`pipelock.yaml` from disk. This treats SIGUSR1 as "make pipelock reflect the
+current config" rather than "run exactly one restart per signal."

 Shutdown wins over restart. If SIGTERM/SIGINT is received while a restart is
 pending, the supervisor should drop the pending restart and terminate live
@@ -116,9 +116,9 @@ between bytecodes and cannot interrupt a single blocking `wait()` until control
 returns to Python.

 Exit-code behavior should be documented as "positive failures win, otherwise
-return the maximum observed child return code." That matches the current intent:
-positive process failures remain visible, while a clean shutdown of only
-signal-terminated children does not hide an earlier crash.
+return zero." Positive process failures remain visible, while a clean shutdown
+of only zero-exit or signal-terminated children returns zero instead of leaking
+platform-specific negative signal return codes to the container exit status.

 ## Implementation Chunks

@@ -148,10 +148,4 @@ Also run the full unit suite before merge:

 ## Open Questions

- Should repeated restart requests be coalesced by daemon name, or should the
-  supervisor preserve every queued request? Coalescing is simpler and appears
-  sufficient because pipelock rereads the latest config on restart.
- Should exit-code handling clamp all negative signal return codes to zero
-  when no positive child failure occurred, or should it continue returning the
-  maximum raw child return code? The current tests tolerate platform-specific
-  negative signal codes; tightening this would be a behavior change.
+None.
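
Illustration of the documented semantics (coalescing by daemon name, shutdown wins over restart, and "positive failures win, otherwise return zero"). This is a minimal sketch, not the code in this commit: `RestartQueue`, `request_restart`, `request_shutdown`, and `supervisor_exit_code` are hypothetical names, `restart_daemon` is passed in as a plain callable, and picking the largest positive code when several children fail is an assumption the doc does not state.

```python
class RestartQueue:
    """Hypothetical helper: coalesces restart requests by daemon name."""

    def __init__(self, restart_daemon):
        self._restart_daemon = restart_daemon  # callable taking a daemon name
        self._pending = {}  # dict used as a set of daemon names
        self._shutting_down = False

    def request_restart(self, name):
        # Meant to be called from a signal handler: it only records intent.
        # Repeated requests for the same name collapse into one entry.
        if not self._shutting_down:
            self._pending[name] = True

    def request_shutdown(self):
        # Shutdown wins: drop any restart that has not run yet.
        self._shutting_down = True
        self._pending.clear()

    def tick(self):
        # Called from the supervisor loop, in normal Python control flow.
        # Entries are popped one at a time, so a request recorded while a
        # restart is running stays queued and is handled on a later pass.
        restarted = []
        while self._pending and not self._shutting_down:
            name, _ = self._pending.popitem()
            self._restart_daemon(name)
            restarted.append(name)
        return restarted


def supervisor_exit_code(return_codes):
    # Positive failures win (assumption: the largest one is reported).
    # Zero exits and negative signal return codes both map to zero.
    positive = [rc for rc in return_codes if rc is not None and rc > 0]
    return max(positive) if positive else 0
```

With this shape, three SIGUSR1 deliveries before the next loop turn would call `request_restart("pipelock")` three times and produce a single `restart_daemon("pipelock")` call on the next `tick()`.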