fix(sidecar): queue restart signals

2026-06-02 07:52:19 +00:00
parent 1b34b1df85
commit 31708abfad
3 changed files with 135 additions and 33 deletions
@@ -100,11 +100,11 @@ loop should continue to call `tick()` and sleep on `_POLL_INTERVAL`; `tick()`
then performs the actual `restart_daemon("pipelock")` work while normal Python
control flow is in the supervisor loop.
-Repeated restart requests should not overlap. Either coalescing or FIFO
-serialization is acceptable, but the PRD prefers coalescing by daemon name: if
-three SIGUSR1 signals arrive before the next loop turn, one pipelock restart is
-enough because each restart rereads the latest `pipelock.yaml` from disk.
-Document this because it is a semantic choice.
+Repeated restart requests should not overlap. Restart requests coalesce by
+daemon name: if three SIGUSR1 signals arrive before the next loop turn, one
+pipelock restart is enough because each restart rereads the latest
+`pipelock.yaml` from disk. This treats SIGUSR1 as "make pipelock reflect the
+current config" rather than "run exactly one restart per signal."
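
The coalescing rule described above can be sketched as follows. This is a minimal illustration, not the code from this commit: `RestartQueue`, `request_restart`, `restart_log`, and `install_sigusr1_handler` are hypothetical names, and `restart_daemon` is a stand-in that only records the call. It assumes the signal handler does nothing except record a daemon name, leaving the real work to `tick()`.

```python
import signal


class RestartQueue:
    """Hypothetical holder for pending restart requests, keyed by daemon name."""

    def __init__(self):
        # A set coalesces duplicates: three requests for "pipelock" collapse to one.
        self._pending = set()
        # Stand-in record of restarts performed (assumption, for illustration only).
        self.restart_log = []

    def request_restart(self, name):
        # Safe to call from a signal handler: it only records the name.
        self._pending.add(name)

    def restart_daemon(self, name):
        # Placeholder for the real restart work, which would reread the config.
        self.restart_log.append(name)

    def tick(self):
        # Swap the pending set out first so requests that arrive during the
        # restart land in the fresh set and are handled on the next tick.
        pending, self._pending = self._pending, set()
        for name in sorted(pending):
            self.restart_daemon(name)


def install_sigusr1_handler(queue):
    # Assumed wiring: SIGUSR1 means "make pipelock reflect the current config".
    # Must be called from the main thread on a platform that defines SIGUSR1.
    signal.signal(signal.SIGUSR1, lambda signum, frame: queue.request_restart("pipelock"))
```

In this sketch, three calls to `request_restart("pipelock")` before the next `tick()` produce a single entry in `restart_log`, and a `tick()` with nothing pending does no work.
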
Shutdown wins over restart. If SIGTERM/SIGINT is received while a restart is
pending, the supervisor should drop the pending restart and terminate live
@@ -116,9 +116,9 @@ between bytecodes and cannot interrupt a single blocking `wait()` until control
returns to Python.
Exit-code behavior should be documented as "positive failures win, otherwise
-return the maximum observed child return code." That matches the current intent:
-positive process failures remain visible, while a clean shutdown of only
-signal-terminated children does not hide an earlier crash.
+return zero." Positive process failures remain visible, while a clean shutdown
+of only zero-exit or signal-terminated children returns zero instead of leaking
+platform-specific negative signal return codes to the container exit status.
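
A minimal sketch of the "positive failures win, otherwise return zero" rule follows. `aggregate_exit_code` is a hypothetical helper name, and picking the largest positive code when several children failed is an assumption; the text above does not say which positive code is reported. On POSIX, Python's `subprocess` reports a child killed by signal N as return code `-N`, which is the negative value being clamped here.

```python
def aggregate_exit_code(return_codes):
    """Collapse child return codes into one supervisor exit status.

    Positive codes (real failures) win. Zero and negative codes
    (clean exit or signal termination) collapse to zero.
    """
    # None means the child has not exited yet; skip it.
    positives = [rc for rc in return_codes if rc is not None and rc > 0]
    # Assumption: report the largest positive code when several children failed.
    return max(positives) if positives else 0
```

For example, `aggregate_exit_code([0, -15, 3])` returns `3`, while `aggregate_exit_code([0, -15, -9])` returns `0`, so a shutdown that only signal-terminated children does not leak `-15` or `-9` into the container exit status.
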
## Implementation Chunks
@@ -148,10 +148,4 @@ Also run the full unit suite before merge:
## Open Questions
-- Should repeated restart requests be coalesced by daemon name, or should the
-supervisor preserve every queued request? Coalescing is simpler and appears
-sufficient because pipelock rereads the latest config on restart.
-- Should exit-code handling clamp all negative signal return codes to zero
-when no positive child failure occurred, or should it continue returning the
-maximum raw child return code? The current tests tolerate platform-specific
-negative signal codes; tightening this would be a behavior change.
+None.