fix(git-gate): bound daemon client sessions #114

Merged
didericis merged 4 commits from fix/git-gate-daemon-timeouts into main 2026-06-01 14:38:20 -04:00
Collaborator

Summary

  • Add bounded git daemon idle/init timeouts for git-gate
  • Prevent interrupted push sessions from leaving receive-pack children wedged indefinitely
  • Add unit coverage for the rendered daemon timeout flags

Closes #113

Tests

  • python3 -m unittest tests.unit.test_git_gate
didericis-codex added 1 commit 2026-05-29 21:59:40 -04:00
fix(git-gate): bound daemon client sessions
test / unit (pull_request) Successful in 34s
test / integration (pull_request) Successful in 44s
de2267d1b4
didericis reviewed 2026-05-29 22:00:49 -04:00
@@ -44,0 +44,4 @@
# Bound half-open git client sessions. If an agent/tool runner is
# interrupted during push, git daemon should reap the receive-pack
# child instead of keeping the gate wedged indefinitely.
GIT_GATE_DAEMON_TIMEOUT_SECS = 60
Owner

Seems a bit long... can we reduce this to 15s? (Weigh the pros and cons of this.)

didericis-codex marked this conversation as resolved
didericis added 1 commit 2026-05-29 22:02:28 -04:00
fix(git-gate): shorten daemon client timeout
test / unit (pull_request) Successful in 30s
test / integration (pull_request) Successful in 42s
7bffaa791c
Author
Collaborator

Adjusted this to 15s in 7bffaa7. The tradeoff looks right here: this timeout covers the local VM-to-git-gate daemon session, so a healthy push should keep sending pack/protocol data and should not sit idle for 15 seconds. The upside is much faster cleanup for interrupted tool runs; the main downside is that an extremely stalled local transfer could be terminated sooner, but upstream SSH already has its own bounded connect behavior and this path is local to the bottle/sidecar.

Re-ran python3 -m unittest tests.unit.test_git_gate successfully.
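
For readers following along, here is a minimal sketch of how the bounded timeouts might be rendered into the daemon command line. `render_daemon_args` and its argument layout are hypothetical, not the project's actual code; only the `git daemon` flags themselves (`--timeout`, `--init-timeout`, `--base-path`, `--port`, `--reuseaddr`, `--enable`) are real options.

```python
from __future__ import annotations

# Value settled on in this thread; the constant name comes from the diff above.
GIT_GATE_DAEMON_TIMEOUT_SECS = 15


def render_daemon_args(
    base_path: str,
    port: int,
    timeout_secs: int = GIT_GATE_DAEMON_TIMEOUT_SECS,
) -> list[str]:
    """Build a git daemon argv with bounded idle and init timeouts.

    Hypothetical helper: the real git-gate renderer may be shaped differently.
    """
    if timeout_secs <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return [
        "git", "daemon",
        "--reuseaddr",
        f"--base-path={base_path}",
        f"--port={port}",
        # Idle timeout for an established client session.
        f"--timeout={timeout_secs}",
        # Time allowed between connecting and sending the first request.
        f"--init-timeout={timeout_secs}",
        "--enable=receive-pack",
        base_path,
    ]
```

A unit test for the rendered flags then only needs to check that both timeout flags appear in the returned list.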

Owner

I'm also wondering if there's a better solution than just a timeout... the daemon should be getting a permissions error, right? Can't we have a better way to fail fast on permission errors when trying to push?

Author
Collaborator

I do not think this instance was a permission-error path. The stuck state happened before git-gate had accepted the pack and run the pre-receive hook: the attempted commit was not present in /git/bot-bottle.git, and the matching sidecar children were git receive-pack waiting on the client while the VM still had git push / pack-objects children alive after the tool interruption. A direct no-op upstream SSH push from git-gate to Gitea also succeeded, so upstream permission/auth was healthy.

If the push reaches the hook and upstream rejects it, we should already fail fast: the hook runs GIT_SSH_COMMAND=... git push origin ..., prints git-gate: upstream push failed for <ref>, and exits non-zero. The bounded daemon timeout is aimed at the different failure mode where the local client side is abandoned mid-protocol and neither side gets a clean EOF.

A stronger follow-up would be adding a regression/integration test that simulates an interrupted local git client and asserts the receive-pack child is reaped. I would keep that separate from this small fix because it needs process-level harnessing around git daemon.
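
The abandoned-client failure mode can be illustrated without git at all. The sketch below uses plain sockets, not `git daemon`, and `serve_one` is a hypothetical name; it only shows why an idle timeout is needed when the peer goes silent without closing its socket, since the server never sees EOF on its own.

```python
import socket
import threading


def serve_one(listener, idle_timeout, outcome):
    """Accept one client and read until EOF or until it idles too long."""
    conn, _ = listener.accept()
    conn.settimeout(idle_timeout)
    try:
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                # Clean shutdown: the client closed its side.
                outcome.append("eof")
                return
    except socket.timeout:
        # Half-open session: nothing arrived within idle_timeout, so give up.
        outcome.append("reaped")
    finally:
        conn.close()


def demo(idle_timeout=0.2):
    """Connect, send a partial message, then stall without closing."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    outcome = []
    worker = threading.Thread(target=serve_one, args=(listener, idle_timeout, outcome))
    worker.start()
    client = socket.create_connection(listener.getsockname())
    client.sendall(b"0032")  # a few bytes, then silence
    worker.join(timeout=5)
    client.close()
    listener.close()
    return outcome
```

A real regression test would replace the toy server with `git daemon` and the toy client with an interrupted `git push`, which is the process-level harnessing mentioned above.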

Owner

I'm fairly sure the error happened specifically after an HTTPS push to something blocked by the egress proxy, not git-gate. (EDIT: seems like this was wrong/I misdiagnosed the issue.) I'd like to try to replicate exactly what caused the hang so that misbehaving agents that try to push over HTTPS fail fast, which I don't think they do.

Regardless, we should try to replicate exactly what caused the hang and write a test for that in this PR. This project is pre-release, so there's no need for a quick fix; I want to go deep.

didericis added 1 commit 2026-05-29 22:13:38 -04:00
test(egress): cover blocked git push fail-fast
test / unit (pull_request) Successful in 28s
test / integration (pull_request) Successful in 42s
630e65e9a4
Author
Collaborator

Agreed. I added 630e65e with a regression test that exercises the HTTPS/egress-shaped failure path using a real git push client.

The test spins up a local smart-HTTP-shaped endpoint that returns the same egress 403 when git asks for info/refs?service=git-receive-pack, then asserts the git process exits within the test timeout instead of hanging. That covers the misbehaving-agent case where it tries to push over HTTP(S) through egress and should fail fast before any pack transfer or git-gate path is involved.

I kept the daemon timeout too, because it covers the separate half-open git-gate client/session cleanup case we observed after the interrupted push.

Tested with:

python3 -m unittest tests.unit.test_egress_addon_core tests.unit.test_git_gate
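
A rough sketch of the test shape described above, for context. The handler, helper names, and response body are assumptions for illustration rather than the code in `630e65e`; the real test presumably returns the egress proxy's actual 403 payload.

```python
import subprocess
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Blocked403Handler(BaseHTTPRequestHandler):
    """Answers every request with 403, standing in for the egress denial."""

    def _deny(self):
        body = b"blocked by egress policy\n"  # placeholder text, not the real proxy body
        self.send_response(403)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    # git first issues GET .../info/refs?service=git-receive-pack
    do_GET = _deny
    do_POST = _deny

    def log_message(self, format, *args):
        pass  # keep test output quiet


def start_blocked_endpoint():
    """Serve the 403 handler on an ephemeral local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), Blocked403Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def push_exit_code(repo_dir, url, timeout_secs=10):
    """Run a real git push; subprocess.TimeoutExpired here would mean it hung."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "push", url, "HEAD:refs/heads/test"],
        capture_output=True,
        timeout=timeout_secs,
    )
    return result.returncode
```

With a scratch repository at `repo_dir`, the assertion is that `push_exit_code(...)` returns a non-zero status within the timeout instead of raising `subprocess.TimeoutExpired`.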
Owner

Doesn't solve the core issue, which I'm still getting. Basic test is this:

  1. Start up an agent in a bottle (am using my local claude-implementer)
  2. Ask the agent to clone a repo (it does this successfully)
  3. Ask the agent to push an empty test branch (this fails and hangs, seemingly indefinitely)

Here's the full transcript:

❯ try cloning bot-bottle (look at your git config to see
  where to clone it from) and test push a branch. Then
  remove the branch when done

● Let me check the git config to find the remote URL for
  bot-bottle.

● Bash(git config --list 2>/dev/null || cat ~/.gitconfig
      2>/dev/null || echo "No git config found")
  ⎿  url.git://127.0.0.16:56874/bot-bottle.git.insteadof
     =ssh://git@100.78.141.42:30009/didericis/bot-bottle
     .git
     … +5 lines (ctrl+o to expand)

● The git config rewrites the bot-bottle remote to an
  internal address. I'll clone using the SSH URL and then
  test push a branch.

● Bash(git clone
      ssh://git@100.78.141.42:30009/didericis/bot-bottle.git
      /home/node/bot-bottle-test 2>&1)
  ⎿  Cloning into '/home/node/bot-bottle-test'...

● Bash(cd /home/node/bot-bottle-test && git log --oneline -5
      && git remote -v)
  ⎿  dcaee53 docs(codex): clarify codex auth marker
     cea832b fix(codex): stop injecting api key placehol
     der
     … +7 lines (ctrl+o to expand)

● Clone succeeded. Now I'll create a test branch, push it,
  then delete it.

● Bash(git checkout -b test/clone-and-push-check && git
      commit --allow-empty -m "test: verify push access from
      agent sandbox" && git push origin
      test/clone-and-push-ch…)
  ⎿  Switched to a new branch 'test/clone-and-push-check'
     [test/clone-and-push-check a81c838] test: verify push
     (1m 54s)
     (ctrl+b to run in background)

✶ Working… (2m 21s · ↓ 722 tokens)
  ⎿  Tip: Use /btw to ask a quick side question without
     interrupting Claude's current work
Author
Collaborator

Replying to #114 (comment):

What we have learned so far about the hang:

  • It does not look like upstream Gitea auth or permissions. From inside git-gate, a no-op SSH push to upstream Gitea succeeds.
  • It does not look like the pre-receive hook itself. In the failing case, git-gate logs Request receive-pack for '/bot-bottle.git', but never reaches the hook's git-gate: gitleaks scanning... output.
  • The pushed object does not reach the bare repo.
  • The stuck process shape is consistent: in the smolvm guest, git push is sleeping and child git pack-objects is stuck; in the sidecar, git receive-pack is waiting under a git-daemon --serve child.
  • Clone/fetch through git-gate from the smolvm guest succeeds. Push through the same published git-gate URL hangs.
  • Host -> same published git-gate port succeeds. Sidecar-local -> /git/bot-bottle.git succeeds and runs the hook/upstream forward. Git-gate -> upstream SSH succeeds.

So the current working theory is that the failure is specific to the smolvm guest -> Docker-published git-gate port path for push pack streams. It is not currently pointing at git-gate policy rejection, gitleaks, or Gitea permission failure.

didericis added 1 commit 2026-05-29 23:22:00 -04:00
fix(git-gate): use smart http for smolmachines pushes
test / unit (pull_request) Successful in 40s
test / integration (pull_request) Successful in 54s
test / unit (push) Successful in 37s
test / integration (push) Successful in 44s
6ea19a8d53
Author
Collaborator

Follow-up from the deeper dig:

The hang is reproducible on the smolvm guest -> Docker-published git:// receive-pack path, but not on the equivalent smart-HTTP path. The decisive checks were:

  • host clone/push to the git-gate mirror works;
  • sidecar-local push through the mirror works and forwards upstream;
  • smolvm clone/fetch over git:// works, but smolvm push over git:// stalls while preparing/sending the pack;
  • smolvm smart-HTTP clone/push through a temporary sidecar succeeds, runs gitleaks, and forwards upstream.

I pushed 6ea19a8, which switches the smolmachines agent-facing git-gate transport to smart HTTP while keeping the same bare repos, pre-receive hook, gitleaks scan, and upstream SSH forwarding. Docker-backend git-gate still uses git://. I also added coverage for the smart-HTTP wrapper, including the Content-Encoding: gzip CGI header case that initially made large real clones return an empty 200 response.
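
To make the transport switch concrete, here is a small sketch of how the agent-facing URL and the `insteadOf` rewrite might be derived per backend. The function names and the backend strings `"smolmachines"` and `"docker"` are illustrative assumptions; the `url.<base>.insteadOf` key is the standard git config mechanism visible in the transcript earlier in this thread.

```python
from __future__ import annotations


def gate_url(backend: str, host: str, port: int, name: str) -> str:
    """Pick the agent-facing gate URL for a backend.

    Assumption: smolmachines uses smart HTTP, every other backend keeps git://.
    """
    scheme = "http" if backend == "smolmachines" else "git"
    return f"{scheme}://{host}:{port}/{name}.git"


def insteadof_args(gate: str, upstream: str) -> list[str]:
    """git config arguments that rewrite the upstream URL to the gate URL."""
    return ["git", "config", "--global", f"url.{gate}.insteadOf", upstream]


# Example using the addresses from the transcript in this thread.
upstream = "ssh://git@100.78.141.42:30009/didericis/bot-bottle.git"
smol = gate_url("smolmachines", "127.0.0.16", 56874, "bot-bottle")
docker = gate_url("docker", "127.0.0.16", 56874, "bot-bottle")
```

Either way the agent keeps using the upstream SSH URL; only the rewritten local hop changes scheme.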

Owner

#114 (comment)

Did we need to add smart-HTTP for the fix? Couldn't we have just used ssh:// instead of git:// when setting up the git config instead? I thought we had already configured git-gate to use ssh. Did we remove that?

Author
Collaborator

We did not remove the SSH path. There are two separate hops here:

  1. agent -> git-gate sidecar
  2. git-gate sidecar -> upstream Gitea

The second hop is still SSH. The git-gate mirror's pre-receive hook forwards accepted refs to upstream with the configured identity file and known_hosts file.

The broken hop is the first one. Today the agent-facing rewrite points the agent at the local gate using git://.../<name>.git. The sidecar has openssh-client for the upstream hop, but it does not run sshd, does not provision an agent-facing SSH user/key, and does not expose an SSH/git-shell service for the agent to connect to.

So yes, ssh:// could be a valid design for agent -> gate, but it would be a different gate surface: add an SSH server to the sidecar, provision trust/keys or some other auth model for the guest, restrict the account to git-shell or forced commands, publish another port through the smolvm loopback path, and test that whole setup. Smart HTTP is smaller here because git http-backend can serve the existing bare repos and reuse the existing pre-receive/gitleaks/upstream-SSH enforcement without introducing another credential path.

In short: SSH was and remains configured for gate -> Gitea. It was not configured for agent -> gate, and this fix only changes that agent-facing local hop from git:// to smart HTTP for smolmachines.

didericis reviewed 2026-05-29 23:36:02 -04:00
@@ -0,0 +20,4 @@
DEFAULT_PORT = 9420
class GitHttpHandler(BaseHTTPRequestHandler):
Owner

Would prefer we not roll our own git http handler; it seems like we should be using either something already built into git (does it not have a built-in tool for this?) or some very lightweight, standard tool.

That being said, this isn't very big/complicated, so it may actually be better than something that introduces a bunch of complex features we don't need or a supply chain attack.

Collaborator

Replying to #114 (comment):

git http-backend is the built-in — the Python class is a CGI adapter around it, not a custom git implementation. _run_backend() sets up the CGI environment and execs git http-backend as a subprocess; the class itself contains no git protocol logic.

The alternatives that would avoid the wrapper:

  • Apache/nginx/lighttpd with CGI: each can run git http-backend natively, but introduces a heavier daemon + config as a sidecar dep
  • wsgiref.handlers.CGIHandler: requires the same CGI env setup and a WSGI server — similar surface area

git http-backend has no --serve mode; it is CGI-only, so some server must bridge it. The class is ~120 lines, stdlib-only (zero new packages), and all enforcement (gitleaks, pre-receive, upstream SSH forward) still runs inside git hooks exactly as before.
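
As an illustration of how thin such an adapter can be, here is a sketch of the CGI bridging step. It is not the PR's `_run_backend()`; `build_cgi_env` and `run_backend` are hypothetical names, and forwarding `Content-Encoding` as `HTTP_CONTENT_ENCODING` is one plausible way to handle the gzip case mentioned earlier, not necessarily the fix that was applied.

```python
import os
import subprocess
from urllib.parse import urlsplit


def build_cgi_env(method, raw_path, headers, project_root, remote_addr="127.0.0.1"):
    """Map one HTTP request onto the CGI variables git http-backend reads."""
    parts = urlsplit(raw_path)
    env = {
        "GIT_PROJECT_ROOT": project_root,  # directory holding the bare repos
        "GIT_HTTP_EXPORT_ALL": "1",        # serve repos without git-daemon-export-ok
        "REQUEST_METHOD": method,
        "PATH_INFO": parts.path,
        "QUERY_STRING": parts.query,
        "REMOTE_ADDR": remote_addr,
    }
    if headers.get("Content-Type"):
        env["CONTENT_TYPE"] = headers["Content-Type"]
    if headers.get("Content-Length"):
        env["CONTENT_LENGTH"] = headers["Content-Length"]
    if headers.get("Content-Encoding"):
        # http-backend inflates gzip request bodies when it sees this variable.
        env["HTTP_CONTENT_ENCODING"] = headers["Content-Encoding"]
    return env


def run_backend(env, body=b""):
    """Run git http-backend once and return its raw CGI output (headers + body)."""
    completed = subprocess.run(
        ["git", "http-backend"],
        input=body,
        env={**os.environ, **env},
        capture_output=True,
        check=False,
    )
    return completed.stdout
```

The server side then only has to split the CGI output into status, headers, and body and relay them; pushes additionally require `http.receivepack` to be enabled on the repo or `REMOTE_USER` to be set, since http-backend disables receive-pack for anonymous clients by default.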

Owner

Sounds good, keep it

didericis approved these changes 2026-05-29 23:54:43 -04:00
didericis merged commit 6ea19a8d53 into main 2026-06-01 14:38:20 -04:00
didericis deleted branch fix/git-gate-daemon-timeouts 2026-06-01 14:38:21 -04:00