Description
Describe the bug
We spawn an ephemeral runner for each GitHub job we receive.
During a recent GitHub outage, ephemeral runners got stuck and did not pick up jobs, even though
they were registered with GitHub and had printed "listening for jobs".
There is no way for us to tell whether a runner is just waiting for a job or is genuinely stuck, so it is not safe
for us to kill it: there is a small chance that it picks up a job just before we kill it.
Basically, once a runner is registered and listening, there is no safe way for us to shut it down without a race condition.
To Reproduce
- Have a GitHub Outage
- Wait for outage to recover
- Notice all our runners are stuck
Expected behavior
The runner should eventually exit with a non-zero exit code if it fails to pick up jobs. It would even help us if there were an --exit-after-idle <hours-idle> flag or something similar.
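To make the request concrete, here is a rough sketch of the semantics we have in mind. It is not taken from the runner's code: `IdleWatchdog`, `try_start_job`, `finish_job`, and `should_exit` are hypothetical names, and the idle limit is in seconds only to keep the example small. The point is that the idle decision and the job pickup share one lock inside the runner, so once the runner decides to exit it can no longer accept a job. That is the guarantee we cannot get by killing the process from outside.

```python
import threading
import time


class IdleWatchdog:
    """Hypothetical in-runner idle tracker; not part of the actual runner."""

    def __init__(self, idle_limit_seconds, clock=time.monotonic):
        self._idle_limit = idle_limit_seconds
        self._clock = clock              # injectable so the logic can be tested
        self._lock = threading.Lock()    # guards all state below
        self._idle_since = clock()
        self._busy = False
        self._exiting = False

    def try_start_job(self):
        """Claim a job, unless the idle exit has already been decided."""
        with self._lock:
            if self._exiting:
                return False
            self._busy = True
            return True

    def finish_job(self):
        """Mark the job as done and restart the idle timer."""
        with self._lock:
            self._busy = False
            self._idle_since = self._clock()

    def should_exit(self):
        """Return True once the runner has been idle for the full limit."""
        with self._lock:
            if self._busy:
                return False
            if self._clock() - self._idle_since >= self._idle_limit:
                self._exiting = True     # from here on, no job is accepted
            return self._exiting
```

In this sketch the runner's main loop would poll `should_exit()` and, when it returns True, deregister and exit with a non-zero code so that whatever spawned the runner can tell an idle timeout apart from a completed job.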
Runner Version and Platform
Version of your runner?
OS of the machine running the runner? OSX/Windows/Linux/...
What's not working?
Please include error messages and screenshots.
Job Log Output
If applicable, include the relevant part of the job / step log output here. All sensitive information should already be masked out, but please double-check before pasting here.
Runner and Worker's Diagnostic Logs
If applicable, add relevant diagnostic log information. Logs are located in the runner's _diag folder. The runner logs are prefixed with Runner_ and the worker logs are prefixed with Worker_. Each job run correlates to a worker log. All sensitive information should already be masked out, but please double-check before pasting here.