GitHub Actions' "Deranged" Sleep Loop: Years of Bugs Costing Developers Thousands

GitHub Actions, the ubiquitous CI/CD platform powering workflows for millions of repositories, harbors a notorious four-line Bash function that’s been lambasted as “utterly deranged” by programming luminaries. This sleep mechanism, meant to pause execution briefly, has instead spawned infinite loops, zombie processes, and runaway bills—issues persisting for nearly a decade despite fixes sitting unmerged.

The saga begins around 2016 with the initial public commits to the GitHub Actions runner codebase. Early versions reveal Windows developers grappling with Bash scripting, resorting to a Stack Overflow hack from 16 years prior: using ping to simulate a 5-second delay when sleep wasn’t available.

if [ $? -eq 4 ]; then
    # Try sleep, then ping (6 echoes, ~5 s), then a 5,000-iteration echo loop.
    sleep 5 || ping -n 6 127.0.0.1 > /dev/null || (for i in $(seq 1 5000); do echo > /dev/null; done)
fi

This fallback chain (sleep first, then ping -n 6, whose six echo requests take roughly 5 seconds, or worst-case, echoing to /dev/null 5,000 times) earned promotion to a top-level safe_sleep function. It was crude but mostly functional, if CPU-intensive.

By 2022, evolution took a darker turn. The code “improved” into this gem, which lingered for years:

start=$SECONDS
while [ $((SECONDS - start)) -ne ${1?} ]; do :; done

At first glance, it uses Bash’s SECONDS variable (the number of whole seconds since the shell started) for a precise wait: pass 5, and it loops until exactly 5 seconds have elapsed. But here’s the fatal flaw: the test is an exact equality check. On a busy CI runner juggling heavy jobs, the process can be descheduled for the entire second in which the difference equals 5. If SECONDS - start jumps from 4 to 6, the -ne 5 test never becomes false, and the process spins forever.

Worse, with no sleep inside the loop, it pegs a full CPU core at 100%—half the compute on GitHub’s standard 2-vCPU runners—starving other tasks and cascading failures across queues.
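The skipped-second failure can be replayed without waiting on a real clock. The sketch below feeds a list of elapsed-second readings to a loop test and reports the reading at which the loop would exit. The helper name first_exit and the sample readings are made up for illustration; they are not part of the runner codebase.

```shell
#!/usr/bin/env bash
# Replay a sequence of observed elapsed-second readings against a loop test.
# Prints the reading at which the loop would exit, or "never" if none does.
# first_exit is a hypothetical helper for illustration only.
first_exit() {
    op=$1; target=$2; shift 2
    for elapsed in "$@"; do
        # The wait loop keeps spinning while [ elapsed OP target ] is true.
        if ! [ "$elapsed" "$op" "$target" ]; then
            echo "$elapsed"
            return 0
        fi
    done
    echo "never"
}

# A starved process observes 0..4, misses 5, then resumes at 6.
first_exit -ne 5 0 1 2 3 4 6 7 8   # prints: never
first_exit -le 5 0 1 2 3 4 6 7 8   # prints: 6
# On an idle machine every second is observed, so the bug stays hidden.
first_exit -ne 5 0 1 2 3 4 5 6     # prints: 5
```

The equality test only exits if the exact target value is observed, while an inequality test exits at the first reading past the target.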

Real-World Carnage: $2,400 Zombie Processes and CI Meltdowns

The fallout was brutal. One developer reported a single runner idling for 5,135 hours, billed at GitHub’s $0.008 per-minute rate: ~$2,400 vaporized. Projects like Zig abandoned GitHub entirely for Codeberg, citing “inexcusable bugs” and “vibe scheduling” following Microsoft’s AI pivot: seemingly random job prioritization that worsened backlogs until even main-branch commits stalled.
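As a sanity check on the bill, the arithmetic can be worked through in integer shell math. This sketch assumes a rate of $0.008 per runner-minute, which is the rate at which 5,135 hours comes to roughly $2,400. The function name runner_cost_cents is hypothetical.

```shell
#!/usr/bin/env bash
# runner_cost_cents HOURS RATE_MILLS: cost in whole cents, where RATE_MILLS is
# the per-minute price in tenths of a cent (8 means $0.008 per minute).
# The rate is an assumption for illustration, not a quoted price list.
runner_cost_cents() {
    hours=$1; rate_mills=$2
    minutes=$((hours * 60))
    echo $((minutes * rate_mills / 10))
}

cents=$(runner_cost_cents 5135 8)
printf '$%d.%02d\n' $((cents / 100)) $((cents % 100))   # prints: $2464.80
```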

A simple fix emerged in 2024: change the test to while [ $((SECONDS - start)) -le ${1?} ]; do :; done. This embraces overshoot and halts reliably, because the loop ends as soon as the elapsed time passes the target, whether or not a second was skipped. Proposed that February, it languished, was auto-closed after a month, and was finally merged 1.5 years later amid outcry. Yet other bugs persist, like recent file-hashing failures from a botched refactor.
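For comparison, here is a minimal sketch of a more defensive wait. It is not the code that was merged. It assumes the real sleep binary is usually available and keeps the busy loop only as a fallback. The names busy_wait and wait_seconds are hypothetical. The fallback uses -lt rather than -le so it stops as soon as the target is reached; either inequality avoids the hang.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; these names are not from the runner.

# Busy-wait fallback: exits once at least $1 seconds have elapsed.
busy_wait() {
    local target=${1:?usage: busy_wait SECONDS}
    local start=$SECONDS
    # -lt stays true only while elapsed < target, so a jump from 4 to 6
    # still ends the loop (the -ne version would spin forever).
    while [ $((SECONDS - start)) -lt "$target" ]; do :; done
}

# Prefer the real sleep binary, which blocks without burning CPU.
wait_seconds() {
    local target=${1:?usage: wait_seconds SECONDS}
    if command -v sleep >/dev/null 2>&1; then
        sleep "$target"
    else
        busy_wait "$target"
    fi
}
```

Calling wait_seconds 5 pauses for about five seconds. The fallback still pegs a core while it spins, so it only fixes the hang, not the CPU cost.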

That refactor? A JavaScript snippet trading a clean Object.getOwnPropertyNames for nested loops and redundant if statements, ballooning complexity and introducing regressions.

// Pre-refactor simplicity vs. post-refactor horror
function getKeys(obj) {
  return Object.getOwnPropertyNames(obj);
}
// Now: convoluted for-loops, duplicated code, function-returning-functions

GitHub Actions underpins a multi-billion-dollar Microsoft ecosystem, yet these gremlins fester. PRs ignored, refactors worsening code, neglect amid scale: it’s a stark reminder that even giants stumble on fundamentals. Proponents argue it’s “elegant Bash”—short, using : as a no-op—but critics like Matt Lad, of Antithesis, nail it: peak engineering this is not.

The recent AI hype ironically restores some faith in humans: no LLM could conjure such creatively catastrophic logic. Developers pay a premium for reliability, and GitHub must prioritize core stability over shine. Until then, audit your runners: that “harmless” sleep might be your budget’s silent killer.

But why would they fix it? This isn’t just “neglect”; it’s a symptom of monopoly lethargy. When you own the ecosystem—when millions of workflows are locked into your syntax—performance bugs that burn customer CPU cycles are, perversely, revenue generators. Every wasted minute of a zombie runner is a minute billed. In a competitive market, this would be a death sentence. In GitHub’s world, it’s a rounding error. The real fix isn’t just a patch; it’s viable competition that forces the giant to care about the ants.