CI runner storage incident

GitHub Actions Go Race Test Output Disk Full

Large Go race-detector runs can write enough JSON or log output to fill the runner's disk. The result file is then truncated, and the analysis step fails on malformed output instead of surfacing the real test result. Treat this as two separate problems: runner headroom and parser resilience.

Capture Evidence Before Re-running

Before adding another retry, record the runner's free space on both sides of the heavy package group. The goal is to show which of the race output, Docker cache, workspace, or tool cache is the true peak consumer.

GitHub Actions step

Print disk, inode, workspace, cache, and output-file sizes.

The step is read-only. Add it immediately before the Go test step, and again after the test output is generated, where if: always() makes it run even when the tests fail.

- name: Runner storage snapshot
  if: always()
  shell: bash
  run: |
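    # Report everything; a failing probe must not abort the snapshot.
    set +e
    echo "== df =="
    df -h . "$RUNNER_TEMP" "$RUNNER_TOOL_CACHE" 2>/dev/null || true
    df -i . "$RUNNER_TEMP" "$RUNNER_TOOL_CACHE" 2>/dev/null || true
    echo "== largest local buckets =="
    # /var/lib/docker is usually root-only, so its size may be underreported without sudo.
    du -sh . "$RUNNER_TEMP" "$RUNNER_TOOL_CACHE" ~/.cache ~/go/pkg/mod /var/lib/docker 2>/dev/null | sort -hr | head -40
    echo "== test output files =="
    # xargs -r skips ls when nothing matches; the trailing "|| true" keeps the
    # step from failing under the pipefail option that "shell: bash" enables.
    find . "$RUNNER_TEMP" -type f \( -name '*.json' -o -name '*test*.out' -o -name '*.log' \) -size +50M -print0 2>/dev/null \
      | xargs -0 -r ls -lh 2>/dev/null | sort -k5,5hr | head -40 || true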
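
The snapshot above is meant for reading in the job log. To reduce the before and after runs to one number, the free space can also be recorded in a machine-readable file. The following is a minimal sketch, not part of the original step: the function names free_kb, record_free, and consumed_kb and the file name headroom.tsv are hypothetical, and it assumes a df that accepts the POSIX -P and -k options.

```shell
#!/usr/bin/env bash
# Sketch: record free space on both sides of the test step.
# free_kb, record_free, consumed_kb, and headroom.tsv are illustrative names.

# Print the available space, in 1024-byte blocks, of the filesystem holding $1.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Append "<label><TAB><free KiB>" for path $2 to file $3.
record_free() {
  printf '%s\t%s\n' "$1" "$(free_kb "$2")" >> "$3"
}

# Print the KiB consumed between the "before" and "after" rows of file $1.
consumed_kb() {
  awk -F '\t' '
    $1 == "before" { b = $2 }
    $1 == "after"  { a = $2 }
    END            { print b - a }
  ' "$1"
}
```

Calling `record_free before . headroom.tsv` ahead of the Go test step and `record_free after . headroom.tsv` in the if: always() step leaves a two-line file, and `consumed_kb headroom.tsv` prints the difference. This is the net change rather than the true peak: anything written and deleted between the two calls is not counted.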

Safe Fix Order

1. Reserve space before the test: fail early if free space is below the measured peak plus a reserve, instead of producing a truncated JSON file.
2. Stream or shard output: pipe JSON through a consumer, gzip it in chunks, or split the run by package so that no single file can consume the remaining disk.
3. Clean runner-local caches before the job: the Docker build cache, unused images, and language caches are candidates only if the workflow does not need them later.
4. Make the parser tolerant: if the output is truncated, report that it was cut short by a full disk and keep any completed test records instead of panicking.
5. Track peak bytes: store a small artifact with before/after df and output-file sizes so regressions are visible.
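
The "Reserve space" and "Make the parser tolerant" items can be sketched in bash as follows. This is an illustration under stated assumptions, not a definitive implementation: the function names require_free_kb and salvage_records and the exit code 3 are hypothetical, and the salvage step assumes the result file holds one JSON object per line, as go test -json emits. It treats a line as complete when it starts with "{" and ends with "}", which is a heuristic and not a JSON validator.

```shell
#!/usr/bin/env bash
# Sketch of a pre-test space guard and a truncation-tolerant salvage pass.
# All function names and the exit code 3 are illustrative.

# Succeed only if the filesystem holding $1 has at least $2 KiB available.
require_free_kb() {
  local path=$1 need_kb=$2 avail_kb
  avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  if [ "$avail_kb" -lt "$need_kb" ]; then
    echo "only ${avail_kb} KiB free on ${path}, need ${need_kb} KiB" >&2
    return 1
  fi
}

# Copy complete records from $1 to $2 and count the lines that were dropped.
# Returns 3 when at least one non-blank line was not a complete record,
# so the caller can report the run as truncated instead of crashing.
salvage_records() {
  local src=$1 dst=$2 dropped
  : > "$dst"   # create or empty the destination even if nothing is salvaged
  dropped=$(awk -v out="$dst" '
    /^[{].*[}]$/ { print > out; next }   # looks like a whole JSON object
    NF           { bad++ }               # non-blank line that is not
    END          { print bad + 0 }
  ' "$src")
  if [ "$dropped" -gt 0 ]; then
    echo "dropped ${dropped} line(s) that are not complete JSON records" >&2
    return 3
  fi
}
```

In a workflow, `require_free_kb "$GITHUB_WORKSPACE" 5242880 || exit 1` would run before the Go test step; the 5 GiB figure is a placeholder, and the real threshold should come from the measured peak plus a reserve. After the tests, the downstream parser would read the file written by salvage_records and treat a return code of 3 as "results incomplete" rather than as a parse failure.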

Do Not Delete Blindly

Do not remove the current workspace or test output before the parser has consumed it.
Do not prune Docker volumes if integration tests rely on named volumes or local databases.
Do not hide the failure with retries; a retry on another runner makes the cause harder to prove.
