File Lock Heartbeat Disk Full Stale Lock

When a file-lock heartbeat fails silently because of ENOSPC, inode exhaustion, a permission error, or a missing heartbeat path, the lock holder may still be inside the critical section while another process decides the lock is stale and enters as well.

First Response Runbook

A heartbeat failure should not be treated as a successful lock refresh. It should create an explicit lock-health state that downstream stale-lock logic can reason about.
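One way to make that state explicit is sketched below. It assumes the heartbeat is refreshed through an injectable function so that failures can be simulated; the names `LockHealth`, `Touch`, `refreshHeartbeat`, and `mayContinueProtectedWork` are hypothetical and not taken from any particular lock library, and the refresh is synchronous only to keep the example short.

```typescript
// Hypothetical types for illustration; not the API of any real lock library.
type LockHealth =
  | { state: "healthy"; lastRefreshMs: number }
  | {
      state: "unhealthy";
      lastRefreshMs: number; // last refresh that actually succeeded
      failedAtMs: number;
      operation: string;
      code: string;
    };

// Stand-in for the real refresh call (for example an mtime update on the heartbeat path).
type Touch = (heartbeatPath: string) => void;

// Extract a filesystem-style error code if the thrown value carries one.
function errorCode(err: unknown): string {
  if (typeof err === "object" && err !== null && "code" in err) {
    return String((err as { code: unknown }).code);
  }
  return "UNKNOWN";
}

function refreshHeartbeat(
  heartbeatPath: string,
  touch: Touch,
  previous: LockHealth,
  nowMs: number,
): LockHealth {
  try {
    touch(heartbeatPath);
    return { state: "healthy", lastRefreshMs: nowMs };
  } catch (err) {
    // A failed refresh is recorded as degraded health, never reported as success.
    return {
      state: "unhealthy",
      lastRefreshMs: previous.lastRefreshMs,
      failedAtMs: nowMs,
      operation: "touch",
      code: errorCode(err),
    };
  }
}

// Holder-side policy in this sketch: protected work continues only while the lock is healthy.
function mayContinueProtectedWork(health: LockHealth): boolean {
  return health.state === "healthy";
}
```

Keeping `lastRefreshMs` unchanged on failure lets stale-lock logic see both when the heartbeat last succeeded and why it stopped.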

Log heartbeat refresh failures with the lock path, heartbeat path, operation, and filesystem error.
Classify ENOSPC, EDQUOT, EIO, EACCES, EPERM, and missing heartbeat paths separately from normal stale timeout.
When heartbeat refresh fails, decide whether the holder aborts protected work, releases the lock, or marks the lock as unhealthy.
Do not let a contender steal a lock solely because its mtime is stale when a disk or permission failure could explain the stale heartbeat.
Require dead-owner evidence, an explicit fencing token, or a recovery lock before allowing a steal.
Add a two-contender regression test: the holder's heartbeat fails, the contender polls, and the two processes are never inside the critical section at the same time.
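The classification and steal rules in this runbook can be sketched as two small pure functions. This is an illustration under stated assumptions, not a definitive implementation: the names `classifyHeartbeatError`, `StealEvidence`, and `maySteal` are hypothetical, and treating `ENOENT` as the code for a missing heartbeat path is an assumption about how that failure surfaces.

```typescript
// Hypothetical failure taxonomy; the category names are illustrative.
type HeartbeatFailure =
  | "no-space" // ENOSPC covers both a full disk and inode exhaustion
  | "quota"
  | "io-error"
  | "permission"
  | "missing-path"
  | "unknown";

function classifyHeartbeatError(code: string | undefined): HeartbeatFailure {
  switch (code) {
    case "ENOSPC":
      return "no-space";
    case "EDQUOT":
      return "quota";
    case "EIO":
      return "io-error";
    case "EACCES":
    case "EPERM":
      return "permission";
    case "ENOENT": // assumption: a missing heartbeat path surfaces as ENOENT
      return "missing-path";
    default:
      return "unknown";
  }
}

// What a contender knows before it considers stealing (hypothetical shape).
interface StealEvidence {
  mtimeAgeMs: number;
  staleTimeoutMs: number;
  ownerConfirmedDead: boolean;
  fencingTokenIssued: boolean;
  holdsRecoveryLock: boolean;
}

// A stale mtime is necessary but not sufficient: one more piece of evidence is required.
function maySteal(e: StealEvidence): boolean {
  const stale = e.mtimeAgeMs > e.staleTimeoutMs;
  if (!stale) return false;
  return e.ownerConfirmedDead || e.fencingTokenIssued || e.holdsRecoveryLock;
}
```

Under this rule, a heartbeat that went stale because the holder hit ENOSPC does not by itself let a contender in, since the owner is still alive and no fencing token or recovery lock is involved.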

Copy-ready issue reply

Use this checklist when a heartbeat error is currently swallowed.

It keeps the fix focused on preventing concurrent entry, not only printing a warning.

I would make the heartbeat failure visible, and I would also define what happens to the protected critical section once the heartbeat path becomes unhealthy.

Acceptance checks I would add:

- Inject utimes(heartbeatPath) failure with ENOSPC/EIO/EACCES and assert a warning includes the lock path and operation.
- Treat heartbeat-write failure as lock-health degradation, not a silent success.
- The holder should either abort protected work or mark the lock as non-stealable until ownership is resolved.
- The stale-lock detector should require both stale mtime and dead owner/process evidence before stealing.
- Add a two-contender regression test: holder heartbeat fails, second process polls, and concurrent entry never happens.
- Surface lock-dir filesystem and inode status so disk-full and permission failures are distinguishable.
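The two-contender check above can be prototyped as a deterministic in-memory simulation before writing a real multi-process test. The sketch below is only a model: `LockRecord`, `StealRule`, and `maxConcurrentEntries` are hypothetical names, time is a simple tick counter, and the heartbeat failure is injected by tick number rather than by a real `utimes` error.

```typescript
// In-memory stand-in for the lock file; all names here are hypothetical.
interface LockRecord {
  owner: string;
  heartbeatMs: number;
  ownerAlive: boolean;
}

type StealRule = (lock: LockRecord, nowMs: number, staleMs: number) => boolean;

// Unsafe rule: steals on stale mtime alone.
const mtimeOnly: StealRule = (lock, nowMs, staleMs) =>
  nowMs - lock.heartbeatMs > staleMs;

// Safer rule: requires stale mtime and dead-owner evidence.
const mtimeAndDeadOwner: StealRule = (lock, nowMs, staleMs) =>
  nowMs - lock.heartbeatMs > staleMs && !lock.ownerAlive;

// Returns the largest number of processes inside the critical section at once.
function maxConcurrentEntries(
  rule: StealRule,
  ticks: number,
  staleMs: number,
  heartbeatFailsAt: number,
): number {
  const lock: LockRecord = { owner: "holder", heartbeatMs: 0, ownerAlive: true };
  const inside: string[] = ["holder"]; // the holder never leaves in this scenario
  let maxInside = inside.length;
  for (let now = 1; now <= ticks; now++) {
    // Holder: heartbeat refresh succeeds only before the injected failure.
    if (now < heartbeatFailsAt) lock.heartbeatMs = now;
    // Contender: polls once per tick and enters if the rule allows a steal.
    if (lock.owner === "holder" && rule(lock, now, staleMs)) {
      lock.owner = "contender";
      inside.push("contender");
    }
    maxInside = Math.max(maxInside, inside.length);
  }
  return maxInside;
}

console.log(maxConcurrentEntries(mtimeOnly, 100, 10, 5)); // 2
console.log(maxConcurrentEntries(mtimeAndDeadOwner, 100, 10, 5)); // 1
```

A regression test would assert that the result is 1 for whichever steal rule the project ships. A real test would additionally spawn two processes and inject the failure at the filesystem call.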

Evidence To Collect

Lock path, heartbeat path, owner PID, and stale timeout.
Filesystem free space and inode status for the directory that stores the heartbeat.
The exact error returned by heartbeat refresh, not just whether the timer is still running.
The condition a contender uses before stealing the lock.
Whether protected writes can continue after heartbeat refresh has failed.
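The evidence items above fit naturally into one record that can be logged or attached to an issue. The shape below is a hypothetical sketch: the field names and the `likelyCause` helper are illustrative, and the free-space and free-inode counters are assumed to come from whatever filesystem statistics call the platform provides. Both counters are collected because ENOSPC is reported for a full disk and for inode exhaustion alike.

```typescript
// Hypothetical evidence record; field names are illustrative.
interface LockEvidence {
  lockPath: string;
  heartbeatPath: string;
  ownerPid: number;
  staleTimeoutMs: number;
  freeBytes: number; // free space on the filesystem that stores the heartbeat
  freeInodes: number; // free inodes on the same filesystem
  heartbeatError: { operation: string; code: string } | null;
  stealCondition: string; // the rule a contender applies before stealing
  writesContinueAfterFailure: boolean;
}

// Rough first guess at the cause; a starting point for triage, not a verdict.
function likelyCause(e: LockEvidence): string {
  if (e.heartbeatError === null) return "no heartbeat error recorded";
  if (e.heartbeatError.code === "ENOSPC") {
    if (e.freeInodes === 0) return "inode exhaustion";
    if (e.freeBytes === 0) return "disk full";
    return "ENOSPC, cause unclear from free-space counters";
  }
  return "filesystem error " + e.heartbeatError.code;
}

// One-line summary suitable for a log entry or issue comment.
function summarize(e: LockEvidence): string {
  return JSON.stringify({ cause: likelyCause(e), evidence: e });
}
```

Recording `writesContinueAfterFailure` alongside the error makes it clear whether the incident was only a missed heartbeat or a window in which protected writes could have overlapped.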

Paid Scope

The free AI incident triage reviews one lock or runner failure and returns the safest next diagnostic step. The free SafeDisk AI deletion trial turns one representative incident into a stale-lock policy, failure taxonomy, and regression checklist for your agent, CLI, or CI tool.
