LSM Compaction Disk Full Deadlock
An LSM tree can deadlock on a nearly full disk: compaction is the operation that would free space, but a merge compaction first needs transient space for its output while the old SSTs still exist. The fix is a storage admission policy, not a blind cleanup script.
Keep Drop and Move available while bounding Merge compactions by real headroom.
Use this when a storage engine, cache, or embedded database needs compaction to reclaim disk but can also fail compaction when the filesystem is already near full.
classify Drop/Move/Merge -> reserve headroom -> narrow merge -> atomic manifest
Measure table bytes, free headroom, and the compaction class.
These checks keep the first response safe to share publicly: they expose no data contents and no private keys, only filesystem headroom and SST/log size buckets.
df -h; du -sh; find . -name '*.sst'; manifest/log sizes
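A minimal sketch of the same measurements in Python, assuming tables use a `.sst` suffix under one data directory. The names `measure` and `size_bucket` and the bucket limits are hypothetical, chosen only for illustration. It reports counts per size bucket rather than file names or contents.

```python
import shutil
from pathlib import Path


def size_bucket(n_bytes: int) -> str:
    """Map a byte count to a coarse bucket (limits are illustrative)."""
    for limit, label in ((1 << 20, "<1MiB"), (1 << 26, "<64MiB"), (1 << 30, "<1GiB")):
        if n_bytes < limit:
            return label
    return ">=1GiB"


def measure(data_dir: str) -> dict:
    """Report filesystem headroom and SST size buckets for data_dir."""
    usage = shutil.disk_usage(data_dir)  # named tuple: total, used, free (bytes)
    table_bytes = 0
    buckets: dict = {}
    for path in Path(data_dir).rglob("*.sst"):
        if not path.is_file():
            continue
        size = path.stat().st_size
        table_bytes += size
        label = size_bucket(size)
        buckets[label] = buckets.get(label, 0) + 1
    return {
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "table_bytes": table_bytes,
        "sst_buckets": buckets,
    }
```

Calling `measure("/path/to/data")` returns a small dictionary that can be pasted into a report as is.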
Runbook: Break The Compaction Reclaim Deadlock
- Split compaction choices by disk effect. Drop and Move should stay runnable at the limit; Merge needs admission control.
- Reserve a band for one minimal reclaiming compaction. User writes should be paused before they consume it.
- Estimate a merge's transient need before starting. A conservative bound is the sum of the input SST sizes: the inputs cannot be deleted until the output is committed, and the output can be as large as all of them combined.
- Narrow instead of fail when possible: choose fewer inputs, smaller key ranges, or the compaction with the highest likely bytes reclaimed per byte written.
- Keep the manifest atomic. If an output write fails, inputs remain valid, the manifest stays unchanged, and orphan outputs are separately cleaned.
- Expose availability as a status value with four states: full compaction available, only reclaim compaction available, writes paused, or hard full.
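The admission steps above can be sketched as follows. This is an illustration under stated assumptions, not any engine's actual picker: `Candidate`, `admit`, `narrow`, and `pick` are hypothetical names, and narrowing by dropping the largest inputs ignores the key-range constraints a real picker has to respect.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import List, Optional, Tuple


class Kind(Enum):
    DROP = "drop"    # removes obsolete tables; writes no new table data
    MOVE = "move"    # relinks a table to another level; writes no new table data
    MERGE = "merge"  # rewrites inputs into new tables; needs transient space


@dataclass(frozen=True)
class Candidate:
    kind: Kind
    input_sizes: Tuple[int, ...]  # bytes per input SST
    reclaim_bytes: int            # estimated bytes freed once committed


def transient_need(c: Candidate) -> int:
    """Conservative bound: the output may be as large as all inputs combined."""
    return sum(c.input_sizes) if c.kind is Kind.MERGE else 0


def narrow(c: Candidate, budget: int) -> Optional[Candidate]:
    """Drop the largest inputs until the merge fits; None if fewer than two remain."""
    if transient_need(c) <= budget:
        return c
    sizes = sorted(c.input_sizes)
    while len(sizes) >= 2 and sum(sizes) > budget:
        sizes.pop()  # remove the largest remaining input
    if len(sizes) < 2:
        return None
    # Crude assumption: reclaimed bytes shrink in proportion to the input bytes.
    scale = sum(sizes) / sum(c.input_sizes)
    return replace(c, input_sizes=tuple(sizes),
                   reclaim_bytes=int(c.reclaim_bytes * scale))


def admit(c: Candidate, free: int, reserve: int) -> Optional[Candidate]:
    """Return the candidate to run (possibly narrowed), or None to skip it."""
    if c.kind is not Kind.MERGE:
        return c  # Drop and Move stay runnable even at the limit
    return narrow(c, free - reserve)


def pick(cands: List[Candidate], free: int, reserve: int) -> Optional[Candidate]:
    """Prefer the admitted candidate with the most bytes reclaimed per byte written."""
    admitted = [a for a in (admit(c, free, reserve) for c in cands) if a is not None]
    return max(admitted,
               key=lambda c: c.reclaim_bytes / max(transient_need(c), 1),
               default=None)
```

With 60 bytes free and a 50-byte reserve, a three-input merge of 10, 20, and 30 bytes cannot be narrowed to fit, so `pick` falls back to any Drop or Move candidate and otherwise returns `None`.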
Use this when compaction needs space in order to free space.
This turns the bug report into concrete acceptance criteria: admission, reserve, safe failure, and observability.
I would make the compaction picker space-aware rather than treating all compactions like writes.
Acceptance checks I would add:
- Drop and Move remain eligible even when normal writes are paused.
- Merge estimates transient output space before starting.
- If the merge bound does not fit in free space minus the reserve, the picker skips or narrows the merge.
- The emergency reserve cannot be consumed by user writes.
- Injected mid-merge ENOSPC leaves inputs intact, output orphaned, manifest unchanged, and reopen succeeds.
- StorageStatus exposes whether full compaction is available or only reclaiming compaction is allowed.
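The injected-ENOSPC check can be prototyped against a toy model before wiring it into a real engine. The sketch below assumes a JSON manifest file named `MANIFEST` and a caller-supplied `write_output` function. Both are hypothetical stand-ins, as are `run_merge` and `commit_manifest`. Directory fsync after the rename is omitted for brevity.

```python
import errno
import json
import os


def load_manifest(path: str) -> list:
    with open(path) as f:
        return json.load(f)


def commit_manifest(path: str, tables) -> None:
    """Write a sibling temp file, fsync it, then rename it over the manifest."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(sorted(tables), f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # readers see the old or the new manifest, never a mix


def run_merge(data_dir: str, inputs: list, output: str, write_output) -> bool:
    """Merge inputs into output; return False if the disk filled up."""
    manifest = os.path.join(data_dir, "MANIFEST")
    live = set(load_manifest(manifest))
    try:
        write_output(os.path.join(data_dir, output),
                     [os.path.join(data_dir, name) for name in inputs])
        commit_manifest(manifest, (live - set(inputs)) | {output})
    except OSError as exc:
        if exc.errno != errno.ENOSPC:
            raise
        # Inputs and manifest are untouched; a partial output may remain as an orphan.
        return False
    for name in inputs:  # inputs are safe to delete only after the commit
        os.remove(os.path.join(data_dir, name))
    return True
```

A test can pass a `write_output` that writes a few bytes and then raises `OSError(errno.ENOSPC, ...)`, then assert that the manifest still lists the inputs and that reopening from it succeeds.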
Turn one compaction ENOSPC issue into a reusable storage admission policy.
The $99 policy is for storage engines, embedded databases, caches, queues, and stateful services where compaction, cleanup, or retention both consumes and frees disk. You get the headroom model, reserve thresholds, safe failure checklist, and tests for one representative component.
Do Not Delete First
- Manifest, current files, table metadata, or compaction markers before proving reopen behavior.
- Input SSTs for a failed merge before confirming the output was committed to the manifest.
- Orphan outputs before recording whether cleanup is automatic and idempotent.
- WAL or memtable recovery files just to make room for compaction.
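For the orphan-output item, one way to record whether cleanup is idempotent is to make it a dry run by default. The sketch below is an assumption-laden illustration: `find_orphans` and `remove_orphans` are hypothetical helpers, the live table list is taken from an already validated manifest, and nothing here is safe to run while a compaction is still writing outputs.

```python
import os


def find_orphans(data_dir: str, live_tables) -> list:
    """List *.sst files in data_dir that the manifest does not reference."""
    live = set(live_tables)
    return sorted(name for name in os.listdir(data_dir)
                  if name.endswith(".sst") and name not in live)


def remove_orphans(data_dir: str, live_tables, dry_run: bool = True) -> list:
    """Return the orphan list; delete the files only when dry_run is False."""
    orphans = find_orphans(data_dir, live_tables)
    if dry_run:
        return orphans
    for name in orphans:
        try:
            os.remove(os.path.join(data_dir, name))
        except FileNotFoundError:
            pass  # already gone: a repeated run is harmless
    return orphans
```

Only `.sst` files are considered, so the manifest, WAL, and recovery files are never candidates for deletion.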