Backup Retention Policy for a Full Disk
Backup tools feel safe until every save creates another timestamped copy, every job leaves a directory behind, and the host silently runs out of disk. The fix is not just "delete old backups"; it is a retention policy that runs before the next write can fail.
Prune old timestamped copies before the next backup write can fail.
For tools like trs-file-backup, deleting after a successful backup is too late on a nearly full disk. The safer acceptance test is prune-before-write, free-space reserve, temp-copy, then atomic rename.
group by source -> prune beyond keep -> reserve bytes -> temp copy -> rename
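The pipeline above can be sketched in POSIX-style shell. This is a minimal illustration under stated assumptions, not how trs-file-backup or any specific tool works: the function names `prune_source` and `backup_file` are hypothetical, and the layout `<root>/<name>.<YYYYmmddTHHMMSS>.bak` is assumed so that filename order matches age.

```shell
# Assumed layout (hypothetical): <root>/<name>.<YYYYmmddTHHMMSS>.bak
# Assumes no other source name starts with "<name>." (the glob would match it too).
# "local" is not strictly POSIX but works in bash, dash, and busybox sh.

# Delete the oldest copies of one source until at most "keep" remain.
prune_source() {
  local root="$1" name="$2" keep="$3"
  # The glob expands in name order, which is oldest first for this timestamp format.
  set -- "$root/$name".*.bak
  [ -e "$1" ] || return 0            # no copies yet, nothing to prune
  while [ "$#" -gt "$keep" ]; do
    rm -f -- "$1"
    shift
  done
}

# Prune, check the free-space reserve, copy to a temp file, then rename.
backup_file() {
  local src="$1" root="$2" keep="$3" reserve_bytes="$4"
  local name size free_kb free_bytes stamp tmp dest
  [ -r "$src" ] || return 1
  name=${src##*/}

  # 1. Prune before writing, so cleanup does not depend on a successful write.
  prune_source "$root" "$name" "$keep" || return 1

  # 2. Refuse to write if the copy would eat into the reserve.
  size=$(( $(wc -c < "$src") ))
  free_kb=$(df -Pk "$root" | awk 'NR == 2 { print $4 }')
  free_bytes=$(( free_kb * 1024 ))
  if [ $(( free_bytes - size )) -lt "$reserve_bytes" ]; then
    echo "backup refused: root=$root free_bytes=$free_bytes need=$size reserve=$reserve_bytes" >&2
    return 1
  fi

  # 3. Copy to a temp file inside the backup root (same filesystem as the target).
  stamp=$(date +%Y%m%dT%H%M%S)
  dest="$root/$name.$stamp.bak"
  tmp=$(mktemp "$root/.tmp.$name.XXXXXX") || return 1
  if ! cp -- "$src" "$tmp"; then
    rm -f -- "$tmp"
    return 1
  fi

  # 4. Rename into place; within one filesystem the rename is atomic.
  mv -f -- "$tmp" "$dest"
}
```

A call such as `backup_file ./notes.txt ./backups 5 1073741824` would keep at most five older copies and refuse to write unless about 1 GiB stays free afterwards. Pruning down to `keep` rather than `keep - 1` means the count briefly reaches `keep + 1` after a write, but a failed write can never leave a source with zero copies.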
Measure history growth, retention gaps, and the next-write risk.
These checks identify whether the disk pressure comes from old backup copies, oversized logs, active temp files, or a missing prune step.
du -h -d 2 ./backups | sort -h | tail
Runbook: Retain Before The Disk Is Full
- Preserve one recent known-good backup and one failed backup sample before deleting bulk history.
- Measure the backup root by source path, not only total directory size. One large file saved 100 times can dominate everything.
- Add a prune-before-write step. If pruning only runs after a successful backup, the next write can fail before cleanup happens.
- Set both count and age limits: for example, keep the newest 5 per source and delete anything older than 30 days, but never the newest known-good copy for a source.
- Cap full-log reads and per-job logs. A backup job should not create a second unbounded log-growth problem.
- Fail with a useful message when free space falls below the reserve threshold. Include the backup root, free bytes, largest retained bucket, and a recommended cleanup command.
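As a rough sketch of the reserve check in the last bullet, the shell below prints the fields a failure message should carry. The function names `largest_bucket` and `check_reserve` are hypothetical, and the code assumes copies are named `<name>.<timestamp>.bak` directly under the backup root, with no whitespace in names.

```shell
# Assumed layout (hypothetical): <root>/<name>.<timestamp>.bak, names without whitespace.

# Print "<total_bytes> <source_name>" for the source with the largest history.
largest_bucket() {
  local root="$1" f base
  for f in "$root"/*.bak; do
    [ -e "$f" ] || continue
    base=${f##*/}              # notes.20240101T000000.bak
    base=${base%.*.bak}        # notes
    printf '%s %s\n' "$(( $(wc -c < "$f") ))" "$base"
  done |
    awk '{ sum[$2] += $1 } END { for (n in sum) print sum[n], n }' |
    sort -n | tail -n 1
}

# Return 0 if free space meets the reserve; otherwise explain why and return 1.
check_reserve() {
  local root="$1" reserve_bytes="$2" free_kb free_bytes bucket
  free_kb=$(df -Pk "$root" | awk 'NR == 2 { print $4 }')
  free_bytes=$(( free_kb * 1024 ))
  [ "$free_bytes" -ge "$reserve_bytes" ] && return 0
  bucket=$(largest_bucket "$root")
  {
    echo "backup refused: free space below reserve"
    echo "  backup root:     $root"
    echo "  free bytes:      $free_bytes (reserve $reserve_bytes)"
    echo "  largest bucket:  ${bucket:-none}"
    echo "  suggested check: du -h -d 2 \"$root\" | sort -h | tail"
  } >&2
  return 1
}
```

The suggested command printed here is the read-only `du` check from earlier in this page; a real tool might instead print its own prune command.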
Use this when backups accumulate until the host is full.
This separates active backup safety from old-history retention, and avoids deleting the only good restore point.
I would make retention run before new writes, not only after successful backups.
Read-only evidence first:
BACKUP_ROOT="<backup-root>"  # replace the placeholder with your backup directory
df -h "$BACKUP_ROOT" .
du -h -d 2 "$BACKUP_ROOT" 2>/dev/null | sort -h | tail -40
find "$BACKUP_ROOT" -type f -size +100M -printf "%s %TY-%Tm-%Td %p\n" 2>/dev/null | sort -n | tail -40
Then split the policy:
- keep newest N per source file or job
- keep max age for old timestamped copies
- reserve free disk before starting a backup
- cap per-job logs and full-log reads
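The count and age limits can be combined in a dry run that only lists deletion candidates. The sketch below is one possible reading of the policy, not a definitive rule: it assumes the hypothetical layout `<name>.<YYYYmmddTHHMMSS>.bak` directly under the backup root, takes the age cutoff as a timestamp string in the same format, and always protects the newest copy of each source even when it is older than the cutoff. The name `prune_candidates` is illustrative.

```shell
# Assumed layout (hypothetical): <root>/<name>.<YYYYmmddTHHMMSS>.bak
# Dry run: print the file names that the count or age limit would remove.
prune_candidates() {
  local root="$1" keep="$2" cutoff="$3" f
  for f in "$root"/*.bak; do
    [ -e "$f" ] || continue
    printf '%s\n' "${f##*/}"
  done |
    LC_ALL=C sort -r |          # newest first within each source name
    awk -v keep="$keep" -v cutoff="$cutoff" '
      {
        n = split($0, part, ".")
        stamp = part[n - 1]
        # Strip ".<stamp>.bak" to recover the source name.
        name = substr($0, 1, length($0) - length(stamp) - 5)
        rank[name]++
        if (rank[name] == 1) next                 # never list the newest copy
        if (rank[name] > keep || (stamp "") < (cutoff "")) print
      }'
}
```

Running `prune_candidates ./backups 5 20240101T000000` would list copies beyond the newest five per source plus any copy stamped before 1 January 2024. Reviewing that list before piping it to a delete step keeps the cleanup auditable.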
Turn one backup-growth incident into a reusable retention policy.
The $99 policy is for tools that create timestamped copies, job directories, restore snapshots, or backup logs that can grow without bounds. You get prune-before-write checks, retention thresholds, safe cleanup order, and operator-facing failure messages for one representative workflow.
Do Not Delete First
- The newest known-good backup for each source or job.
- Metadata that proves which source generated the largest history bucket.
- Failed partial backups, until their size, timestamp, and error message have been recorded.
- The only restore point for a database, project, or customer file, until a separate backup exists.