pg_dump Partial Backup Disk Full
A backup sidecar can look healthy while quietly publishing an empty or truncated PostgreSQL dump. The dangerous pattern is writing `pg_dump -Fc` directly to the final `.dump` path, then letting retention or offsite upload treat that file as a valid restore point.
Only publish a backup after the dump is complete and restorable.
Use this pattern when `pg_dump` can fail from a full disk, an unavailable database, an OOM kill, network blips, or container restarts. A failed run should leave only a `.partial` file, never a file with a final backup name.
write .partial -> pg_restore -l -> atomic rename -> upload
Find partial dumps, zero-byte files, and backup-root headroom.
These checks never read dump contents. They show whether the backup root has enough free space, whether failed files are mixed in with final restore points, and which dump names need review.
df; find *.dump *.partial; check zero-byte outputs
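The checks above can be sketched as one small function. This is a minimal sketch, not the sidecar's actual tooling: `inspect_backup_root` is a hypothetical name, and it assumes final dumps end in `.dump` and temp files end in `.partial`.

```sh
#!/bin/sh
# Sketch: inspect a backup root without reading any dump contents.
# inspect_backup_root is a hypothetical helper name.

inspect_backup_root() {
    root=$1

    # Free space on the filesystem that holds the backup root.
    df -h "$root"

    # Temp files left behind by failed or interrupted runs.
    echo "partial files:"
    find "$root" -type f -name '*.partial' -print

    # Final or partial names with zero bytes (-size 0 matches empty files only).
    echo "zero-byte files:"
    find "$root" -type f \( -name '*.dump' -o -name '*.partial' \) -size 0 -print
}
```

Run it as `inspect_backup_root /path/to/backups`. Any name listed under "zero-byte files" that ends in `.dump` is a final name that should not be trusted as a restore point.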
Runbook: Never Publish A Failed Dump
- Stop redirecting `pg_dump` directly to the final backup path. Shell redirection creates or truncates that file before `pg_dump` has proved anything.
- Write the dump to a temp path on the same filesystem, for example `app-2026-06-17.dump.partial`.
- Delete a stale `.partial` for the same job before starting (after recording its timestamp and size), or make the temp name unique per timestamp.
- Require `pg_dump` exit code 0, a non-empty temp file, and `pg_restore -l` success for custom-format dumps.
- Rename the temp file to the final `.dump` name only after verification passes. This makes final names mean "restore candidate".
- Run offsite upload and retention only after final publication. A failed run should never prune the previous known-good backup.
- Emit a greppable failure line and call a failure hook such as `BACKUP_ON_FAILURE_CMD` so operators can wire email, Slack, or pager alerts.
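The runbook steps can be sketched as a single function. This is a sketch under stated assumptions rather than a definitive implementation: `backup_once` and `fail` are hypothetical names, connection settings are assumed to come from the standard `PG*` environment variables, and `BACKUP_ON_FAILURE_CMD` is assumed to hold a shell command string. Note that `pg_restore -l` only lists the archive's table of contents, so it works as a sanity check; a periodic test restore is the stronger proof.

```sh
#!/bin/sh
# Sketch of publish-by-verification for one custom-format dump.
# backup_once and fail are hypothetical names for illustration.

fail() {
    # Greppable failure line on stderr, then the optional hook.
    echo "[backup] FAILED: $1" >&2
    if [ -n "${BACKUP_ON_FAILURE_CMD:-}" ]; then
        # Assumption: the hook is a shell command string.
        sh -c "$BACKUP_ON_FAILURE_CMD" || true
    fi
    return 1
}

backup_once() {
    db=$1
    final=$2                    # e.g. /backups/app-2026-06-17.dump
    partial="$final.partial"    # same directory, so same filesystem

    rm -f -- "$partial"         # drop a stale temp file from this job

    # -f writes to the temp path; the final name is never opened here.
    pg_dump -Fc -f "$partial" "$db" ||
        { fail "pg_dump exited non-zero for $db"; return 1; }

    # Reject an empty temp file.
    [ -s "$partial" ] ||
        { fail "empty dump: $partial"; return 1; }

    # Lists the archive's table of contents; output is discarded.
    pg_restore -l "$partial" >/dev/null ||
        { fail "pg_restore -l rejected $partial"; return 1; }

    # mv within one filesystem is a rename, so the final name
    # appears all at once or not at all.
    mv -f -- "$partial" "$final" ||
        { fail "rename failed for $final"; return 1; }

    echo "[backup] OK: $final"
}
```

A caller would run something like `backup_once app /backups/app-2026-06-17.dump` and start upload and retention only when it returns 0. On any failure the `.partial` file stays in place for inspection and the final name is never touched.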
Use this issue comment when a backup sidecar can leave corrupt final dumps.
It keeps the public issue useful: concrete acceptance checks, no private data, and a clear safety boundary.
I would make the backup sidecar publish-by-verification rather than write directly to the final dump path.
Acceptance checks I would add:
- A failing pg_dump never creates or truncates a final app-*.dump / zitadel-*.dump file.
- The dump writes to "$final.partial" on the same filesystem, then renames only after pg_dump exits 0 and the verification checks pass.
- The sidecar verifies test -s "$final.partial" and, for -Fc dumps, pg_restore -l "$final.partial".
- Retention and offsite upload ignore .partial files and do not prune the previous known-good dump after a failed run.
- Failed runs emit a distinct [backup] FAILED line and fire BACKUP_ON_FAILURE_CMD if configured.
- The runbook proves the latest dump is restorable, not just present in ls -lh.
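The retention check in this list can be illustrated with a small pruning function. This is a hedged sketch: `prune_dumps` is a hypothetical name, and it assumes date-stamped names such as `app-2026-06-17.dump`, so that sorted glob order matches chronological order.

```sh
#!/bin/sh
# Sketch: keep the newest N final dumps for one prefix.
# prune_dumps is a hypothetical helper name.

prune_dumps() {
    dir=$1; prefix=$2; keep=$3

    # The glob matches final names only: "*.dump" cannot match a name
    # ending in ".dump.partial". Results are sorted ascending by name.
    set -- "$dir/$prefix"-*.dump
    [ -e "$1" ] || return 0          # glob matched nothing

    # Remove the oldest names until only $keep remain.
    while [ "$#" -gt "$keep" ]; do
        rm -f -- "$1"
        shift
    done
}
```

Calling it only after a successful publish step means a failed run can never shorten the list of known-good dumps, and `.partial` files are never counted toward the retention limit.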
Turn one failed-backup issue into a reusable dump integrity policy.
The $99 policy is for self-hosted products, backup sidecars, app templates, and platform runbooks where a failed dump can be mistaken for a valid restore point. You get temp-file naming, verification gates, alert hooks, retention boundaries, and acceptance tests for one representative workflow.
Do Not Delete First
- The previous known-good dump before the replacement dump has passed verification.
- `.partial` files before recording timestamp, size, and the failure that created them.
- Backup logs that prove `pg_dump` exited non-zero or `pg_restore -l` failed.
- Offsite restore points before confirming whether they were uploaded from verified final names or failed partial output.
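Before any cleanup, the evidence named above can be captured in a log. The sketch below is one possible shape: `record_partials`, the log format, and the free-text reason argument are all assumptions made for illustration.

```sh
#!/bin/sh
# Sketch: record evidence about .partial files before anyone deletes them.
# record_partials and the log line format are hypothetical.

record_partials() {
    dir=$1; log=$2; reason=$3

    for f in "$dir"/*.partial; do
        [ -e "$f" ] || continue      # glob matched nothing

        size=$(wc -c < "$f" | tr -d ' ')
        now=$(date -u +%Y-%m-%dT%H:%M:%SZ)

        # recorded_at is when this line was written, not the file's mtime.
        printf 'recorded_at=%s partial=%s size_bytes=%s reason=%s\n' \
            "$now" "$f" "$size" "$reason" >> "$log"

        # The ls line preserves the file's own modification time.
        ls -ln -- "$f" >> "$log"
    done
}
```

For example, `record_partials /backups /backups/evidence.log "pg_dump exited non-zero"` appends two lines per leftover file and deletes nothing.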