Linux /var/lib Not Writable Service Crash Loop
When a service needs to create a state file under /var/lib but systemd sandboxing, ownership, read-only mounts, or ENOSPC blocks the write, the correct fix is a guarded error path plus a packaging-level writable-state check.
Read-Only Evidence
Do not delete state directories first. Prove which layer prevents writes.
svc=netdata
state_dir=/var/lib/netdata
systemctl cat "$svc"
systemctl show "$svc" -p User -p Group -p StateDirectory -p ReadWritePaths -p ReadWriteDirectories -p ProtectSystem -p ProtectHome
namei -om "$state_dir"
stat -c '%U:%G %a %n' "$state_dir" 2>/dev/null || true
df -h "$state_dir" /
df -i "$state_dir" /
mount | grep -E ' /var | /var/lib | / '
journalctl -u "$svc" -n 120 --no-pager | grep -Ei 'read-only|permission denied|no space|enospc|erofs|segv|null|state|var/lib'
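The journal grep above matches several unrelated patterns at once, so it can help to count the three write-failure classes separately. The following is a minimal sketch, not part of any shipped tooling: the function name classify_write_failures is hypothetical, and it assumes the service logs the standard C library error texts ("Read-only file system", "Permission denied", "No space left on device") or the errno names themselves.

```shell
# Hypothetical helper: count write-failure classes in log text read from stdin.
# Assumption: the service logs the usual strerror() texts or the errno names.
classify_write_failures() {
    awk '
        { line = tolower($0) }
        line ~ /read-only file system|erofs/    { erofs++ }
        line ~ /permission denied|eacces/       { eacces++ }
        line ~ /no space left on device|enospc/ { enospc++ }
        END {
            # Unset counters print as 0 under %d.
            printf "EROFS=%d EACCES=%d ENOSPC=%d\n", erofs, eacces, enospc
        }
    '
}
```

A possible invocation is journalctl -u "$svc" -n 120 --no-pager | classify_write_failures. A line that mentions two classes is counted under both, so treat the numbers as a hint about which layer to inspect next, not as proof.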
Safe Fix Boundary
- Add a null/error guard in code wherever state-file creation can fail. The API should return a controlled error or degraded response, not crash.
- Treat EROFS, EACCES, ENOSPC, and missing parent directories as separate test cases.
- For packages, use systemd state management deliberately: StateDirectory=, ReadWritePaths=, or the narrowest writable path required by the service.
- Add a startup or health warning that says the configured state directory is not writable before the first user-facing request hits the crash path.
- Keep cleanup separate from packaging. Freeing disk can fix ENOSPC, but it does not fix a read-only mount or missing writable path in the unit file.
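The startup warning in the list above could be sketched as a small preflight probe. This is an illustration under stated assumptions, not the service's own check: the function name probe_state_dir and the .write-probe file name are hypothetical, and the failure class is inferred from the error text that mktemp prints under LC_ALL=C, because plain shell cannot read errno directly.

```shell
# Hypothetical preflight probe for a state directory.
# Prints one of: ok, missing, erofs, eacces, enospc, unknown.
# Returns 0 only when a file can be created in the directory.
probe_state_dir() {
    probe_dir=$1
    if [ ! -d "$probe_dir" ]; then
        echo missing
        return 1
    fi
    # mktemp creates a new, uniquely named empty file, so existing state
    # is never opened or overwritten. On success it prints the file path;
    # on failure the captured text is the error message.
    if probe_out=$(LC_ALL=C mktemp "$probe_dir/.write-probe.XXXXXX" 2>&1); then
        rm -f -- "$probe_out"
        echo ok
        return 0
    fi
    case $probe_out in
        *"Read-only file system"*)   echo erofs ;;
        *"Permission denied"*)       echo eacces ;;
        *"No space left on device"*) echo enospc ;;
        *)                           echo unknown ;;
    esac
    return 1
}
```

The result is only meaningful when the probe runs as the service user and inside the same sandbox as the service, for example from an ExecStartPre= line in the unit. Run from a root shell, it bypasses both ownership checks and the unit's ProtectSystem= view of the filesystem.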
Copy-ready issue reply
Use this when a service crashes because its state directory is not writable.
The goal is to keep review focused on deterministic failure handling and packaging validation.
I would add tests at two layers:
1. Code path: state/session file creation returns EROFS, EACCES, ENOSPC, and missing-parent errors. The API should not crash, and it should not pass NULL into formatting/path helpers.
2. Package path: the shipped systemd sandbox can create the exact state file the agent expects under the configured varlib/state directory.
Read-only operator evidence:
- systemctl cat/show for ProtectSystem, StateDirectory, ReadWritePaths/Directories
- namei/stat for the state directory ownership and mode
- df -h and df -i for the state directory
- one namespace write probe only if needed, using a harmless temp name
The fix boundary should be: make the directory writable for the service, report a controlled degraded state when it is not writable, and never delete live state as the first recovery step.
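For the "make the directory writable for the service" half of that boundary, one option is a drop-in that lets systemd manage the directory. The sketch below is an assumption-laden illustration: the helper name write_state_dropin and the file name 10-state-dir.conf are hypothetical, and the packaged unit may already set StateDirectory=, in which case no drop-in is needed.

```shell
# Hypothetical helper: write a drop-in that asks systemd to manage the
# state directory.
#   $1 = unit directory root (for example /etc/systemd/system)
#   $2 = service name without the .service suffix
#   $3 = directory name relative to /var/lib
write_state_dropin() {
    dropin_root=$1
    dropin_unit=$2
    dropin_name=$3
    dropin_dir="$dropin_root/$dropin_unit.service.d"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/10-state-dir.conf" <<EOF
[Service]
# systemd creates /var/lib/$dropin_name at start, owned by the unit's
# User=/Group=, and keeps it writable under ProtectSystem=strict.
StateDirectory=$dropin_name
EOF
}
```

A possible use is write_state_dropin /etc/systemd/system "$svc" netdata, followed by systemctl daemon-reload and a restart. This addresses a sandbox or ownership problem only; it does not help when the underlying filesystem is mounted read-only or is out of space or inodes.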
Do Not Delete First
- Existing service state, databases, claims, keys, tokens, or local identity files.
- Any parent directory under /var/lib before proving which service owns it.
- Logs that prove whether the failure was EROFS, EACCES, or ENOSPC.