Artifact Cache Write Failure Connection Leak
When a package proxy streams an upstream response to the client and into a cache at the same time, a disk-full or S3 write failure must not block the main request until `WriteTimeout` while both the client and upstream connections stay open.
Turn one cache-write hang into a reusable artifact proxy contract.
Use this when npm, PyPI, Cargo, Maven, Go module, or artifact proxy handlers tee upstream bytes into a disk/S3 cache while serving the client.
Cache failure must unblock the response path, not hold two connections until timeout.
Capture stream topology, cache backend, timeout behavior, and connection pressure.
These checks do not need package contents, repository secrets, credentials, or customer artifacts. The useful signal is where the cache writer exits and whether the response path unblocks.
tee path -> cache backend -> error injection -> unblock time -> held connections
Runbook: Cache Failure Must Not Own The Request
- Do not let cache persistence be a hidden hard dependency for serving an upstream artifact response.
- When the cache writer exits early, explicitly drain or close the cache-side reader so the tee writer does not block forever.
- Use `CloseWithError`, context cancellation, or equivalent signaling so the response path knows the cache side is gone.
- Set deadlines for cache writes and object-store puts that are shorter than the client-facing `WriteTimeout`, so a stalled cache backend fails on its own deadline before the server timeout becomes the failure mode.
- Test disk-full, quota-exceeded, short-write, and object-store failure paths with multiple concurrent artifact requests.
- Verify the error path releases client connections, upstream connections, goroutines, temp files, and file descriptors promptly.
- Write cache objects atomically and remove partial temp files so a failed cache write cannot poison future reads.
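The signaling rules above can be sketched in Go with `io.Pipe`. This is a minimal illustration under stated assumptions, not the implementation of any particular proxy: `serveAndCache`, `bestEffortWriter`, and the `store` callback are hypothetical names, and the injected error stands in for a real ENOSPC or object-store failure. When `store` exits early, `CloseWithError` on the read side makes the pending pipe write return, so the client copy keeps going and the cache error is reported separately.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

// bestEffortWriter forwards bytes to the cache pipe until the first error,
// then discards everything so the client copy loop never blocks or fails.
type bestEffortWriter struct {
	w   io.Writer
	err error // first cache-side write error, kept for logging or metrics
}

func (b *bestEffortWriter) Write(p []byte) (int, error) {
	if b.err == nil {
		if _, err := b.w.Write(p); err != nil {
			b.err = err
		}
	}
	return len(p), nil // always report success to io.TeeReader
}

// serveAndCache (hypothetical) streams upstream to client while feeding store.
// It returns bytes served, the client-path error, and the cache-path error.
func serveAndCache(client io.Writer, upstream io.Reader, store func(io.Reader) error) (int64, error, error) {
	pr, pw := io.Pipe()
	done := make(chan error, 1)
	go func() {
		err := store(pr)
		// Closing the read side makes pending and future pw.Write calls
		// return immediately instead of blocking on a reader that is gone.
		pr.CloseWithError(err)
		done <- err
	}()

	bw := &bestEffortWriter{w: pw}
	n, copyErr := io.Copy(client, io.TeeReader(upstream, bw))
	// Signal end of stream (or the copy failure) to the cache side.
	pw.CloseWithError(copyErr)
	return n, copyErr, <-done
}

// demo injects a cache failure after 4 bytes and reports what the client got.
func demo() (string, error) {
	var client bytes.Buffer
	failing := func(r io.Reader) error {
		io.CopyN(io.Discard, r, 4) // accept a few bytes, then fail
		return errors.New("injected: no space left on device")
	}
	_, _, cacheErr := serveAndCache(&client, strings.NewReader("artifact-bytes"), failing)
	return client.String(), cacheErr
}

func main() {
	served, cacheErr := demo()
	fmt.Println(served, "|", cacheErr)
}
```

The final `<-done` still waits for `store` to return, so a backend call that hangs inside `store` needs its own deadline; the pipe signaling only covers the case where the cache writer has already exited.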
Use this when cache-write failure blocks the streaming handler.
This keeps the fix measurable: cache failure is injected, the response path unblocks promptly, and connection pressure is bounded.
I would turn this into a cache-failure contract for every artifact handler, not just a local fix in one backend.
Acceptance checks I would add:
- Inject `ENOSPC`/`EDQUOT` or S3 `PutObject` failure after some bytes have already streamed.
- The client-facing handler returns promptly; it should not wait until `WriteTimeout`.
- The upstream response body is closed promptly on cache failure or client abort.
- The cache side drains or closes the pipe reader so the tee writer cannot block forever.
- Partial temp files are removed and never become readable cache entries.
- A concurrent test proves failing cache writes do not hold client connections, upstream connections, goroutines, or fds until timeout.
- The same behavior is covered for npm, PyPI, Cargo, Maven, and Go module handlers if they share the pattern.
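A concurrent check along these lines could look like the Go sketch below. It is an in-process stand-in, assuming a simplified `handle` function in place of the real handler; `failAfter`, `swallow`, `checkUnblocks`, the 50-request count, and the 2-second budget are all hypothetical choices for illustration. A real test would also count open connections and file descriptors, which this sketch does not do.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"runtime"
	"strings"
	"sync"
	"time"
)

var errInjected = errors.New("injected: no space left on device")

// failAfter builds a cache writer that accepts n bytes and then fails.
func failAfter(n int64) func(io.Reader) error {
	return func(r io.Reader) error {
		io.CopyN(io.Discard, r, n)
		return errInjected
	}
}

// swallow hides cache-side write errors from the client copy loop.
type swallow struct{ w io.Writer }

func (s swallow) Write(p []byte) (int, error) {
	s.w.Write(p) // error ignored on purpose in this sketch
	return len(p), nil
}

// handle is a simplified stand-in for the handler under test.
func handle(client io.Writer, upstream io.Reader, store func(io.Reader) error) error {
	pr, pw := io.Pipe()
	done := make(chan struct{})
	go func() {
		defer close(done)
		pr.CloseWithError(store(pr)) // unblock the tee writer on early exit
	}()
	_, err := io.Copy(client, io.TeeReader(upstream, swallow{pw}))
	pw.CloseWithError(err)
	<-done
	return err
}

// checkUnblocks runs n requests against a failing cache and returns an error
// if they are still blocked after budget or if goroutines are left running.
func checkUnblocks(n int, budget time.Duration) error {
	before := runtime.NumGoroutine()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			body := strings.NewReader(strings.Repeat("x", 1<<16))
			handle(io.Discard, body, failAfter(1024))
		}()
	}
	finished := make(chan struct{})
	go func() { wg.Wait(); close(finished) }()
	select {
	case <-finished:
	case <-time.After(budget):
		return fmt.Errorf("requests still blocked after %v", budget)
	}
	// Give exiting goroutines a moment, then compare against the baseline.
	deadline := time.Now().Add(time.Second)
	for runtime.NumGoroutine() > before && time.Now().Before(deadline) {
		time.Sleep(10 * time.Millisecond)
	}
	if leaked := runtime.NumGoroutine() - before; leaked > 0 {
		return fmt.Errorf("%d goroutines still running", leaked)
	}
	return nil
}

func demo() error { return checkUnblocks(50, 2*time.Second) }

func main() { fmt.Println("check result:", demo()) }
```

The budget is the important knob: it should sit well below the server's `WriteTimeout`, so a regression shows up as a failed check rather than as a slow pass.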
Turn one cache-write failure into a reusable artifact proxy policy.
The $99 policy is for package mirrors, artifact proxies, build caches, CI dependency caches, and S3-backed cache services where disk-full or object-store errors can block request handlers and exhaust connections.
Do Not Treat As A Cache Miss
- Cache-write errors that occur after the response has already begun streaming.
- Object-store failures that leave pipe readers, goroutines, or upstream bodies open.
- Partial cache files that can later be served as complete artifacts.
- Connection/fd growth during disk-full or quota-exceeded cache failures.
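For the partial-file case, one common way to keep a failed write out of the cache is to write to a temp file in the same directory and rename it into place only after the copy succeeds. The Go sketch below assumes a local-disk cache; `writeAtomic`, `errReader`, and the file names are hypothetical, and an object-store backend would need its own equivalent (for example, aborting the upload instead of removing a file).

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// writeAtomic (hypothetical) copies src to a temp file next to finalPath and
// renames it into place only on success. On any error the temp file is removed.
func writeAtomic(finalPath string, src io.Reader) (err error) {
	tmp, err := os.CreateTemp(filepath.Dir(finalPath), ".partial-*")
	if err != nil {
		return err
	}
	defer func() {
		if err != nil {
			tmp.Close()           // may already be closed; that error is ignored
			os.Remove(tmp.Name()) // never leave a partial file behind
		}
	}()
	if _, err = io.Copy(tmp, src); err != nil {
		return err
	}
	if err = tmp.Sync(); err != nil {
		return err
	}
	if err = tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), finalPath)
}

// errReader simulates an upstream or disk failure in the middle of a stream.
type errReader struct{}

func (errReader) Read([]byte) (int, error) {
	return 0, errors.New("injected: stream failed")
}

// demo writes one complete and one failing artifact, then reports whether
// each final path exists and how many files remain in the cache directory.
func demo() (goodExists, badExists bool, files int) {
	dir, err := os.MkdirTemp("", "cache-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	good := filepath.Join(dir, "good.tgz")
	bad := filepath.Join(dir, "bad.tgz")
	writeAtomic(good, strings.NewReader("complete"))
	writeAtomic(bad, io.MultiReader(strings.NewReader("part"), errReader{}))

	_, goodErr := os.Stat(good)
	_, badErr := os.Stat(bad)
	entries, _ := os.ReadDir(dir)
	return goodErr == nil, badErr == nil, len(entries)
}

func main() {
	fmt.Println(demo()) // prints: true false 1
}
```

Creating the temp file in the destination directory keeps the rename on one filesystem, which is what makes the final step atomic on POSIX systems.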