Confidential Space Model Image Boot Disk Full
When a confidential VM pulls or extracts a container image with baked model weights, the workload can die before app logs start. Treat this as a boot-disk admission and delivery-mode decision, not a generic runner cleanup problem.
Separate baked image bytes from runtime model delivery.
If both the primary model and the embedding model are baked into image layers, the extracted footprint can exceed the default Confidential Space boot disk. The safe fix is one of three options: a documented single-model bake path, encrypted runtime delivery to /models, or an explicit boot-disk-size knob guarded by a preflight.
image layers -> extracted size -> boot disk -> runtime delivery path
Capture the size budget before changing the boot disk.
The useful evidence is all safe to share publicly: model artifact sizes, compressed image size, estimated extracted size, configured boot disk size, runtime reserve, and the Cloud Logging ENOSPC signature. No model weights, keys, attestation secrets, or private logs are needed.
MODEL_URL + EMBED_MODEL_URL + extracted image + boot disk reserve
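As a rough illustration of that budget, the sketch below adds up the terms and compares the result with a boot disk size. The class name, field names, and the 10% extraction overhead are assumptions made for this example, not values documented by Confidential Space; substitute measured numbers.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class SizeBudget:
    """Boot-disk budget inputs, all in bytes (hypothetical structure)."""
    model_bytes: int         # primary model artifact (MODEL_URL)
    embed_model_bytes: int   # embedding model artifact (EMBED_MODEL_URL)
    base_image_bytes: int    # extracted OS and runtime layers, excluding models
    reserve_bytes: int       # logs, temp files, and OS headroom
    extraction_overhead_pct: int = 10  # assumed decompression scratch space

    def required_bytes(self) -> int:
        # Extracted footprint, padded by the overhead percentage, plus reserve.
        extracted = self.model_bytes + self.embed_model_bytes + self.base_image_bytes
        padded = extracted * (100 + self.extraction_overhead_pct) // 100
        return padded + self.reserve_bytes

    def fits(self, boot_disk_bytes: int) -> bool:
        return self.required_bytes() <= boot_disk_bytes
```

Integer arithmetic is used throughout so the same inputs always produce the same byte count, which keeps the recorded evidence reproducible.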
Runbook: Fail Before The Confidential VM Starts
- Document the supported delivery modes: single-model bake, two-model encrypted runtime delivery, or explicitly-sized boot disk.
- Estimate extracted bytes, not only compressed image size. Model layers, runtime packages, OS reserve, logs, temp, and decompression overhead share the boot disk.
- Add a deploy preflight that rejects combinations such as MODEL_URL plus EMBED_MODEL_URL when they exceed the documented boot-disk budget.
- If a boot-disk-size variable is added, make the minimum size calculation visible in docs and examples.
- Put the Cloud Logging signature next to other silent-start failure modes: write /models/... no space left on device, missing app logs, and self-termination after image extraction.
- Keep encrypted delivery as the default for large or gated model weights so sensitive artifacts do not become permanent image layers.
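The preflight step in this runbook could look something like the following minimal sketch. It assumes the extracted size of each artifact has been measured ahead of time and is passed in as a lookup table; the function name, the PreflightError class, and the reserved_bytes parameter (base image plus OS and runtime reserve) are hypothetical and not part of any existing deploy tool.

```python
GIB = 1024 ** 3


class PreflightError(Exception):
    """Raised when a deploy configuration cannot fit the boot disk."""


def preflight(env, artifact_sizes, boot_disk_bytes, reserved_bytes):
    """Return the bytes needed, or raise before the VM is ever launched.

    env: mapping of deploy variables (MODEL_URL, EMBED_MODEL_URL).
    artifact_sizes: mapping of URL -> extracted size in bytes (measured offline).
    reserved_bytes: base image plus OS/runtime reserve (an assumed lump sum).
    """
    urls = [env[key] for key in ("MODEL_URL", "EMBED_MODEL_URL") if env.get(key)]
    missing = [url for url in urls if url not in artifact_sizes]
    if missing:
        # Refuse to guess: an unmeasured artifact cannot be budgeted.
        raise PreflightError(f"no recorded size for: {', '.join(missing)}")
    needed = sum(artifact_sizes[url] for url in urls) + reserved_bytes
    if needed > boot_disk_bytes:
        raise PreflightError(
            f"baked artifacts need {needed / GIB:.1f} GiB but the boot disk is "
            f"{boot_disk_bytes / GIB:.1f} GiB; use encrypted runtime delivery "
            f"or a larger boot disk"
        )
    return needed
```

Raising an exception, rather than logging a warning, is what makes the failure happen at deploy time instead of silently inside the confidential VM.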
Use this when baked model layers overflow the CVM boot disk.
This keeps the fix framed around a clear delivery-mode contract and a preflight, not a vague increase-disk note.
I would make the acceptance criteria explicit around a size-budget preflight, not only docs.
Suggested boundary:
- One model baked into the image is allowed only if extracted image bytes + OS/runtime reserve fit the default boot disk.
- Two large model artifacts should default to encrypted runtime delivery into /models or tmpfs unless a boot-disk-size variable is set.
- Deployment should fail before launch when MODEL_URL + EMBED_MODEL_URL exceed the documented boot-disk budget.
- The silent-termination docs should include the Cloud Logging signature: write /models/... no space left on device, followed by the workload never starting.
- If a boot-disk-size knob is added, document the minimum-size formula and add a fixture with two model sizes that would fail on the default disk.
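One way to read the boundary above is as a small decision function plus a minimum-size formula. The sketch below is an interpretation under stated assumptions: the enum, the function names, and the whole-GiB rounding are all hypothetical, and the default boot disk size is passed in as a parameter because this document does not state it.

```python
from enum import Enum

GIB = 1024 ** 3


class DeliveryMode(Enum):
    BAKED = "baked"
    ENCRYPTED_RUNTIME = "encrypted-runtime"
    SIZED_BOOT_DISK = "sized-boot-disk"


def min_boot_disk_gib(extracted_image_bytes, reserve_bytes):
    """Smallest whole-GiB disk holding the extracted image plus reserve."""
    total = extracted_image_bytes + reserve_bytes
    return -(-total // GIB)  # ceiling division on integers


def choose_delivery_mode(extracted_image_bytes, reserve_bytes,
                         default_boot_disk_gib, boot_disk_size_gib=None):
    """Pick a delivery mode; boot_disk_size_gib is the optional knob."""
    needed = min_boot_disk_gib(extracted_image_bytes, reserve_bytes)
    if needed <= default_boot_disk_gib:
        return DeliveryMode.BAKED
    if boot_disk_size_gib is None:
        # Does not fit and no knob set: fall back to runtime delivery.
        return DeliveryMode.ENCRYPTED_RUNTIME
    if boot_disk_size_gib < needed:
        raise ValueError(
            f"boot disk of {boot_disk_size_gib} GiB is below the minimum {needed} GiB"
        )
    return DeliveryMode.SIZED_BOOT_DISK
```

A fixture in the spirit of the last bullet would use two model sizes whose sum, together with the base image and reserve, exceeds the default disk passed in, then assert that the result is encrypted runtime delivery unless the knob is set to at least the computed minimum.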
Do Not Delete Or Rebuild Blindly
- Gated model weights that are expensive or sensitive to rehydrate.
- Attestation, encryption, or delivery-path evidence that explains why the workload never started.
- Cloud Logging entries that identify image extraction ENOSPC before application logs begin.
- Runtime tmpfs or encrypted-delivery paths before confirming what the model server actually expects.
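Before deleting or rebuilding anything, it can help to confirm from exported log text that the ENOSPC signature really precedes any application output. The sketch below is only an illustration: the regular expression is modeled on the signature quoted in this document, the exact log wording in a given deployment may differ, and app_log_marker is a hypothetical string that your own application is assumed to print at startup.

```python
import re

# Modeled on the signature quoted in the text; real log wording is an assumption.
ENOSPC_SIGNATURE = re.compile(
    r"write /models/.*no space left on device", re.IGNORECASE
)


def classify_start_failure(log_lines, app_log_marker):
    """Classify a start attempt from log lines in chronological order.

    Returns 'app-started' if the application's own marker appears first,
    'image-extraction-enospc' if the disk-full signature appears first,
    and 'unknown' if neither is present.
    """
    for line in log_lines:
        if app_log_marker in line:
            return "app-started"
        if ENOSPC_SIGNATURE.search(line):
            return "image-extraction-enospc"
    return "unknown"
```

The function only reads text that has already been exported, so it does not touch model weights, keys, or the workload itself.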