One of the things I care about most in a compute platform is startup time.
People say they want fast machine starts.
What they usually mean is they do not want to wait while your platform does cold work right in front of them.
That distinction matters a lot.
If you create a machine from scratch on demand, you are paying for a bunch of slow steps in public:
- prepare a root filesystem
- create overlays
- set up networking
- boot or restore Firecracker
- wait for the guest agent
Even when each step is pretty fast, the total is not.
Here is a rough way to think about that budget:
- rootfs prep: hundreds of ms to seconds
- network setup: tens to hundreds of ms
- boot or restore: hundreds of ms to seconds
- guest readiness: hundreds of ms to seconds

None of those numbers are scary on their own.
Stack them together on the request path and they stop feeling cheap.
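To see how the budget stacks, here is a back-of-envelope sum using midpoint guesses for each step. Every number below is hypothetical; the point is the sum, not the values:

```python
# Rough cold-create budget using midpoint guesses; all numbers are hypothetical.
steps_ms = {
    "rootfs_prep": 800,      # "hundreds of ms to seconds"
    "network_setup": 100,    # "tens to hundreds of ms"
    "boot_or_restore": 700,  # "hundreds of ms to seconds"
    "guest_ready": 600,      # "hundreds of ms to seconds"
}
total_ms = sum(steps_ms.values())
print(f"cold create total: ~{total_ms} ms")  # each step looks fine; the sum does not
```

Each line item reads as acceptable on its own. The sum, paid on the request path, is what the user actually feels.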
Cold create is not the enemy
Cold create is fine. You need it. It is the fallback path and the general path.
The mistake is pretending cold create can also be your instant path.
If you care about interactive workloads, dev environments, agent sandboxes, or bursty compute, you need a different shape:
- prepare ahead of time
- reserve capacity before doing expensive work
- activate something already close to ready
That is where snapshots, clones, and warm slots come in.
In this post, I want to walk through how I think about those three paths.
Three levels of speed
I like to think about machine startup in three buckets.
1. Fresh boot
This is the full path. Pull image. Build filesystem. Boot guest. Wait for readiness.
It is the most flexible and the slowest.
2. Clone from a prepared source
This path reuses a known-good base, often through snapshot or overlay tricks. It cuts a lot of setup time, but there is still work to do.
It is much better, but not instant.
3. Warm activation
This path claims something already prepared and nearly ready.
That is the path that can feel snappy.
Here is the rough picture:
fresh create:
image pull -> rootfs prep -> network -> boot -> ready
clone:
prepared base -> overlay/snapshot -> network -> restore -> ready
warm activation:
claim ready slot -> attach identity -> expose -> run

The third path is where good latency lives.
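Attaching rough, assumed per-step costs to those three shapes makes the ordering concrete. Every number below is invented for illustration:

```python
# Hypothetical per-step costs (ms) for comparing the three startup shapes.
COST_MS = {
    "image_pull": 2000, "rootfs_prep": 800, "network": 100,
    "boot": 700, "restore": 300, "ready_wait": 600,
    "overlay": 50, "claim": 5, "attach_identity": 10, "expose": 10,
}
PATHS = {
    "fresh": ["image_pull", "rootfs_prep", "network", "boot", "ready_wait"],
    "clone": ["overlay", "network", "restore", "ready_wait"],
    "warm":  ["claim", "attach_identity", "expose"],
}
for name, steps in PATHS.items():
    print(f"{name}: ~{sum(COST_MS[s] for s in steps)} ms")
```

The exact values do not matter. What matters is that warm activation skips whole categories of work rather than speeding them up.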
Why reservation matters
One easy mistake is to fan out real work to multiple nodes and let the fastest winner keep running.
That looks clever until you notice the waste.
Two or three nodes may do expensive rootfs and startup work for one user request. The losers then roll back after burning CPU, IO, and time.
That is not a fast path. It is a messy betting strategy.
A better model is:
- rank nodes
- reserve one node
- commit cold work to that node only
- use warm activation when an exact match already exists
This keeps the fast path honest.
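A minimal sketch of that reserve-then-commit flow, assuming a toy Node type and a trivially simple scorer (both invented here):

```python
# Reserve-then-commit placement (sketch; Node and the scoring are stand-ins).
class Node:
    def __init__(self, name, free_cpu, warm_slots=()):
        self.name = name
        self.free_cpu = free_cpu
        self.warm_slots = set(warm_slots)  # exact-match keys ready to claim
        self.reserved = False

    def score(self):
        return self.free_cpu               # a real ranker weighs much more than CPU

def place(request_key, nodes):
    # Warm activation wins if any node holds an exact prepared match.
    for node in nodes:
        if request_key in node.warm_slots:
            return ("warm", node.name)
    # Otherwise rank, reserve exactly one node, and commit cold work to it.
    best = max(nodes, key=Node.score)
    best.reserved = True                   # losers never start expensive work
    return ("cold", best.name)

nodes = [Node("a", free_cpu=8),
         Node("b", free_cpu=16, warm_slots={"img@sha256:abc/2cpu-4gb"})]
print(place("img@sha256:abc/2cpu-4gb", nodes))  # ('warm', 'b')
print(place("img@sha256:def/2cpu-4gb", nodes))  # ('cold', 'b')
```

The key property is that expensive work starts on at most one node per request, instead of racing several nodes and discarding the losers.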
The importance of exact match
Warm pools sound great until you make them fuzzy.
If a warm slot is only "kind of close" to what the user asked for, activation turns into mutation, and mutation turns back into cold work.
So the identity of a warm slot needs to be strict:
- image digest
- machine shape
- restore capability
- any other setup that changes startup cost or correctness
That lets the scheduler ask a sharp question:
"Do I already have a prepared thing that exactly matches this request?"
If yes, activate it.
If no, reserve a node for cold work and move on.
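One way to keep that identity strict is a frozen, hashable key, so pool lookup is exact-match or nothing. The field names below are illustrative:

```python
# A warm slot's identity as a strict, hashable key (sketch; fields are illustrative).
from dataclasses import dataclass

@dataclass(frozen=True)
class SlotKey:
    image_digest: str      # exact digest, never a mutable tag
    cpus: int
    mem_mb: int
    restore_capable: bool

warm_pool = {
    SlotKey("sha256:abc123", 2, 4096, True): ["slot-1", "slot-2"],
}

def match(request: SlotKey):
    # Exact match or nothing: a "close" slot would mean mutation, i.e. cold work.
    slots = warm_pool.get(request)
    return slots[0] if slots else None

print(match(SlotKey("sha256:abc123", 2, 4096, True)))  # slot-1
print(match(SlotKey("sha256:abc123", 4, 4096, True)))  # None: shape differs
```

Because the key is the whole identity, a request that differs in any field simply misses the pool and falls through to the reservation path, with no fuzzy matching to reason about.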
What breaks warm starts
Warm starts stop being warm pretty quickly when one of these shows up:
- the image digest does not match
- the machine shape does not match
- the prepared state is stale
- the guest still has expensive init work left to do
That is why I do not like fuzzy warm capacity. Either it is a real ready slot for this request, or it is not.
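Those checks collapse into one predicate: any failed check demotes the request to the cold path. Field names and the staleness threshold are assumptions:

```python
# The checks above as one predicate (sketch; fields and threshold are assumptions).
import time

def is_truly_warm(slot, request, now=None, max_age_s=3600):
    """A slot counts as warm only if every check passes; any miss means cold work."""
    now = time.time() if now is None else now
    return (
        slot["image_digest"] == request["image_digest"]  # exact digest match
        and slot["shape"] == request["shape"]            # same machine shape
        and now - slot["prepared_at"] <= max_age_s       # prepared state not stale
        and slot["guest_init_done"]                      # no expensive init left
    )

slot = {"image_digest": "sha256:abc", "shape": "2cpu-4gb",
        "prepared_at": 1000.0, "guest_init_done": True}
req = {"image_digest": "sha256:abc", "shape": "2cpu-4gb"}
print(is_truly_warm(slot, req, now=1500.0))  # True: all checks pass
print(is_truly_warm(slot, req, now=9999.0))  # False: prepared state is stale
```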
Speed without lying
The biggest risk in performance work is cheating.
You call something fast because one internal benchmark looked nice, but the real path still hides slow work in the same request.
I do not want "fast" to mean:
- it was cached on one node once
- the loser nodes cleaned up eventually
- we skipped a correctness check
- the guest was not actually ready yet
I want it to mean the user asked for a machine and got a usable machine quickly, without hidden waste and without crossed fingers.
The user-facing payoff
When this is done well, the platform feels different.
It stops feeling like "please wait while we build your environment" and starts feeling closer to "your environment is here."
That opens the door for better product shapes:
- on-demand dev machines
- ephemeral test environments
- short-lived AI sandboxes
- burst compute for jobs that should start now
Those experiences depend less on the hypervisor and more on how smart the control plane is about prepared capacity.
The real challenge
The hard part is not building one fast restore path.
The hard part is managing the whole life around it:
- how many warm slots to keep
- which image digests deserve them
- when to recycle them
- how to avoid stale prepared state
- how to represent ready slots to the scheduler
That is why fast activation is a control-plane problem as much as a Firecracker problem.
The simple summary
Cold create is necessary.
Clone paths are useful.
Warm activation is what changes the feel of the product.
And if you want warm activation to work well, you need more than a snapshot trick. You need clear reservation semantics, exact identity, and a scheduler that knows the difference between "this node has free CPU" and "this node has a prepared machine I can claim right now." That difference is where a lot of the speed comes from. Fast start is mostly about moving slow work off the user path.