First things first: long cold boot times are a real pain. On hyperscalers, starting a new service can take seconds or even minutes, and even newer platforms need hundreds of milliseconds, or whole seconds, to bring a new service up. Such long boot times can be the difference between a client purchasing a product or leaving a store. Even FaaS offerings, for all their purported nimbleness, suffer from noticeable cold boots.
Worse, autoscale, supposedly a seamless mechanism for coping with traffic peaks and varying demand, can only be effective if it reacts on the same time scale as those peaks. Because autoscale in major cloud providers can take seconds or even minutes to bring new instances up and have them ready to process requests, engineers resort to less-than-optimal workarounds such as keeping hot instances ready to run for scaling purposes (costly) or devising complex algorithms to predict demand peaks (hard, if not near impossible).
Hardware Isolation and Millisecond Cold Boots?
The world would be a better place if cold boots and autoscale were completely transparent to users, in the order of a few milliseconds. KraftCloud (www.kraft.cloud) provides exactly such millisecond semantics, all the while providing full hardware isolation. How fast is it? Because I like numbers, let’s start with a graph:
To generate this graph, we create an instance (read: unikernel) of NGINX and measure how long it takes for it to be ready to serve requests. We then boot a second instance (all the while leaving the first one running), do the same measurement, and so on, all the way up to 5,000 such instances — all on a single, relatively standard server with an AMD EPYC 7402P CPU (24 cores @ 2.8 GHz) and 64 GiB of memory.
At the low end, for the first instances, we see cold boot times of about 4 milliseconds; at the upper end, for the 5,000th VM, that number rises to a still quite low 14 ms. Note that the slight sub-linear increase is due to system/host overheads (e.g., scheduling) which can be optimized.
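For the curious, each data point boils down to timing the interval between issuing a deploy and the instance being ready. A minimal sketch of such a measurement loop in shell — the `kraft cloud deploy` invocation in the comment is illustrative only, and the exact arguments may differ:

```shell
#!/bin/sh
# measure_ms CMD...: print the wall-clock milliseconds CMD takes to complete
measure_ms() {
  start=$(date +%s%N)            # nanoseconds since epoch (GNU date)
  "$@" >/dev/null 2>&1
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# Hypothetical use: time each of 5,000 successive deploys
# for i in $(seq 1 5000); do
#   measure_ms kraft cloud deploy nginx:latest
# done
```

In practice "ready" means the instance answers its first request, so the timed command would also poll the service endpoint until it responds.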
To make it all a bit more tangible, here’s the output of a `kraft cloud deploy` command, which creates an instance on the platform:
It is worth noting that even though we are running a Unikraft unikernel, the application is standard, unmodified NGINX. What about other applications or programming languages? Here’s a table with a sample of them, including a comparison with those same apps running on Linux:
How Does it Work?
In one word, specialization (well, plus many performance optimizations 😀): if you know the application you want to deploy, and for cloud deployments you presumably always do, then you can, at build time, fully customize the image, all the way down to the OS, so that it contains only the functionality the app needs, and nothing more.
The concept is illustrated in the diagram above. If a line of code is in a Unikraft image it’s because the application needs it to run — otherwise it’s out 🪓. To further help with specialization, Unikraft is a modular OS, making it easy to add/remove functionality from builds. Finally, we optimize the boot process and the code itself, and we leverage a fast VMM (Virtual Machine Monitor, in our case Firecracker) to ensure the quickest possible boot times.
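To make specialization a bit more concrete: with Unikraft’s tooling, an application is described declaratively and the toolchain assembles a minimal image from that description. A hedged sketch of what such a project file (a `Kraftfile`) can look like — the field names below reflect KraftKit’s format as I understand it and may differ in detail:

```yaml
spec: v0.6             # Kraftfile schema version (assumed)
runtime: nginx:latest  # pre-built Unikraft NGINX runtime to base the image on
rootfs: ./rootfs       # files (configs, static html) baked into the image
cmd: ["nginx"]         # command the unikernel runs on boot
```

Everything the application does not reference simply never makes it into the resulting image.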
As a final note, while these boot times are small, we have a number of ideas as to how to reduce them further, so watch this space!
Want Early Beta Access?
We are currently looking for tech enthusiasts itching to get early KraftCloud tinkering rights. If you feel you fall into this category, make sure to fill out the brief sign-up form on kraft.cloud and we’ll be in touch shortly!
If you want to find out more about the tech behind KraftCloud, please read this previous tech blog, and be sure to check out Unikraft’s Linux Foundation OSS website and corresponding Discord. We would be extremely grateful for any feedback; feel free to drop me a line at firstname.lastname@example.org.