Millisecond Autoscale for Apps: a Pipe Dream?

In Alice in Wonderland, the King advises to “Begin at the beginning”, so let’s start with Wikipedia’s definition of autoscaling:

[…] a method used in cloud computing that dynamically adjusts the amount of computational resources in a server farm - typically measured by the number of active servers - automatically based on the load

Easy enough, but what I feel is missing from this definition, when it talks about “load”, is the question of timescales. On the Internet, load bursts happen quickly, sometimes within seconds or even milliseconds. It would make sense, then, for the autoscale mechanisms designed to cope with such bursts to operate on those same timescales.

But they don’t. If you have ever set up autoscaling on hyperscaler infrastructure, you know that new instances take seconds, and often minutes, to come up, so a purely reactive approach isn’t always feasible. That leaves a few sub-optimal workarounds:

Use scheduled autoscaling, which assumes you know when bursts will happen and how large they will be: a big assumption.
Use predictive autoscaling, although using heuristics to predict the future is tricky and not always accurate.
Use FaaS offerings like AWS Lambda, where autoscaling happens quickly (i.e., it is reactive), but where you then have to cope with other limitations, such as cold starts and having to fit your code into a functions model.
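To put rough numbers on why provisioning delay matters, the sketch below multiplies the excess request rate of a burst by the time it takes new capacity to come online. The 5,000 requests/s burst and the 60 s and 5 s delays are hypothetical values chosen only for illustration; the 20 ms delay matches the NGINX boot time reported later in this post. The sketch also assumes the burst stays constant and ignores the time needed to detect it, so treat it as a back-of-the-envelope estimate, not a queueing model.

```python
def requests_before_ready(extra_rps: float, provisioning_delay_s: float) -> float:
    """Requests arriving above current capacity before new capacity is ready.

    Simplifying assumptions: the excess load is constant and detection time is ignored.
    """
    return extra_rps * provisioning_delay_s


# Hypothetical burst: 5,000 requests/s more than the existing instances can serve.
EXTRA_RPS = 5_000.0

# Hypothetical provisioning delays, from a minute-long VM boot down to a 20 ms cold start.
DELAYS_S = {
    "60 s VM boot": 60.0,
    "5 s container start": 5.0,
    "20 ms unikernel boot": 0.020,
}

for label, delay_s in DELAYS_S.items():
    backlog = requests_before_ready(EXTRA_RPS, delay_s)
    print(f"{label:>22}: ~{backlog:,.0f} requests must be queued or dropped")
```

Under these assumptions, going from a 60 s boot to a 20 ms one shrinks the backlog from about 300,000 requests to about 100, which is what makes a purely reactive policy workable.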

Back To First Principles?

This whole landscape feels like we’re tying ourselves in knots. If we were to go back to first principles, we could perhaps outline the requirements for an ideal autoscale mechanism as follows:

Reactive on millisecond timescales, for both scaling out and scaling in
Based on apps, so not limited to functions
Strong, hardware-level isolation (i.e., not language-level isolation)

The bottom line is that the tech currently used on cloud platforms (based on general-purpose OSes, standard distros and slow-to-react controllers) makes it really hard to achieve millisecond reactivity.

A Different Way Forward

The first building block for millisecond autoscale is that apps on a cloud platform must be able to start and stop very quickly. To achieve this, we leverage years of research and open source work on unikernels: extremely specialized, fast, yet strongly isolated virtual machines that can cold start in milliseconds. A second, fundamental building block is a high-performance controller and proxy, built from scratch, that reacts on millisecond timescales and scales to thousands of instances. Finally, we made a number of modifications and tweaks to the underlying network interfaces and host to make sure everything runs fast.

The end result is that on KraftCloud, scale out (the process of adding instances to cope with increased load) happens in milliseconds, so you can transparently and effortlessly handle load increases, including traffic peaks. No more headaches caused by slow autoscale, such as keeping hot instances around to absorb peaks, devising complex predictive algorithms, or other painful workarounds; you can simply turn autoscale on and let KraftCloud handle your traffic increases and peaks. And all of this while running apps/containers, not functions, each of them strongly isolated (unikernels are, after all, VMs, albeit extremely specialized ones).

Millisecond App Autoscale in Practice

Setting up autoscale on KraftCloud is fairly simple. To show this, we’ll use NGINX as the app to deploy. First, we deploy an instance of it with the kraft cloud deploy -p 443:8080 one-liner:

[●] Deployed successfully!
 │
 ├────────── name: nginx-4d7u3
 ├────────── uuid: 8fda2a70-6a32-4b5e-8900-4395b33d02d7
 ├───────── state: running
 ├─────────── url: https://small-leaf-rafirkw7.fra0.kraft.host
 ├───────── image: nginx@sha256:389bfa6be6455c92b61cfe429b50491373731dbdd8bd8dc79c08f985d6114758
 ├───── boot time: 20.36 ms
 ├──────── memory: 128 MiB
 ├─ service group: small-leaf-rafirkw7
 ├── private fqdn: nginx-4d7u3.internal
 ├──── private ip: 172.16.6.5
 └────────── args: /usr/bin/nginx -c /etc/nginx/nginx.conf

It’s worth noting the cold start time of just 20 milliseconds, which is fundamental to fast scale out. Next, we configure how autoscale should behave: in this case, we allow between 0 and 8 instances and scale out and in based on CPU usage (network metrics can also be used):

kraft cloud scale init small-leaf-rafirkw7 --master  nginx-4d7u3 --min-size 0 --max-size 8
kraft cloud scale add  small-leaf-rafirkw7 --name scale-out-policy --metric cpu --adjustment percent --step 600:800/50 --step 800:/100
kraft cloud scale add  small-leaf-rafirkw7 --name scale-in-policy  --metric cpu --adjustment percent --step :50/-50
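The --step arguments are compact, so here is one way to read them. This is a minimal Python sketch that assumes each step has the form LOWER:UPPER/ADJUSTMENT, applies when LOWER <= metric < UPPER (an empty bound meaning unbounded), and, with --adjustment percent, changes the instance count by ADJUSTMENT percent of the current size before clamping to --min-size and --max-size. These semantics, the rounding rule, and the helper names (parse_step, next_size) are our own assumptions inferred from the commands above, not KraftCloud’s documented implementation; the kraft CLI documentation is the authority on the real behavior. Metric values are treated as plain numbers without assuming a unit.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One policy step; applies when lower <= metric < upper (None means unbounded)."""
    lower: Optional[float]
    upper: Optional[float]
    adjustment_percent: float


def parse_step(spec: str) -> Step:
    """Parse an assumed 'LOWER:UPPER/ADJUSTMENT' string, e.g. '600:800/50' or ':50/-50'."""
    bounds, adjustment = spec.split("/")
    lower, upper = bounds.split(":")
    return Step(
        lower=float(lower) if lower else None,
        upper=float(upper) if upper else None,
        adjustment_percent=float(adjustment),
    )


def matches(step: Step, metric: float) -> bool:
    """True if the metric falls inside the step's half-open interval."""
    above_lower = step.lower is None or metric >= step.lower
    below_upper = step.upper is None or metric < step.upper
    return above_lower and below_upper


def next_size(current: int, metric: float, steps: List[Step],
              min_size: int, max_size: int) -> int:
    """Apply the first matching step as a percentage change, clamped to [min_size, max_size]."""
    for step in steps:
        if not matches(step, metric):
            continue
        if step.adjustment_percent == 0:
            return current
        delta = abs(current * step.adjustment_percent / 100)
        # Assumption: round up and move by at least one instance, so that
        # +50% of a single (or zero) instance still scales out.
        change = max(1, math.ceil(delta))
        new = current + change if step.adjustment_percent > 0 else current - change
        return max(min_size, min(max_size, new))
    return current  # no step matched (e.g. CPU between 50 and 600): keep the current size


# The scale-out and scale-in policies from the commands above, under the assumed syntax.
policies = [parse_step("600:800/50"), parse_step("800:/100"), parse_step(":50/-50")]

size = 1
for cpu in [700, 900, 900, 300, 20, 20]:
    size = next_size(size, cpu, policies, min_size=0, max_size=8)
    print(f"cpu={cpu:>4} -> {size} instance(s)")
```

With these assumed semantics, the simulated group grows 1 → 2 → 4 → 8, stays at 8 while CPU sits in the gap between the two policies, and then halves on each low reading (8 → 4 → 2). In this sketch, the gap between the scale-in threshold (50) and the lowest scale-out threshold (600) acts as a dead band, which keeps the group from flapping when load hovers near a single threshold.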

And that’s it! You now have millisecond autoscale operating on apps. To show that it is fast/reactive, check out this brief video showing different traffic loads and how KraftCloud’s autoscale reacts to them:

KraftCloud autoscale

Get early access to the KraftCloud Beta

If you want to find out more about the tech behind KraftCloud, read our other blog posts, join our Discord server, and check out Unikraft’s Linux Foundation OSS website. We would be extremely grateful for any feedback!


