Spot instances and similar “spare capacity” models are a great way to save money on public cloud. But a surprising number of cloud customers are not taking advantage of this discounted capacity.
Spot instances are a type of purchasing option that allows users to take advantage of spare capacity at a low price, with the possibility that it could be reclaimed for other workloads with just brief notice.
“Spot instances” is the term I use in this piece, but each cloud provider has its own name for the sale of discounted spare capacity: AWS’s spot instances, Azure’s spot VMs and Google Cloud’s preemptible VMs.
In AWS, for example, the customer makes a Spot Request that essentially includes a “maximum bid” for how much they are willing to pay for a spot instance.
If the current spot price is at or below this bid price, then the spot instance is started. When demand for cloud resources increases, the Spot Price increases, and shortly after it exceeds the customer bid price, the instance is terminated.
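The bid rule described above can be sketched in a few lines. This is a simplified model, not AWS's API: the function names and the price series are hypothetical, and it assumes the instance runs in exactly those periods where the spot price is at or below the bid.

```python
from typing import List, Optional


def running_periods(spot_prices: List[float], max_bid: float) -> List[bool]:
    """For each pricing period, report whether the instance would be running.

    Simplified rule from the text: the instance runs while the current
    spot price is at or below the customer's maximum bid.
    """
    return [price <= max_bid for price in spot_prices]


def first_termination(spot_prices: List[float], max_bid: float) -> Optional[int]:
    """Return the index of the first period whose price exceeds the bid.

    Returns None if the price never rises above the bid.
    """
    for index, price in enumerate(spot_prices):
        if price > max_bid:
            return index
    return None
```

With an illustrative price series of `[0.03, 0.04, 0.05, 0.07, 0.04]` and a bid of `0.05`, the instance runs through the first three periods and is terminated in the fourth, when the price exceeds the bid.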
Spot Instances in Each Cloud
Variations of spot instances are offered across different cloud providers. AWS has Spot Instances, Google Cloud offers preemptible VMs, and in March of this year Microsoft Azure announced an even more direct equivalent to Spot Instances, called Azure Spot Virtual Machines.
Azure Spot VMs provide access to unused Azure compute capacity at deep discounts. Spot VMs can be evicted at any time if Azure needs capacity.
AWS spot instances have variable pricing. Azure Spot VMs offer the same characteristics as a pay-as-you-go virtual machine, the differences being pricing and evictions. Google Preemptible VMs offer a fixed discounting structure.
Google’s offering is a bit more flexible, with no limitations on the instance types. Preemptible VMs are designed to be a low-cost, short-duration option for batch jobs and fault-tolerant workloads.
While applications can be built to withstand interruption, specific concerns remain, such as loss of log data, exhausting capacity and fluctuation in the spot market price.
In AWS, the pricing issue arises when the price of a spot instance rises beyond its typical historic level. This makes it difficult for a customer to judge the best bid price to use. If the spot price reaches the on-demand price, it defeats the purpose of using a Spot Instance.
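To make the pricing concern concrete, here is a minimal sketch of the savings arithmetic. The function name and the prices used below are illustrative assumptions, not real quotes from any provider.

```python
from typing import Tuple


def spot_savings(on_demand_price: float, spot_price: float, hours: float) -> Tuple[float, float]:
    """Compare running on spot against running on demand.

    Returns (absolute saving over `hours`, saving as a fraction of the
    on-demand cost). Prices are per instance-hour.
    """
    if on_demand_price <= 0:
        raise ValueError("on_demand_price must be positive")
    saved = (on_demand_price - spot_price) * hours
    fraction = 1 - spot_price / on_demand_price
    return saved, fraction
```

At a hypothetical on-demand price of 0.10 per hour and a spot price of 0.03, 100 hours of work saves about 7.00, or roughly 70%. When the spot price climbs to 0.10, both figures drop to zero, which is the case where spot stops being worth the interruption risk.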
AWS addresses this pricing problem with the notion of a Spot Fleet, in which you specify a certain capacity of instances you want to maintain. If the Spot instances are terminated, the Spot Fleet will automatically backfill the fleet with on-demand instances, allowing you to take advantage of whatever discounts you can while maintaining your operations.
In any given zone, another potential issue is that capacity of an instance type could be completely exhausted. If capacity is exhausted, it prevents applications from running if they are dependent on a specific instance type or zone.
Not to turn into a commercial for Spot Fleet, but this is addressed as well, by allowing you to specify a range of instance types that would be acceptable for your workload.
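The idea of accepting a range of instance types and backfilling with on-demand capacity can be illustrated with a toy planner. This is not Spot Fleet's actual allocation logic or API. The function `plan_fleet`, the capacity figures, and the preference-order rule are all assumptions made for the sake of the example.

```python
from typing import Dict, List, Tuple


def plan_fleet(
    target: int,
    acceptable_types: List[str],
    spot_available: Dict[str, int],
) -> Tuple[Dict[str, int], int]:
    """Fill a target capacity from spot pools, in order of preference.

    Returns (instances to request per type, count left to backfill
    with on-demand instances). Types with no spare capacity are skipped.
    """
    plan: Dict[str, int] = {}
    remaining = target
    for instance_type in acceptable_types:
        if remaining == 0:
            break
        take = min(remaining, spot_available.get(instance_type, 0))
        if take > 0:
            plan[instance_type] = take
            remaining -= take
    return plan, remaining
```

For a target of 10 instances, with the preferred type exhausted and 6 and 2 instances available in the next two pools, the planner takes all 8 spot instances and leaves 2 to be covered by on-demand. A workload pinned to the single exhausted type would instead get nothing from spot.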
Is “Eviction” Driving People Away?
There is one main caveat when it comes to spot instances – they are interruptible. All three major cloud providers have mechanisms in place for these spare capacity resources to be interrupted, related to changes in capacity availability and/or changes in pricing.
This means workloads can be “evicted” from a spot instance or VM: if a cloud provider needs the resource at any given time, your workloads can be kicked off.
You are notified when a spot instance is going to be evicted. AWS emits an event two minutes prior to the actual interruption. In Azure, you can opt to receive notifications that tell you when your VM is going to be evicted.
However, you will have only 30 seconds to finish any jobs and perform shutdown tasks prior to the eviction, which makes it almost impossible to manage. Google Cloud also gives you 30 seconds to shut down your instances when you are preempted so you can save your work for later.
Google also always terminates preemptible instances after 24 hours of running. All of this means your application must be designed to be interruptible and should expect it to happen regularly – difficult for some applications, but not so much for others that are rather stateless, or normally process work in small chunks.
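One way to design for interruption is to process work in small chunks and record progress after each one, so a replacement instance can pick up where the evicted one stopped. The sketch below assumes a hypothetical `interrupted` callable that reports whether an eviction notice has arrived. In practice it would wrap whatever notice mechanism your provider offers. The checkpoint file format is also an assumption.

```python
import json
import os
from typing import Any, Callable, Sequence


def save_checkpoint(path: str, next_index: int) -> None:
    """Write progress to a temp file, then rename it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp_path, path)  # atomic when both paths share a filesystem


def load_checkpoint(path: str) -> int:
    """Return the index of the next unprocessed item (0 if none saved)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_index"]


def process_in_chunks(
    items: Sequence[Any],
    handle: Callable[[Any], None],
    interrupted: Callable[[], bool],
    checkpoint_path: str,
) -> int:
    """Process items one at a time, resuming from the last checkpoint.

    Stops early when `interrupted()` returns True. Returns the index of
    the next unprocessed item, which equals len(items) when finished.
    """
    next_index = load_checkpoint(checkpoint_path)
    while next_index < len(items):
        if interrupted():
            break
        handle(items[next_index])
        next_index += 1
        save_checkpoint(checkpoint_path, next_index)
    return next_index
```

Because the checkpoint is written after each item is handled, an eviction that lands between the two steps causes that one item to be handled again on restart. The handler therefore needs to tolerate repeats. Keeping each chunk well under the notice window (30 seconds to two minutes, per the figures above) is what makes the check between chunks useful.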
AWS also offers an automatic scaling feature that can increase or decrease the target capacity of your Spot Fleet automatically based on demand. The goal is to let you scale in conservatively in order to protect your application’s availability.
Early Adopters May Be One and the Same
People who are hesitant to build for spot are more likely to use regular VMs, perhaps with Reserved Instances for savings.
It is likely that the people open to the idea of spot instances are the same people who would be early adopters of other tech, like serverless, and so no longer have a need for spot.
For the right architecture, spot instances can provide significant savings. It is a matter of whether you want to bother.