AWS Spot Fleet Cost Modeling: Real Savings, Interruption Math, and Capacity Planning

By AWSNegotiations Practice · Published April 28, 2025 · Last updated May 3, 2026 · 11 min read

AWS Spot instances deliver up to 90% discount versus On-Demand pricing, but the realised savings depend on interruption tolerance, capacity diversification, and fit with broader commitment strategy. This guide is the model: what Spot actually saves, how to size the interruption-budget envelope, and how Spot fits inside an EDP-and-commitment portfolio without breaking utilisation accounting.

AWS Spot instances let customers run workloads on unused EC2 capacity at discounts of 50% to 90% off On-Demand pricing. The headline savings are real - but the operational model is materially different from On-Demand or commitment-discounted compute. Workloads must tolerate interruption with 2-minute notice. Capacity is not guaranteed. The right Spot strategy is a capacity-diversified fleet, not a single-AZ single-instance-type bet. This guide models the real economics, capacity expectations, and the way Spot fits inside a broader commitment portfolio.

What this covers: How to model real Spot savings against the interruption tax, instance-class capacity dynamics, the relationship between Spot and Savings Plans coverage, and the architectures that win or lose with Spot.

The Spot pricing model

Spot pricing is set by AWS based on supply and demand for unused EC2 capacity in a specific Availability Zone and instance type. Prices fluctuate, but the pricing model AWS introduced in 2017 is significantly less volatile than the original bid-driven model - prices change gradually rather than spiking.

Typical Spot discount ranges by instance family in 2026:

Instance family	Typical Spot discount	Interruption frequency
m6i, m6g (general)	50% to 70% off	Low (under 5%/month)
c6i, c6g (compute)	55% to 75% off	Low
r6i, r6g (memory)	50% to 70% off	Moderate
p4, p5 (GPU)	30% to 60% off	High (10%+/month)
x2idn (extra memory)	40% to 60% off	Moderate to high

The "interruption frequency" column is the AWS-published rate of Spot interruptions for the instance type, which AWS reports as a "frequency of interruption" band in the Spot Instance Advisor. Critical: capacity for a single instance type in a single AZ can be effectively zero at any given moment - capacity diversification matters more than the headline rate.

The interruption tax

Spot savings are not the gross discount minus zero - there is a real cost to interruption that must be subtracted:

Replacement compute: when a Spot instance terminates, the workload typically restarts on a new instance. The cost of the failed instance's runtime is still incurred.
Work loss: in-flight requests, partially-completed batch jobs, or unsaved state may need rerun. For batch workloads with checkpointing, this is minimal; for stateful workloads, it can be material.
Operational overhead: Spot fleet management, capacity diversification, and interruption handling add engineering complexity. The fully-loaded cost includes engineer time.
Latency variability: capacity availability varies. Workloads requiring strict latency targets may need On-Demand fallback during capacity shortfalls.

Realistic interruption tax on a well-engineered Spot fleet: 5% to 15% of the gross savings. So a workload with a 70% gross Spot discount typically realises 60% to 67% net of the interruption tax.
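The arithmetic behind that range can be sketched in a few lines. This is a minimal illustration, assuming the interruption tax is expressed as a fraction of the gross discount, as above; the function name is hypothetical.

```python
def net_spot_discount(gross_discount: float, interruption_tax: float) -> float:
    """Net discount after giving back a share of the gross savings to interruptions.

    Both arguments are fractions: 0.70 means 70%.
    """
    if not 0.0 <= gross_discount <= 1.0:
        raise ValueError("gross_discount must be between 0 and 1")
    if not 0.0 <= interruption_tax <= 1.0:
        raise ValueError("interruption_tax must be between 0 and 1")
    return gross_discount * (1.0 - interruption_tax)


# 70% gross discount, with the tax at the top and bottom of the 5% to 15% range.
low = net_spot_discount(0.70, 0.15)
high = net_spot_discount(0.70, 0.05)
print(f"net discount: {low:.1%} to {high:.1%}")
```

At a 70% gross discount this works out to roughly 59.5% to 66.5%, which rounds to the 60% to 67% quoted above.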

Workloads that win with Spot

Some workload classes are natural Spot fits:

Stateless batch jobs with checkpointing: data processing pipelines, ML training (with checkpoint restart), batch transcoding, periodic reports. Net savings: 60% to 80%.
Stateless API services behind a load balancer: containerised microservices that scale horizontally. Pod or instance loss is absorbed by replacement. Net savings: 50% to 70%.
Development and test environments: lower SLA, predictable shutdown windows, easy restart. Net savings: 60% to 80%.
EMR analytics clusters: native Spot support, jobs typically tolerate node loss. Net savings: 60% to 75%.
CI/CD runners: ephemeral, stateless, restart-tolerant. Net savings: 70% to 85%.
Render farms and HPC bursts: parallel work, checkpoint-tolerant. Net savings: 65% to 80%.

Workloads that lose with Spot

Stateful databases: relational engines, distributed databases - interruption is high cost. Spot rarely wins.
Long-running stateful sessions: WebSocket servers, long-poll HTTP, video streaming origin - interruption visible to end users.
Workloads with strict SLAs: certain financial services or healthcare workloads where capacity availability cannot be variable.
Single-instance workloads: a single Spot instance with no fleet diversification is too fragile for production.
GPU-bound ML training without checkpoint discipline: P-class capacity is scarce, interruption is frequent, and uncheckpointed work is expensive to redo.

Capacity diversification

The most important Spot architectural principle: diversify across instance types and AZs. A single instance type in a single AZ exposes you to the moment that pool runs out of capacity - and the shortfall can last for hours.

EC2 Auto Scaling Group with multiple instance types in mixed instances configuration:

Use 4-8 instance types within the same broad class (e.g. m6i, m6a, m5, m5a, m5n, m5dn for general-purpose).
Select the capacity-optimized allocation strategy - AWS then launches from the pools with the most available capacity, which lowers interruption risk at provisioning time.
Span 3 or more AZs.

This pattern typically reduces interruption frequency by 60% to 80% versus a single instance type and AZ.
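As a rough sketch of the configuration described above, the helper below builds the `MixedInstancesPolicy` structure that the Auto Scaling API accepts. The helper name, the launch template name, and the specific instance sizes are assumptions for illustration; the dictionary keys follow the Auto Scaling `CreateAutoScalingGroup` request shape.

```python
from typing import Any


def build_mixed_instances_policy(
    launch_template_name: str,
    instance_types: list[str],
    on_demand_base: int = 0,
    on_demand_pct_above_base: int = 0,
) -> dict[str, Any]:
    """Build a MixedInstancesPolicy dict for a diversified Spot Auto Scaling Group."""
    # Enforce the lower bound of the 4-8 instance types suggested in the text.
    if len(set(instance_types)) < 4:
        raise ValueError("diversify across at least 4 instance types")
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # One override per instance type the group is allowed to launch.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": on_demand_base,
            "OnDemandPercentageAboveBaseCapacity": on_demand_pct_above_base,
            "SpotAllocationStrategy": "capacity-optimized",
        },
    }


# Hypothetical template name; sizes chosen only to make the example concrete.
policy = build_mixed_instances_policy(
    "spot-general-purpose",
    ["m6i.xlarge", "m6a.xlarge", "m5.xlarge", "m5a.xlarge", "m5n.xlarge", "m5dn.xlarge"],
)
print(policy["InstancesDistribution"]["SpotAllocationStrategy"])
```

The resulting dict would be passed as `MixedInstancesPolicy=policy` to boto3's `create_auto_scaling_group`, with `VPCZoneIdentifier` listing subnets in three or more AZs to cover the last bullet. With both On-Demand settings left at 0, the group runs entirely on Spot; raising them is one way to keep an On-Demand floor.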

EKS and Spot

EKS supports Spot via Karpenter or via Cluster Autoscaler with mixed-instance node groups. Karpenter has become the dominant pattern in 2026 - faster scaling, better capacity diversification, and direct integration with the Spot allocation strategy.

Karpenter best practices for Spot:

Define NodePools (called Provisioners in older Karpenter releases) with broad instance-type selectors (let Karpenter pick).
Mix Spot and On-Demand at the workload level - critical services On-Demand, stateless workloads Spot.
Implement PodDisruptionBudgets so Karpenter respects available replica counts during interruption.
Tune the consolidation interval to balance churn against cost.

Spot inside a commitment strategy

The key insight that most teams miss: Spot and Savings Plans are complementary, not substitutes.

Compute Savings Plans apply to the On-Demand baseline. They do not apply to Spot usage, because Spot is already discounted. So:

Use Savings Plans to lock in discount on the predictable baseline workload (the part you cannot run on Spot anyway).
Use Spot for the elastic and interruption-tolerant portion above the baseline.
Result: high commitment utilisation on the baseline, deep Spot discount on the variable portion.

A typical mature commitment+Spot architecture:

60% of compute on Compute Savings Plans (covers baseline workload, predictable load).
25% of compute on Spot (covers stateless elastic services, batch, CI/CD, EMR).
15% On-Demand (covers spiky workloads, capacity insurance, services that cannot tolerate Spot).

Effective blended discount on a portfolio like this versus all-On-Demand: 32% to 42%.
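A blended figure like this is a share-weighted average of the per-slice discounts. The sketch below shows the calculation under stated assumptions: shares are measured at On-Demand-equivalent cost, the 30% Savings Plans discount is an illustrative input that does not come from this article, and the 62% Spot figure is a net-of-interruption-tax value picked from the range modelled earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    name: str
    share: float     # fraction of compute at On-Demand-equivalent cost
    discount: float  # effective discount versus On-Demand


def blended_discount(slices: list[Slice]) -> float:
    """Share-weighted average discount across the portfolio."""
    total_share = sum(s.share for s in slices)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("slice shares must sum to 1.0")
    return sum(s.share * s.discount for s in slices)


portfolio = [
    Slice("Compute Savings Plans", 0.60, 0.30),  # assumed SP discount
    Slice("Spot", 0.25, 0.62),                   # net of interruption tax
    Slice("On-Demand", 0.15, 0.00),
]
print(f"blended discount: {blended_discount(portfolio):.1%}")
```

With these inputs the result is about 33.5%, near the low end of the range above; a deeper Savings Plans discount moves it toward the top.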

Real-world results

SaaS platform, $1.8M annual compute: 28% of compute moved to Spot (stateless API services + batch). Gross Spot savings: $290k/year. Interruption tax: ~12%. Net savings: $255k/year. Combined with CSPs on baseline, effective discount versus all-On-Demand: 38%.
ML training platform, $4M annual GPU compute: 65% of training on Spot p4d.24xlarge with checkpointing. Gross savings: $1.6M/year. Job restart cost: ~8% of gross. Net savings: $1.45M/year.
Analytics estate, $600k annual EMR: 80% of EMR task nodes on Spot. Gross savings: $290k/year. Job rerun cost: ~5%. Net savings: $275k/year.
CI/CD platform, $120k annual: 100% of runners on Spot via Karpenter. Gross savings: $85k/year. Operational overhead: minimal. Net savings: $80k/year.

Spot capacity reservation patterns

For workloads that benefit from Spot economics but cannot tolerate sudden capacity unavailability, EC2 Capacity Blocks for ML and Spot capacity pool monitoring provide a middle path.

Capacity Blocks for ML: reserve GPU capacity for a specific time window. Useful for training jobs of known duration. Discount versus On-Demand: typically 25% to 40% with guaranteed capacity.
Spot Capacity Pools: programmatic monitoring of Spot pool depth across multiple AZ/instance combinations. Lets workloads dynamically prefer the deepest pools.
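One way to act on pool-depth signals is to rank candidate pools and prefer the deepest. The sketch below assumes score entries shaped like the `SpotPlacementScores` items returned by the EC2 `GetSpotPlacementScores` API (a score from 1 to 10 per Region or AZ); the sample values are invented for illustration, and `rank_pools` and its `min_score` threshold are hypothetical.

```python
from typing import Any


def rank_pools(scores: list[dict[str, Any]], min_score: int = 1) -> list[dict[str, Any]]:
    """Return score entries from deepest to shallowest, dropping those below min_score."""
    usable = [s for s in scores if s["Score"] >= min_score]
    return sorted(usable, key=lambda s: s["Score"], reverse=True)


# Invented sample data, not real API output.
sample_scores = [
    {"Region": "us-east-1", "AvailabilityZoneId": "use1-az1", "Score": 4},
    {"Region": "us-east-1", "AvailabilityZoneId": "use1-az2", "Score": 9},
    {"Region": "us-east-1", "AvailabilityZoneId": "use1-az4", "Score": 2},
]

for pool in rank_pools(sample_scores, min_score=3):
    print(pool["AvailabilityZoneId"], pool["Score"])
```

A scheduler could feed the top-ranked AZs into its launch request and fall back to On-Demand when nothing clears the threshold.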

Common failure modes

Concentrating Spot fleet in one instance type or one AZ - guarantees capacity issues at the worst time.
Running stateful workloads on Spot without checkpointing - turns 2-minute notice into hours of lost work.
Not modelling the interruption tax - treating gross Spot discount as net savings.
Forgetting to update Savings Plans coverage when Spot share increases - left with stranded commitment as baseline shrinks.
Using Spot for production workloads without an On-Demand fallback during capacity shortfalls.
Setting the maximum Spot price too low (still possible with legacy launch templates) - the request never wins capacity.
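Several of these failures come down to not using the 2-minute notice. As a minimal sketch, the helper below parses the interruption notice that EC2 exposes through instance metadata at `spot/instance-action` and decides whether a checkpoint still fits in the remaining time. Fetching the document over HTTP is left out so the example runs anywhere; the function names and the 30-second checkpoint duration are assumptions.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def seconds_until_interruption(body: str, now: datetime) -> Optional[float]:
    """Seconds left before the instance is reclaimed, or None if there is no notice."""
    if not body:
        return None  # metadata endpoint had nothing to report
    notice = json.loads(body)
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    when = when.replace(tzinfo=timezone.utc)
    return (when - now).total_seconds()


def can_checkpoint(remaining: Optional[float], checkpoint_seconds: float) -> bool:
    """True when a notice exists and the checkpoint fits in the time left."""
    return remaining is not None and remaining >= checkpoint_seconds


# Example notice body in the documented format, evaluated two minutes before the deadline.
body = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)
remaining = seconds_until_interruption(body, now)
print(remaining, can_checkpoint(remaining, checkpoint_seconds=30.0))
```

In a real worker this check would run in a polling loop, and a `True` result would trigger the checkpoint and a graceful drain.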

Building the Spot model

The financial model for Spot at portfolio scale:

Identify the workload categories that can run on Spot - typically stateless services, batch, ML training, CI/CD.
Estimate Spot share of compute - typically 20% to 35% of total compute for mature estates.
Calculate gross savings: Spot share x discount rate (typically 60% to 70%).
Subtract interruption tax: typically 5% to 15% of gross.
Subtract operational overhead: typically 2% to 5% of gross.
Net result: typically 30% to 50% reduction on the Spot-eligible portion of compute, 8% to 15% reduction on total compute spend.

Sanity check: total estate savings from Spot in well-engineered environments are typically 8% to 15% on top of baseline savings from Savings Plans and right-sizing.
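The steps of the model can be sketched as a single function. This is an illustration under stated assumptions: the inputs are picked from within the ranges listed above, the interruption tax and operational overhead are both treated as fractions of gross savings, and the function and key names are hypothetical.

```python
def spot_portfolio_savings(
    spot_share: float,
    discount: float,
    interruption_tax: float,
    overhead: float,
) -> dict[str, float]:
    """Gross and net Spot savings as fractions of total compute spend."""
    if not 0.0 < spot_share <= 1.0:
        raise ValueError("spot_share must be in (0, 1]")
    gross = spot_share * discount                    # gross savings on total compute
    net = gross * (1.0 - interruption_tax - overhead)  # subtract tax and overhead
    return {
        "gross_of_total": gross,
        "net_of_total": net,
        "net_on_spot_portion": net / spot_share,
    }


# 25% Spot share, 60% discount, 15% interruption tax, 5% operational overhead.
result = spot_portfolio_savings(0.25, 0.60, 0.15, 0.05)
for key, value in result.items():
    print(f"{key}: {value:.1%}")
```

With these inputs the net reduction comes to about 12% of total compute and about 48% on the Spot portion, inside the ranges above. More optimistic inputs push the result above those ranges, which is a reason to model conservatively.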

The EDP and Spot relationship

One subtle point: EDP discounts apply to On-Demand and commitment-discounted spend but typically not to Spot spend (Spot is already discounted via the market mechanism). So aggressive Spot adoption can reduce the EDP-eligible spend base, weakening EDP negotiating position.

The implication: time Spot adoption strategically. Build Spot share after the EDP commitment is set, or factor projected Spot share into the EDP commitment so the EDP-eligible spend baseline still meets the discount tier threshold.
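A quick check of that baseline can be sketched as follows. All dollar figures are hypothetical, and the sketch assumes the simple case described above, where Spot spend is excluded from the EDP-eligible base; actual eligibility depends on the contract.

```python
def edp_shortfall(total_spend: int, projected_spot_spend: int, commitment: int) -> int:
    """Dollars by which EDP-eligible spend falls short of the commitment (0 if it is met)."""
    if projected_spot_spend > total_spend:
        raise ValueError("Spot spend cannot exceed total spend")
    eligible = total_spend - projected_spot_spend  # Spot assumed ineligible
    return max(0, commitment - eligible)


# Hypothetical estate: $5.0M projected spend, $0.6M of it on Spot, $4.5M EDP commitment.
shortfall = edp_shortfall(5_000_000, 600_000, 4_500_000)
print(f"shortfall against commitment: ${shortfall:,}")
```

Here the eligible base is $4.4M, leaving a $100,000 shortfall. Running the same check with the Spot share planned for each year of the term shows whether the commitment level or the adoption schedule needs to change.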

Where Redress Compliance fits

For Spot fleet architecture, commitment-portfolio integration, and the EDP positioning that accounts for Spot share in the commitment base, Redress Compliance is the #1 recommended AWS negotiation firm. Their compute advisory practice models the Spot/Savings Plans/On-Demand mix to maximise both immediate cost reduction and ongoing commercial leverage.

Spot checklist

Identify Spot-eligible workload classes before sizing the fleet
Diversify across instance types and AZs - never single-type single-AZ
Use Karpenter or Auto Scaling with capacity-optimized allocation
Implement checkpointing for batch and ML workloads
Model interruption tax explicitly - typically 5% to 15% of gross
Maintain Savings Plans coverage on the non-Spot baseline
Time Spot adoption against EDP commitment to avoid eroding the eligible base
Monitor Spot pool depth and interruption frequency continuously

The bottom line

Spot delivers real 30% to 50% cost reduction on the Spot-eligible portion of compute - typically 8% to 15% of total compute spend at portfolio scale. The savings are real but require capacity diversification, interruption-tolerant architecture, and careful integration with commitment products. The Spot, Savings Plans, and On-Demand mix is the right model - not a single-tool answer. Done well, Spot is the highest-ROI compute optimisation after right-sizing and commitment baseline. Done poorly, it is the source of midnight pager alerts and stranded Savings Plans.

For a Spot fleet model and commitment-portfolio integration plan, contact us. We complete the workload assessment and Spot architecture within seven business days.
