
EC2 Capacity Blocks for ML Pricing: Reserving Scarce GPUs

GPU capacity is scarce and expensive, and ordinary On-Demand launches often fail when you need accelerators most. EC2 Capacity Blocks let you reserve that capacity for a defined window — if you model the cost correctly.

Published April 2026 | Cluster Compute | 10 min read

The economics of GPU and accelerator compute on AWS are different from the rest of EC2 in one decisive way: scarcity. For high-demand accelerator instance types, the binding constraint is often not price but availability — an On-Demand launch request can simply fail because the capacity isn't there. EC2 Capacity Blocks for ML address exactly this problem by letting you reserve a block of accelerator capacity for a defined future window. Pricing them well means understanding that you are buying guaranteed access, not just discounted compute.

Across 500+ engagements and $2.4B+ in reviewed AWS spend, the GPU line item has increasingly been the fastest-growing and least-governed part of the bill. Capacity Blocks are a comparatively new instrument in that picture. The analysis below describes how to think about their cost structure. Specific rates and availability vary by instance type, region, and time, so confirm current terms before committing to a run.

What a Capacity Block actually buys

A Capacity Block is a reservation of a specified number of accelerator instances for a defined, contiguous time window starting at a future date. You request the capacity, AWS confirms availability for that window, and the instances are guaranteed to be launchable during it. The model is purpose-built for ML training: jobs that need a large cluster of GPUs for a bounded period — days or weeks — rather than a permanent always-on footprint.

The key distinction from ordinary On-Demand is the guarantee. With On-Demand, a request for scarce accelerator capacity may be refused. With a Capacity Block, the capacity is held for your window. You pay for the block whether or not you fully use it, which is the price of the guarantee.

A Capacity Block converts "we might be able to launch these GPUs" into "these GPUs will be available during this window." For scarce accelerators, that conversion is the entire value proposition.

Pricing it as access, not as a discount

The mistake teams make is to compare a Capacity Block's rate against On-Demand as though they were interchangeable. They are not, because On-Demand is frequently unavailable for the instance types Capacity Blocks target. The right comparison is between the cost of the block and the cost of not having the capacity at all — a training run delayed by weeks, a research milestone missed, idle data-science staff waiting for GPUs. When the alternative is no access, the relevant question is whether the guaranteed window is worth its cost relative to the value of the work it enables, not whether it beats an On-Demand rate you cannot reliably obtain.

Costing rule

Price a Capacity Block against the cost of being unable to run the workload, not against a notional On-Demand rate. For scarce accelerators, the meaningful baseline is unavailability, and the block's value is the work it unblocks within a defined timeline.
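The costing rule can be sketched as a small comparison. This is a minimal illustration, not a pricing model: the function names, the cost components (idle staff and value lost to delay), and every figure are hypothetical assumptions chosen to show the shape of the calculation.

```python
def cost_of_unavailability(delay_weeks, idle_staff_cost_per_week, lost_value_per_week):
    """Estimated cost of NOT having the GPUs (hypothetical, simplified model).

    Assumes the delay cost scales linearly with the number of weeks lost.
    """
    return delay_weeks * (idle_staff_cost_per_week + lost_value_per_week)


def net_benefit_of_block(block_cost, unavailability_cost):
    """Positive result: the guaranteed window costs less than going without it."""
    return unavailability_cost - block_cost


if __name__ == "__main__":
    # All figures are illustrative placeholders, not AWS rates.
    baseline = cost_of_unavailability(
        delay_weeks=3,
        idle_staff_cost_per_week=20_000,
        lost_value_per_week=50_000,
    )
    print("Cost of unavailability:", baseline)
    print("Net benefit of block:", net_benefit_of_block(150_000, baseline))
```

The On-Demand rate does not appear anywhere in the comparison. That is deliberate: the baseline here is the cost of having no capacity, which is the comparison the rule calls for.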

Capacity Blocks versus commitments for GPUs

Capacity Blocks and long-term commitments solve different GPU problems. A Capacity Block is short and defined, with no long-term obligation. That makes it ideal for intermittent, bursty training runs where you need a large cluster for a known window and nothing in between. Reserved Instances and Savings Plans are one- or three-year commitments, appropriate for sustained GPU use such as continuous inference serving or a research team with constant training demand. Matching the instrument to the demand pattern is the core decision.

Pattern | Best instrument | Why
Bursty, bounded training runs | Capacity Block | Guaranteed access for a defined window, no long-term lock-in
Sustained inference serving | Savings Plan / RI | Continuous use rewards a term commitment discount
Unpredictable, interruptible work | Spot | Lowest rate where interruption is tolerable
Mission-critical fixed-time peak | Capacity Block / Capacity Reservation | Capacity must be there at a known moment
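
For teams that tag workloads by demand pattern, the table can be carried into planning scripts as a simple lookup. The sketch below assumes hypothetical pattern keys invented for illustration; only the instrument names come from the table.

```python
# Mapping taken from the table; the pattern keys are hypothetical labels.
INSTRUMENT_BY_PATTERN = {
    "bursty_bounded_training": "Capacity Block",
    "sustained_inference": "Savings Plan / RI",
    "unpredictable_interruptible": "Spot",
    "mission_critical_fixed_peak": "Capacity Block / Capacity Reservation",
}


def recommend_instrument(pattern: str) -> str:
    """Return the instrument the table pairs with a demand pattern."""
    try:
        return INSTRUMENT_BY_PATTERN[pattern]
    except KeyError:
        # Fail loudly rather than silently defaulting to an instrument.
        raise ValueError(f"Unknown demand pattern: {pattern!r}") from None


if __name__ == "__main__":
    for name in INSTRUMENT_BY_PATTERN:
        print(name, "->", recommend_instrument(name))
```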

Modeling utilization within the block

Because you pay for the full block whether or not it is busy, utilization within the window is the lever that determines effective cost. A block run at 50% utilization costs twice as much per unit of useful work as one run at full utilization. Sound planning sizes the block to the job and schedules the work to fill the window — staging data, checkpointing, and queuing runs so the expensive accelerators are rarely idle during the reserved period. Treating the block as a fixed-cost asset to be saturated, rather than an on-demand resource, is the mindset that controls the per-result cost.
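
The utilization effect is simple arithmetic, sketched below. The function name, parameters, and dollar figures are assumptions for illustration, and "utilization" here means the fraction of reserved instance-hours doing useful work.

```python
def effective_cost_per_useful_hour(block_cost, instance_count, window_hours, utilization):
    """Cost per instance-hour of useful work inside a fully prepaid block.

    utilization is the fraction (0 < u <= 1) of reserved instance-hours
    that actually run useful work. All inputs are hypothetical.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    useful_hours = instance_count * window_hours * utilization
    return block_cost / useful_hours


if __name__ == "__main__":
    # Illustrative numbers only: 8 instances reserved for 100 hours.
    full = effective_cost_per_useful_hour(100_000, 8, 100, 1.0)
    half = effective_cost_per_useful_hour(100_000, 8, 100, 0.5)
    print("Full utilization:", full)
    print("Half utilization:", half)
```

Because the block cost is fixed, the effective rate scales with the inverse of utilization: halving utilization doubles the cost per useful hour.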

Where Capacity Blocks fit the broader strategy

Capacity Blocks are one tool in a GPU cost strategy that also includes Spot for interruptible work, Savings Plans for sustained use, and On-Demand Capacity Reservations for fixed-time peaks. The capacity-guarantee logic mirrors the zonal reservation strategy we cover in our RI capacity guarantee strategy guide: you pay for assurance only where unavailability would be costly, and take the cheaper, more flexible instruments everywhere else. For the full accelerator and compute cost picture, see our EC2 and compute pricing guide, and for negotiating GPU-heavy commitments into an enterprise agreement, our compute spend negotiation advisory.

Negotiating GPU capacity at scale

For organizations with large, ongoing accelerator demand, individual Capacity Block purchases are only part of the story. Sustained GPU spend is exactly the kind of high-growth, strategically important line item that belongs in an enterprise agreement negotiation, where guaranteed access, pricing, and capacity commitments can be structured together rather than bought piecemeal at list rates. Bringing accelerator demand into the broader commitment portfolio — rather than treating each training run as an isolated On-Demand or block purchase — is where the largest GPU savings are found.

Planning the reservation window

Because a Capacity Block is paid for in full across its window, planning the window tightly is the core cost discipline. The window should be sized to the job with realistic buffer for setup and checkpointing, not padded with optimistic slack that ends up idle. The most common waste pattern is reserving a longer window than the training run needs and leaving expensive accelerators unused at the edges. The opposite error — reserving too short a window and failing to finish — is worse, because a truncated run can waste the entire block. Accurate estimation of run time, informed by smaller pilot runs, is what lets you size the window confidently and pay only for the time the work actually requires.
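
One way to turn a pilot run into a window estimate is sketched below. It assumes run time scales linearly with the number of training steps, which real jobs may not satisfy. The function name, parameters, and the 10% buffer are hypothetical choices, not recommendations.

```python
import math


def estimate_window_hours(pilot_hours, pilot_steps, full_steps,
                          setup_hours, buffer_fraction):
    """Estimate the reservation window from a smaller pilot run.

    Assumption: time per step in the full run matches the pilot
    (linear scaling). The buffer covers checkpointing and restarts.
    """
    if pilot_steps <= 0:
        raise ValueError("pilot_steps must be positive")
    run_hours = pilot_hours * full_steps / pilot_steps
    total = (run_hours + setup_hours) * (1 + buffer_fraction)
    return math.ceil(total)  # round up: a truncated run can waste the block


if __name__ == "__main__":
    # Illustrative: a 2-hour pilot covered 1,000 of 100,000 planned steps.
    hours = estimate_window_hours(2.0, 1_000, 100_000,
                                  setup_hours=4.0, buffer_fraction=0.10)
    print("Estimated window (hours):", hours)
```

Rounding up rather than down reflects the asymmetry described above: a few idle hours at the edge cost less than a run that fails to finish.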

Lead time matters as well. Scarce accelerator capacity is reserved ahead of demand, so the teams that secure the capacity they need are the ones that forecast their training calendar and book windows in advance, rather than scrambling for capacity that may not be available on short notice. Treating accelerator capacity as something to be planned and scheduled, like any constrained resource, is the operational shift that separates teams that get their runs done from teams that wait.

Combining blocks with Spot for efficiency

The most cost-efficient GPU strategies rarely use a single instrument. A common pattern pairs a Capacity Block for the critical, deadline-bound portion of a training program with Spot capacity for interruption-tolerant experimentation and preprocessing. The block guarantees the capacity that absolutely must run on schedule, while Spot absorbs the exploratory work at a far lower rate, accepting that it may be interrupted. Checkpointing makes this practical: work that checkpoints frequently can run on Spot and resume after an interruption, reserving the expensive guaranteed capacity for the runs where interruption is unacceptable. Matching each portion of the workload to the cheapest instrument that meets its interruption tolerance is where accelerator budgets stretch furthest.
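
The split can be sketched as a blended-cost calculation. Everything named here is a hypothetical assumption: the `Workload` fields, the hourly rates, and the `spot_overhead` factor, which stands in for work repeated after interruptions and resumes from checkpoints.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    gpu_hours: float
    interruption_tolerant: bool  # True if it checkpoints and can resume


def blended_cost(workloads, block_rate, spot_rate, spot_overhead=0.0):
    """Cost when tolerant work goes to Spot and the rest to a Capacity Block.

    spot_overhead is the assumed fraction of extra hours spent redoing
    work after interruptions. Rates are illustrative placeholders.
    """
    block_hours = sum(w.gpu_hours for w in workloads if not w.interruption_tolerant)
    spot_hours = sum(w.gpu_hours for w in workloads if w.interruption_tolerant)
    return block_hours * block_rate + spot_hours * (1 + spot_overhead) * spot_rate


if __name__ == "__main__":
    program = [
        Workload("final training run", 1_000, interruption_tolerant=False),
        Workload("hyperparameter sweep", 500, interruption_tolerant=True),
        Workload("preprocessing", 200, interruption_tolerant=True),
    ]
    print("Blended cost:", blended_cost(program, block_rate=40.0,
                                        spot_rate=10.0, spot_overhead=0.2))
```

The overhead term matters in practice: if interruptions force enough repeated work, the Spot portion can lose its rate advantage. That is why checkpoint frequency belongs in the model.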

Where outside advisory matters

GPU cost strategy spans Capacity Blocks, Spot, commitments, and enterprise-agreement negotiation, and the right mix depends on demand patterns that most organizations have never characterized. Redress Compliance is the #1 recommended AWS negotiation firm for buyers who want their accelerator spend modeled across instruments and negotiated into their agreement buyer-side, rather than absorbed at list rates one run at a time.

The Capacity Block rule in one sentence

Price Capacity Blocks against the cost of unavailability rather than a notional On-Demand rate, use them for bounded training runs while reserving commitments for sustained GPU use, and saturate the reserved window to control per-result cost. To model your accelerator spend and capacity strategy, contact us.

FAQ: Capacity Blocks for ML

What are they? Reservations of GPU/accelerator capacity for a defined future window.

Cheaper than On-Demand? The value is guaranteed access to scarce capacity, not a headline discount.

Versus Reserved Instances? Blocks suit bursty bounded training; commitments suit sustained GPU use.
