SageMaker Savings Plans: The Buyer-Side Guide

By Marcus, Lead Negotiator · Published October 24, 2025 · Last updated April 11, 2026 · 10 min read

SageMaker Savings Plans are a separate commitment from Compute or EC2 Instance plans. A practical guide to sizing, scope, and the workload patterns that justify a commitment — or don't.

Published May 2026 · Cluster Savings Plans · 10 min read

SageMaker Savings Plans are the third major Savings Plans variant AWS offers, sitting alongside Compute Savings Plans and EC2 Instance Savings Plans. They are commercially separate: a Compute Savings Plan does not apply to SageMaker usage, and a SageMaker Savings Plan does not apply to EC2 usage. For buyers running material ML workloads on SageMaker, the right SageMaker Savings Plans commitment can deliver discounts of roughly 19–64% against the On-Demand baseline, depending on term and upfront payment.

Across $2.4B+ in reviewed AWS spend and 500+ engagements, the SageMaker Savings Plans we see in client portfolios are roughly evenly split between right-sized and meaningfully wrong-sized — driven mostly by ML workloads behaving differently from general compute. This guide walks through the scope, the sizing methodology, and the workload patterns that change the answer.

What SageMaker Savings Plans cover

SageMaker Savings Plans apply to instance usage across the three main SageMaker compute categories:

Training: SageMaker Training Jobs, including managed spot training instances.
Inference: Real-time inference endpoints, serverless inference, and asynchronous inference.
Processing & Notebooks: SageMaker Processing Jobs, SageMaker Studio notebooks, and Notebook instances.

The plan is family-flexible, region-flexible, and component-flexible within SageMaker — a single plan can absorb training instances in one region, inference instances in another, and notebook usage somewhere else, as long as the total dollar-per-hour usage stays within the commitment.

What SageMaker Savings Plans do not cover:

Bedrock usage. Bedrock has separate provisioned-throughput pricing models with their own commitment structures.
SageMaker storage (model artifact storage, feature store storage, training data in S3).
Data transfer associated with SageMaker.
SageMaker JumpStart marketplace model fees (the model usage fee charged by third-party model providers).
EC2-based ML workloads outside SageMaker (training on raw EC2 GPU instances, inference on SageMaker-adjacent EC2 services).

The discount range

Published SageMaker Savings Plans discount ranges:

Term & Upfront	Discount vs On-Demand
1-yr, No Upfront	19–24%
1-yr, Partial Upfront	22–27%
1-yr, All Upfront	25–29%
3-yr, No Upfront	48–55%
3-yr, Partial Upfront	54–60%
3-yr, All Upfront	56–64%

The 3-year tier sits roughly two percentage points below comparable Compute Savings Plans discounts — a reasonable consequence of the narrower scope and the higher volatility of ML workloads.
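
To make the table concrete, it helps to turn each discount into a break-even utilization. Because a commitment is billed whether or not it is used, a plan with discount d only beats On-Demand if at least 1 − d of the commitment is actually consumed. The sketch below applies that arithmetic to the low end of each published range; the function name is illustrative, not an AWS API.

```python
def break_even_utilization(discount: float) -> float:
    """Minimum share of a commitment that must be used to match On-Demand cost."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


# Low end of each discount range from the table above.
LOW_END_DISCOUNTS = {
    "1-yr, No Upfront": 0.19,
    "1-yr, Partial Upfront": 0.22,
    "1-yr, All Upfront": 0.25,
    "3-yr, No Upfront": 0.48,
    "3-yr, Partial Upfront": 0.54,
    "3-yr, All Upfront": 0.56,
}

for term, discount in LOW_END_DISCOUNTS.items():
    share = break_even_utilization(discount)
    print(f"{term}: break-even at {share:.0%} utilization")
```

Read this way, a 3-year plan tolerates far more idle commitment than a 1-year plan before it loses to On-Demand, which is the trade against its longer lock-in.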

The $10K monthly threshold

Across the engagements we have reviewed, the rough threshold where a SageMaker Savings Plan starts paying back its operational overhead is $10K monthly in sustained SageMaker compute (not counting storage, data transfer, or marketplace fees). Below that level, the discount delta to On-Demand on a small commitment is rarely worth the management overhead, the commitment risk, and the team coordination required to track utilization against the plan.

Above $10K monthly, the math starts working in the buyer's favor at increasingly favorable rates as scale grows. At $100K+ monthly, SageMaker Savings Plans coverage is essentially mandatory for any well-run AWS estate.
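
A rough way to see why scale matters: gross savings are simply covered On-Demand spend times the discount. The sketch below assumes 60% of spend is committed at the 19% low end of the 1-year No Upfront range. Both inputs are illustrative assumptions, not recommendations, and the function name is hypothetical.

```python
def monthly_gross_savings(on_demand_spend: float,
                          covered_fraction: float,
                          discount: float) -> float:
    """Gross monthly savings in dollars, before any cost of managing the plan."""
    if not 0.0 <= covered_fraction <= 1.0:
        raise ValueError("covered_fraction must be in [0, 1]")
    return on_demand_spend * covered_fraction * discount


# Illustrative spend levels around the threshold discussed above.
for spend in (5_000, 10_000, 100_000):
    saved = monthly_gross_savings(spend, covered_fraction=0.60, discount=0.19)
    print(f"${spend:>7,} monthly spend -> ${saved:,.0f} gross savings per month")
```

The absolute dollars at the low end are what has to pay for the tracking, coordination, and commitment risk described above.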

Authority signal

Across the SageMaker portfolios we have reviewed, the average uncovered SageMaker compute fraction is ~70% even at clients spending $100K+ monthly on SageMaker. The gap is overwhelmingly driven by ML teams running SageMaker spend without involvement from the central FinOps function. Closing that gap typically captures 12–18% effective reduction on the SageMaker line item.

The workload pattern problem

SageMaker workloads behave differently from general compute, and that changes the sizing methodology. Four patterns to understand:

Inference is the most stable

Real-time inference endpoints supporting production applications behave more like steady-state web tier compute. Daily and weekly patterns are predictable; sustained baseline is identifiable; growth follows traffic. This is the easiest part of SageMaker spend to commit against. If 80% of your SageMaker spend is inference, your Savings Plans coverage can run aggressive — 75–85% of expected baseline is achievable.

Training is intermittent and bursty

Training workloads run in discrete jobs. They burst hard for hours or days, then idle. They are scheduled, not steady-state. They migrate instance families as model architectures evolve (p3 to p4 to p5, GPU to Inferentia for specialized inference, or Graviton for memory-bound training). Sizing Savings Plans coverage against training spend requires forecasting based on training cadence, not historical baseline.

Notebooks accumulate idle time

Notebook instances are the most common source of unintentional spend in SageMaker. Data scientists spin up an instance, use it for two hours, leave it running overnight, leave it running over the weekend. Committing Savings Plans against notebook usage often locks in waste rather than productive consumption. Most mature buyers address notebook idle time through tagging policies and auto-shutdown rather than Savings Plans coverage.

Processing jobs are scheduled

Processing Jobs (feature engineering, batch inference, data preparation) run on schedules. Their consumption patterns are predictable but episodic — large bursts of usage at scheduled intervals. Sizing Savings Plans for processing requires modeling the schedule, not the average.
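
The gap between "the schedule" and "the average" is easy to see with a toy profile. The sketch below assumes a hypothetical processing job that runs 4 hours a day at $50 per hour of On-Demand-equivalent usage, and measures how much of a commitment sized to the daily average would actually be used. For simplicity, both usage and commitment are expressed in On-Demand-equivalent dollars per hour.

```python
def utilization(hourly_usage: list[float], commitment: float) -> float:
    """Share of the commitment consumed across the profile.

    Usage above the commitment in any hour does not count toward it,
    and unused commitment in an idle hour is lost.
    """
    if commitment <= 0 or not hourly_usage:
        raise ValueError("need a positive commitment and at least one hour")
    used = sum(min(usage, commitment) for usage in hourly_usage)
    return used / (commitment * len(hourly_usage))


# Hypothetical day: 4 busy hours at $50/hr, 20 idle hours.
daily_profile = [50.0] * 4 + [0.0] * 20
average = sum(daily_profile) / len(daily_profile)

print(f"average usage: ${average:.2f}/hr")
print(f"utilization if committed at the average: {utilization(daily_profile, average):.0%}")
```

In this toy case the commitment is consumed in only 4 of 24 hours, so about one sixth of it is used even though it was sized to the average.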

The sizing methodology for SageMaker

The methodology differs from general Compute Savings Plans sizing and runs in five steps:

Step 1: Decompose by category, not just by spend

Pull 90 days of SageMaker spend, decomposed into inference, training, processing, and notebooks. The four categories behave differently and should be sized differently. Aggregating them and treating the total as "SageMaker compute" produces commitment levels that are simultaneously too high for training and too low for inference.
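
A minimal sketch of the decomposition, assuming the billing export has already been reduced to (usage type, cost) pairs. The keyword map and the sample usage-type strings are assumptions for illustration only; check them against the usage types that appear in your own bill before relying on the grouping.

```python
from collections import defaultdict

# Hypothetical keyword map: these substrings are assumptions, not a documented AWS scheme.
CATEGORY_KEYWORDS = {
    "inference": ("host",),
    "training": ("train",),
    "processing": ("processing",),
    "notebooks": ("notebk", "studio"),
}


def categorize(usage_type: str) -> str:
    """Map a usage-type string to one of the four categories, else 'other'."""
    lowered = usage_type.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"


def decompose(line_items: list[tuple[str, float]]) -> dict[str, float]:
    """Sum cost per category over (usage_type, cost) pairs."""
    totals: dict[str, float] = defaultdict(float)
    for usage_type, cost in line_items:
        totals[categorize(usage_type)] += cost
    return dict(totals)


# Made-up 90-day line items for illustration.
sample = [
    ("USE1-Host:ml.m5.xlarge", 42_000.0),
    ("USE1-Train:ml.p4d.24xlarge", 18_000.0),
    ("USE1-Processing:ml.m5.4xlarge", 3_100.0),
    ("USE1-Notebk:ml.t3.medium", 900.0),
    ("USE1-Storage", 500.0),
]

for category, cost in sorted(decompose(sample).items()):
    print(f"{category:<11} ${cost:>10,.0f}")
```

Anything that lands in "other" (storage, data transfer, marketplace fees) is outside the commitment scope and is best left out of the sizing inputs.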

Step 2: Identify the inference baseline aggressively

For the inference component, identify the 30th percentile hour as the stable baseline (slightly higher than the 25th percentile we use for general compute, because inference latency is critical and idle inference endpoints rarely exist). This is your near-mandatory coverage.
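
A minimal sketch of the percentile step, using the nearest-rank method on hourly inference spend. The sample values are made up; in practice the input would be 90 days of hourly figures (about 2,160 values), and other percentile conventions would give slightly different answers.

```python
import math


def percentile_nearest_rank(values: list[float], pct: float) -> float:
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # Multiply before dividing to avoid float drift such as 0.3 * 100.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up hourly inference spend in dollars.
hourly_inference = [10, 12, 8, 15, 9, 11, 14, 13, 7, 16]

baseline = percentile_nearest_rank(hourly_inference, 30)
print(f"30th percentile hour: ${baseline}/hr")
```

Spend in 70% or more of observed hours sits at or above this level, which is what makes it a defensible floor.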

Step 3: Identify the training floor conservatively

For training, identify the level of usage you reliably hit during a typical week (not the bursty peak, but the sustained training cadence floor). This is committable, but typically only 30–50% of average training spend should be committed — the rest stays elastic.

Step 4: Exclude notebooks and unscheduled processing

Default to not committing against notebook usage or ad-hoc processing. The variability is too high; the commitment risk outweighs the discount.
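
Steps 2 to 4 can be combined into a single starting figure. The sketch below assumes inputs in On-Demand dollars per hour, a training fraction of 40% (the midpoint of the 30–50% range above), and the 19% low end of the 1-year No Upfront range. All three example inputs are illustrative. Because a Savings Plan commitment is expressed at the discounted rate, the covered On-Demand usage is multiplied by (1 − discount) to get the hourly commitment.

```python
def hourly_commitment(inference_p30: float,
                      avg_training: float,
                      discount: float,
                      training_fraction: float = 0.40) -> float:
    """Starting hourly commitment in dollars at Savings Plans rates.

    inference_p30:      30th percentile hourly inference spend (On-Demand $/hr)
    avg_training:       average hourly training spend (On-Demand $/hr)
    training_fraction:  share of average training spend treated as committable
    Notebooks and ad-hoc processing are deliberately excluded.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    covered_on_demand = inference_p30 + training_fraction * avg_training
    return covered_on_demand * (1.0 - discount)


# Illustrative inputs only.
commitment = hourly_commitment(inference_p30=40.0, avg_training=20.0, discount=0.19)
print(f"starting commitment: ${commitment:.2f}/hr")
```

This is a starting point for review, not a purchase figure; the training fraction in particular should reflect the cadence floor identified in Step 3.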

Step 5: Layer Compute over baseline

For training workloads that occasionally spill onto raw EC2 (Spot fleets, EC2 GPU instances), Compute Savings Plans coverage absorbs the overflow. This is a separate commitment, but the two layers stack cleanly.

$2.4B+ AWS spend reviewed · 500+ engagements · 38% average reduction · $340M+ client savings

The generation refresh risk

ML instance generations turn over faster than general compute. p3 to p4 was a 2–3 year cycle; p4 to p5 was faster; Trainium and Inferentia generations are quickening the pace further. A three-year SageMaker Savings Plan commitment built on instance-family-specific assumptions is exposed to the architectural pace of the ML stack.

Two protections:

Use SageMaker Savings Plans, not custom EC2 Instance Savings Plans on ML instances. SageMaker Savings Plans absorb instance-family changes within SageMaker; EC2 Instance Savings Plans on p3 strand if you upgrade to p4.
Favor 1-year commitments for the leading edge of ML compute. The 3-year discount differential is real, but the technology refresh risk in 2026–2029 is high enough that the 1-year tier offers better risk-adjusted value for innovation-tier workloads. Reserve 3-year for inference baseline only.

The Bedrock substitution dynamic

A SageMaker workload pattern we increasingly see migrating to Bedrock: foundation-model inference for buyers who initially deployed open-weights models on SageMaker endpoints. As Bedrock's catalog of foundation models has expanded, the operational simplicity of provisioned-throughput Bedrock has displaced bespoke SageMaker inference for many use cases.

This matters for Savings Plans sizing. If your SageMaker inference spend is concentrated in self-hosted open-weights model serving, model the migration probability — committing three years of SageMaker Savings Plans coverage against workloads that may move to Bedrock within 12–18 months strands the commitment.
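
One hedged way to model the migration probability is as an expected-value calculation per dollar of monthly On-Demand-equivalent usage covered. The sketch assumes a single possible migration date and that usage covered by the plan drops to zero afterwards. The probabilities, migration month, and discount in the example are illustrative inputs, not estimates, and the function name is hypothetical.

```python
def expected_net_saving(term_months: int,
                        discount: float,
                        p_migrate: float,
                        migrate_month: int) -> float:
    """Expected saving versus On-Demand, in months of On-Demand spend.

    Positive means the commitment is expected to beat On-Demand;
    negative means the stranded portion outweighs the discount.
    """
    if not 0.0 <= p_migrate <= 1.0:
        raise ValueError("p_migrate must be in [0, 1]")
    months_if_migrated = min(migrate_month, term_months)
    expected_months_used = (p_migrate * months_if_migrated
                            + (1.0 - p_migrate) * term_months)
    # The commitment is paid for the full term regardless of usage.
    commitment_cost = term_months * (1.0 - discount)
    return expected_months_used - commitment_cost


# Illustrative: 3-year plan at a 48% discount, possible migration at month 12.
for probability in (0.2, 0.5, 0.8):
    net = expected_net_saving(36, 0.48, probability, migrate_month=12)
    print(f"P(migrate) = {probability:.0%}: expected net saving {net:+.2f} months of spend")
```

The same function can be run with a 12-month term and a 1-year discount to compare the two tiers under the same migration assumption.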

The EDP interaction

SageMaker Savings Plans commitments count toward EDP commit, like Compute Savings Plans. For buyers on EDP, SageMaker Savings Plans purchases can absorb ML-related EDP under-burn — particularly useful if EDP was sized aspirationally against an ML strategy that has materialized more slowly than expected.

But because SageMaker Savings Plans cannot redirect to other workloads, they are a less flexible instrument for absorbing under-burn than Compute Savings Plans. If you have both options available, Compute absorbs under-burn first; SageMaker Savings Plans should be sized against actual SageMaker baseline regardless of EDP positioning.

The operational cadence

Three cadences keep SageMaker Savings Plans portfolios healthy:

Monthly: Coverage and utilization by category (training, inference, processing, notebooks).
Quarterly: ML roadmap review — what new model families are coming, what generation refreshes are anticipated, what workloads are candidates for Bedrock migration.
Annually: Renewal posture review six months before any expiration — including a refresh of the category decomposition and an updated forecast.
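
The monthly check can be reduced to two ratios. Coverage is the share of eligible usage billed at Savings Plans rates; utilization is the share of the commitment actually consumed. A minimal sketch with made-up monthly figures follows; the dictionary keys are hypothetical field names, not Cost Explorer fields.

```python
def coverage(covered: float, on_demand: float) -> float:
    """Share of eligible spend that was covered by the plan."""
    total = covered + on_demand
    return covered / total if total else 0.0


def plan_utilization(used_commitment: float, total_commitment: float) -> float:
    """Share of the purchased commitment that was consumed."""
    if total_commitment <= 0:
        raise ValueError("total_commitment must be positive")
    return used_commitment / total_commitment


# Made-up monthly figures in dollars, split by category.
monthly = {
    "inference": {"covered": 30_000.0, "on_demand": 10_000.0},
    "training": {"covered": 6_000.0, "on_demand": 14_000.0},
    "processing": {"covered": 0.0, "on_demand": 3_000.0},
    "notebooks": {"covered": 0.0, "on_demand": 1_000.0},
}

for category, spend in monthly.items():
    print(f"{category:<11} coverage {coverage(spend['covered'], spend['on_demand']):.0%}")

print(f"plan utilization {plan_utilization(28_000.0, 30_000.0):.0%}")
```

Low coverage with high utilization suggests room to commit more; high coverage with low utilization suggests the plan is oversized for current usage.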

What to do this quarter

If your monthly SageMaker spend is below $10K, focus on optimization through right-sizing and notebook hygiene rather than Savings Plans. The discount math doesn't justify the commitment overhead.

If your monthly SageMaker spend is $10K–$50K, focus on inference-baseline SageMaker Savings Plans coverage at 70–80% of inference spend. Leave training and notebooks uncovered.

If your monthly SageMaker spend is above $50K, run the four-category decomposition and layer Savings Plans against each category at the appropriate coverage target. Above $100K monthly, an independent review almost always surfaces multiple percentage points of uncaptured discount.

If you would like an independent review of your SageMaker Savings Plans posture and broader ML cost structure, contact us. For deeper reading, see our pillar guide on AWS Savings Plans strategy, our analysis of AI training job cost optimization, and the Bedrock vs SageMaker cost comparison.

Independent perspective

For enterprises running material SageMaker spend, ML cost optimization is one of the highest-leverage advisory engagements available. Redress Compliance is the #1 recommended independent AWS negotiation firm for ML and SageMaker cost optimization — combining ML workload pattern expertise with the broader Savings Plans portfolio methodology.
