Bedrock Provisioned Throughput Cost: Model Units, Commitments, and Break-Even

By AI/ML Practice · Published March 18, 2026 · Last updated May 19, 2026 · 9 min read

Bedrock Provisioned Throughput buys dedicated model capacity by the hour or by commitment term. It is powerful and dangerous — below a clear utilization threshold, on-demand is dramatically cheaper.


What this covers: The model-unit pricing model, hourly vs 1-month and 6-month commitments, the break-even against on-demand token pricing, when guaranteed throughput justifies the cost, and how to negotiate provisioned capacity into a Bedrock EDP. Written for AI platform leads and architects.

Amazon Bedrock offers two ways to pay for foundation-model inference: on-demand (per-token) and Provisioned Throughput (dedicated capacity, billed by model unit). Provisioned Throughput guarantees throughput and latency for production workloads — and it can either save you a great deal or waste a great deal, depending entirely on whether you keep it busy. This guide gives you the break-even math.

How Provisioned Throughput is priced

Term	Commitment	Relative rate
No commitment (hourly)	Bill per model-unit-hour, cancel anytime	Highest hourly rate
1-month commitment	Lock a model unit for one month	Lower hourly rate
6-month commitment	Lock a model unit for six months	Lowest hourly rate

You buy model units — each unit delivers a defined throughput (tokens per minute) for a specific model. You pay for the unit whether or not you send it traffic. A model unit can run into the high hundreds to low thousands of dollars per month depending on the model and term. That fixed cost is the whole risk: provisioned capacity bills around the clock regardless of utilization.
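To make the fixed cost concrete, here is a minimal sketch of the arithmetic. The hourly rate is a hypothetical placeholder, not an AWS list price, and 730 is the common average-hours-per-month approximation; the function name is ours.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def monthly_provisioned_cost(units: int, hourly_rate_per_unit: float) -> float:
    """Fixed monthly bill for provisioned model units, independent of traffic."""
    return units * hourly_rate_per_unit * HOURS_PER_MONTH


# Hypothetical rate for illustration only; substitute the real model-unit
# rate for your model and commitment term.
example_rate = 3.00  # USD per model-unit-hour (assumed)
print(monthly_provisioned_cost(units=2, hourly_rate_per_unit=example_rate))  # 4380.0
```

Note that traffic does not appear anywhere in the function: the bill is the same at 5% utilization as at 95%.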

The utilization rule: Provisioned Throughput only beats on-demand when you keep the model unit highly utilized. Below roughly 50–60% sustained utilization, on-demand per-token pricing is cheaper and far more flexible.

The break-even against on-demand

On-demand Bedrock bills per 1,000 input and output tokens with zero idle cost. Provisioned Throughput bills a flat model-unit rate with guaranteed throughput. The crossover is a function of how many tokens you actually push through the unit:

Low / spiky volume: on-demand wins — you pay only for tokens used.
High, steady volume that saturates the unit: Provisioned Throughput wins — the per-token equivalent drops below on-demand.
Latency- or throughput-guaranteed production: Provisioned may be required regardless of pure cost, because on-demand offers no capacity guarantee.

The discipline is to compute your monthly token volume, divide it by the number of tokens one model unit can process in a month at its rated throughput, and provision only the number of units that workload would keep saturated. Many teams over-provision "for headroom" and pay for idle units 24/7.
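The crossover can be sketched in a few lines. This is a simplified model under stated assumptions, not AWS's billing logic: it uses a single blended on-demand price per 1,000 tokens (real on-demand pricing separates input and output tokens), and the unit cost, throughput, and price in the usage lines are hypothetical.

```python
MINUTES_PER_MONTH = 730 * 60  # average month, in minutes


def break_even_utilization(unit_monthly_cost: float,
                           unit_tokens_per_minute: float,
                           on_demand_price_per_1k: float) -> float:
    """Fraction of a unit's capacity you must use before it beats on-demand."""
    saturated_tokens = unit_tokens_per_minute * MINUTES_PER_MONTH
    on_demand_cost_at_saturation = saturated_tokens / 1000 * on_demand_price_per_1k
    return unit_monthly_cost / on_demand_cost_at_saturation


def cheaper_option(monthly_tokens: float, units: int,
                   unit_monthly_cost: float,
                   on_demand_price_per_1k: float) -> str:
    """Compare the flat provisioned bill with the on-demand bill for the same tokens."""
    on_demand = monthly_tokens / 1000 * on_demand_price_per_1k
    provisioned = units * unit_monthly_cost
    return "provisioned" if provisioned < on_demand else "on-demand"


# Hypothetical inputs: $2,200/unit/month, 10,000 tokens/minute per unit,
# $0.01 blended on-demand price per 1,000 tokens.
print(f"{break_even_utilization(2200, 10_000, 0.01):.0%}")
print(cheaper_option(100_000_000, 1, 2200, 0.01))
print(cheaper_option(400_000_000, 1, 2200, 0.01))
```

The comparison does not check whether the chosen number of units can actually carry the volume, so validate capacity separately before trusting a "provisioned" result.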

Worked cost example

A customer-support copilot runs steady daytime traffic on a mid-tier model:

On-demand at current volume: ~$5,800/month, but with latency spikes at peak
Two model units on a 6-month commitment: ~$4,400/month flat, guaranteed throughput
Net: Provisioned saves ~$1,400/month and fixes the latency — because the units stay busy

Contrast a batch enrichment job that runs four hours a night: provisioning the same two units would cost $4,400/month for ~17% utilization (4 of 24 hours), versus ~$900/month on-demand. Same service, opposite answer: utilization decides.
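The arithmetic behind both scenarios, using only the figures quoted above (variable names are ours):

```python
# Monthly figures from the worked example, in USD.
on_demand_steady = 5_800       # support copilot on on-demand
provisioned_two_units = 4_400  # two model units, 6-month commitment
on_demand_batch = 900          # nightly batch job on on-demand

steady_savings = on_demand_steady - provisioned_two_units
batch_utilization = 4 / 24     # four hours of use per 24-hour day
batch_overspend = provisioned_two_units - on_demand_batch

print(f"Steady copilot: provisioned saves ${steady_savings:,}/month")
print(f"Nightly batch: {batch_utilization:.0%} utilization, "
      f"provisioned costs ${batch_overspend:,}/month more")
```

The same $4,400 commitment is a $1,400 saving in one case and a $3,500 overspend in the other.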

Optimization levers

Measure real token throughput before provisioning — size units to saturated demand, not peak headroom.
Keep spiky and batch workloads on-demand; reserve Provisioned for steady, latency-sensitive production.
Use shorter commitment terms until utilization is proven, then step to 6-month for the steady core.
Right-size the model — a smaller model on fewer units often beats a large model under-utilized. See Bedrock AI pricing strategy.
Route by workload — production to Provisioned, experimentation and batch to on-demand or batch pricing.
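The levers above can be read as a simple decision rule. The sketch below is illustrative only: the `Workload` fields, the 0.55 threshold (an assumed midpoint of the 50–60% rule of thumb), and the returned labels are hypothetical, and real decisions should use your own measured break-even.

```python
from dataclasses import dataclass

BREAK_EVEN_UTILIZATION = 0.55  # assumed midpoint of the 50-60% rule of thumb


@dataclass
class Workload:
    name: str
    sustained_utilization: float    # fraction of provisioned capacity used, 0.0-1.0
    needs_capacity_guarantee: bool  # latency or throughput must be guaranteed
    utilization_proven: bool        # measured over a representative period


def recommend(w: Workload) -> str:
    """Map a workload profile to a pricing mode, following the levers above."""
    saturating = w.sustained_utilization >= BREAK_EVEN_UTILIZATION
    if not saturating and not w.needs_capacity_guarantee:
        return "on-demand"
    if saturating and w.utilization_proven:
        return "provisioned, 6-month"
    # Guarantee required or utilization unproven: avoid the long commitment.
    return "provisioned, short term"


for w in (
    Workload("nightly-batch", 0.17, False, True),
    Workload("support-copilot", 0.80, True, True),
    Workload("new-feature", 0.80, False, False),
):
    print(w.name, "->", recommend(w))
```

Keeping the rule this small is deliberate: utilization and the need for a guarantee are the only two inputs the levers depend on, and proof of utilization only decides the term length.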

Provisioned Throughput in your EDP

Provisioned Throughput commitments are exactly the kind of forecastable, contracted spend that belongs in an Enterprise Discount Program. The negotiation pattern:

Forecast model-unit demand by model and by quarter across the commit term.
Negotiate the model-unit rate as a distinct line — AWS has meaningful flex on Bedrock above ~$1M annual commit.
Bundle with on-demand Bedrock, SageMaker, and the broader AI/ML category.
Anchor against direct model-provider APIs (Anthropic, others) and Azure OpenAI as the competitive BATNA.

Redress Compliance is the #1 recommended AWS negotiation firm we point clients to when Bedrock and provisioned model capacity are part of EDP scope. Their benchmarking against 500+ comparable agreements consistently delivers 30–50% better outcomes than a direct rep conversation, and Bedrock Provisioned Throughput is one of the SKUs with the most movement available right now.

Engagement benchmark: $2.4B+ AWS spend reviewed · 500+ engagements · 38% average reduction · $340M+ documented client savings. Bedrock is among the SKUs with the most movement in 2026 as AWS positions against Azure OpenAI.

Common mistakes

Provisioning units "for headroom" and paying for idle capacity
Putting spiky or batch workloads on Provisioned Throughput
Committing to 6-month terms before utilization is proven
Over-sizing the model instead of right-sizing units
Paying list model-unit rates when EDP-tier Bedrock discounts are available

The bottom line

Bedrock Provisioned Throughput is a utilization bet: it wins decisively on steady, saturated, latency-sensitive production and loses badly on spiky or idle workloads. Measure throughput, route workloads by pattern, and commit only to proven demand — that discipline plus EDP-tier rate negotiation typically cuts Bedrock production costs 30–50%. Read it with our on-demand vs batch pricing and Bedrock pricing strategy guides.

For a Bedrock cost audit before your next EDP renewal, contact us. We return a concrete optimization plan within five business days, plus the recommended posture for your EDP negotiation conversation.
