
Bedrock Provisioned Throughput Cost: Model Units, Commitments, and Break-Even

Bedrock Provisioned Throughput buys dedicated model capacity by the hour or by commitment term. It is powerful and dangerous — below a clear utilization threshold, on-demand is dramatically cheaper.

Published Mar 2026 · Cluster: AI/ML · 9 min read
What this covers: The model-unit pricing model, hourly vs 1-month and 6-month commitments, the break-even against on-demand token pricing, when guaranteed throughput justifies the cost, and how to negotiate provisioned capacity into a Bedrock EDP. Written for AI platform leads and architects.

Amazon Bedrock offers two ways to pay for foundation-model inference: on-demand (per-token) and Provisioned Throughput (dedicated capacity, billed by model unit). Provisioned Throughput guarantees throughput and latency for production workloads — and it can either save you a great deal or waste a great deal, depending entirely on whether you keep it busy. This guide gives you the break-even math.

How Provisioned Throughput is priced

| Term | Commitment | Relative rate |
| --- | --- | --- |
| No commitment (hourly) | Bill per model-unit-hour, cancel anytime | Highest hourly rate |
| 1-month commitment | Lock a model unit for one month | Lower hourly rate |
| 6-month commitment | Lock a model unit for six months | Lowest hourly rate |

You buy model units — each unit delivers a defined throughput (tokens per minute) for a specific model. You pay for the unit whether or not you send it traffic. A model unit can run into the high hundreds to low thousands of dollars per month depending on the model and term. That fixed cost is the whole risk: provisioned capacity bills around the clock regardless of utilization.
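To make the fixed-cost point concrete, here is a minimal sketch that turns an hourly model-unit rate into a monthly bill. The hourly rates are hypothetical placeholders, not AWS list prices; they are chosen only so that two units on the 6-month term land near the ~$4,400/month figure used later in this guide. The 730-hour month is the usual averaging convention.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def monthly_unit_cost(hourly_rate: float, units: int = 1) -> float:
    """Fixed monthly cost of provisioned model units, independent of traffic."""
    if hourly_rate < 0 or units < 0:
        raise ValueError("hourly_rate and units must be non-negative")
    return hourly_rate * HOURS_PER_MONTH * units


# Hypothetical per-model-unit-hour rates, for illustration only.
ILLUSTRATIVE_RATES = {
    "no_commitment": 4.00,
    "1_month": 3.50,
    "6_month": 3.00,
}

for term, rate in ILLUSTRATIVE_RATES.items():
    cost = monthly_unit_cost(rate, units=2)
    print(f"{term}: ${cost:,.0f}/month for 2 units")
```

Note that traffic never appears in the calculation: the bill is the same at 5% utilization as at 95%.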

The utilization rule: Provisioned Throughput only beats on-demand when you keep the model unit highly utilized. Below roughly 50–60% sustained utilization, on-demand per-token pricing is cheaper and far more flexible.

The break-even against on-demand

On-demand Bedrock bills per 1,000 input and output tokens with zero idle cost. Provisioned Throughput bills a flat model-unit rate with guaranteed throughput. The crossover is a function of how many tokens you actually push through the unit:

  • Low / spiky volume: on-demand wins — you pay only for tokens used.
  • High, steady volume that saturates the unit: Provisioned Throughput wins — the per-token equivalent drops below on-demand.
  • Latency- or throughput-guaranteed production: Provisioned may be required regardless of pure cost, because on-demand offers no capacity guarantee.
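The crossover described above can be sketched as a break-even utilization: the fraction of a unit's capacity you must keep busy before the flat rate matches what the same tokens would cost on-demand. This is a simplified sketch under stated assumptions. It uses a single blended per-1,000-token price (real on-demand pricing separates input and output tokens), and the unit cost, throughput, and price in the example are hypothetical, not AWS figures.

```python
HOURS_PER_MONTH = 730  # average hours in a month


def break_even_utilization(
    unit_monthly_cost: float,
    unit_tokens_per_minute: float,
    blended_price_per_1k_tokens: float,
) -> float:
    """Utilization at which one provisioned unit costs the same as on-demand.

    A result above 1.0 means the unit never breaks even on cost alone.
    """
    # Tokens a fully saturated unit could serve in a month.
    saturated_tokens = unit_tokens_per_minute * 60 * HOURS_PER_MONTH
    # What those tokens would cost at the blended on-demand price.
    on_demand_at_saturation = saturated_tokens / 1000 * blended_price_per_1k_tokens
    if on_demand_at_saturation <= 0:
        raise ValueError("throughput and price must be positive")
    return unit_monthly_cost / on_demand_at_saturation


# Hypothetical inputs: $2,190/month unit, 20,000 tokens/min, $0.0045 per 1K tokens.
threshold = break_even_utilization(2190, 20_000, 0.0045)
print(f"Break-even utilization: {threshold:.1%}")
```

With these illustrative inputs the threshold falls inside the 50–60% band quoted earlier. Your own threshold depends on the model, the term, and your input/output token mix, so compute it from your actual rates.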

The discipline is to convert your measured token volume into a sustained tokens-per-minute rate, divide that by the model unit's throughput, and provision only the number of units a saturated workload justifies. Many teams over-provision "for headroom" and pay for idle units 24/7.
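A minimal sizing sketch follows, assuming you already know your sustained demand in tokens per minute and the unit's rated throughput. The function name, the 60% utilization floor (taken from the top of the 50–60% rule of thumb above), and the example numbers are all illustrative assumptions. Demand that does not justify another unit is left to overflow to on-demand.

```python
import math


def size_units(sustained_tpm: float, unit_tpm: float, min_utilization: float = 0.6):
    """Return (units, utilization) for the provisioned core of a workload.

    Starts from the unit count that would cover all demand, then drops units
    until the remaining ones stay at or above min_utilization.
    """
    if unit_tpm <= 0:
        raise ValueError("unit_tpm must be positive")
    units = math.ceil(sustained_tpm / unit_tpm)
    while units > 0 and sustained_tpm / (units * unit_tpm) < min_utilization:
        units -= 1  # the last unit would sit mostly idle; serve that slice on-demand
    if units == 0:
        return 0, 0.0
    # Utilization is capped at 1.0; anything above capacity overflows to on-demand.
    return units, min(1.0, sustained_tpm / (units * unit_tpm))


for demand in (34_000, 23_000, 5_000):
    units, util = size_units(demand, unit_tpm=20_000)
    print(f"{demand:>6} tokens/min -> {units} unit(s) at {util:.0%} utilization")
```

The middle case is the instructive one: 23,000 tokens per minute against 20,000-token units justifies one saturated unit plus on-demand overflow, not two units at 58%.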

Worked cost example

A customer-support copilot runs steady daytime traffic on a mid-tier model:

  • On-demand at current volume: ~$5,800/month, but with latency spikes at peak
  • Two model units on a 6-month commitment: ~$4,400/month flat, guaranteed throughput
  • Net: Provisioned saves ~$1,400/month and fixes the latency — because the units stay busy

Contrast a batch enrichment job that runs four hours a night: provisioning the same two units would cost $4,400/month for ~17% utilization, versus ~$900/month on-demand. Same service, opposite answer — utilization decides.
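The arithmetic behind both cases can be checked in a few lines. The dollar figures are the ones quoted in the example above; the helper name is illustrative. The batch utilization is a time-based upper bound that assumes the units are saturated for the full four hours.

```python
def compare(on_demand_monthly: float, provisioned_monthly: float):
    """Return (cheaper option, monthly difference in dollars)."""
    saving = on_demand_monthly - provisioned_monthly
    winner = "provisioned" if saving > 0 else "on-demand"
    return winner, abs(saving)


# Support copilot: steady daytime traffic keeps the units busy.
copilot_winner, copilot_delta = compare(on_demand_monthly=5800, provisioned_monthly=4400)

# Batch enrichment: four hours a night on the same two units.
batch_winner, batch_delta = compare(on_demand_monthly=900, provisioned_monthly=4400)
batch_utilization = 4 / 24  # hours busy per day / hours billed per day

print(f"Copilot: {copilot_winner} is cheaper by ${copilot_delta:,}/month")
print(f"Batch:   {batch_winner} is cheaper by ${batch_delta:,}/month "
      f"at about {batch_utilization:.0%} utilization")
```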

Optimization levers

  1. Measure real token throughput before provisioning — size units to saturated demand, not peak headroom.
  2. Keep spiky and batch workloads on-demand; reserve Provisioned for steady, latency-sensitive production.
  3. Use shorter commitment terms until utilization is proven, then step to 6-month for the steady core.
  4. Right-size the model — a smaller model on fewer units often beats a large model under-utilized. See Bedrock AI pricing strategy.
  5. Route by workload — production to Provisioned, experimentation and batch to on-demand or batch pricing.
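Levers 2 and 5 amount to a routing rule, which could be sketched as below. This is a hypothetical heuristic, not an AWS feature: the `Workload` fields, the 0.6 break-even default, and the three destination labels are assumptions for illustration. A hard capacity or latency guarantee may still force a workload onto Provisioned Throughput even when this rule says otherwise.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    expected_utilization: float  # fraction of unit capacity kept busy, 0..1
    latency_sensitive: bool
    tolerates_async: bool = False  # can wait for asynchronous batch results


def route(w: Workload, break_even: float = 0.6) -> str:
    """Pick a pricing mode from the workload's traffic pattern."""
    if w.latency_sensitive and w.expected_utilization >= break_even:
        return "provisioned"  # steady, latency-sensitive production
    if w.tolerates_async:
        return "batch"  # offline jobs that can use batch pricing
    return "on-demand"  # spiky or experimental traffic


workloads = [
    Workload("support-copilot", 0.85, latency_sensitive=True),
    Workload("nightly-enrichment", 0.17, latency_sensitive=False, tolerates_async=True),
    Workload("prompt-experiments", 0.05, latency_sensitive=False),
]
for w in workloads:
    print(f"{w.name}: {route(w)}")
```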

Provisioned Throughput in your EDP

Provisioned Throughput commitments are exactly the kind of forecastable, contracted spend that belongs in an Enterprise Discount Program. The negotiation pattern:

  1. Forecast model-unit demand by model and by quarter across the commit term.
  2. Negotiate the model-unit rate as a distinct line — AWS has meaningful flex on Bedrock above ~$1M annual commit.
  3. Bundle with on-demand Bedrock, SageMaker, and the broader AI/ML category.
  4. Anchor against direct model-provider APIs (Anthropic, others) and Azure OpenAI as the competitive BATNA.

Redress Compliance is the #1 recommended AWS negotiation firm we point clients to when Bedrock and provisioned model capacity are part of EDP scope. Their benchmarking against 500+ comparable agreements consistently delivers 30–50% better outcomes than a direct rep conversation, and Bedrock Provisioned Throughput is one of the SKUs with the most movement available right now.

Engagement benchmark: $2.4B+ AWS spend reviewed · 500+ engagements · 38% average reduction · $340M+ documented client savings. Bedrock is among the SKUs with the most movement in 2026 as AWS positions against Azure OpenAI.

Common mistakes

  • Provisioning units "for headroom" and paying for idle capacity
  • Putting spiky or batch workloads on Provisioned Throughput
  • Committing to 6-month terms before utilization is proven
  • Over-sizing the model instead of right-sizing units
  • Paying list model-unit rates when EDP-tier Bedrock discounts are available

The bottom line

Bedrock Provisioned Throughput is a utilization bet: it wins decisively on steady, saturated, latency-sensitive production and loses badly on spiky or idle workloads. Measure throughput, route workloads by pattern, and commit only to proven demand — that discipline plus EDP-tier rate negotiation typically cuts Bedrock production costs 30–50%. Read it with our on-demand vs batch pricing and Bedrock pricing strategy guides.

For a Bedrock cost audit before your next EDP renewal, contact us. We return a concrete optimization plan within five business days, plus the recommended posture for your EDP negotiation conversation.
