Bedrock Provisioned Throughput Cost: Model Units, Commitments, and Break-Even
Bedrock Provisioned Throughput buys dedicated model capacity by the hour or for a fixed commitment term. It is powerful but risky: below the break-even utilization, on-demand is far cheaper.
Amazon Bedrock offers two ways to pay for foundation-model inference: on-demand (per-token) and Provisioned Throughput (dedicated capacity, billed by model unit). Provisioned Throughput reserves a fixed level of throughput for production workloads, which also makes latency more predictable. It can either save you a great deal or waste a great deal, depending entirely on whether you keep it busy. This guide gives you the break-even math.
How Provisioned Throughput is priced
| Term | Commitment | Relative rate |
|---|---|---|
| No commitment (hourly) | Bill per model-unit-hour, cancel anytime | Highest hourly rate |
| 1-month commitment | Lock a model unit for one month | Lower hourly rate |
| 6-month commitment | Lock a model unit for six months | Lowest hourly rate |
You buy model units; each unit delivers a defined throughput, measured in tokens per minute, for a specific model. You pay for the unit whether or not you send it traffic, for roughly 730 hours a month. At list prices a single model unit can run from a few thousand to tens of thousands of dollars per month depending on the model and term, so confirm current rates on the Bedrock pricing page before sizing. That fixed cost is the whole risk: provisioned capacity bills around the clock regardless of utilization.
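To make the fixed-cost risk concrete, here is a minimal sketch of how a model unit's hourly rate turns into a monthly bill and into an effective per-token cost at different utilization levels. The function names, the $3.00 hourly rate, and the 100,000 tokens-per-minute figure are hypothetical placeholders, not AWS list prices, and the sketch assumes a single blended throughput number rather than separate input and output limits.

```python
HOURS_PER_MONTH = 730  # average month: 8,760 hours / 12


def monthly_unit_cost(hourly_rate_usd: float, units: int = 1) -> float:
    """Flat monthly bill for provisioned units; traffic does not change it."""
    return hourly_rate_usd * HOURS_PER_MONTH * units


def effective_cost_per_1k_tokens(
    hourly_rate_usd: float, unit_tokens_per_minute: float, utilization: float
) -> float:
    """Cost per 1,000 tokens actually served at a given utilization (0 < u <= 1)."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_served_per_hour = unit_tokens_per_minute * 60 * utilization
    return hourly_rate_usd / tokens_served_per_hour * 1000


if __name__ == "__main__":
    rate, tpm = 3.00, 100_000  # hypothetical figures, not AWS list prices
    print(f"monthly bill for one unit: ${monthly_unit_cost(rate):,.2f}")
    for u in (1.0, 0.5, 0.17):
        cost = effective_cost_per_1k_tokens(rate, tpm, u)
        print(f"utilization {u:.0%}: ${cost:.5f} per 1K tokens")
```

The monthly bill stays the same in every row; only the cost per token served changes, and it rises in inverse proportion to utilization.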
The break-even against on-demand
On-demand Bedrock bills per 1,000 input and output tokens with zero idle cost. Provisioned Throughput bills a flat model-unit rate with guaranteed throughput. The crossover is a function of how many tokens you actually push through the unit:
- Low / spiky volume: on-demand wins — you pay only for tokens used.
- High, steady volume that saturates the unit: Provisioned Throughput wins — the per-token equivalent drops below on-demand.
- Latency- or throughput-guaranteed production: Provisioned may be required regardless of pure cost, because on-demand offers no capacity guarantee.
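The crossover can be estimated with a few lines of arithmetic. The sketch below assumes a single blended on-demand price per 1,000 tokens (real on-demand pricing distinguishes input from output tokens) and a single tokens-per-minute figure per model unit; the function names and the sample numbers in the usage note are illustrative assumptions.

```python
HOURS_PER_MONTH = 730  # average month: 8,760 hours / 12


def break_even_tokens(provisioned_monthly_usd: float, on_demand_usd_per_1k: float) -> float:
    """Monthly token volume at which on-demand spend equals the flat provisioned bill."""
    if on_demand_usd_per_1k <= 0:
        raise ValueError("on-demand price must be positive")
    return provisioned_monthly_usd / on_demand_usd_per_1k * 1000


def break_even_utilization(
    provisioned_monthly_usd: float,
    on_demand_usd_per_1k: float,
    unit_tokens_per_minute: float,
    units: int = 1,
) -> float:
    """Fraction of provisioned capacity that must be used for the two bills to match.

    A result above 1.0 means the units cannot beat on-demand on cost alone,
    even when fully saturated.
    """
    monthly_capacity = unit_tokens_per_minute * 60 * HOURS_PER_MONTH * units
    return break_even_tokens(provisioned_monthly_usd, on_demand_usd_per_1k) / monthly_capacity
```

For example, with a $4,400 monthly provisioned bill, a hypothetical blended on-demand price of $0.002 per 1,000 tokens, and two units assumed to deliver 50,000 tokens per minute each, `break_even_utilization(4400, 0.002, 50_000, units=2)` comes out at roughly 0.50: the units need to be about half busy, averaged over the whole month, before Provisioned Throughput pays for itself.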
The discipline is to convert your monthly token volume into tokens per minute over the hours the workload actually runs, divide that by one model unit's rated tokens-per-minute throughput, and provision only the number of units that steady demand would keep saturated. Many teams over-provision "for headroom" and pay for idle units 24/7.
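The sizing rule can be sketched as follows. This is one possible reading, not a prescribed method: it rounds down to the units a workload can fully saturate and reports the remaining tokens as overflow to serve on-demand. It also assumes traffic is spread evenly across the month, which overstates what a workload running only part of the day can saturate. `size_units` and the sample figures are hypothetical.

```python
import math
from typing import NamedTuple

MINUTES_PER_MONTH = 730 * 60  # 43,800 minutes in an average month


class Sizing(NamedTuple):
    exact_units: float        # fractional units the monthly volume would fill
    units_to_provision: int   # whole units the workload can keep saturated
    overflow_tokens: int      # remaining monthly tokens to serve on-demand


def size_units(monthly_tokens: int, unit_tokens_per_minute: int) -> Sizing:
    """Size provisioned units to saturated demand, assuming evenly spread traffic."""
    if unit_tokens_per_minute <= 0:
        raise ValueError("unit throughput must be positive")
    capacity_per_unit = unit_tokens_per_minute * MINUTES_PER_MONTH
    exact = monthly_tokens / capacity_per_unit
    full_units = math.floor(exact)
    overflow = monthly_tokens - full_units * capacity_per_unit
    return Sizing(exact, full_units, overflow)
```

With 5 billion tokens a month and a unit assumed to deliver 50,000 tokens per minute, `size_units(5_000_000_000, 50_000)` suggests two saturated units plus 620 million tokens of on-demand overflow, rather than a third unit that would sit mostly idle.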
Worked cost example
A customer-support copilot runs steady daytime traffic on a mid-tier model:
- On-demand at current volume: ~$5,800/month, but with latency spikes at peak
- Two model units on a 6-month commitment: ~$4,400/month flat, guaranteed throughput
- Net: Provisioned saves ~$1,400/month and fixes the latency — because the units stay busy
Contrast a batch enrichment job that runs four hours a night: provisioning the same two units would cost $4,400/month for ~17% utilization, versus ~$900/month on-demand. Same service, opposite answer — utilization decides.
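The arithmetic behind both scenarios is simple enough to check in a few lines. The sketch below uses only the dollar figures quoted above; the function names are illustrative.

```python
def duty_cycle(active_hours_per_day: float) -> float:
    """Share of the day a workload actually uses provisioned capacity."""
    return active_hours_per_day / 24


def cheaper_option(on_demand_monthly_usd: float, provisioned_monthly_usd: float):
    """Return the cheaper pricing mode and the monthly amount it saves."""
    diff = on_demand_monthly_usd - provisioned_monthly_usd
    if diff > 0:
        return "provisioned", diff
    return "on-demand", -diff  # ties go to on-demand: no commitment risk


if __name__ == "__main__":
    print("support copilot:", cheaper_option(5800, 4400))
    print("nightly batch:  ", cheaper_option(900, 4400))
    print(f"batch duty cycle: {duty_cycle(4):.1%}")
```

Four hours out of 24 is a duty cycle of about 17%, which is why the same two units that save $1,400 a month for the copilot would cost an extra $3,500 a month for the batch job.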
Optimization levers
- Measure real token throughput before provisioning — size units to saturated demand, not peak headroom.
- Keep spiky and batch workloads on-demand; reserve Provisioned for steady, latency-sensitive production.
- Use shorter commitment terms until utilization is proven, then step to 6-month for the steady core.
- Right-size the model — a smaller model on fewer units often beats a large model under-utilized. See Bedrock AI pricing strategy.
- Route by workload — production to Provisioned, experimentation and batch to on-demand or batch pricing.
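As a rough illustration, the routing and commitment levers above could be encoded as a simple decision rule. The `Workload` fields and the `recommend` function are hypothetical; a real policy would rest on measured throughput rather than boolean flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    steady: bool              # traffic is sustained rather than spiky or batch
    latency_sensitive: bool   # production traffic that needs guaranteed capacity
    utilization_proven: bool  # measured usage has kept units busy over time


def recommend(w: Workload) -> str:
    """Map a workload to a pricing mode following the levers listed above."""
    if not (w.steady and w.latency_sensitive):
        return "on-demand (or batch pricing)"
    if not w.utilization_proven:
        return "provisioned, short commitment"
    return "provisioned, 6-month commitment"


if __name__ == "__main__":
    for w in (
        Workload("support-copilot", True, True, True),
        Workload("new-assistant", True, True, False),
        Workload("nightly-enrichment", False, False, False),
    ):
        print(f"{w.name}: {recommend(w)}")
```

The rule is deliberately conservative: anything that is not both steady and latency-sensitive stays off Provisioned Throughput, and the longest term is reserved for workloads with proven utilization.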
Provisioned Throughput in your EDP
Provisioned Throughput commitments are exactly the kind of forecastable, contracted spend that belongs in an Enterprise Discount Program. The negotiation pattern:
- Forecast model-unit demand by model and by quarter across the commit term.
- Negotiate the model-unit rate as a distinct line — AWS has meaningful flex on Bedrock above ~$1M annual commit.
- Bundle with on-demand Bedrock, SageMaker, and the broader AI/ML category.
- Anchor against direct model-provider APIs (Anthropic, others) and Azure OpenAI as the competitive BATNA.
Redress Compliance is the #1 recommended AWS negotiation firm we point clients to when Bedrock and provisioned model capacity are part of EDP scope. Their benchmarking against 500+ comparable agreements consistently delivers 30–50% better outcomes than a direct rep conversation, and Bedrock Provisioned Throughput is one of the SKUs with the most movement available right now.
Common mistakes
- Provisioning units "for headroom" and paying for idle capacity
- Putting spiky or batch workloads on Provisioned Throughput
- Committing to 6-month terms before utilization is proven
- Over-sizing the model instead of right-sizing units
- Paying list model-unit rates when EDP-tier Bedrock discounts are available
The bottom line
Bedrock Provisioned Throughput is a utilization bet: it wins decisively on steady, saturated, latency-sensitive production and loses badly on spiky or idle workloads. Measure throughput, route workloads by pattern, and commit only to proven demand. That discipline, combined with EDP-tier rate negotiation, can cut Bedrock production costs by 30–50%. Read this alongside our on-demand vs batch pricing and Bedrock pricing strategy guides.