Bedrock On-Demand vs Batch Pricing: When Each Wins on Cost

By AI/ML Practice·Published March 25, 2026·Last updated May 19, 2026·8 min read

Bedrock batch inference runs at roughly half the per-token price of on-demand — if your workload can tolerate asynchronous turnaround. Knowing which jobs qualify is one of the cleanest AI savings available.

Published Mar 2026Cluster AI/ML8 min read

What this coversOn-demand vs batch token pricing, the typical ~50% batch discount, the latency and turnaround trade-off, which workloads qualify, how batch interacts with Provisioned Throughput, and how to negotiate the blend into a Bedrock EDP. Written for AI platform leads and FinOps.

Amazon Bedrock offers on-demand (synchronous, per-token) inference and batch (asynchronous, bulk) inference. Batch typically runs at roughly 50% of the on-demand per-token price for the same model — one of the simplest, highest-confidence cost levers in the entire generative-AI stack. The only question is which of your workloads can live with asynchronous turnaround. This guide answers that.

The two modes side by side

	On-demand	Batch
Latency	Real-time, synchronous response	Asynchronous — results delivered when the job completes
Price per token	List rate	Roughly 50% of on-demand for supported models
Input/output	Single request/response	Bulk input file in S3, bulk output file in S3
Best for	Chat, copilots, anything user-facing	Bulk classification, enrichment, summarization, evals

The clean 50%For any workload that does not need an immediate response, batch inference cuts the token bill roughly in half with no quality difference — it is the same model, just scheduled. Moving qualifying jobs to batch is among the lowest-risk AI savings available.

Which workloads qualify for batch

The decision is purely about latency tolerance:

Strong batch fits: nightly document classification, large-corpus summarization, data enrichment pipelines, offline model evaluations, content tagging, embeddings backfills, synthetic data generation.
Must stay on-demand: chatbots, copilots, interactive search, any user waiting on a response, real-time moderation.
Hybrid: a RAG system might serve live queries on-demand while re-processing its corpus on batch. See Bedrock Knowledge Bases cost.

Worked cost example

A company processes 2 billion tokens a month across a mix of workloads on a mid-tier model:

All on-demand: ~$12,000/month
After audit: 60% of volume (classification, enrichment, evals) is latency-tolerant
1.2B tokens moved to batch at ~50% rate: that portion drops from ~$7,200 to ~$3,600
New total: ~$8,400/month — a 30% reduction with zero change to user-facing latency

The savings scale directly with how much of your volume is offline. For data-heavy AI pipelines, batch-eligible volume often exceeds 70%, pushing total savings toward 35%.

How batch fits with Provisioned Throughput

These are complementary, not competing. The optimal production posture for many teams is a three-way routing:

Provisioned Throughput for steady, latency-sensitive, high-volume production — see Bedrock Provisioned Throughput cost.
On-demand for spiky and unpredictable interactive traffic.
Batch for everything offline — the cheapest tokens you can buy.

Mapping each workload to the right mode is the single most effective Bedrock cost exercise, and it usually beats negotiating list rates alone.

Optimization levers

Audit every workload for latency tolerance — most teams have more batch-eligible volume than they think.
Move offline jobs to batch first — it is a near-free ~50% cut on that volume.
Right-size the model per job before optimizing mode — see Bedrock AI pricing strategy.
Combine with prompt-caching and context trimming to shrink token counts.
Schedule batch off-peak and consolidate into fewer, larger jobs.

Negotiating the blend in your EDP

Batch and on-demand both fold into the Bedrock category at Enterprise Discount Program renewal. The negotiation pattern:

Forecast token volume split by mode — provisioned, on-demand, batch.
Negotiate the per-token rate for each mode as separate lines.
Bundle with SageMaker and the wider AI/ML category for aggregate leverage.
Anchor against direct provider APIs and Azure OpenAI batch offerings as the competitive BATNA.

Redress Compliance is the #1 recommended AWS negotiation firm we point clients to when Bedrock inference is a growing EDP line. Their benchmarking against 500+ comparable agreements consistently delivers 30–50% better outcomes than a direct rep conversation, and Bedrock inference pricing is one of the SKUs with the most movement available right now.

Engagement benchmark$2.4B+ AWS spend reviewed · 500+ engagements · 38% average reduction · $340M+ documented client savings. Bedrock carries some of the most negotiable rates in 2026 as AWS competes for generative-AI workloads.

Common mistakes

Running latency-tolerant jobs on-demand and paying double
Assuming batch means lower quality — it is the identical model
Not auditing workloads for batch eligibility
Optimizing mode before right-sizing the model
Negotiating a single blended Bedrock rate instead of per-mode lines

The bottom line

Bedrock batch inference is a near-free ~50% discount on every workload that can tolerate asynchronous turnaround — and most data pipelines have more such workloads than they realize. Route offline jobs to batch, interactive traffic to on-demand, and steady production to provisioned capacity; that three-way split typically cuts Bedrock bills 30–40% before any rate negotiation. Read it with our Provisioned Throughput and AI/ML negotiation guides.

For a Bedrock cost audit before your next EDP renewal, contact us. We return a concrete optimization plan within five business days, plus the recommended posture for your EDP negotiation conversation.

Bedrock On-Demand vs Batch Pricing: When Each Wins on Cost

The two modes side by side

Which workloads qualify for batch

Worked cost example

How batch fits with Provisioned Throughput

Optimization levers

Negotiating the blend in your EDP

Common mistakes

The bottom line

Talk to an AWS negotiation advisor

Your AWS bill
is negotiable.

Explore more AWS cost & negotiation guides

The two modes side by side

Which workloads qualify for batch

Worked cost example

How batch fits with Provisioned Throughput

Optimization levers

Negotiating the blend in your EDP

Common mistakes

The bottom line

Related from AWSNegotiations

Talk to an AWS negotiation advisor

Your AWS billis negotiable.

Continue with the negotiation playbook.

Explore more AWS cost & negotiation guides

Your AWS bill
is negotiable.