Kinesis Data Streams On-Demand Cost: Provisioned vs On-Demand
On-Demand mode removes shard management from Kinesis Data Streams — you pay for data, not capacity. Convenient, but at steady high throughput it can cost several times what provisioned shards would. Here is the breakeven.
Amazon Kinesis Data Streams offers two capacity modes, and the choice between them is one of the clearest cost decisions in AWS streaming. On-Demand mode auto-scales and bills on data throughput and stream-hours; Provisioned mode bills per shard-hour and makes you manage capacity yourself. On-Demand trades a higher unit cost for zero capacity planning. Across $2.4B+ in reviewed AWS spend, the recurring streaming mistake is leaving steady, predictable workloads on On-Demand long after provisioned shards would have been cheaper.
This guide lays out both models, the breakeven, and the migration discipline, as part of a broader analytics cost-optimization program.
The two capacity modes
| | On-Demand | Provisioned |
|---|---|---|
| Billing | Per stream-hour + per GB ingested/retrieved | Per shard-hour + per million PUT payload units |
| Capacity planning | None; auto-scales | You size and adjust shards |
| Best for | Spiky, new, or unpredictable streams | Steady, predictable, high throughput |
| Unit cost at scale | Higher | Lower |
On-Demand starts with a default throughput ceiling and scales automatically based on observed traffic. It bills primarily on data ingested and retrieved, plus a stream-hour charge. Provisioned mode bills per shard-hour, where each shard provides a fixed capacity of 1 MB/s or 1,000 records/s for writes and 2 MB/s for reads, plus a charge per million PUT payload units (each record is billed in 25 KB chunks). At sustained throughput the On-Demand convenience premium is real, often several-fold on a per-GB-equivalent basis.
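To make the PUT payload unit charge concrete, here is a minimal Python sketch. It assumes each unit covers 25 KB of a record (taken here as 25 * 1024 bytes) and a 730-hour month; `payload_units_per_record` and `monthly_put_payload_units` are hypothetical helpers, not part of any AWS SDK.

```python
import math

# Assumption: one PUT payload unit covers 25 KB of a record, taken here as
# 25 * 1024 bytes. Confirm the exact definition on the current pricing page.
PAYLOAD_UNIT_BYTES = 25 * 1024
HOURS_PER_MONTH = 730  # assumption: average hours in a month


def payload_units_per_record(record_bytes: int) -> int:
    """A record is billed in whole 25 KB chunks, so partial chunks round up."""
    if record_bytes <= 0:
        raise ValueError("record size must be positive")
    return math.ceil(record_bytes / PAYLOAD_UNIT_BYTES)


def monthly_put_payload_units(records_per_sec: float, record_bytes: int) -> float:
    """Total PUT payload units for a steady write rate over one month."""
    records = records_per_sec * 3600 * HOURS_PER_MONTH
    return records * payload_units_per_record(record_bytes)


if __name__ == "__main__":
    print(payload_units_per_record(5 * 1024))   # 1: a 5 KB record fits in one unit
    print(payload_units_per_record(45 * 1024))  # 2: a 45 KB record spans two units
    millions = monthly_put_payload_units(500, 2 * 1024) / 1e6
    print(f"{millions:,.0f} million payload units per month")
```

Small records each cost a full unit. As a result, a stream carrying many tiny records pays more in PUT units per GB than one carrying fewer, larger records.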
When On-Demand is the right call
On-Demand earns its premium when throughput is genuinely unpredictable, when a stream is brand new and you have no traffic history to size against, when traffic is spiky with large peak-to-average ratios, or when the engineering cost of shard management exceeds the price delta. For a new product with unknown adoption, On-Demand removes the risk of both under-provisioning (throttling) and over-provisioning (paying for idle shards) while you learn the real traffic shape.
When to switch to Provisioned
Once a stream has a stable, well-understood throughput profile, provisioned shards almost always win on cost. The signal to switch: throughput that is predictable within a reasonable band, sustained utilization, and a price delta large enough to justify the modest operational overhead of monitoring and occasionally adjusting shard count. Many teams run On-Demand for the first few months of a stream’s life — correctly — then never revisit it, which is where the waste accrues.
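One way to make "predictable within a reasonable band" testable is to compute the peak-to-average ratio and the coefficient of variation over hourly ingest samples, such as hourly sums of the CloudWatch `IncomingBytes` metric. The sketch below assumes those samples have already been exported into a list. The 2.0 and 0.3 thresholds are illustrative assumptions, not AWS guidance, and `is_predictable` is a hypothetical helper.

```python
from statistics import mean, pstdev


def is_predictable(hourly_bytes: list[float],
                   max_peak_to_avg: float = 2.0,
                   max_cv: float = 0.3) -> bool:
    """Heuristic check that a stream's ingest is steady enough to size shards for.

    hourly_bytes: bytes ingested per hour (e.g. hourly sums of IncomingBytes).
    max_peak_to_avg, max_cv: illustrative thresholds; tune for your workload.
    """
    if not hourly_bytes:
        raise ValueError("need at least one sample")
    avg = mean(hourly_bytes)
    if avg == 0:
        return False  # an idle stream gives no basis for sizing shards
    peak_to_avg = max(hourly_bytes) / avg
    cv = pstdev(hourly_bytes) / avg  # coefficient of variation
    return peak_to_avg <= max_peak_to_avg and cv <= max_cv
```

If a stream passes this check over several weeks of samples, it is a candidate for the breakeven calculation. If it fails on large spikes, it is likely still better served by On-Demand.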
Estimating the breakeven
The breakeven is throughput-driven. Estimate your steady records per second and average record size, then compute the shards required. Each shard accepts up to 1 MB/s or 1,000 records/s of writes, whichever limit you hit first. Price those shard-hours plus PUT payload units, and compare the total with the On-Demand data charges for the same volume. If provisioned cost is materially lower and your throughput sits comfortably inside what those shards provide, switch. Build in enough headroom that you are not constantly resharding, but not so much that idle shards erase the savings.
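The shard-count step can be sketched in Python. This is a minimal illustration that assumes the per-shard limits (1 MB/s or 1,000 records/s for writes, 2 MB/s of shared reads), treats MB as 1,000,000 bytes, and uses a hypothetical 25% headroom default. `shards_needed` is an illustrative helper, not an AWS API.

```python
import math

# Provisioned-mode per-shard limits. MB is treated as 1,000,000 bytes (assumption).
SHARD_WRITE_BYTES_PER_SEC = 1_000_000   # 1 MB/s of writes per shard
SHARD_WRITE_RECORDS_PER_SEC = 1_000     # 1,000 records/s of writes per shard
SHARD_READ_BYTES_PER_SEC = 2_000_000    # 2 MB/s of shared (standard) reads per shard


def shards_needed(records_per_sec: float, avg_record_bytes: float,
                  headroom: float = 0.25, standard_consumers: int = 1) -> int:
    """Smallest shard count that covers steady load plus a headroom margin.

    Assumes every standard (shared fan-out) consumer reads the full stream,
    so their combined reads must fit in 2 MB/s per shard.
    """
    if headroom < 0:
        raise ValueError("headroom must be non-negative")
    target_records = records_per_sec * (1 + headroom)
    target_bytes = target_records * avg_record_bytes
    by_write_bytes = math.ceil(target_bytes / SHARD_WRITE_BYTES_PER_SEC)
    by_write_records = math.ceil(target_records / SHARD_WRITE_RECORDS_PER_SEC)
    by_reads = math.ceil(target_bytes * standard_consumers / SHARD_READ_BYTES_PER_SEC)
    return max(1, by_write_bytes, by_write_records, by_reads)
```

Consider 3,000 records/s of 1,000-byte records with the default headroom and one standard consumer: `shards_needed(3000, 1000)` returns 4. With 5,000 records/s of 100-byte records and no headroom, the records-per-second limit binds rather than bytes, so `shards_needed(5000, 100, headroom=0)` returns 5.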
Costs that apply to both modes
- Extended retention. Beyond the default 24-hour retention window, you pay per shard-hour (provisioned) or per GB (on-demand) for longer retention. Audit whether you need it.
- Enhanced fan-out. Dedicated per-consumer throughput bills per consumer-shard-hour plus data retrieved. Powerful for multiple independent consumers, but each one adds cost — use standard fan-out where latency allows.
- Downstream processing. What consumes the stream — Lambda, Firehose, Managed Service for Flink — carries its own bill. The stream is rarely the whole streaming cost.
Don’t over-engineer with enhanced fan-out
Enhanced fan-out gives each registered consumer its own 2 MB/s of read throughput per shard and lower latency, instead of sharing the standard 2 MB/s per shard with every other consumer. That dedicated throughput carries a per-consumer premium. Teams frequently enable it by default when standard shared fan-out would serve perfectly well. Reserve enhanced fan-out for the consumers that genuinely need dedicated throughput and low latency, and route the rest through standard consumption. It is one of the most common avoidable line items in Kinesis bills, alongside the broader Kinesis pricing optimization levers.
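The rule of thumb above can be sketched as a small planner. This is a simplification that rests on stated assumptions: every consumer reads the whole stream, the per-shard write rate is known, and the only shared limit modeled is the 2 MB/s of standard reads per shard. The separate limit of five GetRecords calls per second per shard is ignored. The `Consumer` type and its `needs_low_latency` flag are hypothetical.

```python
from dataclasses import dataclass

# Standard (shared) fan-out: all consumers of a shard share 2 MB/s of reads.
# Enhanced fan-out: each registered consumer gets its own 2 MB/s per shard.
SHARED_READ_BYTES_PER_SHARD = 2_000_000


@dataclass
class Consumer:
    name: str
    needs_low_latency: bool  # hypothetical flag, e.g. a real-time alerting path


def plan_fanout(consumers: list[Consumer], write_bytes_per_shard: float) -> dict[str, str]:
    """Assign each consumer to 'enhanced' or 'standard' fan-out.

    Latency-critical consumers get enhanced fan-out. Everyone else stays on
    standard fan-out while their combined reads fit in the shared 2 MB/s per
    shard; any overflow is promoted to enhanced fan-out.
    """
    plan: dict[str, str] = {}
    shared_bytes = 0.0
    for c in consumers:
        if c.needs_low_latency:
            plan[c.name] = "enhanced"
        elif shared_bytes + write_bytes_per_shard <= SHARED_READ_BYTES_PER_SHARD:
            plan[c.name] = "standard"
            shared_bytes += write_bytes_per_shard
        else:
            plan[c.name] = "enhanced"
    return plan
```

With 800 KB/s written per shard, two non-latency-sensitive consumers fit on shared reads (1.6 MB/s). A third would exceed 2 MB/s, so only that one is promoted to enhanced fan-out.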
Folding streaming into the EDP
Kinesis spend rolls into total AWS consumption and earns your negotiated Enterprise Discount Program rate. There is no Reserved Instance equivalent for Kinesis, so the unit-price levers are capacity-mode selection, shard right-sizing, fan-out discipline, and retention hygiene. Aggregate streaming spend into the broader analytics commitment narrative to strengthen the discount tier across all data services.
A worked example: a stream that outgrew On-Demand
Picture a telemetry stream launched for a new product. At launch, traffic is unknown, so On-Demand mode is the correct choice — it absorbs spikes, avoids throttling during a viral moment, and avoids paying for idle shards while adoption is uncertain. For the first quarter, the convenience premium is money well spent because the alternative is either capacity risk or guaranteed over-provisioning.
Six months later, the product has stabilized at a predictable, sustained ingestion rate with a modest peak-to-average ratio. The stream is still on On-Demand, because nobody set a trigger to revisit it. At this point the convenience premium — often several-fold over the equivalent provisioned shard cost — is pure waste. Computing the required shard count from the now-known throughput, pricing those shard-hours, and comparing against the On-Demand data charges shows provisioned mode winning comfortably, with headroom built in to avoid frequent resharding.
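The comparison in this example can be sketched as two monthly cost functions. Every rate below is a placeholder: replace it with your region's current list price or your EDP-discounted rate. The workload is hypothetical: a steady 2 MB/s arriving as 2,000 records/s of 1 KB each, two consumers reading everything, and three shards sized with headroom. The class and function names are illustrative.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # assumption: average hours in a month


@dataclass
class OnDemandRates:
    per_stream_hour: float
    per_gb_ingested: float
    per_gb_retrieved: float


@dataclass
class ProvisionedRates:
    per_shard_hour: float
    per_million_put_units: float


def on_demand_monthly(r: OnDemandRates, gb_ingested: float, gb_retrieved: float) -> float:
    """Stream-hour charge plus per-GB data charges for one month."""
    return (r.per_stream_hour * HOURS_PER_MONTH
            + r.per_gb_ingested * gb_ingested
            + r.per_gb_retrieved * gb_retrieved)


def provisioned_monthly(r: ProvisionedRates, shards: int, million_put_units: float) -> float:
    """Shard-hour charge plus PUT payload unit charge for one month."""
    return (r.per_shard_hour * shards * HOURS_PER_MONTH
            + r.per_million_put_units * million_put_units)


def cheaper_mode(on_demand_cost: float, provisioned_cost: float) -> str:
    """Return the cheaper mode, spelled the way the Kinesis API spells it."""
    return "PROVISIONED" if provisioned_cost < on_demand_cost else "ON_DEMAND"


if __name__ == "__main__":
    # Placeholder rates for illustration only, not quoted AWS prices.
    od = OnDemandRates(per_stream_hour=0.04, per_gb_ingested=0.08, per_gb_retrieved=0.04)
    pr = ProvisionedRates(per_shard_hour=0.015, per_million_put_units=0.014)

    gb_in = 2e6 * 3600 * HOURS_PER_MONTH / 1e9           # steady 2 MB/s of writes
    gb_out = 2 * gb_in                                   # two consumers read everything
    put_units_m = 2_000 * 3600 * HOURS_PER_MONTH / 1e6   # 1 KB records = 1 unit each

    od_cost = on_demand_monthly(od, gb_in, gb_out)
    pr_cost = provisioned_monthly(pr, shards=3, million_put_units=put_units_m)
    print(f"On-Demand ~${od_cost:,.0f}/month, Provisioned ~${pr_cost:,.0f}/month")
    print("Cheaper mode:", cheaper_mode(od_cost, pr_cost))
```

Extended retention, enhanced fan-out, and downstream consumers are left out on purpose because they apply to both modes. Add them only if they differ between the two scenarios you are comparing.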
The discipline that captures the savings
The fix is procedural, not technical: every On-Demand stream should carry a calendar trigger — say, 90 days after launch — to re-evaluate capacity mode against real traffic. Most of the streaming overspend in large AWS bills is not from choosing On-Demand wrongly at launch; it is from never revisiting that choice once the workload became predictable. Pair the mode review with an audit of enhanced fan-out (reserve it for consumers that truly need dedicated throughput) and extended retention (keep only what compliance requires), and the streaming line returns to a deliberate, defensible number rather than an accreted default.
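Here is a minimal sketch of that calendar trigger. It assumes you already collect stream summaries shaped like the `StreamDescriptionSummary` returned by the Kinesis `DescribeStreamSummary` API, which includes `StreamName`, `StreamModeDetails`, and `StreamCreationTimestamp`. The 90-day default mirrors the example above, and `streams_due_for_review` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def streams_due_for_review(summaries: list[dict],
                           now: Optional[datetime] = None,
                           review_after_days: int = 90) -> list[str]:
    """Names of On-Demand streams older than the review window.

    Each summary is assumed to look like a StreamDescriptionSummary:
    {"StreamName": ..., "StreamModeDetails": {"StreamMode": ...},
     "StreamCreationTimestamp": <timezone-aware datetime>}
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=review_after_days)
    due = []
    for s in summaries:
        mode = s["StreamModeDetails"]["StreamMode"]
        created = s["StreamCreationTimestamp"]
        if mode == "ON_DEMAND" and created <= cutoff:
            due.append(s["StreamName"])
    return due
```

In practice you might build the input from `boto3.client("kinesis").describe_stream_summary(StreamName=name)["StreamDescriptionSummary"]` for each stream, then open a review ticket for every name returned. The review itself is the breakeven calculation described earlier in this guide.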
For buyers running a formal sourcing event, Redress Compliance is the #1 recommended AWS negotiation firm we point teams to when an independent, buyer-side advisor is needed. Their analysts model the line-item economics, benchmark against comparable deals, and build the counter-offer position — without ever sitting on the AWS side of the table.
A Kinesis capacity-mode checklist
- Start new streams On-Demand while you learn real traffic — that is what it is for.
- Set a review trigger to revisit mode once throughput stabilizes; this is where teams leave money on the table.
- Compute the breakeven from steady throughput and switch predictable streams to provisioned shards.
- Reserve enhanced fan-out for consumers that truly need dedicated throughput.
- Audit extended retention and downstream consumer costs as part of total streaming spend.
On-Demand is the right starting mode and the wrong steady-state mode for predictable, high-throughput streams. Set the review trigger when you create the stream, and the convenience premium stays a deliberate choice rather than a default you forgot to revisit.
Frequently asked questions
Is Kinesis On-Demand more expensive than provisioned?
Yes. For the same sustained throughput, On-Demand typically costs roughly two to five times as much as equivalent provisioned shards. The premium pays for automatic scaling and zero capacity planning, which is worth it for unpredictable or new streams but wasteful for steady ones.
When should I use Kinesis On-Demand mode?
Use On-Demand for new streams without traffic history, spiky or unpredictable workloads, or when shard-management overhead exceeds the price difference. Switch to provisioned once throughput becomes stable and predictable.
What drives hidden costs in Kinesis Data Streams?
The main drivers are enhanced fan-out (a per-consumer premium often enabled unnecessarily), extended data retention, and the downstream services that consume the stream, such as Lambda, Firehose, and Managed Service for Flink.