Kinesis Pricing Optimization: From Shards to Savings
Kinesis Data Streams looks cheap per shard. At three or four shards it is. At three or four thousand it becomes one of the larger streaming line items on an enterprise bill. This piece walks through the levers that reduce streaming spend without changing the application contract.
Amazon Kinesis Data Streams is the AWS-native streaming backbone for events, telemetry, change-data-capture, and clickstream. Its pricing model is straightforward in isolation, but it interacts with producer architecture, consumer architecture, and retention configuration in ways that compound. The result is bills that vary by 5x for what looks like the same workload. This piece walks through the optimisations that consistently cut Kinesis spend by 40 to 60 percent.
Kinesis Data Streams pricing dimensions
| Dimension | Pricing | Cost driver |
|---|---|---|
| Provisioned shard-hour | $0.015 per shard-hour | Shard count |
| Provisioned PUT payload units | $0.014 per million | PUT volume (25 KB units) |
| On-Demand throughput | $0.04 per GB ingested | Ingest volume |
| On-Demand PUT | $0.40 per million | PUT count |
| Enhanced fan-out | $0.015 per shard-hour per consumer | Number of dedicated consumers |
| Extended retention | $0.02 per shard-hour (after 24h) | Days of retention |
| Long-term retention | $0.10 per GB-month (after 7d) | Storage volume |
| Long-term retrieval | $0.021 per GB | Replay volume |
Provisioned vs On-Demand: the crossover
Kinesis offers two modes:
- Provisioned: You manage shard count. Cheaper for predictable, sustained throughput.
- On-Demand: AWS scales shards automatically. Easier to operate, but more expensive per GB ingested.
Rule of thumb: at sustained throughput above 300 KB/sec across the stream, provisioned mode with right-sized shards wins. Below that, On-Demand is operationally easier at a similar cost. For workloads with large idle periods (zero traffic overnight or at weekends), On-Demand removes the idle shard tax.
Shard sizing: the most common waste
The most common source of Kinesis waste is over-provisioned shard counts. Each shard supports up to 1 MB/sec or 1,000 records/sec of ingress, whichever limit is reached first. Many environments still run the shard counts that were set for a peak event months ago.
- Inventory streams with their current shard counts and 7-day P99 throughput.
- Identify streams where P99 throughput is below 50 percent of provisioned capacity.
- Merge underused shards in those streams.
- For streams with very spiky throughput, switch to On-Demand.
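The inventory and right-sizing steps above can be sketched as a small calculation. This is a minimal illustration rather than a sizing tool: the stream dictionaries, the field names, and the 50 percent target utilisation are assumptions made for the example, and only the per-shard limits come from the text.

```python
import math

SHARD_MB_PER_SEC = 1.0         # ingress limit per shard, from the text above
SHARD_RECORDS_PER_SEC = 1000   # ingress limit per shard, from the text above


def required_shards(p99_mb_per_sec, p99_records_per_sec, target_utilisation=0.5):
    """Shards needed so that P99 load uses at most `target_utilisation` of capacity."""
    by_bytes = p99_mb_per_sec / (SHARD_MB_PER_SEC * target_utilisation)
    by_records = p99_records_per_sec / (SHARD_RECORDS_PER_SEC * target_utilisation)
    return max(1, math.ceil(max(by_bytes, by_records)))


def merge_candidates(streams, threshold=0.5):
    """Yield (name, current_shards, suggested_shards) for under-used streams.

    `streams` is a hypothetical inventory: dicts with the keys
    name, shards, p99_mb_per_sec and p99_records_per_sec.
    """
    for s in streams:
        utilisation = max(
            s["p99_mb_per_sec"] / (s["shards"] * SHARD_MB_PER_SEC),
            s["p99_records_per_sec"] / (s["shards"] * SHARD_RECORDS_PER_SEC),
        )
        if utilisation < threshold:
            suggested = required_shards(s["p99_mb_per_sec"], s["p99_records_per_sec"])
            yield s["name"], s["shards"], suggested
```

Both limits are checked because a stream of many tiny records can exhaust the records/sec limit long before it reaches 1 MB/sec.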
PUT batching: the producer-side lever
Each record is billed as one PUT payload unit for any payload up to 25 KB, so a 200-byte record costs the same as a 25 KB one. Aggregating with the Kinesis Producer Library (KPL) packs many small records into one PUT, cutting PUT charges by 10x or more.
| Producer pattern | PUT cost (per billion records) |
|---|---|
| One record per PutRecord | $14 (provisioned) / $400 (On-Demand) |
| 10 records aggregated per PUT | $1.40 / $40 |
| 100 records aggregated per PUT | $0.14 / $4 |
The KPL handles aggregation transparently; the KCL handles deaggregation on the consumer side. Consumers that do not use the KCL need to deaggregate records themselves. Beyond that, the cost saving is essentially free.
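The figures in the table follow directly from the per-million PUT rates in the pricing table. The sketch below reproduces them, assuming every aggregated payload still fits in a single 25 KB PUT unit; the function name is illustrative.

```python
PROVISIONED_PUT_PER_MILLION = 0.014  # USD, from the pricing table
ON_DEMAND_PUT_PER_MILLION = 0.40     # USD, from the pricing table


def put_cost(records, records_per_put=1, rate_per_million=PROVISIONED_PUT_PER_MILLION):
    """PUT charge for `records` user records packed `records_per_put` at a time.

    Assumes each aggregated payload stays within one 25 KB PUT unit.
    """
    puts = records / records_per_put
    return puts / 1_000_000 * rate_per_million


for factor in (1, 10, 100):
    provisioned = put_cost(1_000_000_000, factor)
    on_demand = put_cost(1_000_000_000, factor, ON_DEMAND_PUT_PER_MILLION)
    print(f"{factor:>3} records/PUT: ${provisioned:,.2f} provisioned, ${on_demand:,.2f} On-Demand")
```

If the aggregated payloads grow past 25 KB, each one consumes more than one PUT unit and the saving flattens out, so the aggregation factor is bounded by average record size.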
Enhanced fan-out: only when you need it
Enhanced Fan-Out (EFO) provides dedicated 2 MB/sec read throughput per consumer per shard. It bills $0.015/shard-hour per consumer, on top of the shard cost. For a 500-shard stream with three EFO consumers, that is 500 × 3 × $0.015 × 8,760 hours, or about $197,000/year, in EFO fees alone.
The decision:
- Enable EFO when consumer latency or throughput is the bottleneck (multiple consumers contending for the same 2 MB/sec read budget per shard).
- Disable EFO when a single consumer is sufficient and standard pull-based reads meet SLA.
- Avoid leaving stale EFO consumers; deregister them when retired.
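Auditing for stale consumers can be scripted. The sketch below assumes a `client` object that behaves like `boto3.client("kinesis")` and uses its `list_stream_consumers` and `deregister_stream_consumer` calls. Kinesis does not say which consumers are still in use, so the `active_names` allowlist is an assumption you would have to supply from your own service inventory.

```python
def find_stale_consumers(client, stream_arn, active_names):
    """Return names of registered EFO consumers that are not in `active_names`.

    `client` is expected to behave like boto3.client("kinesis").
    `active_names` is a hypothetical allowlist maintained by the caller.
    """
    stale, token = [], None
    while True:
        kwargs = {"StreamARN": stream_arn}
        if token:
            kwargs["NextToken"] = token
        page = client.list_stream_consumers(**kwargs)
        stale += [c["ConsumerName"] for c in page["Consumers"]
                  if c["ConsumerName"] not in active_names]
        token = page.get("NextToken")
        if not token:
            return stale


def deregister(client, stream_arn, names, dry_run=True):
    """Deregister the named consumers; with dry_run=True only report them."""
    for name in names:
        if dry_run:
            print(f"would deregister {name}")
        else:
            client.deregister_stream_consumer(StreamARN=stream_arn, ConsumerName=name)
```

Defaulting to a dry run is a deliberate choice here: deregistering a consumer that is still reading interrupts it, so the list is worth reviewing before anything is removed.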
Retention strategy
Default retention is 24 hours and is free. Extended retention (up to 7 days) bills $0.02/shard-hour. Long-term retention (beyond 7 days) bills $0.10/GB-month plus retrieval. Recommendation:
- Use 24-hour retention for streams whose consumers are real-time and stateful.
- Use 7-day extended retention for streams where downstream consumers may lag during incidents.
- Use long-term retention only when replay or compliance demands it; otherwise, archive to S3.
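Applying a retention decision to a stream might look like the following sketch. It assumes a `client` that behaves like `boto3.client("kinesis")` and relies on its `describe_stream_summary`, `increase_stream_retention_period`, and `decrease_stream_retention_period` calls; the helper name `set_retention` is illustrative.

```python
def set_retention(client, stream_name, target_hours):
    """Move a stream's retention to `target_hours` and return the previous value.

    `client` is expected to behave like boto3.client("kinesis").
    """
    summary = client.describe_stream_summary(StreamName=stream_name)
    current = summary["StreamDescriptionSummary"]["RetentionPeriodHours"]
    if target_hours > current:
        client.increase_stream_retention_period(
            StreamName=stream_name, RetentionPeriodHours=target_hours)
    elif target_hours < current:
        # Data older than the new period stops being readable once this applies.
        client.decrease_stream_retention_period(
            StreamName=stream_name, RetentionPeriodHours=target_hours)
    return current
```

Returning the previous value makes it easy to log what changed. Note that restoring a longer period later does not bring back data that has already aged out.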
Kinesis Firehose vs Data Streams
For pure ingest-to-storage pipelines (S3, Redshift, OpenSearch, Splunk), Kinesis Data Firehose is usually cheaper than Data Streams. Firehose bills per GB ingested with no shard concept. Use Data Streams when consumers need shard-level ordering or sub-second latency; use Firehose for fire-and-forget pipelines.
Kinesis Video Streams pricing
Kinesis Video Streams uses a different pricing model: per GB ingested, stored, and consumed. Video workloads usually have much higher data volumes, so tier storage to S3 after the active query window.
Worked example: 500-shard stream, ~$66K/year
| Step | Action | Annual bill |
|---|---|---|
| Baseline | 500 shards provisioned, no batching, 7-day retention | $66,000 |
| Step 1 | Right-size to 200 shards | $26,300 |
| Step 2 | Add KPL batching | $26,300 (PUT volume already low) |
| Step 3 | Reduce retention to 24h on non-replay streams | $22,000 |
| Step 4 | Remove stale EFO consumers | $18,000 |
A reduction of roughly 73 percent ($66,000 to $18,000) from a base of poorly tuned defaults.
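The shard-hour part of the table can be checked with a few lines of arithmetic. The sketch below uses only the $0.015 shard-hour rate from the pricing table and an 8,760-hour year. It covers the shard-hour component alone, so the later steps (retention and EFO) are taken from the table rather than recomputed.

```python
HOURS_PER_YEAR = 8760
SHARD_HOUR = 0.015  # USD per shard-hour, from the pricing table


def annual_shard_cost(shards):
    """Annual provisioned shard-hour charge, excluding PUT, retention and EFO."""
    return shards * SHARD_HOUR * HOURS_PER_YEAR


def reduction(before, after):
    """Fractional saving between two annual bills."""
    return (before - after) / before


print(f"500 shards: ${annual_shard_cost(500):,.0f}")
print(f"200 shards: ${annual_shard_cost(200):,.0f}")
print(f"overall reduction: {reduction(66_000, 18_000):.1%}")
```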
Negotiation hooks
Kinesis is part of the analytics commitment in most EDP renewals. Levers that work:
- On-Demand surcharge waiver for customers committing to a baseline ingest GB volume.
- EFO discount bundled with consumer service commitments (Lambda, EMR Serverless).
- Migration credit from Kafka or self-managed streaming to Kinesis or MSK Serverless.
- Long-term retention credit for compliance-driven retention commitments.
Implementation checklist
- Inventory streams with shard counts and P99 throughput.
- Right-size shard counts; merge underused shards.
- Enable KPL aggregation on producers.
- Audit EFO consumers; deregister stale ones.
- Set retention to the minimum that meets the consumer SLA.
- Negotiate streaming bundle in the next EDP cycle.
For more see the AWS analytics cost optimization pillar, the EMR cluster cost strategy piece for downstream Spark consumers, and the OpenSearch cost management piece for the most common Kinesis Firehose destination.
Producer architecture patterns
The single most important Kinesis cost factor is producer architecture. The patterns that drive cost up versus down:
- Chatty producers without KPL. Each application call becomes a single PutRecord call. PUT charges dominate.
- KPL with aggregation. Many small records become one PUT. PUT charges drop 10x to 100x.
- Firehose with buffering. Where downstream consumers do not need shard ordering, Firehose with buffering is the cheapest ingest path.
- Direct Kinesis Agent. For log-file ingest, the Kinesis Agent handles batching automatically.
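To make the aggregation idea concrete, here is a hand-rolled sketch that packs small JSON events into newline-delimited payloads of at most 25 KB. It is not the KPL's aggregation format, and the function name and framing are assumptions for illustration. A consumer would have to split each payload on newlines itself, and every event in a payload shares one partition key.

```python
import json

MAX_PUT_UNIT_BYTES = 25 * 1024  # one PUT payload unit, per the pricing section


def pack_events(events, max_bytes=MAX_PUT_UNIT_BYTES):
    """Greedily pack JSON events into newline-delimited blobs of at most `max_bytes`.

    Illustrative only: this is NOT the KPL aggregation format, so consumers
    must split each blob on newlines themselves.
    """
    blobs, current, size = [], [], 0
    for event in events:
        line = json.dumps(event, separators=(",", ":")).encode("utf-8")
        if len(line) > max_bytes:
            raise ValueError("single event exceeds max_bytes")
        extra = len(line) + (1 if current else 0)  # +1 for the newline separator
        if size + extra > max_bytes:
            # Current blob is full: flush it and start a new one with this event.
            blobs.append(b"\n".join(current))
            current, size, extra = [], 0, len(line)
        current.append(line)
        size += extra
    if current:
        blobs.append(b"\n".join(current))
    return blobs
```

Each returned blob would be sent as the `Data` of a single record. In production the KPL, or an aggregation library compatible with its format, is usually the safer choice because KCL consumers can unpack it without custom code.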
Consumer architecture and concurrency
The consumer side affects cost too:
- KCL consumers share the standard 2 MB/sec per shard read budget across all consumers. Cheap; adds latency at high consumer counts.
- Enhanced Fan-Out consumers get dedicated 2 MB/sec per shard each. Expensive; necessary when multiple consumers contend.
- Lambda consumers bill Lambda invocations on top of Kinesis charges. Batch settings (batch size, batch window) materially affect total cost.
The pattern that works: KCL for low-priority consumers, EFO for the one or two consumers that need it, Lambda for event-driven downstream processing with tuned batch sizes.
Kinesis versus MSK versus Pub/Sub alternatives
For new architectures, the alternatives matter:
| Service | Best for | Pricing model |
|---|---|---|
| Kinesis Data Streams | AWS-native, simple producers | Shard-hour or On-Demand GB |
| MSK provisioned | Kafka ecosystem, hybrid workloads | Broker-hour + storage |
| MSK Serverless | Bursty Kafka workloads | Per-partition + ingest GB |
| SNS + SQS fan-out | Decoupled async messaging | Per million messages |
| EventBridge | Event-driven AWS service integration | Per million events |
Switching from Kinesis to a different service rarely pays back on cost alone; it pays back when the alternative architecture also reduces operational overhead.
Monitoring metrics that matter
Set CloudWatch alarms on:
- IncomingBytes per shard: identify under-used shards.
- WriteProvisionedThroughputExceeded: identify over-saturated shards.
- GetRecords.IteratorAgeMilliseconds: identify lagging consumers.
- SubscribeToShard.RateExceeded: identify EFO consumer limits.
A dashboard combining these four metrics with cost per stream is the foundation of ongoing optimisation.
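As a starting point, alarm definitions for the stream-level metrics can be generated in code. The sketch below builds keyword arguments for CloudWatch's `put_metric_alarm`. The alarm names, the 60-second period, the five evaluation periods, and both thresholds are placeholder assumptions to be tuned per stream. Per-shard IncomingBytes and the EFO metric need additional dimensions, and shard-level metrics have to be enabled on the stream, so they are left out here.

```python
def alarm_spec(stream_name, metric_name, threshold, statistic="Maximum"):
    """Build keyword arguments for CloudWatch put_metric_alarm on a stream-level metric."""
    return {
        "AlarmName": f"kinesis-{stream_name}-{metric_name}",  # hypothetical naming scheme
        "Namespace": "AWS/Kinesis",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "StreamName", "Value": stream_name}],
        "Statistic": statistic,
        "Period": 60,             # seconds; assumed
        "EvaluationPeriods": 5,   # assumed
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }


def stream_alarms(stream_name):
    """Alarm specs for write throttling and consumer lag; thresholds are placeholders."""
    return [
        alarm_spec(stream_name, "WriteProvisionedThroughputExceeded", 0, "Sum"),
        alarm_spec(stream_name, "GetRecords.IteratorAgeMilliseconds", 60_000),
    ]
```

Each spec can then be passed to a CloudWatch client, for example `cloudwatch.put_metric_alarm(**spec)` with `cloudwatch = boto3.client("cloudwatch")`.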
Long-term retention versus S3 archive
Kinesis long-term retention beyond 7 days bills $0.10/GB-month plus retrieval. S3 with a partitioned object schema bills $0.023/GB-month for Standard, less for Glacier tiers. For audit and replay use cases without sub-second latency requirements, archive to S3 via Firehose; Kinesis long-term retention is rarely the right answer above the 7-day mark.
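The storage gap is easy to quantify with the two per-GB-month rates quoted above. This sketch compares storage charges only. Retrieval, Firehose delivery, and S3 request charges are deliberately excluded, so treat the result as a rough bound rather than a full comparison.

```python
KINESIS_LONG_TERM_PER_GB_MONTH = 0.10  # USD, from the text above
S3_STANDARD_PER_GB_MONTH = 0.023       # USD, from the text above


def monthly_storage_cost(gb_stored):
    """Storage-only monthly cost of keeping `gb_stored` in each location."""
    return {
        "kinesis_long_term": gb_stored * KINESIS_LONG_TERM_PER_GB_MONTH,
        "s3_standard": gb_stored * S3_STANDARD_PER_GB_MONTH,
    }


costs = monthly_storage_cost(10_000)
print(f"Kinesis long-term: ${costs['kinesis_long_term']:,.0f}/month")
print(f"S3 Standard:       ${costs['s3_standard']:,.0f}/month")
```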