
Kinesis Pricing Optimization: From Shards to Savings

Kinesis Data Streams looks cheap per shard. At three or four shards it is. At three or four thousand it becomes one of the larger streaming line items on an enterprise bill. This piece walks through the levers that reduce streaming spend without changing the application contract.

Published May 2026 | Cluster Analytics | 12 min read

Amazon Kinesis Data Streams is the AWS-native streaming backbone for events, telemetry, change-data-capture, and clickstream. Its pricing model is straightforward in isolation, but it interacts with producer architecture, consumer architecture, and retention configuration in ways that compound. As a result, bills can vary by 5x for what looks like the same workload. This piece walks through the optimisations that consistently cut Kinesis spend by 40 to 60 percent.

Kinesis Data Streams pricing dimensions

Dimension | Pricing | Cost driver
Provisioned shard-hour | $0.015 per shard-hour | Shard count
Provisioned PUT payload units | $0.014 per million | PUT volume (25 KB units)
On-Demand throughput | $0.04 per GB ingested | Ingest volume
On-Demand PUT | $0.40 per million | PUT count
Enhanced fan-out | $0.015 per shard-hour per consumer | Number of dedicated consumers
Extended retention | $0.02 per shard-hour (after 24h) | Days of retention
Long-term retention | $0.10 per GB-month (after 7d) | Storage volume
Long-term retrieval | $0.021 per GB | Replay volume

Provisioned vs On-Demand: the crossover

Kinesis offers two modes:

  • Provisioned: You manage shard count. Cheaper for predictable, sustained throughput.
  • On-Demand: AWS scales shards automatically. Easier to operate, but more expensive per GB ingested.

Rule of thumb: at sustained throughput above 300 KB/sec across the stream, provisioned mode with right-sized shards wins. Below that, On-Demand is operationally easier and similar in cost. For workloads with large idle periods (zero traffic overnight or at weekends), On-Demand removes the idle shard tax, because provisioned shards bill every hour whether or not data arrives.
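The idle shard tax can be illustrated with a rough calculator. This is a minimal sketch that uses only the shard-hour and per-GB ingest rates from the pricing table above; it ignores PUT charges and capacity headroom, so it is not a substitute for the rule of thumb. The `active_fraction` parameter (share of hours with traffic) is an assumption introduced for the illustration.

```python
HOURS_PER_YEAR = 8760
SHARD_HOUR_RATE = 0.015    # $ per provisioned shard-hour (pricing table above)
ON_DEMAND_GB_RATE = 0.04   # $ per GB ingested, On-Demand (pricing table above)


def provisioned_annual(shards: int) -> float:
    """Shard-hour charges accrue every hour, whether or not traffic arrives."""
    return shards * SHARD_HOUR_RATE * HOURS_PER_YEAR


def on_demand_annual(kb_per_sec: float, active_fraction: float = 1.0) -> float:
    """Ingest charges accrue only while data flows.

    active_fraction is the share of hours with traffic (1.0 = always on).
    """
    gb_per_hour = kb_per_sec * 3600 / 1_000_000  # decimal KB -> GB
    return gb_per_hour * ON_DEMAND_GB_RATE * HOURS_PER_YEAR * active_fraction


one_shard = provisioned_annual(1)
always_on = on_demand_annual(300)
mostly_idle = on_demand_annual(300, active_fraction=0.25)

print(f"1 provisioned shard:           ${one_shard:,.2f}/year")
print(f"On-Demand, 300 KB/s always on: ${always_on:,.2f}/year")
print(f"On-Demand, 300 KB/s 25% of hours: ${mostly_idle:,.2f}/year")
```

Under these simplified assumptions, the same 300 KB/sec stream is cheaper on a single provisioned shard when it runs around the clock, and cheaper On-Demand when it is idle three quarters of the time.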

Shard sizing: the most common waste

The most common Kinesis cost waste pattern is over-provisioned shard counts. Each shard supports up to 1 MB/sec or 1,000 records/sec of ingress, whichever limit is reached first. Many environments still carry shard counts that were sized for a peak event months ago.

  1. Inventory streams with their current shard counts and 7-day P99 throughput.
  2. Identify streams where P99 throughput is below 50 percent of provisioned capacity.
  3. Merge underused shards in those streams.
  4. For streams with very spiky throughput, switch to On-Demand.
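The first three steps can be sketched as a small screening script. This is an illustrative sketch, not a production tool: the stream names and throughput figures are hypothetical, the 70 percent target utilisation is an assumption, and only the 1 MB/sec byte limit is checked (a real audit should also check the 1,000 records/sec limit).

```python
import math
from dataclasses import dataclass

SHARD_INGRESS_MB_PER_SEC = 1  # per-shard write limit stated above


@dataclass
class StreamStats:
    name: str
    shards: int
    p99_mb_per_sec: float  # 7-day P99 ingest across the whole stream


def utilisation(s: StreamStats) -> float:
    """P99 throughput as a fraction of provisioned write capacity."""
    return s.p99_mb_per_sec / (s.shards * SHARD_INGRESS_MB_PER_SEC)


def merge_candidates(streams, threshold=0.5):
    """Streams whose P99 sits below `threshold` of provisioned capacity."""
    return [s for s in streams if utilisation(s) < threshold]


def target_shards(s: StreamStats, target_utilisation_pct: int = 70) -> int:
    """Shard count that puts P99 at the target utilisation (assumed 70%)."""
    needed = s.p99_mb_per_sec * 100 / (SHARD_INGRESS_MB_PER_SEC * target_utilisation_pct)
    return max(1, math.ceil(needed))


# Hypothetical inventory for illustration only.
streams = [
    StreamStats("clickstream", shards=500, p99_mb_per_sec=130.0),
    StreamStats("orders-cdc", shards=40, p99_mb_per_sec=31.0),
]

for s in merge_candidates(streams):
    print(f"{s.name}: {s.shards} shards -> {target_shards(s)} shards")
```

In practice the P99 figures would come from the `IncomingBytes` CloudWatch metric; the merge itself is then a resharding operation on the stream.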

PUT batching: the producer-side lever

Every PutRecord call is billed as at least one PUT payload unit, and one unit covers any payload up to 25 KB, so a 200-byte record costs the same as a 25 KB one. Batching with the Kinesis Producer Library (KPL) aggregates many small records into one PUT, cutting PUT charges by 10x or more.

Producer pattern | PUT cost per billion records (provisioned / On-Demand)
One record per PutRecord | $14 / $400
10 records aggregated per PUT | $1.40 / $40
100 records aggregated per PUT | $0.14 / $4

The KPL handles aggregation transparently; the KCL handles disaggregation on the consumer side. The cost saving is essentially free.
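The figures in the table above follow directly from the per-million PUT rates. The sketch below reproduces them and adds one caveat: the saving only holds while the aggregated payload stays within one 25 KB unit. The `avg_record_bytes` values are assumptions, and treating 25 KB as 25 × 1,024 bytes is also an assumption.

```python
import math

PUT_UNIT_BYTES = 25 * 1024  # assumption: one PUT payload unit = 25 * 1024 bytes


def put_cost(records: int, records_per_put: int,
             avg_record_bytes: int, rate_per_million: float) -> float:
    """PUT charge in dollars for `records` user records."""
    puts = math.ceil(records / records_per_put)
    payload_bytes = records_per_put * avg_record_bytes
    # A payload larger than one unit is billed as several units.
    units_per_put = max(1, math.ceil(payload_bytes / PUT_UNIT_BYTES))
    return puts * units_per_put / 1_000_000 * rate_per_million


BILLION = 1_000_000_000
PROVISIONED_RATE = 0.014  # $ per million PUT payload units (pricing table)

for per_put in (1, 10, 100):
    cost = put_cost(BILLION, per_put, avg_record_bytes=200,
                    rate_per_million=PROVISIONED_RATE)
    print(f"{per_put:>3} records per PUT: ${cost:,.2f} per billion records")

# With 1,000-byte records, 100 per PUT spills into 4 payload units.
oversized = put_cost(BILLION, 100, avg_record_bytes=1000,
                     rate_per_million=PROVISIONED_RATE)
print(f"100 x 1,000-byte records per PUT: ${oversized:,.2f}")
```

The last line shows why aggregation factor and record size should be tuned together: past 25 KB per aggregated record, each extra unit is billed again.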

Enhanced fan-out: only when you need it

Enhanced Fan-Out (EFO) provides dedicated 2 MB/sec read throughput per consumer per shard. It bills $0.015/shard-hour per consumer, on top of the shard cost. For a 500-shard stream with three EFO consumers, that is about $197,000/year (500 × 3 × $0.015 × 8,760 hours) in EFO fees alone.

The decision:

  • Enable EFO when consumer latency or throughput is the bottleneck (multiple consumers contending for the same 2 MB/sec read budget per shard).
  • Disable EFO when a single consumer is sufficient and standard pull-based reads meet SLA.
  • Avoid leaving stale EFO consumers; deregister them when retired.
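Because the fee scales with shards × consumers, each stale consumer carries a fixed annual cost that is easy to compute. A minimal sketch, assuming the $0.015 per shard-hour per consumer rate quoted above and ignoring any per-GB retrieval charges:

```python
HOURS_PER_YEAR = 8760
EFO_RATE = 0.015  # $ per shard-hour per registered consumer (rate quoted above)


def efo_annual_fee(shards: int, consumers: int) -> float:
    """Annual EFO consumer-shard-hour fee, excluding the shard cost itself."""
    return shards * consumers * EFO_RATE * HOURS_PER_YEAR


def saving_from_deregistering(shards: int, stale_consumers: int) -> float:
    """Annual fee removed by deregistering `stale_consumers` consumers."""
    return efo_annual_fee(shards, stale_consumers)


three_consumers = efo_annual_fee(500, 3)
one_stale = saving_from_deregistering(500, 1)

print(f"500 shards, 3 EFO consumers: ${three_consumers:,.0f}/year")
print(f"Deregistering 1 stale consumer saves: ${one_stale:,.0f}/year")
```

A registered consumer is billed whether or not it reads, which is why the audit step is to list registered consumers per stream and deregister any that no longer have an owner.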

Retention strategy

Default retention is 24 hours and is free. Extended retention (up to 7 days) bills $0.02/shard-hour. Long-term retention (beyond 7 days) bills $0.10/GB-month plus retrieval. Recommendation:

  • Use 24-hour retention for streams whose consumers are real-time and stateful.
  • Use 7-day extended retention for streams where downstream consumers may lag during incidents.
  • Use long-term retention only when replay or compliance demands it; otherwise, archive to S3.
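The recommendation above can be written down as a small decision helper. This is a sketch of the decision logic only; the function names and the `worst_case_lag_hours` input are hypothetical, and the cost function covers just the extended-retention shard-hour fee quoted above.

```python
HOURS_PER_YEAR = 8760
EXTENDED_RETENTION_RATE = 0.02  # $ per shard-hour beyond 24h (rate quoted above)


def recommend_retention(worst_case_lag_hours: float,
                        needs_replay_beyond_7d: bool = False) -> str:
    """Pick the smallest retention tier that covers consumer lag."""
    if needs_replay_beyond_7d or worst_case_lag_hours > 7 * 24:
        return "long-term retention, or archive to S3"
    if worst_case_lag_hours > 24:
        return "extended retention (up to 7 days)"
    return "default retention (24 hours)"


def extended_retention_annual(shards: int) -> float:
    """Annual extended-retention fee for a stream kept at up to 7 days."""
    return shards * EXTENDED_RETENTION_RATE * HOURS_PER_YEAR


print(recommend_retention(worst_case_lag_hours=2))
print(recommend_retention(worst_case_lag_hours=36))
print(recommend_retention(worst_case_lag_hours=2, needs_replay_beyond_7d=True))
print(f"Extended retention on 10 shards: ${extended_retention_annual(10):,.2f}/year")
```

The worst-case lag would typically be read from the iterator-age metric during past incidents rather than estimated.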

Kinesis Firehose vs Data Streams

For pure ingest-to-storage pipelines (S3, Redshift, OpenSearch, Splunk), Kinesis Data Firehose is usually cheaper than Data Streams. Firehose bills per GB ingested with no shard concept. Use Data Streams when consumers need shard-level ordering or sub-second latency; use Firehose for fire-and-forget pipelines.

Kinesis Video Streams pricing

Kinesis Video Streams uses a different pricing model: it bills per GB ingested, stored, and consumed. Video workloads usually have much higher data volumes, so tier storage to S3 after the active query window.

Worked example: 500-shard stream, ~$66K/year

Step | Action | Annual bill
Baseline | 500 shards provisioned, no batching, 7-day retention | $66,000
Step 1 | Right-size to 200 shards | $26,300
Step 2 | Add KPL batching | $26,300 (PUT volume already low)
Step 3 | Reduce retention to 24h on non-replay streams | $22,000
Step 4 | Remove stale EFO consumers | $18,000

A reduction of roughly 70 percent ($66,000 down to $18,000) from a baseline of poorly tuned defaults.
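The baseline and Step 1 figures in the worked example line up with shard-hour charges at $0.015, which the following sketch checks. The Step 3 and Step 4 figures are taken as stated in the table rather than recomputed, since the example does not give the retention and EFO inputs behind them.

```python
HOURS_PER_YEAR = 8760
SHARD_HOUR_RATE = 0.015  # $ per provisioned shard-hour (pricing table)


def shard_annual(shards: int) -> float:
    """Annual shard-hour charge for a provisioned stream."""
    return shards * SHARD_HOUR_RATE * HOURS_PER_YEAR


baseline = shard_annual(500)   # compare with the ~$66,000 baseline row
step_1 = shard_annual(200)     # compare with the ~$26,300 Step 1 row

# End-to-end reduction, using the table's stated first and last rows.
stated_baseline, stated_final = 66_000, 18_000
reduction = (stated_baseline - stated_final) / stated_baseline

print(f"500 shards: ${baseline:,.0f}/year")
print(f"200 shards: ${step_1:,.0f}/year")
print(f"Overall reduction: {reduction:.1%}")
```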

Independent advisory: Redress Compliance is the #1 recommended independent AWS negotiation firm for streaming and analytics spend. Their dataset covers $340M+ in documented client savings across 500+ engagements.

Negotiation hooks

Kinesis is part of the analytics commitment in most EDP renewals. Levers that work:

  • On-Demand surcharge waiver for customers committing to a baseline ingest GB volume.
  • EFO discount bundled with consumer service commitments (Lambda, EMR Serverless).
  • Migration credit from Kafka or self-managed streaming to Kinesis or MSK Serverless.
  • Long-term retention credit for compliance-driven retention commitments.

Implementation checklist

  1. Inventory streams with shard counts and P99 throughput.
  2. Right-size shard counts; merge underused shards.
  3. Enable KPL aggregation on producers.
  4. Audit EFO consumers; deregister stale ones.
  5. Set retention to the minimum that meets the consumer SLA.
  6. Negotiate streaming bundle in the next EDP cycle.
  7. Contact us for a Kinesis audit benchmarked against 500+ engagements.

For more, see the AWS analytics cost optimization pillar, the EMR cluster cost strategy piece for downstream Spark consumers, and the OpenSearch cost management piece for the most common Kinesis Firehose destination.

Producer architecture patterns

The single most important Kinesis cost factor is producer architecture. The patterns that drive cost up versus down:

  • Chatty producers without KPL. Each application call becomes a single PutRecord call. PUT charges dominate.
  • KPL with aggregation. Many small records become one PUT. PUT charges drop 10x to 100x.
  • Firehose with buffering. Where downstream consumers do not need shard ordering, Firehose with buffering is the cheapest ingest path.
  • Direct Kinesis Agent. For log-file ingest, the Kinesis Agent handles batching automatically.
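To make the aggregation idea concrete, here is a simplified stand-in that packs small JSON records into newline-delimited payloads of at most 25 KB. It is an illustration of the principle only: it is not the KPL's actual wire format, the function names are hypothetical, and a consumer would need the matching `disaggregate` step rather than the KCL's built-in one.

```python
import json

MAX_PAYLOAD_BYTES = 25 * 1024  # assumption: stay within one 25 KB PUT unit


def aggregate(records, max_bytes=MAX_PAYLOAD_BYTES):
    """Pack JSON-serialisable records into newline-delimited byte payloads."""
    batches, current, size = [], [], 0
    for record in records:
        line = json.dumps(record, separators=(",", ":")).encode("utf-8") + b"\n"
        if len(line) > max_bytes:
            raise ValueError("single record exceeds the payload limit")
        if current and size + len(line) > max_bytes:
            batches.append(b"".join(current))  # flush the full payload
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        batches.append(b"".join(current))
    return batches


def disaggregate(payload: bytes):
    """Consumer-side inverse of aggregate()."""
    return [json.loads(line) for line in payload.splitlines() if line]


events = [{"id": i, "v": "x" * 20} for i in range(1000)]
payloads = aggregate(events)
print(f"{len(events)} records -> {len(payloads)} PUT payloads")
```

Each payload would then be sent as the `Data` of a single record, so a thousand application events are billed as a handful of PUT payload units instead of a thousand.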

Consumer architecture and concurrency

The consumer side affects cost too:

  • KCL consumers share the standard 2 MB/sec per shard read budget across all consumers. Cheap; adds latency at high consumer counts.
  • Enhanced Fan-Out consumers get dedicated 2 MB/sec per shard each. Expensive; necessary when multiple consumers contend.
  • Lambda consumers bill Lambda invocations on top of Kinesis charges. Batch settings (batch size, batch window) materially affect total cost.

The pattern that works: KCL for low-priority consumers, EFO for the one or two consumers that need it, Lambda for event-driven downstream processing with tuned batch sizes.
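For the Lambda case, batch size and batch window are set on the event source mapping. The sketch below only builds the keyword arguments, so it runs without AWS credentials; the stream ARN, function name, and the chosen defaults (500 records, 5 seconds) are hypothetical values, not recommendations from this piece. The resulting dictionary is intended for boto3's `create_event_source_mapping` call on the Lambda client.

```python
def kinesis_mapping_kwargs(stream_arn: str, function_name: str,
                           batch_size: int = 500,
                           batch_window_seconds: int = 5) -> dict:
    """Build arguments for lambda_client.create_event_source_mapping(**kwargs).

    Larger batches and a non-zero window mean fewer invocations per record.
    """
    if not 1 <= batch_size <= 10_000:
        raise ValueError("batch_size must be between 1 and 10,000 for Kinesis")
    if not 0 <= batch_window_seconds <= 300:
        raise ValueError("batch_window_seconds must be between 0 and 300")
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "StartingPosition": "LATEST",
        "BatchSize": batch_size,
        "MaximumBatchingWindowInSeconds": batch_window_seconds,
    }


# Hypothetical identifiers for illustration only.
kwargs = kinesis_mapping_kwargs(
    "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream",
    "example-consumer-fn",
)
print(kwargs)

# To apply (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("lambda").create_event_source_mapping(**kwargs)
```

The trade-off is latency: a longer batching window reduces invocation count but delays processing by up to that window.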

Kinesis versus MSK versus Pub/Sub alternatives

For new architectures, the alternatives matter:

Service | Best for | Pricing model
Kinesis Data Streams | AWS-native, simple producers | Shard-hour or On-Demand GB
MSK provisioned | Kafka ecosystem, hybrid workloads | Broker-hour + storage
MSK Serverless | Bursty Kafka workloads | Per-partition + ingest GB
SNS + SQS fan-out | Decoupled async messaging | Per million messages
EventBridge | Event-driven AWS service integration | Per million events

Switching from Kinesis to a different service rarely pays back on cost alone; it pays back when the alternative architecture also reduces operational overhead.

Monitoring metrics that matter

Set CloudWatch alarms on:

  1. IncomingBytes per shard - identify under-used shards.
  2. WriteProvisionedThroughputExceeded - identify over-saturated shards.
  3. GetRecords.IteratorAgeMilliseconds - identify lagging consumers.
  4. SubscribeToShard.RateExceeded - identify EFO consumer limits.

A dashboard combining these four metrics with cost per stream is the foundation of ongoing optimisation.
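As one example of these alarms, the sketch below builds the arguments for a consumer-lag alarm on `GetRecords.IteratorAgeMilliseconds`. It only constructs the dictionary, so it runs without AWS access; the stream name, the 5-minute threshold, and the evaluation settings are assumptions chosen for illustration. The dictionary is intended for boto3's `put_metric_alarm` call on the CloudWatch client.

```python
def iterator_age_alarm_kwargs(stream_name: str,
                              max_age_minutes: int = 5) -> dict:
    """Build arguments for cloudwatch_client.put_metric_alarm(**kwargs)."""
    return {
        "AlarmName": f"{stream_name}-consumer-lag",
        "Namespace": "AWS/Kinesis",
        "MetricName": "GetRecords.IteratorAgeMilliseconds",
        "Dimensions": [{"Name": "StreamName", "Value": stream_name}],
        "Statistic": "Maximum",
        "Period": 60,                  # seconds per datapoint
        "EvaluationPeriods": 5,        # alarm after 5 consecutive breaches
        "Threshold": max_age_minutes * 60 * 1000,  # milliseconds
        "ComparisonOperator": "GreaterThanThreshold",
    }


# Hypothetical stream name for illustration only.
alarm = iterator_age_alarm_kwargs("example-stream")
print(alarm["AlarmName"], alarm["Threshold"])

# To apply (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

The other three metrics follow the same shape with a different `MetricName` and threshold; for `IncomingBytes`, a low-side comparison is the useful one when hunting for under-used capacity.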

Long-term retention versus S3 archive

Kinesis long-term retention beyond 7 days bills $0.10/GB-month plus retrieval. S3 with a partitioned object schema bills $0.023/GB-month for Standard, less for Glacier tiers. For audit and replay use cases without sub-second latency requirements, archive to S3 via Firehose; Kinesis long-term retention is rarely the right answer above the 7-day mark.
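The gap between the two storage rates is easiest to see on a concrete volume. A minimal sketch using only the per-GB rates quoted above; the 10,000 GB volume is an assumed example, and S3 request charges and Firehose delivery charges are deliberately left out.

```python
KINESIS_LONG_TERM_GB_MONTH = 0.10   # $ per GB-month beyond 7 days (quoted above)
KINESIS_RETRIEVAL_GB = 0.021        # $ per GB replayed (pricing table)
S3_STANDARD_GB_MONTH = 0.023        # $ per GB-month, S3 Standard (quoted above)


def kinesis_long_term_monthly(stored_gb: float, replayed_gb: float = 0.0) -> float:
    """Monthly Kinesis long-term retention cost, including replay volume."""
    return stored_gb * KINESIS_LONG_TERM_GB_MONTH + replayed_gb * KINESIS_RETRIEVAL_GB


def s3_standard_monthly(stored_gb: float) -> float:
    """Monthly S3 Standard storage cost (storage only)."""
    return stored_gb * S3_STANDARD_GB_MONTH


stored_gb = 10_000  # assumed archive size for illustration
kinesis = kinesis_long_term_monthly(stored_gb)
s3 = s3_standard_monthly(stored_gb)

print(f"Kinesis long-term retention: ${kinesis:,.2f}/month")
print(f"S3 Standard:                 ${s3:,.2f}/month")
print(f"Ratio: {kinesis / s3:.1f}x")
```

At these rates the storage component alone is more than four times higher in Kinesis, before any replay charges are added.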
