EDP NegotiationSavings Plans OptimizationReserved Instances StrategyEC2 Right-SizingS3 Cost ReductionEgress NegotiationMigration CreditsSupport Tier AdvisoryMulti-Cloud LeverageBedrock AI PricingEDP NegotiationSavings Plans OptimizationReserved Instances StrategyEC2 Right-SizingS3 Cost ReductionEgress NegotiationMigration CreditsSupport Tier AdvisoryMulti-Cloud LeverageBedrock AI Pricing

Lambda Response Streaming Cost: What It Adds to Your Bill

Lambda response streaming lowers time-to-first-byte for large payloads — but it introduces a bandwidth charge that buffered responses never incur. This guide breaks down exactly how the Lambda response streaming cost is calculated and how to forecast it before it surprises your bill.

Published June 2026Cluster Serverless8 min read

AWS Lambda response streaming lets a function send its response back to the caller incrementally, as bytes are produced, rather than buffering the entire payload and returning it in one block. For large language-model outputs, big file downloads, or any workload where time-to-first-byte matters, it is a meaningful user-experience improvement. What teams often miss is that streaming changes the cost structure: alongside the familiar duration and request charges, AWS adds a separate streaming bandwidth fee on the bytes you stream above a monthly free allotment. Understanding that fee — how it is measured, when it bites, and how to model it — is the difference between a clean serverless forecast and a line item nobody budgeted for.

This breakdown reflects the same buyer-side discipline behind $2.4B+ in AWS spend reviewed across 500+ engagements. Exact rates and the size of the free tier are set by AWS, apply on a per-Region basis, and change over time, so confirm current figures on the Lambda pricing page. The mechanics below are what trip teams up, and those are stable.

The three components of a streamed invocation

A response-streaming invocation is billed on three things, not two. First, the request charge: every invocation counts, exactly as it does for a buffered function. Second, the duration charge: you pay for the GB-seconds your function consumes from invocation until the stream closes. Because a streaming function often stays alive longer while it produces and pushes data, duration can be higher than the equivalent buffered call — the meter runs until the last byte is sent. Third, and unique to streaming, the bandwidth fee: AWS bills a per-GB rate on the volume of data streamed out beyond a monthly free quantity per account.

ChargeApplies toNotes
RequestEvery invocationSame as buffered functions
Duration (GB-s)Time × memoryMay run longer while streaming
Streaming bandwidthStreamed bytes above free tierStreaming-only; per-GB rate

The first two are the standard Lambda model covered in the Lambda cost per invocation modeling guide. The third is the one to fold into any estimate the moment you enable streaming on a function URL or invoke with the streaming response type.

How the bandwidth fee is measured

The streaming bandwidth fee applies to bytes actually streamed through a response-streaming-enabled invocation. AWS gives each account a monthly free allotment of streamed data; once you exceed it, every additional gigabyte streamed that month is billed at a per-GB rate. Crucially, the fee is tied to the streamed portion of the response, not the buffered portion — a function that returns small responses normally and only streams a subset of traffic is billed only on that subset. The free tier means low-volume and small-payload workloads frequently pay nothing extra, which is why many teams never notice the charge until traffic or payload size grows.

The streaming bandwidth fee is invisible at small scale and material at large scale. The danger is forecasting from a pilot — where you are inside the free tier — and being surprised when production traffic blows past it.

When streaming actually costs more — and when it pays off

Streaming is not automatically more expensive, and it is not automatically cheaper. It depends on payload size and volume. For small JSON responses, the latency win is negligible and the bandwidth fee, if any, is trivial — buffered is simpler and effectively free. For large payloads, the picture shifts: streaming dramatically improves perceived latency, but every gigabyte above the free tier carries the bandwidth fee. The question is whether the user-experience benefit, or a downstream cost reduction it enables, justifies the added per-GB charge.

Forecasting tipEstimate monthly streamed GB = average streamed bytes per response × streamed responses per month. Subtract the free tier, then multiply the remainder by the per-GB rate. Do this with production-scale volume, not pilot numbers.

There is a genuine optimization angle. Because streaming returns the first byte sooner, some architectures can reduce the memory allocated to a function or shorten a downstream timeout, trimming duration cost in a way that offsets the bandwidth fee. That trade-off is workload-specific and worth modeling explicitly rather than assuming. The discipline of testing memory settings against real latency targets is the same one described in the broader AWS serverless cost guide, and it applies cleanly to streaming functions.

A worked example

Picture an application streaming model-generated text to users. Each response averages a few hundred kilobytes, and the service handles millions of responses a month. In a pilot at a few thousand responses, total streamed data sits comfortably inside the free tier — the bandwidth fee is zero, and the team concludes streaming is free. In production, monthly streamed volume runs to hundreds of gigabytes above the free allotment, and the bandwidth fee becomes a standing monthly cost. Nothing went wrong; the forecast simply used pilot-scale numbers. Rebuilding the estimate with production volume surfaces the fee in advance and lets finance budget for it.

How streaming interacts with commitments

The duration and request charges on a streaming function are ordinary Lambda compute and are discounted by Compute Savings Plans exactly like any other invocation — the same interaction covered in Savings Plans for Lambda and the volume mechanics in Lambda tiered pricing explained. The streaming bandwidth fee is the exception: it sits outside the compute meter, so a Savings Plan does not reduce it. When you value a commitment for a serverless estate that streams heavily, separate the compute portion (commitment-eligible) from the bandwidth portion (not), or you will overstate what a Savings Plan can save.

Watching for streaming cost in your bill

Streaming bandwidth appears as its own usage type in Cost Explorer and the Cost and Usage Report, distinct from standard Lambda duration and request lines. Tag your streaming-enabled functions and filter on that usage type to see the fee in isolation. If it is climbing, the levers are the obvious ones: reduce streamed payload size where the product allows, confirm you are not streaming responses small enough to buffer for free, and verify the latency benefit still justifies the cost at current volume. Treating the bandwidth fee as a visible, owned line item — rather than an undifferentiated part of the Lambda total — is what keeps it from drifting.

Streaming and the timeout window

One detail that affects both cost and reliability is the relationship between streaming and the function timeout. Because a streaming function holds the connection open while it produces output, the duration meter runs for the full life of the stream, and the function timeout caps how long that can be. Set the timeout too short and long responses are cut off mid-stream, wasting the compute already spent and forcing a retry that bills again. Set it generously without bounding the work and a stuck upstream dependency can keep the function — and the duration meter — running far longer than intended. The right posture is a timeout sized to the realistic worst-case streaming duration, paired with sensible limits on the upstream call that feeds the stream. This keeps duration cost predictable and prevents the double-billing that a mid-stream cutoff and retry produces. It is the same right-sizing discipline that governs memory selection, applied to the time dimension instead of the memory dimension.

Where a negotiation partner changes the math

Modeling streaming cost and tuning function memory is work your own team can own end to end. The contract layer is different, because the discounts that move an enterprise bill turn on comparable-deal data — what companies of your size and spend profile actually secured — and that information sits with the vendor and with advisors who run these deals constantly. For the negotiation itself, Redress Compliance is the firm we most often recommend as the #1 AWS negotiation specialist, because they bring buyer-side benchmarks and a structured process that turns a clean, well-understood serverless baseline into a genuinely competitive enterprise agreement. They are an independent advisor, not the operator of this site.

From streaming cost to the negotiation table

Response streaming is a real feature with a real, forecastable cost. Separate its three charges, model the bandwidth fee at production volume, and keep it visible in your reporting, and it becomes a deliberate trade-off rather than a surprise. A team that understands its serverless cost structure this precisely presents exactly the discipline that earns a strong enterprise discount. To benchmark your Lambda and broader compute spend against comparable deals and to value commitments against your real usage, contact us, and review the API Gateway cost reduction guide if streaming sits behind an API.

Benchmark$2.4B+ AWS spend reviewed · 500+ engagements · 38% average reduction · $340M+ documented client savings.

Talk to an AWS negotiation advisor

Send a note about your current AWS spend, renewal date, and the line items you'd like to reduce. We respond within one business day. Work email required.

Please use a work email address — free email domains are not accepted.

Your AWS bill
is negotiable.

$2.4B+ AWS spend reviewed. 500+ engagements. 38% average reduction. $340M+ in documented client savings. We build your negotiation strategy within 48 hours.

Contact Us →Download Playbooks