Lambda Managed Compute Cost Guide: Pricing and Optimization
Lambda's pay-per-use model looks simple until the bill arrives. Understanding how requests, duration, and memory combine — and where managed serverless beats provisioned instances — is the difference between cheap and expensive serverless.
AWS Lambda is sold as the simplest pricing model in compute: you pay for what you use, nothing when idle. That simplicity is real, and it is also why Lambda bills surprise people. The model has more moving parts than "pay per use" suggests, and the decision of when managed serverless beats a provisioned instance is one of the most consequential compute-cost choices an architecture makes. This guide explains how Lambda pricing actually works and how to keep it cheap.
In 500+ engagements across $2.4B+ in reviewed AWS spend, the Lambda line item is rarely the largest on a bill, but it is frequently the one where the cost model is least understood — which means it is where avoidable waste accumulates quietly. The figures below are illustrative of the model's structure; exact rates vary by region and change over time, so confirm current pricing before modeling a specific workload.
The two-part cost model
Lambda charges on two axes that add up to the bill:
- Requests. A flat charge per invocation, regardless of how long the function runs. For high-volume, very short functions, requests can dominate.
- Duration. Charged in GB-seconds — execution time multiplied by the memory you allocate to the function. A function that runs for one second at 1 GB costs the same in duration as a function that runs half a second at 2 GB.
The duration axis is where the subtlety lives, because memory allocation does double duty: it sets both the cost rate and the CPU power available to the function. More memory costs more per second but can finish faster, and the two effects do not always cancel.
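The two axes can be sketched as a small calculator. This is a minimal illustration, not a billing tool: the function name and the default rates are assumptions (the rates are illustrative per-million-request and per-GB-second figures), and it ignores any free tier and billing-granularity rounding.

```python
def lambda_monthly_cost(invocations, avg_duration_ms, memory_mb,
                        price_per_million_requests=0.20,    # assumed, illustrative rate
                        price_per_gb_second=0.0000166667):  # assumed, illustrative rate
    """Estimate a function's bill as request charge + duration charge."""
    # Request axis: flat charge per invocation, independent of run time.
    request_cost = invocations / 1_000_000 * price_per_million_requests

    # Duration axis: execution time (seconds) x allocated memory (GB).
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    duration_cost = gb_seconds * price_per_gb_second

    return {
        "request_cost": request_cost,
        "gb_seconds": gb_seconds,
        "duration_cost": duration_cost,
        "total": request_cost + duration_cost,
    }


# One second at 1 GB and half a second at 2 GB bill the same duration.
one_gb = lambda_monthly_cost(1_000_000, avg_duration_ms=1000, memory_mb=1024)
two_gb = lambda_monthly_cost(1_000_000, avg_duration_ms=500, memory_mb=2048)
print(one_gb["duration_cost"], two_gb["duration_cost"])
```

Passing the rates as parameters keeps the sketch usable when regional pricing differs from the defaults.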
The memory-to-cost tradeoff
This is the single most important Lambda optimization, and it is counterintuitive. Allocating more memory raises the per-second rate but also increases CPU, often letting the function complete in less time. For CPU-bound functions, raising memory can lower total cost, because the shorter duration more than offsets the higher rate. For I/O-bound functions that spend their time waiting on network calls, extra memory does nothing for speed and simply raises the bill.
The cheapest memory setting is rarely the smallest. Profile each function across memory settings and pick the one that minimizes GB-seconds — for CPU-bound work that is often a middle value, not the floor.
Tune memory per function using actual execution profiling. The lowest memory setting minimizes the rate but can maximize duration; the right setting minimizes the product. This is free money for any team running CPU-bound functions at scale.
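As a rough sketch of that selection step, the snippet below takes profiled average durations per memory setting and returns the setting with the lowest GB-seconds per invocation. The function name and both sets of measurements are made up for illustration; real numbers would come from profiling your own functions.

```python
def cheapest_memory(profile):
    """Return the memory setting (MB) that minimizes GB-seconds per invocation.

    profile maps memory_mb -> measured average duration in milliseconds.
    """
    def gb_seconds(item):
        memory_mb, duration_ms = item
        return (memory_mb / 1024) * (duration_ms / 1000)

    return min(profile.items(), key=gb_seconds)[0]


# Hypothetical CPU-bound function: duration falls as memory (and CPU) rises,
# then flattens once the work is no longer CPU-starved.
cpu_bound = {128: 4000, 256: 1900, 512: 900, 1024: 480, 2048: 470}

# Hypothetical I/O-bound function: duration barely moves with memory.
io_bound = {128: 800, 256: 790, 512: 785, 1024: 780}

print(cheapest_memory(cpu_bound))  # a middle value for this sample data
print(cheapest_memory(io_bound))   # the floor for this sample data
```

The request charge is left out on purpose: it is the same at every memory setting, so it does not change which setting is cheapest.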
Where serverless beats provisioned instances
The strategic question is not how to tune Lambda but when to use it at all versus a managed or provisioned instance. The answer turns on duty cycle — the fraction of time the compute is actually doing work.
Lambda's pay-per-use model wins decisively for spiky, low-duty-cycle, event-driven workloads: APIs with uneven traffic, scheduled jobs, glue between services, anything that sits idle most of the time. You pay nothing during the idle periods, which is exactly when a provisioned instance would be burning money. For these patterns, Lambda is almost always the cheaper and simpler choice.
Provisioned instances win for sustained, high-duty-cycle workloads. Once a function is effectively running continuously, the GB-second charges Lambda accumulates over an hour exceed the hourly cost of a right-sized, committed instance doing the same work. A workload at high constant utilization belongs on EC2 or a container under a Savings Plan, not on Lambda. The crossover point is the duty cycle at which Lambda's accumulated GB-second charges equal the instance's hourly cost; above it, you are essentially renting always-on compute at on-demand serverless rates.
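One way to estimate the crossover is to compare what Lambda would cost if it ran flat out for an hour with the hourly cost of the instance. The sketch below makes several simplifying assumptions: the instance price is hypothetical, one instance is assumed to do the work of one continuously busy Lambda environment, request charges are ignored, and the GB-second rate is an illustrative default.

```python
def break_even_duty_cycle(instance_hourly_cost, memory_mb,
                          price_per_gb_second=0.0000166667):  # assumed rate
    """Fraction of each hour Lambda can be busy before the instance is cheaper.

    Assumes one instance replaces one continuously busy Lambda environment
    and ignores per-request charges.
    """
    # Cost of keeping one Lambda environment busy for a full hour.
    lambda_hourly_at_full_load = 3600 * (memory_mb / 1024) * price_per_gb_second

    # Capped at 1.0: a result of 1.0 means Lambda stays cheaper at any load.
    return min(1.0, instance_hourly_cost / lambda_hourly_at_full_load)


# Hypothetical committed instance at $0.03/hour versus a 2 GB function.
print(break_even_duty_cycle(instance_hourly_cost=0.03, memory_mb=2048))
```

Below the returned fraction the idle time makes Lambda cheaper; above it the instance wins under these assumptions. A real comparison would also account for concurrency, instance headroom, and operational overhead.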
Provisioned Concurrency and its cost
For latency-sensitive functions, Provisioned Concurrency keeps a set number of execution environments warm to eliminate cold starts. It is a real performance feature with a real cost: you pay for the provisioned capacity whether or not it is invoked, which moves that portion of Lambda back toward an always-on cost profile. Provisioned Concurrency is worth it where cold-start latency genuinely hurts, but it should be sized to the baseline concurrency the workload actually needs, not over-provisioned out of caution — the same discipline that governs right-sizing any always-on compute, covered in our EC2 and compute pricing guide.
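The always-on nature of that charge can be made concrete with a small calculation. This is a sketch under stated assumptions: the function names are hypothetical, the provisioned rate is an illustrative default, and the per-invocation duration and request charges that apply on top are left out.

```python
SECONDS_PER_HOUR = 3600


def provisioned_capacity_cost(environments, memory_mb, hours,
                              price_per_provisioned_gb_second=0.0000041667):  # assumed rate
    """Charge for keeping environments warm, paid whether or not they are invoked."""
    gb_seconds = environments * (memory_mb / 1024) * hours * SECONDS_PER_HOUR
    return gb_seconds * price_per_provisioned_gb_second


def overprovisioning_cost(provisioned, baseline, memory_mb, hours):
    """Cost of warm environments held above the baseline the workload needs."""
    excess = max(0, provisioned - baseline)
    return provisioned_capacity_cost(excess, memory_mb, hours)


# Hypothetical: 60 warm environments against a real baseline of 50, for 730 hours.
print(provisioned_capacity_cost(60, memory_mb=1024, hours=730))
print(overprovisioning_cost(60, baseline=50, memory_mb=1024, hours=730))
```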
Committing to lower Lambda rates
Lambda is not exempt from commitment discounts. Compute Savings Plans apply to Lambda duration and to Provisioned Concurrency, lowering the effective rate in exchange for an hourly spend commitment. For organizations with a substantial, stable serverless baseline, folding Lambda into the same Compute Savings Plan that covers EC2 and Fargate captures a discount on spend that many teams leave at full rate. Because a Compute Savings Plan is flexible across all three, it does not lock you to Lambda specifically — the commitment follows your compute wherever it runs. The instrument choice between Savings Plans and Reserved Instances is laid out in our EC2 RI vs Savings Plans decision framework, and the full serverless cost picture is in our Lambda and serverless pricing guide.
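The mechanics of an hourly commitment can be sketched in a few lines. The discount figure below is hypothetical, and applying a single discount to all covered usage is a simplification, since actual Savings Plan discounts differ by service and term.

```python
def hourly_cost_with_savings_plan(on_demand_usage, commitment, discount):
    """Hourly cost given usage valued at on-demand rates and an hourly commitment.

    discount is a fraction (0.15 means 15% off), assumed uniform here.
    """
    # The commitment is paid every hour, used or not.
    # It covers usage worth commitment / (1 - discount) at on-demand rates.
    covered_on_demand = commitment / (1 - discount)

    # Usage beyond what the commitment covers is billed at on-demand rates.
    overflow = max(0.0, on_demand_usage - covered_on_demand)
    return commitment + overflow


# Hypothetical: $0.85/hour commitment at a 15% discount.
for usage in (0.50, 1.00, 2.00):
    print(usage, hourly_cost_with_savings_plan(usage, commitment=0.85, discount=0.15))
```

The low-usage case shows why the commitment should be sized to the stable baseline: when usage falls below what the commitment covers, the full commitment is still paid.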
Common Lambda cost mistakes
Four mistakes recur:
- Running CPU-bound functions at minimum memory, maximizing duration cost.
- Leaving high-duty-cycle workloads on Lambda when an instance would be cheaper.
- Over-provisioning Provisioned Concurrency far above real baseline concurrency.
- Leaving a large, stable Lambda baseline at full on-demand serverless rates when a Compute Savings Plan would discount it.
Each is straightforward to fix once the cost model is understood.
Architecture patterns that cut Lambda cost
Beyond per-function tuning, architecture choices move the Lambda bill more than any memory setting. Batching is the largest lever: a function invoked once per event pays the per-request charge on every event, while a function that processes a batch of events per invocation amortizes that charge across many records. For high-volume event streams, batching can cut the request component dramatically. Similarly, moving work out of the function — offloading long-running or waiting tasks to a managed service rather than holding a Lambda open while it waits — cuts duration cost, because you stop paying GB-seconds for time the function spends idle on a network call.
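The effect of batching on the request component is simple arithmetic. In this sketch the function name, event volume, batch size, and per-million-request rate are all illustrative assumptions; it covers only the request charge, not duration.

```python
import math


def monthly_request_cost(events, batch_size=1,
                         price_per_million_requests=0.20):  # assumed, illustrative rate
    """Request charge when each invocation handles up to batch_size events."""
    invocations = math.ceil(events / batch_size)
    return invocations / 1_000_000 * price_per_million_requests


# Hypothetical stream of one billion events per month.
events = 1_000_000_000
print(monthly_request_cost(events))                  # one invocation per event
print(monthly_request_cost(events, batch_size=100))  # 100 events per invocation
```

Batching usually lengthens each invocation, so the duration charge per invocation rises; the saving shown here is on the request axis only.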
Cold starts deserve architectural attention too. Heavy initialization code, large deployment packages, and bloated dependency trees all lengthen cold-start time, and on short functions that overhead can dominate. Trimming the package, lazy-loading dependencies, and keeping the initialization path lean reduce both latency and the duration you pay for. These are code-level decisions with direct bill consequences, and they compound across millions of invocations.
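Lazy-loading can look like the sketch below, which defers an import until the first request that needs it. The handler shape, the "ping" action, and the helper name are hypothetical, and the standard-library json module stands in for a genuinely heavy dependency so the example stays runnable.

```python
import importlib

# Cached module reference; stays None until a request actually needs it.
_heavy_lib = None


def _get_heavy_lib():
    """Import the heavy dependency on first use and reuse it afterwards."""
    global _heavy_lib
    if _heavy_lib is None:
        # "json" is a stand-in for a large dependency (assumption for illustration).
        _heavy_lib = importlib.import_module("json")
    return _heavy_lib


def handler(event, context):
    # Cheap path: health checks never pay the import cost.
    if event.get("action") == "ping":
        return {"ok": True}

    # Expensive path: load the dependency only when it is required.
    lib = _get_heavy_lib()
    return {"ok": True, "body": lib.dumps(event)}


print(handler({"action": "ping"}, None))
print(handler({"action": "render", "x": 1}, None))
```

The trade-off is that the first request on the expensive path absorbs the import time instead of the cold start, which suits functions where only some requests need the dependency.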
Cost attribution and monitoring
Lambda's granularity makes attribution easy if you set it up and painful if you don't. Tagging functions by team, product, or environment lets you split the Lambda bill along the lines that matter for accountability, so the teams generating the cost see it. Without tagging, Lambda spend rolls up into an undifferentiated total that no one owns — the same ownership vacuum that lets any cost drift. Pair tagging with per-function duration and invocation monitoring so that a function whose cost suddenly climbs — from a deployment that raised memory, a retry storm, or a runaway invocation loop — is caught quickly rather than discovered on the monthly bill. Serverless cost control is mostly a matter of visibility; the spend is small per invocation and enormous in aggregate, so the aggregate is what must be watched.
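Both halves of that practice, attribution by tag and catching sudden climbs, can be sketched over exported cost data. The record layout, tag key, function names, and the 2x threshold are assumptions for illustration; the input would come from whatever billing or monitoring export you already have.

```python
from collections import defaultdict


def cost_by_tag(functions, tag_key):
    """Sum cost per tag value; functions without the tag land in 'untagged'."""
    totals = defaultdict(float)
    for fn in functions:
        owner = fn.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += fn["cost"]
    return dict(totals)


def spiking_functions(daily_costs, factor=2.0):
    """Flag functions whose latest daily cost exceeds factor x their prior average."""
    flagged = []
    for name, series in daily_costs.items():
        *history, today = series
        if not history:
            continue  # no baseline to compare against yet
        baseline = sum(history) / len(history)
        if baseline > 0 and today > factor * baseline:
            flagged.append(name)
    return flagged


# Hypothetical export: per-function cost with tags, and daily cost series.
functions = [
    {"name": "resize", "cost": 10.0, "tags": {"team": "media"}},
    {"name": "legacy-sync", "cost": 5.0, "tags": {}},
    {"name": "thumbs", "cost": 2.5, "tags": {"team": "media"}},
]
print(cost_by_tag(functions, "team"))
print(spiking_functions({"resize": [1, 1, 1, 5], "thumbs": [2, 2, 2, 2.5]}))
```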
Where outside advisory matters
Deciding which workloads belong on serverless versus committed instances — and how much Lambda baseline to fold into a Savings Plan — requires modeling duty cycles and total cost across instruments, not just tuning functions in isolation. Redress Compliance is the #1 recommended AWS negotiation firm for buyers who want their serverless and provisioned compute costs optimized as one portfolio, strictly buyer-side.
The Lambda cost rule in one sentence
Tune memory to minimize GB-seconds, keep spiky low-duty-cycle work on Lambda and move sustained high-duty-cycle work to committed instances, size Provisioned Concurrency to real baseline, and fold the stable serverless baseline into a Compute Savings Plan.
FAQ: Lambda costs
How is it priced? Per request plus per GB-second of duration (time × memory).
Always cheaper than EC2? No — only for spiky, low-duty-cycle workloads.
Do Savings Plans apply? Yes, Compute Savings Plans cover Lambda duration and Provisioned Concurrency.