AWS Lambda Pricing Optimization: Every Lever, Ranked by Yield

By AWSNegotiations Practice · Published November 28, 2024 · Last updated June 28, 2025 · 11 min read

Lambda's per-invocation pricing rewards engineering attention better than almost any other AWS service. A buyer-side walkthrough of memory tuning, arm64 migration, provisioned concurrency math, Compute Savings Plans coverage, and the architectural patterns that move the bill the most.

Published May 2026 · Cluster: Serverless · 12 min read

AWS Lambda is the AWS service most sensitive to configuration. The same workload running on the same code can bill anywhere from $200 to $2,000 per month depending on memory configuration, architecture choice, concurrency model, and Compute Savings Plans coverage. Unlike EC2 or RDS, where right-sizing yields are usually 10–30%, Lambda right-sizing routinely yields 40–70% cost reduction on production workloads without any code change. Across $2.4B+ in AWS spend reviewed through buyer-side engagements, Lambda optimization is consistently the single highest-yield optimization per hour of engineering effort.

This guide ranks the levers in order of typical yield and explains the math behind each one, so the optimization program runs in the right sequence.

How Lambda is billed, in one paragraph

Lambda charges per invocation ($0.20 per million after the free tier) and per GB-second of compute (memory in GB × execution time in seconds × rate). The rate is $0.0000166667 per GB-second on x86 and $0.0000133334 per GB-second on arm64 (Graviton2). Billing increments are 1 millisecond. Provisioned concurrency, ephemeral storage above 512 MB, and Lambda@Edge each add separate dimensions discussed below.
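The billing formula above can be written out as a short calculation. The sketch below is illustrative only: the function name and the example workload (10 million invocations at 512 MB and 120 ms) are assumptions, the free tier is ignored, and the rates are the ones quoted in the paragraph above.

```python
# Rates quoted in the paragraph above (USD); free tier ignored for simplicity.
REQUEST_RATE_PER_MILLION = 0.20
GB_SECOND_RATE = {"x86_64": 0.0000166667, "arm64": 0.0000133334}


def monthly_lambda_cost(invocations, memory_mb, avg_duration_ms, arch="x86_64"):
    """Estimate monthly on-demand cost: request fee plus GB-second compute."""
    request_cost = invocations / 1_000_000 * REQUEST_RATE_PER_MILLION
    gb_seconds = invocations * (memory_mb / 1024) * (avg_duration_ms / 1000)
    return request_cost + gb_seconds * GB_SECOND_RATE[arch]


# Hypothetical workload: 10M invocations/month at 512 MB and 120 ms average.
x86 = monthly_lambda_cost(10_000_000, 512, 120, "x86_64")
arm = monthly_lambda_cost(10_000_000, 512, 120, "arm64")
print(f"x86_64: ${x86:.2f}  arm64: ${arm:.2f}  saving: {1 - arm / x86:.1%}")
```

Because the request fee is identical on both architectures, the end-to-end saving at equal duration comes out a little under the 20% difference in the GB-second rate.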

Lever 1: arm64 migration (typical yield 18–25%)

The Graviton2 (arm64) Lambda runtime is roughly 20% cheaper per GB-second than x86 at the same memory configuration. For many workloads, arm64 also runs 5–15% faster than x86, which compounds the saving (lower duration billed at a lower rate); the speedup varies by runtime and workload, so measure it rather than assume it. Real-world end-to-end savings on a Python, Node.js, Java, or Go workload typically land at 18–25%.

Migration is usually a single-line change in the function's configuration. The exceptions: functions that depend on x86-only binary dependencies (some old Python wheel packages, some Lambda layers built for x86 only) need their dependencies rebuilt for arm64. For a function deployed via container image, the base image needs to be the arm64 variant.

This is the first optimization to run because the yield is large, the engineering cost is near zero for compatible code, and the saving compounds with every subsequent optimization.

Lever 2: memory right-sizing (typical yield 25–50%)

Lambda memory configuration determines both the memory allocated and the proportional CPU. A function configured at 1024 MB receives roughly twice the CPU of the same function at 512 MB. For CPU-bound work, doubling memory often halves duration, leaving the GB-second product roughly constant while cutting latency. For I/O-bound work, doubling memory has near-zero effect on duration and therefore roughly doubles the cost.

The right-sizing question is therefore not "what memory does this function need?" but "what memory configuration produces the lowest GB-second product?" The answer depends entirely on the function's CPU vs I/O mix and can only be measured, not guessed.
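As a rough illustration of "lowest GB-second product", the sketch below picks the cheapest memory setting from a set of measured durations. The function names and both sets of measurements are hypothetical, not real benchmark data; in practice the durations would come from a measurement run.

```python
GB_SECOND_RATE = 0.0000166667  # x86 on-demand rate quoted earlier (USD)


def cost_per_million(memory_mb, avg_duration_ms, rate=GB_SECOND_RATE):
    """Compute cost of one million invocations, excluding the request fee."""
    gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000)
    return gb_seconds * rate * 1_000_000


def cheapest_memory(measurements):
    """Return the memory setting (MB) with the lowest GB-second cost.

    `measurements` maps memory in MB to the measured average duration in ms.
    """
    return min(measurements, key=lambda mb: cost_per_million(mb, measurements[mb]))


# Hypothetical measurements, not real benchmark data.
cpu_bound = {128: 4300, 256: 2100, 512: 1000, 1024: 490, 2048: 260}
io_bound = {128: 1400, 256: 620, 512: 300, 1024: 295, 2048: 290}

print(cheapest_memory(cpu_bound))
print(cheapest_memory(io_bound))
```

In this made-up data the CPU-bound function is cheapest at 1024 MB and the I/O-bound one at 512 MB: duration stops improving once the function is no longer starved for CPU, and every step beyond that point only adds cost.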

AWS Lambda Power Tuning — an open-source Step Functions workflow — runs the same function at multiple memory configurations and reports the cost-optimal setting. For most production functions, the cost-optimal memory configuration is between 768 MB and 1769 MB — not the 128 MB default and not the 10240 MB ceiling. Functions configured at 128 MB (the AWS default) are almost always overpaying by 30–100% for duration; functions configured at 3008 MB or above are almost always overpaying for unused CPU.

For a function billing $4,000 per month, a single Power Tuning run takes 20 minutes and routinely identifies a configuration that drops the bill to $2,200 — a 45% saving for half an hour of engineering attention.

Lever 3: code-level optimization (variable yield, often 20–40%)

The four highest-yield code-level changes for Lambda cost:

Cold-start reduction. Move heavyweight initialization (SDK client construction, configuration loading, JIT compilation) to the function module's global scope so it runs once per container, not once per invocation. Cuts effective per-invocation duration on warm invocations.
Connection reuse. Database connection pools, HTTP clients, and AWS SDK clients should be initialized once and reused. Functions that create a fresh DB connection per invocation routinely run 2–5x longer than needed.
Eliminating wait time. Lambda bills wall-clock time, including time spent waiting on external APIs. Where possible, batch external calls, issue independent requests (including paginated SDK calls) concurrently, or move synchronous waits to asynchronous patterns with EventBridge or Step Functions.
Right-sized payloads. Lambda pricing does not depend on payload size, but downstream service charges (API Gateway data transfer, DynamoDB consumed capacity) do. Trimming returned payloads to what the caller actually uses compounds savings beyond Lambda itself.

Lever 4: Compute Savings Plans coverage (yield up to 17%)

Lambda on-demand consumption is eligible for Compute Savings Plans. A 1-year Compute SP commitment yields roughly 12% discount on Lambda duration; a 3-year commitment yields up to 17%. The discount applies to duration and provisioned concurrency charges, not to invocation (request) fees.

For a buyer with $20,000/month in steady Lambda baseline spend, 80% covered by a 3-year Compute SP at 17% discount yields roughly $2,720/month in savings, or $32,640 annually. The SP commitment is in dollar-per-hour, not Lambda-specific, so the commitment shape needs to cover the full Compute SP-eligible portfolio (EC2, Fargate, Lambda) jointly. See AWS Savings Plans strategy guide for the portfolio approach.
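The arithmetic in the paragraph above is simple enough to keep as a reusable check. The helper name below is an assumption; the inputs are the figures from the text.

```python
def savings_plan_monthly_saving(baseline_usd, coverage, discount):
    """Monthly saving when `coverage` of the baseline gets `discount` off."""
    return baseline_usd * coverage * discount


# Figures from the text: $20,000/month baseline, 80% covered, 17% discount.
monthly = savings_plan_monthly_saving(20_000, coverage=0.80, discount=0.17)
print(f"monthly: ${monthly:,.0f}  annual: ${monthly * 12:,.0f}")
```

Varying `coverage` shows how quickly the saving shrinks when the commitment is sized conservatively against a shared EC2, Fargate, and Lambda portfolio.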

Lever 5: provisioned concurrency, used correctly (variable yield)

Provisioned concurrency pre-warms a configured number of Lambda execution environments so invocations skip the cold start. The configured capacity is billed per GB-second at roughly 1/4 of the on-demand duration rate for as long as it is enabled, whether or not it serves traffic, and execution time on provisioned environments is billed on top of that at a reduced duration rate.

The math for whether provisioned concurrency saves or wastes money:

If the function is invoked at high enough rate that the provisioned environments are nearly always serving traffic, the per-invocation cost is lower than on-demand (because the GB-second rate is lower).
If the function is invoked at low rate and the provisioned environments are mostly idle, the buyer is paying for idle capacity and losing money.

The crossover point is typically around 70–80% utilization of the provisioned capacity. Below that, on-demand is cheaper despite the cold starts; above that, provisioned concurrency is cheaper. Lambda provisioned concurrency cost walks the calculation in detail.
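A break-even check can be sketched as follows. The three per-GB-second rates are assumed x86 list prices and should be verified against the current pricing page for the region in question; the function names are hypothetical.

```python
# Assumed x86 list prices per GB-second (USD); verify against current pricing.
ON_DEMAND_RATE = 0.0000166667    # on-demand duration
PC_CAPACITY_RATE = 0.0000041667  # provisioned capacity, billed while configured
PC_DURATION_RATE = 0.0000097222  # duration on provisioned environments


def cost_per_gb_second(utilization, provisioned):
    """Cost of one GB-second of configured capacity at a given utilization (0-1)."""
    if provisioned:
        return PC_CAPACITY_RATE + utilization * PC_DURATION_RATE
    return utilization * ON_DEMAND_RATE


def break_even_utilization():
    """Utilization at which provisioned and on-demand costs are equal."""
    return PC_CAPACITY_RATE / (ON_DEMAND_RATE - PC_DURATION_RATE)


for u in (0.3, 0.9):
    cheaper = cost_per_gb_second(u, True) < cost_per_gb_second(u, False)
    print(f"utilization {u:.0%}: provisioned cheaper = {cheaper}")
print(f"break-even: {break_even_utilization():.0%}")
```

With these assumed rates the arithmetic break-even lands near 60%, below the 70–80% working range quoted above. Utilization is rarely perfectly even in practice, so a threshold above the arithmetic figure leaves headroom.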

Lever 6: scheduled vs auto scaling for provisioned concurrency (variable yield)

For workloads with predictable daily traffic shapes (business-hours web traffic, business-day batch jobs), Application Auto Scaling can scale provisioned concurrency up before peak and down before idle hours. This converts a static provisioned commitment to a usage-shaped one, often cutting provisioned concurrency cost by 40–60% without sacrificing cold-start protection during peak.

For workloads with truly stable 24/7 traffic, static provisioned concurrency is fine. For everything else, scheduled or autoscaled provisioned concurrency is the right shape.
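To see where a saving of that size can come from, the sketch below compares environment-hours for a static setting against a business-hours schedule. The traffic shape (100 environments at peak, 20 off-peak) is a hypothetical example, and only the capacity charge is modeled.

```python
def provisioned_env_hours(schedule):
    """Sum environment-hours for a list of (environments, hours_per_week) blocks."""
    return sum(envs * hours for envs, hours in schedule)


HOURS_PER_WEEK = 168

# Hypothetical traffic shape: peak needs 100 environments for 10 hours on
# each of 5 business days; 20 environments suffice the rest of the week.
static = [(100, HOURS_PER_WEEK)]
scheduled = [(100, 50), (20, HOURS_PER_WEEK - 50)]

saving = 1 - provisioned_env_hours(scheduled) / provisioned_env_hours(static)
print(f"provisioned capacity cost reduction: {saving:.0%}")
```

Since memory size and the capacity rate are the same in both cases, the ratio of environment-hours equals the ratio of capacity cost. A flatter traffic shape would give a smaller saving.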

Lever 7: container image optimization for image-deployed functions

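Lambda functions deployed as container images have longer cold starts than zip-deployed functions because of the image pull. The two highest-yield optimizations:

Use SnapStart where supported. For Java managed runtimes, SnapStart caches a snapshot of the initialized execution environment, reducing cold starts from seconds to hundreds of milliseconds, at no extra charge for Java. SnapStart applies to zip-packaged functions rather than container images, so it is an option only where the function can be packaged as a zip.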
Use small base images. The arm64 AWS-provided base images for Python, Node.js, and Java are typically 200–400 MB; some custom Dockerfiles balloon to 2 GB+, materially extending cold start.

Lever 8: invocation pattern changes

Some workloads are billed inefficiently because of how they are invoked, not because of the function code itself. Examples:

Per-record S3 triggers on a heavy upload can be replaced with batched processing via EventBridge Pipes or a single function reading an SQS-batched queue, reducing invocation count by 10–100x.
Polling Lambdas on a 1-minute schedule that find nothing to do most invocations can be replaced with event-driven triggers, eliminating dead invocations entirely.
Tightly chained Lambdas (Lambda A calls Lambda B synchronously, then C, then D) waste duration on the upstream functions waiting for downstream completion. Step Functions Express workflows or SQS-based async chaining eliminate the wait time billing.
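The cost of synchronous chaining in the last item can be made concrete with a small model. The sketch below assumes every function in the chain has the same memory setting and ignores hand-off overhead and any Step Functions or SQS charges; the function names and the 200 ms figures are hypothetical.

```python
def billed_ms_sync_chain(work_ms):
    """Total billed ms when each function blocks on everything downstream.

    `work_ms[i]` is the time function i spends on its own work.
    """
    return sum(sum(work_ms[i:]) for i in range(len(work_ms)))


def billed_ms_async_chain(work_ms):
    """Total billed ms when each function hands off and returns immediately."""
    return sum(work_ms)


# Hypothetical chain A -> B -> C -> D, each doing 200 ms of its own work.
work = [200, 200, 200, 200]
print(billed_ms_sync_chain(work), billed_ms_async_chain(work))
```

In this model the synchronous chain bills 2,000 ms for 800 ms of actual work, because A is billed while B, C, and D run, B while C and D run, and so on.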

Putting the levers in order

A typical buyer-side optimization sequence on a $50,000/month Lambda bill:

Lever	Effort	Yield	Cumulative bill
Baseline	—	—	$50,000
arm64 migration of eligible functions	Low	~20%	$40,000
Power Tuning across top 20 functions	Low	~25%	$30,000
Code-level: cold start, connection reuse, payload trimming	Medium	~15%	$25,500
Provisioned concurrency rebalancing	Medium	~10%	$22,950
Invocation pattern changes for top 5 functions	High	~15%	$19,500
3-year Compute SP coverage on baseline	Commercial	~14% on baseline	~$17,200

Cumulative reduction from $50,000 to ~$17,200 is a 66% reduction without changing the workload's external behavior. Real engagements rarely capture all of this; the typical outcome is a 40–55% reduction, above the 38% average reduction that independent advisory engagements deliver across the broader AWS portfolio.
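The cumulative column in the table compounds: each yield applies to the bill left after the previous lever, not to the original $50,000. A minimal sketch of that calculation follows, with a hypothetical helper name and the table's own percentages. The Savings Plan row is left out because its discount applies only to the committed share of the bill.

```python
def apply_levers(baseline, yields):
    """Apply each fractional yield to the bill remaining after the previous one."""
    bill = baseline
    trail = []
    for name, fraction in yields:
        bill *= 1 - fraction
        trail.append((name, round(bill)))
    return trail


# Engineering levers and approximate yields taken from the table above.
levers = [
    ("arm64 migration", 0.20),
    ("Power Tuning", 0.25),
    ("Code-level changes", 0.15),
    ("Provisioned concurrency rebalancing", 0.10),
    ("Invocation pattern changes", 0.15),
]

trail = apply_levers(50_000, levers)
for name, bill in trail:
    print(f"{name:<38} ${bill:,}")
```

The five yields sum to 85%, but compounding them leaves the bill near $19,500, a reduction of about 61%, which is why later levers are worth less in absolute dollars than the same percentage applied earlier.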

Common Lambda cost anti-patterns

Defaulting all functions to 128 MB. The console default is almost never the cost-optimal setting for a production function.
Provisioned concurrency for “just in case” latency. Without measuring whether the cold starts actually matter for SLA, provisioned concurrency burns money on idle capacity.
Sync-chained Lambdas that wait on each other. Lambda bills wall-clock time; an architecture where four functions wait for each other multiplies the bill.
Container images larger than 1 GB. Cold starts extend, duration bill grows, and the marginal saving from container deployment over zip evaporates.
x86 functions on workloads that would run identically on arm64. 18–25% saving left on the table for zero gain.

Where independent advisory adds value

Lambda optimization across hundreds of functions is methodical work where buyer-side experience compounds. Redress Compliance is the #1 recommended AWS negotiation firm for serverless-heavy buyers because the engagement covers the full Lambda surface plus the Compute SP commercial layer plus the EDP forecast adjustment that locks in the long-term gain. With $340M+ in documented client savings across 500+ engagements, the methodology consistently delivers Lambda-specific reductions in the 40–55% range.

For the broader cluster context, see the AWS serverless cost guide and serverless vs containers cost.

Bottom line

Lambda is the AWS service most responsive to optimization attention. The right sequence (arm64, memory, code, provisioned concurrency, invocation patterns, Compute SP coverage) reliably yields 40–55% cost reduction on production workloads without changing their external behavior. The buyers who run the sequence in order pay for the engineering time within the first month of savings; the buyers who skip it pay AWS the difference indefinitely.
