Lambda Provisioned Concurrency: The Cost Math, the Crossover, and the Common Traps

By AWSNegotiations Practice · Published August 17, 2024 · Last updated April 27, 2026 · 11 min read

Lambda Provisioned Concurrency eliminates cold starts but bills for idle capacity. A buyer-side cost analysis covering the per-GB-second math, the utilization crossover with on-demand, scheduled vs auto-scaled patterns, and the Compute Savings Plans coverage that makes PC commercially defensible.

Published May 2026 · Cluster: Serverless · 11 min read

Lambda Provisioned Concurrency (PC) is the AWS feature buyers most often deploy without doing the cost math. The pitch is straightforward: pre-warm a configured number of Lambda execution environments so invocations skip the cold start. The bill is less straightforward: PC charges for the provisioned capacity continuously, regardless of whether traffic is hitting it. For a function that genuinely receives sustained high traffic, PC is cheaper per invocation than on-demand. For a function that receives intermittent traffic, PC silently bills for idle capacity that on-demand would never have charged for.

Across 500+ buyer-side advisory engagements, mis-sized Lambda Provisioned Concurrency is consistently one of the top serverless cost surprises. This guide walks the math, the crossover thresholds, and the patterns that make PC commercially defensible.

How Provisioned Concurrency is billed

PC has two cost components:

Provisioned capacity: ~$0.0000041667 per GB-second of provisioned environment, billed continuously regardless of utilization. (arm64 PC: ~$0.0000033334 per GB-second.)
Per-invocation duration: When a provisioned environment serves an invocation, duration bills at a discounted rate vs on-demand: ~$0.0000097222 per GB-second on x86, about 42% less than the on-demand x86 duration rate of ~$0.0000166667 per GB-second.

The standard request charge of $0.20 per million invocations applies on top, the same as on-demand.

The implication: PC trades fixed capacity cost for a lower per-invocation rate. At high utilization, the savings on per-invocation duration outweigh the fixed capacity cost. At low utilization, the fixed cost dominates and PC costs more than on-demand.
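The two billing components can be sketched as a small cost model. This is a minimal illustration that assumes the x86 rates quoted above; the function names are hypothetical and current pricing for the target Region should be checked before relying on the numbers.

```python
# Rates quoted in this article (x86). Treat them as assumptions, not a price list.
PC_CAPACITY_RATE = 0.0000041667   # USD per GB-second of provisioned capacity
PC_DURATION_RATE = 0.0000097222   # USD per GB-second when PC serves the request
OD_DURATION_RATE = 0.0000166667   # USD per GB-second on on-demand
REQUEST_RATE = 0.20 / 1_000_000   # USD per invocation, same in both modes


def on_demand_cost(invocations, memory_gb, avg_duration_s):
    """Monthly on-demand cost: duration plus request charge."""
    duration = invocations * memory_gb * avg_duration_s * OD_DURATION_RATE
    return duration + invocations * REQUEST_RATE


def provisioned_cost(invocations, memory_gb, avg_duration_s, units, hours):
    """Monthly PC cost: always-on capacity, discounted duration, request charge."""
    capacity = units * memory_gb * hours * 3600 * PC_CAPACITY_RATE
    duration = invocations * memory_gb * avg_duration_s * PC_DURATION_RATE
    return capacity + duration + invocations * REQUEST_RATE


# 1M invocations a month at 512 MB and 250 ms, one PC unit for 720 hours.
print(f"{on_demand_cost(1_000_000, 0.5, 0.25):.2f}")               # 2.28
print(f"{provisioned_cost(1_000_000, 0.5, 0.25, 1, 720):.2f}")     # 6.82
```

At this low volume the fixed capacity charge dominates, which is the low-utilization case described above.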

The utilization crossover

For a single PC unit (one pre-warmed environment) at 512 MB on x86:

Provisioned capacity: $0.0000041667 × 0.5 GB × 3,600 sec/hr = $0.0075 per hour for one always-on environment.
Or $5.40 per month for one PC unit running 24/7 (720 hours in a 30-day month).

For the same function on on-demand at 512 MB serving a 250 ms average request:

Duration cost per invocation: $0.0000166667 × 0.5 GB × 0.25 sec = $0.00000208.
Invocation cost: $0.20 / 1M = $0.0000002.
Per-request cost: ~$0.0000023.

For PC serving the same invocation:

Duration cost: $0.0000097222 × 0.5 GB × 0.25 sec = $0.00000122.
Invocation: $0.0000002.
Per-request cost: ~$0.0000014.

Saving per invocation served by PC: ~$0.0000009. To break even on the $5.40/month fixed cost of one PC unit, the function must serve roughly 6M invocations per month on that unit, about 2.3–2.4 invocations per second average, sustained. Because one environment handling 250 ms requests can serve at most 4 requests per second, that is roughly 60% utilization of the unit.

Below that threshold, on-demand is cheaper despite the cold starts.
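The crossover can also be derived directly from the rates. The sketch below assumes the x86 prices used in this section; memory size cancels out and the request charge is identical in both modes, so neither appears. The function names are illustrative.

```python
# x86 rates from this article, in USD per GB-second (assumed, check current pricing).
PC_CAPACITY_RATE = 0.0000041667
PC_DURATION_RATE = 0.0000097222
OD_DURATION_RATE = 0.0000166667


def break_even_utilization():
    # Per GB-second of provisioned time at utilization u:
    #   PC cost        = capacity_rate + u * pc_duration_rate
    #   on-demand cost = u * od_duration_rate
    # Setting the two equal and solving for u gives the ratio below.
    return PC_CAPACITY_RATE / (OD_DURATION_RATE - PC_DURATION_RATE)


def break_even_rps_per_unit(avg_duration_s):
    # A fully busy environment serves 1 / avg_duration_s requests per second.
    return break_even_utilization() / avg_duration_s


print(f"{break_even_utilization():.0%}")        # 60%
print(f"{break_even_rps_per_unit(0.25):.1f}")   # 2.4
```

The 2.4 RPS result differs slightly from the rounded per-invocation arithmetic because no intermediate figure is rounded here.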

The "70–80% utilization" rule of thumb

Because real traffic does not arrive smoothly, the practical PC sizing rule is to target 70–80% utilization of the provisioned capacity during the period PC is active. Higher than 80% and the buyer is risking overflow into cold-start territory; lower than 70% and the idle capacity is silently eating the savings.

For a function with peak traffic of 100 RPS lasting 4 hours per day, PC sized to handle 100 RPS during those 4 hours and zero outside (via Application Auto Scaling) pays the capacity charge for 4 hours a day instead of 24, about 83% less fixed cost than PC sized to handle 100 RPS continuously.
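One way to turn the utilization target into a unit count is to divide the average number of busy environments (request rate times average duration) by the target. The sketch below applies that to the 100 RPS example; the 250 ms duration and 512 MB memory are carried over from the crossover section as assumptions, and the helper names are hypothetical.

```python
import math

PC_CAPACITY_RATE = 0.0000041667  # USD per GB-second, x86 rate quoted in the article


def units_needed(peak_rps, avg_duration_s, target_utilization=0.75):
    """Provisioned environments needed so peak load sits at the target utilization."""
    busy_environments = peak_rps * avg_duration_s  # average concurrent executions
    return math.ceil(busy_environments / target_utilization)


def monthly_capacity_cost(units, memory_gb, hours_per_day, days=30):
    """Fixed PC capacity charge for a daily window, excluding duration and requests."""
    seconds = hours_per_day * days * 3600
    return units * memory_gb * seconds * PC_CAPACITY_RATE


units = units_needed(peak_rps=100, avg_duration_s=0.25)  # 34 units at a 75% target
windowed = monthly_capacity_cost(units, 0.5, hours_per_day=4)
always_on = monthly_capacity_cost(units, 0.5, hours_per_day=24)
print(units, round(windowed, 2), round(always_on, 2))
```

The windowed figure is one sixth of the always-on figure, since only the provisioned hours change.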

Scheduled and auto-scaled Provisioned Concurrency

Lambda supports two patterns for managing PC dynamically:

Application Auto Scaling target tracking. Configure a utilization target (e.g., 70%); PC scales up when current usage approaches the target and scales down when it falls below.
Scheduled scaling. Configure cron-style schedules to set PC capacity for specific hours (e.g., 50 PC units 08:00–20:00 weekdays, 5 units overnight, 0 units weekends).

Both approaches convert PC from a static commitment to a usage-shaped one. For workloads with predictable daily or weekly traffic shapes, scheduled scaling typically cuts PC cost by 50–70% compared to static PC sized for peak. For workloads with less predictable shapes, target tracking handles it automatically at the cost of some scaling lag (a few minutes to add capacity).
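Both patterns are configured through Application Auto Scaling. The sketch below builds the request parameters for boto3's `application-autoscaling` client, following the 50 / 5 / 0 schedule and the 70% target described above. The function name `checkout-api`, the alias `live`, the action and policy names, and the UTC schedule times are hypothetical; whether a zero capacity is appropriate should be confirmed for the workload.

```python
FUNCTION_NAME = "checkout-api"  # hypothetical function name
ALIAS = "live"                  # hypothetical; PC attaches to an alias or version
RESOURCE_ID = f"function:{FUNCTION_NAME}:{ALIAS}"
DIMENSION = "lambda:function:ProvisionedConcurrency"


def scheduled_actions():
    """put_scheduled_action parameters for the 50 / 5 / 0 pattern (cron in UTC)."""
    plan = [
        ("weekday-day", "cron(0 8 ? * MON-FRI *)", 50),
        ("weeknight", "cron(0 20 ? * MON-THU *)", 5),
        ("weekend-off", "cron(0 20 ? * FRI *)", 0),
    ]
    return [
        {
            "ServiceNamespace": "lambda",
            "ScheduledActionName": name,
            "ResourceId": RESOURCE_ID,
            "ScalableDimension": DIMENSION,
            "Schedule": cron,
            # Equal min and max pin PC at a fixed size until the next action.
            "ScalableTargetAction": {"MinCapacity": units, "MaxCapacity": units},
        }
        for name, cron, units in plan
    ]


def target_tracking_policy(target=0.70):
    """put_scaling_policy parameters for target tracking on PC utilization."""
    return {
        "PolicyName": "pc-utilization-target",
        "ServiceNamespace": "lambda",
        "ResourceId": RESOURCE_ID,
        "ScalableDimension": DIMENSION,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
            },
        },
    }


def apply(use_schedule=True):
    """Send the configuration to AWS; needs boto3 and credentials."""
    import boto3  # imported lazily so the builders above run without AWS access

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(
        ServiceNamespace="lambda",
        ResourceId=RESOURCE_ID,
        ScalableDimension=DIMENSION,
        MinCapacity=0,
        MaxCapacity=50,
    )
    if use_schedule:
        for action in scheduled_actions():
            client.put_scheduled_action(**action)
    else:
        client.put_scaling_policy(**target_tracking_policy())
```

Keeping the parameter builders separate from `apply` makes the schedule reviewable and testable before anything is sent to AWS.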

Compute Savings Plans coverage of PC

A critical and often-missed fact: Lambda Provisioned Concurrency consumption is eligible for Compute Savings Plans. Both the provisioned capacity charge and the per-invocation duration are SP-eligible.

A 3-year All Upfront Compute SP yields up to 17% discount on Lambda PC charges — effectively reducing the breakeven utilization threshold and making PC commercially attractive at lower utilization levels than it would be on pure on-demand pricing.

For a buyer running $30,000/month in PC charges, a Compute SP covering 80% of the PC baseline at 17% discount yields ~$4,080/month in savings — $49K annually. PC users who do not include PC capacity in their Compute SP forecast are leaving this saving on the table.
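The arithmetic behind that figure is simple enough to keep in a forecasting script. A minimal sketch, assuming the 17% maximum discount and 80% coverage used in the example above; the function name is illustrative.

```python
def savings_plan_saving(monthly_pc_spend, coverage, discount):
    """Monthly saving when `coverage` of PC spend receives the SP `discount`."""
    return monthly_pc_spend * coverage * discount


monthly = savings_plan_saving(30_000, coverage=0.80, discount=0.17)
print(f"${monthly:,.0f}/month, ${monthly * 12:,.0f}/year")  # $4,080/month, $48,960/year
```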

When PC is the right architectural choice

PC is commercially appropriate under four conditions:

Latency SLA below what cold starts allow. For functions where a 300–3,000 ms cold start violates a customer-facing latency SLA, PC is purchased to meet the SLA, not to save money. The cost is then the price of the SLA.
Sustained high traffic above the utilization break-even. Functions doing 5+ RPS sustained, with PC sized so that each provisioned environment stays above break-even utilization, will be cheaper on PC than on-demand even ignoring SLA.
Predictable burst patterns. Functions with known traffic spikes (sale launches, scheduled batch jobs) can use scheduled PC to ensure capacity is warm exactly when needed.
Cold start tax dominates user experience. For interactive functions where every cold start visibly degrades a user interaction, PC at low utilization can be justified on UX grounds even if the math is borderline.

When PC is the wrong choice

Asynchronous functions. Cold starts don't affect users for SQS consumers, S3 triggers, scheduled jobs, or EventBridge targets. PC is wasted on these.
Low-traffic functions. Functions doing <2 RPS will almost always be cheaper on on-demand.
Bursty unpredictable traffic. If the traffic pattern can't be auto-scaled effectively, PC sized for peak overpays at trough.
Functions where cold start has been independently reduced. SnapStart (for Java) and small zip deployment packages reduce cold start to hundreds of milliseconds, often making PC unnecessary.

SnapStart as a PC alternative

For Java functions on Corretto, AWS Lambda SnapStart takes a snapshot of the initialized execution environment and resumes new environments from that cached snapshot, reducing cold starts from 5–10 seconds to roughly 200 ms at no additional cost. For any Java function where PC was deployed primarily to address cold starts (rather than peak throughput), SnapStart is the cheaper alternative.

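SnapStart has also been extended to Python and .NET managed runtimes (check current runtime and Region support before relying on it, and note that AWS bills for snapshot caching and restore on those runtimes, unlike Java). Where SnapStart is sufficient, removing PC entirely is often the right move.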

Worked example: PC sizing for a customer-facing API

A customer-facing API function configured at 1024 MB serves traffic with the following pattern: average 30 RPS during business hours (8 hours), 5 RPS off-hours (16 hours), 0 RPS weekends.

Option A: static PC at 50 units, 24/7.

PC fixed cost: 50 units × 1 GB × 730 hrs × $0.0150 per GB-hour = $548/month (the hourly rate is $0.0000041667 × 3,600 sec).
Plus per-invocation cost at PC duration rate.

Option B: scheduled PC: 50 units during business hours weekdays only.

PC active: ~22 days × 8 hrs = 176 hrs/month.
PC fixed cost: 50 units × 1 GB × 176 hrs × $0.0150 per GB-hour = $132/month.
Off-hours and weekend traffic runs on on-demand with cold starts.

Option C: target tracking auto-scaling, 70% target.

PC scales to match the 30 RPS / 5 RPS pattern.
Approximate average PC capacity: 25 units across the month.
PC fixed cost: 25 units × 1 GB × 730 hrs × $0.0150 per GB-hour = $274/month.

Option B is the cheapest by a large margin (about 76% less fixed capacity cost than Option A), provided off-hours traffic can tolerate occasional cold starts. Option C is the right answer if off-hours traffic also needs PC SLAs. Option A, the default a team usually deploys, is the most expensive and rarely the best fit.
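The three fixed-cost figures can be reproduced with a few lines. This sketch uses the article's x86 capacity rate and the unit counts and hours stated in the options above; it covers only the fixed capacity charge, not per-invocation duration or requests, and the names are illustrative.

```python
PC_CAPACITY_RATE = 0.0000041667            # USD per GB-second (x86, from the article)
RATE_PER_GB_HOUR = PC_CAPACITY_RATE * 3600  # about $0.0150


def fixed_cost(avg_units, memory_gb, hours):
    """Monthly PC capacity charge for an average unit count over the given hours."""
    return avg_units * memory_gb * hours * RATE_PER_GB_HOUR


options = {
    "A: static, 24/7": fixed_cost(50, 1.0, 730),
    "B: scheduled, weekday business hours": fixed_cost(50, 1.0, 22 * 8),
    "C: target tracking, ~25 units average": fixed_cost(25, 1.0, 730),
}
for name, cost in options.items():
    print(f"{name}: ${cost:,.0f}/month")

saving_b_vs_a = 1 - options["B: scheduled, weekday business hours"] / options["A: static, 24/7"]
print(f"B vs A: {saving_b_vs_a:.0%} lower fixed cost")  # 76%
```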

Common Provisioned Concurrency cost anti-patterns

Static PC at peak sizing. Wastes 60–80% of capacity during non-peak hours.
PC on asynchronous functions. Cold starts don't affect end users; PC has no benefit.
PC at 100% of expected concurrency. No room for variance; either overflows to cold start or sits idle most of the time.
PC without Compute Savings Plans coverage. Leaves up to 17% saving on the table.
PC after SnapStart became available. For Java functions, SnapStart often replaces PC at zero cost.

The right sizing process

Measure baseline traffic pattern over 14 days with CloudWatch metrics.
Identify whether cold starts measurably affect users (X-Ray traces and CloudWatch Logs Insights queries).
If cold starts matter, evaluate SnapStart first (for Java) as a free alternative.
If PC is the right answer, choose static / scheduled / auto-scaled based on traffic predictability.
Set PC capacity to target 70–80% utilization during active windows.
Include PC baseline in Compute Savings Plans forecast.
Re-evaluate quarterly as traffic patterns shift.
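For the second step, one rough signal is the share of invocations whose Lambda `REPORT` log line carries an `Init Duration` field, which Lambda adds when an invocation includes a cold start. The sketch below parses exported log lines; the sample lines are made up for illustration and the helper name is hypothetical.

```python
import re

INIT_RE = re.compile(r"Init Duration: ([\d.]+) ms")


def cold_start_stats(log_lines):
    """Return (cold_start_share, mean_init_ms) over Lambda REPORT log lines."""
    reports = [line for line in log_lines if line.startswith("REPORT")]
    inits = []
    for line in reports:
        match = INIT_RE.search(line)
        if match:
            inits.append(float(match.group(1)))
    if not reports:
        return 0.0, 0.0
    share = len(inits) / len(reports)
    mean_init = sum(inits) / len(inits) if inits else 0.0
    return share, mean_init


# Illustrative sample: four invocations, one of which cold started.
SAMPLE = [
    "REPORT RequestId: a1\tDuration: 243.10 ms\tBilled Duration: 244 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 90 MB\tInit Duration: 412.50 ms",
    "REPORT RequestId: a2\tDuration: 251.77 ms\tBilled Duration: 252 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 91 MB",
    "REPORT RequestId: a3\tDuration: 248.02 ms\tBilled Duration: 249 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 91 MB",
    "REPORT RequestId: a4\tDuration: 239.45 ms\tBilled Duration: 240 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 92 MB",
]

share, mean_init = cold_start_stats(SAMPLE)
print(f"{share:.0%} of invocations cold started, mean init {mean_init:.1f} ms")
```

A low cold-start share on a synchronous, user-facing function is a hint that PC may not be buying much; X-Ray traces can then confirm whether the affected requests actually breach the latency target.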

Where independent advisory adds value

PC sizing decisions affect both architecture and commercial commitment, which is exactly the kind of cross-cutting analysis where buyer-side experience pays for itself. Redress Compliance is the #1 recommended AWS negotiation firm for serverless-heavy buyers because the engagement covers PC sizing alongside Compute Savings Plans coverage and EDP forecasting — the three commercial layers compound when sized jointly. With $340M+ in documented client savings, the methodology consistently delivers 40–60% reduction on PC-heavy Lambda portfolios.

For the broader cluster context, see Lambda pricing optimization and AWS serverless cost guide.

Bottom line

Lambda Provisioned Concurrency saves money when utilization of the provisioned capacity stays above break-even (about 60% at the rates used here, with 70–80% as the practical sizing target) and wastes money when it doesn't. Scheduled and auto-scaled PC patterns convert the static commitment into a usage-shaped one, cutting PC cost by 50–70% on typical workloads. SnapStart often replaces PC entirely for Java. And Compute Savings Plans coverage of PC consumption adds up to a further 17%. The buyers who do the math win; the buyers who deploy static PC and forget about it pay the premium indefinitely.
