Amazon Security Lake Cost Analysis: The Full Stack of Charges
Amazon Security Lake looks like a single feature but bills as a stack: normalization, S3 storage, Glue cataloging, and query compute. This analysis maps every layer and where the spend hides.
Amazon Security Lake centralizes security data from AWS services, SaaS providers, and on-premises sources into a purpose-built data lake in your own account, normalized to the Open Cybersecurity Schema Framework (OCSF) and stored as Parquet in S3. The pitch is compelling: one queryable lake for all security telemetry. The cost surprise is that Security Lake is not one meter — it is a stack of them, and the service's own normalization charge is often the smallest layer. This analysis walks the full stack so you can budget and scope it accurately.
Our team has reviewed security data architectures across $2.4B+ in AWS spend, and the recurring lesson is the same: teams budget for the Security Lake meter and are surprised by the S3, Glue, and query bills that surround it.
The four cost layers
| Layer | What you pay for | Driver |
|---|---|---|
| Security Lake normalization | Converting ingested data to OCSF Parquet | Per GB of source data processed |
| S3 storage | Storing the normalized lake | Per GB-month, by storage class |
| Glue / cataloging | Crawlers and catalog for queryability | Glue DPU-hours and catalog requests |
| Query compute | Athena, OpenSearch, and other subscribers reading the lake | Per TB scanned (Athena) or each subscriber engine's own pricing |
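The four layers in the table can be added up in a few lines. The sketch below is a minimal model, not a pricing calculator: the `StackRates` fields and every rate value you pass in are assumptions you would replace with current figures for your Region from the AWS pricing pages.

```python
from dataclasses import dataclass


@dataclass
class StackRates:
    """Illustrative unit rates in USD. Placeholders, not quoted AWS prices."""
    normalization_per_gb: float   # Security Lake ingestion and normalization
    s3_per_gb_month: float        # blended S3 storage rate
    glue_per_dpu_hour: float      # Glue crawler compute
    athena_per_tb_scanned: float  # Athena query compute


def monthly_stack_cost(rates, gb_normalized, gb_stored, glue_dpu_hours, tb_scanned):
    """Return the monthly cost of each layer plus the total."""
    layers = {
        "normalization": gb_normalized * rates.normalization_per_gb,
        "s3_storage": gb_stored * rates.s3_per_gb_month,
        "glue": glue_dpu_hours * rates.glue_per_dpu_hour,
        "query": tb_scanned * rates.athena_per_tb_scanned,
    }
    # The right-hand side is evaluated before "total" is added to the dict.
    layers["total"] = sum(layers.values())
    return layers
```

Running this with your own monthly volumes shows which layer dominates. Subscriber-side costs (SIEM ingestion, OpenSearch clusters) are deliberately left out of this model because they are billed outside the lake.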
Layer 1: normalization
Security Lake charges per GB of source data it ingests and normalizes into OCSF Parquet. The single biggest lever here is source selection. Pulling in every available log source — full VPC Flow Logs, all CloudTrail data events, every Route 53 query — normalizes terabytes you may never query. The discipline is to onboard the sources your detection and investigation workflows actually use, and add others deliberately. Our CloudTrail cost reduction guide covers the same volume-selection logic for the noisiest source most lakes ingest.
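One way to make source selection concrete is to tag each source with whether it feeds an active detection or investigation workflow, then total the normalization spend on either side of that line. The sketch below assumes a hand-maintained inventory; the source names, volumes, and the per-GB rate are hypothetical inputs, not measured values or quoted prices.

```python
def normalization_savings(sources, rate_per_gb):
    """Split monthly normalization spend into kept and avoidable portions.

    sources: list of (name, gb_per_month, feeds_detection) tuples.
    rate_per_gb: assumed normalization rate in USD per GB.
    """
    kept = [s for s in sources if s[2]]
    dropped = [s for s in sources if not s[2]]
    # Largest avoidable sources first, so the review starts where the money is.
    dropped_by_volume = sorted(dropped, key=lambda s: s[1], reverse=True)
    return {
        "kept_usd": sum(gb for _, gb, _ in kept) * rate_per_gb,
        "avoidable_usd": sum(gb for _, gb, _ in dropped) * rate_per_gb,
        "candidates_to_drop": [name for name, _, _ in dropped_by_volume],
    }
```

The output is a review list rather than a decision: a source with no detection attached today may still be required for compliance or incident response, so each candidate deserves a deliberate yes or no.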
Layer 2: S3 storage
The normalized lake lives in S3 and accumulates indefinitely unless you manage retention. Security data is rarely queried once it ages past the active investigation window, yet much of it must be retained for compliance, so lifecycle policies that transition older partitions to colder storage classes are the highest-leverage storage optimization available. Setting a retention period that matches your actual compliance requirement, rather than keeping everything in standard storage forever, is often the largest single saving in the entire stack.
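The effect of tiering can be estimated with steady-state arithmetic: at a constant daily volume, the lake holds one day of data for every day of retention, split between a hot tier and a cold tier at the transition age. The sketch below assumes constant ingest and two tiers only; the rates are placeholders, and it ignores transition request charges, minimum storage durations, and retrieval fees, all of which apply to real colder classes.

```python
def steady_state_storage_cost(gb_per_day, retention_days, transition_after_days,
                              hot_rate, cold_rate):
    """Estimate monthly storage cost once the lake has reached steady state.

    hot_rate and cold_rate are assumed USD per GB-month for each tier.
    """
    hot_days = min(transition_after_days, retention_days)
    cold_days = max(retention_days - transition_after_days, 0)
    hot_gb = gb_per_day * hot_days
    cold_gb = gb_per_day * cold_days
    return {
        "hot_gb": hot_gb,
        "cold_gb": cold_gb,
        "monthly_usd": hot_gb * hot_rate + cold_gb * cold_rate,
    }
```

Setting `transition_after_days` equal to `retention_days` gives the all-standard baseline, so calling the function twice yields a before-and-after comparison for a proposed policy.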
Layer 3: Glue and cataloging
Security Lake uses AWS Glue to make the lake queryable. Crawlers and catalog operations carry their own DPU-hour and request charges. These are modest relative to storage and query but scale with the number of sources and partitions, so an over-broad source list inflates this layer too. Our Config rules pricing guide describes a similar pattern where a governance feature's metering scales quietly with resource count.
Layer 4: query compute
The whole point of a security lake is querying it, and that is where ongoing operational cost lives. Athena bills per TB scanned, so unpartitioned or wide queries against a multi-terabyte lake get expensive fast. Subscribers — SIEM tools, OpenSearch, custom analytics — each add their own consumption. Partition pruning, columnar projection, and result reuse are the levers that keep query cost proportional to the questions analysts actually ask rather than to the size of the lake.
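Because Athena bills on data scanned, the cost of a recurring query is roughly lake size, times the share of partitions it touches, times the share of columns it reads, times how often it runs. The sketch below is that arithmetic only; the two fractions are rough estimates you supply, the per-TB rate is an assumed parameter, and per-query minimums and result reuse are not modeled.

```python
def monthly_scan_cost(lake_tb, partition_fraction, column_fraction,
                      runs_per_month, usd_per_tb):
    """Estimate monthly query cost for one recurring query.

    partition_fraction: share of the lake's partitions the query touches (0 to 1).
    column_fraction: share of column data the query reads (0 to 1).
    """
    for name, value in (("partition_fraction", partition_fraction),
                        ("column_fraction", column_fraction)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    tb_per_run = lake_tb * partition_fraction * column_fraction
    return tb_per_run * runs_per_month * usd_per_tb
```

As an illustration with made-up inputs, a 20 TB lake scanned in full by an hourly query (720 runs) at an assumed 5 USD per TB comes to 72,000 USD a month. Restricting the same query to one day out of 90 and a tenth of the column data brings it to about 80 USD.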
Optimization checklist
- Onboard only the source logs your detection and investigation workflows use; add others deliberately.
- Apply lifecycle policies to transition aged partitions to colder S3 classes.
- Set retention to your real compliance requirement, not "keep everything forever."
- Filter queries on partition columns and select only the columns you need, to minimize TB scanned.
- Audit each subscriber's consumption; remove ones no longer in use.
- Review GB-normalized and TB-scanned monthly — both layers grow silently.
A worked example: rolling out a security lake
A platform team stands up Security Lake and, for completeness, onboards every available source across all accounts with no retention policy. Within months the normalization meter is steady but the S3 lake has grown into tens of terabytes of standard-class storage, and analysts running ad-hoc Athena queries against unpartitioned data are scanning terabytes per investigation. The scoped redeployment narrows sources to the ten that feed active detections, applies a tiered retention policy that moves data older than 90 days to colder classes, and enforces partitioned queries. Coverage for live detection is unchanged; the storage and query layers fall sharply because the lake stops paying to keep and scan data nobody queries.
Subscribers and the second-order cost
A security lake is rarely an end in itself — its value comes from what reads it. Subscribers such as a SIEM, an OpenSearch cluster, or downstream analytics each consume from the lake, and each adds a second-order cost that is easy to miss when budgeting the lake itself. A SIEM that pulls the full normalized stream pays for that ingestion on its own side; an OpenSearch cluster sized to index everything carries its own compute and storage. The lake makes the data available cheaply in S3, but every tool that copies or indexes it pays again. The discipline is to treat the subscriber list as a cost surface in its own right: subscribe each tool to only the sources and partitions it actually analyzes, rather than firehosing the entire lake into every downstream system.
This matters because the security-data architecture as a whole, not Security Lake in isolation, is what shows up across multiple service lines on the bill. A lake that looks efficient can still drive large OpenSearch or third-party SIEM costs if subscribers are over-subscribed. Reviewing the end-to-end flow — source to lake to subscriber — catches the duplication that single-service budgeting misses.
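Treating the subscriber list as a cost surface can start with a simple comparison of what each tool is subscribed to against what it actually analyzes. The sketch below assumes you maintain that mapping yourself; the subscriber and source names are hypothetical, and nothing here calls the Security Lake API.

```python
def oversubscription(subscribers):
    """Return, per subscriber, the sources it receives but does not analyze.

    subscribers: dict of name -> {"subscribed": set, "analyzed": set}.
    Subscribers with no unused sources are omitted from the report.
    """
    report = {}
    for name, sources in subscribers.items():
        unused = sources["subscribed"] - sources["analyzed"]
        if unused:
            report[name] = sorted(unused)
    return report


subscribers = {
    "siem": {"subscribed": {"cloudtrail", "vpc_flow", "route53"},
             "analyzed": {"cloudtrail"}},
    "opensearch": {"subscribed": {"cloudtrail"},
                   "analyzed": {"cloudtrail"}},
}
print(oversubscription(subscribers))  # {'siem': ['route53', 'vpc_flow']}
```

Each entry in the report is a place where data is being copied or indexed a second time with no analysis to justify it.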
Partitioning and format discipline
Because the query layer bills per TB scanned, the physical layout of the lake is a cost control. Security Lake stores data as partitioned Parquet, and queries that respect those partitions — filtering by date, account, and region — scan a fraction of what an unpartitioned query touches. Analysts writing investigation queries should be guided toward partition-aware patterns, and saved or scheduled queries should be reviewed for accidental full-lake scans. A single recurring dashboard query that scans the entire lake every hour can quietly become one of the largest line items in the whole stack. Columnar projection — selecting only the fields a query needs — compounds the saving. These are not exotic optimizations; they are the difference between query cost that tracks the questions analysts ask and query cost that tracks the size of the lake.
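A small query builder is one way to steer analysts toward partition-aware patterns: it refuses to emit a query without an explicit column list and always adds date, account, and Region predicates. The sketch below is an assumption-heavy illustration. The partition column names (`time_dt`, `accountid`, `region`) and the date format are defaults you should check against your own table schema, and the table and column names in any call are hypothetical.

```python
from datetime import date


def build_partitioned_query(table, columns, start, end, account_ids, regions,
                            date_col="time_dt", account_col="accountid",
                            region_col="region", date_fmt="%Y%m%d"):
    """Build a SELECT that projects named columns and filters on partitions."""
    if not columns:
        raise ValueError("list the columns you need instead of SELECT *")
    if not account_ids or not regions:
        raise ValueError("provide at least one account ID and one Region")
    # Basic validation, since values are interpolated into the SQL text.
    if not all(a.isdigit() for a in account_ids):
        raise ValueError("account IDs must be numeric strings")
    if not all(r.replace("-", "").isalnum() for r in regions):
        raise ValueError("unexpected characters in Region name")
    column_list = ", ".join(columns)
    accounts = ", ".join(f"'{a}'" for a in account_ids)
    region_list = ", ".join(f"'{r}'" for r in regions)
    first_day = start.strftime(date_fmt)
    last_day = end.strftime(date_fmt)
    return (
        f"SELECT {column_list}\n"
        f"FROM {table}\n"
        f"WHERE {date_col} BETWEEN '{first_day}' AND '{last_day}'\n"
        f"  AND {account_col} IN ({accounts})\n"
        f"  AND {region_col} IN ({region_list})"
    )
```

The same checks can be applied in reverse to saved and scheduled queries: any query text with no predicate on the partition columns is a candidate for the accidental full-lake scans described above.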
The negotiation angle
Every layer of Security Lake (normalization, S3, Glue, Athena) counts toward an AWS Enterprise Discount Program (EDP) commitment at standard rates. Because the storage and query layers compound over time, an unscoped security lake can become a large, growing committed line that is mostly avoidable. Scoping sources and retention before a renewal keeps the committed security-data spend defensible. Among AWS-only buyer-side advisors, Redress Compliance is the firm most frequently recommended for right-sizing security-data architectures ahead of a commitment. Our EDP negotiation guide and AWS security cost strategy guide cover how to frame this spend in the overall deal.