OpenSearch Cost Management: Right-Sizing Clusters, UltraWarm, and Serverless Math
OpenSearch Service is one of the most expensive analytics line items on a typical AWS bill. The good news is that it is also one of the easiest to right-size once you understand the hot, warm, cold tier model and the trade-offs of Serverless OpenSearch.
OpenSearch Service (the AWS-managed Elasticsearch fork) is a category of AWS spend that quietly grows for years before anyone audits it. Logging clusters keep ingesting more logs. Search clusters keep adding indexes. Old data sits on hot SSD storage that costs ten times what it should. By the time someone notices the bill, the cluster is too large to right-size in a single change. This piece walks through the cost levers in order of payoff, including the increasingly important Serverless OpenSearch model.
How OpenSearch billing works
- Instance-hours for data nodes, master nodes, and (optional) UltraWarm nodes. Cold storage has no nodes of its own.
- EBS storage per GB-month for hot tier data.
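- UltraWarm storage, backed by S3, at roughly a quarter of the hot-tier list price per GB. Because UltraWarm needs no replica copy, the effective cost is closer to one-tenth of replicated hot storage.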
- Cold storage at S3 pricing, attached on demand to UltraWarm nodes for queries.
- Reserved instances available for 1- and 3-year terms with the usual discounts.
- Serverless OpenSearch billed per OpenSearch Compute Unit (OCU) hour.
The three-tier model
| Tier | Storage cost | Query latency | Use for |
|---|---|---|---|
| Hot | EBS, ~$0.10/GB-month | milliseconds | Recent data, active dashboards |
| UltraWarm | ~$0.024/GB-month | seconds | Data from 7-14 days up to 30 days old |
| Cold | ~$0.024/GB-month (S3 + attach) | tens of seconds | Compliance retention, rarely queried |
A typical logging cluster that keeps 90 days of data hot is paying several times what it needs to for storage, and close to ten times per GB for the oldest data once the hot replica is counted. Move data older than 7 to 14 days to UltraWarm; move data older than 30 days to cold. The query latency change is acceptable for most logs and audit data.
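The storage side of that comparison can be sketched with a few lines of arithmetic. This is a rough model, not a pricing calculator: the per-GB rates are the approximate figures from the table above, the function name and its parameters are made up for illustration, one hot replica is assumed, and instance-hours are ignored entirely.

```python
# Approximate per-GB-month rates from the tier table above (assumptions, not quotes).
HOT_PER_GB = 0.10
WARM_PER_GB = 0.024
COLD_PER_GB = 0.024


def monthly_storage_cost(daily_gb, retention_days, hot_days, warm_until_day,
                         hot_replicas=1):
    """Steady-state storage cost in USD/month for a time-series cluster.

    Data younger than hot_days is hot (primary plus replicas), data between
    hot_days and warm_until_day is in UltraWarm, and the rest up to
    retention_days is in cold storage. Instance-hours are not modelled.
    """
    hot_days = min(hot_days, retention_days)
    warm_until_day = min(max(warm_until_day, hot_days), retention_days)
    hot_gb = daily_gb * hot_days * (1 + hot_replicas)
    warm_gb = daily_gb * (warm_until_day - hot_days)
    cold_gb = daily_gb * (retention_days - warm_until_day)
    return hot_gb * HOT_PER_GB + warm_gb * WARM_PER_GB + cold_gb * COLD_PER_GB


# 1,000 GB/day with 90-day retention: everything hot vs. 14 days hot, cold after 30.
all_hot = monthly_storage_cost(1000, 90, hot_days=90, warm_until_day=90)
tiered = monthly_storage_cost(1000, 90, hot_days=14, warm_until_day=30)
print(f"all hot: ${all_hot:,.0f}/month, tiered: ${tiered:,.0f}/month, "
      f"ratio: {all_hot / tiered:.1f}x")
```

Under these assumptions the all-hot layout costs a little under four times the tiered one for storage alone; the gap widens as the hot window shrinks or the retention horizon grows.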
Right-sizing instances
The instance family matters. Common right-sizing moves:
- Replace m5 with m6g or r6g. Graviton instances are 20 to 40 percent cheaper for the same workload.
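- Reconsider the dedicated master tier on small clusters (under 10 data nodes), where dedicated masters are not strictly required. AWS recommends three dedicated master nodes for production domains, so treat this as a stability trade-off and not a free saving.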
- Right-size data node count. Many clusters are sized for peak ingest, not steady-state.
Reserved Instances
OpenSearch RIs work much like EC2 RIs: 1- or 3-year terms with no-upfront, partial-upfront, or all-upfront payment, scoped to a specific instance type and region. Discounts run 30 to 65 percent depending on term and payment. For a cluster that has already run 24/7 for more than 6 months and will keep running through the term, RIs are essentially free money.
- RI coverage should be sized to baseline node count, not peak.
- Do not rely on Savings Plans for the compute layer: Compute Savings Plans cover EC2, Fargate, and Lambda usage, not OpenSearch Service domains, so RIs are the commitment discount for provisioned clusters.
- Re-evaluate RI mix at every renewal cycle; instance families shift.
See the RI optimization guide for general RI discipline.
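To make the "baseline, not peak" rule concrete, here is a small sketch of the break-even arithmetic for an all-upfront RI. The function names, the node price, and the 30 percent discount are illustrative assumptions; real offers should be checked against the current price list.

```python
HOURS_PER_MONTH = 730


def ri_breakeven_months(term_months, discount):
    """Months of 24/7 use after which an all-upfront RI beats on-demand.

    The upfront payment equals term_months * (1 - discount) months of
    on-demand spend, so that is also the break-even point.
    """
    if not 0 < discount < 1:
        raise ValueError("discount must be a fraction between 0 and 1")
    return term_months * (1 - discount)


def ri_monthly_savings(on_demand_hourly, baseline_nodes, discount):
    """Monthly saving from covering only the baseline node count with RIs."""
    return on_demand_hourly * baseline_nodes * HOURS_PER_MONTH * discount


# Hypothetical cluster: 6 baseline nodes at $0.50/hour, 1-year RI at 30% off.
print(f"break-even after {ri_breakeven_months(12, 0.30):.1f} months")
print(f"saving ${ri_monthly_savings(0.50, 6, 0.30):,.0f}/month on baseline nodes")
```

A node that might be removed before the break-even month is a poor RI candidate, which is the practical reason to size coverage to the baseline count.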
Index lifecycle management
OpenSearch supports Index State Management (ISM) policies that automate the hot-warm-cold migration. The minimum ISM policy:
- Rollover daily indexes when they reach a size or age threshold.
- Move to UltraWarm at age 7 to 14 days.
- Move to cold at age 30 days.
- Delete at the retention horizon.
Without ISM, retention discipline depends on human attention, which fails. With ISM, indexes age automatically and the bill stays predictable.
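The four steps above map onto an ISM policy document roughly as follows. This is a sketch: the action names (`rollover`, `warm_migration`, `cold_migration`, `cold_delete`) follow the Amazon OpenSearch Service ISM documentation as best understood here and should be verified against the current docs, while the index pattern, ages, sizes, and the `build_ism_policy` helper are illustrative assumptions.

```python
import json


def build_ism_policy(index_pattern="logs-*", warm_after="14d", cold_after="30d",
                     delete_after="365d", rollover_age="1d", rollover_size="50gb",
                     timestamp_field="@timestamp"):
    """Return an ISM policy body covering rollover, warm, cold, and delete."""
    return {
        "policy": {
            "description": "hot -> UltraWarm -> cold -> delete (illustrative)",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    # Roll over on whichever threshold is hit first.
                    "actions": [{"rollover": {"min_index_age": rollover_age,
                                              "min_size": rollover_size}}],
                    "transitions": [{"state_name": "warm",
                                     "conditions": {"min_index_age": warm_after}}],
                },
                {
                    "name": "warm",
                    "actions": [{"warm_migration": {}}],
                    "transitions": [{"state_name": "cold",
                                     "conditions": {"min_index_age": cold_after}}],
                },
                {
                    "name": "cold",
                    "actions": [{"cold_migration": {"timestamp_field": timestamp_field}}],
                    "transitions": [{"state_name": "delete",
                                     "conditions": {"min_index_age": delete_after}}],
                },
                {
                    "name": "delete",
                    # Assumed action for removing an index that lives in cold storage.
                    "actions": [{"cold_delete": {}}],
                    "transitions": [],
                },
            ],
            # Attach the policy automatically to newly created matching indexes.
            "ism_template": [{"index_patterns": [index_pattern], "priority": 100}],
        }
    }


policy = build_ism_policy()
print(json.dumps(policy, indent=2))
```

The printed JSON is the kind of body that would be sent to the ISM policies endpoint (`_plugins/_ism/policies/<policy_id>`). Rollover also needs a write alias configured on the index, which is not shown.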
Serverless OpenSearch
Serverless OpenSearch bills per OpenSearch Compute Unit (OCU) hour. The model removes provisioning, but OCU-hours accrue around the clock and add up quickly:
- Indexing OCUs scale with ingest volume.
- Search OCUs scale with query load.
- Minimum two OCUs per collection (one indexing, one search) for redundancy.
Serverless is cost-effective for workloads with intermittent traffic, where a provisioned cluster would sit idle. Serverless is expensive for steady-state high-throughput workloads, where a provisioned cluster on Reserved Instances wins by 40 to 60 percent.
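A back-of-the-envelope comparison helps decide which side of that line a workload falls on. The sketch below assumes a placeholder OCU price of $0.24 per hour and the two-OCU floor described above; both should be checked against current pricing, and the provisioned figure is a hypothetical number for illustration. Storage charges are left out.

```python
HOURS_PER_MONTH = 730


def serverless_monthly_cost(avg_indexing_ocus, avg_search_ocus, ocu_hourly_usd,
                            min_ocus=2.0):
    """Compute-only monthly cost, never billing below the OCU floor."""
    billed_ocus = max(avg_indexing_ocus + avg_search_ocus, min_ocus)
    return billed_ocus * ocu_hourly_usd * HOURS_PER_MONTH


def cheaper_option(serverless_usd, provisioned_usd):
    """Name the cheaper of the two monthly figures."""
    return "serverless" if serverless_usd < provisioned_usd else "provisioned"


OCU_PRICE = 0.24          # placeholder rate, USD per OCU-hour
PROVISIONED_USD = 900.0   # hypothetical RI-covered cluster, USD per month

spiky = serverless_monthly_cost(0.5, 0.5, OCU_PRICE)   # mostly idle, billed at the floor
steady = serverless_monthly_cost(4, 6, OCU_PRICE)      # sustained 10 OCUs

print(f"spiky:  ${spiky:,.2f} -> {cheaper_option(spiky, PROVISIONED_USD)}")
print(f"steady: ${steady:,.2f} -> {cheaper_option(steady, PROVISIONED_USD)}")
```

The floor matters most for small collections: an idle workload still pays for the minimum OCUs every hour of the month.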
Worked example: $80K monthly OpenSearch bill
| Step | Action | Bill after |
|---|---|---|
| Baseline | 90 days hot, m5 instances, no RIs | $80,000/month |
| Step 1 | Move data >14 days to UltraWarm | ~$52,000/month |
| Step 2 | Move data >30 days to cold | ~$42,000/month |
| Step 3 | Migrate to Graviton (r6g) | ~$32,000/month |
| Step 4 | 1-year all-upfront RIs on baseline nodes | ~$22,000/month |
A 73 percent reduction with no observability or retention compromise. Each step is independently safe; the order optimises for the lowest-risk savings first.
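The per-step and cumulative savings can be recomputed directly from the table. The figures below are copied from it; the `step_savings` helper is just an illustrative name.

```python
# (label, monthly bill in USD after the step), copied from the table above.
STEPS = [
    ("Baseline", 80_000),
    ("UltraWarm for data >14 days", 52_000),
    ("Cold for data >30 days", 42_000),
    ("Graviton (r6g)", 32_000),
    ("1-year all-upfront RIs", 22_000),
]


def step_savings(steps):
    """Return (label, dollars saved, cut vs. previous bill, cumulative cut vs. baseline)."""
    baseline = steps[0][1]
    rows = []
    for (_, before), (label, after) in zip(steps, steps[1:]):
        saved = before - after
        rows.append((label, saved, saved / before, 1 - after / baseline))
    return rows


for label, saved, step_cut, total_cut in step_savings(STEPS):
    print(f"{label:<30} -${saved:>6,}  step {step_cut:5.1%}  cumulative {total_cut:5.1%}")
```

Tiering accounts for the largest single cut ($28,000), and the cumulative reduction after the last step is 72.5 percent, which rounds to the 73 percent quoted above.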
Logging-specific patterns
For logging clusters specifically, three patterns compound:
- Aggressive sampling for high-volume logs. Debug logs from production rarely need every line.
- Log indexes per service per day, not per service per hour. Hourly indexes have crippling overhead at scale.
- Move security-audit logs to a separate cluster sized for compliance retention, not query performance.
The EDP angle
OpenSearch spend sits inside the analytics bundle for AWS Enterprise Discount Program (EDP) commitment purposes. The negotiation levers:
- Bundle OpenSearch instance-hour with Athena, Glue, and Redshift for a blended analytics discount.
- Negotiate UltraWarm storage at a flat rate for committed volume.
- Migration credits for customers moving from self-managed Elasticsearch.
- Serverless OCU rate discounts at high commitment volumes.
Common failure modes
Keeping all data hot
The single most expensive OpenSearch anti-pattern. A 60-day hot retention on a 1 TB/day log cluster pays for tens of TB of EBS storage that is never queried. ISM with UltraWarm migration cuts this by 80 percent.
Over-replication
OpenSearch defaults to one replica shard per primary, which already doubles storage and instance footprint; every additional replica adds another full copy. For non-critical workloads (logs, metrics) one replica is sufficient. For analytics that can be regenerated from source, zero replicas with snapshot-based recovery is sometimes correct.
Wrong index sharding
Too many shards inflate cluster overhead; too few create hot shards. Target 30 to 50 GB per shard for time-series data.
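The shard target translates into a simple sizing rule. The helper below is a sketch: the 40 GB default is just the midpoint of the 30 to 50 GB range above, and the function name is hypothetical.

```python
import math


def primary_shard_count(index_size_gb, target_shard_gb=40):
    """Primary shards needed to keep each shard near the target size."""
    if index_size_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(index_size_gb / target_shard_gb))


# A daily index of about 1 TB, and a small service that logs 12 GB per day.
for size_gb in (1000, 12):
    shards = primary_shard_count(size_gb)
    print(f"{size_gb} GB/day -> {shards} primary shard(s), "
          f"~{size_gb / shards:.0f} GB each")
```

For small services the result is a single shard well under the target, which is a hint that daily indexes are too fine-grained for that service and a longer rollover period would reduce overhead.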
Serverless for steady-state workloads
Serverless OpenSearch is a great match for spiky workloads. For 24/7 production search clusters above 8 OCUs of sustained load, provisioned with RIs is cheaper.
Implementation checklist
- Audit hot tier usage by index age; identify candidates for UltraWarm migration.
- Implement ISM policies covering rollover, UltraWarm migration, and cold migration.
- Migrate to Graviton instance families.
- Purchase RIs sized to baseline node count.
- Negotiate analytics bundle discount in the next EDP cycle.
- Evaluate Serverless OpenSearch for new spiky workloads.
Snapshot strategy
OpenSearch snapshots have cost dynamics that are easy to miss. Manual snapshots stored in S3 bill at S3 standard rates plus a small per-snapshot overhead. Automated snapshots stored in a service-managed bucket are included in the cluster cost. For long retention, manual snapshots with a lifecycle transition to Glacier are dramatically cheaper than keeping data hot or in UltraWarm.
- Use automated daily snapshots for short-term recovery.
- Send manual monthly snapshots to S3 with a lifecycle policy moving to Glacier after 30 days.
- Audit snapshot retention; many clusters keep snapshots for years without purpose.
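The Glacier transition in the second bullet is an S3 lifecycle rule on the snapshot bucket. The sketch below only builds the rule document in the shape that S3's lifecycle configuration expects; the prefix, the day counts, and the helper name are assumptions, and the rule is not applied to any bucket here.

```python
def snapshot_lifecycle_rules(prefix="manual-snapshots/", glacier_after_days=30,
                             expire_after_days=None):
    """Build an S3 lifecycle configuration for a manual-snapshot prefix."""
    rule = {
        "ID": "snapshots-to-glacier",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": glacier_after_days, "StorageClass": "GLACIER"}],
    }
    if expire_after_days is not None:
        if expire_after_days <= glacier_after_days:
            raise ValueError("expiration must come after the Glacier transition")
        # Optional hard stop so snapshots are not kept for years by accident.
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}


config = snapshot_lifecycle_rules(expire_after_days=365 * 3)
print(config)
# With boto3 this dict would be passed as LifecycleConfiguration to
# put_bucket_lifecycle_configuration; the call is omitted so the sketch runs offline.
```

Objects in the GLACIER storage class have to be restored in S3 before they can be read again, so a snapshot restore from such a repository needs that extra step and its retrieval delay. Test a restore before relying on this pattern for compliance retention.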
Cross-cluster replication and remote stores
For multi-region or disaster-recovery (DR) scenarios, OpenSearch supports cross-cluster replication and remote-backed indexes. The trade-offs:
- Cross-cluster replication doubles data-node storage; appropriate for active-active multi-region.
- Remote-backed indexes store primary copies in S3 and pull data on demand; cheaper for warm and cold data.
- Inter-region data transfer charges apply to replication traffic; model these into the architectural decision.
Search vs analytics workloads
OpenSearch is used for two distinct workload types and they cost differently:
- Search workloads (product search, internal search) tend to be query-heavy and benefit from larger memory-optimised instances and aggressive caching.
- Analytics workloads (logging, observability, security analytics) tend to be ingest-heavy and benefit from compute-optimised data nodes and aggressive tiering to UltraWarm.
Mixing both workloads on a single cluster usually inflates cost. Separate clusters with right-sized instance families are typically cheaper in aggregate.
For more, see the AWS analytics cost optimization pillar, the Athena query cost reduction piece on cheaper log analytics over cold data, and the Kinesis pricing optimization piece on the ingestion pipeline feeding OpenSearch.