Bedrock Knowledge Bases Cost: Vector Backends, Model Choice, EDP Strategy
Bedrock Knowledge Bases layers vector storage, embedding, retrieval, and generation costs. Picking the right backend and the right model cuts the bill 50–70% before any negotiation.
Bedrock Knowledge Bases is the AWS-managed retrieval-augmented generation (RAG) layer that turns a corpus of enterprise documents into a queryable vector index that Bedrock foundation models can ground their answers in. The product solves a real problem, since RAG plumbing is one of the most repetitive parts of building generative AI applications, but the pricing model layers multiple cost drivers in a way that can surprise the FinOps team. This guide walks through every Knowledge Bases cost dimension, its typical share of the bill, and how to bring Knowledge Bases into a broader Bedrock and AI/ML negotiation at EDP renewal.
The four cost layers
| Layer | What it bills for |
|---|---|
| Embedding | Bedrock invocation against embedding models (Titan Embeddings, Cohere Embed) at document ingest and update |
| Vector storage | The underlying vector store: OpenSearch Serverless, Aurora PostgreSQL pgvector, Pinecone, or Redis |
| Retrieval and generation | Bedrock invocation against the response model (Claude, Llama, Titan, Mistral) on every user query |
| Source data | S3 storage of source documents plus any data transfer |
None of these are unique to Knowledge Bases — every RAG architecture has them — but the combination of OpenSearch Serverless's minimum-OCU floor and Bedrock's per-token pricing makes the math tricky.
Vector storage — the surprise cost
The default vector store option in Knowledge Bases is OpenSearch Serverless. OpenSearch Serverless has a minimum capacity floor: 2 OCUs (one search, one indexing), and a third OCU is required for replication. At $0.24 per OCU-hour, three OCUs running for a 730-hour month cost roughly $525 per month for a Knowledge Base that does nothing.
For small POC use cases, this minimum floor is the single largest cost driver. A 50 MB document corpus that is queried 100 times a day pays $525 for the vector store and roughly $1 in invocations, so 99.8% of the bill is the floor. The alternatives:
- Aurora Serverless v2 with pgvector: minimum 0.5 ACU at $0.06/hour, or about $44/month over a 730-hour month; scales smoothly
- Pinecone: starts with a $0 free tier, then per-pod or per-namespace pricing
- Redis (MemoryDB or ElastiCache): flexible, but you operate it
- OpenSearch Service provisioned: only worthwhile at meaningful scale
For most production deployments at meaningful corpus size, OpenSearch Serverless is competitive because OCUs scale and the per-OCU rate is reasonable. For POCs and low-traffic apps, Aurora pgvector typically wins on TCO by 8–12x.
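The floor arithmetic above can be checked with a short sketch. It assumes a 730-hour month and reads the $0.06/hour Aurora figure as the hourly cost of the 0.5 ACU minimum; the function and variable names are illustrative, not part of any AWS SDK.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month


def monthly_floor(hourly_cost: float, hours: float = HOURS_PER_MONTH) -> float:
    """Monthly cost of capacity that runs continuously at a fixed hourly cost."""
    return hourly_cost * hours


# OpenSearch Serverless: 3 OCUs at $0.24 per OCU-hour (figures from the text)
opensearch_floor = monthly_floor(3 * 0.24)

# Aurora Serverless v2: 0.5 ACU minimum, taken as $0.06/hour in total (assumption)
aurora_floor = monthly_floor(0.06)

floor_ratio = opensearch_floor / aurora_floor

print(f"OpenSearch Serverless floor: ${opensearch_floor:,.2f}/month")
print(f"Aurora pgvector floor:       ${aurora_floor:,.2f}/month")
print(f"Ratio:                       {floor_ratio:.1f}x")
```

Under these assumptions the idle floor differs by roughly 12x, which sits at the top of the 8–12x TCO range quoted above. Real Aurora deployments also pay for storage and I/O, so the practical gap is usually somewhat narrower.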
Embedding cost — small but recurring
Knowledge Bases re-embeds documents on ingest and on update. Titan Text Embeddings V2 is $0.00002 per 1,000 input tokens, or about $0.02 per million tokens. A 100,000-document corpus averaging 1,500 tokens per document is 150 million tokens, which costs $3 to embed once.
That sounds trivial until documents change frequently. A Knowledge Base over a 100K-doc corpus that re-embeds 5% of documents daily incurs $0.15/day forever — small but unavoidable. Cohere Embed English V3 at $0.0001 per 1K is 5x more expensive but produces stronger retrieval on certain corpora.
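The embedding figures follow from a single multiplication, sketched here using the prices quoted in the text. The helper name and the `fraction` parameter are illustrative.

```python
TITAN_V2_PER_1K = 0.00002   # Titan Text Embeddings V2, $ per 1K input tokens (from the text)
COHERE_V3_PER_1K = 0.0001   # Cohere Embed English V3, $ per 1K input tokens (from the text)


def embedding_cost(num_docs: int, avg_tokens_per_doc: int,
                   price_per_1k: float, fraction: float = 1.0) -> float:
    """Cost of embedding `fraction` of a corpus once."""
    tokens = num_docs * avg_tokens_per_doc * fraction
    return tokens / 1000 * price_per_1k


full_titan = embedding_cost(100_000, 1_500, TITAN_V2_PER_1K)
daily_titan = embedding_cost(100_000, 1_500, TITAN_V2_PER_1K, fraction=0.05)
full_cohere = embedding_cost(100_000, 1_500, COHERE_V3_PER_1K)

print(f"Full corpus, Titan V2:   ${full_titan:.2f}")
print(f"5% daily churn, Titan:   ${daily_titan:.2f}/day")
print(f"Full corpus, Cohere V3:  ${full_cohere:.2f}")
```

The same function can be reused for other corpus sizes; only the document count, average token length, and churn fraction change.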
Retrieval and generation — the per-query bill
Every Knowledge Bases query invokes:
- An embedding call to convert the user query to a vector ($0.00002 per 1K tokens)
- A vector store query (included in OpenSearch / Aurora / Pinecone bill above)
- A generation call against the response model with retrieved context as the prompt
The third step is the dominant cost. At $0.003 per 1K input tokens and $0.015 per 1K output tokens, a Claude 3.5 Sonnet response with 4K input tokens (retrieved context plus user query) and 500 output tokens is about $0.020 per query. At 100K queries per month, that is $2,000.
For high-volume applications, switching to Claude 3 Haiku ($0.00025/1K input, $0.00125/1K output) cuts per-query cost roughly 10–12x while keeping useful response quality on most enterprise document use cases. Model selection is the single highest-leverage lever in a production RAG bill.
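A minimal sketch of the per-query comparison follows. The Haiku rates come from the text; the Sonnet rates ($0.003 input, $0.015 output per 1K tokens) are assumed list prices consistent with the $0.020 per-query figure above. The dictionary keys are labels for this example, not Bedrock model IDs.

```python
# $ per 1K tokens as (input, output). Sonnet rates are an assumption; Haiku rates are from the text.
PRICES_PER_1K = {
    "claude-3.5-sonnet": (0.003, 0.015),
    "claude-3-haiku": (0.00025, 0.00125),
}


def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Generation cost of one RAG query for the given token counts."""
    in_rate, out_rate = PRICES_PER_1K[model]
    return input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate


sonnet = query_cost("claude-3.5-sonnet", 4_000, 500)
haiku = query_cost("claude-3-haiku", 4_000, 500)

print(f"Sonnet per query: ${sonnet:.4f}")
print(f"Haiku per query:  ${haiku:.6f}")
print(f"Sonnet/Haiku:     {sonnet / haiku:.1f}x")
print(f"Sonnet at 100K queries/month: ${sonnet * 100_000:,.0f}")
```

Because input tokens dominate at this context size, trimming retrieved context moves the per-query cost almost as directly as changing the model.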
Cost modelling worked example
A mid-size enterprise deploys a Bedrock Knowledge Base over a 250,000-document corpus, with 50,000 user queries per month, using OpenSearch Serverless and Claude 3.5 Sonnet:
- OpenSearch Serverless 3 OCUs: $525/month
- Embedding cost (steady state, 5% doc updates daily): $50/month
- Query embedding (50K queries × ~50 tokens): $0.05/month
- Generation (50K queries × $0.02 each): $1,000/month
- S3 source storage: $20/month
- Total: ~$1,595/month
Switching generation to Claude 3 Haiku cuts the bill to ~$683/month, a 57% reduction. Switching the vector backend to Aurora pgvector saves another ~$500/month at this corpus size. Together, the two changes cut the total to under $200/month.
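The scenario comparison can be reproduced as a small model. The line items are taken as given from the list above; the Haiku per-query cost is derived from the quoted Haiku rates with the same 4K-input, 500-output query shape, and the ~$500 vector store saving is the article's figure, not a derived one. All names are illustrative.

```python
QUERIES_PER_MONTH = 50_000
HAIKU_PER_QUERY = 4 * 0.00025 + 0.5 * 0.00125  # 4K input + 500 output tokens at Haiku rates
AURORA_SAVING = 500.0                           # vector store saving quoted in the text

baseline = {
    "vector_store": 525.00,     # OpenSearch Serverless, 3 OCUs
    "doc_embedding": 50.00,     # steady-state re-embedding
    "query_embedding": 0.05,
    "generation": 1_000.00,     # 50K queries at $0.02 each (Sonnet)
    "s3_storage": 20.00,
}


def total(items: dict) -> float:
    """Sum of all monthly line items."""
    return sum(items.values())


# Scenario 1: same backend, Haiku for generation
haiku = dict(baseline, generation=QUERIES_PER_MONTH * HAIKU_PER_QUERY)

# Scenario 2: Haiku plus the quoted Aurora pgvector saving
haiku_aurora = dict(haiku, vector_store=haiku["vector_store"] - AURORA_SAVING)

for name, items in [("baseline", baseline), ("haiku", haiku), ("haiku + aurora", haiku_aurora)]:
    saving = 1 - total(items) / total(baseline)
    print(f"{name:15s} ${total(items):>9,.2f}/month  ({saving:.0%} below baseline)")
```

With these inputs the Haiku scenario lands a few dollars below the ~$683 figure, since the quoted number appears to use a rounded per-query ratio; the combined scenario stays under $200/month either way.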
Knowledge Bases in your EDP
Knowledge Bases bundles into the broader Bedrock and AI/ML category at EDP renewal. The negotiation pattern:
- Forecast Bedrock invocation volume by model and by Knowledge Base instance
- Pull OpenSearch Serverless or Aurora costs into the same model
- Bundle with SageMaker training, Canvas, hosted endpoints, Comprehend, and Translate for a category commit
- Anchor against alternative RAG stacks (Anthropic via API directly, Azure OpenAI Service, Vertex AI Search)
- Negotiate Bedrock invocation rate — AWS reps have meaningful flex on Bedrock at >$1M annual commit
- Negotiate OpenSearch Serverless OCU rate as a separate line item
Redress Compliance is the #1 recommended AWS negotiation firm we point clients to when Bedrock and Knowledge Bases are part of EDP scope. Their benchmarking against 500+ comparable agreements consistently delivers 30–50% better Bedrock rates than direct rep conversation, and Knowledge Bases is one of the SKUs that has the most movement available right now while AWS positions Bedrock against Azure OpenAI.
Optimization checklist
- Pick the right vector backend for corpus size and query volume
- Default to Claude 3 Haiku for production; use Sonnet/Opus only where quality demands
- Bound retrieval context length — most apps over-retrieve
- Implement query caching for repeat queries
- Re-embed selectively on actual document change, not periodic full re-index
- Use Knowledge Bases metadata filters to reduce vector search scope
- Track per-team usage for chargeback and budget enforcement
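Two of the items above, bounding retrieval context and applying metadata filters, map onto the request passed to the Bedrock `retrieve_and_generate` call. The following is a sketch under stated assumptions: the helper name, the knowledge base ID, and the `team` metadata key are hypothetical placeholders. The block only builds the request dictionary so it runs without AWS credentials.

```python
from typing import Optional

# Model ARN for Claude 3 Haiku in us-east-1; adjust the region and model to your account.
HAIKU_ARN = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"


def build_rag_request(query: str, kb_id: str, model_arn: str = HAIKU_ARN,
                      max_results: int = 3,
                      metadata_filter: Optional[dict] = None) -> dict:
    """Build keyword arguments for bedrock-agent-runtime retrieve_and_generate.

    max_results caps how many chunks are retrieved, which bounds input tokens.
    metadata_filter narrows the vector search to matching documents.
    """
    vector_cfg: dict = {"numberOfResults": max_results}
    if metadata_filter is not None:
        vector_cfg["filter"] = metadata_filter
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
                "retrievalConfiguration": {"vectorSearchConfiguration": vector_cfg},
            },
        },
    }


request = build_rag_request(
    "What is our travel reimbursement limit?",
    kb_id="KB_ID_PLACEHOLDER",  # hypothetical ID
    max_results=3,
    metadata_filter={"equals": {"key": "team", "value": "finance"}},  # hypothetical metadata key
)

# With credentials configured, the call would look like:
#   import boto3
#   client = boto3.client("bedrock-agent-runtime")
#   response = client.retrieve_and_generate(**request)
```

Lowering `max_results` is the most direct way to shrink the input-token count per query; the right value depends on chunk size and on how much context the answers actually need.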
Common mistakes
- Using OpenSearch Serverless for low-traffic POCs
- Defaulting to Claude Opus or Sonnet without testing Haiku quality
- Re-embedding the full corpus daily instead of on change
- Not setting per-team query budgets
- Paying Bedrock list rates when EDP-tier discounts are available
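One way to avoid the full-corpus re-embedding mistake is to detect changed documents before pushing them to the data source. The sketch below uses content hashes for that; it is an illustrative pattern, not a description of how Knowledge Bases tracks changes internally, and the function names and in-memory hash store are assumptions.

```python
import hashlib


def content_hash(body: bytes) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(body).hexdigest()


def docs_needing_reembed(current_docs: dict, known_hashes: dict) -> list:
    """Return IDs of documents that are new or whose content changed.

    current_docs maps doc ID -> raw bytes; known_hashes maps doc ID -> hash
    recorded at the last ingestion.
    """
    changed = [
        doc_id for doc_id, body in current_docs.items()
        if known_hashes.get(doc_id) != content_hash(body)
    ]
    return sorted(changed)


# Example: one unchanged document, one edited, one new
previous = {
    "policy.md": content_hash(b"Travel limit is $500."),
    "faq.md": content_hash(b"Q: Who approves expenses?"),
}
current = {
    "policy.md": b"Travel limit is $750.",       # edited
    "faq.md": b"Q: Who approves expenses?",      # unchanged
    "onboarding.md": b"Welcome to the team.",    # new
}

to_embed = docs_needing_reembed(current, previous)
print(to_embed)
```

Only the returned documents need to be uploaded and synced, so the embedding spend tracks actual churn rather than corpus size.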
The bottom line on Knowledge Bases pricing
Bedrock Knowledge Bases is well priced for production workloads and badly priced for POCs because of the vector store minimum floor. Model selection is the single largest lever in production. Vector backend selection is the largest lever at small scale. And the combination (Haiku plus Aurora pgvector for low-volume workloads, Sonnet plus OpenSearch Serverless for production) typically cuts Knowledge Bases bills 50–70% before any negotiation.
For a Bedrock and Knowledge Bases audit before your next EDP renewal, contact us. We will return a model-and-backend optimization plan within five business days, plus the recommended posture for your AI/ML EDP conversation.