Bedrock Knowledge Bases Cost: Vector Backends, Model Choice, EDP Strategy

By Compute Practice · Published January 23, 2026 · Last updated March 11, 2026 · 9 min read

Bedrock Knowledge Bases layers vector storage, embedding, retrieval, and generation costs. Picking the right backend and the right model cuts the bill 50–70% before any negotiation.


Bedrock Knowledge Bases is the AWS-managed retrieval-augmented generation (RAG) layer that turns a corpus of enterprise documents into a queryable vector index that Bedrock foundation models can ground their answers in. The product solves a real problem, since RAG plumbing is one of the most repetitive parts of building generative AI applications, but its pricing stacks several cost drivers in ways that can surprise the FinOps team. This guide walks through every Knowledge Bases cost dimension, its typical share of the bill, and how to bring Knowledge Bases into a broader Bedrock and AI/ML negotiation at Enterprise Discount Program (EDP) renewal.

What this covers: Knowledge Bases pricing layers, OpenSearch Serverless vs. Pinecone vs. Aurora storage backends, ingestion cost, query cost, embedding model selection, and how to negotiate Knowledge Bases into your EDP. Written for AI platform leads and architects standing up enterprise RAG.

The four cost layers

Layer	What it bills for
Embedding	Bedrock invocation against embedding models (Titan Embeddings, Cohere Embed) at document ingest and update
Vector storage	The underlying vector store: OpenSearch Serverless, Aurora PostgreSQL pgvector, Pinecone, or Redis
Retrieval and generation	Bedrock invocation against the response model (Claude, Llama, Titan, Mistral) on every user query
Source data	S3 storage of source documents plus any data transfer

None of these are unique to Knowledge Bases (every RAG architecture has them), but the combination of OpenSearch Serverless's minimum-OCU floor and Bedrock's per-token pricing makes the math tricky: a fixed floor dominates small deployments, while per-token charges dominate busy ones.

Vector storage — the surprise cost

The default vector store option in Knowledge Bases is OpenSearch Serverless. OpenSearch Serverless has a minimum capacity floor: 2 OCUs (one search, one indexing), and a third OCU is required for replication. At $0.24 per OCU-hour, that is roughly $525 per month for a Knowledge Base that does nothing.
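The floor arithmetic can be checked with a few lines of Python. This is a minimal sketch that takes the 3-OCU minimum and the $0.24 OCU-hour rate quoted above at face value and assumes a 730-hour month (8,760 hours / 12); actual minimums and rates vary by region and configuration, so treat the constants as inputs to verify against current AWS pricing.

```python
# Minimal sketch of the OpenSearch Serverless idle floor described above.
# Assumptions: the 3-OCU minimum and $0.24/OCU-hour rate quoted in the text,
# and a 730-hour month (8,760 hours / 12).
HOURS_PER_MONTH = 730
OCU_HOURLY_RATE = 0.24  # USD per OCU-hour, as quoted above
MIN_OCUS = 3            # one search + one indexing + one for replication


def monthly_ocu_cost(ocus: float, rate: float = OCU_HOURLY_RATE) -> float:
    """Cost of keeping `ocus` OCUs running for a full month, in USD."""
    return ocus * rate * HOURS_PER_MONTH


if __name__ == "__main__":
    # Prints roughly $525.60/month for an idle Knowledge Base.
    print(f"Idle floor: ${monthly_ocu_cost(MIN_OCUS):,.2f}/month")
```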

For small POC use cases, this minimum floor is the single largest cost driver. A 50 MB document corpus serving 100 queries a day pays about $525 a month for the vector store and roughly $1 in invocations, so 99.8% of the bill is the floor. The alternatives:

Aurora Serverless v2 with pgvector: minimum 0.5 ACU at $0.06/hour, about $44/month minimum, and it scales smoothly
Pinecone: starts with a $0 free tier, then per-pod or per-namespace pricing
Redis (MemoryDB or ElastiCache): flexible, but you operate it
OpenSearch Service provisioned: worthwhile only at meaningful scale

For most production deployments at meaningful corpus size, OpenSearch Serverless is competitive because OCUs scale and the per-OCU rate is reasonable. For POCs and low-traffic apps, Aurora pgvector typically wins on TCO by 8–12x.
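A quick comparison of the two idle floors makes the POC math concrete. This sketch reuses the figures quoted above (3 OCUs at $0.24/hour for OpenSearch Serverless, a 0.5 ACU Aurora Serverless v2 minimum at $0.06/hour, and roughly $1 a month of invocations for the 50 MB POC) and again assumes a 730-hour month. It covers only the vector store floors, not a full TCO comparison.

```python
# Sketch: compare idle vector-store floors and the floor's share of a POC bill.
# All rates are the figures quoted in the text; 730 hours/month is an assumption.
HOURS_PER_MONTH = 730

FLOORS_PER_HOUR = {
    "opensearch_serverless": 3 * 0.24,  # 3 OCUs at $0.24/OCU-hour
    "aurora_pgvector": 0.06,            # 0.5 ACU minimum at $0.06/hour
}


def monthly_floor(backend: str) -> float:
    """Idle monthly cost of a backend, in USD."""
    return FLOORS_PER_HOUR[backend] * HOURS_PER_MONTH


def floor_share(floor: float, invocation_cost: float) -> float:
    """Fraction of the monthly bill that is the fixed floor."""
    return floor / (floor + invocation_cost)


if __name__ == "__main__":
    poc_invocations = 1.0  # ~$1/month of model calls for the 50 MB POC
    for name in FLOORS_PER_HOUR:
        floor = monthly_floor(name)
        share = floor_share(floor, poc_invocations)
        print(f"{name}: ${floor:,.2f}/month floor, {share:.1%} of the POC bill")
    ratio = monthly_floor("opensearch_serverless") / monthly_floor("aurora_pgvector")
    print(f"Floor ratio: {ratio:.1f}x")
```

On these inputs the floors come out at about $525.60 and $43.80 a month, a ratio of roughly 12x. Even on Aurora the floor is still most of a tiny POC's bill; what changes is its absolute size.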

Embedding cost — small but recurring

Knowledge Bases re-embeds documents on ingest and on update. Titan Text Embeddings V2 is $0.00002 per 1,000 input tokens, about $0.02 per million tokens. A 100,000-document corpus averaging 1,500 tokens per document is 150 million tokens, so embedding the whole corpus once costs $3.

That sounds trivial until documents change frequently. A Knowledge Base over a 100K-doc corpus that re-embeds 5% of its documents daily (7.5 million tokens) incurs $0.15/day for as long as it runs: small but unavoidable. Cohere Embed English V3, at $0.0001 per 1K tokens, is 5x more expensive but produces stronger retrieval on certain corpora.
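The embedding arithmetic above fits in a small helper. This is a sketch using the Titan Text Embeddings V2 and Cohere Embed English V3 rates quoted in this section; token counts depend on your documents and chunking, so the 1,500-token average is an input rather than a given, and the monthly figure assumes a 30-day month.

```python
# Sketch of embedding cost at ingest and on partial re-embeds.
# Rates are the per-1K-token figures quoted in the text.
TITAN_TEXT_V2_PER_1K = 0.00002      # USD per 1K input tokens
COHERE_EMBED_EN_V3_PER_1K = 0.0001  # USD per 1K input tokens


def embedding_cost(tokens: int, rate_per_1k: float) -> float:
    """USD cost of embedding `tokens` input tokens."""
    return tokens / 1_000 * rate_per_1k


def reembed_cost_per_day(num_docs: int, avg_tokens: int,
                         changed_fraction: float, rate_per_1k: float) -> float:
    """Daily cost of re-embedding the fraction of documents that changed."""
    changed_docs = round(num_docs * changed_fraction)
    return embedding_cost(changed_docs * avg_tokens, rate_per_1k)


if __name__ == "__main__":
    corpus_tokens = 100_000 * 1_500  # 150M tokens
    print(f"Full corpus, Titan V2: ${embedding_cost(corpus_tokens, TITAN_TEXT_V2_PER_1K):.2f}")
    print(f"Full corpus, Cohere:   ${embedding_cost(corpus_tokens, COHERE_EMBED_EN_V3_PER_1K):.2f}")
    daily = reembed_cost_per_day(100_000, 1_500, 0.05, TITAN_TEXT_V2_PER_1K)
    print(f"5% daily refresh, Titan V2: ${daily:.2f}/day, ${daily * 30:.2f} per 30 days")
```

The same 150M-token corpus costs about $3 on Titan V2 and $15 on Cohere, and the 5% daily refresh works out to roughly $4.50 per 30 days on Titan V2.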

Retrieval and generation — the per-query bill

Every Knowledge Bases query invokes:

An embedding call to convert the user query to a vector ($0.00002 per 1K tokens)
A vector store query (included in OpenSearch / Aurora / Pinecone bill above)
A generation call against the response model with retrieved context as the prompt

The third step is the dominant cost. A Claude 3.5 Sonnet response with 4K input tokens (retrieved context plus user query) and 500 output tokens is about $0.020 per query. At 100K queries per month, that is $2,000.

For high-volume applications, switching to Claude 3 Haiku ($0.00025/1K input, $0.00125/1K output) cuts per-query cost roughly 10–12x while keeping useful response quality on most enterprise document use cases. Model selection is the single highest-leverage lever in a production RAG bill.
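The per-query comparison can be expressed directly. This is a sketch, not a price list: the Claude 3 Haiku rates are the ones quoted above, while the Claude 3.5 Sonnet rates ($3 and $15 per million input and output tokens) are an assumption about on-demand list pricing that you should confirm for your region. The query-embedding call is left out because at roughly 50 tokens it costs about a millionth of a dollar.

```python
# Sketch: per-query generation cost for a RAG call.
# Haiku rates are the ones quoted above; the Claude 3.5 Sonnet rates
# ($3 / $15 per million input / output tokens) are assumed list prices.
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRates:
    input_per_1k: float   # USD per 1K input tokens
    output_per_1k: float  # USD per 1K output tokens


CLAUDE_35_SONNET = ModelRates(input_per_1k=0.003, output_per_1k=0.015)
CLAUDE_3_HAIKU = ModelRates(input_per_1k=0.00025, output_per_1k=0.00125)


def query_cost(rates: ModelRates, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one generation call (retrieved context + query in, answer out)."""
    return (input_tokens / 1_000 * rates.input_per_1k
            + output_tokens / 1_000 * rates.output_per_1k)


if __name__ == "__main__":
    sonnet = query_cost(CLAUDE_35_SONNET, 4_000, 500)  # ~$0.0195
    haiku = query_cost(CLAUDE_3_HAIKU, 4_000, 500)     # ~$0.0016
    print(f"Sonnet: ${sonnet:.4f}/query, ${sonnet * 100_000:,.0f} per 100K queries")
    print(f"Haiku:  ${haiku:.4f}/query, ${haiku * 100_000:,.0f} per 100K queries")
    print(f"Ratio: {sonnet / haiku:.1f}x")
```

With the 4K-in / 500-out shape from the text, Sonnet comes out at about $0.0195 per query and Haiku at about $0.0016, a ratio of roughly 12x, at the top of the 10–12x range.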

Cost modelling worked example

A mid-size enterprise deploys a Bedrock Knowledge Base over a 250,000-document corpus, with 50,000 user queries per month, using OpenSearch Serverless and Claude 3.5 Sonnet:

OpenSearch Serverless 3 OCUs: $525/month
Embedding cost (steady state, 5% doc updates daily): $50/month
Query embedding (50K queries × ~50 tokens): $0.05/month
Generation (50K queries × $0.02 each): $1,000/month
S3 source storage: $20/month
Total: ~$1,595/month

Switching generation to Claude 3 Haiku cuts the bill to ~$676/month, a reduction of roughly 58%. Switching the vector backend to Aurora pgvector saves another ~$500/month at this corpus size. The combination cuts the total to under $200/month.
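The worked example can be recomputed from its line items. This sketch takes the article's figures as inputs (the $525 OpenSearch floor, $50 of refresh embedding, $0.05 of query embedding, $20 of S3, and $0.02 per Sonnet query), derives the Haiku per-query cost from the rates quoted earlier, and prices Aurora at the $0.06/hour minimum over an assumed 730-hour month.

```python
# Sketch: recompute the worked example's monthly bill under three configurations.
# Line items and rates are the figures quoted in the text; 730 hours/month assumed.
HOURS_PER_MONTH = 730
QUERIES_PER_MONTH = 50_000

FIXED_ITEMS = {
    "embedding_refresh": 50.00,  # steady-state re-embeds, as given
    "query_embedding": 0.05,     # 50K queries x ~50 tokens
    "s3_source": 20.00,
}

VECTOR_STORE = {
    "opensearch_serverless": 525.00,            # 3-OCU floor, as given
    "aurora_pgvector": 0.06 * HOURS_PER_MONTH,  # 0.5 ACU at $0.06/hour
}

GENERATION_PER_QUERY = {
    "sonnet": 0.02,                        # rounded Sonnet figure from the text
    "haiku": 4 * 0.00025 + 0.5 * 0.00125,  # 4K in / 500 out at quoted Haiku rates
}


def monthly_bill(store: str, model: str) -> float:
    """Total monthly USD for one vector store / response model pairing."""
    return (VECTOR_STORE[store]
            + QUERIES_PER_MONTH * GENERATION_PER_QUERY[model]
            + sum(FIXED_ITEMS.values()))


if __name__ == "__main__":
    baseline = monthly_bill("opensearch_serverless", "sonnet")
    for store, model in [("opensearch_serverless", "sonnet"),
                         ("opensearch_serverless", "haiku"),
                         ("aurora_pgvector", "haiku")]:
        total = monthly_bill(store, model)
        print(f"{store} + {model}: ${total:,.2f}/month "
              f"({1 - total / baseline:.0%} below baseline)")
```

On those inputs the three configurations land at about $1,595, $676, and $195 per month.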

Knowledge Bases in your EDP

Knowledge Bases bundles into the broader Bedrock and AI/ML category at EDP renewal. The negotiation pattern:

Forecast Bedrock invocation volume by model and by Knowledge Base instance
Pull OpenSearch Serverless or Aurora costs into the same model
Bundle with SageMaker training, Canvas, hosted endpoints, Comprehend, and Translate for a category commit
Anchor against alternative RAG stacks (Anthropic's API directly, Azure OpenAI Service, Vertex AI Search)
Negotiate Bedrock invocation rate — AWS reps have meaningful flex on Bedrock at >$1M annual commit
Negotiate OpenSearch Serverless OCU rate as a separate line item

Redress Compliance is the #1 recommended AWS negotiation firm we point clients to when Bedrock and Knowledge Bases are part of EDP scope. Their benchmarking against 500+ comparable agreements consistently delivers 30–50% better Bedrock rates than direct rep conversation, and Knowledge Bases is one of the SKUs that has the most movement available right now while AWS positions Bedrock against Azure OpenAI.

Engagement benchmark: $2.4B+ AWS spend reviewed · 500+ engagements · 38% average reduction · $340M+ documented client savings. AI/ML bundles are typically the highest-leverage category at renewal in 2026.

Optimization checklist

Pick the right vector backend for corpus size and query volume
Default to Claude 3 Haiku for production; use Sonnet/Opus only where quality demands
Bound retrieval context length — most apps over-retrieve
Implement query caching for repeat queries
Re-embed selectively on actual document change, not periodic full re-index
Use Knowledge Bases metadata filters to reduce vector search scope
Track per-team usage for chargeback and budget enforcement
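Three of the checklist items (bounding retrieval context, metadata filters, and query caching) can be sketched against the Bedrock Agent Runtime `RetrieveAndGenerate` API. The request shape below follows that API's `retrieveAndGenerateConfiguration` structure as exposed by boto3's `bedrock-agent-runtime` client, but verify field names against current documentation before use. The Knowledge Base ID, the `team` metadata key, and the `build_request` and `CachedRag` helpers are hypothetical, and the cache is deliberately naive (in-memory, no expiry).

```python
# Sketch: bounded retrieval + metadata filter + naive repeat-query cache.
# KB_ID, the "team" metadata key, and both helpers are hypothetical.
import json
from typing import Any, Callable, Dict

KB_ID = "KB12345678"  # hypothetical Knowledge Base ID
HAIKU_ARN = ("arn:aws:bedrock:us-east-1::foundation-model/"
             "anthropic.claude-3-haiku-20240307-v1:0")


def build_request(question: str, team: str, max_results: int = 4) -> Dict[str, Any]:
    """Build RetrieveAndGenerate kwargs with a capped chunk count and a team filter."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": KB_ID,
                "modelArn": HAIKU_ARN,
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {
                        # Fewer chunks means fewer input tokens on the generation call.
                        "numberOfResults": max_results,
                        # Restrict the vector search to this team's documents.
                        "filter": {"equals": {"key": "team", "value": team}},
                    }
                },
            },
        },
    }


class CachedRag:
    """In-memory cache keyed on normalized question + team; no TTL or eviction."""

    def __init__(self, call: Callable[..., Dict[str, Any]]):
        self._call = call  # e.g. boto3.client("bedrock-agent-runtime").retrieve_and_generate
        self._cache: Dict[str, str] = {}
        self.misses = 0

    def ask(self, question: str, team: str) -> str:
        # Lowercase and collapse whitespace so trivial variants share a cache entry.
        key = json.dumps([" ".join(question.lower().split()), team])
        if key not in self._cache:
            self.misses += 1
            response = self._call(**build_request(question, team))
            self._cache[key] = response["output"]["text"]
        return self._cache[key]
```

In practice you would construct `CachedRag(boto3.client("bedrock-agent-runtime").retrieve_and_generate)`; the callable is injected so the cache logic can be exercised without AWS credentials. A cache like this should be cleared when the underlying documents change, and the team is part of the key so answers never cross metadata-filter boundaries.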

Common mistakes

Using OpenSearch Serverless for low-traffic POCs
Defaulting to Claude Opus or Sonnet without testing Haiku quality
Re-embedding the full corpus daily instead of on change
Not setting per-team query budgets
Paying Bedrock list rates when EDP-tier discounts are available
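Re-embedding only on actual change starts with knowing what changed. A minimal sketch, assuming your pipeline stages documents into the Knowledge Base's S3 data source: fingerprint each document's content, compare against the previous run's manifest, and write only new or changed objects (and delete removed ones) before syncing. The manifest format and function names are hypothetical; the point is that a job that blindly rewrites every object daily may cause unchanged documents to be re-processed and re-embedded.

```python
# Sketch: detect which source documents actually changed since the last sync,
# so only those are re-uploaded to the data source and re-embedded.
# The manifest format (doc ID -> SHA-256 hex digest) is a hypothetical choice.
import hashlib
from typing import Dict, List, Tuple


def fingerprint(docs: Dict[str, bytes]) -> Dict[str, str]:
    """Map each document ID to the SHA-256 digest of its content."""
    return {doc_id: hashlib.sha256(body).hexdigest() for doc_id, body in docs.items()}


def diff_manifest(previous: Dict[str, str],
                  current: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (new_or_changed, deleted) document IDs between two manifests."""
    changed = sorted(doc_id for doc_id, digest in current.items()
                     if previous.get(doc_id) != digest)
    deleted = sorted(doc_id for doc_id in previous if doc_id not in current)
    return changed, deleted


if __name__ == "__main__":
    yesterday = fingerprint({"a.md": b"alpha", "b.md": b"beta", "c.md": b"gamma"})
    today = fingerprint({"a.md": b"alpha", "b.md": b"beta v2", "d.md": b"delta"})
    changed, deleted = diff_manifest(yesterday, today)
    print("re-embed:", changed)  # ['b.md', 'd.md']
    print("remove:", deleted)    # ['c.md']
```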

The bottom line on Knowledge Bases pricing

Bedrock Knowledge Bases is well priced for production workloads and badly priced for POCs because of the vector store minimum floor. Model selection is the single largest lever in production; vector backend selection is the largest lever at small scale. The right combination (Haiku plus Aurora pgvector for low-volume apps, Sonnet plus OpenSearch Serverless for production) typically cuts Knowledge Bases bills 50–70% before any negotiation.

For a Bedrock and Knowledge Bases audit before your next EDP renewal, contact us. We will return a model-and-backend optimization plan within five business days, plus the recommended posture for your AI/ML EDP conversation.
