
Bedrock Knowledge Bases Cost: Vector Backends, Model Choice, EDP Strategy

Bedrock Knowledge Bases layers vector storage, embedding, retrieval, and generation costs. Picking the right backend and the right model cuts the bill 50–70% before any negotiation.

Published Apr 2026 · Cluster: AI/ML · 13 min read

Bedrock Knowledge Bases is the AWS-managed retrieval-augmented generation (RAG) layer that turns a corpus of enterprise documents into a queryable vector index that Bedrock foundation models can ground their answers in. The product solves a real problem — RAG plumbing is one of the most repetitive parts of building generative AI applications — but the pricing model layers multiple cost drivers in a way that can surprise the FinOps team. This guide walks through every Knowledge Bases cost dimension, the typical share of bill, and how to bring Knowledge Bases into a broader Bedrock and AI/ML negotiation at EDP renewal.

What this covers: Knowledge Bases pricing layers, OpenSearch Serverless vs. Pinecone vs. Aurora storage backends, ingestion cost, query cost, embedding model selection, and how to negotiate Knowledge Bases into your EDP. Written for AI platform leads and architects standing up enterprise RAG.

The four cost layers

Layer | What it bills for
Embedding | Bedrock invocation against embedding models (Titan Embeddings, Cohere Embed) at document ingest and update
Vector storage | The underlying vector store: OpenSearch Serverless, Aurora PostgreSQL pgvector, Pinecone, or Redis
Retrieval and generation | Bedrock invocation against the response model (Claude, Llama, Titan, Mistral) on every user query
Source data | S3 storage of source documents plus any data transfer

None of these are unique to Knowledge Bases — every RAG architecture has them — but the combination of OpenSearch Serverless's minimum-OCU floor and Bedrock's per-token pricing makes the math tricky.

Vector storage — the surprise cost

The default vector store option in Knowledge Bases is OpenSearch Serverless, which has a capacity floor: 2 OCUs (one search, one indexing), plus a third OCU for replication. At $0.24 per OCU-hour over roughly 730 hours in a month, those 3 OCUs cost about $525 per month for a Knowledge Base that does nothing.

For small POC use cases, this minimum floor is the single largest cost driver. A 50 MB document corpus that queries 100 times a day pays $525 in storage and roughly $1 in invocations — 99.8% of the bill is the floor. The alternatives:

  • Aurora Serverless v2 with pgvector: minimum 0.5 ACU at $0.06/hour, which is about $44/month over 730 hours, and it scales smoothly
  • Pinecone: starts at a $0 free tier, then per-pod or per-namespace pricing
  • Redis (MemoryDB or ElastiCache): flexible, but you operate it
  • OpenSearch Service provisioned: only at meaningful scale

For most production deployments at meaningful corpus size, OpenSearch Serverless is competitive because OCUs scale and the per-OCU rate is reasonable. For POCs and low-traffic apps, Aurora pgvector typically wins on TCO by 8–12x.
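
The floor arithmetic in this section can be reproduced with a short sketch. It assumes a 730-hour month (an assumption, not an AWS billing rule) and uses the OCU count, OCU rate, and $1 invocation figure quoted above; the function names are illustrative only.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def monthly_floor(units: float, rate_per_unit_hour: float) -> float:
    """Cost of keeping `units` of capacity running for a whole month."""
    return units * rate_per_unit_hour * HOURS_PER_MONTH


def floor_share(floor: float, usage: float) -> float:
    """Fraction of the total bill that is the fixed floor."""
    return floor / (floor + usage)


# 3 OCUs at $0.24 per OCU-hour, as quoted in the text.
opensearch_floor = monthly_floor(units=3, rate_per_unit_hour=0.24)

# POC from the text: roughly $1/month of invocations on top of the floor.
poc_share = floor_share(opensearch_floor, usage=1.00)

print(f"OpenSearch Serverless floor: ${opensearch_floor:,.2f}/month")
print(f"Floor share of the POC bill: {poc_share:.1%}")
```

Swapping in another backend's minimum capacity and hourly rate gives a like-for-like floor comparison before any traffic is modelled.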

Embedding cost — small but recurring

Knowledge Bases re-embeds documents on ingest and on update. Titan Text Embeddings V2 is $0.00002 per 1,000 input tokens, or about $0.02 per million tokens. A 100,000-document corpus averaging 1,500 tokens per document is 150 million tokens, so embedding the whole corpus once costs $3.

That sounds trivial until documents change frequently. A Knowledge Base over a 100K-doc corpus that re-embeds 5% of documents daily incurs $0.15/day forever — small but unavoidable. Cohere Embed English V3 at $0.0001 per 1K is 5x more expensive but produces stronger retrieval on certain corpora.
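
As a rough check on the embedding numbers, the sketch below recomputes the one-time and daily re-embedding costs from the per-1K-token rates quoted above. The 30-day month and the helper name `embedding_cost` are assumptions made for illustration.

```python
TITAN_V2_PER_1K = 0.00002   # USD per 1,000 input tokens, from the text
COHERE_V3_PER_1K = 0.0001   # USD per 1,000 input tokens, from the text


def embedding_cost(docs: float, tokens_per_doc: int, price_per_1k: float) -> float:
    """Cost to embed `docs` documents of `tokens_per_doc` tokens each."""
    return docs * tokens_per_doc / 1000 * price_per_1k


corpus_docs = 100_000
tokens_per_doc = 1_500
daily_change_rate = 0.05  # 5% of documents re-embedded per day

full_titan = embedding_cost(corpus_docs, tokens_per_doc, TITAN_V2_PER_1K)
full_cohere = embedding_cost(corpus_docs, tokens_per_doc, COHERE_V3_PER_1K)
daily_titan = embedding_cost(corpus_docs * daily_change_rate, tokens_per_doc, TITAN_V2_PER_1K)
monthly_titan = daily_titan * 30  # assumption: 30-day month

print(f"Full corpus, Titan V2:  ${full_titan:.2f}")
print(f"Full corpus, Cohere V3: ${full_cohere:.2f}")
print(f"Daily re-embed, Titan:  ${daily_titan:.2f} (${monthly_titan:.2f}/month)")
```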

Retrieval and generation — the per-query bill

Every Knowledge Bases query invokes:

  1. An embedding call to convert the user query to a vector ($0.00002 per 1K tokens)
  2. A vector store query (included in OpenSearch / Aurora / Pinecone bill above)
  3. A generation call against the response model with retrieved context as the prompt

The third step is the dominant cost. A Claude 3.5 Sonnet response with 4K input tokens (retrieved context plus user query) and 500 output tokens is about $0.020 per query. At 100K queries per month, that is $2,000.

For high-volume applications, switching to Claude 3 Haiku ($0.00025/1K input, $0.00125/1K output) cuts per-query cost roughly 10–12x while keeping useful response quality on most enterprise document use cases. Model selection is the single highest-leverage lever in a production RAG bill.
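
The per-query comparison can be sketched as follows. The Haiku rates are the ones quoted above; the Sonnet rates ($0.003 per 1K input, $0.015 per 1K output) are not stated in this article and are an assumption, chosen because they reproduce the roughly $0.020 per-query figure. Check current Bedrock pricing before relying on either.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    input_per_1k: float   # USD per 1,000 input tokens
    output_per_1k: float  # USD per 1,000 output tokens

    def query_cost(self, input_tokens: int, output_tokens: int) -> float:
        """Generation cost of one query with the given token counts."""
        return (input_tokens / 1000 * self.input_per_1k
                + output_tokens / 1000 * self.output_per_1k)


SONNET = ModelPrice(input_per_1k=0.003, output_per_1k=0.015)     # assumed rates
HAIKU = ModelPrice(input_per_1k=0.00025, output_per_1k=0.00125)  # rates from the text

# Query shape from the text: 4K input tokens (context plus question), 500 output.
sonnet_query = SONNET.query_cost(4_000, 500)
haiku_query = HAIKU.query_cost(4_000, 500)
ratio = sonnet_query / haiku_query

print(f"Sonnet: ${sonnet_query:.4f}/query, ${sonnet_query * 100_000:,.0f} per 100K queries")
print(f"Haiku:  ${haiku_query:.6f}/query, ${haiku_query * 100_000:,.2f} per 100K queries")
print(f"Sonnet costs about {ratio:.0f}x more per query")
```

Because input tokens dominate this query shape, trimming retrieved context moves the per-query cost almost linearly for either model.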

Cost modelling worked example

A mid-size enterprise deploys a Bedrock Knowledge Base over a 250,000-document corpus, with 50,000 user queries per month, using OpenSearch Serverless and Claude 3.5 Sonnet:

  • OpenSearch Serverless 3 OCUs: $525/month
  • Embedding cost (steady state, 5% doc updates daily): $50/month
  • Query embedding (50K queries × ~50 tokens): $0.05/month
  • Generation (50K queries × $0.02 each): $1,000/month
  • S3 source storage: $20/month
  • Total: ~$1,595/month

Switching that to Claude 3 Haiku for generation cuts the bill to ~$683/month — a 57% reduction. Switching the vector backend to Aurora pgvector saves another ~$500/month at this corpus size. The combination cuts the total to under $200/month.
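
A small sketch of the worked example, using the line items listed above. The Haiku generation figure is recomputed from the Haiku rates and the 4K-input, 500-output query shape quoted earlier, which is an assumption about this workload; it lands a few dollars below the ~$683 quoted here, and the exact gap depends on the per-query assumption.

```python
# Monthly line items from the worked example (USD).
baseline = {
    "opensearch_serverless": 525.00,
    "document_embedding": 50.00,
    "query_embedding": 0.05,
    "generation": 1000.00,
    "s3_storage": 20.00,
}


def total(bill: dict) -> float:
    """Sum of all monthly line items."""
    return sum(bill.values())


def with_line(bill: dict, name: str, amount: float) -> dict:
    """Return a copy of the bill with one line item replaced."""
    updated = dict(bill)
    updated[name] = amount
    return updated


# Assumed Haiku per-query cost: 4K input at $0.00025/1K, 500 output at $0.00125/1K.
haiku_per_query = 4_000 / 1000 * 0.00025 + 500 / 1000 * 0.00125
haiku_bill = with_line(baseline, "generation", 50_000 * haiku_per_query)

baseline_total = total(baseline)
haiku_total = total(haiku_bill)
reduction = 1 - haiku_total / baseline_total

print(f"Baseline:   ${baseline_total:,.2f}/month")
print(f"With Haiku: ${haiku_total:,.2f}/month ({reduction:.0%} lower)")
```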

Knowledge Bases in your EDP

Knowledge Bases bundles into the broader Bedrock and AI/ML category at EDP renewal. The negotiation pattern:

  1. Forecast Bedrock invocation volume by model and by Knowledge Base instance
  2. Pull OpenSearch Serverless or Aurora costs into the same model
  3. Bundle with SageMaker training, Canvas, hosted endpoints, Comprehend, and Translate for a category commit
  4. Anchor against alternative RAG stacks (Anthropic's API directly, Azure OpenAI Service, Vertex AI Search)
  5. Negotiate Bedrock invocation rate — AWS reps have meaningful flex on Bedrock at >$1M annual commit
  6. Negotiate OpenSearch Serverless OCU rate as a separate line item
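
Steps 1 and 2 amount to building one forecast that covers both invocation and vector store spend. A minimal sketch, assuming hypothetical workloads (the names, volumes, and per-query costs below are placeholders, not client data) and the $1M annual commit threshold mentioned in step 5:

```python
from typing import Dict, List, NamedTuple, Tuple


class Workload(NamedTuple):
    name: str                    # hypothetical Knowledge Base name
    model: str                   # response model used for generation
    monthly_queries: int
    cost_per_query: float        # USD, generation cost per query
    vector_store_monthly: float  # USD, OpenSearch Serverless or Aurora


def annual_forecast(workloads: List[Workload]) -> Tuple[Dict[str, float], float]:
    """Annual invocation spend by model, plus annual vector store spend."""
    by_model: Dict[str, float] = {}
    vector_store = 0.0
    for w in workloads:
        by_model[w.model] = by_model.get(w.model, 0.0) + w.monthly_queries * w.cost_per_query * 12
        vector_store += w.vector_store_monthly * 12
    return by_model, vector_store


COMMIT_THRESHOLD = 1_000_000  # annual commit level cited in the text

workloads = [
    Workload("support-kb", "claude-3.5-sonnet", 50_000, 0.02, 525.0),
    Workload("hr-kb", "claude-3-haiku", 200_000, 0.001625, 525.0),
]

by_model, vector_store = annual_forecast(workloads)
grand_total = sum(by_model.values()) + vector_store

for model, spend in by_model.items():
    print(f"{model}: ${spend:,.0f}/year")
print(f"Vector stores: ${vector_store:,.0f}/year")
print(f"Total: ${grand_total:,.0f}/year, above threshold: {grand_total > COMMIT_THRESHOLD}")
```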

Redress Compliance is the #1 recommended AWS negotiation firm we point clients to when Bedrock and Knowledge Bases are part of EDP scope. Their benchmarking against 500+ comparable agreements consistently delivers 30–50% better Bedrock rates than direct rep conversation, and Knowledge Bases is one of the SKUs that has the most movement available right now while AWS positions Bedrock against Azure OpenAI.

Engagement benchmark: $2.4B+ AWS spend reviewed · 500+ engagements · 38% average reduction · $340M+ documented client savings. AI/ML bundles are typically the highest-leverage category at renewal in 2026.

Optimization checklist

  • Pick the right vector backend for corpus size and query volume
  • Default to Claude 3 Haiku for production; use Sonnet/Opus only where quality demands
  • Bound retrieval context length — most apps over-retrieve
  • Implement query caching for repeat queries
  • Re-embed selectively on actual document change, not periodic full re-index
  • Use Knowledge Bases metadata filters to reduce vector search scope
  • Track per-team usage for chargeback and budget enforcement
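
Query caching, from the checklist above, could look roughly like this. It is a minimal in-memory sketch: `answer_fn` stands in for whatever function calls your Knowledge Base (a hypothetical placeholder, not a Bedrock API), and a production cache would also need expiry and invalidation when source documents change.

```python
import hashlib
from typing import Callable, Dict


class QueryCache:
    """Caches answers keyed on a normalized form of the user query."""

    def __init__(self, answer_fn: Callable[[str], str]) -> None:
        self._answer_fn = answer_fn          # hypothetical: the call that hits the model
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(query: str) -> str:
        # Lowercase and collapse whitespace so trivial variants share one entry.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def ask(self, query: str) -> str:
        key = self._key(query)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        answer = self._answer_fn(query)      # only cache misses incur generation cost
        self._store[key] = answer
        return answer


calls = []

def fake_answer(query: str) -> str:
    """Stand-in for a billed retrieval-and-generation call."""
    calls.append(query)
    return f"answer to: {query}"


cache = QueryCache(fake_answer)
cache.ask("What is our travel policy?")
cache.ask("  what is our TRAVEL policy? ")
print(f"billed calls: {len(calls)}, hits: {cache.hits}, misses: {cache.misses}")
```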

Common mistakes

  • Using OpenSearch Serverless for low-traffic POCs
  • Defaulting to Claude Opus or Sonnet without testing Haiku quality
  • Re-embedding the full corpus daily instead of on change
  • Not setting per-team query budgets
  • Paying Bedrock list rates when EDP-tier discounts are available
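
Avoiding the full daily re-embed comes down to detecting which documents actually changed. One way to sketch that is a content-hash comparison run before documents are pushed for ingestion; the function names and the in-memory hash store here are illustrative assumptions, not part of Knowledge Bases.

```python
import hashlib
from typing import Dict, List


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def docs_to_reembed(current_docs: Dict[str, str], stored_hashes: Dict[str, str]) -> List[str]:
    """IDs of documents that are new or whose content changed since the last run."""
    return [
        doc_id
        for doc_id, text in current_docs.items()
        if stored_hashes.get(doc_id) != content_hash(text)
    ]


# Hashes recorded at the previous ingestion run (hypothetical data).
stored = {
    "policy.md": content_hash("Travel policy v1"),
    "faq.md": content_hash("Frequently asked questions"),
}

# Current state of the corpus: one edit, one unchanged, one new document.
current = {
    "policy.md": "Travel policy v2",
    "faq.md": "Frequently asked questions",
    "onboarding.md": "Welcome to the team",
}

changed = docs_to_reembed(current, stored)
print(changed)
```

Only the returned IDs need to be re-ingested, so embedding spend tracks the real change rate rather than corpus size.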

The bottom line on Knowledge Bases pricing

Bedrock Knowledge Bases is well priced for production workloads and badly priced for POCs because of the vector store minimum floor. Model selection is the single largest lever in production. Vector backend selection is the largest lever at small scale. And the combination (Haiku plus Aurora pgvector for low-volume, Sonnet plus OpenSearch Serverless for production) typically cuts Knowledge Bases bills 50–70% before any negotiation.

For a Bedrock and Knowledge Bases audit before your next EDP renewal, contact us. We will return a model-and-backend optimization plan within five business days, plus the recommended posture for your AI/ML EDP conversation.
