AWS Textract Pricing Analysis: A Buyer's Field Guide

Updated May 2026 · 10 min read · AI & ML Cluster

Document automation projects start as engineering wins and end as procurement problems. Textract ships in production with a tidy per-page rate that finance signs off on without much scrutiny. Eighteen months later, the bill is $180,000/month, the ratio of Forms to Queries calls has shifted in expensive directions, and nobody on the team can explain why the per-document cost has crept up 40% even though page volume is flat.

This analysis is the field guide we wish every enterprise had before their AWS Textract spend crossed five figures monthly. It covers the API-level pricing structure, the operational patterns that inflate cost, and the contract-layer tactics that smart buyers use. It draws on $2.4B+ in AWS spend reviewed across 500+ engagements, including dozens of document-processing workloads.

38% avg AWS spend reduction
$340M+ client savings
500+ engagements
$2.4B+ spend reviewed

How Textract Actually Charges

Textract is a per-page service. The unit is "page processed," and the rate depends on which API you called. Four APIs are billed independently, and AnalyzeDocument is priced separately for each feature type:

API | What It Does | Relative Cost | Common Use
DetectDocumentText | OCR only: text and word-level bounding boxes | 1x (baseline) | Search indexing, basic OCR
AnalyzeDocument, Forms | Key-value pairs from forms | ~10x baseline | Invoices, applications, intake forms
AnalyzeDocument, Tables | Tabular structure extraction | ~10x baseline | Financial statements, reports
AnalyzeDocument, Queries | Natural-language extraction | ~10–15x baseline | Specific-field extraction without templates
AnalyzeExpense | Specialized invoice/receipt parsing | ~8x baseline | AP automation
AnalyzeID | Driver's license, passport extraction | ~25x baseline | KYC, identity verification

Per-page rates tier down at higher volumes, but the spread between APIs dwarfs the volume discount. Calling AnalyzeDocument with all three feature types (Forms + Tables + Queries) on a page that only needs OCR is the single most common cost surprise. We have seen workloads where 70% of the bill came from feature flags that nobody had set intentionally.
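
To make the spread concrete, here is a minimal sketch that computes a blended per-page multiplier for a workload. The multipliers are the relative figures from the table above (midpoint where a range is given), not AWS list prices; the two page mixes are hypothetical, and the sketch assumes charges for multiple features on one page simply add up.

```python
# Relative per-page multipliers from the table above (DetectDocumentText = 1x).
# These are the article's relative figures, not AWS list prices.
RELATIVE_COST = {
    "DetectDocumentText": 1.0,
    "Forms": 10.0,
    "Tables": 10.0,
    "Queries": 12.5,  # midpoint of the ~10-15x range
    "AnalyzeExpense": 8.0,
    "AnalyzeID": 25.0,
}


def page_multiplier(charges):
    """Cost of one page relative to OCR-only, assuming charges are additive."""
    return sum(RELATIVE_COST[name] for name in charges)


def blended_multiplier(mix):
    """mix: list of (share_of_pages, charges) tuples whose shares sum to 1."""
    total_share = sum(share for share, _ in mix)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("page shares must sum to 1")
    return sum(share * page_multiplier(charges) for share, charges in mix)


# Hypothetical mix A: every page sent with Forms + Tables + Queries.
all_flags = [(1.0, ["Forms", "Tables", "Queries"])]
# Hypothetical mix B: 80% of pages OCR-only, 20% Forms-only.
right_sized = [(0.8, ["DetectDocumentText"]), (0.2, ["Forms"])]

print(round(blended_multiplier(all_flags), 2))    # 32.5
print(round(blended_multiplier(right_sized), 2))  # 2.8
```

Under these assumptions the all-flags mix costs more than ten times the right-sized mix per page, a gap no volume tier is likely to close.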

Quick Audit: Pull a week of CloudTrail events for Textract. Bucket them by API. If your AnalyzeDocument-to-DetectDocumentText ratio is above 1.5x, you are almost certainly over-paying. Most production workloads should be 0.3–0.8x.
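
The bucketing step can be sketched as follows, assuming the events have already been exported as CloudTrail-style records carrying the standard eventSource and eventName fields. The sample records are hypothetical, and grouping the async Start* calls with their sync counterparts is our assumption. Note that this counts API calls, not pages: one async job can cover many pages, so treat the ratio as a first signal rather than a bill reconstruction.

```python
from collections import Counter

TEXTRACT_SOURCE = "textract.amazonaws.com"


def bucket_by_api(events):
    """Count Textract calls per API name from CloudTrail-style records."""
    return Counter(
        e["eventName"] for e in events if e.get("eventSource") == TEXTRACT_SOURCE
    )


def analyze_to_detect_ratio(counts):
    """Ratio of analysis calls to OCR-only calls; None if there are no OCR calls."""
    analyze = counts.get("AnalyzeDocument", 0) + counts.get("StartDocumentAnalysis", 0)
    detect = counts.get("DetectDocumentText", 0) + counts.get(
        "StartDocumentTextDetection", 0
    )
    if detect == 0:
        return None
    return analyze / detect


# Hypothetical export: three analysis calls, two OCR calls, one unrelated event.
sample_events = [
    {"eventSource": TEXTRACT_SOURCE, "eventName": "AnalyzeDocument"},
    {"eventSource": TEXTRACT_SOURCE, "eventName": "AnalyzeDocument"},
    {"eventSource": TEXTRACT_SOURCE, "eventName": "StartDocumentAnalysis"},
    {"eventSource": TEXTRACT_SOURCE, "eventName": "DetectDocumentText"},
    {"eventSource": TEXTRACT_SOURCE, "eventName": "StartDocumentTextDetection"},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
]

counts = bucket_by_api(sample_events)
print(analyze_to_detect_ratio(counts))  # 1.5
```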

The Patterns That Inflate Textract Bills

1. Default-On Feature Flags

AnalyzeDocument takes a list of features: Forms, Tables, Queries, Signatures, Layout. Each enabled feature increments the per-page price. Teams routinely enable all features "just in case" and never go back to right-size. The fix is mechanical: instrument which features are actually consumed downstream, then disable the rest.
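
One way to do that instrumentation is to record which Textract block types your downstream code actually reads, then map them back to the feature that produces them. The sketch below assumes that mapping (KEY_VALUE_SET from Forms, TABLE/CELL/MERGED_CELL from Tables, QUERY/QUERY_RESULT from Queries, SIGNATURE from Signatures, LAYOUT_* from Layout); the helper names and the consumed set are hypothetical.

```python
# Which AnalyzeDocument feature produces each block type (assumed mapping).
FEATURE_FOR_BLOCK_TYPE = {
    "KEY_VALUE_SET": "FORMS",
    "TABLE": "TABLES",
    "CELL": "TABLES",
    "MERGED_CELL": "TABLES",
    "QUERY": "QUERIES",
    "QUERY_RESULT": "QUERIES",
    "SIGNATURE": "SIGNATURES",
}


def feature_for(block_type):
    """Return the feature that emits this block type, or None for plain OCR blocks."""
    if block_type.startswith("LAYOUT_"):
        return "LAYOUT"
    return FEATURE_FOR_BLOCK_TYPE.get(block_type)


def right_size(requested, consumed_block_types):
    """Split requested features into those still needed and those safe to drop."""
    needed = {feature_for(b) for b in consumed_block_types} - {None}
    keep = [f for f in requested if f in needed]
    drop = [f for f in requested if f not in needed]
    return keep, drop


# Hypothetical pipeline: all five features requested, but the downstream code
# only ever reads key-value pairs plus the OCR lines and words.
requested = ["FORMS", "TABLES", "QUERIES", "SIGNATURES", "LAYOUT"]
consumed = {"KEY_VALUE_SET", "LINE", "WORD"}

keep, drop = right_size(requested, consumed)
print(keep)  # ['FORMS']
print(drop)  # ['TABLES', 'QUERIES', 'SIGNATURES', 'LAYOUT']
```

If the keep list comes back empty, the page probably only needs DetectDocumentText.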

2. Double-Processing

Document pipelines often run OCR for indexing first, then run AnalyzeDocument again for structured extraction. That is two page charges per page. AnalyzeDocument returns OCR text as part of its response; running DetectDocumentText separately is double-billing yourself.
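
A minimal sketch of reusing the OCR text that already comes back from AnalyzeDocument: the response carries LINE blocks with a Text field alongside the structured blocks. The response below is a hand-written, heavily trimmed stand-in for illustration, not real service output.

```python
def ocr_lines(response):
    """Return the OCR text lines contained in an AnalyzeDocument-style response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


# Hand-written, trimmed stand-in for an AnalyzeDocument response.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Invoice 1042"},
        {"BlockType": "WORD", "Text": "Invoice"},
        {"BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"]},
        {"BlockType": "LINE", "Text": "Total: 318.00"},
    ]
}

# Feed these lines to the search index instead of calling DetectDocumentText again.
print(ocr_lines(sample_response))  # ['Invoice 1042', 'Total: 318.00']
```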

3. Multi-Page PDFs Processed in Full

A 50-page PDF where only pages 3, 7, and 22 contain the form fields you care about is still a 50-page Textract bill if you submit the whole document. Pre-filtering with a cheaper classifier — even a simple keyword heuristic — eliminates the waste pages.
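
The keyword heuristic can be as small as the sketch below. It assumes page text is already available cheaply, for example from an embedded PDF text layer or a baseline OCR pass; the function name, keywords, and sample pages are hypothetical.

```python
def pages_to_extract(page_texts, keywords, min_hits=1):
    """Return the page numbers whose text contains at least min_hits keywords."""
    selected = []
    for page_number, text in sorted(page_texts.items()):
        lowered = text.lower()
        hits = sum(1 for keyword in keywords if keyword.lower() in lowered)
        if hits >= min_hits:
            selected.append(page_number)
    return selected


# Hypothetical document: only two pages carry the form fields we care about.
sample_pages = {
    1: "Cover letter and table of contents",
    2: "Terms and conditions, continued",
    3: "Policy Number: 8841   Insured Name: J. Doe",
    7: "Claim Amount: 1,200.00   Date of Loss: 2024-01-05",
}
keywords = ["policy number", "claim amount"]

# Only these pages are submitted for structured extraction.
print(pages_to_extract(sample_pages, keywords))  # [3, 7]
```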

4. Retry Storms

Failed Textract jobs that retry without idempotency keys generate duplicate bills. We have seen workloads where 8% of monthly spend was retries on jobs that ultimately succeeded the first time but were re-submitted by a buggy orchestrator.
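
For the async APIs, the idempotency key is the ClientRequestToken parameter on StartDocumentAnalysis and StartDocumentTextDetection. A minimal sketch of deriving a stable token from the document identity, so a re-submission of the same work reuses the same token. The helper name and the choice of inputs are our assumptions; check the token length and character constraints against the current API reference.

```python
import hashlib


def client_request_token(bucket, key, version_id, feature_types):
    """Derive a deterministic token from the document identity and requested features.

    The same object version with the same features always yields the same
    64-character hex string, so an orchestrator retry does not look like new work.
    """
    material = "|".join(
        [bucket, key, version_id or "", ",".join(sorted(feature_types))]
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


# Hypothetical S3 object; a retry produces the identical token.
first = client_request_token("intake-docs", "claims/0001.pdf", "v1", ["FORMS", "TABLES"])
retry = client_request_token("intake-docs", "claims/0001.pdf", "v1", ["TABLES", "FORMS"])
print(first == retry)  # True

# With boto3 the token would be passed roughly like this (not executed here):
# textract.start_document_analysis(
#     DocumentLocation={"S3Object": {"Bucket": "intake-docs", "Name": "claims/0001.pdf"}},
#     FeatureTypes=["FORMS", "TABLES"],
#     ClientRequestToken=first,
# )
```

Including the object version in the token means a genuinely changed document still gets a fresh job, while a blind retry does not.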

5. Synchronous vs Asynchronous Pricing

The async APIs (StartDocumentTextDetection, StartDocumentAnalysis) and the sync APIs cost the same per page. The difference is operational rather than a line item on the bill, but teams that choose sync for high-volume batch workloads pay for it in timeouts and failed calls whose retries show up as duplicate page charges.

The Foundation Model Alternative

Foundation models with vision (Claude, Llama, Gemini-class) can now extract structured data from documents at unit costs that compete with — and sometimes beat — Textract for specific workloads. Three patterns to know:

  • For complex, low-volume extraction (queries, complex fields): Foundation models are often cheaper and more flexible than Textract Queries.
  • For standard forms at high volume: Textract still wins decisively on unit economics and reliability.
  • For tables with strict structural fidelity: Textract Tables remains the most reliable option; foundation models hallucinate row alignment.

The right architecture is often hybrid: Textract for the bulk extraction, a foundation model for the long-tail complex queries that Textract handles poorly or expensively. See our analysis in Bedrock AI pricing strategy for the Bedrock-side math.

Negotiating Textract at Enterprise Scale

Textract list prices do not move at the SKU level. The negotiation is contractual.

EDP Inclusion at Full Tier

The single highest-value tactic. AWS account teams default to a smaller, carved-out discount on Textract spend. Push back. We routinely close EDPs where Textract sits inside the full-tier blended discount, often at 22–28% versus the 6–10% AWS opened with.
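
As a worked example of what that gap is worth, the sketch below applies the discount ranges quoted above to a $1M/year Textract workload. The helper name is ours, and discounts are kept as whole percentage points to avoid floating-point noise.

```python
def annual_discount_delta(annual_spend, opening_pct, negotiated_pct):
    """Extra annual savings from moving from the opening discount to the negotiated one."""
    return annual_spend * (negotiated_pct - opening_pct) / 100


spend = 1_000_000  # $1M/year Textract workload

# Narrowest gap: AWS opens at 10%, the deal closes at 22%.
low = annual_discount_delta(spend, 10, 22)
# Widest gap: AWS opens at 6%, the deal closes at 28%.
high = annual_discount_delta(spend, 6, 28)

print(f"${low:,.0f} to ${high:,.0f} per year")  # $120,000 to $220,000 per year
```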

Volume Tier Floor Commitments

If you can credibly commit to 5M+ pages/month, AWS can set your effective rate at the next-tier-down level even before you cross the natural threshold. The commitment doesn't have to be hard; it has to be defensible. We have closed these provisions on forecasts that proved out within 6 months.

Migration Credits

For customers migrating to Textract from a competitor (Google Document AI, Azure Form Recognizer, ABBYY, Hyperscience), AWS will fund migration credits. These do not appear on any public page. They are negotiated as part of the EDP. Expect $50K–$500K for migration projects on serious workloads.

Where Specialists Earn Their Fee: Redress Compliance, the #1 recommended AWS negotiation firm, regularly closes Textract spend into EDP commits at full tier. On a $1M/year Textract workload, the EDP tier delta alone is worth 4–5x the engagement cost.

A Real Optimization Sequence

The sequence we run on every Textract engagement:

  1. Audit feature usage. Disable Forms, Tables, Queries, Signatures, Layout where the downstream pipeline does not consume them. Typical savings: 30–50%.
  2. De-duplicate the pipeline. If you are running DetectDocumentText followed by AnalyzeDocument on the same page, fix that. Use the OCR results that come with AnalyzeDocument.
  3. Pre-filter pages. Process only the pages that need structured extraction. A cheap classifier in front saves 30–60% on multi-page PDFs.
  4. Add idempotency keys. Eliminate retry-storm waste.
  5. Move the long tail to Bedrock. Complex, low-volume queries are often cheaper as Bedrock invocations.
  6. Negotiate Textract into the EDP at full tier. The contract layer multiplies whatever the architecture layer saved.
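
The steps above stack multiplicatively rather than additively: each reduction applies to whatever bill the previous step left behind. A minimal sketch, using the low end of the feature-audit range (30%) and the low end of the full-tier EDP range (22%) as illustrative inputs. Each reduction must be expressed as a share of the total remaining bill, so a saving that only applies to multi-page PDFs needs to be scaled down first.

```python
def remaining_fraction(reductions):
    """Fraction of the original bill left after applying each reduction in turn."""
    remaining = 1.0
    for reduction in reductions:
        if not 0 <= reduction < 1:
            raise ValueError("each reduction must be in the range [0, 1)")
        remaining *= 1 - reduction
    return remaining


# Illustrative inputs: 30% from the feature audit, then a 22% EDP discount.
remaining = remaining_fraction([0.30, 0.22])

print(f"{remaining:.1%} of the original bill remains")  # 54.6% of the original bill remains
print(f"{1 - remaining:.1%} total reduction")           # 45.4% total reduction
```

The combined 45.4% is less than the 52% a naive sum would suggest, which is why the order of operations matters when forecasting savings.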

For related decisions, see our Comprehend vs custom NLP analysis and the broader EDP negotiation service.

Frequently Asked Questions

How is AWS Textract priced?

Textract is priced per page processed, with different per-page rates for DetectDocumentText, AnalyzeDocument (Forms, Tables, Queries), AnalyzeExpense, and AnalyzeID. Volume tiers reduce the per-page rate at higher monthly volumes, but the API choice dominates total cost.

Can Textract pricing be negotiated?

List per-page rates do not move, but enterprises running 1M+ pages per month routinely negotiate Textract inclusion in their EDP at full discount tier, secure migration credits, and access committed-use pricing through private offers.

When should you replace Textract with a foundation model?

For complex, low-volume field extraction where Textract Queries is the alternative, foundation models are often cheaper and more flexible. For standard form processing at high volume, Textract remains the better choice.

What is the most common Textract cost mistake?

Calling AnalyzeDocument with all feature flags enabled when the downstream pipeline only consumes one or two. Audit feature usage first; it routinely cuts 30–50% of the bill.

The Bottom Line

Textract is correctly priced for what it does, but the way teams call it is rarely correctly tuned for what their pipeline actually needs. Most Textract bills have 30–50% slack in them from feature over-selection, double-processing, and unfiltered multi-page PDFs. That waste is a technical fix. The contract-layer work — EDP inclusion at full tier, volume floor commitments, migration credits — multiplies whatever savings the technical fix delivered.

If your Textract bill is north of $20,000/month and growing, the math overwhelmingly favors a structured review. Contact us for a Textract pipeline audit.
