Azure AI Document Intelligence: automatic data extraction from documents
João Barros
August 1, 2025
Azure AI Document Intelligence (formerly Form Recognizer) uses AI models to extract text, tables and key fields from documents — invoices, receipts, contracts, forms — with high accuracy and no fixed template.
Available prebuilt models
prebuilt-invoice → invoices (fields: VendorName, InvoiceDate, TotalTax, ...)
prebuilt-receipt → receipts (MerchantName, TransactionDate, Total, ...)
prebuilt-idDocument → ID/passport (FirstName, LastName, DocumentNumber, ...)
prebuilt-businessCard → business cards (ContactNames, Emails, ...)
prebuilt-read → generic text extraction (lines, words, detected languages)
prebuilt-layout → text + tables + selection marks
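One way to use the list above in code is a small lookup from a document kind to a model ID. This is only a sketch: the kind labels ("invoice", "receipt", and so on) and the function name pick_model are hypothetical, and falling back to prebuilt-layout for unknown kinds is an assumption, not a service rule.

```python
# Hypothetical mapping from an application-level document kind to a model ID.
# The model IDs come from the list above; the kind labels are assumptions.
PREBUILT_MODELS = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id": "prebuilt-idDocument",
    "business_card": "prebuilt-businessCard",
    "text": "prebuilt-read",
}


def pick_model(kind: str) -> str:
    """Return the model ID for a document kind, defaulting to prebuilt-layout."""
    return PREBUILT_MODELS.get(kind.strip().lower(), "prebuilt-layout")


print(pick_model("Invoice"))
print(pick_model("contract"))
```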
Analyze an invoice
import os

from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

# Endpoint and key are read from environment variables
client = DocumentAnalysisClient(
    endpoint=os.environ["DOC_INTEL_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["DOC_INTEL_KEY"]),
)

# Submit the PDF and wait for the long-running analysis to finish
with open("invoice.pdf", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-invoice", f)
    result = poller.result()

for invoice in result.documents:
    fields = invoice.fields
    total = fields.get("InvoiceTotal")
    print(f"Vendor: {fields.get('VendorName').value}")
    print(f"Date: {fields.get('InvoiceDate').value}")
    # InvoiceTotal is a currency value that exposes amount and symbol
    print(f"Total: {total.value.amount} {total.value.symbol}")
    print(f"Confidence: {total.confidence:.0%}")
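Not every invoice contains every field, so fields.get(...) can return None, and reading .value on it would raise an AttributeError. A minimal sketch of a defensive accessor follows. The helper name field_value is hypothetical, and sample_fields is a hand-built stand-in for invoice.fields, not real service output.

```python
from types import SimpleNamespace


def field_value(fields, name, default=None):
    """Return a field's value, or `default` when the field is absent or empty."""
    field = fields.get(name)
    if field is None or field.value is None:
        return default
    return field.value


# Stand-in for invoice.fields, built by hand for illustration only.
sample_fields = {
    "VendorName": SimpleNamespace(value="Contoso", confidence=0.93),
}

vendor = field_value(sample_fields, "VendorName")
invoice_date = field_value(sample_fields, "InvoiceDate", default="n/a")
print(f"Vendor: {vendor}")
print(f"Date: {invoice_date}")
```

With a real result, the same helper would be called as field_value(invoice.fields, "VendorName").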
Custom model — train on your documents
# 1. Upload 5+ sample documents and label them in Document Intelligence Studio
# 2. Train the custom model (3-5 minutes)
# 3. Analyze new documents with the generated model ID:
with open("contract.pdf", "rb") as f:  # placeholder file name
    poller = client.begin_analyze_document(
        model_id="bconcepts-contracts-model",
        document=f,
    )
    result = poller.result()

# Fields defined in the labels: ContractNumber, StartDate, AnnualValue, etc.
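Each field extracted by a custom model carries a confidence score, so a common pattern is to accept high-confidence fields automatically and send the rest to a person for review. The sketch below illustrates that idea. The function name split_by_confidence and the 0.8 threshold are assumptions, and the sample values are invented for the example rather than taken from a real model.

```python
from types import SimpleNamespace


def split_by_confidence(fields, threshold=0.8):
    """Split fields into (accepted, needs_review) dicts by confidence score."""
    accepted, needs_review = {}, {}
    for name, field in fields.items():
        # Treat a missing confidence as 0 so the field goes to review.
        confidence = field.confidence or 0.0
        target = accepted if confidence >= threshold else needs_review
        target[name] = field.value
    return accepted, needs_review


# Hand-built stand-in for result.documents[0].fields (illustrative values).
contract_fields = {
    "ContractNumber": SimpleNamespace(value="C-001", confidence=0.97),
    "AnnualValue": SimpleNamespace(value="12000", confidence=0.55),
}

accepted, needs_review = split_by_confidence(contract_fields)
print("Accepted:", accepted)
print("Needs review:", needs_review)
```

The right threshold depends on the cost of an error in each field, so it is worth tuning per field rather than using a single global value.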
Integrate into a processing pipeline
# Power Automate: HTTP POST to the Document Intelligence API
# → Parse the JSON response → Save fields to SharePoint / Dataverse / SQL
# Or: an Azure Function triggered by Blob Storage → processes each PDF on arrival
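Whichever trigger is used, the "parse the JSON response" step usually means flattening the extracted fields into one record before saving it. The sketch below assumes the response nests fields under analyzeResult → documents → fields, with content and confidence keys on each field. Treat that shape as an assumption to verify against the API version in use. The function name flatten_fields and the sample payload are hypothetical.

```python
def flatten_fields(response: dict) -> dict:
    """Flatten the first analyzed document into a single flat record."""
    documents = response.get("analyzeResult", {}).get("documents", [])
    if not documents:
        return {}
    record = {}
    for name, field in documents[0].get("fields", {}).items():
        # Keep the raw extracted text plus its confidence score.
        record[name] = field.get("content")
        record[f"{name}_confidence"] = field.get("confidence")
    return record


# Hand-written payload in the assumed shape (not real service output).
sample_response = {
    "analyzeResult": {
        "documents": [
            {
                "fields": {
                    "VendorName": {"content": "Contoso", "confidence": 0.93},
                    "InvoiceDate": {"content": "2025-08-01", "confidence": 0.88},
                }
            }
        ]
    }
}

row = flatten_fields(sample_response)
print(row)
```

A flat record like this maps directly onto a SharePoint list item, a Dataverse row, or a SQL insert.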
Conclusion
Document Intelligence removes most of the manual work of extracting data from documents. With prebuilt models there is no OCR code to write: call the API and process the structured JSON. For organization-specific documents, custom models can be trained from as few as 5 labeled examples and can reach 95%+ accuracy, although results depend on scan quality and on how consistent the document layouts are.