Guide to using Azure AI Vision Read API for OCR, comparing Read and Document Intelligence, explaining JSON output structure and showing a Python example for extracting text and coordinates
Welcome to this lesson on OCR (Optical Character Recognition) with Azure AI Vision. This guide explains how Azure extracts text from images — from photos of signs and labels to scanned documents and handwritten notes — using the Read capability. You’ll learn when to use the Read API versus Document Intelligence, how the JSON output is structured, and how to call the Read feature with a concise Python example.
Choose the right service based on document complexity and scale.
| Feature | Best for | Notes |
| --- | --- | --- |
| Vision Read | Printed or handwritten text in images, short notes, signs, labels | Typically synchronous and optimized for smaller images and short text blocks |
| Document Intelligence | Multi-page PDFs, invoices, receipts, structured forms | Richer parsing, field extraction, and often asynchronous for larger workloads |
Document Intelligence is intended for richer document processing (forms, invoices, multi-page PDFs). For short images or quick handwritten notes, the Read feature is usually sufficient and simpler to integrate.
The Read API returns structured JSON where each line includes a bounding polygon and words with their own polygons and confidence scores. This abbreviated example demonstrates the typical nesting and coordinates you can expect:
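Here is an abbreviated sketch of that shape (all values are illustrative, not real API output; truncated polygons are marked with "..."):

```json
{
  "modelVersion": "2023-10-01",
  "readResult": {
    "blocks": [
      {
        "lines": [
          {
            "text": "Happy birthday!",
            "boundingPolygon": [
              { "x": 14, "y": 32 }, { "x": 252, "y": 30 },
              { "x": 252, "y": 65 }, { "x": 14, "y": 67 }
            ],
            "words": [
              { "text": "Happy", "boundingPolygon": [ "..." ], "confidence": 0.98 },
              { "text": "birthday!", "boundingPolygon": [ "..." ], "confidence": 0.95 }
            ]
          }
        ]
      }
    ]
  }
}
```

Each polygon is a list of points in pixel coordinates, so a line's position in the source image can be drawn directly from the response.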
Sample Python code to call Read (ImageAnalysisClient)
The following Python snippet demonstrates calling the Read feature using ImageAnalysisClient, converting the SDK result to a dictionary, and extracting text from readResult → blocks → lines. Ensure you set the endpoint and key variables and install the azure-ai-vision-imageanalysis package.
```python
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential
import json

# Image URL
image_url = "https://azai102imagestore.blob.core.windows.net/images/note.webp"

# Initialize the client
client = ImageAnalysisClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key)
)

# Analyze the image for read (OCR)
result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[VisualFeatures.READ]
)

try:
    # Convert to a dictionary if the SDK result supports as_dict()
    result_dict = result.as_dict() if hasattr(result, "as_dict") else result

    # Extract text from readResult -> blocks -> lines
    extracted_text = ""
    if "readResult" in result_dict and "blocks" in result_dict["readResult"]:
        for block in result_dict["readResult"]["blocks"]:
            for line in block.get("lines", []):
                extracted_text += line.get("text", "") + "\n"

    print("Extracted Text:\n" + extracted_text)

    # Optionally print the raw JSON (pretty)
    print("Raw JSON output:")
    print(json.dumps(result_dict, indent=2))
except Exception as e:
    print("Error analyzing image:", e)
```
Keep your Azure endpoint and key secure. Do not hard-code secrets in source files; use environment variables or a secrets manager.
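As a minimal sketch of that practice, the endpoint and key used above can be read from environment variables (the variable names `VISION_ENDPOINT` and `VISION_KEY` are illustrative, not mandated by the SDK):

```python
import os

# Read credentials from the environment instead of hard-coding them.
# The variable names here are illustrative; use whatever your
# deployment or secrets manager provides.
endpoint = os.environ.get(
    "VISION_ENDPOINT", "https://<your-resource>.cognitiveservices.azure.com/"
)
key = os.environ.get("VISION_KEY", "<your-key>")
```

In production, prefer a secrets manager (for example, Azure Key Vault) over plain environment variables where possible.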
Running the script prints the extracted human-readable text and, optionally, the full JSON structure with model version, metadata, blocks, lines, words, bounding polygons, and confidence scores. Example printed output:
This lesson used a photographed handwritten birthday note. The Read API extracted the content and returned bounding polygons and confidence values for each recognized word, enabling UI overlays or further processing.
Use Vision Read for extracting printed or handwritten text from images (short notes, signs, labels).
Use Document Intelligence for multi-page, structured, or complex document extraction tasks (receipts, invoices, contracts).
The Read API returns hierarchical JSON (blocks → lines → words) with bounding polygons and confidence scores — ideal for UI overlays and downstream analytics.
The provided Python example shows a straightforward integration pattern to call the Read API and extract text.
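To illustrate the word-level detail mentioned above, here is a short sketch that walks readResult → blocks → lines → words and collects each word's polygon and confidence. The `sample` dictionary is hand-made to mimic the response shape, not real API output:

```python
# Hand-made dictionary mimicking the Read API response shape;
# real output comes from ImageAnalysisClient.
sample = {
    "readResult": {
        "blocks": [
            {
                "lines": [
                    {
                        "text": "Happy",
                        "words": [
                            {
                                "text": "Happy",
                                "boundingPolygon": [
                                    {"x": 10, "y": 5}, {"x": 80, "y": 5},
                                    {"x": 80, "y": 30}, {"x": 10, "y": 30},
                                ],
                                "confidence": 0.98,
                            }
                        ],
                    }
                ]
            }
        ]
    }
}

def iter_words(result_dict):
    """Yield (text, polygon, confidence) for every recognized word."""
    for block in result_dict.get("readResult", {}).get("blocks", []):
        for line in block.get("lines", []):
            for word in line.get("words", []):
                yield word["text"], word["boundingPolygon"], word["confidence"]

for text, polygon, confidence in iter_words(sample):
    print(f"{text!r} at {polygon} (confidence {confidence:.2f})")
```

The same generator works on the `result_dict` produced in the earlier example, which is all a UI overlay needs to draw boxes over each recognized word.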