Guide to using Azure AI Vision to analyze images via REST and SDKs, configure analysis options, and parse structured results like captions, objects, OCR, and smart crops.
In this lesson we’ll explore how to analyze images with Azure AI Vision. You’ll learn what goes into image analysis, how to call the service via the REST API and SDKs (C# and Python), how to configure analysis options, and how to parse the structured responses the service returns. We cover:
What the Analyze API returns (captions, detected objects and people, OCR/read, smart crops, etc.)
How to select Visual Features to limit and focus the response
SDK usage patterns and a full Python example to parse results
REST usage patterns and a sample query string for the Analyze endpoint
Practical options (smart crops, language, gender-neutral captions, model versioning)
Key query parameters for an Analyze request are summarized in the table below.

| Parameter | Description |
| --- | --- |
| `features` | Comma-separated visual features to return (example: `caption,people,objects,read,smartCrops`). |
| `model-name` | Model to use (e.g., `latest` or a specific version). |
| `language` | Language for captions / OCR results. |
| `api-version` | Service API version. |
You include the image either as:
an image URL in the JSON request body, or
raw image bytes in the request body (binary upload).
The service responds with structured JSON containing captionResult, objectsResult, peopleResult, smartCropsResult, tagsResult/read results, metadata, and modelVersion.
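To make the REST pattern concrete, the query string for the Analyze endpoint can be assembled as in the minimal sketch below. The endpoint placeholder, the helper name `build_analyze_url`, and the `2023-10-01` api-version are assumptions for illustration; check the current API version for your resource.

```python
from urllib.parse import urlencode

def build_analyze_url(endpoint, features, language="en", api_version="2023-10-01"):
    """Assemble the Analyze endpoint URL from the query parameters listed above."""
    params = {
        "features": ",".join(features),
        "language": language,
        "api-version": api_version,
    }
    return f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze?{urlencode(params)}"

url = build_analyze_url(
    "https://<your-endpoint>.cognitiveservices.azure.com",
    ["caption", "people", "objects", "read", "smartCrops"],
)
print(url)
# POST to this URL with an Ocp-Apim-Subscription-Key header and either a JSON
# body {"url": "<image-url>"} or raw bytes (Content-Type: application/octet-stream).
```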
Replace endpoint and key values below with the endpoint and key from your Azure AI service (Keys and Endpoint in the Azure portal). Never commit production keys into source control.
A consolidated, practical Python example demonstrating initialization, choosing visual features, calling analysis, and parsing results safely:
```python
import json
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

# Replace with your service values
endpoint = "https://<your-endpoint>.cognitiveservices.azure.com/"
key = "<your-key>"

# Example image URL (replace as needed)
image_url = "https://azai102imagestore.blob.core.windows.net/images/young-smiling-happy-cheerful-owner-600nw-2397244269.webp"

# Initialize client
client = ImageAnalysisClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key)
)

# Select features
visual_features = [
    VisualFeatures.TAGS,
    VisualFeatures.OBJECTS,
    VisualFeatures.CAPTION,
    VisualFeatures.DENSE_CAPTIONS,
    VisualFeatures.READ,
    VisualFeatures.SMART_CROPS,
    VisualFeatures.PEOPLE,
]

# Request analysis
result = client.analyze_from_url(
    image_url=image_url,
    visual_features=visual_features,
    smart_crops_aspect_ratios=[0.9, 1.33],
    gender_neutral_caption=True,
    language="en"
)

# Format SDK response into a dict for flexible parsing
try:
    result_dict = result.as_dict() if hasattr(result, "as_dict") else dict(result)
    print("Raw response as formatted JSON:")
    print(json.dumps(result_dict, indent=2))
    print("\n")
except Exception as e:
    print(f"Could not format result as JSON: {str(e)}")
    print(f"Raw response: \n {result} \n\n")
    result_dict = {}

# Helper: safe nested getter
def safe_get(d, *keys, default=None):
    for key in keys:
        if isinstance(d, dict) and key in d:
            d = d[key]
        else:
            return default
    return d

# Parse people
people = safe_get(result_dict, "peopleResult", "values", default=[])
if people:
    print("People detected:")
    for person in people:
        bbox = person.get("boundingBox", {})
        confidence = person.get("confidence", 0)
        print(f"  Bounding Box: x={bbox.get('x')}, y={bbox.get('y')}, width={bbox.get('w')}, height={bbox.get('h')}")
        print(f"  Confidence: {confidence:.2f}")
else:
    print("No people detected.")

# Parse caption
caption = safe_get(result_dict, "captionResult")
if caption:
    text = caption.get("text", "")
    conf = caption.get("confidence", 0)
    print(f"\nCaption: {text} (Confidence: {conf:.2f})")

# Parse tags (some responses use tagsResult.values or tags)
tags = safe_get(result_dict, "tagsResult", "values", default=None)
if tags is None:
    tags = safe_get(result_dict, "tags", default=None)
if tags:
    print("\nTags:")
    for t in tags:
        name = t.get("name") or (t.get("tag", {}) or {}).get("name")
        confidence = t.get("confidence", 0)
        print(f"  {name}: {confidence:.2f}")

# Parse objects
objects = safe_get(result_dict, "objectsResult", "values", default=[])
if objects:
    print("\nObjects:")
    for obj in objects:
        name = obj.get("name") or "object"
        confidence = obj.get("confidence", 0)
        bbox = obj.get("boundingBox", {})
        print(f"  {name}: bbox={bbox}, confidence={confidence:.2f}")

# Metadata & model version
metadata = safe_get(result_dict, "metadata", default={})
if metadata:
    print(f"\nImage width: {metadata.get('width')}, height: {metadata.get('height')}")
print(f"Model version: {result_dict.get('modelVersion')}")
```
This script:
Initializes ImageAnalysisClient with your endpoint and key.
Chooses the visual features to analyze.
Calls analyze_from_url with optional analysis options.
Prints the raw JSON response and demonstrates robust parsing of common result sections (people, caption, tags, objects).
Protect your API keys: rotate keys regularly, store secrets in a secure vault (e.g., Azure Key Vault), and avoid hard-coding secrets in source control.
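As a minimal first step (short of a full Key Vault setup), the endpoint and key can be read from environment variables instead of being hard-coded. The variable names below are placeholders, not a fixed convention:

```python
import os

# Read service configuration from the environment rather than source code.
# The variable names here are placeholders -- use your team's own convention.
endpoint = os.environ.get("AZURE_AI_VISION_ENDPOINT", "")
key = os.environ.get("AZURE_AI_VISION_KEY", "")

if not endpoint or not key:
    print("Set AZURE_AI_VISION_ENDPOINT and AZURE_AI_VISION_KEY before running.")
```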
Provision an Azure AI service in the Azure portal. Use the Keys and Endpoint values from the portal for your client.
Use blob storage URLs or public URLs for images. For private images, upload binary image bytes in the request body.
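For the binary-upload case, here is a hedged sketch using only the standard library: the raw bytes go directly in the POST body with an `application/octet-stream` content type. The function name, file path, and `2023-10-01` api-version are illustrative assumptions.

```python
import urllib.request

def analyze_image_bytes(endpoint, key, image_path,
                        features="caption,read", api_version="2023-10-01"):
    """POST raw image bytes to the Analyze endpoint; return the response body."""
    url = (f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze"
           f"?features={features}&api-version={api_version}")
    with open(image_path, "rb") as f:
        image_data = f.read()
    request = urllib.request.Request(
        url,
        data=image_data,  # raw bytes in the request body (binary upload)
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the SDK, the equivalent is to call `client.analyze(image_data=..., visual_features=...)` in place of `analyze_from_url`.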
Gender-neutral captions help avoid gender assumptions in generated text (e.g., “a person hugging a dog”).
Smart crops return bounding boxes for the aspect ratios you specify—use these to create thumbnails that preserve important content.
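Parsing `smartCropsResult` follows the same pattern as the other sections. Below is an illustrative sketch against a hand-made sample dict shaped like the service response; the coordinate values are invented:

```python
# Sample response fragment with invented values, shaped like smartCropsResult.
sample = {
    "smartCropsResult": {
        "values": [
            {"aspectRatio": 0.9, "boundingBox": {"x": 36, "y": 0, "w": 337, "h": 375}},
            {"aspectRatio": 1.33, "boundingBox": {"x": 0, "y": 44, "w": 499, "h": 375}},
        ]
    }
}

for crop in sample.get("smartCropsResult", {}).get("values", []):
    bbox = crop["boundingBox"]
    # Each entry is a crop region for one of the requested aspect ratios.
    print(f"Aspect ratio {crop['aspectRatio']}: "
          f"x={bbox['x']}, y={bbox['y']}, w={bbox['w']}, h={bbox['h']}")
```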
Pin model versions for reproducible results; use “latest” for new features and model improvements.
Always validate and sanitize service outputs before displaying them to users in production applications.
Example parsed output (illustrative):
```
People detected:
  Bounding Box: x=164, y=21, width=329, height=378
  Confidence: 0.94

Caption: a person hugging a dog (Confidence: 0.89)

Tags:
  pet: 0.90
  golden retriever: 0.89

Model version: latest
```
This guide demonstrates how to call Azure AI Vision to obtain captions, detect objects and people, extract OCR text, and request smart crops. Use these structured outputs to annotate images, drive UI decisions (smart cropping), and provide accessible descriptions for your applications.