AI-900: Microsoft Certified Azure AI Fundamentals
Azure Computer Vision Capabilities
AI Vision Service and Image Analysis
Welcome to this lesson on the Azure AI Vision Service and Image Analysis. We explore the capabilities of Azure AI Vision, including descriptive captions, object detection, and OCR. By the end of this lesson, you'll understand what each feature does and where it applies in practice.
Overview of AI Vision Capabilities
Azure AI Vision Service empowers you to analyze and interpret images with intelligence. Here are some of its key features:
- Descriptive Captioning and Tagging: Automatically generate captions like "a group of people walking on a sidewalk" along with relevant tags such as building, jeans, street, outdoor, jacket, etc.
- Object and People Detection: Identify objects (e.g., jeans, footwear) and detect individuals within images, making it ideal for scenarios like security and surveillance.
- Text Extraction (OCR): Extract printed, handwritten, or digitally rendered text from images.
- Smart Crop: Automatically crop images to highlight key areas, ensuring optimal display in websites, social media, or e-commerce platforms.
Note
Azure AI Vision Service can be customized to suit industry-specific needs, making it a powerful tool for various applications such as healthcare, retail, and security.
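As a rough sketch of how the features above map onto a single request, the Image Analysis REST API (version 4.0) accepts a comma-separated `features` query parameter. The endpoint below is a placeholder, the helper name `build_analyze_url` is hypothetical, and the `api-version` value is an assumption that may differ for your resource.

```python
from urllib.parse import urlencode

# Lesson feature -> name assumed for the Image Analysis 4.0 `features` parameter.
FEATURES = {
    "captioning": "caption",
    "tagging": "tags",
    "object detection": "objects",
    "people detection": "people",
    "ocr": "read",
    "smart crop": "smartCrops",
}


def build_analyze_url(endpoint, wanted, api_version="2023-10-01"):
    """Return the analyze URL for the requested lesson features."""
    names = [FEATURES[w] for w in wanted]  # raises KeyError on an unknown feature
    query = urlencode(
        {"api-version": api_version, "features": ",".join(names)},
        safe=",",  # keep the commas readable instead of percent-encoding them
    )
    return f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze?{query}"


# Placeholder endpoint; substitute the one shown for your own resource.
url = build_analyze_url(
    "https://example.cognitiveservices.azure.com", ["captioning", "ocr"]
)
print(url)
```

Requesting several features in one call avoids sending the same image more than once.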
Detailed Capabilities
Model Customization
Tailor image analysis models to meet specific scenarios. For instance, organizations handling medical images can adjust models to detect medical instruments or conditions, vastly improving analysis accuracy.
Reading Text from Images (OCR)
Leverage Optical Character Recognition (OCR) to extract text from different image types. Whether digitizing printed literature, business documents, or handwritten notes, OCR converts content into editable text, streamlining workflows across many industries.
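To illustrate how OCR output becomes editable text, here is a minimal sketch that flattens a result into plain lines. The payload is hypothetical; its shape (blocks containing lines) is assumed to follow the `readResult` section of an Image Analysis 4.0 response, so check it against what your resource actually returns.

```python
# Hypothetical payload, shaped like the `readResult` section of a 4.0 response.
sample = {
    "readResult": {
        "blocks": [
            {
                "lines": [
                    {"text": "Sodium 20 mg", "words": []},
                    {"text": "Vitamin A 50%", "words": []},
                ]
            }
        ]
    }
}


def extract_lines(analysis):
    """Collect recognized lines, in the order returned, as plain strings."""
    blocks = analysis.get("readResult", {}).get("blocks", [])
    return [line["text"] for block in blocks for line in block.get("lines", [])]


text = "\n".join(extract_lines(sample))
print(text)
```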
Detecting People in Images
The service can locate people in images, making it suitable for crowd monitoring, security surveillance, and retail analytics. Each detection outlines the individual with a bounding box, which helps with both head counts and behavior analysis. Note that this feature detects the presence of people; it does not establish who they are.
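A head count can be derived from the detections with a few lines of code. This is a sketch under assumptions: the payload is hypothetical, and its shape (a `peopleResult` list of bounding boxes with confidence scores) is assumed to match the 4.0 response format.

```python
# Hypothetical payload, shaped like the `peopleResult` section of a 4.0 response.
sample = {
    "peopleResult": {
        "values": [
            {"boundingBox": {"x": 10, "y": 20, "w": 50, "h": 120}, "confidence": 0.94},
            {"boundingBox": {"x": 90, "y": 25, "w": 48, "h": 118}, "confidence": 0.88},
            {"boundingBox": {"x": 300, "y": 40, "w": 20, "h": 45}, "confidence": 0.12},
        ]
    }
}


def count_people(analysis, min_confidence=0.5):
    """Count detections whose confidence meets the minimum."""
    people = analysis.get("peopleResult", {}).get("values", [])
    return sum(1 for person in people if person["confidence"] >= min_confidence)


print(count_people(sample))
```

Low-confidence detections are often false positives, so the count usually depends on a cutoff chosen for the scene.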
Generating Image Captions
Automatically generate descriptive captions for images. Instead of manually labeling visuals, use this feature to produce captions like "a group of people walking on a sidewalk" to improve accessibility and content organization.
Detecting and Labeling Objects
Automatically identify and label objects in images for use cases such as inventory management or autonomous systems. For example, the service can label books, shoes, or backpacks, enhancing the overall identification process.
Tagging Visual Features
Apply descriptive tags to various elements within an image. Tags such as trees, bench, or lake for a park image help in efficiently categorizing and retrieving images from large datasets.
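One way tags help with retrieval is an inverted index from tag name to the images that carry it. The sketch below assumes each tag arrives as a name with a confidence score; the file names, scores, and the `build_tag_index` helper are all hypothetical.

```python
from collections import defaultdict


def build_tag_index(tags_by_image, min_confidence=0.6):
    """Map each sufficiently confident tag name to the set of images it appears in."""
    index = defaultdict(set)
    for image, tags in tags_by_image.items():
        for tag in tags:
            if tag["confidence"] >= min_confidence:
                index[tag["name"]].add(image)
    return index


# Hypothetical tagging results for two park photos.
tags_by_image = {
    "park1.jpg": [
        {"name": "tree", "confidence": 0.98},
        {"name": "bench", "confidence": 0.91},
    ],
    "park2.jpg": [
        {"name": "tree", "confidence": 0.95},
        {"name": "lake", "confidence": 0.40},
    ],
}

index = build_tag_index(tags_by_image)
print(sorted(index["tree"]))
```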
Smart Crop
Automatically crop images to emphasize the most significant areas. This feature is especially useful for generating attractive thumbnails and ensuring that key subjects in an image are prominently displayed.
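Smart Crop suggests a region of interest; applying it is left to the caller. As a minimal sketch, assuming the region arrives as a box with `x`, `y`, `w`, and `h` in pixels, the crop itself is a pair of slices. The pixel grid here is a stand-in for real image data.

```python
def crop(pixels, box):
    """Return the rows and columns of `pixels` that fall inside `box`."""
    x, y, w, h = box["x"], box["y"], box["w"], box["h"]
    return [row[x:x + w] for row in pixels[y:y + h]]


# A 4-row by 6-column stand-in image; each value encodes its row and column.
pixels = [[10 * r + c for c in range(6)] for r in range(4)]

# Hypothetical crop region, as a smart-crop suggestion might describe it.
box = {"x": 2, "y": 1, "w": 3, "h": 2}

thumbnail = crop(pixels, box)
print(thumbnail)
```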
Working with Azure AI Vision in the Azure Portal
Start by accessing the Azure Portal and navigating to Azure AI Services. Although it's possible to deploy the Computer Vision Service independently, this demonstration uses an AI Services deployment that includes all necessary components.
Once your AI Services resource is deployed, open the AI Studio, a centralized hub offering access to all AI services, including image generation and OpenAI models.
Navigating the AI Studio Vision Section
Within the AI Studio, the Vision section is organized into several key areas:
- Document: Dedicated to OCR tasks.
- Face: Focused on face detection.
- Image: Encompasses all image-related capabilities, including captioning, object detection, and more.
The following diagram illustrates the "Vision + Document" section, showcasing features such as object detection, image captioning, and OCR.
Object Detection
Within AI Studio, the object detection feature recognizes and locates items within an image. To test this feature:
- Select or create a hub.
- Choose an image from the available samples or upload your own.
- The system will detect objects (e.g., person, laptop, seating, table) and display attributes accordingly.
For example, you might see attributes like Taxi when processing a photo. Adjusting the threshold value allows you to refine detection confidence levels.
Another sample demonstrates an image with three people sitting on a couch with a laptop on a table.
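The threshold control corresponds to a simple filter on confidence scores. The sketch below assumes detections arrive as an `objectsResult` list in which each object carries named tags with confidences; the payload values and the `objects_above` helper are hypothetical.

```python
# Hypothetical payload, shaped like the `objectsResult` section of a 4.0 response.
sample = {
    "objectsResult": {
        "values": [
            {"tags": [{"name": "Taxi", "confidence": 0.82}]},
            {"tags": [{"name": "person", "confidence": 0.64}]},
            {"tags": [{"name": "laptop", "confidence": 0.31}]},
        ]
    }
}


def objects_above(analysis, threshold):
    """Return (name, confidence) pairs whose confidence meets the threshold."""
    found = []
    for obj in analysis.get("objectsResult", {}).get("values", []):
        for tag in obj.get("tags", []):
            if tag["confidence"] >= threshold:
                found.append((tag["name"], tag["confidence"]))
    return found


print(objects_above(sample, 0.5))
```

Raising the threshold trades recall for precision: fewer objects are reported, but those that remain are more likely to be correct.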
Image Captioning
In the Image section, the captioning feature generates descriptive captions automatically. For example, the service might return captions such as "a statue of a woman holding a scale on top of a building" or accurately describe a pile of fruit. The sample below shows one way to request a caption through the REST API from C#:
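using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;

// Assumes endpoint, key, and imageUrl are defined by the caller; the api-version value is an assumption.
var requestUri = $"{endpoint}/computervision/imageanalysis:analyze?api-version=2023-10-01&features=caption";
var options = new JsonSerializerOptions(JsonSerializerDefaults.Web); // camelCase names, case-insensitive reads
// Send the image URL as JSON. To upload raw bytes instead, use StreamContent with application/octet-stream.
var httpRequest = new HttpRequestMessage(HttpMethod.Post, requestUri);
httpRequest.Headers.Add("Ocp-Apim-Subscription-Key", key);
var json = JsonSerializer.Serialize(new AnalyzeRequest { Url = imageUrl }, options);
httpRequest.Content = new StringContent(json, Encoding.UTF8, "application/json");
using var client = new HttpClient();
var response = await client.SendAsync(httpRequest).ConfigureAwait(false);
response.EnsureSuccessStatusCode(); // fail fast on 4xx/5xx instead of parsing an error body
var result = await response.Content.ReadAsStringAsync();
var deserializedObject = JsonSerializer.Deserialize<AnalyzeResult>(result, options);
Console.WriteLine($"Model Version: {deserializedObject.ModelVersion}");
Console.WriteLine($"Metadata: {JsonSerializer.Serialize(deserializedObject.Metadata)}");
Console.WriteLine($"Caption Result: {JsonSerializer.Serialize(deserializedObject.CaptionResult)}");
// Request body: the URL of the image to analyze
public class AnalyzeRequest
{
    public string Url { get; set; }
}
// Minimal response shape covering only the fields printed above
public class AnalyzeResult
{
    public string ModelVersion { get; set; }
    public JsonElement Metadata { get; set; }
    public JsonElement CaptionResult { get; set; }
}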
Dense Captioning
Dense Captioning generates detailed, human-readable captions for all significant objects in your image. For instance:
- A person holding a sprig of rosemary.
- A person cutting a sprig of rosemary.
- A city street with many buildings and cars.
- Yellow taxi cabs on a street.
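Because dense captioning returns one caption per region, callers often rank them before display. A minimal sketch, assuming each region carries `text` and `confidence` fields under `denseCaptionsResult`; the confidence values below are made up for illustration.

```python
# Hypothetical payload; captions are from the lesson, confidences are invented.
sample = {
    "denseCaptionsResult": {
        "values": [
            {"text": "a person holding a sprig of rosemary", "confidence": 0.71},
            {"text": "a city street with many buildings and cars", "confidence": 0.86},
            {"text": "yellow taxi cabs on a street", "confidence": 0.79},
        ]
    }
}


def top_captions(analysis, limit=2):
    """Return the `limit` most confident region captions, best first."""
    regions = analysis.get("denseCaptionsResult", {}).get("values", [])
    ranked = sorted(regions, key=lambda region: region["confidence"], reverse=True)
    return [region["text"] for region in ranked[:limit]]


print(top_captions(sample))
```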
Image Search
The Image Search feature lets you query a collection of images with natural-language text, much as you would in popular photo management apps. For example, searching for "rocky beaches" in a collection of 260 images retrieves the relevant photos. You can also adjust the relevance setting to make the results stricter or more inclusive.
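The lesson does not describe how Image Search ranks results. One common approach, assumed here purely for illustration, compares a vector embedding of the query with an embedding of each image using cosine similarity, with the relevance setting acting as a cutoff. The three-dimensional vectors below are toy values, not real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, image_vecs, min_relevance=0.8):
    """Return image names scoring at least `min_relevance`, best match first."""
    scored = [(cosine(query_vec, vec), name) for name, vec in image_vecs.items()]
    return [name for score, name in sorted(scored, reverse=True) if score >= min_relevance]


# Toy embeddings; a real system would obtain these from an embedding model.
image_vecs = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "city.jpg": [0.0, 0.2, 0.9],
    "cove.jpg": [0.7, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # stands in for the embedding of "rocky beaches"

results = search(query, image_vecs)
print(results)
```

Raising `min_relevance` narrows the results to closer matches, which mirrors the effect of tightening the relevance setting.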
Common Tag Extraction
This feature extracts descriptive tags from images to enhance categorization. For example, after uploading an image, you might receive tags like "sports person," "skateboarder," "individual sports," or "street stunts." For a seamless experience, create a hub and link it to your Azure AI service.
Optical Character Recognition (OCR)
OCR is used for extracting both printed and handwritten text from images and documents. This is especially useful for verifying identification documents or digitizing various texts (e.g., nutrition labels showing "sodium 20 mg daily" or "vitamin A 50%").
Additional Vision Studio Features
In addition to the functionalities discussed above, Vision Studio offers several other features:
- Smart Crop for automated image adjustments.
- Generating human-readable captions for enhanced accessibility.
- Detecting common objects to streamline analysis.
- Extracting text and performing portrait processing.
- Matching photo IDs for verification purposes.
Important
Please note that the multi-service AI Services deployment does not support Vision Studio features directly. To use Vision Studio, create a dedicated Computer Vision resource. The same separation applies to other domains (e.g., Speech Studio requires a separate Speech resource).
The Vision service also offers a free tier, allowing you to experiment with these features without incurring costs. Once the service is set up, you can easily integrate it with Vision Studio for a seamless experience.
Conclusion
This lesson provided an overview of the capabilities available in Azure AI Vision Service, from model customization and OCR to object detection, image captioning, dense captioning, and tag extraction. We also walked through the integration paths available via AI Studio and looked at practical use cases, including face detection and image search.
Happy exploring!
For more details on Azure AI services, check out the Azure AI Documentation.