- Target audience: Developers, data engineers, content managers, and ML practitioners who want to automate video indexing and enhance search, compliance, and content understanding.
- Scope: Analysis and integration patterns using Azure Video Indexer (portal and REST APIs). Account provisioning and subscription setup are out of scope; see References for links.
| Capability | What you’ll achieve | Example outcome |
|---|---|---|
| Automated analysis (portal & APIs) | Run video analysis jobs using the Video Indexer portal or programmatically via REST APIs/SDKs. | Submit a video and retrieve a JSON insights payload with timestamps and confidence scores. |
| Extract transcripts and identify people | Obtain speaker-attributed transcripts, named people recognition, and detected faces. | Search spoken phrases and link them to detected speakers or face thumbnails. |
| Detect emotions, topics, and faces | Surface emotional cues, topics, and facial metadata for content classification and moderation. | Tag scenes with detected emotions and generate topic labels for content categorization. |
| Use OCR, speaker diarization, and scene/shot detection | Extract on-screen text, separate speakers (diarization), and segment content into scenes/shots for navigation. | Enable time-aligned search and create chapter markers for long videos. |
| Enrich models with domain-specific vocabulary | Add custom vocabulary or domain models to improve recognition for specialized terminology. | Improve speech-to-text accuracy for industry jargon or product names. |
| Integrate outputs into apps & pipelines | Consume Video Indexer results to power search, analytics dashboards, compliance workflows, or automated clips. | Feed insights into a search index, CMS, or downstream ML pipeline. |
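The programmatic path in the table above boils down to two REST calls: submit a video for indexing, then fetch the resulting insights JSON. The sketch below uses Python with `requests` against the Video Indexer API endpoints (`POST .../Videos` and `GET .../Videos/{videoId}/Index`); the `location`, `account_id`, and `access_token` values are assumptions you must supply from your own account, and you should verify parameter names against the current API reference before relying on them.

```python
import requests

API_BASE = "https://api.videoindexer.ai"

def upload_video(location, account_id, access_token, video_name, video_url):
    """Submit a publicly reachable video URL for indexing; returns the video ID."""
    resp = requests.post(
        f"{API_BASE}/{location}/Accounts/{account_id}/Videos",
        params={
            "accessToken": access_token,
            "name": video_name,
            "videoUrl": video_url,
            "privacy": "Private",
        },
    )
    resp.raise_for_status()
    return resp.json()["id"]

def get_index(location, account_id, access_token, video_id):
    """Fetch the insights JSON; its 'state' field reads 'Processed' when done."""
    resp = requests.get(
        f"{API_BASE}/{location}/Accounts/{account_id}/Videos/{video_id}/Index",
        params={"accessToken": access_token},
    )
    resp.raise_for_status()
    return resp.json()
```

In practice you would poll `get_index` (or register a callback URL on upload) until indexing completes, then parse the payload as shown later in the module.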
Before you begin, make sure you have access to an Azure subscription and a Video Indexer account (or the appropriate API keys). This module focuses on analysis workflows, so complete account provisioning before following the walkthroughs.
- Concepts: Core features of Azure Video Indexer and key terms (transcript, diarization, OCR, shot detection, content moderation).
- Hands-on: Uploading videos (portal & API), retrieving insights JSON, and parsing common output fields.
- Customization: Using custom vocabulary and domain models to improve results.
- Integration patterns: Examples for search, CMS enrichment, analytics, and automation pipelines.
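As a preview of the hands-on parsing work, the sketch below flattens a transcript out of an insights payload. The field names follow the common shape of Video Indexer output (`videos[0].insights.transcript`, with `speakerId` and per-instance `start`/`end` times), but the sample data is invented for illustration; check the names against the JSON your account actually returns.

```python
# Minimal sample mirroring the typical insights JSON structure.
sample_insights = {
    "videos": [{
        "insights": {
            "transcript": [
                {"text": "Welcome to the demo.", "speakerId": 1,
                 "instances": [{"start": "0:00:01.2", "end": "0:00:03.5"}]},
                {"text": "Let's look at the results.", "speakerId": 2,
                 "instances": [{"start": "0:00:04.0", "end": "0:00:06.8"}]},
            ]
        }
    }]
}

def transcript_lines(insights):
    """Flatten the transcript into (start, speaker_id, text) tuples."""
    lines = []
    for item in insights["videos"][0]["insights"].get("transcript", []):
        for inst in item.get("instances", []):
            lines.append((inst["start"], item.get("speakerId"), item["text"]))
    return lines

for start, speaker, text in transcript_lines(sample_insights):
    print(f"{start} [speaker {speaker}] {text}")
```

Tuples like these feed directly into time-aligned search or caption generation.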
- Transcript: Time-aligned speech-to-text output, often with speaker attribution.
- Speaker diarization: The process of separating and labeling different speakers in audio.
- OCR: Optical character recognition used to extract text from video frames.
- Scene/shot detection: Automatic segmentation of video into logical scenes and shots.
- Insights JSON: The structured output produced by Video Indexer containing recognized entities, timestamps, and metadata.
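To tie the last two terms together, here is a hypothetical sketch that turns scene segments from an insights payload into chapter markers for navigation. The structure mirrors the typical `insights.scenes` shape (one `instances` list of `start`/`end` times per scene), but the sample data and the `Chapter {id}` labels are assumptions for illustration only.

```python
# Invented sample in the shape of the insights JSON 'scenes' collection.
sample_scenes = [
    {"id": 1, "instances": [{"start": "0:00:00", "end": "0:01:12"}]},
    {"id": 2, "instances": [{"start": "0:01:12", "end": "0:03:45"}]},
]

def chapter_markers(scenes):
    """Emit one (start_time, label) marker per detected scene."""
    return [(s["instances"][0]["start"], f"Chapter {s['id']}") for s in scenes]

print(chapter_markers(sample_scenes))
```

A player UI or CMS can consume these markers directly to render a chapter list for long videos.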