Using Azure AI Services for Enterprise Applications
Monitoring Azure AI Services
Guide to monitoring Azure AI services using metrics, logs, diagnostics, and alerts.
Monitoring Azure AI services helps ensure performance, reliability, and security for production workloads. Azure provides integrated monitoring features (Metrics, Diagnostic Settings, Logs, and Alerts) that help you track health, analyze behavior, and respond to incidents. This guide shows where to find those features in the Azure portal and how to use them effectively.
Key monitoring components at a glance:
Alerts: notify or automate when conditions occur. Typical use case: notify on usage spikes or trigger remediation runbooks.
Metrics: numeric, time-series measurements (e.g., response time, request count). Typical use case: real-time dashboards and trend analysis.
Diagnostic Settings: configure export of logs and platform metrics to destinations. Typical use case: centralize logs in Log Analytics, archive to Storage, or forward to Event Hubs.
Logs: detailed, timestamped records for auditing and troubleshooting. Typical use case: forensics, compliance, and custom alerting with KQL.
This article walks through the Azure portal to locate these features for an Azure AI (Cognitive) resource (for example, Language or Vision services), and explains how to combine them for operational monitoring and security posture.
Steps to view metrics for a Cognitive Services / Azure AI resource:
In the Azure portal, open your Cognitive Services or specific Azure AI resource (e.g., Language service).
In the left-hand menu, navigate to Monitoring > Metrics.
Select a metric (Total Calls, Latency, Throttled Requests, etc.), choose an aggregation (Sum, Average, Count), and set the time range.
Add additional metrics to the chart to compare trends and spot correlations.
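The aggregation you choose in step 3 changes what a chart means. The following Python sketch uses made-up per-minute samples (hypothetical values, not real service data) and rolls them into a single five-minute time grain with portal-style aggregations. It shows why Sum suits call counts, while Average or Max suits latency.

```python
from statistics import mean

# Hypothetical per-minute samples: (minute, total_calls, latency_ms).
SAMPLES = [
    (0, 120, 180.0),
    (1, 95, 210.0),
    (2, 130, 190.0),
    (3, 400, 950.0),  # a short spike
    (4, 110, 200.0),
]

def aggregate(values, how):
    """Roll up one time grain using a portal-style aggregation name."""
    if how == "Sum":
        return sum(values)
    if how == "Average":
        return mean(values)
    if how == "Count":
        return len(values)  # number of recorded measurements
    if how == "Max":
        return max(values)
    if how == "Min":
        return min(values)
    raise ValueError(f"unsupported aggregation: {how}")

calls = [c for _, c, _ in SAMPLES]
latency = [lat for _, _, lat in SAMPLES]

# Sum answers "how many calls happened in this 5-minute grain?"
print("Total Calls (Sum):", aggregate(calls, "Sum"))
# Average smooths latency; Max exposes the spike that Average dilutes.
print("Latency (Average):", aggregate(latency, "Average"))
print("Latency (Max):", aggregate(latency, "Max"))
# Count is the number of measurements, not the number of calls.
print("Measurements (Count):", aggregate(calls, "Count"))
```

Summing the latency samples here would give 1730 ms, which has no operational meaning. That is why the aggregation should follow what the metric measures.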
Metrics are ideal for dashboards, real-time monitoring, and identifying sudden spikes or gradual performance degradation.
Example: viewing the “Total Calls” metric for a deployed language resource.
Tips:
Combine related metrics (e.g., Total Calls + Throttled Requests) to detect capacity or throttling issues.
Pin metrics charts to Azure dashboards for consolidated operational views.
Use appropriate aggregations for your scenario (Sum for totals, Average/Percentile for latency).
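To illustrate the first tip, this sketch divides throttled requests by total calls for each interval and flags intervals above a chosen ratio. The interval values and the 5% threshold are hypothetical assumptions. In practice, you would read both series from the Metrics blade or from an export.

```python
# Hypothetical per-interval values for two metrics charted together.
total_calls = {"10:00": 1000, "10:05": 1200, "10:10": 1500, "10:15": 0}
throttled = {"10:00": 5, "10:05": 30, "10:10": 180, "10:15": 0}

THRESHOLD = 0.05  # assumed: flag intervals where more than 5% of calls are throttled

def throttle_ratios(calls, throttled_calls):
    """Return throttled/total per interval, skipping intervals with no calls."""
    ratios = {}
    for interval, total in calls.items():
        if total == 0:
            continue  # avoid division by zero on idle intervals
        ratios[interval] = throttled_calls.get(interval, 0) / total
    return ratios

def flag_capacity_issues(ratios, threshold=THRESHOLD):
    """List the intervals whose throttle ratio exceeds the threshold."""
    return [interval for interval, r in ratios.items() if r > threshold]

ratios = throttle_ratios(total_calls, throttled)
for interval, r in ratios.items():
    print(f"{interval}: {r:.1%} throttled")
print("Intervals over threshold:", flag_capacity_issues(ratios))
```

A rising total alone may be healthy growth. A rising throttle ratio is what points to a capacity or quota limit.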
Diagnostic settings determine where resource logs and metrics are exported for deeper analysis, retention, or integration with SIEMs.
What diagnostic settings can export:
Resource logs: request/response logs and resource-specific events.
Platform metrics (where applicable).
Audit logs and service-specific log categories for Azure AI capabilities (for example, Azure OpenAI request usage).
Destinations supported:
Log Analytics workspace: query and analyze logs with Kusto Query Language (KQL); build custom dashboards and log alerts.
Storage account: long-term archival and compliance retention.
Event Hub: stream logs to third-party analytics tools or SIEMs (for example, Splunk or other external systems).
Partner solutions: forward logs to available partner monitoring and analytics integrations.
On the resource, open Monitoring > Diagnostic settings and select Add diagnostic setting.
Choose the log categories you need (Audit Logs, Request and Response Logs, Trace Logs, Azure OpenAI Request Usage, etc.).
Select one or more destinations (Log Analytics, Storage, Event Hub, Partner).
Save the diagnostic setting.
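You can also express the same setting as a request body for the Azure Resource Manager diagnostic settings resource (Microsoft.Insights/diagnosticSettings). This is useful when you manage monitoring as code. The sketch below only builds and prints the JSON; it does not call Azure. Treat the following as assumptions to check against your resource:

- The log category names (Audit, RequestResponse, Trace) should match the categories your resource lists in the portal.
- The resource IDs are placeholders.
- build_diagnostic_setting is a hypothetical helper.

```python
import json

def build_diagnostic_setting(log_categories, workspace_id=None,
                             storage_account_id=None,
                             event_hub_rule_id=None, event_hub_name=None,
                             include_metrics=True):
    """Build a diagnosticSettings request body (hypothetical helper).

    At least one destination is required; a setting without one
    would not collect anything.
    """
    props = {}
    if workspace_id:
        props["workspaceId"] = workspace_id
    if storage_account_id:
        props["storageAccountId"] = storage_account_id
    if event_hub_rule_id:
        props["eventHubAuthorizationRuleId"] = event_hub_rule_id
        if event_hub_name:
            props["eventHubName"] = event_hub_name
    if not props:
        raise ValueError("choose at least one destination")

    # Each selected log category becomes an enabled entry.
    props["logs"] = [{"category": c, "enabled": True} for c in log_categories]
    if include_metrics:
        props["metrics"] = [{"category": "AllMetrics", "enabled": True}]
    return {"properties": props}

# Placeholder IDs (assumed): replace with your own resource IDs.
body = build_diagnostic_setting(
    log_categories=["Audit", "RequestResponse", "Trace"],
    workspace_id="/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
                 "Microsoft.OperationalInsights/workspaces/<workspace>",
    storage_account_id="/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
                       "Microsoft.Storage/storageAccounts/<account>",
)
print(json.dumps(body, indent=2))
```

You would submit a body like this with a PUT to {resourceUri}/providers/Microsoft.Insights/diagnosticSettings/{name}. Alternatively, create the equivalent setting with `az monitor diagnostic-settings create`.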
Diagnostic Settings do not send logs anywhere by default. You must create a diagnostic setting and choose a destination (Log Analytics, Storage, Event Hub, etc.) to collect logs for analysis and retention.
Logs stored in Log Analytics are queryable using Kusto Query Language (KQL). Use KQL to:
Perform forensics and investigations (who accessed what and when).
Satisfy compliance and retention requirements.
Create custom dashboards and log-based alert rules.
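A typical forensic question is "who called the resource, and when?". In KQL, that is a summarize over the diagnostics table. The sketch below reproduces the same grouping in plain Python over a few hypothetical log records, so the logic is visible without a workspace.

The table and column names in the comment (AzureDiagnostics, ResourceProvider, CallerIPAddress, OperationName, TimeGenerated) are assumptions based on the common AzureDiagnostics schema; verify them in your workspace. The record values are made up.

```python
from collections import Counter
from datetime import datetime

# Roughly equivalent KQL (table and column names are assumptions):
#   AzureDiagnostics
#   | where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
#   | summarize Calls = count() by CallerIPAddress, OperationName,
#               bin(TimeGenerated, 1h)

# Hypothetical exported records (made-up values).
RECORDS = [
    {"TimeGenerated": "2024-05-01T09:12:00", "CallerIPAddress": "10.0.0.4",
     "OperationName": "AnalyzeText"},
    {"TimeGenerated": "2024-05-01T09:47:30", "CallerIPAddress": "10.0.0.4",
     "OperationName": "AnalyzeText"},
    {"TimeGenerated": "2024-05-01T09:50:10", "CallerIPAddress": "203.0.113.7",
     "OperationName": "DetectLanguage"},
    {"TimeGenerated": "2024-05-01T10:05:00", "CallerIPAddress": "10.0.0.4",
     "OperationName": "AnalyzeText"},
]

def hour_bin(ts):
    """Truncate an ISO timestamp to the hour, like bin(TimeGenerated, 1h)."""
    return datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)

def summarize_calls(records):
    """Count calls per (caller, operation, hour), like summarize count() by ..."""
    return Counter(
        (r["CallerIPAddress"], r["OperationName"], hour_bin(r["TimeGenerated"]))
        for r in records
    )

for (caller, op, hour), calls in sorted(summarize_calls(RECORDS).items()):
    print(f"{hour:%Y-%m-%d %H:00}  {caller:<12} {op:<15} {calls}")
```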
Carefully consider data sensitivity before exporting request/response logs. Avoid sending Personally Identifiable Information (PII) or secrets to destinations unless you have proper data governance and encryption in place.
Metrics: Best for real-time numeric monitoring and dashboards.
Diagnostic Settings + Logs: Centralize and retain logs for deep analysis, compliance, and alerting using KQL.
Alerts: Bridge monitoring and operations by notifying teams and invoking automated responses.
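At its core, a metric alert rule aggregates a metric over a look-back window, compares the result with a threshold, and repeats that check on a schedule. The following is a simplified model of that idea, not Azure Monitor's actual evaluation engine. The MetricAlertRule class and its values (Sum of Total Calls over 15 minutes, evaluated every 5 minutes, threshold 5000) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class MetricAlertRule:
    """Hypothetical, simplified metric alert rule."""
    metric: str
    aggregation: str     # "Sum" or "Average" in this sketch
    operator: str        # "GreaterThan" or "LessThan"
    threshold: float
    window_minutes: int  # look-back window
    every_minutes: int   # evaluation frequency

    def aggregate(self, values):
        if self.aggregation == "Sum":
            return sum(values)
        if self.aggregation == "Average":
            return sum(values) / len(values) if values else 0.0
        raise ValueError(self.aggregation)

    def breached(self, value):
        if self.operator == "GreaterThan":
            return value > self.threshold
        if self.operator == "LessThan":
            return value < self.threshold
        raise ValueError(self.operator)

def evaluate(rule, per_minute):
    """Return (minute, aggregated value, fired) for each evaluation point.

    per_minute[i] is the metric value recorded during minute i.
    """
    results = []
    for end in range(rule.window_minutes, len(per_minute) + 1, rule.every_minutes):
        window = per_minute[end - rule.window_minutes:end]
        value = rule.aggregate(window)
        results.append((end, value, rule.breached(value)))
    return results

rule = MetricAlertRule("TotalCalls", "Sum", "GreaterThan", 5000, 15, 5)
# Hypothetical traffic: 200 calls/min for 20 minutes, then 600 calls/min.
traffic = [200] * 20 + [600] * 10
for minute, value, fired in evaluate(rule, traffic):
    print(f"t={minute:>2} min  sum={value:>5}  {'FIRED' if fired else 'ok'}")
```

The burst starts at minute 20, but the rule fires only at the minute-30 evaluation. Before that, the 15-minute window still averages in the earlier quiet minutes. Longer windows reduce noise at the cost of slower detection.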
A recommended monitoring approach:
Review the key platform metrics (collected automatically) and pin the most important charts to an Azure dashboard.
Configure Diagnostic Settings to send resource logs to a Log Analytics workspace (and archive critical logs to Storage).
Create log- and metric-based alerts for operational and security thresholds.
Automate common remediations via Action Groups connected to Logic Apps or Functions.
Regularly review dashboard trends, alert history, and log queries to refine detection and reduce noise.
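When an action group calls a Logic App, an Azure Function, or a webhook, it posts the alert as JSON. The sketch below assumes the common alert schema, with schemaId azureMonitorCommonAlertSchema and data.essentials fields such as alertRule, severity, signalType, and monitorCondition. It shows a hypothetical decision function that a Function could use to pick a remediation. The payload is a trimmed, made-up example, and the action names are placeholders.

```python
import json

# Trimmed, made-up payload shaped like the common alert schema (assumed).
SAMPLE = json.loads("""
{
  "schemaId": "azureMonitorCommonAlertSchema",
  "data": {
    "essentials": {
      "alertRule": "ai-language-throttling",
      "severity": "Sev2",
      "signalType": "Metric",
      "monitorCondition": "Fired",
      "alertTargetIDs": ["/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<account>"]
    }
  }
}
""")

def choose_action(payload):
    """Map an alert payload to a remediation (hypothetical action names)."""
    if payload.get("schemaId") != "azureMonitorCommonAlertSchema":
        return "ignore: unexpected schema"
    essentials = payload["data"]["essentials"]
    if essentials["monitorCondition"] == "Resolved":
        return "close-incident"
    # Sev0 is the most severe level and Sev4 the least.
    level = int(essentials["severity"].replace("Sev", ""))
    if level <= 1:
        return "page-on-call"
    if essentials["signalType"] == "Metric":
        return "run-scale-out-runbook"
    return "open-ticket"

print(choose_action(SAMPLE))
```

Keeping the decision logic in a small pure function makes it easy to test, and to tune as alert history shows which responses actually help.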
Monitoring is the foundation for keeping Azure AI services reliable, performant, and secure. With metrics, diagnostics, logs, and alerts configured, you can build dashboards, runbooks, and automated responses to maintain service health.