AZ-400: Designing and Implementing Microsoft DevOps Solutions
Maintain Pipelines
Discovering Pipeline Health, Including Monitoring Failure Rate, Duration, and Flaky Tests
Welcome to this guide on Discovering Pipeline Health in Azure DevOps. In modern CI/CD workflows, maintaining pipeline health ensures reliable, efficient software delivery. Whether you’re preparing for the AZ-400 exam or optimizing your DevOps practice, monitoring key pipeline metrics is essential.
Pipeline health monitoring means regularly tracking metrics to catch issues early, reduce failures, and accelerate feedback loops.
Key Metrics for Pipeline Health
Track these three primary metrics to understand your pipeline’s reliability, speed, and stability:
- Failure Rate: Percentage of runs that fail.
- Duration: Time a pipeline run takes to execute, which sets how long a commit waits for feedback or deployment.
- Flaky Tests: Tests with intermittent pass/fail results.
| Metric | Definition | Impact |
|---|---|---|
| Failure Rate | Percentage of runs that end in an error or failure | A high rate lowers confidence and slows delivery |
| Duration | Time taken for pipeline execution | Long durations delay feedback and releases |
| Flaky Tests | Tests that pass or fail unpredictably | Flakes add noise to test results and increase maintenance effort |
For example, a test run that surfaces a failure might report:

```
$ yarn test
127 passing (19m)
1 failing
1) Some test suite
Some test:
```
Note
Review your test run logs regularly. Early detection of failures or performance issues helps maintain pipeline health.
Monitoring Failure Rate
Failure rate measures how often your builds or deployments do not succeed. A low failure rate indicates a stable pipeline, whereas a consistently high failure rate signals deeper issues.
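As a rough illustration of the arithmetic, the sketch below computes a failure rate from a list of run outcomes. The outcome labels and the choice to exclude canceled runs are assumptions made for this example, not a rule defined by Azure DevOps.

```python
# Hypothetical outcome labels; adjust them to whatever your run history uses.
FAILED_OUTCOMES = {"failed"}
IGNORED_OUTCOMES = {"canceled"}  # assumption: canceled runs say little about stability


def failure_rate(outcomes):
    """Return the percentage of counted runs that failed (0.0 if none counted)."""
    counted = [o for o in outcomes if o not in IGNORED_OUTCOMES]
    if not counted:
        return 0.0
    failed = sum(1 for o in counted if o in FAILED_OUTCOMES)
    return 100.0 * failed / len(counted)


# Six recent runs: two failed, one canceled, three succeeded.
runs = ["succeeded", "failed", "succeeded", "canceled", "succeeded", "failed"]
print(f"Failure rate: {failure_rate(runs):.1f}%")  # Failure rate: 40.0%
```

Tracking this number per pipeline over a rolling window (for example, the last 50 runs) makes a sustained rise easier to spot than inspecting individual failures.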
Common causes of high failure rates:
- Unstable dependencies
- Misconfigured environments
- Incomplete or missing test coverage, which lets defects surface late in the pipeline
Warning
A sustained high failure rate can block releases and erode trust in your CI/CD process. Investigate immediately.
Monitoring Duration
Pipeline duration affects how quickly developers receive feedback. Long-running pipelines slow down releases and lower developer productivity.
Use Azure DevOps Analytics to track duration and pinpoint bottlenecks:
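- In Azure DevOps, go to Pipelines and select the pipeline you want to inspect.
- Select the Analytics tab to open the pipeline duration report.
- Install the Analytics extension if required (Azure DevOps Server 2019 only; Azure DevOps Services includes Analytics by default).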
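If you export run durations for your own analysis, a summary like the one below can help separate typical runs from outliers and point at the slowest stage. This is a minimal sketch: the duration values, the stage names, and the helper functions `percentile` and `slowest_stage` are hypothetical and are not part of any Azure DevOps API.

```python
import math
from statistics import median


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def slowest_stage(stage_seconds):
    """Return the (name, seconds) pair of the stage with the longest duration."""
    return max(stage_seconds.items(), key=lambda item: item[1])


# Hypothetical durations, in seconds, for ten recent runs.
run_seconds = [410, 395, 430, 1140, 405, 420, 415, 400, 425, 435]
# Hypothetical per-stage breakdown of the 1140-second run.
stages = {"build": 180, "unit_tests": 240, "integration_tests": 660, "deploy": 60}

print("median:", median(run_seconds))            # median: 417.5
print("p90:", percentile(run_seconds, 90))       # p90: 435
print("bottleneck:", slowest_stage(stages))      # bottleneck: ('integration_tests', 660)
```

The median and a high percentile are used here instead of the mean because a single slow run would otherwise skew the picture.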
Mitigating Flaky Tests
Flaky tests are those that pass or fail without code changes. They undermine confidence in your test suite and waste developer time.
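The definition above suggests a simple detection rule: a test that both passed and failed on the same commit is a flaky candidate. The sketch below applies that rule to a hypothetical test history; the record format `(test_name, commit_sha, passed)` and the sample data are assumptions for illustration.

```python
from collections import defaultdict


def find_flaky_tests(results):
    """Return the sorted names of tests that both passed and failed on one commit.

    `results` is an iterable of (test_name, commit_sha, passed) tuples.
    """
    outcomes = defaultdict(set)
    for test_name, commit_sha, passed in results:
        outcomes[(test_name, commit_sha)].add(passed)
    # Two distinct outcomes for the same (test, commit) means no code change explains the difference.
    return sorted({name for (name, _), seen in outcomes.items() if len(seen) == 2})


history = [
    ("test_login", "a1b2c3", True),
    ("test_login", "a1b2c3", False),     # same commit, different result: flaky candidate
    ("test_checkout", "a1b2c3", True),
    ("test_checkout", "d4e5f6", False),  # failed only after a code change
    ("test_search", "d4e5f6", True),
]
print(find_flaky_tests(history))  # ['test_login']
```

Note that `test_checkout` is not flagged: its result changed together with the commit, so the failure may be a real regression.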
Strategies to Handle Flaky Tests
Identify and Isolate
Use test-history tracking in CI to flag and quarantine flaky tests.
Investigate Root Causes
Examine environmental factors (databases, networks), concurrency issues, or timeouts.
Improve Test Design
- Use stable, deterministic test data (avoid values derived from the current time, such as timestamps).
- Mock external dependencies.
- Assert only on the specific outputs the test is meant to verify.
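The three design points above can be sketched in one small test. Everything here is hypothetical: `build_receipt`, the `rate_client` object, and its `get_rate` method are invented for illustration. The clock and the external service are passed in, so the test can fix the time and replace the service with a mock from the standard library's `unittest.mock`.

```python
from datetime import datetime, timezone
from unittest.mock import Mock


def build_receipt(order_id, rate_client, now):
    """Build a receipt; the clock and the rate service are injected, not looked up globally."""
    rate = rate_client.get_rate("USD", "EUR")
    return {
        "order_id": order_id,
        "total_eur": round(100 * rate, 2),
        "issued_at": now().isoformat(),
    }


def test_build_receipt():
    # Stable test data: a fixed clock instead of the real current time.
    fixed_now = lambda: datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
    # Mocked external dependency: no network call, always the same rate.
    rate_client = Mock()
    rate_client.get_rate.return_value = 0.5

    receipt = build_receipt("order-1", rate_client, fixed_now)

    # Assert only on the fields this test is about.
    assert receipt["total_eur"] == 50.0
    assert receipt["issued_at"] == "2024-01-15T12:00:00+00:00"
    rate_client.get_rate.assert_called_once_with("USD", "EUR")


test_build_receipt()
```

Because neither the wall clock nor a live service influences the result, the test should give the same outcome on every run and on every agent.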
Automatic Reruns
Configure retries for failed tests to distinguish real failures from flakes.
Prioritize and Quarantine
Move flaky tests into a quarantine suite and schedule regular fixes.
Select Robust Tools
Choose test frameworks and CI platforms that offer built-in retry support and test analytics.
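To illustrate how reruns can separate real failures from flakes, the sketch below retries a failing test and classifies the result. It is a simplified model, not the behavior of any specific test runner: `classify_with_retries` and `make_test` are hypothetical helpers, and the simulated tests follow a fixed pass/fail script so the example is deterministic.

```python
def classify_with_retries(test_fn, max_attempts=3):
    """Run test_fn up to max_attempts times and classify the outcome.

    Returns "passed" if the first attempt succeeds, "flaky" if a later
    attempt succeeds after a failure, and "failed" if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            test_fn()
        except AssertionError:
            continue  # try again, up to max_attempts
        return "passed" if attempt == 1 else "flaky"
    return "failed"


def make_test(script):
    """Build a fake test that passes or fails according to a fixed script of booleans."""
    steps = iter(script)

    def run():
        assert next(steps), "simulated failure"

    return run


print(classify_with_retries(make_test([True])))                 # passed
print(classify_with_retries(make_test([False, True])))          # flaky
print(classify_with_retries(make_test([False, False, False])))  # failed
```

Recording the "flaky" result, instead of silently reporting a pass, keeps the rerun from hiding the problem and gives the quarantine process a list to work from.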
Azure Test Plans for CI/CD
Azure Test Plans offers integrated test management and analytics to unify test efforts in your pipelines.
Key benefits:
- Integration with Azure Pipelines for live insights
- Collaboration via shared test plans and results
- Traceability linking tests to builds and releases
By leveraging Azure Test Plans, teams can manage flaky tests effectively and keep CI/CD processes stable.
Best Practices for Pipeline Health
- Continuously monitor failure rates, durations, and flaky tests
- Triage and resolve issues as they arise
- Optimize build scripts and test suites
- Use Azure DevOps tools for feedback loops and ongoing improvements