- Create a new canary ReplicaSet containing the updated pod template while retaining the existing stable ReplicaSet.
- Coordinate traffic splitting between the stable and canary ReplicaSets by integrating with an ingress controller or service mesh (for example, NGINX Ingress, Istio via its VirtualService resource, Linkerd, SMI TrafficSplit providers, or another supported integration).
- Execute automated analysis by creating an AnalysisRun from an AnalysisTemplate referenced in the Rollout. AnalysisRuns can:
  - Query metrics from providers such as Prometheus, Datadog, or other metric backends.
  - Launch Kubernetes Jobs to perform custom validation, smoke tests, or synthetic checks.
- Evaluate AnalysisRun results and the rollout strategy (weights, pauses, promotion rules). Based on the decision logic, the controller will either:
  - Progress the canary by increasing traffic weights until it becomes the stable ReplicaSet, or
  - Roll back to the previous stable ReplicaSet if the analysis indicates a failure.
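
The workflow above can be modeled as a small loop. The following is a simplified sketch for illustration only, not the controller's actual implementation: `run_canary`, `CanaryResult`, and the `analysis_passes` callback are hypothetical names, and a real rollout also involves pauses, ReplicaSet scaling, and provider-specific routing updates.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CanaryResult:
    outcome: str  # "promoted" or "rolled_back"
    weights_applied: List[int] = field(default_factory=list)


def run_canary(weights: List[int],
               analysis_passes: Callable[[int], bool]) -> CanaryResult:
    """Walk through canary weights, gating each step on an analysis check.

    `analysis_passes` stands in for an AnalysisRun verdict at a given weight
    (an assumption made for this sketch).
    """
    applied: List[int] = []
    for weight in weights:
        # Shift `weight` percent of traffic to the canary ReplicaSet.
        applied.append(weight)
        if not analysis_passes(weight):
            # Analysis failed: send all traffic back to the stable ReplicaSet.
            applied.append(0)
            return CanaryResult("rolled_back", applied)
    # Every step passed: the canary takes all traffic and becomes stable.
    if not applied or applied[-1] != 100:
        applied.append(100)
    return CanaryResult("promoted", applied)


healthy = run_canary([10, 25, 50], lambda weight: True)
failing = run_canary([10, 25, 50], lambda weight: weight < 50)
print(healthy.outcome, healthy.weights_applied)
print(failing.outcome, failing.weights_applied)
```

In this model a failed check at any weight returns traffic to the stable version immediately, which mirrors the rollback branch described in the list.
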
| Component | Role | Typical Use Cases |
|---|---|---|
| Rollout resource | Declarative spec describing strategy (canary, blue/green), steps, traffic routing, AnalysisTemplates | Define the desired release strategy and thresholds |
| Rollout controller | Reconciles Rollout resources; manages ReplicaSets and AnalysisRuns | Implements the rollout state machine and decision logic |
| ReplicaSets | Represent stable and canary pod templates | Keep previous version available and run the new version side-by-side |
| Traffic routing integration | Adjusts traffic split via Ingress or service mesh | Gradual traffic shift by weight or route manipulation |
| AnalysisRun / AnalysisTemplate | Reusable analysis definitions that fetch metrics or run validation jobs | Gate promotion based on metrics and custom checks |
| Decision logic | Evaluates analysis outcomes and strategy rules | Continue rollout, promote canary, or trigger rollback |
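
To make the table more concrete, the sketch below assembles a Rollout manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. The helper `build_rollout`, the resource names (`example-app`, the `-stable`, `-canary`, and `-ingress` suffixes), the image, the replica count, and the choice of NGINX for traffic routing are all assumptions made for illustration. Check the field layout against the Argo Rollouts version you run.

```python
import json
from typing import Any, Dict, List


def build_rollout(name: str, image: str, weights: List[int],
                  pause: str, analysis_template: str) -> Dict[str, Any]:
    """Build a canary Rollout manifest with one weight/pause/analysis trio per step."""
    steps: List[Dict[str, Any]] = []
    for weight in weights:
        steps.append({"setWeight": weight})           # traffic share for the canary
        steps.append({"pause": {"duration": pause}})  # wait before evaluating
        steps.append({"analysis": {                   # gate on an AnalysisTemplate
            "templates": [{"templateName": analysis_template}]
        }})
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Rollout",
        "metadata": {"name": name},
        "spec": {
            "replicas": 3,  # hypothetical replica count
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
            "strategy": {
                "canary": {
                    # Service and Ingress names are placeholders for this sketch.
                    "stableService": f"{name}-stable",
                    "canaryService": f"{name}-canary",
                    "trafficRouting": {
                        "nginx": {"stableIngress": f"{name}-ingress"}
                    },
                    "steps": steps,
                }
            },
        },
    }


rollout = build_rollout("example-app", "example/app:2.0", [10, 50], "5m", "error-rate")
print(json.dumps(rollout, indent=2))
```

Generating the steps from a list of weights keeps the pause and analysis gate consistent at every increment. Writing the manifest by hand in YAML is equally valid.
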
- Traffic orchestration: The controller does not itself proxy traffic. Instead, it integrates with a supported provider that performs weighted routing or route patching to direct a percentage of user requests to the canary.
- Automated analysis: AnalysisRuns let you run metric queries (Prometheus, Datadog, etc.) or Jobs to validate application health and user experience before promoting the canary.
- Promotion and rollback: Promotion happens when defined success criteria are met. Rollback is triggered automatically if metrics or tests fail, minimizing user impact.
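
As a rough illustration of how an analysis gate turns metric samples into a verdict, the sketch below checks a series of error-rate measurements against a threshold and tolerates a limited number of failed measurements. The function name `evaluate_error_rate`, its parameters, and the two-state result are assumptions for this example. Real AnalysisRuns have additional outcomes and options that this model leaves out.

```python
from typing import Sequence


def evaluate_error_rate(samples: Sequence[float],
                        max_error_rate: float,
                        failure_limit: int = 0) -> str:
    """Return "Failed" once more than `failure_limit` samples exceed the threshold."""
    failures = 0
    for sample in samples:
        if sample > max_error_rate:
            failures += 1
            if failures > failure_limit:
                # Too many bad measurements: the canary should not be promoted.
                return "Failed"
    # The number of bad measurements stayed within the allowed limit.
    return "Successful"


print(evaluate_error_rate([0.01, 0.02, 0.01], max_error_rate=0.05))
print(evaluate_error_rate([0.01, 0.09, 0.10], max_error_rate=0.05, failure_limit=1))
```

Allowing one or two failed measurements can keep a single noisy sample from aborting an otherwise healthy rollout, at the cost of reacting slightly later to a real regression.
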
Make sure the cluster has the required CRDs and RBAC configured for Argo Rollouts, and that any external metric providers used by AnalysisTemplates are reachable from the cluster.
Traffic routing behavior depends on the integration you choose (Ingress vs. service mesh). Ensure the chosen provider is supported and properly configured before running automated canaries.
- Start small: Use small initial canary weights and short, monitored increments to limit blast radius.
- Automate validation: Use AnalysisTemplates to codify checks (latency SLOs, error rate thresholds, end-to-end smoke tests).
- Observe and alert: Integrate with observability tooling (Prometheus, Grafana, Datadog) and configure alerts for AnalysisRun failures.
- Validate routing: Test the chosen traffic provider in a non-production environment to confirm that weighted routing behaves as expected.
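
One way to carry out the routing check from the last item is to send a batch of requests in a test environment, record which version answered each one, and compare the observed canary share with the configured weight. The sketch below assumes each response can be labeled `"canary"` or `"stable"` (for example from a version header your application exposes). The helper names and the 5-point tolerance are hypothetical.

```python
from collections import Counter
from typing import Iterable


def canary_share(backends: Iterable[str], canary_label: str = "canary") -> float:
    """Fraction of observed responses that were served by the canary."""
    counts = Counter(backends)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses observed")
    return counts[canary_label] / total


def within_tolerance(observed: float, expected_weight: int,
                     tolerance: float = 0.05) -> bool:
    """Check that the observed share is close to the configured weight (in percent)."""
    return abs(observed - expected_weight / 100) <= tolerance


# Hypothetical sample: 12 of 100 responses came from the canary.
responses = ["canary"] * 12 + ["stable"] * 88
share = canary_share(responses)
print(share, within_tolerance(share, expected_weight=10))
```

Small samples vary a lot around the configured weight, so a larger request count or a wider tolerance gives a more reliable signal.
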
- Argo Rollouts documentation
- Kubernetes Documentation
- NGINX Ingress Controller
- Istio VirtualService
- Prometheus
- Datadog