Demo Prepare Experiment AZ
In this article, you’ll learn how to create an AWS Fault Injection Simulator (FIS) experiment template that simulates an Availability Zone (AZ) power interruption for an Amazon Aurora database. You will:
- Identify the AZ of your Aurora writer instance
- Select and configure the appropriate FIS scenario
- Limit the blast radius with resource tags
- Define advanced parameters for timing
- Set up CloudWatch Logs for monitoring
1. Identify the Target AZ for Your Aurora Writer
- Open the Amazon RDS console.
- Select your Aurora cluster.
- Note the Availability Zone (AZ) where the writer instance is running (e.g., us-east-1a).
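The same lookup can be scripted instead of read off the console. The sketch below is one possible approach, assuming a boto3-style RDS client is passed in; the helper names `find_writer_instance_id` and `writer_availability_zone` are illustrative and not part of any AWS SDK.

```python
from typing import Any


def find_writer_instance_id(cluster: dict[str, Any]) -> str:
    """Return the instance identifier of the cluster member flagged as writer."""
    for member in cluster.get("DBClusterMembers", []):
        if member.get("IsClusterWriter"):
            return member["DBInstanceIdentifier"]
    raise LookupError("cluster has no writer member")


def writer_availability_zone(rds_client: Any, cluster_id: str) -> str:
    """Look up the AZ of the writer instance through an RDS client.

    rds_client is expected to behave like boto3.client("rds").
    """
    clusters = rds_client.describe_db_clusters(DBClusterIdentifier=cluster_id)
    writer_id = find_writer_instance_id(clusters["DBClusters"][0])
    instances = rds_client.describe_db_instances(DBInstanceIdentifier=writer_id)
    return instances["DBInstances"][0]["AvailabilityZone"]
```

With boto3 installed and credentials configured, a call such as `writer_availability_zone(boto3.client("rds"), "my-aurora-cluster")` would return a value like us-east-1a; the cluster name here is a placeholder.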
2. Select the AZ Availability: Power Interruption Scenario
- Navigate to the AWS Fault Injection Simulator (FIS) console.
- Click Scenario Library to browse pre-built scenarios.
- Choose AZ Availability: Power Interruption.
3. Review Scenario Actions and Targets
The AZ Availability: Power Interruption template orchestrates the following actions in sequence:
- Pause EC2 instance launches
- Pause EBS I/O
- Adjust Auto Scaling Group (ASG) scaling
Targets include IAM roles, EC2 instances, and EBS volumes in the specified AZ.
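For readers who prefer to see the three actions as template data, here is a rough sketch of how they might appear in the "actions" section of an experiment template. The action IDs, target keys, and parameter names reflect my understanding of the FIS actions reference and should be verified against the current documentation. The action names and target names are hypothetical, and ordering between actions is not modeled.

```python
# Assumption: action IDs and target keys as listed in the AWS FIS actions
# reference; confirm them before relying on this mapping.
SCENARIO_ACTIONS = {
    "pause-instance-launches": {
        "actionId": "aws:ec2:api-insufficient-instance-capacity-error",
        "targetKey": "Roles",  # targets IAM roles
        "az_parameter": True,
    },
    "pause-ebs-io": {
        "actionId": "aws:ebs:pause-volume-io",
        "targetKey": "Volumes",  # targets EBS volumes
        "az_parameter": False,  # the AZ is expected on the target, not the action
    },
    "pause-asg-scaling": {
        "actionId": "aws:ec2:asg-insufficient-instance-capacity-error",
        "targetKey": "AutoScalingGroups",
        "az_parameter": True,
    },
}


def build_actions(az: str, duration: str) -> dict:
    """Build the 'actions' section of a template for one AZ and one duration.

    duration is an ISO 8601 string such as "PT10M". Target names follow the
    hypothetical pattern "<action name>-targets".
    """
    actions = {}
    for name, spec in SCENARIO_ACTIONS.items():
        parameters = {"duration": duration}
        if spec["az_parameter"]:
            parameters["availabilityZoneIdentifiers"] = az
            parameters["percentage"] = "100"  # FIS parameters are strings
        actions[name] = {
            "actionId": spec["actionId"],
            "parameters": parameters,
            "targets": {spec["targetKey"]: f"{name}-targets"},
        }
    return actions
```

The console fills in these values when you pick the scenario, so the sketch is mainly useful for reading an exported template or keeping one in version control.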
4. Configure Scenario Parameters
Affected Availability Zone & IAM Role
- Set Affected Availability Zone to your Aurora writer’s AZ (e.g., us-east-1a).
- Specify the IAM role that FIS will assume to manage EC2 and EBS resources.
Targeting Tags
Limit the blast radius by targeting only resources with predefined tags.
Tag Key | Tag Value |
---|---|
AZ-impairment | power-ready |
Note
Ensure your EC2 instances and EBS volumes are already tagged with this key and value before running the experiment. Resources without the tag are not selected as targets.
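A quick pre-flight check can catch resources that are missing the tag. The sketch below assumes resource descriptions shaped like the EC2 API output, where each resource carries a "Tags" list of {"Key": ..., "Value": ...} entries; the function names are illustrative.

```python
REQUIRED_TAG_KEY = "AZ-impairment"
REQUIRED_TAG_VALUE = "power-ready"


def has_required_tag(resource: dict) -> bool:
    """True if the resource carries the AZ-impairment=power-ready tag."""
    return any(
        tag.get("Key") == REQUIRED_TAG_KEY and tag.get("Value") == REQUIRED_TAG_VALUE
        for tag in resource.get("Tags", [])
    )


def untagged_ids(resources: list[dict], id_field: str) -> list[str]:
    """Return the IDs of resources that would not be selected by the tag."""
    return [r[id_field] for r in resources if not has_required_tag(r)]


# Example with hand-written data in the shape of describe_instances output.
instances = [
    {"InstanceId": "i-aaa", "Tags": [{"Key": "AZ-impairment", "Value": "power-ready"}]},
    {"InstanceId": "i-bbb", "Tags": [{"Key": "Name", "Value": "web"}]},
    {"InstanceId": "i-ccc"},  # no tags at all
]
missing = untagged_ids(instances, "InstanceId")
```

Here `missing` lists the two instances that the experiment would skip. The same function works for volumes by passing "VolumeId" as the ID field.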
Advanced Timing Parameters
Align these durations with your application’s RPO (Recovery Point Objective) and RTO (Recovery Time Objective).
Phase | Description | Duration |
---|---|---|
DNS Impact | Time for DNS propagation effects | 2 minutes |
Outage | Simulated power-off window | 10 minutes |
Recovery | Time to restore power | 5 minutes |
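FIS action durations are written as ISO 8601 strings (for example PT10M for ten minutes), so it can help to convert the minute values in the table before placing them in a template. The helper below is a minimal sketch; `minutes_to_iso8601` is an illustrative name.

```python
def minutes_to_iso8601(minutes: int) -> str:
    """Convert whole minutes to an ISO 8601 duration such as PT10M or PT1H30M."""
    if minutes <= 0:
        raise ValueError("duration must be a positive number of minutes")
    hours, remainder = divmod(minutes, 60)
    text = "PT"
    if hours:
        text += f"{hours}H"
    if remainder:
        text += f"{remainder}M"
    return text


# Phase durations from the table above, in minutes.
PHASE_MINUTES = {"DNS Impact": 2, "Outage": 10, "Recovery": 5}
PHASE_DURATIONS = {
    phase: minutes_to_iso8601(minutes) for phase, minutes in PHASE_MINUTES.items()
}
# PHASE_DURATIONS == {"DNS Impact": "PT2M", "Outage": "PT10M", "Recovery": "PT5M"}
```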
5. Configure Logging to CloudWatch
- Under Log group, search for fis.
- Select the CloudWatch Logs group where FIS will send experiment logs.
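When the template is defined in code instead of the console, the same choice is expressed as a log configuration block. The sketch below builds one, assuming the standard aws partition and a log group ARN ending in `:*`, which is the form I recall FIS expecting. The region, account ID, and log group name are placeholders.

```python
def cloudwatch_log_configuration(region: str, account_id: str, log_group: str) -> dict:
    """Build the logConfiguration section of an FIS experiment template.

    Assumes the 'aws' partition; adjust the ARN prefix for other partitions.
    """
    log_group_arn = f"arn:aws:logs:{region}:{account_id}:log-group:{log_group}:*"
    return {
        "logSchemaVersion": 2,
        "cloudWatchLogsConfiguration": {"logGroupArn": log_group_arn},
    }


# Placeholder values for illustration only.
log_config = cloudwatch_log_configuration("us-east-1", "123456789012", "fis-experiments")
```

The IAM role that FIS assumes also needs permission to deliver logs to that log group, otherwise the experiment may fail to start.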
6. Create and Review the Experiment Template
- Verify all parameters and targets are correct.
- Capture baseline (steady-state) metrics for comparison.
- Click Create experiment template.
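If you export the template as JSON, part of the verification step can be automated. The following sketch checks a template dictionary for a few common mistakes, assuming the usual template layout with "actions", "targets", "roleArn", and "stopConditions" keys. It is not an exhaustive validator, and `find_template_problems` is an illustrative name.

```python
def find_template_problems(template: dict) -> list[str]:
    """Return human-readable problems found in an experiment template dict."""
    problems = []
    actions = template.get("actions", {})
    targets = template.get("targets", {})

    if not template.get("roleArn"):
        problems.append("roleArn is missing")
    if not template.get("stopConditions"):
        problems.append("stopConditions is missing or empty")

    for name, action in actions.items():
        # Every target referenced by an action must be defined in 'targets'.
        for target_name in action.get("targets", {}).values():
            if target_name not in targets:
                problems.append(f"action '{name}' references unknown target '{target_name}'")
        # Every startAfter entry must name another action in the template.
        for dependency in action.get("startAfter", []):
            if dependency not in actions:
                problems.append(f"action '{name}' starts after unknown action '{dependency}'")
    return problems
```

An empty list means none of these particular checks failed; it does not guarantee that the service will accept the template.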
Warning
Chaos experiments can disrupt production workloads. Always test in a staging environment and monitor your application during the injection.