Experiment 1 Chaos Engineering on ASG
In this chapter, we’ll design and execute our first AWS Fault Injection Simulator (FIS) experiment against an Auto Scaling Group (ASG). The goal is to validate that terminating a single EC2 instance does not degrade application availability because the ASG will replace it automatically.
Components of the FIS Experiment
Component | Description |
---|---|
Given | We have an application running on EC2 instances spread across multiple Availability Zones, all managed by an Auto Scaling Group. |
Hypothesis | If we terminate one EC2 instance, the Auto Scaling Group will launch a new instance, and the application will continue serving traffic without interruption. |
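The first half of the hypothesis (the ASG replaces the instance) can be turned into a small, testable check. The sketch below is one possible way to do that, assuming the input dictionary has the shape of a single group entry returned by the EC2 Auto Scaling `describe_auto_scaling_groups` API; the function name `steady_state_ok` is illustrative, not part of any AWS SDK. The second half (traffic keeps flowing) is not covered here and would normally come from load balancer metrics or CloudWatch alarms.

```python
def steady_state_ok(asg: dict) -> bool:
    """Return True when the group has at least DesiredCapacity instances
    that are both InService and Healthy.

    `asg` is assumed to look like one entry of the "AutoScalingGroups"
    list in a describe_auto_scaling_groups response.
    """
    healthy_in_service = [
        inst
        for inst in asg.get("Instances", [])
        if inst.get("LifecycleState") == "InService"
        and inst.get("HealthStatus") == "Healthy"
    ]
    return len(healthy_in_service) >= asg["DesiredCapacity"]


if __name__ == "__main__":
    # Hand-written sample data (not real API output): one instance is
    # being terminated, so the group is temporarily below capacity.
    during_experiment = {
        "DesiredCapacity": 2,
        "Instances": [
            {"InstanceId": "i-aaa", "LifecycleState": "InService", "HealthStatus": "Healthy"},
            {"InstanceId": "i-bbb", "LifecycleState": "Terminating", "HealthStatus": "Unhealthy"},
        ],
    }
    print(steady_state_ok(during_experiment))
```

Running the same check before the experiment and again after the replacement instance has launched gives a simple pass/fail signal for the hypothesis.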
Note
We use AWS Fault Injection Simulator to safely inject failures and test application resilience. Make sure the IAM role that FIS assumes for the experiment has the permissions its actions need (here, terminating EC2 instances).
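As a rough illustration of what those permissions might look like for this experiment, the sketch below builds two IAM policy documents as plain Python dictionaries: a trust policy that lets the FIS service assume the role, and a permissions policy covering the instance termination action and alarm-based stop conditions. The function names are hypothetical, and `"Resource": "*"` is used only to keep the example short; a real policy should be scoped to the resources under test.

```python
import json


def fis_trust_policy() -> dict:
    """Trust policy allowing the FIS service principal to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "fis.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def fis_permissions_policy() -> dict:
    """Minimal permissions assumed for terminating instances and reading
    the CloudWatch alarms used as stop conditions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ec2:DescribeInstances", "ec2:TerminateInstances"],
                "Resource": "*",  # assumption for brevity; narrow this in practice
            },
            {
                "Effect": "Allow",
                "Action": ["cloudwatch:DescribeAlarms"],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(fis_trust_policy(), indent=2))
    print(json.dumps(fis_permissions_policy(), indent=2))
```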
Experiment Steps
- Create FIS Experiment Template
  Define the target resources (the ASG) and select the `aws:ec2:terminate-instances` action.
- Specify Targets and Actions
  - Target: EC2 instances belonging to your ASG
  - Action: Terminate one randomly selected instance
- Set Stop Conditions
  Monitor CloudWatch alarms (e.g., high error rates or latency). If any alarm triggers, FIS will automatically stop the experiment.
- Run and Observe
  Execute the FIS experiment and watch the ASG replace the terminated instance.
- Validate Outcome
  Confirm that the new EC2 instance passes health checks and that no user-facing errors occur.
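The template, target, action, and stop-condition steps above can be sketched in code. The example below is a minimal illustration, not the only way to define the experiment: it assumes instances are selected through the `aws:autoscaling:groupName` tag that EC2 Auto Scaling applies to the instances it launches, and the function names, the target and action keys (`oneAsgInstance`, `terminateInstance`), and the ARNs in the usage section are placeholders. Creating and starting the experiment requires the third-party `boto3` package and AWS credentials, so that part is kept in a separate function that is not called here.

```python
import json
import uuid


def build_experiment_template(asg_name: str, role_arn: str, alarm_arn: str) -> dict:
    """Build keyword arguments for the FIS create_experiment_template call."""
    return {
        "clientToken": str(uuid.uuid4()),
        "description": f"Terminate one instance in ASG {asg_name}",
        "roleArn": role_arn,
        "targets": {
            "oneAsgInstance": {
                "resourceType": "aws:ec2:instance",
                # Assumption: select instances by the tag the ASG adds.
                "resourceTags": {"aws:autoscaling:groupName": asg_name},
                "filters": [{"path": "State.Name", "values": ["running"]}],
                # Pick exactly one matching instance at random.
                "selectionMode": "COUNT(1)",
            }
        },
        "actions": {
            "terminateInstance": {
                "actionId": "aws:ec2:terminate-instances",
                "targets": {"Instances": "oneAsgInstance"},
            }
        },
        # FIS stops the experiment if this alarm goes into ALARM state.
        "stopConditions": [{"source": "aws:cloudwatch:alarm", "value": alarm_arn}],
        "tags": {"Name": "asg-terminate-one-instance"},
    }


def create_and_start(template: dict) -> str:
    """Create the template and start one experiment; returns the experiment ID.

    Needs boto3 and valid AWS credentials, so it is not run in this sketch.
    """
    import boto3  # imported lazily so the builder works without it

    fis = boto3.client("fis")
    created = fis.create_experiment_template(**template)
    template_id = created["experimentTemplate"]["id"]
    started = fis.start_experiment(
        clientToken=str(uuid.uuid4()), experimentTemplateId=template_id
    )
    return started["experiment"]["id"]


if __name__ == "__main__":
    # Placeholder names and ARNs for illustration only.
    template = build_experiment_template(
        asg_name="my-app-asg",
        role_arn="arn:aws:iam::123456789012:role/example-fis-role",
        alarm_arn="arn:aws:cloudwatch:us-east-1:123456789012:alarm:example-5xx",
    )
    print(json.dumps(template, indent=2))
```

Keeping the template as plain data makes it easy to review or diff before anything is sent to AWS.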
Warning
Always run chaos experiments in a staging or non-production environment first. Verify that your CloudWatch alarms and Auto Scaling health checks are correctly configured to avoid unintended downtime.
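One way to act on this warning is a small pre-flight check that runs before the experiment starts. The sketch below is an assumption about what such a check could cover, not an official AWS procedure: it takes dictionaries shaped like one entry from `describe_auto_scaling_groups` and one entry from the CloudWatch `describe_alarms` `MetricAlarms` list, and returns a list of problems. The function name `preflight_problems` and the specific thresholds are illustrative.

```python
from typing import List


def preflight_problems(asg: dict, alarm: dict) -> List[str]:
    """Return human-readable reasons not to start the experiment yet.

    An empty list means none of the checks below found a problem.
    """
    problems = []

    # Terminating one instance is only safe if others can take the load.
    if asg.get("DesiredCapacity", 0) < 2:
        problems.append("desired capacity is below 2")

    # The scenario assumes instances spread across multiple AZs.
    if len(set(asg.get("AvailabilityZones", []))) < 2:
        problems.append("ASG spans fewer than two Availability Zones")

    # A stop-condition alarm that is already firing, or has no data,
    # cannot reliably guard the experiment.
    state = alarm.get("StateValue")
    if state != "OK":
        problems.append(f"stop-condition alarm is in state {state!r}, expected 'OK'")

    return problems


if __name__ == "__main__":
    # Hand-written sample data for illustration.
    sample_asg = {"DesiredCapacity": 3, "AvailabilityZones": ["us-east-1a", "us-east-1b"]}
    sample_alarm = {"AlarmName": "example-5xx", "StateValue": "INSUFFICIENT_DATA"}
    for problem in preflight_problems(sample_asg, sample_alarm):
        print("BLOCKED:", problem)
```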