Experiment 1: Chaos Engineering on ASG

In this chapter, we’ll design and execute our first AWS Fault Injection Simulator (FIS) experiment against an Auto Scaling Group (ASG). The goal is to validate that terminating a single EC2 instance does not degrade application availability because the ASG will replace it automatically.

Components of the FIS Experiment

Component	Description
Given	We have an application running on EC2 instances spread across multiple Availability Zones, all managed by an Auto Scaling Group.
Hypothesis	If we terminate one EC2 instance, the Auto Scaling Group will launch a new instance, and the application will continue serving traffic without interruption.
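The capacity half of the hypothesis can be expressed as a small check. This is a minimal sketch, assuming the input dict has the shape of one entry from the Auto Scaling DescribeAutoScalingGroups response (fields DesiredCapacity and Instances with LifecycleState and HealthStatus). The function names and the sample data are hypothetical, and the check does not cover the "serving traffic without interruption" half, which needs application-level metrics.

```python
def healthy_in_service(asg):
    """Count instances the ASG reports as both InService and Healthy."""
    return sum(
        1
        for inst in asg.get("Instances", [])
        if inst.get("LifecycleState") == "InService"
        and inst.get("HealthStatus") == "Healthy"
    )


def capacity_restored(asg):
    """True when healthy, in-service instances meet the desired capacity."""
    return healthy_in_service(asg) >= asg["DesiredCapacity"]


if __name__ == "__main__":
    # Hypothetical snapshot taken shortly after one instance was terminated:
    # the replacement is still launching, so capacity is not yet restored.
    during = {
        "DesiredCapacity": 2,
        "Instances": [
            {"LifecycleState": "InService", "HealthStatus": "Healthy"},
            {"LifecycleState": "Pending", "HealthStatus": "Healthy"},
        ],
    }
    print(capacity_restored(during))
```

In practice the same check would be polled before the experiment (to confirm steady state) and again afterwards until it returns True or a timeout expires.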

Note

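We use AWS Fault Injection Simulator to safely inject failures and test application resilience. Make sure the IAM role that FIS assumes for the experiment has permission to perform the actions in the template (here, terminating EC2 instances), and that your own IAM identity is allowed to create and start FIS experiments.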
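As a rough illustration of the permissions involved, the sketch below builds the two policy documents an experiment role typically needs: a trust policy that lets the FIS service (fis.amazonaws.com) assume the role, and a permissions policy for the terminate action. The exact permission set shown is an assumption for this experiment only, and the function names, region, and account ID are hypothetical; check the AWS FIS documentation for the actions you actually use.

```python
import json


def fis_trust_policy():
    """Trust policy allowing the FIS service to assume the experiment role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "fis.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def terminate_permissions_policy(region, account_id):
    """Assumed minimal permissions for aws:ec2:terminate-instances."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # DescribeInstances does not support resource-level restrictions.
            {"Effect": "Allow", "Action": "ec2:DescribeInstances", "Resource": "*"},
            {
                "Effect": "Allow",
                "Action": "ec2:TerminateInstances",
                "Resource": f"arn:aws:ec2:{region}:{account_id}:instance/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical region and account ID, for illustration only.
    print(json.dumps(fis_trust_policy(), indent=2))
    print(json.dumps(terminate_permissions_policy("us-east-1", "111122223333"), indent=2))
```

Scoping ec2:TerminateInstances further (for example, with a tag condition) would narrow the blast radius if the role were ever misused.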

Experiment Steps

1. Create FIS Experiment Template
   Define the target resources (the EC2 instances that belong to the ASG) and select the aws:ec2:terminate-instances action.
2. Specify Targets and Actions
   - Target: EC2 instances belonging to your ASG
   - Action: Terminate one randomly selected instance
3. Set Stop Conditions
   Attach CloudWatch alarms (e.g., on high error rates or latency) as stop conditions. If any of these alarms enters the ALARM state, FIS automatically stops the experiment.
4. Run and Observe
   Execute the FIS experiment and watch the ASG replace the terminated instance.
5. Validate Outcome
   Confirm that the new EC2 instance passes health checks and that no user-facing errors occur.
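The template described in the steps above can be sketched as a plain Python dict in the shape accepted by the FIS CreateExperimentTemplate API. This is an illustration under stated assumptions, not a definitive template: it assumes instances are selected through the aws:autoscaling:groupName tag that Auto Scaling applies to the instances it launches, and the function name, target and action keys, ASG name, and ARNs are all hypothetical.

```python
import json
import uuid


def build_terminate_one_template(asg_name, role_arn, alarm_arn):
    """Return keyword arguments for an FIS experiment template that
    terminates one randomly selected running instance in the given ASG."""
    return {
        "clientToken": str(uuid.uuid4()),
        "description": f"Terminate one instance in ASG {asg_name}",
        "roleArn": role_arn,
        "targets": {
            "oneAsgInstance": {
                "resourceType": "aws:ec2:instance",
                # Assumption: select instances by the tag Auto Scaling adds.
                "resourceTags": {"aws:autoscaling:groupName": asg_name},
                "filters": [{"path": "State.Name", "values": ["running"]}],
                # COUNT(1) picks a single instance at random from the matches.
                "selectionMode": "COUNT(1)",
            }
        },
        "actions": {
            "terminateInstance": {
                "actionId": "aws:ec2:terminate-instances",
                "targets": {"Instances": "oneAsgInstance"},
            }
        },
        # FIS stops the experiment if this alarm goes into ALARM.
        "stopConditions": [
            {"source": "aws:cloudwatch:alarm", "value": alarm_arn}
        ],
    }


if __name__ == "__main__":
    # All names and ARNs below are placeholders.
    template = build_terminate_one_template(
        asg_name="demo-asg",
        role_arn="arn:aws:iam::111122223333:role/demo-fis-role",
        alarm_arn="arn:aws:cloudwatch:us-east-1:111122223333:alarm:demo-5xx",
    )
    print(json.dumps(template, indent=2))
```

With boto3 installed and credentials configured, the dict could be passed as `boto3.client("fis").create_experiment_template(**template)`, and the returned template ID handed to `start_experiment` to run it.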

Warning

Always run chaos experiments in a staging or non-production environment first. Verify that your CloudWatch alarms and Auto Scaling health checks are correctly configured to avoid unintended downtime.
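A pre-flight check along these lines can catch an obviously unsafe setup before the experiment starts. This is a sketch only: it assumes the inputs have the shape of entries from the Auto Scaling DescribeAutoScalingGroups and CloudWatch DescribeAlarms responses, and the function name and the specific thresholds (at least two Availability Zones, a desired capacity of at least two) are assumptions chosen for this experiment rather than AWS requirements.

```python
def preflight_issues(asg, alarms):
    """Return a list of human-readable problems; an empty list means go."""
    issues = []

    # The experiment assumes instances are spread across multiple AZs.
    if len(asg.get("AvailabilityZones", [])) < 2:
        issues.append("ASG spans fewer than two Availability Zones")

    # With a single instance, terminating it takes the application down.
    if asg.get("DesiredCapacity", 0) < 2:
        issues.append("ASG desired capacity is below 2")

    # Stop conditions only help if alarms exist and start in the OK state.
    if not alarms:
        issues.append("no CloudWatch alarms configured as stop conditions")
    for alarm in alarms:
        if alarm.get("StateValue") != "OK":
            issues.append(
                f"alarm {alarm.get('AlarmName', '<unnamed>')} is in state "
                f"{alarm.get('StateValue')}"
            )
    return issues


if __name__ == "__main__":
    # Hypothetical descriptions of a staging ASG and its alarm.
    asg = {"AvailabilityZones": ["us-east-1a", "us-east-1b"], "DesiredCapacity": 2}
    alarms = [{"AlarmName": "demo-5xx", "StateValue": "INSUFFICIENT_DATA"}]
    for issue in preflight_issues(asg, alarms):
        print("BLOCKED:", issue)
```

Treating INSUFFICIENT_DATA as a blocker is a deliberately conservative choice: an alarm with no data cannot be relied on to stop the experiment.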
