Table of Contents
- Core Components
- Prerequisites
- Step 1: Define the Action
- Step 2: Specify the Target
- Step 3: Add a Stop Condition
- Step 4: Launch the Experiment
- Cleanup
- Links and References
Core Components
| Component | Description | Example |
|---|---|---|
| Action | The fault you inject into the system. | Terminating an EC2 instance |
| Target | The resource(s) on which the action runs. | EC2 instances identified by tag Service=api |
| Stop Condition | A criterion, such as a CloudWatch alarm, that stops the experiment when met. | CPU utilization > 80% for 5 minutes |
Stop conditions are optional but strongly recommended. They halt an experiment once the system crosses a threshold you define, which keeps an injected fault from running unchecked.
Prerequisites
- AWS CLI v2 installed and configured
- IAM role with permissions:
  - fis:CreateExperimentTemplate
  - fis:StartExperiment
  - ec2:TerminateInstances
  - cloudwatch:DescribeAlarms
- EC2 instances tagged with FaultInject=true
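The permissions listed above could be gathered into a single IAM policy document. The following is a minimal sketch in Python that prints such a policy as JSON. The wildcard `Resource` is an assumption made for brevity and would normally be narrowed to specific ARNs.

```python
import json

# Permissions taken from the prerequisites list.
ACTIONS = [
    "fis:CreateExperimentTemplate",
    "fis:StartExperiment",
    "ec2:TerminateInstances",
    "cloudwatch:DescribeAlarms",
]

# Assumption: "Resource": "*" keeps the sketch short; scope it down for real use.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ACTIONS, "Resource": "*"},
    ],
}

print(json.dumps(policy, indent=2))
```

Depending on how your account is set up, the role may need further permissions beyond these four; treat the list as a starting point.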
Step 1: Define the Action
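As a reference point for this step, here is a minimal sketch of an action entry, written in Python and printed as JSON. `aws:ec2:terminate-instances` is the FIS action ID for terminating instances. The keys `terminateInstances` and `faultInjectInstances` are hypothetical names chosen for illustration.

```python
import json

# Hypothetical names: "terminateInstances" (action key) and
# "faultInjectInstances" (the target key this action points at).
actions = {
    "terminateInstances": {
        "actionId": "aws:ec2:terminate-instances",
        "description": "Terminate the tagged EC2 instances",
        # Maps the action's "Instances" slot to a named entry
        # in the template's "targets" section.
        "targets": {"Instances": "faultInjectInstances"},
    }
}

print(json.dumps({"actions": actions}, indent=2))
```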
We’ll terminate the EC2 instances tagged for the experiment. The action goes in the `actions` section of your experiment template.
Step 2: Specify the Target
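A target entry for tag-based selection might look like the sketch below, again as a Python dictionary printed as JSON. The target key `faultInjectInstances` and the ARN are placeholders, and `selectionMode` is set to `ALL` as an assumption. A second variant shows selection by ARN instead of by tag.

```python
import json

# Tag-based selection, using the FaultInject=true tag from the prerequisites.
targets_by_tag = {
    "faultInjectInstances": {  # hypothetical target key
        "resourceType": "aws:ec2:instance",
        "resourceTags": {"FaultInject": "true"},
        # Assumption: "ALL" selects every match; "COUNT(1)" or
        # "PERCENT(50)" would narrow the selection.
        "selectionMode": "ALL",
    }
}

# ARN-based selection; the account ID and instance ID are placeholders.
targets_by_arn = {
    "faultInjectInstances": {
        "resourceType": "aws:ec2:instance",
        "resourceArns": [
            "arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0"
        ],
        "selectionMode": "ALL",
    }
}

print(json.dumps({"targets": targets_by_tag}, indent=2))
print(json.dumps({"targets": targets_by_arn}, indent=2))
```

A single target uses either tags or ARNs, not both, which is why the two variants are kept separate.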
Identify EC2 instances by tag. You can also use resource IDs or ARNs.
Step 3: Add a Stop Condition
Define a CloudWatch alarm that stops the experiment when CPU usage exceeds 80% for 5 minutes, and reference its ARN in the template.
Ensure your CloudWatch alarm ARN is correct. A misconfigured stop condition may not trigger, leaving the experiment running longer than intended.
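The alarm and the stop condition described above can be sketched as follows. The alarm name, account ID, region, and instance ID are placeholders. The ARN check is only a syntactic guard added for illustration; it cannot confirm that the alarm actually exists.

```python
import json
import re
import shlex

ALARM_NAME = "fis-cpu-high"  # hypothetical alarm name
ALARM_ARN = "arn:aws:cloudwatch:us-east-1:123456789012:alarm:fis-cpu-high"  # placeholder

# CLI arguments for an alarm on average CPU > 80% over one 5-minute period.
put_alarm_cmd = [
    "aws", "cloudwatch", "put-metric-alarm",
    "--alarm-name", ALARM_NAME,
    "--namespace", "AWS/EC2",
    "--metric-name", "CPUUtilization",
    "--dimensions", "Name=InstanceId,Value=i-0123456789abcdef0",  # placeholder
    "--statistic", "Average",
    "--period", "300",
    "--evaluation-periods", "1",
    "--threshold", "80",
    "--comparison-operator", "GreaterThanThreshold",
]

ARN_PATTERN = re.compile(r"^arn:aws[a-z-]*:cloudwatch:[a-z0-9-]+:\d{12}:alarm:.+$")

def stop_condition(alarm_arn):
    """Build a stop-condition entry, rejecting strings that are not alarm ARNs."""
    if not ARN_PATTERN.match(alarm_arn):
        raise ValueError(f"not a CloudWatch alarm ARN: {alarm_arn!r}")
    return {"source": "aws:cloudwatch:alarm", "value": alarm_arn}

print(shlex.join(put_alarm_cmd))
print(json.dumps({"stopConditions": [stop_condition(ALARM_ARN)]}, indent=2))
```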
Step 4: Launch the Experiment
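One way to assemble the full template and the two CLI calls for this step is sketched below. The account ID, region, role name, alarm name, and the target and action keys are all placeholders, and `EXT_PLACEHOLDER_ID` stands in for the template ID that `create-experiment-template` returns. The script only prints the JSON and the commands; it does not call AWS.

```python
import json
import shlex

template = {
    "description": "Terminate EC2 instances tagged FaultInject=true",
    "targets": {
        "faultInjectInstances": {  # hypothetical target key
            "resourceType": "aws:ec2:instance",
            "resourceTags": {"FaultInject": "true"},
            "selectionMode": "ALL",  # assumption; COUNT(n) or PERCENT(n) also work
        }
    },
    "actions": {
        "terminateInstances": {  # hypothetical action key
            "actionId": "aws:ec2:terminate-instances",
            "targets": {"Instances": "faultInjectInstances"},
        }
    },
    "stopConditions": [
        {
            "source": "aws:cloudwatch:alarm",
            "value": "arn:aws:cloudwatch:us-east-1:123456789012:alarm:fis-cpu-high",
        }
    ],
    "roleArn": "arn:aws:iam::123456789012:role/fis-experiment-role",
}

def check_template(t):
    """Verify that every action points at a target defined in the template."""
    for name, action in t["actions"].items():
        for target_name in action.get("targets", {}).values():
            if target_name not in t["targets"]:
                raise ValueError(f"action {name!r} uses unknown target {target_name!r}")
    return True

check_template(template)
print(json.dumps(template, indent=2))  # save this output as template.json

create_cmd = ["aws", "fis", "create-experiment-template",
              "--cli-input-json", "file://template.json"]
start_cmd = ["aws", "fis", "start-experiment",
             "--experiment-template-id", "EXT_PLACEHOLDER_ID"]
print(shlex.join(create_cmd))
print(shlex.join(start_cmd))
```

The consistency check is a small convenience rather than part of FIS itself; it catches a mistyped target key before the template is submitted.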
Combine the components into a single template and create it with the AWS CLI.
Cleanup
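A sketch of the cleanup call is shown below, wrapped in a small Python helper that defaults to a dry run. The helper name `delete_template` and the ID `EXT_PLACEHOLDER_ID` are hypothetical; substitute the ID of the template you created.

```python
import shlex
import subprocess

def delete_template(template_id, dry_run=True):
    """Return the delete command; run it only when dry_run is False."""
    if not template_id:
        raise ValueError("template_id must not be empty")
    cmd = ["aws", "fis", "delete-experiment-template", "--id", template_id]
    if not dry_run:
        # Requires the AWS CLI on PATH and valid credentials.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)

print(delete_template("EXT_PLACEHOLDER_ID"))
```

Keeping the default as a dry run means the command can be reviewed before anything is deleted.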
After testing, delete your experiment template to avoid orphaned resources.
Links and References
- AWS Fault Injection Simulator Developer Guide
- AWS CLI Command Reference: FIS
- Creating CloudWatch Alarms