FIS Experiments in this Course
In this module, you’ll run a suite of fault injection simulations using AWS Fault Injection Simulator (FIS) to validate and harden your application’s resilience. Each experiment targets a different AWS service or scenario, helping you understand how your infrastructure responds under stress.
Experiment Matrix
| Experiment | Description | Target Services |
|---|---|---|
| EC2 Instance Termination | Terminate an EC2 instance managed by an Auto Scaling group | EC2, Auto Scaling |
| EC2 Disk-Fill Simulation | Fill the disk on an EC2 instance that runs as an EKS worker node behind a load balancer | EC2, EKS, ELB |
| Aurora Reader Node Reboot | Reboot a reader node in an Amazon Aurora cluster | Amazon Aurora |
| High I/O on ECS Fargate | Generate heavy I/O load on an ECS Fargate task | Amazon ECS (Fargate) |
| High I/O on EKS Node | Stress the I/O on an EKS worker node | Amazon EKS |
| Pod Deletion on EKS | Delete a running pod in an EKS cluster | Amazon EKS |
| Availability Zone Outage | Simulate an AZ-wide power interruption | EC2, Auto Scaling, Networking |
1. EC2 Instance Termination
Terminate a specific EC2 instance in an Auto Scaling group to verify that new instances launch automatically and service availability is preserved.
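aws fis create-experiment-template \
--description "Terminate one EC2 in ASG" \
--targets '{"AsgInstance":{"resourceType":"aws:ec2:instance","resourceTags":{"aws:autoscaling:groupName":"<asg-name>"},"selectionMode":"COUNT(1)"}}' \
--actions '{"terminate-instance":{"actionId":"aws:ec2:terminate-instances","targets":{"Instances":"AsgInstance"}}}' \
--stop-conditions source=none \
--role-arn <fis-role-arn>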
Note
Make sure your Auto Scaling group's minimum and desired capacity are greater than zero. The Auto Scaling group, not FIS, launches the replacement: it does so to restore its desired capacity after FIS terminates the instance.
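As a minimal sketch of what a complete template for this experiment might contain, the Python below builds the JSON document that `aws fis create-experiment-template --cli-input-json file://template.json` accepts. The function name, the group name `web-asg`, and the role ARN are placeholders chosen for illustration; the tag filter assumes the instances carry the `aws:autoscaling:groupName` tag that Auto Scaling applies.

```python
import json


def build_terminate_template(asg_name: str, role_arn: str) -> dict:
    """Return an FIS experiment template that terminates one instance of an ASG."""
    return {
        "description": "Terminate one EC2 in ASG",
        "targets": {
            "AsgInstance": {
                "resourceType": "aws:ec2:instance",
                # Assumption: instances carry the tag Auto Scaling adds to them.
                "resourceTags": {"aws:autoscaling:groupName": asg_name},
                # Pick a single instance rather than the whole group.
                "selectionMode": "COUNT(1)",
            }
        },
        "actions": {
            "terminate-instance": {
                "actionId": "aws:ec2:terminate-instances",
                # Maps the action's "Instances" slot to the target named above.
                "targets": {"Instances": "AsgInstance"},
            }
        },
        "stopConditions": [{"source": "none"}],
        "roleArn": role_arn,
    }


if __name__ == "__main__":
    # Placeholder values; replace with your own group name and IAM role.
    template = build_terminate_template(
        "web-asg", "arn:aws:iam::123456789012:role/fis-experiment-role"
    )
    print(json.dumps(template, indent=2))
```

Redirecting the output to `template.json` keeps the template under version control and avoids quoting long JSON strings on the command line.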
2. EC2 Disk-Fill Simulation
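Fill the root volume of an EC2 instance that runs as an EKS worker node behind a load balancer. This test reveals how your EKS pods behave under disk pressure. The command below writes about 10 GB of zeros to /mnt/fillfile, so it fills the root volume only if /mnt is not a separate mount.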
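# YAML rendering of the template's targets and actions (the FIS API itself takes JSON).
targets:
  DiskFillerInstance:
    resourceType: aws:ec2:instance
    resourceTags:
      <tag-key>: <tag-value>
    selectionMode: ALL
actions:
  fill-disk:
    actionId: aws:ssm:send-command
    parameters:
      # aws:ssm:send-command takes a document ARN plus JSON-encoded document parameters.
      documentArn: arn:aws:ssm:<region>::document/AWS-RunShellScript
      documentParameters: '{"commands":["dd if=/dev/zero of=/mnt/fillfile bs=1M count=10000"]}'
      duration: PT5M
    targets:
      Instances: DiskFillerInstance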
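A fixed `count=10000` may overfill a small volume or barely dent a large one. As a rough sketch, the helper below computes how many 1 MiB blocks `dd` would need to write to reach a chosen usage percentage. The function name and the 90% target are illustrative assumptions, and the example assumes /mnt lives on the root filesystem.

```python
import shutil

MIB = 1024 * 1024


def blocks_to_reach(total_bytes: int, used_bytes: int, target_percent: float) -> int:
    """Number of 1 MiB blocks to write so that usage reaches target_percent of total."""
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in (0, 100]")
    target_bytes = int(total_bytes * target_percent / 100)
    # Already above the target: nothing to write.
    missing = max(0, target_bytes - used_bytes)
    return missing // MIB


if __name__ == "__main__":
    # Assumption: /mnt is on the root filesystem, so "/" reports the right volume.
    usage = shutil.disk_usage("/")
    count = blocks_to_reach(usage.total, usage.used, 90)
    print(f"dd if=/dev/zero of=/mnt/fillfile bs=1M count={count}")
```

Stopping short of 100% leaves room for the SSM agent and the kubelet to keep reporting while disk pressure builds.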
3. Aurora Reader Node Reboot
Reboot a reader node in your Amazon Aurora cluster to measure how quickly read capacity recovers and how the application handles transient database unavailability. Rebooting a reader does not by itself trigger a failover of the writer.
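aws fis create-experiment-template \
--description "Reboot Aurora Reader Node" \
--targets '{"reader-node":{"resourceType":"aws:rds:db","resourceArns":["<reader-instance-arn>"],"selectionMode":"ALL"}}' \
--actions '{"reboot-db-instance":{"actionId":"aws:rds:reboot-db-instances","targets":{"DBInstances":"reader-node"}}}' \
--stop-conditions source=none \
--role-arn <fis-role-arn>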
Warning
Avoid running this experiment during peak traffic unless you have a multi-AZ Aurora cluster and automated failover enabled.
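To put a number on "transient database unavailability", one option is to poll the reader endpoint while the experiment runs and record each window in which the probe fails. The sketch below is one way to do that with the standard library only; `measure_outages` and `tcp_probe` are hypothetical helper names, and a TCP connect is only a coarse stand-in for a real query.

```python
import socket
import time
from typing import Callable, List, Optional, Tuple


def tcp_probe(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def measure_outages(
    probe: Callable[[], bool],
    duration_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[float, float]]:
    """Poll probe() and return (start, end) offsets of each window where it failed."""
    start = clock()
    outages: List[Tuple[float, float]] = []
    down_since: Optional[float] = None
    while True:
        now = clock() - start
        if now >= duration_s:
            break
        try:
            healthy = probe()
        except Exception:
            healthy = False  # treat probe errors as downtime
        if not healthy and down_since is None:
            down_since = now  # outage begins
        elif healthy and down_since is not None:
            outages.append((down_since, now))  # outage ends
            down_since = None
        sleep(interval_s)
    if down_since is not None:
        outages.append((down_since, duration_s))  # still down when polling stopped
    return outages
```

For example, `measure_outages(lambda: tcp_probe("<reader-endpoint>", 3306), duration_s=300)` would poll for five minutes; the endpoint placeholder and the MySQL port are assumptions about your cluster. The resolution of the result is limited by `interval_s`.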
4. High I/O on ECS Fargate
Generate sustained I/O stress on a Fargate task to observe CPU throttling, task restarts, and overall service degradation.
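# aws:ssm:send-command targets EC2 instances, not ECS tasks, so this uses the ECS I/O stress action.
# The task definition needs the SSM agent as a sidecar container for this action to work.
targets:
  FargateTask:
    resourceType: aws:ecs:task
    resourceArns:
      - <task-arn>
    selectionMode: ALL
actions:
  high-io-stress:
    actionId: aws:ecs:task-io-stress
    parameters:
      duration: PT5M
      workers: "4"
    targets:
      Tasks: FargateTask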
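FIS action parameters express durations in ISO 8601 form (for example `PT5M`), while stress-ng takes values such as `300s`. The small sketch below converts between the two so both settings can be derived from one number. The function names are illustrative, and the parser handles only the `s`, `m`, and `h` suffixes.

```python
def parse_stress_ng_timeout(text: str) -> int:
    """Convert a stress-ng style timeout ('300s', '5m', '1h', or bare seconds) to seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    text = text.strip()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)  # no suffix: plain seconds


def to_fis_duration(seconds: int) -> str:
    """Format whole seconds as an ISO 8601 duration such as PT5M or PT1H30M."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    text = "PT"
    if hours:
        text += f"{hours}H"
    if minutes:
        text += f"{minutes}M"
    if secs:
        text += f"{secs}S"
    return text


if __name__ == "__main__":
    print(to_fis_duration(parse_stress_ng_timeout("300s")))
```

Keeping the experiment duration and the stress tool's timeout in step avoids a run where FIS reports the action as finished while the load is still active, or the reverse.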
5. High I/O on EKS Node
Apply heavy I/O load on an EKS worker node to test pod eviction, node replacement, and auto scaling behaviors.
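# Assumes stress-ng is already installed on the worker node.
targets:
  EksNode:
    resourceType: aws:ec2:instance
    resourceTags:
      <tag-key>: <tag-value>
    selectionMode: COUNT(1)
actions:
  node-io-stress:
    actionId: aws:ssm:send-command
    parameters:
      documentArn: arn:aws:ssm:<region>::document/AWS-RunShellScript
      documentParameters: '{"commands":["stress-ng --io 8 --timeout 300s"]}'
      duration: PT5M
    targets:
      Instances: EksNode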
6. Pod Deletion on EKS
Select and delete a live pod in your EKS cluster to validate Kubernetes self-healing. If the pod is managed by a controller such as a Deployment, a replacement pod should be created automatically to maintain the desired state.
kubectl delete pod <pod-name> -n <namespace>
Note
Use labels or selectors to target a specific pod group. For example:
kubectl delete pod -l app=web -n production
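To confirm that the desired state was restored, you can compare the output of `kubectl get pods -l app=web -n production -o json` captured before and after the deletion. The sketch below parses those two snapshots and reports which pods became Ready in between. The function names and the file-based workflow are assumptions made for illustration.

```python
import json
import sys


def ready_pod_names(pod_list: dict) -> set:
    """Names of pods whose Ready condition is True in a `kubectl get pods -o json` result."""
    names = set()
    for pod in pod_list.get("items", []):
        conditions = pod.get("status", {}).get("conditions", [])
        if any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions):
            names.add(pod["metadata"]["name"])
    return names


def replacements(before: dict, after: dict) -> set:
    """Pods that are Ready after the experiment but were not Ready before it."""
    return ready_pod_names(after) - ready_pod_names(before)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_pods.py before.json after.json
    with open(sys.argv[1]) as f:
        before = json.load(f)
    with open(sys.argv[2]) as f:
        after = json.load(f)
    print("Ready before:", sorted(ready_pod_names(before)))
    print("Ready after: ", sorted(ready_pod_names(after)))
    print("Replacements:", sorted(replacements(before, after)))
```

If the Ready count after the experiment matches the count before it and the replacement set is non-empty, the controller has recreated the deleted pod.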
7. Availability Zone Outage
Simulate a full Availability Zone power interruption to verify that your workloads fail over to other AZs without data loss or unacceptable downtime.
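A full AZ power-interruption scenario combines several FIS actions. The command below is a partial sketch that covers only network disruption for the subnets in one AZ:
aws fis create-experiment-template \
--description "AZ Outage Simulation" \
--targets '{"az-target":{"resourceType":"aws:ec2:subnet","parameters":{"availabilityZoneIdentifier":"<az-name>","vpc":"<vpc-id>"},"selectionMode":"ALL"}}' \
--actions '{"disrupt-az-network":{"actionId":"aws:network:disrupt-connectivity","parameters":{"duration":"PT10M","scope":"all"},"targets":{"Subnets":"az-target"}}}' \
--stop-conditions source=none \
--role-arn <fis-role-arn>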
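Before running an AZ-level experiment, it helps to estimate how much capacity would survive the loss of each zone. The sketch below does that arithmetic from a mapping of instance ID to Availability Zone. The function names and the sample fleet are hypothetical, and the mapping would have to be collected separately, for example from `aws ec2 describe-instances`.

```python
from collections import Counter
from typing import Dict


def surviving_fraction(instance_az: Dict[str, str], failed_az: str) -> float:
    """Fraction of instances left running if every instance in failed_az is lost."""
    if not instance_az:
        raise ValueError("no instances given")
    per_az = Counter(instance_az.values())
    return 1 - per_az.get(failed_az, 0) / len(instance_az)


def worst_case_az(instance_az: Dict[str, str]) -> str:
    """The Availability Zone whose loss removes the most instances."""
    if not instance_az:
        raise ValueError("no instances given")
    return Counter(instance_az.values()).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical fleet: instance ID -> Availability Zone.
    fleet = {
        "i-1": "us-east-1a",
        "i-2": "us-east-1a",
        "i-3": "us-east-1b",
        "i-4": "us-east-1c",
    }
    az = worst_case_az(fleet)
    print(f"Losing {az} leaves {surviving_fraction(fleet, az):.0%} of capacity")
```

If the surviving fraction for the worst-case zone is below what your peak load needs, the experiment is likely to show downtime until Auto Scaling adds capacity elsewhere.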