FIS Experiments in this Course

In this module, you’ll run a suite of fault injection simulations using AWS Fault Injection Simulator (FIS) to validate and harden your application’s resilience. Each experiment targets a different AWS service or scenario, helping you understand how your infrastructure responds under stress.

Experiment Matrix

| Experiment | Description | Target Services |
| --- | --- | --- |
| EC2 Instance Termination | Terminate an EC2 instance managed by an Auto Scaling group | EC2, Auto Scaling |
| EC2 Disk-Fill Simulation | Simulate disk-fill on an EC2 instance fronted by a load balancer in EKS | EC2, EKS, ELB |
| Aurora Reader Node Reboot | Reboot a reader node in an Amazon Aurora cluster | Amazon Aurora |
| High I/O on ECS Fargate | Generate heavy I/O load on an ECS Fargate task | Amazon ECS (Fargate) |
| High I/O on EKS Node | Stress the I/O on an EKS worker node | Amazon EKS |
| Pod Deletion on EKS | Delete a running pod in an EKS cluster | Amazon EKS |
| Availability Zone Outage | Simulate an AZ-wide power interruption | EC2, Auto Scaling, Networking |

1. EC2 Instance Termination

Terminate a specific EC2 instance in an Auto Scaling group to verify that new instances launch automatically and service availability is preserved.

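aws fis create-experiment-template \
  --description "Terminate one EC2 in ASG" \
  --targets '{"AsgInstance":{"resourceType":"aws:ec2:instance","resourceTags":{"<tag-key>":"<tag-value>"},"selectionMode":"COUNT(1)"}}' \
  --actions '{"terminate-instance":{"actionId":"aws:ec2:terminate-instances","targets":{"Instances":"AsgInstance"}}}' \
  --stop-conditions source=none \
  --role-arn "<fis-role-arn>"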

Note

Make sure the Auto Scaling group's desired capacity covers the instance you terminate. The replacement is launched by Auto Scaling to restore the desired capacity; FIS only terminates the instance and does not replace it.
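The same template can also be supplied as a single JSON document through the CLI's `--cli-input-json` option. The following is a minimal Python sketch of what that document might look like for this experiment. The function name `build_terminate_template`, the tag used to select the instance, and the placeholder role ARN are assumptions for illustration, not values from this course.

```python
import json


def build_terminate_template(tag_key: str, tag_value: str, role_arn: str) -> dict:
    """Return an FIS experiment template that terminates one tagged instance."""
    return {
        "description": "Terminate one EC2 in ASG",
        "targets": {
            # Select exactly one instance carrying the given (hypothetical) tag.
            "AsgInstance": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {tag_key: tag_value},
                "selectionMode": "COUNT(1)",
            }
        },
        "actions": {
            "terminate-instance": {
                "actionId": "aws:ec2:terminate-instances",
                # The value must match a key under "targets" above.
                "targets": {"Instances": "AsgInstance"},
            }
        },
        # "none" means no CloudWatch alarm will stop the experiment early.
        "stopConditions": [{"source": "none"}],
        "roleArn": role_arn,
    }


if __name__ == "__main__":
    # Placeholder values; replace them with your own group name and role.
    template = build_terminate_template(
        "aws:autoscaling:groupName", "<asg-name>", "<fis-role-arn>"
    )
    print(json.dumps(template, indent=2))
```

One way to use it is to save the printed JSON to a file and pass it as `aws fis create-experiment-template --cli-input-json file://template.json`.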


2. EC2 Disk-Fill Simulation

Fill the root volume of an EC2 instance that runs as an EKS worker node behind a load balancer. This test reveals how your EKS pods behave under disk pressure, for example whether the kubelet reports the DiskPressure condition and evicts pods.

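targets:
  DiskFillerInstance:
    resourceType: aws:ec2:instance
    resourceTags:
      <tag-key>: <tag-value>
    selectionMode: COUNT(1)   # limit the blast radius to one instance
actions:
  fill-disk:
    actionId: aws:ssm:send-command
    parameters:
      documentArn: arn:aws:ssm:<region>::document/AWS-RunShellScript
      # /mnt must live on the root volume for this command to fill it
      documentParameters: '{"commands":["dd if=/dev/zero of=/mnt/fillfile bs=1M count=10000"]}'
      duration: PT5M
    targets:
      Instances: DiskFillerInstance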
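A fixed `count=10000` writes about 10 GB regardless of the volume's size, so it may overshoot a small disk or barely register on a large one. As a rough sketch, the block count can instead be derived from a target fill percentage. The helper name `blocks_to_reach`, the 90% target, and the choice of `/` as the measured filesystem are assumptions for illustration.

```python
import os
import shutil

MIB = 1024 * 1024


def blocks_to_reach(total_bytes: int, used_bytes: int, target_percent: float,
                    block_bytes: int = MIB) -> int:
    """Number of whole blocks to write so that usage reaches target_percent."""
    target_bytes = int(total_bytes * target_percent / 100)
    missing = max(0, target_bytes - used_bytes)  # 0 if already past the target
    return missing // block_bytes


if __name__ == "__main__":
    target_dir = "/"  # assumed: the fill file is written to this filesystem
    usage = shutil.disk_usage(target_dir)  # named tuple: total, used, free
    count = blocks_to_reach(usage.total, usage.used, 90)
    fill_file = os.path.join(target_dir, "fillfile")
    print(f"dd if=/dev/zero of={fill_file} bs=1M count={count}")
```

The script only prints the `dd` command; it does not write any data itself.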

3. Aurora Reader Node Reboot

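Reboot a reader node in your Amazon Aurora cluster to measure how quickly read traffic recovers and how the application handles transient database unavailability. Rebooting a reader does not fail over the writer; connections to that reader drop and clients must reconnect or shift to another reader.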

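aws fis create-experiment-template \
  --description "Reboot Aurora Reader Node" \
  --targets '{"reader-node":{"resourceType":"aws:rds:db","resourceArns":["<reader-instance-arn>"],"selectionMode":"ALL"}}' \
  --actions '{"reboot-db-instance":{"actionId":"aws:rds:reboot-db-instances","targets":{"DBInstances":"reader-node"}}}' \
  --stop-conditions source=none \
  --role-arn "<fis-role-arn>"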

Warning

Avoid running this experiment during peak traffic unless you have a multi-AZ Aurora cluster and automated failover enabled.
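To put a number on the recovery time, one option is to probe the reader endpoint at a fixed interval during the experiment and compute the longest run of failures afterwards. The sketch below assumes you already have a list of `(timestamp_seconds, succeeded)` samples from a probe of your own (for example, a `SELECT 1` issued once per second); the probe itself and the function name `longest_outage` are hypothetical.

```python
from typing import Iterable, Tuple


def longest_outage(samples: Iterable[Tuple[float, bool]]) -> float:
    """Longest gap, in seconds, from the first failed probe of an outage
    to the next successful probe. An outage that never recovers within
    the samples is not counted."""
    longest = 0.0
    outage_start = None
    for timestamp, ok in samples:
        if not ok and outage_start is None:
            outage_start = timestamp  # outage begins
        elif ok and outage_start is not None:
            longest = max(longest, timestamp - outage_start)  # outage ends
            outage_start = None
    return longest


if __name__ == "__main__":
    # Hand-written sample data, not a real measurement.
    probes = [(0, True), (1, False), (2, False), (4, True), (5, True)]
    print(f"longest outage: {longest_outage(probes)} s")
```

The resolution of the result is limited by the probe interval, so a one-second probe cannot distinguish a 0.2 s blip from a 0.9 s one.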


4. High I/O on ECS Fargate

Generate sustained I/O stress on a Fargate task to observe CPU throttling, task restarts, and overall service degradation.

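# aws:ssm:send-command targets EC2 instances, so it cannot reach a Fargate task.
# aws:ecs:task-io-stress needs the SSM agent running as a sidecar container in the task.
targets:
  FargateTask:
    resourceType: aws:ecs:task
    resourceTags:
      <tag-key>: <tag-value>
    selectionMode: COUNT(1)
actions:
  high-io-stress:
    actionId: aws:ecs:task-io-stress
    parameters:
      duration: PT5M
      workers: "4"
    targets:
      Tasks: FargateTask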

5. High I/O on EKS Node

Apply heavy I/O load on an EKS worker node to test pod eviction, node replacement, and auto scaling behaviors.

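targets:
  EksNode:
    resourceType: aws:ec2:instance
    resourceTags:
      <tag-key>: <tag-value>
    selectionMode: COUNT(1)
actions:
  node-io-stress:
    actionId: aws:ssm:send-command
    parameters:
      documentArn: arn:aws:ssm:<region>::document/AWS-RunShellScript
      # stress-ng must already be installed on the node
      documentParameters: '{"commands":["stress-ng --io 8 --timeout 300s"]}'
      duration: PT5M
    targets:
      Instances: EksNode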
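The `aws:ssm:send-command` action takes the SSM document's parameters as a JSON-encoded string (`documentParameters`) rather than as a nested object, which is easy to get wrong when quoting by hand. A small Python sketch of building that action entry follows. The function name `ssm_shell_action` and the region and target name in the example are assumptions for illustration.

```python
import json
from typing import List


def ssm_shell_action(commands: List[str], target_name: str,
                     duration: str, region: str) -> dict:
    """Build one FIS action entry that runs shell commands through SSM."""
    return {
        "actionId": "aws:ssm:send-command",
        "parameters": {
            "documentArn": f"arn:aws:ssm:{region}::document/AWS-RunShellScript",
            # json.dumps produces the string-encoded JSON this field expects.
            "documentParameters": json.dumps({"commands": commands}),
            "duration": duration,  # ISO 8601 duration, e.g. PT5M
        },
        # Must match the name of a target defined elsewhere in the template.
        "targets": {"Instances": target_name},
    }


if __name__ == "__main__":
    action = ssm_shell_action(
        ["stress-ng --io 8 --timeout 300s"], "EksNode", "PT5M", "us-east-1"
    )
    print(json.dumps({"node-io-stress": action}, indent=2))
```

Letting `json.dumps` handle the inner document avoids hand-escaping quotes inside the command strings.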

6. Pod Deletion on EKS

Delete a running pod in your EKS cluster to validate Kubernetes self-healing: the pod's controller (for example, a Deployment) should create a replacement automatically to restore the desired state.

kubectl delete pod <pod-name> -n <namespace>

Note

Use labels or selectors to target a specific pod group. For example:
kubectl delete pod -l app=web -n production
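To confirm that the workload has healed after the deletion, one approach is to compare the Deployment's desired and ready replica counts. The sketch below assumes the pods belong to a Deployment and that you feed it the parsed output of `kubectl get deployment <name> -n <namespace> -o json`; the function name `deployment_recovered` is hypothetical.

```python
def deployment_recovered(deployment: dict) -> bool:
    """True when the ready replica count has reached the desired count."""
    desired = deployment.get("spec", {}).get("replicas", 1)
    # Kubernetes omits status.readyReplicas when no replica is ready.
    ready = deployment.get("status", {}).get("readyReplicas", 0)
    return ready >= desired


if __name__ == "__main__":
    # Hand-written sample shaped like `kubectl get deployment ... -o json`.
    sample = {"spec": {"replicas": 3}, "status": {"readyReplicas": 2}}
    print(deployment_recovered(sample))
```

Calling this in a loop right after the `kubectl delete` and timing how long it takes to return `True` gives a simple recovery-time measurement.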


7. Availability Zone Outage

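Simulate the loss of an entire Availability Zone to verify that your workloads fail over to other AZs without data loss or downtime. FIS has no single action that switches an AZ off. The template below approximates the outage by cutting network connectivity for every subnet in one AZ; a fuller power-interruption test would also stop the instances running in that AZ.

aws fis create-experiment-template \
  --description "AZ Outage Simulation" \
  --targets '{"az-target":{"resourceType":"aws:ec2:subnet","parameters":{"availabilityZoneIdentifier":"<az-name>"},"selectionMode":"ALL"}}' \
  --actions '{"disrupt-az-network":{"actionId":"aws:network:disrupt-connectivity","parameters":{"scope":"all","duration":"PT10M"},"targets":{"Subnets":"az-target"}}}' \
  --stop-conditions source=none \
  --role-arn "<fis-role-arn>"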
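Before running an AZ-level experiment, it can help to check on paper whether the remaining zones hold enough capacity. The following sketch assumes you have a count of healthy instances per AZ and know the minimum number your service needs; the function name `survives_az_loss` and the sample numbers are illustrative only.

```python
from typing import Dict


def survives_az_loss(healthy_by_az: Dict[str, int], required: int) -> bool:
    """True if losing any single AZ still leaves at least `required` instances."""
    if not healthy_by_az:
        return False  # nothing is running, so nothing can survive
    total = sum(healthy_by_az.values())
    # Check the worst case by removing each AZ's instances in turn.
    return all(total - count >= required for count in healthy_by_az.values())


if __name__ == "__main__":
    # Hypothetical fleet: two instances in each of three AZs.
    fleet = {"us-east-1a": 2, "us-east-1b": 2, "us-east-1c": 2}
    print(survives_az_loss(fleet, required=4))
```

This checks static capacity only; it says nothing about how quickly Auto Scaling or the load balancer reacts, which is what the experiment itself measures.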
