- 44% of organizations report that 1 hour of downtime costs between $1 million and $5 million.
- In 2021, Facebook incurred $80 million+ in losses from seven hours of downtime.
- A recent “blue screen of death” outage impacted airlines, banks, healthcare providers, and countless other businesses worldwide.

Course Outline
We’ll cover seven high-level modules, each focusing on different AWS services and fault types:
- Module 1: Basic FIS Experiments
  Configure IAM, create experiment templates, execute tests, and monitor results with dashboards.
- Module 2: Sample Application & Steady-State Metrics
  Deploy a reference application and define baseline performance metrics.
- Module 3: Disk Fill Scenario on EC2
  Simulate disk saturation on EC2 instances and analyze its impact on application behavior.
- Module 4: Aurora Reader Reboot
  Inject a reboot fault into an Aurora reader node and observe recovery processes.
- Module 5: Fargate Load Stress Test
  Apply CPU and memory stress to a serverless Fargate task and evaluate performance under high load.
- Module 6: EKS Memory Stress & Pod Deletion
  Perform memory saturation tests and pod-deletion experiments in your EKS cluster to validate self-healing.
- Module 7: Availability Zone Power Interruption
  Simulate a power outage in an entire availability zone to assess multi-AZ resilience.
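To give a feel for what an experiment template contains, here is a minimal sketch of one for the Aurora reader reboot in Module 4, built as a plain Python dictionary. It does not call AWS. The function name, target and action names, tag, role ARN, and alarm ARN are all hypothetical placeholders, and the field names and the `aws:rds:reboot-db-instances` action ID should be checked against the FIS documentation before use.

```python
import json


def build_reader_reboot_template(role_arn: str, alarm_arn: str) -> dict:
    """Return a dict shaped like an FIS experiment template (illustrative only)."""
    return {
        "description": "Reboot one Aurora reader instance",
        # IAM role that FIS assumes to run the experiment (placeholder value).
        "roleArn": role_arn,
        "targets": {
            # Hypothetical target name; selects one DB instance by tag.
            "readerInstance": {
                "resourceType": "aws:rds:db",
                "resourceTags": {"ChaosTarget": "aurora-reader"},
                "selectionMode": "COUNT(1)",
            }
        },
        "actions": {
            # Hypothetical action name; points at the target defined above.
            "rebootReader": {
                "actionId": "aws:rds:reboot-db-instances",
                "targets": {"DBInstances": "readerInstance"},
            }
        },
        # Halt the experiment if this CloudWatch alarm fires.
        "stopConditions": [
            {"source": "aws:cloudwatch:alarm", "value": alarm_arn}
        ],
    }


template = build_reader_reboot_template(
    role_arn="arn:aws:iam::123456789012:role/example-fis-role",
    alarm_arn="arn:aws:cloudwatch:us-east-1:123456789012:alarm:example-alarm",
)
print(json.dumps(template, indent=2))
```

A template like this pairs every fault with a stop condition, so the experiment can be halted automatically if the application leaves its expected operating range.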
Conclusion
By the end of this lesson, you’ll have a solid understanding of how to:
- Design robust failure scenarios for cloud applications.
- Execute controlled experiments safely.
- Analyze results to strengthen your system’s resilience.
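As a rough illustration of the "analyze results" step, the sketch below compares metrics observed during an experiment against steady-state limits. The metric names, limits, and the `Threshold` and `violations` helpers are hypothetical and chosen only for this example; in practice the values would come from your monitoring system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Upper bound a metric must stay under to count as steady state."""
    name: str
    max_value: float


def violations(observed: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return names of metrics that are missing or exceed their limit."""
    return [
        t.name
        for t in thresholds
        # A missing metric is treated as a violation (infinity exceeds any limit).
        if observed.get(t.name, float("inf")) > t.max_value
    ]


# Hypothetical steady-state limits and one set of observations.
baseline = [
    Threshold("p99_latency_ms", 250.0),
    Threshold("error_rate_pct", 1.0),
]
during_experiment = {"p99_latency_ms": 180.0, "error_rate_pct": 2.5}

print(violations(during_experiment, baseline))
```

Treating a missing metric as a violation is a deliberate choice here: if telemetry disappears mid-experiment, it is safer to assume the system is unhealthy than to assume it is fine.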
Further Reading
- AWS Fault Injection Simulator Documentation
- Amazon CloudWatch
- AWS X-Ray
- Amazon EC2
- Amazon Aurora
- AWS Fargate
- Amazon EKS