Chaos Engineering on Availability Zone
Experiment Overview
In this experiment, we use AWS Fault Injection Simulator (FIS) to simulate a power interruption in one Availability Zone (AZ) and validate the resilience of our multi-AZ architecture. During the simulated outage, we expect EC2 instances and containers in that zone to fail, while the load balancers and Auto Scaling groups keep serving traffic from the healthy AZs.
Goals
- Validate that the pet site remains accessible during an AZ-wide outage
- Measure performance degradation and failover behavior under controlled conditions
Experiment Components
| Component | Description |
|---|---|
| Given | The pet site is deployed across multiple Availability Zones. |
| Hypothesis | A single AZ power failure should not render the pet site unavailable; minor latency spikes are acceptable. |
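One way to make this hypothesis checkable is to probe the site throughout the experiment and compare the results against explicit thresholds. The following is a minimal sketch in Python; the 99% success ratio, the 2-second latency ceiling, and the names `Sample` and `hypothesis_holds` are assumptions chosen for illustration, not values defined by this lesson.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One probe of the pet site: HTTP status and response time in seconds."""
    status: int
    latency_s: float


def hypothesis_holds(samples: List[Sample],
                     min_success_ratio: float = 0.99,      # assumed threshold
                     max_latency_s: float = 2.0) -> bool:  # assumed threshold
    """Return True if the site stayed available with tolerable latency."""
    if not samples:
        return False  # no evidence means the hypothesis is not confirmed
    # Treat 2xx and 3xx responses as successful requests.
    ok = [s for s in samples if 200 <= s.status < 400]
    success_ratio = len(ok) / len(samples)
    # Only successful requests count toward the latency check.
    worst_latency = max((s.latency_s for s in ok), default=float("inf"))
    return success_ratio >= min_success_ratio and worst_latency <= max_latency_s
```

Collecting the samples themselves (for example, with a load generator or synthetic monitoring) is left out on purpose, so the check stays independent of how the probes are made.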
Limiting the Blast Radius
To ensure a focused test, only resources tagged with the following key-value pair will be targeted by FIS:
Key: AZ impairment power
Value: ready
Note
Only resources tagged AZ impairment power: ready are affected by the experiment, keeping the rest of your environment safe.
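To illustrate how tag-based scoping might appear in an experiment template, the sketch below builds a template body as a plain Python dictionary. It is a simplified assumption, not the full AZ power interruption scenario: it contains a single stop-instances action, and the AZ name, role ARN, ten-minute duration, and the names `build_template` and `TaggedInstances` are placeholders. The field names follow the FIS experiment template format as we understand it; verify them against the current FIS documentation before use.

```python
import json

TAG_KEY = "AZ impairment power"  # tag key as given in this lesson
TAG_VALUE = "ready"              # tag value as given in this lesson


def build_template(az: str, role_arn: str) -> dict:
    """Build a template body that targets only tagged instances in one AZ."""
    return {
        "description": f"Simulated power interruption in {az}",
        "roleArn": role_arn,
        # No automatic stop condition in this sketch; real runs usually
        # reference a CloudWatch alarm here.
        "stopConditions": [{"source": "none"}],
        "targets": {
            "TaggedInstances": {
                "resourceType": "aws:ec2:instance",
                # Blast radius: only instances carrying this tag qualify.
                "resourceTags": {TAG_KEY: TAG_VALUE},
                # Narrow further to the AZ under test.
                "filters": [
                    {"path": "Placement.AvailabilityZone", "values": [az]},
                ],
                "selectionMode": "ALL",
            }
        },
        "actions": {
            "StopInstances": {
                "actionId": "aws:ec2:stop-instances",
                # Placeholder duration: restart instances after 10 minutes.
                "parameters": {"startInstancesAfterDuration": "PT10M"},
                "targets": {"Instances": "TaggedInstances"},
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical AZ and role ARN, for illustration only.
    body = build_template("us-east-1a", "arn:aws:iam::123456789012:role/fis-role")
    print(json.dumps(body, indent=2))
```

Because both the tag and the AZ filter must match, an untagged instance in the target AZ and a tagged instance in another AZ are both left alone.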
Expected Behavior
- FIS triggers a simulated power loss in the targeted AZ.
- EC2 instances and containers in that AZ go offline.
- Application Load Balancers route traffic to the remaining AZs, where Auto Scaling groups keep capacity available to absorb it.
- The pet site remains reachable, with a possible latency increase during failover.
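The latency increase during failover can be reasoned about with simple arithmetic: when one AZ drops out, its share of traffic is spread across the others. The sketch below assumes traffic is evenly distributed across AZs and that no new capacity has been launched yet; `load_after_az_loss` is a hypothetical helper, not part of any AWS tooling.

```python
def load_after_az_loss(az_count: int, utilization: float) -> float:
    """Per-AZ utilization after one of `az_count` equally loaded AZs fails.

    `utilization` is the pre-failure utilization of each AZ as a fraction
    (0.5 means 50%). Assumes even traffic distribution and no scaling yet.
    """
    if az_count < 2:
        raise ValueError("need at least two AZs to survive losing one")
    # The same total load is now shared by one fewer AZ.
    return utilization * az_count / (az_count - 1)
```

Under these assumptions, three AZs running at 50% utilization each would rise to 75% after losing one, while two AZs at 60% would need 120% of the surviving AZ's capacity until Auto Scaling adds instances.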
Warning
Always run FIS experiments in a non-production or staging environment first. Improper scoping can lead to real service disruptions.