Chaos Engineering on Availability Zone

Experiment Overview

In this experiment, we use AWS Fault Injection Service (FIS, formerly Fault Injection Simulator) to simulate an Availability Zone (AZ) power interruption and validate the resilience of our multi-AZ architecture. FIS does not cut power to a real AZ; it stops or disrupts the targeted resources in one AZ to mimic that failure. We expect EC2 instances and containers in that zone to fail, while load balancers route requests to the healthy AZs and Auto Scaling groups keep enough capacity running there to serve traffic.

Goals

  • Validate that the pet site remains accessible during an AZ-wide outage
  • Measure performance degradation and failover behavior under controlled conditions

Experiment Components

Component    Description
Given        The pet site is deployed across multiple Availability Zones.
Hypothesis   A single AZ power failure should not render the pet site unavailable; minor latency spikes are acceptable.
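A hypothesis like this is easier to judge when it is written as a pass/fail check. The sketch below is one possible way to do that, not part of the lesson: it assumes you have collected probe results (HTTP status and latency) before and during the experiment. The `ProbeResult` type, the function name, and both thresholds (99% availability, p95 latency at most 2x the baseline) are illustrative assumptions to adjust for your own service.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class ProbeResult:
    status_code: int    # HTTP status returned by the pet site
    latency_ms: float   # request round-trip time in milliseconds


def p95(samples):
    """95th-percentile latency of a list of probe results."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return quantiles([s.latency_ms for s in samples], n=20)[-1]


def hypothesis_holds(baseline, during, min_availability=0.99, max_p95_ratio=2.0):
    """Return True if the site stayed available and latency stayed within bounds.

    Both thresholds are assumed values, not figures from the lesson.
    """
    if len(baseline) < 2 or len(during) < 2:
        raise ValueError("need at least two samples in each window")

    # Treat 2xx and 3xx responses as "the site is reachable".
    ok = sum(1 for r in during if 200 <= r.status_code < 400)
    availability = ok / len(during)

    latency_ratio = p95(during) / p95(baseline)
    return availability >= min_availability and latency_ratio <= max_p95_ratio
```

Comparing against a baseline window rather than a fixed latency number keeps the check meaningful even when normal latency differs between environments.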

Limiting the Blast Radius

To ensure a focused test, only resources tagged with the following key-value pair will be targeted by FIS:

Key:   AZ impairment power
Value: ready

Note

Only resources tagged AZ impairment power: ready are targeted by the experiment. Resources without this tag are not selected, which keeps the rest of your environment out of scope.
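As a rough illustration of tag-based scoping, the sketch below builds a target definition in the shape used by FIS experiment templates (`resourceType`, `resourceTags`, `filters`, `selectionMode`) and adds a local helper that checks whether a resource carries the opt-in tag. The tag key and value are copied from this section as written; confirm the exact spelling against your own resources. The function names and the AZ name `us-east-1a` are placeholders, and the snippet only builds a dictionary without calling AWS.

```python
import json

# Tag from this section; verify the exact key and value used in your account.
TAG_KEY = "AZ impairment power"
TAG_VALUE = "ready"


def build_ec2_target(az_name):
    """Build a target entry for tagged EC2 instances in a single AZ."""
    return {
        "resourceType": "aws:ec2:instance",
        # Only instances carrying the opt-in tag are eligible.
        "resourceTags": {TAG_KEY: TAG_VALUE},
        # Narrow the selection further to the AZ being impaired.
        "filters": [
            {"path": "Placement.AvailabilityZone", "values": [az_name]},
        ],
        "selectionMode": "ALL",
    }


def in_blast_radius(resource_tags):
    """Local pre-check: does this resource carry the opt-in tag?"""
    return resource_tags.get(TAG_KEY) == TAG_VALUE


if __name__ == "__main__":
    # "us-east-1a" is a placeholder AZ name.
    print(json.dumps(build_ec2_target("us-east-1a"), indent=2))
```

Running a pre-check like `in_blast_radius` over an inventory export before starting the experiment is one way to confirm that only the intended resources are tagged.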

Figure: "Experiment Overview" diagram of the architecture inside an AWS Virtual Private Cloud (VPC), showing the AZ Power Interruption scenario along with the APIs, databases, and failover mechanisms involved.

Expected Behavior

  1. FIS triggers a simulated power loss in the targeted AZ.
  2. EC2 instances and containers in that AZ go offline.
  3. Application Load Balancers stop routing to the failed targets and send traffic to healthy targets in the remaining AZs, while Auto Scaling groups maintain capacity there.
  4. The pet site remains reachable, with a possible latency increase during failover.
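The sequence above can be sketched as a toy simulation. This is not how a real load balancer is implemented; it only illustrates the idea that once targets in the impaired AZ are removed, requests are spread across the surviving AZs. The instance IDs, AZ names, and round-robin policy are all illustrative assumptions.

```python
from itertools import cycle


def healthy_targets(targets, failed_az):
    """Drop every target that sits in the impaired AZ."""
    return [t for t in targets if t["az"] != failed_az]


def route_requests(targets, request_count):
    """Round-robin requests over the given targets and count hits per AZ."""
    if not targets:
        raise RuntimeError("no healthy targets: the site would be unavailable")
    hits = {}
    rotation = cycle(targets)
    for _ in range(request_count):
        az = next(rotation)["az"]
        hits[az] = hits.get(az, 0) + 1
    return hits


# Hypothetical fleet spread over three AZs.
fleet = [
    {"id": "i-a1", "az": "us-east-1a"},
    {"id": "i-a2", "az": "us-east-1a"},
    {"id": "i-b1", "az": "us-east-1b"},
    {"id": "i-c1", "az": "us-east-1c"},
]

# Simulate the power interruption in us-east-1a.
survivors = healthy_targets(fleet, failed_az="us-east-1a")
print(route_requests(survivors, 100))  # {'us-east-1b': 50, 'us-east-1c': 50}
```

In this example the two surviving instances each take half of the traffic that four instances handled before, which is one reason a latency increase during failover is plausible until replacement capacity comes online.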

Warning

Always run FIS experiments in a non-production or staging environment first. Improper scoping can lead to real service disruptions.
