Experiment Overview
We’ll design a chaos experiment with two key components:

| Component | Description |
|---|---|
| Observed Architecture | The current state of the pet adoption web application built with microservices on EKS. |
| Hypothesis | Kubernetes will detect and recreate deleted pods across multiple Availability Zones (AZs). |

Prerequisites
- An existing Amazon EKS cluster with the FIS action role attached.
- AWS CLI configured with permissions for FIS and EKS.
- kubectl access to the EKS cluster.
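Ensure the IAM identity that starts the experiment has the fis:StartExperiment permission. This permission is checked against the caller of the AWS FIS API rather than the EKS node IAM role, and without it the experiment cannot be started.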
Hypothesis
If one or more product-details pods are terminated:
- Kubernetes Detects Failure: The control plane marks the pods as unavailable.
- Self-Healing: The ReplicaSet controller launches new pods.
- User Impact: Due to the low cold start time (~5s), end users experience no downtime.
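One way to make this hypothesis checkable is to sample the number of ready replicas during the experiment and evaluate the samples afterwards. The sketch below is a minimal illustration, not part of the original setup: the Deployment name product-details, the default namespace, and the helper names ready_replicas and evaluate are all assumptions.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Verdict:
    no_downtime: bool                  # ready count never dropped to zero
    recovered: bool                    # last sample is back at the desired count
    recovery_seconds: Optional[float]  # first dip -> first full recovery


def ready_replicas(deployment: str = "product-details",
                   namespace: str = "default") -> int:
    """Read .status.readyReplicas via kubectl (names are assumed)."""
    out = subprocess.run(
        ["kubectl", "get", "deployment", deployment, "-n", namespace,
         "-o", "jsonpath={.status.readyReplicas}"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    # Kubernetes omits the field when no replica is ready, so "" means 0.
    return int(out or 0)


def evaluate(samples: List[Tuple[float, int]], desired: int) -> Verdict:
    """samples: (seconds since experiment start, ready replica count)."""
    dip_start: Optional[float] = None
    recovery: Optional[float] = None
    no_downtime = True
    for t, ready in samples:
        if ready == 0:
            no_downtime = False
        if ready < desired and dip_start is None:
            dip_start = t  # first moment capacity was reduced
        if ready >= desired and dip_start is not None and recovery is None:
            recovery = t - dip_start  # time to regain full capacity
    recovered = bool(samples) and samples[-1][1] >= desired
    return Verdict(no_downtime, recovered, recovery)
```

In use, ready_replicas would be called every second or two while the experiment runs, and the collected samples passed to evaluate. A no_downtime value of True combined with a recovery_seconds close to the ~5s cold start would support the hypothesis. Ready replica counts are only a proxy for user impact, so application health checks are still needed.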
Next Steps
- Create an AWS FIS experiment template targeting the EKS pod delete action.
- Execute the experiment and monitor pod restart behavior.
- Validate application responsiveness via health checks and user interface tests.
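
As a rough sketch of the first two steps, the experiment template can be built as a plain dictionary and handed to the AWS FIS API. The action ID aws:eks:pod-delete and the target resource type aws:eks:pod come from AWS FIS; everything else here is an assumption made for illustration, including the label selector app=product-details, the service account name fis-sa, the function names, and the absence of stop conditions.

```python
def build_pod_delete_template(cluster_identifier: str,
                              role_arn: str,
                              namespace: str = "default",
                              label_selector: str = "app=product-details",
                              service_account: str = "fis-sa",
                              selection_mode: str = "COUNT(1)") -> dict:
    """Build an FIS experiment template that deletes matching pods.

    The default selector, namespace, and service account are placeholders.
    """
    return {
        "description": "Delete product-details pods to test self-healing",
        "targets": {
            "productDetailsPods": {
                "resourceType": "aws:eks:pod",
                "selectionMode": selection_mode,  # e.g. ALL, COUNT(n), PERCENT(n)
                "parameters": {
                    "clusterIdentifier": cluster_identifier,
                    "namespace": namespace,
                    "selectorType": "labelSelector",
                    "selectorValue": label_selector,
                },
            }
        },
        "actions": {
            "deletePods": {
                "actionId": "aws:eks:pod-delete",
                # Kubernetes service account FIS uses inside the cluster.
                "parameters": {"kubernetesServiceAccount": service_account},
                "targets": {"Pods": "productDetailsPods"},
            }
        },
        # No alarm-based stop condition in this sketch; add one for real runs.
        "stopConditions": [{"source": "none"}],
        "roleArn": role_arn,
    }


def create_and_start(template: dict) -> str:
    """Create the template and start one experiment; returns the experiment ID."""
    import boto3  # third-party; imported lazily so the builder works without it

    fis = boto3.client("fis")
    created = fis.create_experiment_template(**template)
    template_id = created["experimentTemplate"]["id"]
    started = fis.start_experiment(experimentTemplateId=template_id)
    return started["experiment"]["id"]
```

Keeping the builder separate from the API call makes it possible to review or version the template before anything is injected into the cluster. Starting with COUNT(1) limits the blast radius to a single pod; the selection mode can be widened once the single-pod run behaves as expected.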