Pod Delete on EKS
In this tutorial, you’ll learn how to use AWS Fault Injection Simulator (FIS) to simulate a pod deletion on an Amazon EKS cluster. Simulating pod terminations helps you validate your application’s resilience and confirm that Kubernetes automatically recreates pods without impacting end users.
Experiment Overview
We’ll design a chaos experiment with two key components:
Component | Description |
---|---|
Observed Architecture | The current state of the pet adoption web application built with microservices on EKS. |
Hypothesis | Kubernetes will detect and recreate deleted pods across multiple Availability Zones (AZs). |
Our microservices-based pet adoption application includes a product-details service whose pods are deployed across multiple AZs. The remaining replicas keep serving traffic while replacements start, and a new pod has a cold start time of approximately five seconds, so we anticipate no noticeable disruption for customers.
Prerequisites
- An existing Amazon EKS cluster with the FIS action role attached.
- AWS CLI configured with permissions for FIS and EKS.
- kubectl access to the EKS cluster.
Important
Ensure the IAM identity that starts the experiment has the fis:StartExperiment permission. Without it, AWS FIS will reject the request to start the experiment.
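As a rough illustration of the permission named above, the following sketch builds a minimal IAM policy document that allows `fis:StartExperiment`. The function name `start_experiment_policy` is hypothetical, and the wildcard `Resource` is an assumption made only to keep the example short; in practice you would likely scope it to specific experiment template ARNs.

```python
import json


def start_experiment_policy(resource="*"):
    """Return an IAM policy document that allows starting FIS experiments.

    Assumption: `resource` defaults to "*" for brevity. Narrow it to the
    experiment template ARNs you actually use before applying it.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["fis:StartExperiment"],
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so it can be reviewed or saved to a file.
    print(json.dumps(start_experiment_policy(), indent=2))
```

The printed JSON can be reviewed and attached to the relevant IAM identity through whichever tooling you already use to manage IAM policies.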
Hypothesis
If one or more product-details pods are terminated:
- Kubernetes Detects Failure: The control plane marks the pods as unavailable.
- Self-Healing: The ReplicaSet controller launches new pods.
- User Impact: Due to the low cold start time (~5s), end users experience no downtime.
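To make the "no downtime" part of the hypothesis measurable, you could record health check results during the experiment and summarize them afterwards. The sketch below is one possible way to do that. The `Probe` type, the function names, and the default error-rate threshold of zero are all assumptions for illustration; how you collect the probes (load balancer health checks, a synthetic client, and so on) is up to you.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    t: float   # seconds since the experiment started
    ok: bool   # True if the health check succeeded


def summarize(probes):
    """Summarize probe results: counts, error rate, and longest outage.

    The longest outage is measured from the first failed probe of a run to
    the next successful probe (or to the last failed probe if the run never
    recovers within the recorded data).
    """
    if not probes:
        raise ValueError("no probes recorded; the hypothesis cannot be checked")
    failed = 0
    longest = 0.0
    start = None       # time of the first failure in the current run
    last_fail = None   # time of the most recent failure
    for p in sorted(probes, key=lambda p: p.t):
        if not p.ok:
            failed += 1
            if start is None:
                start = p.t
            last_fail = p.t
        elif start is not None:
            longest = max(longest, p.t - start)
            start = None
    if start is not None:
        longest = max(longest, last_fail - start)
    return {
        "total": len(probes),
        "failed": failed,
        "error_rate": failed / len(probes),
        "longest_outage_s": longest,
    }


def hypothesis_holds(probes, max_error_rate=0.0):
    """Return True if the observed error rate is within the allowed budget."""
    return summarize(probes)["error_rate"] <= max_error_rate
```

With the default threshold, a single failed probe rejects the hypothesis. Raising `max_error_rate` lets you tolerate brief blips if your service-level objectives allow them.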
Next Steps
- Create an AWS FIS experiment template targeting the EKS pod delete action.
- Execute the experiment and monitor pod restart behavior.
- Validate application responsiveness via health checks and user interface tests.
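The first step above can be sketched as a small helper that assembles an experiment template for the `aws:eks:pod-delete` action. This is a minimal sketch, not a definitive template: the field names reflect the AWS FIS template schema as commonly documented and should be checked against the current AWS FIS documentation. The label selector `app=product-details`, the service account name `fis-service-account`, and the placeholder cluster name and role ARN are hypothetical values that you would replace with your own.

```python
import json


def build_pod_delete_template(cluster_identifier, role_arn,
                              namespace="default",
                              label_selector="app=product-details",
                              service_account="fis-service-account",
                              pod_count=1):
    """Build an FIS experiment template dict for a pod delete experiment.

    Assumptions (all hypothetical): the pods carry the label
    `app=product-details`, and a Kubernetes service account named
    `fis-service-account` exists with permission to delete pods.
    """
    return {
        "description": "Delete product-details pods to test self-healing",
        "roleArn": role_arn,
        # No stop condition in this sketch; consider a CloudWatch alarm.
        "stopConditions": [{"source": "none"}],
        "targets": {
            "productDetailsPods": {
                "resourceType": "aws:eks:pod",
                "selectionMode": f"COUNT({pod_count})",
                "parameters": {
                    "clusterIdentifier": cluster_identifier,
                    "namespace": namespace,
                    "selectorType": "labelSelector",
                    "selectorValue": label_selector,
                },
            }
        },
        "actions": {
            "deletePods": {
                "actionId": "aws:eks:pod-delete",
                "parameters": {"kubernetesServiceAccount": service_account},
                "targets": {"Pods": "productDetailsPods"},
            }
        },
    }


if __name__ == "__main__":
    template = build_pod_delete_template(
        cluster_identifier="my-eks-cluster",                      # placeholder
        role_arn="arn:aws:iam::123456789012:role/fis-experiment",  # placeholder
    )
    # Write the JSON out for review before creating the template in AWS FIS.
    print(json.dumps(template, indent=2))
```

Keeping the template as plain data makes it easy to review and version before you create it in AWS FIS, for example by saving the JSON to a file and supplying it to the AWS CLI or an SDK call of your choice.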