Demo Steady State Pod Delete on EKS

In this walkthrough, we collect baseline metrics to define the application’s steady state. This is essential before running an AWS Fault Injection Simulator (FIS) experiment to delete an EKS pod.

1. Observe Container Insights Performance

Begin by reviewing your Amazon EKS service metrics with CloudWatch Container Insights. Track these core indicators:

| Metric                 | Description                        |
|------------------------|------------------------------------|
| Running pod count      | Number of pods currently in service |
| Pod CPU utilization    | CPU usage per pod                  |
| Pod memory utilization | Memory usage per pod               |

The image shows an AWS CloudWatch dashboard for Container Insights, displaying performance metrics for a service named "PetSite," including running pods, CPU utilization, and memory utilization.

Note

These charts represent the steady state of the PetSite service under normal conditions.
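To capture the same three indicators outside the console, the queries can be assembled programmatically. The following is a minimal sketch, not the method used in the demo: the cluster, namespace, and service values are placeholders, and the Container Insights metric names and dimensions are assumptions that should be checked against what your cluster actually publishes.

```python
# Sketch: build CloudWatch GetMetricData queries for the three baseline metrics.
# CLUSTER, NAMESPACE, and SERVICE are placeholder values, not taken from the demo.
CLUSTER = "my-eks-cluster"
NAMESPACE = "default"
SERVICE = "petsite"

# Container Insights metric names (assumed; verify against your cluster's metrics).
BASELINE_METRICS = {
    "running_pods": "service_number_of_running_pods",
    "pod_cpu": "pod_cpu_utilization",
    "pod_memory": "pod_memory_utilization",
}


def build_queries(cluster, namespace, service, period_seconds=60):
    """Return a list shaped like the MetricDataQueries argument of GetMetricData."""
    dimensions = [
        {"Name": "ClusterName", "Value": cluster},
        {"Name": "Namespace", "Value": namespace},
        {"Name": "Service", "Value": service},
    ]
    queries = []
    for query_id, metric_name in BASELINE_METRICS.items():
        queries.append({
            "Id": query_id,  # must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": "ContainerInsights",
                    "MetricName": metric_name,
                    "Dimensions": dimensions,
                },
                "Period": period_seconds,
                "Stat": "Average",
            },
        })
    return queries


queries = build_queries(CLUSTER, NAMESPACE, SERVICE)
for q in queries:
    print(q["Id"], "->", q["MetricStat"]["Metric"]["MetricName"])
```

With AWS credentials configured, a list like this could be passed to boto3's `get_metric_data` call as `MetricDataQueries`, together with a `StartTime` and `EndTime` covering the baseline window.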

2. Verify End-User Experience with CloudWatch RUM

Next, validate real user metrics using CloudWatch RUM. This helps you understand page load performance and client-side errors:

The image shows an AWS CloudWatch dashboard displaying performance metrics for a web application, including page loads, load time, and errors. It features various tabs and filters for monitoring application performance.

Then, examine key web vitals like Largest Contentful Paint (LCP) and First Input Delay (FID):

The image shows an AWS CloudWatch dashboard displaying web vitals, including metrics for "Largest Contentful Paint" and "First Input Delay," with graphs indicating performance levels categorized as positive, tolerable, and frustrating.
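The positive, tolerable, and frustrating buckets can be reproduced for raw measurements with a small classifier. This is a sketch under an assumption: the cut-offs below are the commonly published Core Web Vitals thresholds (2.5 s and 4 s for LCP, 100 ms and 300 ms for FID), which may not match the exact boundaries your dashboard uses.

```python
# Assumed thresholds in milliseconds, taken from the published Core Web Vitals
# guidance: (upper bound for "positive", upper bound for "tolerable").
THRESHOLDS_MS = {
    "LCP": (2500, 4000),
    "FID": (100, 300),
}


def classify(vital, value_ms):
    """Map one measurement to 'positive', 'tolerable', or 'frustrating'."""
    positive_max, tolerable_max = THRESHOLDS_MS[vital]
    if value_ms <= positive_max:
        return "positive"
    if value_ms <= tolerable_max:
        return "tolerable"
    return "frustrating"


# Illustrative values only, not measurements from the demo.
for vital, value in [("LCP", 1800), ("LCP", 3000), ("FID", 350)]:
    print(vital, value, classify(vital, value))
```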

3. Simulate Load with k6

Generate realistic user traffic using k6 to ensure the baseline reflects production behavior. For example:

k6 run script.js --vus 4 --duration 1h

During the test, you might see output like:

running (@48m32.0s), 1/4 VUs, 1783 complete and 3 interrupted iterations
browser  X [ 69% ] 4 VUs @48m05.4s/1h4m59s
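If you log this progress output, the iteration counts can be extracted to track how many iterations were interrupted during the baseline run. The sketch below assumes the line format shown above; the regular expression is written for this walkthrough and is not part of k6.

```python
import re

# Matches the VU and iteration counts in a k6 progress line like the one above.
PROGRESS_RE = re.compile(
    r"(?P<active>\d+)/(?P<total>\d+) VUs, "
    r"(?P<complete>\d+) complete and (?P<interrupted>\d+) interrupted iterations"
)


def parse_progress(line):
    """Return the counts as integers, or None if the line does not match."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


line = "running (@48m32.0s), 1/4 VUs, 1783 complete and 3 interrupted iterations"
stats = parse_progress(line)

# Share of iterations that were interrupted so far.
total_iterations = stats["complete"] + stats["interrupted"]
interrupted_share = stats["interrupted"] / total_iterations

print(stats)
print(f"{interrupted_share:.2%} of iterations interrupted")
```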

Steady State Results

At steady state, approximately 1% of virtual users fell into the frustrating category, while the rest saw positive load times.

4. Inspect Distributed Tracing with CloudWatch Trace Map

Capture end-to-end request performance for the petlistadoptions endpoint on EKS Fargate:

The image shows an AWS CloudWatch Trace Map interface displaying metrics for a service called "petlistadoptions" on EKS Fargate, including latency, requests, and fault rates.

Analyze latency, throughput, and error rates to solidify your steady state baseline.
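Once the baseline is recorded, it can be turned into a simple pass/fail check to run during and after the experiment. The sketch below is one possible way to express that check: the `Snapshot` type, the `deviations` helper, the tolerance values, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One observation of the trace metrics (hypothetical structure)."""
    latency_ms: float
    requests_per_min: float
    fault_rate: float  # fraction between 0 and 1


def deviations(baseline, observed, tolerance=0.10, max_fault_increase=0.01):
    """List the metrics that drifted outside the steady-state band.

    Latency may rise, and throughput may fall, by at most `tolerance`
    (relative). The fault rate may rise by at most `max_fault_increase`
    (absolute).
    """
    problems = []
    if observed.latency_ms > baseline.latency_ms * (1 + tolerance):
        problems.append("latency")
    if observed.requests_per_min < baseline.requests_per_min * (1 - tolerance):
        problems.append("throughput")
    if observed.fault_rate > baseline.fault_rate + max_fault_increase:
        problems.append("fault_rate")
    return problems


# Made-up numbers for illustration; substitute your own baseline readings.
baseline = Snapshot(latency_ms=120.0, requests_per_min=300.0, fault_rate=0.0)
during_experiment = Snapshot(latency_ms=180.0, requests_per_min=295.0, fault_rate=0.05)

print(deviations(baseline, baseline))
print(deviations(baseline, during_experiment))
```

An empty list means the observation stayed within the steady-state band; any returned name identifies a metric worth investigating after the pod delete.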

