Demo Recheck After Pod Delete on EKS

In this walkthrough, we’ll verify how Amazon EKS and Kubernetes self-heal after a Pod deletion, and confirm there is no impact on end users by monitoring the application UI, Real User Monitoring (RUM) data, and control plane metrics in CloudWatch.


1. Verify Pod Auto-Replacement

After deleting a Pod, Kubernetes should automatically spin up a new one. Run:

kubectl get pods -n default

Expected output:

NAME                                      READY   STATUS      RESTARTS       AGE
petfood-6b56846cbc-85m66                  1/1     Running     0              23h
petfood-6b56846cbc-x5t16                  1/1     Running     0              23h
petfood-metric-6bd55449d8-6jp97           1/1     Running     0              23h
petfood-metric-6bd55449d8-dbrj8           1/1     Running     0              23h
pethistory-deployment-5f96c67c674-t4mhq   2/2     Running     0              34s
petsite-deployment-6db68bf8-lv6w4         1/1     Running     0              34s
petsite-deployment-6db68bf8-xqs7h         1/1     Running     2 (68m ago)    23h
xray-daemon-jh7ft                         1/1     Running     7 (55m ago)    23h
xray-daemon-rtlhh                         1/1     Running     8 (55m ago)    23h

Note

Kubernetes replaces the deleted petsite-deployment Pod within seconds, demonstrating built-in self-healing.
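You can observe the replacement happen live. A minimal sketch, assuming a petsite Pod name taken from your own cluster and an app=petsite label selector (both the Pod name and the label below are hypothetical):

```shell
# Delete one petsite Pod (substitute a real name from `kubectl get pods`).
kubectl delete pod petsite-deployment-6db68bf8-lv6w4 -n default

# Watch the Deployment's ReplicaSet create a replacement in real time.
# The label selector is an assumption; check your Deployment's labels first.
kubectl get pods -n default -l app=petsite --watch
```

The `--watch` flag streams Pod state changes, so you can see the new Pod move from Pending to Running within seconds of the deletion.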


2. Validate Application Availability & RUM Metrics

Next, confirm the application UI loads without errors and that RUM data in CloudWatch shows no spike in errors or user frustration signals.

The image shows an AWS CloudWatch dashboard for monitoring application performance, displaying metrics like page loads, load time, and errors. It includes navigation tabs and filtering options for detailed analysis.

Metric        Use Case                 Observation
Page Loads    Track user visits        Stable
Load Time     Measure responsiveness   Within SLA
Errors        Detect failures          No spikes detected
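The same check can be scripted. A sketch using the CloudWatch CLI, assuming a RUM app monitor named petsite-monitor (hypothetical) with CloudWatch set as a metric destination; the AWS/RUM namespace and JsErrorCount metric are what RUM publishes in that configuration:

```shell
# Sum of JavaScript errors reported by RUM over the last hour.
# "petsite-monitor" is a placeholder for your app monitor's name.
aws cloudwatch get-metric-statistics \
  --namespace AWS/RUM \
  --metric-name JsErrorCount \
  --dimensions Name=application_name,Value=petsite-monitor \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Sum
```

A result with zero (or absent) datapoints over the experiment window corresponds to the "No spikes detected" observation above.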

A manual refresh confirms that user frustration signals remain at baseline:

The image shows an AWS CloudWatch dashboard displaying metrics for "Largest Contentful Paint" and "First Input Delay," with graphs indicating performance over time. The metrics are categorized into positive, tolerable, and frustrating levels.

Performance Metric         Threshold (ms)   Level
Largest Contentful Paint   < 2500           Positive
First Input Delay          < 100            Tolerable
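The threshold bands can be expressed as a small helper for ad hoc checks. A sketch: the 2500 ms "positive" boundary for LCP comes from the table above, while the 4000 ms upper bound is an assumption based on common Web Vitals defaults:

```shell
# Classify a Largest Contentful Paint sample (in milliseconds) into the
# positive / tolerable / frustrating bands shown on the RUM dashboard.
# 2500 ms comes from the table above; 4000 ms is an assumed upper bound.
classify_lcp() {
  local ms=$1
  if [ "$ms" -lt 2500 ]; then
    echo positive
  elif [ "$ms" -le 4000 ]; then
    echo tolerable
  else
    echo frustrating
  fi
}

classify_lcp 1800   # positive
classify_lcp 3200   # tolerable
classify_lcp 5000   # frustrating
```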

3. Monitor EKS Control Plane Metrics

Finally, review your EKS cluster’s control plane metrics to ensure resource utilization stayed consistent throughout the experiment.

The image shows an AWS CloudWatch dashboard for monitoring EKS services, displaying metrics like pod CPU and memory utilization, and the number of running services.

Metric               Description                              Observation
CPU Utilization      Aggregate Pod CPU usage                  Stable (~30%)
Memory Utilization   Aggregate Pod memory usage               Stable (~40%)
Running Pods         Total active Pods in default namespace   Consistent
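These dashboard panels can also be queried directly. A sketch, assuming Container Insights is enabled on the cluster and the cluster is named eks-demo (hypothetical); the ContainerInsights namespace and pod_cpu_utilization metric are what Container Insights publishes:

```shell
# Average pod CPU utilization across the cluster over the last hour.
# "eks-demo" is a placeholder for your cluster's name.
aws cloudwatch get-metric-statistics \
  --namespace ContainerInsights \
  --metric-name pod_cpu_utilization \
  --dimensions Name=ClusterName,Value=eks-demo \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Average
```

Swapping `pod_cpu_utilization` for `pod_memory_utilization` gives the corresponding memory series; a flat line through the experiment window matches the "Stable" observations above.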

Note

Stable control plane metrics confirm that pod deletion did not adversely affect cluster health.


Conclusion

This demo highlights Kubernetes’ resilience on Amazon EKS:

  • Self-healing: Deleted Pods are recreated almost instantly.
  • Zero user impact: No UI errors or RUM spikes.
  • Stable cluster health: Control plane metrics remain steady.
