Establishing Steady-State Metrics Using CloudWatch RUM and X-Ray

After generating load against both the website and its containerized services, it’s essential to capture your baseline performance before introducing faults. AWS offers a suite of observability tools—CloudWatch RUM, Container Insights, and X-Ray—that together provide a complete view of front-end and back-end health.

1. Real User Monitoring (RUM)

CloudWatch RUM collects key browser metrics from actual users, helping you understand page-load performance and interactivity. We’ve enabled RUM on our homepage; let’s examine the dashboards.

Note

Ensure RUM is configured on all critical user flows to gather representative web vitals.

| Metric | Description |
| --- | --- |
| Largest Contentful Paint | Time to render the largest visible element (LCP) |
| First Input Delay | Delay before the page responds to the first user interaction (FID) |
| Page Load Success Rate | Percentage of page loads without errors |

The image shows an AWS CloudWatch dashboard displaying web vitals, including metrics for "Largest Contentful Paint" and "First Input Delay," with graphs indicating page load performance over time.

The Web Vitals dashboard tracks LCP and FID over time, showing how fast your pages load and respond for real users.
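To make the dashboard readings easier to interpret, the sketch below rates an LCP or FID value against the publicly documented Web Vitals thresholds (LCP: good up to 2.5 s, poor above 4 s; FID: good up to 100 ms, poor above 300 ms). The function name `rate_vital` and the sample values are illustrative assumptions, not part of CloudWatch RUM.

```python
# Published Web Vitals thresholds in milliseconds:
# (upper bound for "good", upper bound for "needs improvement").
THRESHOLDS_MS = {
    "LCP": (2500, 4000),  # Largest Contentful Paint
    "FID": (100, 300),    # First Input Delay
}


def rate_vital(metric: str, value_ms: float) -> str:
    """Classify a single measurement as good, needs improvement, or poor."""
    good_max, needs_improvement_max = THRESHOLDS_MS[metric]
    if value_ms <= good_max:
        return "good"
    if value_ms <= needs_improvement_max:
        return "needs improvement"
    return "poor"


# Hypothetical readings taken from a dashboard.
for metric, value in [("LCP", 1800), ("LCP", 3000), ("FID", 350)]:
    print(metric, value, rate_vital(metric, value))
```

Recording the rating alongside the raw value gives you a quick way to tell whether a fault pushed a metric across a threshold or only moved it within the same band.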

The image shows an AWS CloudWatch dashboard displaying performance metrics for a web application, including page loads, load time, and errors.

The Performance Metrics view details page loads, average load times, and error counts. With a success rate of roughly 85% and few slow page loads, this view serves as our front-end baseline.
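As a rough illustration of how a success rate like this is derived, the sketch below computes it from a page-load count and an error count. The class name `FrontEndBaseline` and the counts are hypothetical, chosen only so the result lands at 85%; read the real numbers from your own dashboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrontEndBaseline:
    """Front-end steady-state figures copied from the RUM dashboard."""
    page_loads: int
    page_loads_with_errors: int

    @property
    def success_rate(self) -> float:
        """Percentage of page loads that completed without errors."""
        if self.page_loads == 0:
            raise ValueError("no page loads recorded; cannot compute a baseline")
        successful = self.page_loads - self.page_loads_with_errors
        return 100.0 * successful / self.page_loads


# Hypothetical counts, not taken from the dashboard shown above.
baseline = FrontEndBaseline(page_loads=200, page_loads_with_errors=30)
print(f"{baseline.success_rate:.1f}%")  # prints "85.0%"
```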

2. CloudWatch Container Insights for Amazon EKS

Switch to Container Insights in the CloudWatch console and select EKS Performance Monitoring. You can filter metrics by nodes, namespaces, pods, or containers.

| Component | Key Metrics |
| --- | --- |
| Nodes | CPU utilization, Memory usage, Pod count |
| Namespaces | Request rate, Error rate, Throughput |
| Pods | Latency, Restart count, Health status |
| Containers | CPU/Memory limits, Network I/O |

The image shows an AWS CloudWatch Container Insights dashboard for monitoring node performance, displaying metrics like node status, pods per node, CPU utilization, and memory utilization.

With these metrics, you establish a back-end baseline for cluster health, resource utilization, and workload distribution.
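If you prefer to record the back-end baseline programmatically rather than reading it off the console, one possible approach is to query the `ContainerInsights` namespace through the CloudWatch `GetMetricData` API. The sketch below assumes the cluster publishes `node_cpu_utilization` and `node_memory_utilization` with a `ClusterName` dimension, that `boto3` is installed, and that credentials are configured. The function names and the one-hour window are illustrative choices.

```python
from datetime import datetime, timedelta, timezone

NAMESPACE = "ContainerInsights"


def build_queries(cluster_name: str, period_s: int = 300) -> list[dict]:
    """Build GetMetricData queries for cluster-level node CPU and memory."""
    metrics = {
        "cpu": "node_cpu_utilization",
        "mem": "node_memory_utilization",
    }
    return [
        {
            "Id": query_id,  # must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": NAMESPACE,
                    "MetricName": metric_name,
                    "Dimensions": [
                        {"Name": "ClusterName", "Value": cluster_name}
                    ],
                },
                "Period": period_s,
                "Stat": "Average",
            },
            "ReturnData": True,
        }
        for query_id, metric_name in metrics.items()
    ]


def fetch_baseline(cluster_name: str, hours: int = 1) -> dict:
    """Return the averaged datapoints per query Id for the last `hours`."""
    import boto3  # third-party; imported lazily so build_queries works without it

    end = datetime.now(timezone.utc)
    response = boto3.client("cloudwatch").get_metric_data(
        MetricDataQueries=build_queries(cluster_name),
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
    )
    return {r["Id"]: r["Values"] for r in response["MetricDataResults"]}
```

Calling `fetch_baseline("my-eks-cluster")` (a placeholder cluster name) before the experiment and saving the result gives you a record to compare against after the fault is injected.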

3. AWS X-Ray Trace Map

AWS X-Ray offers end-to-end request tracing across services such as EC2, Lambda, and API Gateway. Use the trace map to pinpoint latency hotspots and error sources.

Warning

Trace sampling needs to be set appropriately. Over-sampling can increase costs, while under-sampling may hide critical issues.
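To get a feel for the cost side of this trade-off, the sketch below estimates how many traces per second a single sampling rule would record. It assumes the X-Ray model of a reservoir (a fixed number of requests sampled each second) plus a fixed rate applied to the remaining requests, with defaults that mirror the default rule (one request per second plus 5% of the rest). It is a simplified estimate that assumes steady traffic and one matching rule; the function name is hypothetical.

```python
def expected_traces_per_second(
    requests_per_second: float,
    reservoir: float = 1.0,
    fixed_rate: float = 0.05,
) -> float:
    """Estimate sampled traces per second for one sampling rule."""
    if requests_per_second < 0 or reservoir < 0:
        raise ValueError("request rate and reservoir must be non-negative")
    if not 0.0 <= fixed_rate <= 1.0:
        raise ValueError("fixed_rate must be between 0 and 1")
    # Requests up to the reservoir size are always sampled.
    from_reservoir = min(requests_per_second, reservoir)
    # The fixed rate applies only to requests beyond the reservoir.
    beyond_reservoir = max(0.0, requests_per_second - reservoir)
    return from_reservoir + beyond_reservoir * fixed_rate


# Hypothetical load levels for comparison.
for rps in (1, 100, 1000):
    print(rps, round(expected_traces_per_second(rps), 2))
```

Running the estimate for your baseline load and for the load you expect during the experiment shows whether the chosen rule keeps enough traces to see rare faults without recording nearly every request.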

The image shows an AWS CloudWatch X-Ray console displaying a trace map for an EC2 instance named "PetSite," with metrics on latency, requests, and faults over time.

Drill down into individual segments—such as the client calling the PetSite EC2 instance—to view detailed latency, request, and fault counts.


By capturing steady-state metrics from CloudWatch RUM, Container Insights, and X-Ray, you’ll have a robust performance baseline. Use these figures to compare against results after injecting faults, ensuring you can accurately gauge the resilience and reliability of your application.
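The comparison step can be sketched as a small function that flags any metric that worsened beyond a tolerance relative to the baseline. The metric names, the 10% tolerance, and the sample values below are all hypothetical; choose thresholds that match your own steady-state hypothesis.

```python
def find_regressions(
    baseline: dict[str, float],
    observed: dict[str, float],
    higher_is_better: frozenset[str] = frozenset({"success_rate_pct"}),
    tolerance_pct: float = 10.0,
) -> dict[str, float]:
    """Return the percent change for each metric that worsened past the tolerance."""
    regressions = {}
    for name, base in baseline.items():
        # Skip metrics that were not observed, and zero baselines, for which
        # a relative change is undefined (compare those by absolute value).
        if name not in observed or base == 0:
            continue
        change_pct = 100.0 * (observed[name] - base) / base
        # For "higher is better" metrics a drop is the worsening direction.
        worsening = -change_pct if name in higher_is_better else change_pct
        if worsening > tolerance_pct:
            regressions[name] = round(change_pct, 1)
    return regressions


# Hypothetical steady-state and post-fault figures.
steady_state = {"lcp_ms": 2000, "success_rate_pct": 85, "node_cpu_pct": 40}
after_fault = {"lcp_ms": 2100, "success_rate_pct": 70, "node_cpu_pct": 65}
print(find_regressions(steady_state, after_fault))
# prints {'success_rate_pct': -17.6, 'node_cpu_pct': 62.5}
```

An empty result suggests the system stayed within its steady state for the metrics you tracked; any entries point to where the fault had a measurable effect.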
