Demo Fargate After State and Learning and Improvements

In this exercise, we simulate real user traffic on an AWS Fargate-powered pet adoption application while an I/O stress test runs in parallel. Using Amazon CloudWatch Real User Monitoring (RUM) and ECS service metrics, we validate the application’s performance and resilience under load. This follows the Chaos Engineering principle of uncovering potential bottlenecks before they affect production.


1. Generating Adoption Traffic

To emulate real-world user interactions, we’ll place multiple pet adoption orders:

  1. Navigate to the pet adoption page.
  2. Select pets and complete the adoption form.
  3. Refresh the listing to confirm adoption status.

[Image: the "Adopt a Pet, Save a Life!" page showing a message that the pet adoption is complete and thanking the user for adopting.]

Throughout this process, the UI remained fully responsive, and no errors or delays were observed.

Note

Generating concurrent adoption events helps ensure your application can handle spikes in user requests without degradation.
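The manual steps above can also be approximated in code. The following is a minimal sketch, not the tooling used in the demo: `place_adoption`, `generate_traffic`, and the `send` callable are hypothetical names, and `send` stands in for whatever HTTP call your application actually exposes. By default it is a stub that always succeeds, so the sketch runs without a live endpoint.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def place_adoption(pet_id, send=None):
    """Place one adoption order and return (pet_id, ok, elapsed_seconds).

    `send` is a hypothetical callable that performs the real request and
    returns a truthy value on success. The default is a stand-in stub.
    """
    send = send or (lambda _pet_id: True)
    start = time.perf_counter()
    try:
        ok = bool(send(pet_id))
    except Exception:
        # Treat any raised error as a failed order instead of aborting the run.
        ok = False
    return pet_id, ok, time.perf_counter() - start


def generate_traffic(pet_ids, workers=8, send=None):
    """Submit adoption orders concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: place_adoption(p, send), pet_ids))


# Illustrative run with made-up pet IDs and the stub sender.
results = generate_traffic([f"pet-{i}" for i in range(20)])
failures = [pet_id for pet_id, ok, _ in results if not ok]
print(f"{len(results)} orders placed, {len(failures)} failed")
```

To exercise a real deployment, pass a `send` function that posts to your own adoption endpoint and returns whether the response was successful.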


2. Monitoring Real User Metrics (RUM)

Next, we’ll inspect the CloudWatch RUM dashboard over the last ten minutes to detect any user frustration signals:

[Image: the CloudWatch RUM overview dashboard, showing page loads, average page load speed, Apdex score, and alarms, with options for filtering data and changing the time range.]

Metric               Value   Threshold
Page Loads           1,200   N/A
Avg Page Load (ms)   320     < 500 ms
Apdex Score          0.98    ≥ 0.85
Frustrated Sessions  0       0

Note

An Apdex score above 0.85 and zero frustrated sessions confirm a smooth user experience. For more details, see the CloudWatch RUM User Guide.
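For readers unfamiliar with Apdex, the sketch below shows the standard formula: satisfied samples count fully, tolerating samples count half, and frustrated samples count zero. The function name `apdex`, the 500 ms target, and the sample load times are assumptions chosen for illustration. CloudWatch RUM applies its own cut-offs, so treat this as an explanation of the metric rather than a reproduction of the dashboard value.

```python
def apdex(load_times_ms, target_ms=500):
    """Compute an Apdex score from page load times in milliseconds.

    satisfied:  t <= T
    tolerating: T < t <= 4T
    frustrated: t > 4T
    score = (satisfied + tolerating / 2) / total
    """
    if not load_times_ms:
        raise ValueError("at least one sample is required")
    satisfied = sum(1 for t in load_times_ms if t <= target_ms)
    tolerating = sum(1 for t in load_times_ms if target_ms < t <= 4 * target_ms)
    return (satisfied + tolerating / 2) / len(load_times_ms)


# Made-up samples: 96 fast loads and 4 slower (but tolerable) loads.
samples = [320] * 96 + [900] * 4
score = apdex(samples)
print(f"Apdex: {score:.2f}")
```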


3. Inspecting ECS Service Metrics

Finally, let’s review our ECS service dashboard for the pay-for-adoption task to observe CPU, memory, and network activity during the I/O stress period:

[Image: a CloudWatch dashboard showing CPU utilization, memory utilization, network activity, and other performance indicators for the service, with no alerts present.]

Metric                  Baseline  Under Stress  Threshold
CPU Utilization (%)     25        18            < 80
Memory Utilization (%)  30        22            < 75
Network In (MB/s)       5         7             N/A
Network Out (MB/s)      4         6             N/A

Warning

Watch for sustained drops in CPU and memory utilization, since they may indicate resource throttling or a misconfigured task definition.
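The comparison in the table above can be expressed as a small check. This is a rough sketch using only the CPU and memory values from the table. The function name `compare_to_baseline` and the report layout are hypothetical, and in practice the numbers would come from your own CloudWatch queries rather than being hard-coded.

```python
def compare_to_baseline(baseline, under_stress, thresholds):
    """Report, per metric, the change from baseline and the threshold status.

    All three arguments map a metric name to a number. A metric is within
    its threshold when the stressed value is strictly below the limit.
    """
    report = {}
    for name, limit in thresholds.items():
        before, after = baseline[name], under_stress[name]
        report[name] = {
            "change_pct": round((after - before) / before * 100, 1),
            "within_threshold": after < limit,
        }
    return report


# Values copied from the table above.
baseline = {"CPU Utilization (%)": 25, "Memory Utilization (%)": 30}
under_stress = {"CPU Utilization (%)": 18, "Memory Utilization (%)": 22}
thresholds = {"CPU Utilization (%)": 80, "Memory Utilization (%)": 75}

report = compare_to_baseline(baseline, under_stress, thresholds)
for name, result in report.items():
    print(name, result)
```

A negative `change_pct` marks a metric that fell under stress, which is the pattern the warning above suggests reviewing.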


Conclusion

In this experiment, our AWS Fargate service stayed well within its thresholds under concurrent I/O stress and real user load. No critical spikes or resource exhaustion occurred, which suggests that the application design holds up reliably at the tested load.
