Demo: Create and Run an FIS Experiment, Then Review Metrics and DB State
In this tutorial, you’ll build an AWS Fault Injection Simulator (FIS) experiment template that reboots an Amazon Aurora reader instance. You’ll then inspect CloudWatch metrics and check the database state before and after the experiment to confirm that the reboot has minimal impact on your application.
Prerequisites
Service | Purpose | Documentation |
---|---|---|
Amazon RDS | Host Aurora reader & writer instances | https://docs.aws.amazon.com/rds/latest/Concepts/Welcome.html |
AWS Fault Injection Simulator (FIS) | Inject faults into AWS resources | https://docs.aws.amazon.com/fis/latest/userguide/what-is-fis.html |
Amazon CloudWatch | Collect metrics, logs, and ServiceLens traces | https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ServiceLens.html |
Note
Ensure you have an IAM role with the managed policy AWSFISFullAccess and permissions to reboot RDS instances. Assign this role when you create the FIS experiment template.
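As a rough illustration of what the experiment role needs, the sketch below builds two policy documents as Python dictionaries: a trust policy that lets the FIS service (fis.amazonaws.com) assume the role, and a permissions policy that allows rebooting one DB instance. The function names and the example ARN are hypothetical, and the permissions shown are an assumed minimum for the reboot action only; delivering experiment logs to CloudWatch Logs needs additional permissions that are not shown here.

```python
import json


def fis_trust_policy():
    """Trust policy that lets the FIS service assume the experiment role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "fis.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def reboot_permissions_policy(db_instance_arn):
    """Permissions assumed sufficient to reboot a single DB instance."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["rds:RebootDBInstance"],
                "Resource": db_instance_arn,
            },
            {
                "Effect": "Allow",
                "Action": ["rds:DescribeDBInstances"],
                "Resource": "*",
            },
        ],
    }


# Hypothetical ARN, for illustration only.
example_arn = "arn:aws:rds:us-east-1:111122223333:db:example-reader"
print(json.dumps(fis_trust_policy(), indent=2))
print(json.dumps(reboot_permissions_policy(example_arn), indent=2))
```

Scoping the reboot permission to a single instance ARN keeps the role from being usable against the writer by mistake.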
1. Identify the Reader Instance
Navigate to the Amazon RDS console and locate the reader instance you plan to reboot. In this example, the reader node identifier ends with BGDUC.
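If you prefer to find the reader programmatically, one possible approach is to filter the cluster members returned by the RDS DescribeDBClusters API, where each member carries an IsClusterWriter flag. The sketch below works on a response dictionary so it runs without AWS access; the cluster and instance names in the sample data are hypothetical, and in practice the response would come from boto3's rds client via describe_db_clusters().

```python
def reader_instance_ids(describe_db_clusters_response, cluster_id):
    """Return the identifiers of all non-writer members of one Aurora cluster."""
    for cluster in describe_db_clusters_response.get("DBClusters", []):
        if cluster.get("DBClusterIdentifier") == cluster_id:
            return [
                member["DBInstanceIdentifier"]
                for member in cluster.get("DBClusterMembers", [])
                if not member.get("IsClusterWriter", False)
            ]
    return []


# Hypothetical response, trimmed to the fields used above.
sample_response = {
    "DBClusters": [
        {
            "DBClusterIdentifier": "example-cluster",
            "DBClusterMembers": [
                {"DBInstanceIdentifier": "example-writer", "IsClusterWriter": True},
                {"DBInstanceIdentifier": "example-reader", "IsClusterWriter": False},
            ],
        }
    ]
}

print(reader_instance_ids(sample_response, "example-cluster"))
```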
2. Create the FIS Experiment Template
- Open the AWS FIS console.
- In the sidebar, choose Experiment templates → Create experiment template.
- Fill in:
- Description: reboot RDS Aurora
- Template name: FIS-workshop-RDS
3. Add the Reboot Action
- Click Add action.
- Configure:
- Action name: rebootAurora
- Service: RDS
- Action type: aws:rds:reboot-db-instances
- Leave Start after blank to execute immediately. Keep the default target placeholder.
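For readers who script their templates, the same action can be written as the dictionary that the FIS CreateExperimentTemplate API expects under its actions key. This is a sketch, not the console's exact output: the function name, the description text, and the target name example-reader-target are assumptions. Omitting a startAfter key corresponds to leaving Start after blank.

```python
def build_reboot_action(target_name):
    """Build the 'actions' entry for rebooting the DB instances in one target."""
    return {
        "rebootAurora": {
            "actionId": "aws:rds:reboot-db-instances",
            "description": "Reboot the Aurora reader instance",
            # The action's DBInstances slot points at a named target
            # defined elsewhere in the same template.
            "targets": {"DBInstances": target_name},
            # No "startAfter" key: the action runs as soon as the
            # experiment starts.
        }
    }


# Hypothetical target name, for illustration only.
actions = build_reboot_action("example-reader-target")
print(actions)
```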
4. Review the Action & Target Panel
After adding the action, confirm it appears alongside the default target in the template editor.
5. Define the Target
- Under Targets, click the default target name.
- Set Resource type to aws:rds:db and Target method to Resource IDs.
- Paste the reader instance identifier (ending in BGDUC).
- Click Save.
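In API terms, a target that selects resources by ID is a dictionary with a resource type, a list of ARNs, and a selection mode. The sketch below assembles one from a region, account ID, and instance identifier; all three values in the example are placeholders, and the function name is hypothetical.

```python
def build_reader_target(region, account_id, db_instance_id):
    """Build a FIS target that selects one RDS DB instance by ARN."""
    arn = f"arn:aws:rds:{region}:{account_id}:db:{db_instance_id}"
    return {
        "resourceType": "aws:rds:db",
        "resourceArns": [arn],
        # ALL applies the action to every resource the target resolves to,
        # which here is the single instance listed above.
        "selectionMode": "ALL",
    }


# Placeholder values, for illustration only.
target = build_reader_target("us-east-1", "111122223333", "example-reader")
print(target)
```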
6. Assign IAM Role & Enable Logs
- In Experiment role, select the FIS role you created earlier.
- Under Logs, enable CloudWatch Logs and choose the FIS experiments log group.
- Click Create to finalize the template.
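The console steps so far map onto a single CreateExperimentTemplate request. The following is a minimal sketch of that request body, assuming boto3 is available for the final call; the ARNs and target name are placeholders, and the stop condition of none is an assumption made because this walkthrough does not configure a CloudWatch alarm. The template name is stored as a Name tag. The log group ARN is assumed to need the trailing :* form.

```python
import uuid


def build_experiment_template(actions, targets, role_arn, log_group_arn):
    """Assemble keyword arguments for the FIS create_experiment_template call."""
    return {
        "clientToken": str(uuid.uuid4()),
        "description": "reboot RDS Aurora",
        "tags": {"Name": "FIS-workshop-RDS"},
        "roleArn": role_arn,
        # No alarm-based stop condition is configured in this walkthrough.
        "stopConditions": [{"source": "none"}],
        "targets": targets,
        "actions": actions,
        "logConfiguration": {
            "logSchemaVersion": 2,
            "cloudWatchLogsConfiguration": {"logGroupArn": log_group_arn},
        },
    }


def create_template(template):
    """Send the template to FIS. Needs boto3 and AWS credentials; not run here."""
    import boto3  # imported lazily so the sketch runs without boto3 installed

    fis = boto3.client("fis")
    response = fis.create_experiment_template(**template)
    return response["experimentTemplate"]["id"]


# Placeholder values, for illustration only.
template = build_experiment_template(
    actions={
        "rebootAurora": {
            "actionId": "aws:rds:reboot-db-instances",
            "targets": {"DBInstances": "example-reader-target"},
        }
    },
    targets={
        "example-reader-target": {
            "resourceType": "aws:rds:db",
            "resourceArns": ["arn:aws:rds:us-east-1:111122223333:db:example-reader"],
            "selectionMode": "ALL",
        }
    },
    role_arn="arn:aws:iam::111122223333:role/example-fis-role",
    log_group_arn="arn:aws:logs:us-east-1:111122223333:log-group:example-fis-logs:*",
)
print(sorted(template))
```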
7. Generate a Preview
Before starting, click Generate preview to verify your target matches the reader instance.
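A similar sanity check can be done locally on a scripted template before it is ever submitted. The sketch below is not the console's preview feature; it is a hypothetical helper that lists any target ARNs in a template dictionary that are not on an allow-list of reader instances, so an accidental writer ARN stands out. All ARNs shown are placeholders.

```python
def unexpected_targets(template, allowed_arns):
    """Return target ARNs in the template that are not in the allow-list."""
    allowed = set(allowed_arns)
    return sorted(
        arn
        for target in template.get("targets", {}).values()
        for arn in target.get("resourceArns", [])
        if arn not in allowed
    )


# Placeholder ARNs, for illustration only.
reader_arn = "arn:aws:rds:us-east-1:111122223333:db:example-reader"
writer_arn = "arn:aws:rds:us-east-1:111122223333:db:example-writer"

good_template = {"targets": {"t1": {"resourceArns": [reader_arn]}}}
bad_template = {"targets": {"t1": {"resourceArns": [reader_arn, writer_arn]}}}

print(unexpected_targets(good_template, [reader_arn]))
print(unexpected_targets(bad_template, [reader_arn]))
```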
8. Start the Experiment
Click Start experiment. You will see the status progress through Initiating, Pending, Running, and finally Completed.
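The same start-and-watch flow can be scripted. The sketch below separates the polling loop from the AWS calls so the loop can be exercised with simulated statuses; the set of terminal statuses, the timeout values, and the function names are assumptions. The start_and_wait function uses the boto3 FIS client's start_experiment and get_experiment calls and is not executed here.

```python
import time
import uuid

# Statuses assumed to mean the experiment will not change state again.
TERMINAL_STATUSES = {"completed", "stopped", "failed", "cancelled"}


def wait_for_experiment(get_status, poll_seconds=10, timeout_seconds=900,
                        sleep=time.sleep):
    """Poll get_status() until a terminal status; return the distinct statuses seen."""
    seen = []
    waited = 0
    while True:
        status = get_status()
        if not seen or seen[-1] != status:
            seen.append(status)
        if status in TERMINAL_STATUSES:
            return seen
        if waited >= timeout_seconds:
            raise TimeoutError(f"experiment still {status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds


def start_and_wait(template_id):
    """Start an experiment and wait for it. Needs boto3 and credentials; not run here."""
    import boto3  # imported lazily so the sketch runs without boto3 installed

    fis = boto3.client("fis")
    experiment_id = fis.start_experiment(
        clientToken=str(uuid.uuid4()),
        experimentTemplateId=template_id,
    )["experiment"]["id"]
    return wait_for_experiment(
        lambda: fis.get_experiment(id=experiment_id)["experiment"]["state"]["status"]
    )


# Simulated status sequence following the order listed above.
simulated = iter(["initiating", "pending", "running", "running", "completed"])
history = wait_for_experiment(lambda: next(simulated), sleep=lambda seconds: None)
print(history)
```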
9. Review Post-Experiment State
- Return to the RDS console. Observe the reader node's status change to Rebooting and then return to Available.
- Open CloudWatch ServiceLens to inspect latency, request rates, and fault rates. Since only the reader was rebooted, the writer node continues to serve writes uninterrupted.
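To compare the reader's behavior before and after the reboot outside the console, one option is to query instance-level metrics from the AWS/RDS namespace around the experiment start time. The sketch below builds the query structure for the CloudWatch GetMetricData API; the chosen metrics, the 15-minute window, and the function names are assumptions, and the instance identifier is a placeholder. The final function makes the actual calls through boto3 and is not executed here.

```python
from datetime import datetime, timedelta, timezone


def reader_metric_queries(db_instance_id,
                          metric_names=("DatabaseConnections", "ReadLatency", "CPUUtilization"),
                          period=60):
    """Build MetricDataQueries for per-instance RDS metrics."""
    return [
        {
            "Id": f"m{index}",
            "Label": name,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/RDS",
                    "MetricName": name,
                    "Dimensions": [
                        {"Name": "DBInstanceIdentifier", "Value": db_instance_id}
                    ],
                },
                "Period": period,
                "Stat": "Average",
            },
        }
        for index, name in enumerate(metric_names)
    ]


def metric_window(experiment_start, minutes_before=15, minutes_after=15):
    """Time range that brackets the experiment start."""
    return {
        "StartTime": experiment_start - timedelta(minutes=minutes_before),
        "EndTime": experiment_start + timedelta(minutes=minutes_after),
    }


def fetch_reader_state_and_metrics(db_instance_id, experiment_start):
    """Read instance status and metrics. Needs boto3 and credentials; not run here."""
    import boto3  # imported lazily so the sketch runs without boto3 installed

    rds = boto3.client("rds")
    cloudwatch = boto3.client("cloudwatch")
    status = rds.describe_db_instances(
        DBInstanceIdentifier=db_instance_id
    )["DBInstances"][0]["DBInstanceStatus"]
    metrics = cloudwatch.get_metric_data(
        MetricDataQueries=reader_metric_queries(db_instance_id),
        **metric_window(experiment_start),
    )
    return status, metrics["MetricDataResults"]


# Placeholder identifier and start time, for illustration only.
queries = reader_metric_queries("example-reader")
window = metric_window(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
print([query["Label"] for query in queries])
print(window["StartTime"].isoformat(), window["EndTime"].isoformat())
```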
Warning
Rebooting the reader node confines the blast radius, but always validate your high-availability configuration before injecting faults in production.
By following these steps, you confirm that rebooting an Aurora reader instance via AWS FIS has minimal impact on application availability and maintains overall cluster resilience.