Architecture Overview
Our target deployment is a two-node Aurora PostgreSQL cluster in a multi-Availability Zone configuration:
| Instance Role | Description |
|---|---|
| Writer | Handles all write operations (INSERT, UPDATE). |
| Reader | Serves read-only queries (SELECT) to offload reads. |
When the reader node reboots, Aurora automatically redirects incoming read traffic to the writer instance. After the reboot completes, the reader rejoins the cluster without manual intervention.
Prerequisites
- An existing Aurora PostgreSQL cluster with one writer and one reader in multi-AZ.
- AWS CLI v2 configured with permissions for AWS FIS and RDS.
- IAM role for FIS with `fis:StartExperiment` and `rds:RebootDBInstance` permissions.
- Cluster identifiers:
  - Writer: `aurora-writer-1`
  - Reader: `aurora-reader-1`
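To confirm the first prerequisite, a small helper can list each cluster member together with its role. This is a minimal sketch: the cluster identifier `aurora-cluster-1` used in the example call is a hypothetical placeholder (only the instance identifiers are given above), and the function name is illustrative.

```shell
# list_cluster_roles CLUSTER_ID
# Prints one line per cluster member: the instance identifier followed by
# the value of IsClusterWriter for that member.
# CLUSTER_ID is an assumption; substitute your own cluster identifier.
list_cluster_roles() {
  aws rds describe-db-clusters \
    --db-cluster-identifier "$1" \
    --query 'DBClusters[0].DBClusterMembers[].[DBInstanceIdentifier,IsClusterWriter]' \
    --output text
}
```

For example, `list_cluster_roles aurora-cluster-1` should show `aurora-writer-1` flagged as the cluster writer and `aurora-reader-1` as a non-writer.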
AWS FIS Experiment Components
Every AWS FIS experiment consists of:
- Target: The AWS resource(s) to inject faults into.
- Action: The fault to inject (e.g., reboot).
- Role ARN: IAM role that grants FIS the required permissions.
- Stop conditions (optional): When to halt the experiment automatically.
Step-by-Step: Injecting a Reader Node Reboot
1. Define the FIS Experiment Template
Save the following JSON as `fis-reboot-reader.json`:
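A minimal sketch of such a template, written to disk with a shell heredoc. The account ID `111122223333`, the region `us-east-1`, the role name `FisRdsRebootRole`, and the labels `readerInstance` and `rebootReader` are all placeholder assumptions; only the reader identifier `aurora-reader-1` comes from the prerequisites.

```shell
# Write the experiment template to fis-reboot-reader.json.
# The quoted 'EOF' prevents the shell from expanding anything inside the JSON.
# Account ID, region, and role name below are placeholders (assumptions).
cat > fis-reboot-reader.json <<'EOF'
{
  "description": "Reboot the Aurora reader instance",
  "targets": {
    "readerInstance": {
      "resourceType": "aws:rds:db",
      "resourceArns": [
        "arn:aws:rds:us-east-1:111122223333:db:aurora-reader-1"
      ],
      "selectionMode": "ALL"
    }
  },
  "actions": {
    "rebootReader": {
      "actionId": "aws:rds:reboot-db-instances",
      "targets": {
        "DBInstances": "readerInstance"
      }
    }
  },
  "stopConditions": [
    { "source": "none" }
  ],
  "roleArn": "arn:aws:iam::111122223333:role/FisRdsRebootRole"
}
EOF
```

The file could then be registered with `aws fis create-experiment-template --cli-input-json file://fis-reboot-reader.json`, which returns the template ID needed to start the experiment. A stop condition with source `none` means the experiment is never halted automatically; a CloudWatch alarm (source `aws:cloudwatch:alarm`) is the usual alternative.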
2. Start the Experiment
Start the experiment with the AWS CLI command `aws fis start-experiment`, passing the ID of the experiment template.
3. Monitor the Experiment
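- CLI: Query the experiment state with `aws fis get-experiment`.
- Console: Visit the AWS FIS Experiments page.
The experiment is finished when its status is `completed`.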
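Steps 2 and 3 can be sketched together as two small shell helpers: one starts an experiment from a template and prints the new experiment ID, the other polls the experiment until it reaches a terminal state. The function names and the `POLL_SECONDS` variable are illustrative assumptions, and the template ID has to come from your own account.

```shell
# start_experiment TEMPLATE_ID
# Starts an experiment from the given template and prints the experiment ID.
start_experiment() {
  aws fis start-experiment \
    --experiment-template-id "$1" \
    --query 'experiment.id' \
    --output text
}

# wait_for_experiment EXPERIMENT_ID
# Polls the experiment state every POLL_SECONDS (default 10, an assumption).
# Returns 0 when the status is "completed", 1 when it is "stopped" or "failed".
wait_for_experiment() {
  local id="$1" status
  while true; do
    status=$(aws fis get-experiment \
      --id "$id" \
      --query 'experiment.state.status' \
      --output text)
    echo "experiment $id: $status"
    case "$status" in
      completed) return 0 ;;
      stopped|failed) return 1 ;;
    esac
    sleep "${POLL_SECONDS:-10}"
  done
}
```

A typical call would be `exp_id=$(start_experiment "$TEMPLATE_ID") && wait_for_experiment "$exp_id"`, where `TEMPLATE_ID` holds the ID of the template created in step 1.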
Expected Results & Hypothesis
We hypothesize that rebooting the reader node will not impact application availability:
- Reader goes offline: Aurora shifts read traffic to the writer.
- Writer handles all requests: No downtime for your application.
- Reader rejoins: After reboot, reads distribute back across both nodes.
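One way to test this hypothesis is to run a simple read probe against the cluster while the experiment is in progress and count failures. The sketch below assumes `psql` is installed, that credentials are supplied through the standard libpq mechanisms (for example `PGUSER` and `PGPASSWORD`), and that you pass your own reader endpoint hostname; the function name and the `PROBE_INTERVAL` variable are illustrative.

```shell
# probe_reads HOST [COUNT]
# Runs COUNT read probes (default 60) against HOST, PROBE_INTERVAL seconds
# apart (default 1), and reports how many failed.
# Returns 0 only if every probe succeeded.
probe_reads() {
  local host="$1" count="${2:-60}" failures=0 i=1
  while [ "$i" -le "$count" ]; do
    if ! psql "host=$host dbname=${PGDATABASE:-postgres} connect_timeout=3" \
         -Atc 'SELECT 1' >/dev/null 2>&1; then
      failures=$((failures + 1))
      echo "$(date -u +%T) probe $i failed"
    fi
    i=$((i + 1))
    sleep "${PROBE_INTERVAL:-1}"
  done
  echo "failed probes: $failures/$count"
  [ "$failures" -eq 0 ]
}
```

Starting `probe_reads <reader-endpoint> 120` shortly before the experiment and letting it run through the reboot gives a rough measure of read availability; a non-zero failure count would contradict the hypothesis.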
Do not target the writer instance in production without a failover plan. Rebooting the writer can cause a brief primary failover and potential downtime.
Cleanup
- Delete the FIS experiment template (if created separately).
- Verify that both Aurora instances are healthy:
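A minimal sketch of that health check, using the instance identifiers from the prerequisites. The function names are illustrative, and treating the status `available` as "healthy" is an assumption of this sketch.

```shell
# instance_status INSTANCE_ID
# Prints the current RDS status of the instance, for example "available".
instance_status() {
  aws rds describe-db-instances \
    --db-instance-identifier "$1" \
    --query 'DBInstances[0].DBInstanceStatus' \
    --output text
}

# check_cluster_health
# Succeeds only if both the writer and the reader report "available".
check_cluster_health() {
  local id status
  for id in aurora-writer-1 aurora-reader-1; do
    status=$(instance_status "$id")
    echo "$id: $status"
    [ "$status" = "available" ] || return 1
  done
}
```

The experiment template itself can be removed with `aws fis delete-experiment-template --id <template-id>`.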