Demo: FIS Disk Fill Experiment on EC2, with Baseline Metrics in X-Ray
In this tutorial, you’ll run a disk fill experiment with AWS Fault Injection Simulator (FIS) against an EC2 instance in the “EKS pet site” group. You will:
- Record baseline metrics
- Configure the AWSFIS-Run-Disk-Fill SSM document
- Create and run an FIS experiment template
- Compare post-fault metrics against the baseline
Prerequisites
- EC2 instances tagged EKS pet site, running Amazon Linux or Ubuntu
- SSM Agent installed on target instances
- IAM role with permissions for SSM commands and CloudWatch logging
1. Record Baseline Metrics
In the EC2 console, filter the running instances by the name “EKS pet site” and select instance 03DF9. Note its current metrics before injecting any fault:
Metric | Baseline Value |
---|---|
CPU Utilization | ~4% |
Network Out (Bytes) | 16–18 million |
These values establish our steady-state before fault injection.
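The same baseline can be captured programmatically rather than read off the console. The sketch below is one possible approach, not part of the original demo: it builds the arguments for the CloudWatch `get_metric_statistics` call for the `AWS/EC2` metrics `CPUUtilization` and `NetworkOut`. The instance ID `i-EXAMPLE` is a placeholder (the tutorial only shows the shortened ID 03DF9), and the one-hour window and 300-second period are assumptions.

```python
from datetime import datetime, timedelta, timezone


def baseline_query(instance_id, metric_name, end=None, minutes=60):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = end or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": metric_name,  # e.g. "CPUUtilization" or "NetworkOut"
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 300,  # assumed 5-minute datapoints
        "Statistics": ["Average"],
    }


def fetch_baseline(instance_id, metric_name):
    """Return the mean of the datapoints, or None if there are none.

    Requires boto3 and AWS credentials; imported lazily so that
    baseline_query() can be used and tested without either.
    """
    import boto3

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        **baseline_query(instance_id, metric_name)
    )
    points = [p["Average"] for p in response["Datapoints"]]
    return sum(points) / len(points) if points else None


# "i-EXAMPLE" is a placeholder, not the real instance ID.
query = baseline_query("i-EXAMPLE", "CPUUtilization")
```

Calling `fetch_baseline("i-EXAMPLE", "NetworkOut")` before the experiment and storing the result gives a number to compare against later.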
2. Configure the Disk Fill SSM Document
Navigate to AWS Systems Manager, open Documents, and search for “FIS”.
Select the AWSFIS-Run-Disk-Fill document. This command document uses fallocate to fill the disk, and it installs bc and fallocate first if they are missing.
description: |
  ### Document name - AWSFIS-Run-Disk-Fill
  Runs disk filling stress on an instance using fallocate.
  Installs dependencies if InstallDependencies=true.
schemaVersion: '2.2'
parameters:
  DurationSeconds:
    type: String
    description: (Required) Duration of the disk fill stress in seconds.
  Percent:
    type: String
    description: (Optional) Percent of disk space to allocate (1–100).
    default: '95'
  InstallDependencies:
    type: String
    description: (Optional) Install `bc` & `fallocate` if missing (True/False).
    default: 'True'
Note
This SSM document does not uninstall dependencies after the test. To avoid repeated installs, pre-bake bc and fallocate into the instance image and set InstallDependencies=False.
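To get a feel for what a given Percent value means on a particular volume, the arithmetic can be sketched as follows. This is an illustration only: it assumes Percent is the target fill level of the whole volume (so space already in use counts toward it), which the document description above does not spell out, and it is not the script that the SSM document actually runs.

```python
import shutil


def bytes_to_allocate(total_bytes, used_bytes, percent):
    """Bytes that must be added so used space reaches `percent` of the volume.

    Assumption: `percent` is a target fill level, not a share of free space.
    Returns 0 when the volume is already at or above the target.
    """
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    target_used = total_bytes * percent // 100
    return max(0, target_used - used_bytes)


def bytes_to_allocate_for_path(path="/", percent=95):
    """Same calculation using the live usage of the filesystem at `path`."""
    usage = shutil.disk_usage(path)
    return bytes_to_allocate(usage.total, usage.used, percent)


# A 100 GiB volume with 40 GiB in use needs 55 GiB more to reach 95%.
gib = 1024 ** 3
needed = bytes_to_allocate(100 * gib, 40 * gib, 95)
print(needed // gib)  # 55
```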
Input Parameters
Parameter | Required | Default | Description |
---|---|---|---|
DurationSeconds | Yes | — | Duration of the disk fill stress (in seconds). |
Percent | No | '95' | Percentage of disk space to allocate. |
InstallDependencies | No | 'True' | Install dependencies if missing (True/False). |
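FIS passes these parameters to the SSM document as a single JSON string in which every value is itself a string. A small helper such as the following (a hypothetical convenience function, not part of any AWS SDK) can validate the inputs from the table and produce that string:

```python
import json


def build_document_parameters(duration_seconds, percent=95, install_dependencies=True):
    """Return the JSON string for the disk fill document parameters."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    params = {
        "DurationSeconds": str(duration_seconds),
        "Percent": str(percent),
        "InstallDependencies": "True" if install_dependencies else "False",
    }
    # Compact separators keep the string free of extra whitespace.
    return json.dumps(params, separators=(",", ":"))


print(build_document_parameters(120))
# {"DurationSeconds":"120","Percent":"95","InstallDependencies":"True"}
```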
3. Create the FIS Experiment Template
- Open AWS Fault Injection Simulator (FIS) and click Create experiment template.
- Enter your AWS account, a name (e.g., ec2-disk-fill-test), and a description.
- Under Actions, choose Add action:
  - Name: disk-fill
  - Action type: aws:ssm:send-command
  - Document: AWSFIS-Run-Disk-Fill
  - Parameters: {"DurationSeconds":"120","Percent":"95","InstallDependencies":"True"}
  - Action timeout: 600 seconds
- In Targets:
  - Resource type: aws:ec2:instance
  - Resource IDs: 03DF9
  - Selection mode: All
- Assign an IAM role with SSM permissions. For logging, choose the CloudWatch log group FIS experiments.
- Review and create the template.
Generate a preview to confirm instance 03DF9 is selected.
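The console steps above correspond roughly to the request body of the FIS `CreateExperimentTemplate` API. The sketch below assembles that body as a plain dictionary so it can be inspected before anything is sent. The target key `disk-fill-target`, the ARNs passed in, and the choice of no stop condition are assumptions for illustration; the 600-second action timeout is written as the ISO 8601 duration `PT10M`.

```python
import json


def build_experiment_template(instance_arn, role_arn, log_group_arn, region):
    """Assemble keyword arguments for the FIS create_experiment_template call."""
    document_parameters = json.dumps(
        {"DurationSeconds": "120", "Percent": "95", "InstallDependencies": "True"}
    )
    return {
        "description": "Disk fill on one EC2 instance",
        "roleArn": role_arn,
        # Assumption: no CloudWatch alarm is used as a stop condition.
        "stopConditions": [{"source": "none"}],
        "targets": {
            "disk-fill-target": {  # hypothetical target name
                "resourceType": "aws:ec2:instance",
                "resourceArns": [instance_arn],
                "selectionMode": "ALL",
            }
        },
        "actions": {
            "disk-fill": {
                "actionId": "aws:ssm:send-command",
                "parameters": {
                    "documentArn": f"arn:aws:ssm:{region}::document/AWSFIS-Run-Disk-Fill",
                    "documentParameters": document_parameters,
                    "duration": "PT10M",  # 600 seconds
                },
                "targets": {"Instances": "disk-fill-target"},
            }
        },
        "logConfiguration": {
            "cloudWatchLogsConfiguration": {"logGroupArn": log_group_arn},
            "logSchemaVersion": 2,
        },
        "tags": {"Name": "ec2-disk-fill-test"},
    }


# All ARNs below are placeholders, not values from the demo.
template = build_experiment_template(
    instance_arn="arn:aws:ec2:us-east-1:111122223333:instance/i-EXAMPLE",
    role_arn="arn:aws:iam::111122223333:role/example-fis-role",
    log_group_arn="arn:aws:logs:us-east-1:111122223333:log-group:example-fis-logs:*",
    region="us-east-1",
)
```

With boto3 installed and credentials configured, the dictionary could then be passed as `boto3.client("fis").create_experiment_template(**template)`.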
4. Run and Monitor the Experiment
Start the experiment. Observe its state transition from Pending → Running → Completed.
State | Description |
---|---|
Pending | Scheduled but not yet started. |
Running | Disk fill stress is in progress. |
Completed | Experiment finished; metrics recorded. |
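Instead of refreshing the console, the state transitions can be followed with a small polling loop. The sketch below is a generic helper that takes any function returning the current status string, so it can be tried without AWS access; the poll interval, timeout, and helper name are assumptions. FIS reports statuses in lowercase, and besides `completed` an experiment can also end as `stopped` or `failed`.

```python
import time

TERMINAL_STATES = {"completed", "stopped", "failed"}


def wait_for_experiment(get_status, poll_seconds=15, timeout_seconds=900, sleep=time.sleep):
    """Poll get_status() until a terminal state; return the distinct states seen."""
    seen = []
    waited = 0
    while True:
        status = get_status()
        if not seen or seen[-1] != status:
            seen.append(status)  # record each transition once
        if status in TERMINAL_STATES:
            return seen
        if waited >= timeout_seconds:
            raise TimeoutError(f"experiment still {status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds


# With boto3, get_status could be wired up like this (not executed here):
#   fis = boto3.client("fis")
#   experiment_id = fis.start_experiment(experimentTemplateId=template_id)["experiment"]["id"]
#   get_status = lambda: fis.get_experiment(id=experiment_id)["experiment"]["state"]["status"]

# Dry run with a canned sequence of statuses and no real sleeping.
canned = iter(["pending", "pending", "running", "running", "completed"])
print(wait_for_experiment(lambda: next(canned), sleep=lambda seconds: None))
# ['pending', 'running', 'completed']
```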
Warning
Disk fill stress can cause system instability and data loss. Only run on non-production or isolated environments.
5. Analyze Post-Fault Metrics
After completion, revisit the CPU and Network Out metrics on instance 03DF9 and compare them to your baseline.
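One simple way to make the comparison concrete is to express each post-fault metric as a percentage change from its baseline. The helper below is a minimal sketch; the function names are hypothetical, and the numbers in the usage line are made up to show the arithmetic, not results from this experiment.

```python
def percent_change(baseline, observed):
    """Relative change of `observed` against `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (observed - baseline) / baseline * 100.0


def compare_metrics(baseline, observed):
    """Percent change for every metric name present in `baseline`."""
    return {
        name: round(percent_change(baseline[name], observed[name]), 1)
        for name in baseline
    }


# Hypothetical values, for illustration only.
print(compare_metrics({"CPUUtilization": 4.0}, {"CPUUtilization": 5.0}))
# {'CPUUtilization': 25.0}
```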