Demo: FIS Experiment Disk Fill Scenario on EC2 and Before Metrics in X-Ray

In this tutorial, you’ll run a disk fill experiment with AWS Fault Injection Simulator (FIS) on an EC2 instance in the “EKS pet site” group. You will:

  1. Record baseline metrics
  2. Configure the AWSFIS-Run-Disk-Fill SSM document
  3. Create and run an FIS experiment template
  4. Compare post-fault metrics against the baseline

Prerequisites

  • EC2 instances tagged EKS pet site, running Amazon Linux or Ubuntu
  • SSM Agent installed on target instances
  • IAM role with permissions for SSM commands and CloudWatch logging

1. Record Baseline Metrics

Filter the running instances by name “EKS pet site” and select instance 03DF9.

The image shows an AWS EC2 dashboard with two running instances, displaying monitoring metrics such as CPU utilization, network in/out bytes, and network packets.

Metric                Baseline Value
CPU Utilization       ~4%
Network Out (Bytes)   16–18 million

These values establish our steady-state before fault injection.
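As a rough sketch, the same baseline could be captured programmatically rather than read off the console. The example below assumes boto3 is installed and AWS credentials are configured. The function names, the one-hour window, and the 5-minute period are illustrative assumptions, not values taken from this demo, and the instance ID must be supplied by the caller.

```python
from datetime import datetime, timedelta, timezone


def baseline_query(instance_id, metric_name, end=None, hours=1, period=300):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = end or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": metric_name,  # e.g. "CPUUtilization" or "NetworkOut"
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,  # seconds per datapoint (assumed value)
        "Statistics": ["Average"],
    }


def fetch_baseline(instance_id, metric_name):
    """Return the mean of the datapoint averages, or None if there are none."""
    import boto3  # imported lazily so baseline_query works without boto3

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        **baseline_query(instance_id, metric_name)
    )
    values = [point["Average"] for point in response["Datapoints"]]
    return sum(values) / len(values) if values else None
```

Calling fetch_baseline once for CPUUtilization and once for NetworkOut before the experiment gives two numbers to keep for the later comparison.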


2. Configure the Disk Fill SSM Document

Navigate to AWS Systems Manager and search for “fis” to find the FIS documents.

The image shows an AWS console with a search for "fis," displaying services like AWS FIS, AWS Firewall Manager, and Amazon Inspector. The background includes a document related to AWS Systems Manager.

Select the AWSFIS-Run-Disk-Fill document. This Command document uses fallocate to fill disk space, and it installs bc and fallocate first if they are missing.

description: |
  ### Document name - AWSFIS-Run-Disk-Fill
  Runs disk filling stress on an instance using fallocate.
  Installs dependencies if InstallDependencies=true.
schemaVersion: '2.2'
parameters:
  DurationSeconds:
    type: String
    description: (Required) Duration of the disk fill stress in seconds.
  Percent:
    type: String
    description: (Optional) Percent of disk space to allocate (1–100).
    default: '95'
  InstallDependencies:
    type: String
    description: (Optional) Install `bc` & `fallocate` if missing (True/False).
    default: 'True'

The image shows a document from AWS Systems Manager titled "AWSFIS-Run-Disk-Fill," detailing a script that runs disk filling stress on an instance using fallocate, with input parameters for duration and percentage of disk space to allocate.

Note

This SSM document does not uninstall dependencies after the test. To avoid repeated installs, pre-bake bc and fallocate into the instance image and set InstallDependencies=False.

Input Parameters

Parameter             Required   Default   Description
DurationSeconds       Yes        -         Duration of the disk fill stress (in seconds).
Percent               No         '95'      Percentage of disk space to allocate.
InstallDependencies   No         'True'    Install dependencies if missing (True/False).
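Because every parameter is passed as a string, it can help to validate the values before they reach the document. The helper below is a minimal sketch: the function name disk_fill_parameters is hypothetical, and the range check simply mirrors the 1–100 limit stated in the document description.

```python
import json


def disk_fill_parameters(duration_seconds, percent=95, install_dependencies=True):
    """Validate inputs and return them as a compact JSON string of strings."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    return json.dumps(
        {
            "DurationSeconds": str(duration_seconds),
            "Percent": str(percent),
            "InstallDependencies": "True" if install_dependencies else "False",
        },
        separators=(",", ":"),  # no spaces, matching the console input style
    )


print(disk_fill_parameters(120))
# prints {"DurationSeconds":"120","Percent":"95","InstallDependencies":"True"}
```

The defaults match the table above, so only the required DurationSeconds has to be supplied.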

3. Create the FIS Experiment Template

  1. Open AWS Fault Injection Simulator (FIS) and click Create experiment template.
  2. Enter your AWS account, a name (e.g., ec2-disk-fill-test), and a description.
  3. Under Actions, choose Add action:
    • Name: disk-fill
    • Action type: aws:ssm:send-command
    • Document: AWSFIS-Run-Disk-Fill
    • Parameters:
      {"DurationSeconds":"120","Percent":"95","InstallDependencies":"True"}
      
    • Action timeout: 600 seconds
  4. In Targets:
    • Resource type: aws:ec2:instance
    • Resource IDs: 03DF9
    • Selection mode: All
  5. Assign an IAM role with SSM permissions. For logging, choose CloudWatch log group FIS experiments.
  6. Review and create the template.
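The console steps above can also be expressed as an API request. The sketch below only builds the request body for the FIS create_experiment_template operation. The function name, the target name disk-fill-target, and the client token are hypothetical; the instance ARN, role ARN, region, and log group ARN are placeholders the caller must supply; and the "none" stop condition is an assumption, since the steps above do not configure one.

```python
import json


def disk_fill_template(instance_arn, role_arn, region, log_group_arn=None):
    """Build keyword arguments for the FIS create_experiment_template call."""
    request = {
        "clientToken": "ec2-disk-fill-test",  # hypothetical idempotency token
        "description": "ec2-disk-fill-test",
        "roleArn": role_arn,
        "stopConditions": [{"source": "none"}],  # assumed: no alarm-based stop
        "targets": {
            "disk-fill-target": {
                "resourceType": "aws:ec2:instance",
                "resourceArns": [instance_arn],
                "selectionMode": "ALL",
            }
        },
        "actions": {
            "disk-fill": {
                "actionId": "aws:ssm:send-command",
                "parameters": {
                    "documentArn": (
                        f"arn:aws:ssm:{region}::document/AWSFIS-Run-Disk-Fill"
                    ),
                    "documentParameters": json.dumps(
                        {
                            "DurationSeconds": "120",
                            "Percent": "95",
                            "InstallDependencies": "True",
                        }
                    ),
                    "duration": "PT10M",  # the 600-second action timeout
                },
                "targets": {"Instances": "disk-fill-target"},
            }
        },
    }
    if log_group_arn:
        request["logConfiguration"] = {
            "cloudWatchLogsConfiguration": {"logGroupArn": log_group_arn},
            "logSchemaVersion": 2,
        }
    return request
```

With boto3 available, the result could be passed along as boto3.client("fis").create_experiment_template(**disk_fill_template(...)). Building the dictionary separately makes it easy to review or diff before anything is created.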

The image shows a user interface for creating an experiment template, with fields for description, name, and action type, and a dropdown menu listing various AWS SSM commands.

The image shows a configuration screen for creating an experiment template in AWS, specifically for an EC2 disk fill test. It includes fields for description, name, action parameters, and duration settings.

Generate a preview to confirm instance 03DF9 is selected.

The image shows an AWS Fault Injection Simulator (FIS) interface for an EC2 Disk Fill Test experiment template. It includes details like the experiment template ID, creation time, and target information.


4. Run and Monitor the Experiment

Start the experiment. Observe its state transition from Pending → Running → Completed.

The image shows an AWS Fault Injection Simulator (FIS) dashboard with details of a running experiment, including its ID, state, and associated resources. The experiment is related to an EC2 Disk Fill Test.

State       Description
Pending     Scheduled but not yet started.
Running     Disk fill stress is in progress.
Completed   Experiment finished; metrics recorded.
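The state transitions can also be watched from a script instead of the console. The following is a minimal polling sketch: wait_for_experiment and its get_status callback are hypothetical names, and the poll interval and limit are arbitrary assumptions. The FIS API reports statuses in lowercase and includes intermediate ones, such as initiating and stopping, that are not listed in the table.

```python
import time

# Statuses after which the experiment will not change state again.
TERMINAL_STATES = {"completed", "stopped", "failed"}


def wait_for_experiment(get_status, poll_seconds=15, max_polls=80):
    """Poll get_status() until a terminal state; return the distinct states seen."""
    seen = []
    for _ in range(max_polls):
        status = get_status()
        if not seen or seen[-1] != status:
            seen.append(status)  # record each transition once
        if status in TERMINAL_STATES:
            return seen
        time.sleep(poll_seconds)
    last = seen[-1] if seen else "unknown"
    raise TimeoutError(f"experiment still in state {last!r}")
```

With boto3, get_status could be a lambda that returns fis.get_experiment(id=experiment_id)["experiment"]["state"]["status"], where fis is boto3.client("fis") and experiment_id comes from starting the experiment.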

Warning

Disk fill stress can cause system instability and data loss. Only run on non-production or isolated environments.


5. Analyze Post-Fault Metrics

After completion, revisit the CPU and Network Out metrics on instance 03DF9 and compare them to your baseline.
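One simple way to make the comparison concrete is to compute the relative change of each metric against its baseline. The helper below is an illustrative sketch; the function names and the 20% tolerance are assumptions, not thresholds from this demo.

```python
def percent_change(baseline, observed):
    """Relative change from baseline to observed, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (observed - baseline) / baseline * 100.0


def compare_to_baseline(baseline, observed, tolerance_pct=20.0):
    """Map each metric name to (percent change, True if within tolerance)."""
    report = {}
    for name, base_value in baseline.items():
        change = percent_change(base_value, observed[name])
        report[name] = (round(change, 1), abs(change) <= tolerance_pct)
    return report
```

Both arguments are dictionaries keyed by metric name, for example the baseline CPU utilization of about 4% and the value read after the experiment. A metric flagged as outside the tolerance is a candidate for closer inspection.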

