Demo Run FIS Experiment

In this tutorial, you'll learn how to run an AWS Fault Injection Service (FIS, formerly Fault Injection Simulator) experiment from the AWS Resilience Hub console that terminates 50% of the instances in an Auto Scaling group. We'll cover:

  • Selecting an experiment template
  • Verifying target EC2 instances
  • Executing the experiment
  • Troubleshooting and adjusting your Auto Scaling group
  • Rerunning and monitoring the experiment

1. Select the FIS Experiment Template

  1. Open the AWS Resilience Hub console.
  2. Navigate to Experiment templates.
  3. Choose the template named Terminate 50% of instances in an Auto Scaling group.

The image shows the AWS Resilience Hub interface, specifically the "Experiment templates" section, with a template listed for terminating half of the instances in an auto-scaling group.
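The same lookup can be scripted. The following is a minimal sketch, assuming a boto3 FIS client (`boto3.client("fis")`) is passed in and that the name shown in the console is stored either in the template's description or in its `Name` tag; the helper name `find_template_id` is hypothetical.

```python
def find_template_id(fis_client, name):
    """Return the ID of the first experiment template whose description
    or Name tag equals `name`, or None if no template matches.

    `fis_client` is assumed to be a boto3 FIS client.
    """
    next_token = None
    while True:
        # Only send nextToken once the service has handed one back.
        kwargs = {"nextToken": next_token} if next_token else {}
        page = fis_client.list_experiment_templates(**kwargs)
        for template in page.get("experimentTemplates", []):
            tags = template.get("tags") or {}
            if template.get("description") == name or tags.get("Name") == name:
                return template["id"]
        next_token = page.get("nextToken")
        if not next_token:
            return None
```

For example, `find_template_id(boto3.client("fis"), "Terminate 50% of instances in an Auto Scaling group")` would return the template ID if your account stores the name that way.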

2. Confirm Affected Instances

Before you start, ensure your resource filters match the instances you want to target:

  • State: running
  • Tag: experiment-ready

In the EC2 console, apply these filters to see only the instances that the experiment will affect.

The image shows an AWS EC2 console with a list of running instances, displaying details like instance ID, state, type, and status checks. A specific instance’s summary is partially visible at the bottom.

Here, instance 37FF9… matches both filters and is the only target.
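If you prefer to check the targets from a script, the two console filters map onto EC2 `describe_instances` filters. This is a sketch under one assumption: that `experiment-ready` is a tag key (if it is a tag value in your setup, the second filter needs to change). Both helper names are hypothetical.

```python
def build_target_filters(tag_key="experiment-ready"):
    """Build EC2 filters for running instances that carry `tag_key`.

    Assumption: `experiment-ready` is a tag key, not a tag value.
    """
    return [
        {"Name": "instance-state-name", "Values": ["running"]},
        {"Name": "tag-key", "Values": [tag_key]},
    ]


def instance_ids(describe_instances_response):
    """Flatten a describe_instances response into a list of instance IDs."""
    return [
        instance["InstanceId"]
        for reservation in describe_instances_response.get("Reservations", [])
        for instance in reservation.get("Instances", [])
    ]
```

With a boto3 EC2 client, `instance_ids(ec2.describe_instances(Filters=build_target_filters()))` lists the instances the experiment can select from.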

3. Start the FIS Experiment

Back in the FIS console:

  1. Select your experiment template.
  2. Click Start experiment.
  3. Confirm the prompt (this action is disruptive).

Warning

Starting a Chaos Engineering experiment will terminate live EC2 instances. Ensure this is performed in a non-production environment or during a maintenance window.
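Starting the experiment can also be done through the API. The sketch below assumes a boto3 FIS client is passed in; the `confirm_disruptive` flag is a hypothetical guard that mirrors the console's confirmation prompt and is not part of the FIS API.

```python
import uuid


def start_experiment(fis_client, template_id, confirm_disruptive=False):
    """Start an FIS experiment and return (experiment_id, initial_status).

    `confirm_disruptive` is a hypothetical safety guard: the call is
    refused unless the caller explicitly acknowledges the disruption.
    """
    if not confirm_disruptive:
        raise ValueError("Pass confirm_disruptive=True to terminate live instances.")
    response = fis_client.start_experiment(
        clientToken=str(uuid.uuid4()),  # idempotency token
        experimentTemplateId=template_id,
    )
    experiment = response["experiment"]
    return experiment["id"], experiment["state"]["status"]
```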

Once the experiment starts, you'll see its state move from Pending to Initiating. In our first run, the experiment failed immediately:

The image shows an AWS Resilience Hub interface with a failed experiment notification. It includes details about the experiment ID, state, and other related information.

4. Inspect Experiment Logs

To diagnose the failure, open the Timeline or Log events in the FIS console and refresh:

{
  "id": "EXpPGlBontFGnDXQ6",
  "log_type": "target-resolution-start",
  "event_timestamp": "2024-08-05T00:13:06.913Z",
  "details": {
    "target_name": "KKFisWorkshopAsg-50Percent"
  }
}
{
  "id": "EXpPGlBontFGnDXQ6",
  "log_type": "target-resolution-detail",
  "event_timestamp": "2024-08-05T00:13:07.189Z",
  "details": {
    "resolved_targets_count": 0,
    "status": "completed"
  }
}

The resolved_targets_count is 0 because with a single instance, terminating 50% rounds down to zero.
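If you export the log events as text, a short script can pull out the resolved target count. This sketch assumes the events arrive as JSON objects written one after another, as in the excerpt above; the function names are illustrative.

```python
import json


def parse_log_events(text):
    """Parse a string holding several JSON objects written back to back."""
    decoder = json.JSONDecoder()
    events = []
    position = 0
    while position < len(text):
        # Skip whitespace between objects; raw_decode does not do this itself.
        if text[position].isspace():
            position += 1
            continue
        event, position = decoder.raw_decode(text, position)
        events.append(event)
    return events


def resolved_target_counts(events):
    """Return resolved_targets_count from each target-resolution-detail event."""
    return [
        event["details"]["resolved_targets_count"]
        for event in events
        if event.get("log_type") == "target-resolution-detail"
    ]
```

A count of 0 in the result means the experiment had nothing to act on.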

Note

Chaos experiments require sufficient targets. If your Auto Scaling group has only one instance, 50% of one is 0.5, which rounds down to zero targets.
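The rounding can be checked with a few lines of arithmetic. This sketch assumes the percentage selection rounds down, as the log output indicates; the helper names are hypothetical.

```python
def percent_target_count(eligible_instances, percent):
    """Number of instances selected when a percentage is rounded down."""
    return eligible_instances * percent // 100


def min_instances_for_one_target(percent):
    """Smallest group size for which the percentage selects at least one instance."""
    return -(-100 // percent)  # ceiling division


# Compare group sizes of 1, 2 and 3 at 50%.
for size in (1, 2, 3):
    print(size, "instance(s) ->", percent_target_count(size, 50), "target(s)")
```

At 50%, the group needs at least two matching instances before the experiment has anything to terminate.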

5. Identify & Fix the Root Cause

Inspect your Auto Scaling group in the EC2 console:

The image shows an AWS EC2 Auto Scaling group details page, displaying information about group capacity, launch template, and associated resources. It includes settings like desired, minimum, and maximum capacity, as well as launch template details such as instance type and AMI ID.

Current settings:

Setting            Value
Desired capacity   1
Minimum capacity   1
Maximum capacity   3

To guarantee that at least one instance is terminated, raise the desired and minimum capacity to 2:

Setting            Updated Value
Desired capacity   2
Minimum capacity   2
Maximum capacity   3

With two instances in the group, 50% of 2 = 1, so exactly one instance will be terminated.
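The capacity change can also be applied through the Auto Scaling API. The sketch below assumes a boto3 Auto Scaling client (`boto3.client("autoscaling")`) is passed in; the group name in the usage note is a placeholder, since the tutorial does not state the actual group name.

```python
def resize_group(autoscaling_client, group_name, minimum, desired, maximum):
    """Update an Auto Scaling group's capacity after a basic sanity check."""
    if not minimum <= desired <= maximum:
        raise ValueError("Capacity must satisfy minimum <= desired <= maximum.")
    autoscaling_client.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        MinSize=minimum,
        DesiredCapacity=desired,
        MaxSize=maximum,
    )
```

For the values in the table, the call would be `resize_group(client, "my-asg", minimum=2, desired=2, maximum=3)`, where `my-asg` stands in for your group's name.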

6. Preview Targets Before Rerunning

Wait for the second instance to launch, then in the FIS console:

  1. Select your experiment template.
  2. Click Preview.

The template details page opens:

The image shows an AWS console screen displaying details of an experiment template for terminating half of the instances in an auto-scaling group. It includes information such as the experiment template ID, ARN, description, and target settings.

Generate the preview. It should resolve exactly one target instance:

The image shows an AWS Fault Injection Simulator (FIS) interface with details of an experiment template, including target information and a completed status. It displays resource types, selection mode, and experiment ID.
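A target preview can be approximated from a script as well. This sketch assumes a boto3 FIS client and relies on two API features as I understand them: starting an experiment with `actionsMode` set to `skip-all` so that no actions run, and `list_experiment_resolved_targets` to read back what was resolved. Verify both against the current FIS API reference before relying on them; the helper names are hypothetical.

```python
import uuid


def start_target_preview(fis_client, template_id):
    """Start an experiment that resolves targets but skips every action."""
    response = fis_client.start_experiment(
        clientToken=str(uuid.uuid4()),
        experimentTemplateId=template_id,
        experimentOptions={"actionsMode": "skip-all"},  # resolve targets only
    )
    return response["experiment"]["id"]


def list_resolved_targets(fis_client, experiment_id):
    """Collect every resolved target for an experiment, following pagination."""
    targets = []
    next_token = None
    while True:
        kwargs = {"experimentId": experiment_id}
        if next_token:
            kwargs["nextToken"] = next_token
        page = fis_client.list_experiment_resolved_targets(**kwargs)
        targets.extend(page.get("resolvedTargets", []))
        next_token = page.get("nextToken")
        if not next_token:
            return targets
```

Wait for the preview experiment to finish before listing its targets. With two instances in the group, the list should contain one entry.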

7. Rerun the Experiment

With your template preview confirming one target:

  1. Click Start experiment again.
  2. Watch the state change to Running.

The image shows an AWS Fault Injection Simulator (FIS) dashboard with details of a running experiment, including its ID, state, and associated actions. The experiment involves terminating instances in an auto-scaling group.
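Instead of refreshing the console, you can poll the experiment state. The following is a minimal sketch, assuming a boto3 FIS client; the set of terminal statuses reflects my reading of the FIS API, and the polling interval and timeout are arbitrary defaults.

```python
import time

# Statuses after which the experiment will not change again (assumed set).
TERMINAL_STATUSES = {"completed", "stopped", "failed", "cancelled"}


def wait_for_experiment(fis_client, experiment_id, poll_seconds=5.0,
                        timeout_seconds=600.0, sleep=time.sleep):
    """Poll an experiment until it reaches a terminal status.

    Returns the distinct statuses observed, in order. Raises TimeoutError
    if no terminal status is reached within `timeout_seconds`.
    """
    deadline = time.monotonic() + timeout_seconds
    observed = []
    while True:
        experiment = fis_client.get_experiment(id=experiment_id)["experiment"]
        status = experiment["state"]["status"]
        if not observed or observed[-1] != status:
            observed.append(status)  # record each transition once
        if status in TERMINAL_STATUSES:
            return observed
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Experiment {experiment_id} still {status}.")
        sleep(poll_seconds)
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use the default is fine.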

8. Monitor the Impact

Switch to the EC2 console to view instance states. One instance will terminate, while the other continues running, ensuring high availability:

The image shows an AWS EC2 console with a list of instances, displaying their instance IDs, states, and types. One instance is running, another is shutting down, and a third is terminated.

With two instances in the group, terminating 50% yields a controlled disruption without downtime.
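To watch the impact without the console, you can summarize instance states from a `describe_instances` response. This sketch assumes the standard response shape returned by a boto3 EC2 client; the function name is illustrative.

```python
from collections import Counter


def count_instance_states(describe_instances_response):
    """Count instances per state name (running, shutting-down, terminated, ...)."""
    return Counter(
        instance["State"]["Name"]
        for reservation in describe_instances_response.get("Reservations", [])
        for instance in reservation.get("Instances", [])
    )
```

During the experiment you would expect at least one instance to stay in `running` while another moves through `shutting-down` to `terminated`.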

