What is AWS FIS

Netflix’s Chaos Monkey pioneered fault-injection testing in Chaos Engineering. Today, platforms such as Gremlin, Azure Chaos Studio, and AWS Fault Injection Service (FIS, formerly AWS Fault Injection Simulator) help teams validate system resilience under real-world failure scenarios.

AWS FIS is a fully managed service that lets you run fault-injection experiments on AWS workloads. By deliberately introducing failures, you can:

  • Identify weaknesses before they affect customers
  • Validate auto-scaling, failover, and recovery processes
  • Ensure SLAs are met under adverse conditions

Note

AWS FIS supports both simple and complex scenarios, from terminating individual EC2 instances to simulating an Availability Zone outage.

Key Benefits

| Benefit | Description |
| --- | --- |
| Fully Managed | No need to provision infrastructure for fault injection |
| Native AWS Integration | Works with IAM, CloudWatch Alarms, AWS X-Ray, EventBridge, and more |
| Prebuilt & Customizable | Use built-in templates or define your own experiments |
| Multi-Environment Support | Inject faults in EC2, ECS, EKS, RDS, Lambda, and more |

AWS FIS Architecture

AWS FIS works alongside other AWS services in your environment:

  • CloudWatch Alarms: Serve as stop conditions that halt a running experiment when a threshold is crossed, and can trigger remediation workflows.
  • AWS X-Ray: Correlate faults with distributed traces to pinpoint failures.
  • EventBridge: Automate experiment scheduling and notifications.

<img src="path/to/fis-architecture-diagram.png" alt="AWS FIS Architecture Diagram" />

You can leverage AWS FIS to simulate:

  • EC2 instance terminations and CPU/network stress
  • ECS task and EKS pod failures
  • RDS instance failovers
  • Availability Zone outages
  • Network latency and packet loss between resources
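
Each of these faults corresponds to one or more FIS actions, identified by IDs of the form aws:<service>:<action-name>. The sketch below shows one way to keep that mapping in code. The dictionary keys and both helper functions are hypothetical names chosen for illustration, and the action IDs are quoted from memory of the FIS actions reference, so confirm them with `aws fis list-actions` before relying on them.

```python
# Illustrative lookup from fault types (hypothetical keys) to FIS action IDs.
# Assumption: the IDs below match the current FIS actions reference; verify
# them in your own account with `aws fis list-actions`.
FAULT_TO_ACTIONS = {
    "ec2-termination": ["aws:ec2:terminate-instances"],
    # CPU and network stress on EC2 are typically run through SSM documents.
    "ec2-cpu-or-network-stress": ["aws:ssm:send-command"],
    "ecs-task-failure": ["aws:ecs:stop-task"],
    "eks-pod-failure": ["aws:eks:pod-delete"],
    "rds-failover": ["aws:rds:failover-db-cluster", "aws:rds:reboot-db-instances"],
    "network-disruption": ["aws:network:disrupt-connectivity"],
}


def actions_for(fault):
    """Return candidate action IDs for a fault type, or an empty list."""
    return list(FAULT_TO_ACTIONS.get(fault, []))


def service_of(action_id):
    """Extract the service segment from an ID shaped like aws:<service>:<name>."""
    parts = action_id.split(":")
    if len(parts) != 3 or parts[0] != "aws":
        raise ValueError("unexpected action ID format: " + action_id)
    return parts[1]


if __name__ == "__main__":
    for fault in FAULT_TO_ACTIONS:
        services = sorted({service_of(a) for a in actions_for(fault)})
        print(fault, "->", ", ".join(services))
```

An Availability Zone outage is not listed because it is usually composed of several actions rather than a single one.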

Managing Experiments

AWS FIS experiments are started from experiment templates, which are defined as JSON documents. You can manage them through:

| Interface | Command / Action |
| --- | --- |
| AWS Management Console | Create, configure, and run experiments via the web UI |
| AWS CLI | aws fis create-experiment-template |
| AWS CloudFormation | Use the AWS::FIS::ExperimentTemplate resource |

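# Example CloudFormation snippet
Resources:
  MyFISExperiment:
    Type: AWS::FIS::ExperimentTemplate
    Properties:
      Description: "Terminate EC2 instance for resilience testing"
      # Targets is a map keyed by a target name of your choosing
      Targets:
        ChaosInstances:
          ResourceType: "aws:ec2:instance"
          ResourceTags:
            chaos-test: "true"
          # Pick one matching instance rather than all of them
          SelectionMode: "COUNT(1)"
      # Actions is a map keyed by an action name; each action refers to a target by name
      Actions:
        TerminateInstance:
          ActionId: "aws:ec2:terminate-instances"
          Targets:
            Instances: ChaosInstances
      RoleArn: arn:aws:iam::123456789012:role/FISRole
      # Stop the experiment if this alarm goes into the ALARM state
      StopConditions:
        - Source: "aws:cloudwatch:alarm"
          Value: arn:aws:cloudwatch:us-west-2:123456789012:alarm:HighCPUUtilization
      # Tags applied to the experiment template itself
      Tags:
        Name: MyFISExperiment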
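
For the CLI route, a template with the same intent can be written as a JSON document and passed to `aws fis create-experiment-template --cli-input-json file://template.json`. The following is a minimal sketch under stated assumptions: `build_template`, `check_target_refs`, and the names ChaosInstances and TerminateInstance are hypothetical, and the field names follow the camelCase shape of the FIS CreateExperimentTemplate request as I understand it. The script only builds and checks the document locally; it makes no AWS calls.

```python
import json


def build_template(role_arn, alarm_arn, tag_key="chaos-test", tag_value="true"):
    """Build an experiment template document as a plain dict.

    Assumption: targets and actions are maps keyed by a name you choose,
    and each action refers to a target by that name.
    """
    return {
        "description": "Terminate EC2 instance for resilience testing",
        "targets": {
            "ChaosInstances": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {tag_key: tag_value},
                "selectionMode": "COUNT(1)",  # one matching instance
            }
        },
        "actions": {
            "TerminateInstance": {
                "actionId": "aws:ec2:terminate-instances",
                "targets": {"Instances": "ChaosInstances"},
            }
        },
        "stopConditions": [
            {"source": "aws:cloudwatch:alarm", "value": alarm_arn}
        ],
        "roleArn": role_arn,
    }


def check_target_refs(template):
    """Return the names of actions that reference an undefined target."""
    defined = set(template["targets"])
    return [
        name
        for name, action in template["actions"].items()
        if any(ref not in defined for ref in action.get("targets", {}).values())
    ]


if __name__ == "__main__":
    template = build_template(
        "arn:aws:iam::123456789012:role/FISRole",
        "arn:aws:cloudwatch:us-west-2:123456789012:alarm:HighCPUUtilization",
    )
    assert check_target_refs(template) == []
    print(json.dumps(template, indent=2))
```

Checking target references before submitting catches a common mistake, an action that points at a misspelled target name, without needing AWS credentials.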

Warning

Always run experiments in a staging or non-production environment first. Fault injection can cause service interruptions!

Security & Permissions

Use AWS Identity and Access Management (IAM) to grant only the permissions that are needed, such as:

  • fis:CreateExperimentTemplate
  • fis:StartExperiment
  • fis:StopExperiment
  • cloudwatch:DescribeAlarms
  • ec2:TerminateInstances

Use IAM policies and roles to restrict who can create, modify, or execute FIS experiments.
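
The permissions listed above fall into two groups: the fis: actions belong to the people or pipelines that manage experiments, while cloudwatch:DescribeAlarms and ec2:TerminateInstances are what the experiment role needs so that FIS can act on your behalf. The sketch below builds both policy documents plus a trust policy for the role. The function names and the tag condition are assumptions made for illustration; tighten the `Resource` values for real use.

```python
import json

# Permissions from the list above, split by who needs them (illustrative split).
OPERATOR_ACTIONS = [
    "fis:CreateExperimentTemplate",
    "fis:StartExperiment",
    "fis:StopExperiment",
]
EXPERIMENT_ROLE_ACTIONS = ["cloudwatch:DescribeAlarms"]


def allow(actions, resource="*", condition=None):
    """Return a single Allow statement; condition is optional."""
    statement = {"Effect": "Allow", "Action": list(actions), "Resource": resource}
    if condition:
        statement["Condition"] = condition
    return statement


def operator_policy():
    """Policy for users or pipelines that create and run experiments."""
    return {"Version": "2012-10-17", "Statement": [allow(OPERATOR_ACTIONS)]}


def experiment_role_policy(tag_key="chaos-test", tag_value="true"):
    """Policy for the role FIS assumes; termination is limited to tagged instances."""
    tagged_only = {"StringEquals": {"aws:ResourceTag/" + tag_key: tag_value}}
    return {
        "Version": "2012-10-17",
        "Statement": [
            allow(EXPERIMENT_ROLE_ACTIONS),
            allow(["ec2:TerminateInstances"], condition=tagged_only),
        ],
    }


def fis_trust_policy():
    """Trust policy that lets the FIS service assume the experiment role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "fis.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    for doc in (operator_policy(), experiment_role_policy(), fis_trust_policy()):
        print(json.dumps(doc, indent=2))
```

Limiting ec2:TerminateInstances to instances that carry the chaos-test tag keeps an experiment from touching resources that were never opted in.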

