AWS Solutions Architect Associate Certification
Designing for Reliability
Turning up Reliability on Application Integration
Welcome to this lesson on enhancing reliability in application integration services. In this guide, we explore how various AWS services—such as Auto Scaling, Load Balancers, API Gateway, AppFlow, messaging systems, and orchestration tools—work together to create highly resilient architectures.
Auto Scaling and Application Load Balancers
Auto Scaling automatically launches replacement instances when existing ones fail, so your system remains resilient during disruptions. Although an Auto Scaling group appears to sit behind a single endpoint (the load balancer), the Application Load Balancer (ALB) is itself a managed service composed of multiple nodes. It distributes traffic across instances, maintaining availability even if one or more instances fail.
Auto Scaling not only replaces failed instances but also spreads instances evenly across multiple Availability Zones (AZs). Note that enabling cross-zone load balancing is crucial for even connection distribution. Without it, each load balancer node sends traffic only to targets in its own zone, so targets in zones with fewer instances carry a larger share of the load. For example, you might see uneven per-target shares such as 8.3% or 12.5% instead of a consistent value (e.g., 10% for every target).
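The arithmetic behind cross-zone load balancing can be sketched in a few lines. This is an illustration only: the function name `per_target_share` and the zone layout (2 targets in one zone, 8 in another) are assumptions, not values from this lesson, and the sketch assumes each enabled zone receives an equal slice of client traffic.

```python
def per_target_share(targets_per_zone, cross_zone):
    """Return the percentage of total traffic each target receives, keyed by zone."""
    zones = [z for z, n in targets_per_zone.items() if n > 0]
    total_targets = sum(targets_per_zone[z] for z in zones)
    shares = {}
    for zone in zones:
        if cross_zone:
            # Every node can reach every target, so all targets share equally.
            shares[zone] = 100.0 / total_targets
        else:
            # Each zone gets an equal slice, split only among its own targets.
            shares[zone] = (100.0 / len(zones)) / targets_per_zone[zone]
    return shares


layout = {"az-a": 2, "az-b": 8}  # hypothetical, deliberately unbalanced layout
print(per_target_share(layout, cross_zone=False))  # {'az-a': 25.0, 'az-b': 6.25}
print(per_target_share(layout, cross_zone=True))   # {'az-a': 10.0, 'az-b': 10.0}
```

With the feature disabled, the two targets in the smaller zone each carry four times the load of a target in the larger zone. With it enabled, every target carries the same share.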
In addition, Auto Scaling improves resiliency during traffic surges by launching instances on demand. Placing a load balancer between clients and services decouples the components, so the failure of an individual instance is hidden from clients.
Auto Scaling also integrates with load balancers: replacement instances are registered with the load balancer automatically, so even a minimal Auto Scaling group is replenished without manual changes and the architecture stays loosely coupled.
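The replace-and-rebalance behavior described above can be pictured as a small reconciliation loop. The sketch below is a simplified model, not the Auto Scaling service's actual algorithm: the function `reconcile`, the instance dictionaries, and the zone names are all hypothetical.

```python
def reconcile(instances, desired_capacity, zones):
    """Drop unhealthy instances, then launch replacements in the emptiest zones."""
    healthy = [i for i in instances if i["healthy"]]
    launched = 0
    while len(healthy) < desired_capacity:
        # Count healthy instances per zone and pick the least populated one.
        counts = {z: sum(1 for i in healthy if i["zone"] == z) for z in zones}
        target_zone = min(zones, key=lambda z: counts[z])
        launched += 1
        healthy.append({"id": f"new-{launched}", "zone": target_zone, "healthy": True})
    return healthy


fleet = [
    {"id": "a1", "zone": "az-a", "healthy": True},
    {"id": "a2", "zone": "az-a", "healthy": False},  # failed instance
    {"id": "b1", "zone": "az-b", "healthy": True},
]
result = reconcile(fleet, desired_capacity=4, zones=["az-a", "az-b"])
print([(i["id"], i["zone"]) for i in result])
# [('a1', 'az-a'), ('b1', 'az-b'), ('new-1', 'az-a'), ('new-2', 'az-b')]
```

The failed instance is dropped, and the two replacements land in different zones so the group ends up with two instances per zone.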
When cross-zone load balancing is enabled, connections are uniformly distributed across zones, further enhancing overall resiliency.
Load Balancers: Application, Network, and Gateway
Application Load Balancer (ALB)
Operating at layer 7, ALBs can inspect HTTP/HTTPS traffic and perform target group health checks to detect unhealthy instances, which Auto Scaling can then replace. They support multiple listeners and rules, which allows you to route traffic based on conditions such as host headers or URL path patterns.
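Listener rules are evaluated in priority order, and the default action applies when nothing matches. The sketch below models that idea for path patterns only. The rule list, target group names, and the function `pick_target_group` are hypothetical, and Python's `fnmatchcase` only approximates ALB wildcard matching.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (priority, path pattern, target group name).
RULES = [
    (10, "/api/*", "api-targets"),
    (20, "/images/*", "static-targets"),
]
DEFAULT_TARGET = "web-targets"


def pick_target_group(path, rules=RULES, default=DEFAULT_TARGET):
    """Return the target group of the first matching rule, lowest priority number first."""
    for _priority, pattern, target_group in sorted(rules):
        if fnmatchcase(path, pattern):
            return target_group
    return default  # No rule matched, so the default action applies.


print(pick_target_group("/api/orders"))       # api-targets
print(pick_target_group("/images/logo.png"))  # static-targets
print(pick_target_group("/checkout"))         # web-targets
```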
For example, a fintech company might terminate TLS at the ALB, offloading decryption from backend servers and boosting overall performance and resiliency.
ALBs also generate rich metrics (e.g., response times) via CloudWatch, enabling proactive monitoring of application availability across AZs.
Furthermore, ALB access logs can be delivered to an S3 bucket to capture critical data such as client IP addresses, request paths, and response codes for in-depth analysis and troubleshooting.
When deployed across multiple AZs, integrating ALBs with Auto Scaling and ensuring cross-zone load balancing are essential practices to maintain availability and balanced routing.
Network Load Balancer (NLB)
Operating at the transport layer, Network Load Balancers are engineered for high performance and handle TCP, UDP, and TLS traffic efficiently. Their straightforward configurations include target group health checks on specified ports. While NLBs do not support path-based routing, they integrate with Auto Scaling, which registers its instances as NLB targets.
An NLB places a node in each enabled AZ and exposes all of them under a single DNS name. Enabling cross-zone load balancing lets each node direct connections to healthy targets in any zone, not only its own.
Even with its simplicity, correctly configured health checks (which can use TCP, HTTP, or HTTPS) are essential to guarantee that the service remains available.
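Health checks normally change a target's state only after several consecutive results, which avoids flapping on a single dropped probe. Here is a minimal sketch of that threshold logic. The class `HealthTracker` and its default thresholds are illustrative assumptions rather than load balancer defaults.

```python
class HealthTracker:
    """Flip a target's state only after N consecutive contrary health check results."""

    def __init__(self, healthy_threshold=3, unhealthy_threshold=2):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self.streak = 0  # consecutive results that disagree with the current state

    def record(self, check_passed):
        """Record one health check result and return the resulting state."""
        if check_passed == self.healthy:
            self.streak = 0
        else:
            self.streak += 1
            needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
            if self.streak >= needed:
                self.healthy = check_passed
                self.streak = 0
        return self.healthy


tracker = HealthTracker()
results = [True, False, False, True, True, True]
print([tracker.record(r) for r in results])
# [True, True, False, False, False, True]
```

Two failed checks in a row mark the target unhealthy, and it takes three consecutive passes before it receives traffic again.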
NLBs can also be deployed with static IP addresses and integrated with on-premises resources, making them a flexible choice for hybrid architectures.
For detailed traffic monitoring, VPC Flow Logs capture low-level metadata that aids in troubleshooting network issues.
If detailed logs for TLS listeners are required, NLB access logs record TLS connection details and are delivered to S3. These access logs are available only for TLS listeners.
Gateway Load Balancer (GLB)
Gateway Load Balancers are built for traffic inspection appliances such as firewalls and packet inspectors. They receive traffic through Gateway Load Balancer endpoints referenced in route tables, forward it to inspection devices, and then deliver approved traffic to application servers. GLBs work closely with Auto Scaling groups to adjust appliance capacity on demand.
Monitoring the health of the appliances behind a GLB is essential. Use target group health checks and enable cross-zone load balancing to ensure continuous availability, even if an AZ encounters issues.
API Gateway
Amazon API Gateway enables you to create, deploy, and monitor REST, HTTP, and WebSocket APIs. It offers additional features beyond what ALBs provide, such as caching, throttling, and rate limiting.
For instance, API Gateway supports stateful WebSocket APIs as well as stateless REST APIs, and each API can be deployed to distinct stages. Throttling, which uses a token bucket algorithm with configurable rate and burst limits, protects backend services from sudden traffic spikes and helps prevent cascading failures.
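Throttling with a steady rate and a burst allowance is commonly modeled as a token bucket. The sketch below shows the idea under simplified assumptions: the class `TokenBucket` and the rate and burst values are hypothetical, and time is passed in explicitly so the example is deterministic.

```python
class TokenBucket:
    """Token bucket: `rate` tokens are added per second, up to `burst` tokens."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) is admitted."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # A gateway would answer with HTTP 429 Too Many Requests.


bucket = TokenBucket(rate=2, burst=3)
spike = [bucket.allow(now=0.0) for _ in range(5)]
print(spike)                  # [True, True, True, False, False]
print(bucket.allow(now=1.0))  # True
```

The first three requests of the spike consume the burst allowance, the next two are rejected, and one second later the bucket has refilled enough to admit traffic again.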
Canary deployments allow you to test new API versions with a small percentage of traffic before a full rollout.
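A canary split amounts to sending a configured percentage of requests to the new version. This is a minimal sketch, assuming simple random selection per request. The function `choose_deployment` and the 10% setting are illustrative, not API Gateway configuration.

```python
import random


def choose_deployment(canary_percent, rng):
    """Route one request: 'canary' with the given probability, else 'production'."""
    return "canary" if rng.random() * 100 < canary_percent else "production"


rng = random.Random(42)  # Seeded only so the sketch is repeatable.
routed = [choose_deployment(10, rng) for _ in range(10_000)]
print(routed.count("canary") / len(routed))  # Close to 0.10
```

If the canary's error rate or latency looks worse than production, the percentage can be set back to zero before most users are affected.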
API Gateway is engineered with fault tolerance in mind. Features such as per-client throttling through usage plans and API keys, together with built-in error handling, help contain failures by isolating issues effectively.
AppFlow
Amazon AppFlow is a fully managed integration service that securely transfers data between AWS services and third-party applications. With built-in redundancy across multiple AZs, AppFlow is designed to continue data transfers even during an AZ failure.
For enhanced monitoring and visibility, CloudWatch dashboards combined with service logs provide insights into flow metrics and data transfer statuses.
Messaging and Queueing Services
Simple Notification Service (SNS)
SNS is a fully managed publish/subscribe messaging service designed for high availability and fault tolerance. It automatically retries failed deliveries and supports dead-letter queues for undeliverable messages, reducing the risk that a critical message is lost.
Using dead-letter queues allows you to capture and analyze any messages that fail to be delivered, so they can be investigated and reprocessed rather than silently dropped.
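The retry-then-dead-letter pattern can be sketched generically. This is not SNS's actual delivery policy: the function `deliver_with_retries`, the exponential backoff schedule, and the in-memory list standing in for a dead-letter queue are assumptions made for illustration.

```python
def deliver_with_retries(message, send, max_attempts, dead_letter_queue, base_delay=1.0):
    """Try `send(message)` up to `max_attempts` times; park the message on failure.

    Returns the backoff delays (seconds) that would be waited between attempts.
    The delays are computed, not slept, to keep the sketch fast.
    """
    delays = []
    for attempt in range(max_attempts):
        try:
            send(message)
            return delays
        except ConnectionError:
            if attempt < max_attempts - 1:
                delays.append(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    dead_letter_queue.append(message)  # Retries exhausted: keep for analysis.
    return delays


def always_down(_message):
    """Hypothetical subscriber endpoint that never accepts a delivery."""
    raise ConnectionError("subscriber endpoint unreachable")


dlq = []
print(deliver_with_retries("order-42", always_down, 3, dlq))  # [1.0, 2.0]
print(dlq)  # ['order-42']
```

The message is attempted three times with growing pauses, then moved to the dead-letter queue instead of being discarded.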
CloudWatch and AWS X-Ray can further augment your observability into SNS message processing and detect potential issues.
Simple Queue Service (SQS)
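SQS is a robust, fully managed queueing service. Standard queues guarantee at-least-once delivery, so a message may occasionally be delivered more than once. Features like long polling reduce empty responses, and the visibility timeout hides a message from other consumers while it is being processed, which reduces (but does not eliminate) duplicate processing.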
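The visibility timeout is easiest to understand through a small simulation. The class below is an in-memory stand-in, not the SQS API. `TinyQueue`, its method names, and the 30-second timeout are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class TinyQueue:
    """In-memory stand-in for a queue with a visibility timeout (not the SQS API)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self.messages = {}  # message id -> (body, time at which it is visible again)

    def send(self, message_id, body):
        self.messages[message_id] = (body, 0.0)

    def receive(self, now):
        """Return one visible message and hide it, or None if nothing is visible."""
        for message_id, (body, visible_at) in self.messages.items():
            if now >= visible_at:
                self.messages[message_id] = (body, now + self.visibility_timeout)
                return message_id, body
        return None

    def delete(self, message_id):
        """Consumers delete a message once processing has succeeded."""
        self.messages.pop(message_id, None)


queue = TinyQueue(visibility_timeout=30)
queue.send("m1", "resize image")
print(queue.receive(now=0))    # ('m1', 'resize image')
print(queue.receive(now=10))   # None, because m1 is in flight
print(queue.receive(now=31))   # ('m1', 'resize image'), redelivered: it was never deleted
queue.delete("m1")
print(queue.receive(now=100))  # None
```

Because a message reappears whenever a consumer fails to delete it in time, consumers should be written to tolerate processing the same message twice.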
For applications that require maintaining order, FIFO (First-In-First-Out) queues provide strict ordering, albeit with a trade-off in throughput.
SQS's distributed architecture stores messages redundantly across multiple servers, which provides both durability and excellent scalability.
Amazon MQ
Amazon MQ offers managed messaging brokers that support industry-standard protocols such as MQTT, AMQP, and more. With capabilities like message mirroring, redelivery policies, and priority dispatch, Amazon MQ ensures reliable message delivery even during broker failures.
Dead-letter queues are also available in Amazon MQ to handle messages that exceed the maximum redelivery count.
Additionally, priority dispatch policies and message eviction strategies further help prioritize and manage critical workloads.
EventBridge
Amazon EventBridge, the evolution of CloudWatch Events, is a serverless event bus that enables event-driven architectures by connecting AWS services with SaaS applications. It supports features like event replay, retry policies, and filtering to manage events appropriately.
Archive and replay capabilities allow you to resend previously archived events, for example after fixing a bug in a consumer, while integration with SQS dead-letter queues helps capture events that could not be delivered to a target.
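Filtering in an event bus works by comparing each event against a pattern. The sketch below implements only a small subset of that idea: exact-value matching with nested fields. The function `matches` is hypothetical, the instance ID is a placeholder, and real EventBridge patterns support many more operators.

```python
def matches(pattern, event):
    """Return True if every field in `pattern` matches the same field in `event`."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of acceptable values.
            return False
    return True


pattern = {"source": ["aws.ec2"], "detail": {"state": ["stopped", "terminated"]}}
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "terminated"},
}
print(matches(pattern, event))  # True
```

Fields that the pattern does not mention are ignored, so a rule only needs to name the attributes it cares about.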
Simple Email Service (SES)
Amazon Simple Email Service (SES) is a managed email service engineered for high deliverability. With built-in redundancy and virus scanning, SES integrates with CloudWatch for performance metrics while the Virtual Deliverability Manager helps optimize email campaigns.
A mailbox simulator is available in SES to test how your application handles scenarios such as successful deliveries, bounces, and complaints without affecting your sender reputation.
Workflows and Orchestration
AWS Step Functions
AWS Step Functions offers serverless workflow orchestration with built-in error handling through retry and catch configurations. Workflows are defined as visual state machines, and you can monitor their executions via both the console and CloudWatch.
Enhanced error handling is achievable by configuring catchers that manage unexpected failures within workflows.
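As a rough illustration of retry and catch configuration, the sketch below builds a small Amazon States Language definition as a Python dictionary. The state names, the Lambda ARN, and the helper `retry_delays` are placeholders invented for this example. Only the `Retry` and `Catch` field names come from the States Language itself.

```python
import json

# Hypothetical state machine: the state names and the Lambda ARN are placeholders.
definition = {
    "StartAt": "ChargeCard",
    "States": {
        "ChargeCard": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-card",
            "Retry": [
                {
                    "ErrorEquals": ["States.Timeout", "States.TaskFailed"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
            # If retries are exhausted (or any other error occurs), go here.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "End": True,
        },
        "NotifyFailure": {"Type": "Fail", "Error": "ChargeFailed", "Cause": "Retries exhausted"},
    },
}


def retry_delays(retrier):
    """Seconds waited before each retry: IntervalSeconds * BackoffRate ** n."""
    return [
        retrier["IntervalSeconds"] * retrier["BackoffRate"] ** n
        for n in range(retrier["MaxAttempts"])
    ]


print(retry_delays(definition["States"]["ChargeCard"]["Retry"][0]))  # [2.0, 4.0, 8.0]
print(json.dumps(definition, indent=2))
```

The retrier handles transient failures with growing pauses, and the catcher routes anything that still fails to a dedicated failure state instead of ending the execution abruptly.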
For visual execution tracking, the Step Functions console provides detailed state transitions, error messages, and metrics.
Simple Workflow Service (SWF) and Managed Apache Airflow
While the Simple Workflow Service (SWF) is an older orchestration solution that requires manual setup and attention to how workers are distributed across AZs, it offers reliability for long-running workflows. For modern orchestration demands, Amazon Managed Workflows for Apache Airflow (MWAA) provides a scalable, managed solution for complex data orchestration tasks. Airflow retries failed tasks according to each task's retry settings, and MWAA supports multi-AZ deployments to ensure high availability.
A typical MWAA configuration distributes tasks across multiple AZs with integrated CloudWatch logging for observability.
Conclusion
In summary, this lesson has covered a broad range of AWS services that enhance reliability and resiliency in application integration. From Auto Scaling and various load balancers to API Gateway, AppFlow, messaging services (SNS, SQS, Amazon MQ, and EventBridge), SES, and orchestration tools such as Step Functions and Managed Apache Airflow, the guiding principles of loose coupling and redundancy are paramount. By integrating these services effectively, you can design robust, fault-tolerant architectures that gracefully handle failures and manage traffic surges.
Thank you for joining this lesson on turning up reliability in application integration. We hope these insights help you design resilient and scalable systems.