Turning up Reliability on Network Services
Welcome to this detailed guide on designing reliable network services in AWS. This article covers core concepts, best practices, and architectural decisions to enhance the resilience of your network infrastructure. Throughout the guide, you will find technical diagrams that illustrate each concept clearly.
Core Networking and VPC Architecture
When discussing core networking, we primarily refer to Virtual Private Clouds (VPCs). In a traditional VPC setup, you typically see the following components:
- The AWS account, region, and the VPC itself.
- Internet Gateway and IPv6 egress-only gateway.
- Public and private subnets within a specific Availability Zone.
- A load balancer spanning public subnets.
- A NAT gateway performing outbound internet traffic translation.
- DNS resolvers and route tables assigned to each subnet.
- Security groups safeguarding EC2 instances in private subnets.
- VPC endpoints for services (gateway endpoints for S3/DynamoDB and interface endpoints for other services).
Additionally, some AWS services, such as Amazon Route 53 and Amazon Simple Storage Service (S3), run on AWS-managed infrastructure outside your VPC. S3 is a regional service and Route 53 is a global one, so neither depends on your EC2 subnets.
Connecting Departmental VPCs
Consider a scenario where departmental VPCs need to be interconnected with a shared VPC while maintaining isolation between departments. The following options are commonly considered:
- AWS Transit Gateway: Offers a highly scalable solution for connecting multiple VPCs.
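- VPC Peering: Suitable for simpler setups with four or five VPCs, but less scalable because peering is not transitive and each pair of VPCs that must communicate needs its own connection.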
- Direct Connect and VPN connections: Generally used to connect back to your corporate network rather than interconnecting VPCs directly.
For multi-VPC environments, AWS Transit Gateway is the recommended solution. In simpler architectures, VPC Peering might be acceptable.
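To make the scalability difference concrete, the following sketch counts the connections each approach needs. It assumes the worst case, where every VPC must reach every other VPC (a full mesh); a design that only links each department to a shared VPC needs fewer peering connections. The function names are illustrative and not part of any AWS SDK.

```python
def peering_connections(vpc_count: int) -> int:
    """Peering links for a full mesh: one per unordered pair of VPCs."""
    if vpc_count < 0:
        raise ValueError("vpc_count must be non-negative")
    return vpc_count * (vpc_count - 1) // 2


def transit_gateway_attachments(vpc_count: int) -> int:
    """Attachments for a hub-and-spoke design: one per VPC."""
    if vpc_count < 0:
        raise ValueError("vpc_count must be non-negative")
    return vpc_count


# Compare how the two approaches grow as VPCs are added.
for n in (3, 5, 10, 50):
    print(f"{n} VPCs: {peering_connections(n)} peering links, "
          f"{transit_gateway_attachments(n)} Transit Gateway attachments")
```

The peering count grows quadratically while the attachment count grows linearly, which is why peering is usually reserved for a handful of VPCs.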
Enhancing VPC Reliability and High Availability
To build a highly reliable network, deploy applications across multiple Availability Zones within your VPC. Key considerations include:
- Hosting workloads in private subnets.
- Positioning load balancers in public subnets for secure, internet-facing access.
- Implementing a transit VPC connected with VPN gateways across several VPCs for inherent resiliency.
Note that while default VPCs are convenient for testing, they are not recommended for production environments. Creating a custom VPC with configurations that prevent single points of failure is best practice. Spreading resources across multiple subnets ensures effective failover; for example, if a load balancer spans two public subnets and one goes down, the other can continue to handle traffic.
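A simple way to spot a single point of failure is to check that every tier has subnets in at least two Availability Zones. The sketch below runs that check over a hypothetical subnet inventory; the subnet IDs, tier names, and zone names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical inventory: each subnet lives in exactly one Availability Zone.
SUBNETS = [
    {"id": "subnet-public-a", "tier": "public", "az": "us-east-1a"},
    {"id": "subnet-public-b", "tier": "public", "az": "us-east-1b"},
    {"id": "subnet-app-a", "tier": "application", "az": "us-east-1a"},
    {"id": "subnet-app-b", "tier": "application", "az": "us-east-1b"},
    {"id": "subnet-db-a", "tier": "database", "az": "us-east-1a"},
]


def single_az_tiers(subnets):
    """Return the tiers whose subnets all sit in fewer than two zones."""
    zones_by_tier = defaultdict(set)
    for subnet in subnets:
        zones_by_tier[subnet["tier"]].add(subnet["az"])
    return sorted(tier for tier, zones in zones_by_tier.items() if len(zones) < 2)


# Any tier listed here would go offline if its only zone failed.
print(single_az_tiers(SUBNETS))
```

In this sample layout the database tier is flagged, because its only subnet is in one zone.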
Routing and Traffic Segmentation
Routing is critical for maintaining network reliability. Each subnet is associated with one route table, either a custom table or the VPC's main route table, so giving each tier its own route table lets you control traffic flow effectively. Consider a multi-tier application where:
- The web tier is the only tier whose route table includes a route to the Internet Gateway.
- The application tier communicates with both the web tier and the database tier.
- The database tier remains isolated from direct internet access to enhance security.
This segmented approach minimizes vulnerabilities and reduces dependency on a single routing table.
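The segmentation described above can be modelled with one route table per tier. The sketch below is a simplified model, not an AWS API call: the CIDR ranges and the target name `igw-example` are assumptions, and `next_hop` picks the most specific matching route, which is how VPC route tables choose among overlapping routes.

```python
import ipaddress

# Hypothetical route tables, one per tier. Only the web tier has a default
# route to the Internet Gateway; the other tiers can reach the VPC range only.
ROUTE_TABLES = {
    "web": {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-example"},
    "application": {"10.0.0.0/16": "local"},
    "database": {"10.0.0.0/16": "local"},
}


def next_hop(routes, destination):
    """Return the target of the most specific matching route, or None."""
    address = ipaddress.ip_address(destination)
    matches = []
    for cidr, target in routes.items():
        network = ipaddress.ip_network(cidr)
        if address in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None  # no route: the packet is dropped
    return max(matches)[1]


for tier, routes in ROUTE_TABLES.items():
    print(tier, "->", next_hop(routes, "203.0.113.10"))
```

Only the web tier resolves a next hop for the public address; the application and database tiers have no route to it at all.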
Internet Gateways, NAT Gateways, and Egress-Only Gateways
AWS handles critical network components with built-in redundancy:
- Internet Gateway: Although it appears as a single device, it consists of multiple devices managed transparently by AWS.
- NAT Gateway: Recommended for instances in private subnets to handle outbound traffic. A NAT gateway should be deployed in each Availability Zone.
- Egress-Only Gateway: Specifically used for enabling outbound IPv6 connectivity without allowing inbound traffic.
When exam scenarios require outbound IPv6 access without inbound connectivity, an egress-only Internet gateway is the ideal choice.
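The per-zone NAT gateway recommendation can be checked mechanically. The sketch below assigns each private subnet a default route to the NAT gateway in its own Availability Zone and reports subnets that have none; all identifiers are hypothetical and the function is an illustration rather than an AWS SDK call.

```python
def plan_default_routes(private_subnets, nat_by_az):
    """Map each private subnet to the NAT gateway in its own zone.

    private_subnets: dict of subnet ID -> Availability Zone
    nat_by_az: dict of Availability Zone -> NAT gateway ID
    Returns (routes, unserved), where unserved lists subnets whose zone
    has no NAT gateway and would depend on another zone for egress.
    """
    routes = {}
    unserved = []
    for subnet_id, az in private_subnets.items():
        nat_id = nat_by_az.get(az)
        if nat_id is None:
            unserved.append(subnet_id)
        else:
            routes[subnet_id] = nat_id
    return routes, sorted(unserved)


# Hypothetical layout: three zones with private subnets, NAT in only two.
private_subnets = {
    "subnet-app-a": "us-east-1a",
    "subnet-app-b": "us-east-1b",
    "subnet-app-c": "us-east-1c",
}
nat_by_az = {"us-east-1a": "nat-a", "us-east-1b": "nat-b"}

routes, unserved = plan_default_routes(private_subnets, nat_by_az)
print(routes)
print(unserved)
```

A subnet in the unserved list would have to send traffic through another zone's NAT gateway, so an outage in that zone would also cut its outbound access.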
Managing IP Addresses and Elastic IPs
Effective IP address management is essential for network reliability. Best practices include:
- Allocating a large CIDR block for your VPC.
- Sizing subnets to match their workloads (typically smaller CIDR blocks for public subnets and larger ones for private subnets, which host most resources) and ensuring that subnet ranges never overlap.
- Supporting both IPv4 and IPv6 addressing.
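As an illustration of this kind of planning, the sketch below carves a VPC range into larger private subnets and smaller public subnets using Python's standard `ipaddress` module. The `10.0.0.0/16` range and the /20 and /24 sizes are assumptions chosen for the example. The usable-address figure subtracts the five addresses that AWS reserves in every subnet.

```python
import ipaddress
from itertools import combinations

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one

vpc = ipaddress.ip_network("10.0.0.0/16")  # example VPC range

# Split the VPC into /20 blocks, then use three for private subnets
# (one per Availability Zone) and split a fourth into /24 public subnets.
blocks = list(vpc.subnets(new_prefix=20))
private_subnets = blocks[:3]
public_subnets = list(blocks[3].subnets(new_prefix=24))[:3]


def usable_hosts(network):
    """Addresses left for resources after the AWS-reserved ones."""
    return network.num_addresses - AWS_RESERVED_PER_SUBNET


def find_overlaps(networks):
    """Return every pair of networks whose ranges overlap."""
    return [(a, b) for a, b in combinations(networks, 2) if a.overlaps(b)]


for net in private_subnets + public_subnets:
    print(net, usable_hosts(net))
print("overlaps:", find_overlaps(private_subnets + public_subnets))
```

Deriving every subnet from the same parent block, as done here, makes overlaps impossible by construction; the explicit check is a safety net for hand-edited plans.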
Elastic IP addresses improve network resiliency by allowing reassignment to other network interfaces or EC2 instances in case of failure. This is especially useful for legacy applications that do not support load balancing.
For high-throughput workloads, choose advanced network interfaces such as the Elastic Network Adapter (ENA) or Elastic Fabric Adapter (EFA).
For instance, ENA can improve throughput and reduce latency, critical for real-time applications such as multiplayer gaming backends or high-performance computing.
For ultra-low latency scenarios, consider the Elastic Fabric Adapter, which bypasses parts of the operating system to accelerate performance.
Security Devices and Load Balancers
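AWS provides virtual network security controls with built-in redundancy:
- Network ACLs and Security Groups: These are managed features of the VPC rather than individual devices, so they do not introduce a single point of failure and apply across multiple Availability Zones. Network ACLs filter traffic statelessly at the subnet boundary, while security groups filter statefully at the instance's network interface.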
- Application and Network Load Balancers: These load balancers distribute traffic across Availability Zones. Enabling cross-zone load balancing enhances the resiliency of your applications.
For UDP-based applications, use a Network Load Balancer, since an Application Load Balancer does not handle UDP traffic. Enable it in each Availability Zone and turn on cross-zone load balancing.
Gateway Load Balancers, used for traffic inspection with firewall appliances, are also engineered for resilience. The appliances behind them are often deployed in Auto Scaling groups so that failed instances are replaced automatically.
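The effect of cross-zone load balancing is easiest to see with numbers. The sketch below is a simplified model that assumes each enabled Availability Zone receives an equal share of incoming traffic and that every zone has at least one target; the zone names and target counts are hypothetical.

```python
def per_target_share(targets_per_az, cross_zone):
    """Fraction of total traffic each target receives, keyed by zone.

    Without cross-zone balancing, a zone's share is split only among
    that zone's targets. With it, traffic is spread over all targets.
    """
    if not targets_per_az or any(count <= 0 for count in targets_per_az.values()):
        raise ValueError("every Availability Zone needs at least one target")
    total_targets = sum(targets_per_az.values())
    zone_share = 1.0 / len(targets_per_az)
    shares = {}
    for az, count in targets_per_az.items():
        if cross_zone:
            shares[az] = 1.0 / total_targets
        else:
            shares[az] = zone_share / count
    return shares


# Uneven fleet: two targets in one zone, eight in the other.
fleet = {"us-east-1a": 2, "us-east-1b": 8}
print(per_target_share(fleet, cross_zone=False))
print(per_target_share(fleet, cross_zone=True))
```

With an uneven fleet, disabling cross-zone balancing concentrates load on the targets in the smaller zone, while enabling it evens out the load per target.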
Transit Networking: VPN, Direct Connect, and Transit Gateway
Hybrid networking scenarios require redundant connections for high availability. When migrating an application to AWS that must connect to on-premises networks, consider the following:
- Transit Gateway with Dual VPN Connections: Enhances resiliency through multiple connectivity paths.
- Redundant Direct Connect Circuits: Important for ensuring continuous high-bandwidth connections.
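A single Direct Connect connection is a single physical path, and a single VPN connection, although it provides two tunnels on the AWS side, still depends on one customer gateway device. Adding a second connection, ideally through a separate device or location, is critical for true redundancy.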
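The value of a second connection can be estimated with basic availability arithmetic. The sketch below assumes the paths fail independently, which only holds if they do not share devices, locations, or providers. The 99.9% figure is a made-up example, not a published AWS number.

```python
def combined_availability(path_availabilities):
    """Availability of a link that is up when at least one path is up.

    Assumes independent failures: the link is down only when every
    path is down at the same time.
    """
    all_down = 1.0
    for availability in path_availabilities:
        if not 0.0 <= availability <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        all_down *= 1.0 - availability
    return 1.0 - all_down


single = combined_availability([0.999])
dual = combined_availability([0.999, 0.999])
print(f"one path:  {single:.6f}")
print(f"two paths: {dual:.6f}")
```

Under these assumptions, the second path cuts expected downtime by a factor of one thousand. Shared failure modes reduce that benefit, which is why redundant circuits should terminate on separate equipment.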
Within AWS, interconnections such as VPC Peering and Transit Gateway benefit from AWS’s highly redundant physical network infrastructure.
Endpoints and PrivateLink
VPC endpoints provide secure and low-latency connectivity without traversing the public Internet. This is achieved by using:
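- Gateway Endpoints: For S3 and DynamoDB. These are added as targets in your route tables rather than placed in subnets.
- Interface Endpoints via PrivateLink: These endpoints are elastic network interfaces placed in your subnets. Creating one in each Availability Zone you use provides redundancy, while AWS’s internal infrastructure provides high scalability.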
Edge Networking with CloudFront
Amazon CloudFront is AWS’s global content delivery network (CDN), featuring numerous edge locations that provide low latency and high availability worldwide. Key points include:
- A single CloudFront distribution leverages multiple edge locations to ensure resilience.
- Combining CloudFront with DNS failover and regional origins offers robust global content delivery.
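- CloudFront routes each request to a nearby edge location. On a cache miss, the request passes to a regional edge cache, which retrieves the content from the origin (e.g., an S3 bucket) if needed, and the response is then cached at the edge.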
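The caching flow in the last point can be pictured as a two-level lookup. The sketch below is a toy model, not CloudFront's implementation: it ignores TTLs, eviction, and cache-control headers, and the class, edge names, and sample object are all hypothetical.

```python
class TieredCache:
    """Toy model: several edge caches share one regional cache."""

    def __init__(self, origin):
        self.origin = origin      # path -> content, stands in for an S3 bucket
        self.regional = {}        # shared regional cache
        self.edges = {}           # edge name -> that edge's cache
        self.origin_fetches = 0

    def get(self, edge_name, path):
        """Return (content, tier that served it)."""
        edge = self.edges.setdefault(edge_name, {})
        if path in edge:
            return edge[path], "edge"
        if path in self.regional:
            edge[path] = self.regional[path]  # fill the edge on the way back
            return edge[path], "regional"
        content = self.origin[path]           # raises KeyError if missing
        self.origin_fetches += 1
        self.regional[path] = content
        edge[path] = content
        return content, "origin"


cache = TieredCache({"/index.html": "<h1>hello</h1>"})
print(cache.get("edge-1", "/index.html")[1])  # first request reaches the origin
print(cache.get("edge-1", "/index.html")[1])  # repeat is served by the same edge
print(cache.get("edge-2", "/index.html")[1])  # another edge hits the regional tier
print("origin fetches:", cache.origin_fetches)
```

The shared regional tier is what keeps a second edge location from going back to the origin for content that another edge has already requested.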
For request and response customization, AWS offers CloudFront Functions and Lambda@Edge. CloudFront Functions run at edge locations for lightweight, low-latency processing of viewer requests and responses, while Lambda@Edge can also act on origin requests and responses.
DNS and Route 53 for Network Discovery
Amazon Route 53 provides Domain Name System (DNS) services backed by a 100% availability SLA. Within a VPC, the Route 53 Resolver is reachable from every subnet, which avoids a single point of failure for name resolution. Key features include:
- Use of various routing policies (latency-based, weighted, failover, geolocation) to direct traffic across regions.
- In a multi-region setup, Route 53 health checks trigger failover routing to a standby site when the primary site is unavailable.
- The Route 53 Application Recovery Controller provides readiness checks and routing controls that help you decide when to fail over and shift traffic in a controlled way.
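The failover behaviour described above can be sketched as a small decision function. This is a simplified model of a failover routing policy, not the Route 53 API: the endpoint names are hypothetical, the health set stands in for health check results, and the choice to answer with the primary when nothing is healthy is an assumption of this sketch.

```python
def failover_answer(primary, secondary, healthy):
    """Choose which endpoint a failover policy would return.

    primary, secondary: endpoint names
    healthy: set of endpoints whose health checks currently pass
    """
    if primary in healthy:
        return primary
    if secondary in healthy:
        return secondary
    # Assumption: with no healthy endpoint, still answer with the primary
    # rather than returning nothing.
    return primary


PRIMARY = "app.us-east-1.example.com"    # hypothetical active site
SECONDARY = "app.eu-west-1.example.com"  # hypothetical standby site

print(failover_answer(PRIMARY, SECONDARY, {PRIMARY, SECONDARY}))
print(failover_answer(PRIMARY, SECONDARY, {SECONDARY}))
```

Traffic moves to the standby site only while the primary's health check is failing, and returns to the primary once it passes again.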
Summary
This guide reviewed several critical aspects of ensuring network service reliability on AWS:
- Designing VPCs with multiple subnets across different Availability Zones.
- Connecting departmental VPCs using AWS Transit Gateway or VPC Peering.
- Configuring routing tables to segregate traffic among different application tiers.
- Securing outbound connectivity using NAT and egress-only gateways.
- Improving network resiliency with Elastic IPs and advanced network interfaces.
- Leveraging global AWS services such as CloudFront and Route 53 designed with high availability in mind.
Key Takeaway
AWS’s managed network components are redundant by design, which lets you focus on the parts you control: spreading resources across Availability Zones, adding redundant connections, and deploying resilient applications.
Thank you for reading this guide on enhancing network service reliability. Continue to explore further AWS services to deepen your understanding of best practices in network design and resilience.
For additional resources and in-depth documentation, please refer to the AWS Documentation and the AWS Well-Architected Framework.