Plan to Scale: Cluster Autoscaling and Node Auto-Provisioning

Consider a builder constructing a house. The builder needs just the right number of carpenters, electricians, and other tradespeople to finish on time and within budget. In Google Kubernetes Engine (GKE), the Cluster Autoscaler plays that role: it continuously monitors workloads and adjusts node counts automatically.

Cluster Autoscaler Overview

GKE’s Cluster Autoscaler watches for pods that cannot be scheduled and for underutilized nodes, then resizes each node pool within the minimum and maximum node counts you define.

The image illustrates a GKE Autoscaler setup with a Kubernetes cluster consisting of three nodes, each running kubelet instances.

Core Functions

The image is a diagram illustrating the GKE Autoscaler, showing its functions like monitoring workload, scaling clusters, ensuring resources, and making workloads available, with a focus on a per-node pool and a specific node.

Monitors pending pods.
Scales up when there aren’t enough nodes.
Scales down when nodes are underutilized.
Ensures workloads remain available.

The image is a diagram illustrating the GKE Autoscaler, showing a structure with GKE at the top, followed by a per-node pool, and a node labeled "Node 1" with options to specify minimum and maximum values.

If your workloads cannot tolerate pod evictions (for example, a controller that runs a single replica), a scale-down can interrupt service. Run multiple replicas so evicted pods can be rescheduled without downtime.
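As a rough illustration of the warning above, the sketch below scans a workload inventory and flags anything running fewer than two replicas. The inventory format and the workload names are hypothetical and not tied to any GKE API; in practice the data would come from your own cluster listing.

```python
from typing import Dict, List


def single_replica_workloads(workloads: List[Dict]) -> List[str]:
    """Return the names of workloads that run fewer than two replicas."""
    # A missing "replicas" key is treated as a single replica.
    return [w["name"] for w in workloads if w.get("replicas", 1) < 2]


# Hypothetical inventory; names and replica counts are illustrative only.
workloads = [
    {"name": "checkout", "replicas": 3},
    {"name": "report-controller", "replicas": 1},
    {"name": "cache", "replicas": 2},
]

# Workloads listed here could see downtime when their node is removed.
print(single_replica_workloads(workloads))
```

Anything the function returns is a candidate for adding replicas before relying on scale-down.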

The image is a diagram illustrating the concept of "GKE Autoscaler," featuring a central icon labeled "GKE" and a section marked "Autoscaling."

Node Pools as Managed Instance Groups

Cluster Autoscaler operates per node pool. Each pool is backed by Compute Engine managed instance groups (one per zone the pool spans), and the autoscaler grows or shrinks the pool by resizing them.

The image illustrates how a cluster autoscaler works in Google Kubernetes Engine (GKE), showing a node pool with options to increase or decrease its size.
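The per-pool bounds can be pictured with a small model. This is a minimal sketch rather than GKE's implementation: the `NodePool` class and its field names are made up for illustration, and the only behavior shown is that a requested size is clamped to the pool's minimum and maximum.

```python
from dataclasses import dataclass


@dataclass
class NodePool:
    """Hypothetical stand-in for one autoscaled node pool."""
    name: str
    min_nodes: int
    max_nodes: int
    current_nodes: int

    def resize(self, desired: int) -> int:
        """Clamp the desired size to [min_nodes, max_nodes] and apply it."""
        self.current_nodes = max(self.min_nodes, min(self.max_nodes, desired))
        return self.current_nodes


pool = NodePool(name="default-pool", min_nodes=1, max_nodes=5, current_nodes=3)
print(pool.resize(8))  # a request above the maximum is capped
print(pool.resize(0))  # a request below the minimum is raised
```

Because each pool carries its own bounds, two pools in the same cluster can scale independently.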

Scaling Logic

Scaling decisions rely on pending pod resource requests, not on current CPU/memory utilization:

If pods remain unscheduled due to insufficient nodes, the autoscaler adds nodes until pods fit or the pool’s max is reached.
If nodes are underutilized and all pods fit on fewer nodes, it removes nodes down to the pool’s min.
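The two rules above can be sketched as a bin-packing estimate over resource requests. This is a deliberate simplification: the real autoscaler simulates the Kubernetes scheduler and does not repack running pods, whereas the code below assumes identical nodes and uses first-fit-decreasing packing. All function names and figures are hypothetical.

```python
from typing import List, Tuple

Pod = Tuple[float, float]  # (CPU cores requested, memory GiB requested)


def nodes_needed(pods: List[Pod], node_cpu: float, node_mem: float) -> int:
    """Estimate the node count with first-fit-decreasing packing of requests."""
    free: List[List[float]] = []  # remaining [cpu, mem] on each node
    for cpu, mem in sorted(pods, reverse=True):
        if cpu > node_cpu or mem > node_mem:
            raise ValueError("pod does not fit on any node of this size")
        for slot in free:
            if slot[0] >= cpu and slot[1] >= mem:
                slot[0] -= cpu
                slot[1] -= mem
                break
        else:
            # No existing node has room, so another node is required.
            free.append([node_cpu - cpu, node_mem - mem])
    return len(free)


def target_size(scheduled: List[Pod], pending: List[Pod],
                node_cpu: float, node_mem: float,
                min_nodes: int, max_nodes: int) -> int:
    """Size the pool for all requests, bounded by the pool's min and max."""
    wanted = nodes_needed(scheduled + pending, node_cpu, node_mem)
    return max(min_nodes, min(max_nodes, wanted))


# Scale-up: pending pods push the estimate above what is already running.
print(target_size([(2.0, 4.0), (2.0, 4.0)], [(3.0, 8.0), (1.0, 2.0)],
                  node_cpu=4.0, node_mem=16.0, min_nodes=1, max_nodes=5))

# Scale-down: two small pods fit on one node, so the pool can shrink.
print(target_size([(1.0, 2.0), (1.0, 2.0)], [],
                  node_cpu=4.0, node_mem=16.0, min_nodes=1, max_nodes=5))
```

Note that the inputs are requests, not measured utilization: a node running at low CPU still counts as full if its pods have requested all of its capacity.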

Node Auto-Provisioning

With Node Auto-Provisioning enabled in a Standard GKE cluster, GKE creates and deletes node pools automatically based on pending pod requirements (CPU, memory, storage, affinities, taints, tolerations).

Cluster Mode	Node Pool Management	Configuration Needed
Autopilot	Fully managed by GKE	None
Standard	User-managed node pools	Enable and configure NAP

The image is a diagram illustrating Node Auto-Provisioning in Google Kubernetes Engine (GKE), comparing "Autopilot" where GKE manages node pool configuration, and "Standard" where the user manages it. It highlights considerations like CPU, memory, storage requests, and node affinities.
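To make the idea of matching pending pod requirements concrete, here is a minimal sketch of how a suitable node shape might be chosen. It is not how GKE implements Node Auto-Provisioning: the catalog, the shape names, and the selection rule (smallest shape that satisfies CPU, memory, and taints) are all assumptions made for illustration, and storage and affinity are left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class MachineShape:
    """Hypothetical node shape; not a real Compute Engine machine type."""
    name: str
    cpu: float
    mem_gib: float
    taints: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class PendingPod:
    cpu: float
    mem_gib: float
    tolerations: FrozenSet[str] = frozenset()


def pick_shape(pod: PendingPod,
               catalog: List[MachineShape]) -> Optional[MachineShape]:
    """Return the smallest shape that fits the pod and whose taints it tolerates."""
    candidates = [
        s for s in catalog
        if s.cpu >= pod.cpu
        and s.mem_gib >= pod.mem_gib
        and s.taints <= pod.tolerations  # every taint must be tolerated
    ]
    return min(candidates, key=lambda s: (s.cpu, s.mem_gib), default=None)


# Illustrative catalog only.
catalog = [
    MachineShape("small", cpu=2, mem_gib=8),
    MachineShape("large", cpu=8, mem_gib=32),
    MachineShape("accel", cpu=8, mem_gib=64, taints=frozenset({"gpu"})),
]

chosen = pick_shape(PendingPod(cpu=4, mem_gib=16), catalog)
print(chosen.name if chosen else "no shape fits")
```

A pod that requests more than any shape offers gets `None`, which corresponds to a pod that stays pending.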

Key Benefits of GKE Autoscaler

Cost Reduction: Scale down when demand is low.
Improved Performance: Add nodes when pods cannot be scheduled, so workloads are not starved of resources.
Enhanced Reliability: Maintain availability during failures and spikes.

The image is an infographic titled "Some Key Benefits," highlighting the advantages of GKE Autoscaler, including reduced costs, improved performance, and increased reliability.

Autoscaling Limits

Limit Category	Constraint
Cluster Size	Up to 15,000 nodes and 150,000 pods (project quotas apply)
Node Pool Size	Defined by Compute Engine quotas and zone-specific limits
Scaling Interval	Evaluations run approximately every 10 seconds
Scaling Policies	Based on pending pod requests and available project resources

Best Practices

Use multiple node pools to isolate workloads by resource profiles or zones.
Enable Cluster Autoscaler and Node Auto-Provisioning to match real-time demand.
Configure monitoring and alerts for CPU, memory, and pod scheduling metrics.

Review your Compute Engine quota limits and scaling policies regularly to prevent quota exhaustion during high-traffic periods.
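A quick arithmetic check shows why quota reviews matter. The sketch below estimates how many CPUs of quota would remain if a pool grew to its configured maximum. The function name and the figures are hypothetical, and it assumes every node in the pool has the same CPU count.

```python
def quota_headroom(quota_cpus: int, cpus_in_use: int, max_nodes: int,
                   current_nodes: int, cpus_per_node: int) -> int:
    """CPUs of quota left if the pool grows to max_nodes.

    A negative result means the quota would block scale-up before
    the pool reaches its configured maximum.
    """
    extra_nodes = max(0, max_nodes - current_nodes)
    return quota_cpus - cpus_in_use - extra_nodes * cpus_per_node


# Illustrative numbers only: 7 more 4-CPU nodes fit within the quota.
print(quota_headroom(quota_cpus=64, cpus_in_use=24, max_nodes=10,
                     current_nodes=3, cpus_per_node=4))

# Raising the maximum to 20 nodes would exceed the same quota.
print(quota_headroom(quota_cpus=64, cpus_in_use=24, max_nodes=20,
                     current_nodes=3, cpus_per_node=4))
```

A negative result is a signal to request more quota or to lower the pool's maximum.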

The image provides additional tips for managing clusters, including using multiple node pools, employing autoscalers for automatic scaling, and configuring performance metrics.
