Plan to Scale: Cluster Autoscaling and Node Auto-Provisioning
Consider a builder constructing a house. He needs just the right number of carpenters, electricians, and other tradespeople to finish on time and within budget. In Google Kubernetes Engine (GKE), the Cluster Autoscaler plays that role—continuously monitoring workloads and adjusting node counts automatically.
Cluster Autoscaler Overview
GKE’s Cluster Autoscaler watches your pods and resizes node pools within the min/max bounds you define.
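As a minimal sketch, you can turn this on for an existing node pool with gcloud; the cluster name, pool name, zone, and bounds below are placeholders:

```bash
# Enable the Cluster Autoscaler on an existing node pool,
# letting GKE resize it between 1 and 5 nodes.
# "my-cluster", "default-pool", and the zone are placeholder values.
gcloud container clusters update my-cluster \
  --zone us-central1-a \
  --node-pool default-pool \
  --enable-autoscaling \
  --min-nodes 1 \
  --max-nodes 5
```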
Core Functions
- Monitors pending pods.
- Scales up when there aren’t enough nodes.
- Scales down when nodes are underutilized.
- Ensures workloads remain available.
Note
If your workloads don’t tolerate pod evictions (for example, single-replica controllers), a scale-down can interrupt service. Run multiple replicas so pods can reschedule to other nodes without downtime.
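For instance, one quick hedge against eviction-related downtime is to scale a Deployment beyond one replica and guard it with a PodDisruptionBudget; the names and numbers here are illustrative:

```bash
# Keep three replicas so some pods survive a node removal.
# "web" and the "app=web" label are hypothetical examples.
kubectl scale deployment web --replicas 3

# Require that at least two replicas stay available during
# voluntary disruptions such as autoscaler-initiated drains.
kubectl create poddisruptionbudget web-pdb \
  --selector=app=web \
  --min-available=2
```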
Node Pools as Managed Instance Groups
Cluster Autoscaler operates per node pool. Each pool corresponds to a Compute Engine managed instance group (MIG), which the autoscaler resizes when it adds or removes nodes.
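If you want to see that mapping yourself, one way (again assuming placeholder cluster, pool, and zone names) is to read the pool's instanceGroupUrls field:

```bash
# Print the managed instance group(s) backing a node pool.
gcloud container node-pools describe default-pool \
  --cluster my-cluster \
  --zone us-central1-a \
  --format="value(instanceGroupUrls)"
```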
Scaling Logic
Scaling decisions rely on pending pods’ resource requests, not on current CPU or memory utilization (see the inspection sketch after this list):
- If pods remain unscheduled due to insufficient nodes, the autoscaler adds nodes until pods fit or the pool’s max is reached.
- If nodes are underutilized and all pods fit on fewer nodes, it removes nodes down to the pool’s min.
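A quick way to observe this in practice is to list pending pods and read their scheduling events; the event text in the comment below is typical but can vary by Kubernetes version:

```bash
# List pods the scheduler could not place anywhere.
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# Inspect one of them; its Events section typically explains why it is
# pending, e.g. "0/3 nodes are available: 3 Insufficient cpu."
# Replace the pod name and namespace with real values.
kubectl describe pod my-pending-pod -n my-namespace
```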
Node Auto-Provisioning
With Node Auto-Provisioning enabled in a Standard GKE cluster, GKE creates and deletes node pools automatically based on pending pod requirements (CPU, memory, storage, affinities, taints, tolerations).
| Cluster Mode | Node Pool Management | Configuration Needed |
| --- | --- | --- |
| Autopilot | Fully managed by GKE | None |
| Standard | User-managed node pools | Enable and configure NAP |
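On a Standard cluster, enabling NAP might look like the sketch below; the CPU and memory ceilings are illustrative and cap the total resources NAP may provision across the node pools it creates:

```bash
# Enable Node Auto-Provisioning with cluster-wide resource limits.
# The limits, cluster name, and zone are placeholder values.
gcloud container clusters update my-cluster \
  --zone us-central1-a \
  --enable-autoprovisioning \
  --min-cpu 1 --max-cpu 32 \
  --min-memory 1 --max-memory 128
```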
Key Benefits of GKE Autoscaler
- Cost Reduction: Scale down when demand is low.
- Improved Performance: Add nodes as demand grows to avoid resource exhaustion.
- Enhanced Reliability: Maintain availability during failures and spikes.
Autoscaling Limits
| Limit Category | Constraint |
| --- | --- |
| Cluster Size | Up to 15,000 nodes and 150,000 pods (project quotas apply) |
| Node Pool Size | Defined by Compute Engine quotas and zone-specific limits |
| Scaling Interval | Evaluations run approximately every 10 seconds |
| Scaling Policies | Based on pending pod requests and available project resources |
Best Practices
- Use multiple node pools to isolate workloads by resource profiles or zones (see the sketch after this list).
- Enable Cluster Autoscaler and Node Auto-Provisioning to match real-time demand.
- Configure monitoring and alerts for CPU, memory, and pod scheduling metrics.
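For the first practice above, a second pool dedicated to a workload class could be created and tainted so that only pods with a matching toleration land on it; every name and size here is hypothetical:

```bash
# Create an autoscaled pool reserved for batch workloads.
# Pods must tolerate workload=batch:NoSchedule to be scheduled here.
gcloud container node-pools create batch-pool \
  --cluster my-cluster \
  --zone us-central1-a \
  --machine-type e2-highmem-4 \
  --enable-autoscaling --min-nodes 0 --max-nodes 3 \
  --node-labels workload=batch \
  --node-taints workload=batch:NoSchedule
```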
Warning
Review your Compute Engine quota limits and scaling policies regularly to prevent quota exhaustion during high-traffic periods.
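One way to check regional quota headroom from the CLI (assuming us-central1; the flatten/format combination shown is one common pattern) is:

```bash
# Show per-metric quota usage and limits for a region.
# Replace us-central1 with the region your cluster runs in.
gcloud compute regions describe us-central1 \
  --flatten="quotas[]" \
  --format="table(quotas.metric,quotas.usage,quotas.limit)"
```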