Scaling Behavior

Welcome to this guide on Kubernetes Horizontal Pod Autoscaler (HPA) scaling behavior. We’ll explore how HPA automatically adjusts pod replica counts to maintain performance, optimize resource usage, and control costs.

HPA Scaling Overview

The HPA controller periodically queries metrics and adjusts the replica count of a Deployment, StatefulSet, ReplicaSet, or other resource that supports scaling. It works with three kinds of metrics:

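Metric Type | Source | Use Case
----------- | ------ | --------
Resource | Built-in CPU and memory metrics from the resource metrics API (metrics.k8s.io), usually provided by metrics-server | Basic autoscaling based on pod resource consumption
Custom | Metrics associated with Kubernetes objects, served through the custom metrics API (custom.metrics.k8s.io) by an adapter such as Prometheus Adapter | Application-level metrics (e.g., QPS, latency)
External | Metrics not associated with any Kubernetes object, served through the external metrics API (external.metrics.k8s.io), e.g., from Prometheus or a cloud provider | Third-party or business metrics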
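For a Resource metric with a Utilization target (such as averageUtilization in an HPA spec), the value compared with the target is CPU usage expressed as a percentage of the pods' CPU requests. The Python sketch below assumes that usage and requests are summed across pods before dividing and that the result is truncated to a whole percent; the function name, pod names, and millicore figures are hypothetical.

```python
from typing import Dict


def cpu_utilization_percent(usage_m: Dict[str, int],
                            requests_m: Dict[str, int]) -> int:
    """Aggregate CPU utilization of a pod set, as a percent of CPU requests.

    Sketch assumptions: usage and requests are summed across pods before
    dividing, and the result is truncated to a whole percent.
    """
    if not usage_m:
        raise ValueError("no pod metrics available")
    total_usage = 0
    total_request = 0
    for pod, usage in usage_m.items():
        request = requests_m.get(pod, 0)
        if request <= 0:
            # Utilization is relative to requests, so it is undefined
            # for a pod that declares no CPU request.
            raise ValueError(f"pod {pod} has no CPU request")
        total_usage += usage
        total_request += request
    return total_usage * 100 // total_request


# Hypothetical pods: 600m used out of 1500m requested.
cpu_usage = {"web-1": 250, "web-2": 150, "web-3": 200}
cpu_requests = {"web-1": 500, "web-2": 500, "web-3": 500}
print(cpu_utilization_percent(cpu_usage, cpu_requests))  # 40
```

Because the percentage is relative to requests, changing a container's CPU request changes the utilization the HPA sees even when actual usage stays the same.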

Note

By default, the HPA controller evaluates its targets every 15 seconds. You can change this interval with the --horizontal-pod-autoscaler-sync-period flag on kube-controller-manager.

When an observed metric rises above its target (for example, average CPU utilization above a 70% target), HPA calculates a new replica count and then applies any configured scale-up policies:

Figure: Scale-up behavior in three steps: Trigger, Action, and Policy Configuration.

  1. Trigger: Observed metric exceeds the target (e.g., CPU > 70%).
  2. Action: HPA computes the desired replica count from the ratio of the current metric value to the target value.
  3. Policy Configuration: Applies your scale-up policies (Percent or Pods) to limit how quickly replicas are added.
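The Action step follows the formula given in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The Python sketch below also skips scaling when the ratio is within the controller's default tolerance of 0.1 (set by --horizontal-pod-autoscaler-tolerance) and clamps the result to minReplicas and maxReplicas. It deliberately ignores behavior policies, stabilization windows, and not-yet-ready pods, and the function name is hypothetical.

```python
import math


def desired_replicas(current_replicas: int,
                     current_value: float,
                     target_value: float,
                     min_replicas: int,
                     max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Replica count proposed by the HPA ratio formula (no behavior policies)."""
    ratio = current_value / target_value
    # Inside the tolerance band around 1.0, keep the current count.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    proposed = math.ceil(current_replicas * ratio)
    # Respect the configured replica bounds.
    return max(min_replicas, min(max_replicas, proposed))


# 4 pods averaging 90% CPU against a 70% target: ceil(4 * 90 / 70) = ceil(5.14) = 6
print(desired_replicas(4, 90, 70, min_replicas=1, max_replicas=10))  # 6
```

The result is only a proposal; the scale-up policies from step 3 then decide how much of that change is allowed in the current period.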

You can tune:

  • minReplicas and maxReplicas
  • Percent-based (type: Percent) vs. fixed-count (type: Pods) scaling policies
  • Resource, custom, or external metrics

Scale-Down and Stabilization Window

Scale-down follows a similar process but, by default, adds a stabilization window to prevent rapid oscillations during brief low-load periods. (Scale-up also accepts a stabilization window, but its default is 0 seconds.)

Figure: Scale-down behavior in three steps: Trigger, Action, and Stabilization Window.

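Step | Description
---- | -----------
Trigger | The metric falls below its target, so the computed desired replica count is lower than the current count
Stabilization Window | Over a cool-off period (default 300 s), HPA keeps the highest replica recommendation it has computed, so replicas drop only after the lower recommendation has persisted for the whole window
Action | Applies scale-down policies (Percent or Pods) to limit how many replicas are removed per period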

Warning

Setting a very short stabilization window can cause thrashing, where pods are repeatedly removed and re-created. Choose a window long enough to cover your application's typical short-lived dips in load.
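The stabilization window can be pictured as a sliding history of replica recommendations: when scaling down, the HPA uses the highest recommendation seen within the window, so replicas fall only after lower recommendations have persisted for the whole window. The Python sketch below models that rule with a hypothetical ScaleDownStabilizer class, assumes a 0-second scale-up window (the default), and compares how often replicas change under an oscillating load with a 0-second and a 300-second window. It is a simplified model rather than the controller's code; the 15-second sample interval mirrors the default sync period.

```python
from collections import deque
from typing import Deque, List, Tuple


class ScaleDownStabilizer:
    """Hypothetical model of a scale-down stabilization window.

    Assumes a 0-second scale-up window, so scale-ups take effect immediately.
    """

    def __init__(self, window_seconds: int) -> None:
        self.window_seconds = window_seconds
        self.history: Deque[Tuple[int, int]] = deque()  # (timestamp, recommendation)

    def stabilize(self, now: int, current: int, recommendation: int) -> int:
        self.history.append((now, recommendation))
        # Forget recommendations that fell out of the window.
        while self.history[0][0] < now - self.window_seconds:
            self.history.popleft()
        if recommendation > current:
            return recommendation  # scale up right away
        # Scale down no further than the highest recent recommendation.
        highest = max(r for _, r in self.history)
        return min(current, highest)


def count_replica_changes(window_seconds: int, recommendations: List[int],
                          start: int, interval: int = 15) -> int:
    """Count how many times the replica count changes over a sequence of syncs."""
    stabilizer = ScaleDownStabilizer(window_seconds)
    replicas, changes = start, 0
    for i, rec in enumerate(recommendations):
        new = stabilizer.stabilize(i * interval, replicas, rec)
        if new != replicas:
            replicas, changes = new, changes + 1
    return changes


# Load that flips between needing 6 and 2 replicas on every 15-second sync.
noisy = [6, 2] * 4
print(count_replica_changes(0, noisy, start=6))    # 7
print(count_replica_changes(300, noisy, start=6))  # 0
```

With the 0-second window, every dip in the recommendation removes pods and the next spike adds them back; with 300 seconds, the brief dips never outlast the window, so the replica count holds steady.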

Example HPA Configuration

Below is an autoscaling/v2 manifest that scales a Deployment named my-app-deployment between 1 and 10 replicas. It targets 50% average CPU utilization. Scale-up has no stabilization window and may add up to 100% of the current replicas (that is, double them) every 60 seconds. Scale-down uses a 5-minute stabilization window and removes at most 10% of the current replicas every 60 seconds.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
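To see what the behavior section of this manifest allows, the Python sketch below steps through one scaling action per 60-second period under a constant demand. It is a simplified model, not the controller itself: it assumes the Percent limits apply to the replica count at the start of each period, rounds the scale-up limit up and the scale-down limit down (an assumption about rounding), and begins the scale-down trace only after the 300-second stabilization window has already elapsed. The function names are hypothetical.

```python
from typing import List

MIN_REPLICAS, MAX_REPLICAS = 1, 10  # from the manifest
UP_PERCENT, DOWN_PERCENT = 100, 10  # Percent policies, each per 60 s


def max_after_scale_up(replicas: int) -> int:
    # Percent growth rounded up (assumed): 100% lets the count double each period.
    return -(-replicas * (100 + UP_PERCENT) // 100)


def min_after_scale_down(replicas: int) -> int:
    # Percent shrink rounded down (assumed): up to 10% may go each period.
    return replicas * (100 - DOWN_PERCENT) // 100


def trace(start: int, desired: int, periods: int) -> List[int]:
    """Replica count at the end of each 60-second period for a fixed desired count."""
    replicas, history = start, [start]
    for _ in range(periods):
        if desired > replicas:
            replicas = min(desired, max_after_scale_up(replicas), MAX_REPLICAS)
        elif desired < replicas:
            replicas = max(desired, min_after_scale_down(replicas), MIN_REPLICAS)
        history.append(replicas)
    return history


print(trace(start=1, desired=10, periods=4))  # [1, 2, 4, 8, 10]
print(trace(start=10, desired=1, periods=9))  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```

Under these assumptions, going from 1 to 10 replicas takes four periods (about four minutes), while going from 10 back to 1 takes nine periods on top of the five-minute stabilization window: fast to add capacity, slow to remove it.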

Note

Percent policies (type: Percent) grow or shrink the replica count relative to the current count. Use Pods policies (type: Pods) when you need fixed increments.
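When a behavior section lists several policies, selectPolicy decides which one wins: Max (the default) picks the policy that allows the largest change and Min the smallest. The Python sketch below computes the highest replica count a single scale-up period may reach, using the two policies of the Kubernetes default scale-up behavior (100 Percent and 4 Pods, each per 15 seconds). The Policy class and function name are hypothetical, and rounding Percent results up is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Policy:
    policy_type: str  # "Percent" or "Pods", mirroring the HPA policy types
    value: int


def scale_up_limit(replicas: int, policies: List[Policy],
                   select_policy: str = "Max") -> int:
    """Highest replica count reachable in one period under the given policies."""
    limits = []
    for policy in policies:
        if policy.policy_type == "Pods":
            limits.append(replicas + policy.value)  # fixed increment
        elif policy.policy_type == "Percent":
            # Relative increment, rounded up so growth adds at least one pod.
            limits.append(-(-replicas * (100 + policy.value) // 100))
        else:
            raise ValueError(f"unknown policy type: {policy.policy_type}")
    # Max allows the largest change, Min the smallest.
    return max(limits) if select_policy == "Max" else min(limits)


defaults = [Policy("Percent", 100), Policy("Pods", 4)]
print(scale_up_limit(2, defaults))         # 6: Pods (+4) beats Percent (x2 = 4)
print(scale_up_limit(10, defaults))        # 20: Percent (x2) beats Pods (+4 = 14)
print(scale_up_limit(2, defaults, "Min"))  # 4
```

This shows why the default behavior combines both types: the Pods policy lets small deployments grow quickly, while the Percent policy dominates once there are more than four replicas.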

Best Practices

Implement these recommendations to ensure reliable and efficient autoscaling:

Practice | Benefit
-------- | -------
Align stabilization windows with your application's cool-off time | Prevents rapid up/down scaling oscillations
Define clear, testable scaling policies | Ensures predictable scaling behavior
Use metrics tied to business outcomes or user experience | Scales on actual demand, not just resource usage
Continuously monitor and adjust | Keeps autoscaling tuned to evolving workloads

Figure: Scaling behavior best practices, including avoiding rapid scaling, defining clear policies, and monitoring and adjusting.
