HPA Scaling Policy

In this guide, you'll learn how to configure Kubernetes Horizontal Pod Autoscaler (HPA) scaling policies. HPA decides how many pod replicas you need from metrics such as CPU, memory, or custom/external metrics; scaling policies define how quickly the control plane may move toward that number. Think of these rules as the “brain” that keeps your application in sync with demand, adding pods when needed and removing them when load subsides.

Defining Scaling Policies

In the HPA manifest’s behavior section, the policies field under scaleUp or scaleDown limits how fast the replica count can change in that direction. Each policy entry includes:

  • type: Pods for a fixed number of pods or Percent for a percentage of current replicas.
  • value: A numeric value corresponding to the type.
  • periodSeconds: Length of the time window, in seconds, over which the policy applies; the total change within any such window cannot exceed value.

Example configuration:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 100
  behavior:
    scaleUp:
      policies:
        - type: Pods
          value: 4
          periodSeconds: 60
        - type: Percent
          value: 10
          periodSeconds: 60

Type     Value  periodSeconds  Effect
Pods     4      60             Adds up to 4 pods every 60 seconds
Percent  10     60             Increases replicas by up to 10% every 60 seconds

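The HPA controller re-evaluates metrics every 15 seconds by default. Each policy caps the total change allowed within any periodSeconds window: here, at most 4 pods or 10% of the replicas per 60 seconds. When several policies are listed, HPA by default applies the one that permits the largest change (selectPolicy: Max); set selectPolicy: Min to apply the most restrictive one instead. Combining fixed and percentage-based policies gives finer control over autoscaling behavior.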
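As a rough illustration of how the two policies in the example combine, the following Python sketch computes the scale-up ceiling for a single period. It is a simplification and not the controller's actual code: it assumes no scaling has already happened in the current period, and the names scale_up_limit and POLICIES are hypothetical, made up for this example.

```python
def scale_up_limit(current, policies, select_policy="Max"):
    """Return the highest replica count one period may reach (simplified)."""
    proposals = []
    for kind, value in policies:
        if kind == "Pods":
            # Fixed step: add at most `value` pods.
            proposals.append(current + value)
        elif kind == "Percent":
            # Integer ceiling of current * (1 + value / 100); rounding up
            # keeps small deployments from getting stuck at zero growth.
            proposals.append(-(-current * (100 + value) // 100))
        else:
            raise ValueError(f"unknown policy type: {kind}")
    # "Max" picks the most permissive policy, "Min" the most restrictive.
    return max(proposals) if select_policy == "Max" else min(proposals)


# The two scaleUp policies from the manifest (both with periodSeconds: 60).
POLICIES = [("Pods", 4), ("Percent", 10)]

print(scale_up_limit(10, POLICIES))         # 14  (Pods policy wins)
print(scale_up_limit(100, POLICIES))        # 110 (Percent policy wins)
print(scale_up_limit(10, POLICIES, "Min"))  # 11
```

Under these assumptions, the Pods policy sets the limit while the deployment has 40 replicas or fewer, and the Percent policy takes over above that, which is why pairing the two helps both small and large deployments.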

Custom Metric Intervals

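The 15-second evaluation interval is set by the kube-controller-manager flag --horizontal-pod-autoscaler-sync-period. How often the underlying metrics are collected is configured separately, in metrics-server (the --metric-resolution flag) or in your external metrics adapter.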

Stabilization Window

To prevent rapid fluctuations (thrashing) in pod counts, HPA uses a stabilization window. HPA keeps the recommendations computed during the window and, for scale-down, acts on the highest of them. With the default settings, this lets you scale up quickly while scaling down more conservatively.

Window Type  Default Interval     Purpose
scale-up     0 seconds            Immediate scaling up based on current metrics
scale-down   300 seconds (5 min)  Delays scale-down by using the highest recommendation in the last 5 min

Customizing the Scale-Down Window

Adjust your HPA manifest to modify the scale-down stabilization window:

spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300

With a 300-second window:

  1. At minute 1, metrics recommend scaling up to 40 pods.
  2. At minute 2, metrics suggest scaling down to 30 pods, but the window holds at 40 pods.
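  3. From minutes 3–5, even if load decreases further, HPA maintains 40 pods. Once the 40-pod recommendation is more than 300 seconds old, HPA scales down to the highest recommendation still inside the window.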
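The walkthrough above can be approximated in a few lines of Python. This is a minimal sketch of the scale-down stabilization idea only (keep recent recommendations, act on the highest one). The class ScaleDownStabilizer is a hypothetical helper, and the recommendation values after minute 2 are assumptions chosen for illustration, not output from a real cluster.

```python
from collections import deque


class ScaleDownStabilizer:
    """Hypothetical helper: applies the highest recommendation in the window."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp_seconds, recommended_replicas)

    def stabilize(self, now, recommended):
        cutoff = now - self.window_seconds
        # Forget recommendations made at or before the cutoff.
        while self._history and self._history[0][0] <= cutoff:
            self._history.popleft()
        # The current recommendation always counts.
        self._history.append((now, recommended))
        return max(replicas for _, replicas in self._history)


# One recommendation per minute; values after minute 2 are made up.
recommendations = [(60, 40), (120, 30), (180, 25), (240, 20), (300, 20), (360, 20)]

stabilizer = ScaleDownStabilizer(window_seconds=300)
applied = [stabilizer.stabilize(t, r) for t, r in recommendations]
print(applied)  # [40, 40, 40, 40, 40, 30]
```

In this run the replica count stays at 40 through minute 5 and drops to 30 at minute 6, when the 40-pod recommendation from minute 1 falls out of the 300-second window.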

Figure: Stabilization window chart plotting pod counts against time intervals over a five-minute stabilization window.

This approach ensures your application rapidly scales up under load while avoiding premature downscaling that could impact performance.
