Kubernetes and Cloud Native Associate - KCNA

Cloud Native Architecture

Horizontal Pod Autoscaler

In this lesson, you'll learn about the Horizontal Pod Autoscaler (HPA) in Kubernetes and discover how it automatically adjusts the number of running pods based on the current resource usage. As its name implies, the HPA scales the pods horizontally—deploying more instances when demand increases and removing excess pods when demand decreases.

Key Concept

The HPA is one of Kubernetes' many controllers that works by adjusting the replica count of a Deployment. It scales up the replicas as demand grows and scales them down when the pressure eases.

How the HPA Works

The HPA monitors your application's resource consumption using metrics collected by the Metrics Server. The Metrics Server continuously aggregates resource usage across the nodes and pods in the cluster and exposes it through the Kubernetes metrics API, giving the HPA the data it needs to decide when to scale.
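If the Metrics Server is installed in your cluster, you can inspect the same usage data the HPA consumes with kubectl top (output will vary with your cluster's workloads):

```shell
# Show per-node CPU and memory usage reported by the Metrics Server
kubectl top nodes

# Show per-pod usage in the current namespace
kubectl top pods
```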

The HPA configuration typically involves:

  • Defining a Deployment for the application.
  • Setting resource limits and requests within the Deployment.
  • Creating an HPA object that targets the Deployment.

Let's explore an example configuration.

Example Configuration

Below is an example YAML configuration for a Deployment that uses a container image with defined CPU resource limits:

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:latest
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m

The following YAML defines the Horizontal Pod Autoscaler to ensure the Deployment can scale based on CPU utilization:

# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

Explanation

  • The scaleTargetRef section specifies that the HPA targets the Deployment named "myapp".
  • The autoscaler will maintain the replica count between 1 and 10 based on CPU usage.
  • It continuously monitors the average CPU utilization across the pods and scales to keep it near the 50% target, following the rule desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization).
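As an alternative to writing the manifest by hand, the same autoscaler can be created imperatively with kubectl autoscale; the flag values below mirror the YAML configuration:

```shell
# Create an HPA targeting the myapp Deployment with a 50% CPU target
# and replica bounds of 1 to 10
kubectl autoscale deployment myapp --cpu-percent=50 --min=1 --max=10
```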

Managing the HPA

To create the HPA, run the following command:

kubectl create -f hpa.yaml

Upon successful creation, kubectl prints a confirmation:

horizontalpodautoscaler.autoscaling/myapp-hpa created

To view details of the created HPA, use the following command:

kubectl get hpa

The expected output might resemble:

NAME        REFERENCE             TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
myapp-hpa   Deployment/myapp      50%/50%      1         10        3          10m

In the output:

  • The "TARGETS" column shows the current average CPU utilization against the target, in the form current/target.
  • "MINPODS" and "MAXPODS" indicate the minimum and maximum replica limits.
  • "REPLICAS" reflects the current number of running pods.
  • "AGE" shows the duration for which the HPA has been active.
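For more detail than kubectl get provides, including the HPA's conditions and recent scaling events, you can describe the object:

```shell
# Show target metrics, status conditions, and scaling events for the HPA
kubectl describe hpa myapp-hpa
```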

Caution

Ensure your metrics server is properly configured and running, as the HPA relies on accurate metrics to make scaling decisions.
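One way to verify this is to check the Metrics Server deployment directly (this assumes the standard installation into the kube-system namespace):

```shell
# Confirm the Metrics Server deployment exists and is available
kubectl get deployment metrics-server -n kube-system

# A quick functional check: if this returns usage data, metrics are flowing
kubectl top nodes
```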

To remove the HPA, execute the following command:

kubectl delete hpa myapp-hpa

Conclusion

This lesson demonstrated how to configure and monitor a Horizontal Pod Autoscaler for a Kubernetes Deployment. By dynamically adjusting the number of pods according to real-time resource usage, Kubernetes ensures that your application efficiently handles varying loads while optimizing resource use.
