VPA CPU Lab

In this hands-on lab, you will:

  1. Deploy a Flask sample application on Kubernetes.
  2. Monitor CPU utilization using kubectl top.
  3. Create a Vertical Pod Autoscaler (VPA) manifest to gather CPU recommendations.
  4. Generate load and validate the VPA’s CPU recommendations.

[Figure: lab overview showing three steps: deploy a sample application, monitor its resource usage, and apply a VPA configuration to capture recommendations.]


Prerequisites

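  • A running Kubernetes cluster (v1.18+).
  • Metrics Server installed (needed by kubectl top and by the VPA recommender).
  • The Vertical Pod Autoscaler components and CRDs installed in the cluster. VPA is an add-on and does not ship with Kubernetes.
  • kubectl configured to target your cluster.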

Note

Ensure the Metrics Server is deployed in your cluster so you can retrieve pod metrics.
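
The prerequisites can be checked from the command line before starting. The sketch below is one way to do that. It assumes the Metrics Server registers the usual v1beta1.metrics.k8s.io APIService and that VPA was installed with its standard CRD name. The function name check_prereqs is hypothetical.

```shell
# Hypothetical helper: report whether the components this lab relies on are present.
check_prereqs() {
  status=0
  # Metrics Server exposes the resource metrics API through this APIService.
  if kubectl get apiservice v1beta1.metrics.k8s.io >/dev/null 2>&1; then
    echo "metrics API: found"
  else
    echo "metrics API: missing"
    status=1
  fi
  # VPA is an add-on; its CRD must exist before a VerticalPodAutoscaler can be created.
  if kubectl get crd verticalpodautoscalers.autoscaling.k8s.io >/dev/null 2>&1; then
    echo "VPA CRD: found"
  else
    echo "VPA CRD: missing"
    status=1
  fi
  return "$status"
}

# Usage: check_prereqs || echo "install the missing components first"
```

A non-zero return code means at least one component was not found.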


1. Deploy the Flask Sample Application

Apply the provided deployment manifest to launch the Flask app named flask-app-4:

kubectl apply -f flask-app-deployment.yaml
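
If you would rather not poll by hand, kubectl rollout status can block until the Deployment is ready. This is a minimal sketch: the helper name wait_for_rollout and the 120s default timeout are assumptions, not part of the lab.

```shell
# Hypothetical helper: block until the Deployment's pods are rolled out and ready.
wait_for_rollout() {
  deployment=$1
  timeout=${2:-120s}   # assumed default; raise it for slower clusters
  kubectl rollout status "deployment/${deployment}" --timeout="${timeout}"
}

# Usage: wait_for_rollout flask-app-4
```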

Once the pods are ready, verify CPU usage:

kubectl top pods

Expected output:

NAME                          CPU(cores)   MEMORY(bytes)
flask-app-4-<pod-id>          1m           <some-memory>

The pod should report roughly 1m (one millicore, or 0.001 of a CPU core), which indicates a near-idle application.
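
In a busy namespace, it can help to filter the kubectl top output down to the lab's pods. The following sketch does that with awk. The helper name pod_cpu is hypothetical, and it assumes the pods run in the current namespace.

```shell
# Hypothetical helper: print "<pod> <cpu>" for pods whose name starts with a prefix.
pod_cpu() {
  prefix=$1
  # --no-headers drops the column titles so awk sees only data rows.
  kubectl top pods --no-headers |
    awk -v p="$prefix" 'index($1, p) == 1 { print $1, $2 }'
}

# Usage: pod_cpu flask-app-4
```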


2. Create the VPA Configuration

Next, define a VPA object that collects CPU recommendations without modifying the pods. Save this as vpa-cpu.yml:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: flask-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: flask-app-4
  updatePolicy:
    updateMode: "Off"         # Only recommendations; no automatic updates
  resourcePolicy:
    containerPolicies:
      - containerName: '*'    # Apply to all containers
        minAllowed:
          cpu: 100m
        maxAllowed:
          cpu: 1000m
        controlledResources: ["cpu"]

This manifest:

  • Bounds every recommendation between 100m (minAllowed) and 1000m, which equals 1 CPU (maxAllowed).
  • Restricts recommendations to CPU only (controlledResources), so memory is left unmanaged.
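
Kubernetes accepts CPU quantities either in millicores (100m) or in cores (1, 0.5), and VPA output mixes both forms. As a rough illustration of how they relate, the hypothetical helper below normalizes either form to whole millicores. It handles only these two common forms.

```shell
# Hypothetical helper: convert a Kubernetes CPU quantity to whole millicores.
# Handles "<n>m" (millicores) and plain or decimal cores such as "1" or "0.5".
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;                                            # already millicores
    *)  awk -v c="$1" 'BEGIN { printf "%d\n", c * 1000 + 0.5 }' ;;  # cores, rounded
  esac
}

# Usage:
#   to_millicores 100m   # prints 100
#   to_millicores 1      # prints 1000
#   to_millicores 0.5    # prints 500
```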

Warning

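With updateMode: "Off", VPA only publishes recommendations and never resizes your pods. Switch to Auto if you want VPA to apply changes, and expect pod restarts when it does, because VPA typically applies new requests by evicting pods so they are recreated.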

You can compare other updateMode options:

updateMode   Description
Off          Only provide recommendations; no pod modifications
Initial      Apply recommendations on first pod creation
Auto         Automatically update requests based on VPA advice

For more details, see the VerticalPodAutoscaler API reference.
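
Instead of editing vpa-cpu.yml and reapplying it, you can change the mode of an existing VPA with a merge patch. The sketch below wraps kubectl patch. The helper name set_vpa_mode is hypothetical, and it accepts only the three modes listed above.

```shell
# Hypothetical helper: change updateMode on an existing VPA with a merge patch.
set_vpa_mode() {
  vpa=$1
  mode=$2
  case "$mode" in
    Off|Initial|Auto) ;;   # the modes covered in this lab
    *) echo "unsupported updateMode: $mode" >&2; return 1 ;;
  esac
  kubectl patch vpa "$vpa" --type merge \
    -p "{\"spec\":{\"updatePolicy\":{\"updateMode\":\"$mode\"}}}"
}

# Usage: set_vpa_mode flask-app Auto
```

A merge patch is used because custom resources such as VerticalPodAutoscaler do not support kubectl's default strategic merge patch.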


3. Apply the VPA Manifest

Create the VPA resource:

kubectl apply -f vpa-cpu.yml

You should see:

verticalpodautoscaler.autoscaling.k8s.io/flask-app created

4. Inspect Initial Recommendations

Before generating any load, fetch the current VPA status:

kubectl get vpa flask-app -o yaml

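Because the app is idle, its estimated CPU need falls below minAllowed, so the recommended target is clamped to the 100m minimum. If the status section is still empty, wait a minute or so for the VPA recommender to publish its first recommendation.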
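
The full YAML is verbose when you only want the recommended value. Assuming the Deployment has a single container, a jsonpath query along these lines prints just the CPU target. The helper name vpa_cpu_target is hypothetical.

```shell
# Hypothetical helper: print only the recommended CPU target of the first container.
vpa_cpu_target() {
  kubectl get vpa "$1" \
    -o jsonpath='{.status.recommendation.containerRecommendations[0].target.cpu}'
}

# Usage: vpa_cpu_target flask-app
```

The command prints nothing if the VPA has not produced a recommendation yet.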


5. Generate Load and Validate Recommendations

Use your preferred load-testing tool (e.g., hey, ab, wrk) to apply CPU pressure:

hey -z 30s -c 50 http://<service-ip>/
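
If none of those tools is installed, a few background curl loops can serve as a crude substitute. This is only a sketch: the helper name generate_load and its defaults (10 workers, 30 seconds) are assumptions, and it produces much less load than a dedicated tool.

```shell
# Hypothetical fallback load generator built on curl.
generate_load() {
  url=$1
  workers=${2:-10}    # concurrent request loops (assumed default)
  seconds=${3:-30}    # how long to keep sending requests (assumed default)
  end=$(( $(date +%s) + seconds ))
  i=0
  while [ "$i" -lt "$workers" ]; do
    # Each worker requests the URL in a loop until the deadline passes.
    ( while [ "$(date +%s)" -lt "$end" ]; do
        curl -s -o /dev/null "$url"
      done ) &
    i=$((i + 1))
  done
  wait   # block until every worker has finished
}

# Usage: generate_load "http://<service-ip>/" 50 30
```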

Once the load test completes, check the VPA again:

kubectl get vpa flask-app -o yaml

You should see something like:

status:
  recommendation:
    containerRecommendations:
      - containerName: flask-app-4
        lowerBound:
          cpu: 100m
        target:
          cpu: 126m
        uncappedTarget:
          cpu: 126m
        upperBound:
          cpu: "1"

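Here, the VPA now recommends 126m, reflecting the increased CPU demand. The target and uncappedTarget values match because 126m already lies within the configured 100m to 1000m range, so no clamping was needed.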
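
Recommendations do not change the instant load starts, so you may need to check more than once. The sketch below polls the VPA until its CPU target differs from a baseline value. The helper name wait_for_new_target, the 20 attempts, and the 15-second interval are assumptions, and it reads only the first container's recommendation.

```shell
# Hypothetical helper: poll until the VPA's CPU target differs from a baseline value.
wait_for_new_target() {
  vpa=$1
  baseline=$2
  attempts=${3:-20}   # assumed default number of polls
  interval=${4:-15}   # assumed seconds between polls
  i=0
  while [ "$i" -lt "$attempts" ]; do
    target=$(kubectl get vpa "$vpa" \
      -o jsonpath='{.status.recommendation.containerRecommendations[0].target.cpu}')
    # Succeed once a non-empty target other than the baseline appears.
    if [ -n "$target" ] && [ "$target" != "$baseline" ]; then
      echo "$target"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1   # the target never moved within the allotted attempts
}

# Usage: wait_for_new_target flask-app 100m
```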


Summary

In this lab you have:

  1. Deployed a Flask application and observed its CPU usage.
  2. Created a VPA manifest to collect CPU recommendations.
  3. Inspected initial recommendations at the minimum setting.
  4. Generated workload to trigger higher CPU recommendations.

Feel free to switch updateMode to Auto and watch VPA adjust pod resource requests automatically.

