VPA CPU Lab
In this hands-on lab, you will:
- Deploy a Flask sample application on Kubernetes.
- Monitor CPU utilization using `kubectl top`.
- Create a Vertical Pod Autoscaler (VPA) manifest to gather CPU recommendations.
- Generate load and validate the VPA’s CPU recommendations.
Prerequisites
- A running Kubernetes cluster (v1.18+).
- Metrics Server installed for `kubectl top`.
- `kubectl` configured to target your cluster.
Note
Ensure the Metrics Server is deployed in your cluster so you can retrieve pod metrics.
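Both dependencies can be checked from the command line before starting. The sketch below assumes that the Metrics Server registers the `v1beta1.metrics.k8s.io` APIService and that the VPA components, which are an add-on rather than part of core Kubernetes, install the `verticalpodautoscalers.autoscaling.k8s.io` CRD. The function name `check_prereqs` is made up for this lab, and it only confirms that the objects exist, not that they are healthy.

```shell
# Hypothetical helper: verifies that the APIs this lab relies on are registered.
check_prereqs() {
  # Metrics Server serves the resource metrics API behind `kubectl top`.
  if ! kubectl get apiservice v1beta1.metrics.k8s.io >/dev/null 2>&1; then
    echo "metrics API not registered"
    return 1
  fi
  # The VerticalPodAutoscaler kind only exists once the VPA CRD is installed.
  if ! kubectl get crd verticalpodautoscalers.autoscaling.k8s.io >/dev/null 2>&1; then
    echo "VPA CRD not installed"
    return 1
  fi
  echo "prerequisites look OK"
}
```

After sourcing the snippet, run `check_prereqs`; a non-zero exit status means one of the two lookups failed.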
1. Deploy the Flask Sample Application
Apply the provided deployment manifest to launch the Flask app named `flask-app-4`:
kubectl apply -f flask-app-deployment.yaml
Once the pods are ready, verify CPU usage:
kubectl top pods
Expected output:
NAME CPU(cores) MEMORY(bytes)
flask-app-4-<pod-id> 1m <some-memory>
You should see the pod consuming around 1 mCPU, indicating minimal load.
2. Create the VPA Configuration
Next, define a VPA object that collects CPU recommendations without modifying the pods. Save this as `vpa-cpu.yml`:
    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: flask-app
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: flask-app-4
      updatePolicy:
        updateMode: "Off"  # Only recommendations; no automatic updates
      resourcePolicy:
        containerPolicies:
          - containerName: '*'  # Apply to all containers
            minAllowed:
              cpu: 100m
            maxAllowed:
              cpu: 1000m
            controlledResources: ["cpu"]
This manifest:
- Sets 100 mCPU as the minimum and 1000 mCPU (1 CPU) as the maximum.
- Restricts recommendations to CPU only.
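The effect of `minAllowed` and `maxAllowed` is a clamp: the recommender's raw estimate (reported as `uncappedTarget`) is forced into the allowed range to produce `target`. The function below is only an arithmetic illustration of that rule, not VPA's actual code. `clamp_cpu` is a hypothetical name, and all values are whole millicores.

```shell
# clamp_cpu RAW MIN MAX -> prints RAW limited to the range [MIN, MAX].
# The body runs in a subshell (note the parentheses) so `v` does not leak to the caller.
clamp_cpu() (
  v=$1
  if [ "$v" -lt "$2" ]; then v=$2; fi
  if [ "$v" -gt "$3" ]; then v=$3; fi
  echo "$v"
)

clamp_cpu 25 100 1000    # below the floor: prints 100
clamp_cpu 126 100 1000   # within bounds: prints 126
clamp_cpu 1500 100 1000  # above the ceiling: prints 1000
```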
Warning
Using `updateMode: "Off"` means your pods will not be resized automatically. Switch to `Auto` if you want VPA to apply changes.
You can compare the `updateMode` options:
updateMode | Description |
---|---|
Off | Only provide recommendations; pods are never modified |
Initial | Apply recommendations only when a pod is created; running pods are left alone |
Auto | Apply recommendations automatically, which may evict and recreate running pods |
For more details, see the VerticalPodAutoscaler API reference.
3. Apply the VPA Manifest
Create the VPA resource:
kubectl apply -f vpa-cpu.yml
You should see:
verticalpodautoscaler.autoscaling.k8s.io/flask-app created
4. Inspect Initial Recommendations
Before generating any load, fetch the current VPA status:
kubectl get vpa flask-app -o yaml
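Because the app is idle, its measured usage (about 1 mCPU) is below `minAllowed`, so the recommended target is raised to the minimum bound of 100 mCPU. If the status shows no recommendation yet, give the recommender a minute or two and check again.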
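When only the number matters, a JSONPath query avoids reading the full YAML. This sketch assumes the Deployment runs a single container, so index `0` is the one of interest; `vpa_target_cpu` is a hypothetical helper name.

```shell
# Prints the recommended CPU target for the first container of the named VPA.
# The output may be empty if the recommender has not populated the status yet.
vpa_target_cpu() {
  kubectl get vpa "$1" \
    -o jsonpath='{.status.recommendation.containerRecommendations[0].target.cpu}'
}
```

Calling `vpa_target_cpu flask-app` at this stage of the lab should print a value such as `100m`.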
5. Generate Load and Validate Recommendations
Use your preferred load-testing tool (e.g., `hey`, `ab`, `wrk`) to apply CPU pressure:
hey -z 30s -c 50 http://<service-ip>/
Once the load test completes, check the VPA again:
kubectl get vpa flask-app -o yaml
You should see something like:
    status:
      recommendation:
        containerRecommendations:
          - containerName: flask-app-4
            lowerBound:
              cpu: 100m
            target:
              cpu: 126m
            uncappedTarget:
              cpu: 126m
            upperBound:
              cpu: "1"
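Here, the VPA now recommends 126 mCPU, reflecting the increased CPU demand. Because 126 mCPU lies between `minAllowed` and `maxAllowed`, `target` and `uncappedTarget` are identical.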
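The status mixes two notations for the same unit: `126m` is 126 millicores, while `"1"` is one whole core, that is, 1000 millicores. A small conversion makes the values directly comparable. `to_millicores` is a hypothetical helper that handles only plain core counts and the `m` suffix, which covers the values seen in this lab.

```shell
# to_millicores QUANTITY -> prints the CPU quantity as an integer number of millicores.
to_millicores() {
  case "$1" in
    # Already in millicores: drop the trailing "m".
    *m) echo "${1%m}" ;;
    # Plain cores (e.g. 1 or 0.5): multiply by 1000 and round.
    *)  awk -v c="$1" 'BEGIN { printf "%.0f\n", c * 1000 }' ;;
  esac
}

to_millicores 126m   # target: prints 126
to_millicores 1      # upperBound: prints 1000
```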
Summary
In this lab you have:
- Deployed a Flask application and observed its CPU usage.
- Created a VPA manifest to collect CPU recommendations.
- Inspected initial recommendations at the minimum setting.
- Generated workload to trigger higher CPU recommendations.
Feel free to switch `updateMode` to `Auto` and watch VPA adjust pod resource requests automatically.
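One way to change the mode without editing the manifest is a merge patch against the existing object. This is a sketch: `set_vpa_mode` is a hypothetical wrapper, and it assumes the VPA lives in the current namespace. Keep in mind that in `Auto` mode the updater may evict pods so that they restart with the new requests.

```shell
# set_vpa_mode NAME MODE: sets spec.updatePolicy.updateMode on an existing VPA.
set_vpa_mode() {
  kubectl patch vpa "$1" --type merge \
    -p "{\"spec\":{\"updatePolicy\":{\"updateMode\":\"$2\"}}}"
}
```

For example, `set_vpa_mode flask-app Auto` enables automatic updates, and `set_vpa_mode flask-app Off` returns to recommendation-only mode.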
Links and References
- Kubernetes Vertical Pod Autoscaler
- Metrics Server on GitHub
- kubectl top – Resource Metrics
- Load Testing with hey