CKA Certification Course - Certified Kubernetes Administrator
Cluster Maintenance
Solution: OS Upgrades (Optional)
In this lesson, we walk through a cluster maintenance lab. We begin by exploring the current cluster environment and inspecting its nodes before proceeding to maintenance tasks on node01.
Checking Nodes and Deployments
First, verify the cluster nodes by setting an alias for kubectl and running the following commands:
root@controlplane:~# alias k=kubectl
root@controlplane:~# k get nodes
NAME STATUS ROLES AGE VERSION
controlplane Ready control-plane,master 12m v1.20.0
node01 Ready <none> 12m v1.20.0
root@controlplane:~#
There are two nodes in the cluster: the control plane and node01.
Next, check the deployments running in the default namespace:
root@controlplane:~# k get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
blue 3/3 3 3 7s
root@controlplane:~#
The output indicates that the "blue" deployment is running three pods.
Locating the Pods
To determine which nodes are hosting these application pods, use the wide option to list pods with additional details:
root@controlplane:~# k get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
blue-746c87566d-mp45f 1/1 Running 0 63s 10.244.0.5 controlplane <none>
blue-746c87566d-mp8t8 1/1 Running 0 63s 10.244.0.6 controlplane <none>
blue-746c87566d-vdx2b 1/1 Running 0 63s 10.244.0.4 controlplane <none>
root@controlplane:~#
All pods are currently running on the control plane node.
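If you prefer a narrower view, a custom-columns query (a standard kubectl output option, not used in the lab itself) shows just the pod-to-node mapping:
k get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName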
Draining Node01 for Maintenance
It is now time to take node01 out of scheduling temporarily for maintenance. Draining the node marks it as unschedulable and evicts its non-DaemonSet pods. Running the drain command without any additional flags fails because the node hosts DaemonSet-managed pods:
root@controlplane:~# k drain node01
node/node01 cordoned
error: unable to drain node "node01", aborting command...
There are pending nodes to be drained:
node01
error: cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/kube-flannel-ds-4k7hj, kube-system/kube-proxy-5hcnr
root@controlplane:~#
Tip
To drain the node successfully, add the --ignore-daemonsets flag. DaemonSet-managed pods cannot be evicted (their controller would immediately recreate them), so this flag tells kubectl drain to proceed while leaving them in place.
Running the command with the flag evicts the pods under the "blue" deployment:
root@controlplane:~# k drain node01 --ignore-daemonsets
node/node01 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-4k7hj, kube-system/kube-proxy-5hcw
evicting pod default/blue-746c87566d-vq7pq
evicting pod default/blue-746c87566d-bn5gc
evicting pod default/blue-746c87566d-t2tgs
After waiting for the eviction, confirm that all pods have been rescheduled on the control plane:
root@controlplane:~# k get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
blue-746c87566d-mp45f 1/1 Running 0 63s 10.244.0.5 controlplane <none>
blue-746c87566d-mp8t8 1/1 Running 0 63s 10.244.0.6 controlplane <none>
blue-746c87566d-vdx2b 1/1 Running 0 63s 10.244.0.4 controlplane <none>
root@controlplane:~#
Also, verify the node status for node01:
root@controlplane:~# k get nodes
NAME STATUS ROLES AGE VERSION
controlplane Ready control-plane,master 15m v1.20.0
node01 Ready,SchedulingDisabled <none> 14m v1.20.0
node01 now reports "Ready,SchedulingDisabled," confirming that it is cordoned and will not accept new pods.
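Under the hood, cordoning simply sets the node's spec.unschedulable field to true. As an optional check (not part of the lab), you can read that field directly with jsonpath:
k get node node01 -o jsonpath='{.spec.unschedulable}'
# prints "true" while the node is cordoned; empty or false once it is uncordoned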
Uncordoning Node01
To allow node01 to accept new pods, uncordon it. This command restores the node to a normal schedulable state:
root@controlplane:~# k get nodes
NAME STATUS ROLES AGE VERSION
controlplane Ready control-plane,master 15m v1.20.0
node01 Ready,SchedulingDisabled <none> 15m v1.20.0
root@controlplane:~# k uncordon node01
node "node01" uncordoned
root@controlplane:~# k get nodes
NAME STATUS ROLES AGE VERSION
controlplane Ready control-plane,master 15m v1.20.0
node01 Ready <none> 15m v1.20.0
Note that even after uncordoning, existing pods remain on the control plane because they were already rescheduled there. Confirm that node01 currently has zero pods:
root@controlplane:~# k get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
blue-746c87566d-mp45f 1/1 Running 0 2m14s 10.244.0.5 controlplane <none>
blue-746c87566d-mpkt8 1/1 Running 0 2m14s 10.244.0.6 controlplane <none>
blue-746c87566d-vdx2b 1/1 Running 0 2m14s 10.244.0.4 controlplane <none>
root@controlplane:~#
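To double-check that node01 holds no workload pods, you can filter pods by node with a field selector (a standard kubectl option, not shown in the lab):
k get pods -o wide --field-selector spec.nodeName=node01
# the only pods left on node01 are DaemonSet pods in kube-system, so this returns nothing in the default namespace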
Scheduling Considerations on the Control Plane
Typically, control plane nodes are tainted to prevent regular application pods from being scheduled on them. In this scenario, the control plane lacks such taints, which explains why pods remained there. To check for taints, inspect node details:
Taints: <none>
Unschedulable: false
Because the control plane accepts regular workloads, the evicted pods could land there, and since Kubernetes never moves running pods on its own, they did not migrate back to node01 after it was uncordoned.
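The taint information above comes from the node description. A quick way to pull just those fields is to filter the describe output:
k describe node controlplane | grep -i taints
k describe node controlplane | grep -i unschedulable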
Second Maintenance Window and the HR App Issue
In a subsequent maintenance window, another drain command is executed on node01:
root@controlplane:~# kubectl drain node01 --ignore-daemonsets
node/node01 cordoned
error: unable to drain node "node01", aborting command...
There are pending nodes to be drained:
node01
error: cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): default/hr-app
root@controlplane:~#
A new pod, "hr-app," is running on node01 and is not managed by any higher-level controller. Its details are:
root@controlplane:~# k get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
blue-746c87566d-mp45f 1/1 Running 0 5m34s 10.244.0.5 controlplane <none>
blue-746c87566d-mpkt8 1/1 Running 0 5m34s 10.244.0.6 controlplane <none>
blue-746c87566d-vdx2b 1/1 Running 0 5m34s 10.244.0.4 controlplane <none>
hr-app 1/1 Running 0 78s 10.244.1.5 node01 <none>
root@controlplane:~#
Because "hr-app" isn’t managed by a Deployment (or equivalent controller), draining node01 without forcing eviction is blocked to prevent accidental data loss.
Warning
Forcing the drain with --force permanently deletes such standalone pods along with any data stored only inside them, and nothing recreates them afterwards. Exercise caution when using this option.
If you force the drain, the pod is deleted as shown:
root@controlplane:~# kubectl drain node01 --ignore-daemonsets --force
WARNING: deleting Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: default/hr-app; ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-4k7hj, kube-system/kube-proxy-xy-5hcwr
evicting pod default/hr-app
pod/hr-app evicted
node/node01 evicted
root@controlplane:~#
To prevent such disruptions for critical applications, the HR app is redeployed as a Deployment. This ensures that it is managed by a ReplicaSet, allowing automatic rescheduling if pods are evicted. Verify the updated deployments:
root@controlplane:~# k get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
blue 3/3 3 3 9m36s
hr-app 1/1 1 1 17s
root@controlplane:~#
Confirm that pods are now scheduled correctly:
root@controlplane:~# k get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
blue-746c87566d-mp45f 1/1 Running 0 8m29s 10.244.0.5 controlplane <none>
blue-746c87566d-mpkt8 1/1 Running 0 8m29s 10.244.0.6 controlplane <none>
blue-746c87566d-vdx2b 1/1 Running 0 8m29s 10.244.0.4 controlplane <none>
hr-app-764d475c57d-9vmts 1/1 Running 0 24s 10.244.1.6 node01 <none>
root@controlplane:~#
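For reference, a minimal Deployment manifest equivalent to the standalone hr-app pod might look like the sketch below; the image name and labels are assumptions, so reuse whatever the original hr-app pod specified.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hr-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hr-app
  template:
    metadata:
      labels:
        app: hr-app
    spec:
      containers:
        - name: hr-app
          image: hr-app:latest   # placeholder; use the image from the original hr-app pod
Applying a manifest like this with kubectl apply gives the pod a ReplicaSet owner, so it tolerates eviction during future drains.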
Preventing Future Scheduling on Node01
Because node01 is scheduled for maintenance and hosts the critical HR app, it is important to prevent new pods from being scheduled on it unintentionally. Although the HR app is now managed by a Deployment, new pods could still be scheduled on node01 if it remains in a "Ready" state.
To prevent this, mark node01 as unschedulable (cordon it) without affecting already running pods:
root@controlplane:~# k get nodes
NAME STATUS ROLES AGE VERSION
controlplane Ready control-plane,master 23m v1.20.0
node01 Ready <none> 22m v1.20.0
root@controlplane:~# k cordon node01
node/node01 cordoned
root@controlplane:~# k get nodes
NAME STATUS ROLES AGE VERSION
controlplane Ready control-plane,master 23m v1.20.0
node01 Ready,SchedulingDisabled <none> 22m v1.20.0
This action ensures that while the HR app continues running on node01, no new pods will be scheduled until maintenance is complete.
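Once the maintenance on node01 is finished, bring it back into rotation with the same command used earlier: k uncordon node01.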
This concludes the lab on cluster maintenance, covering topics such as draining nodes, uncordoning, and managing critical applications during maintenance windows.
Tables and References
Below is a summary table of key cluster commands:
| Command | Description | Example Output Snippet |
|---|---|---|
| kubectl get nodes | Lists all nodes in the cluster | controlplane Ready control-plane,master |
| kubectl cordon <node> | Marks a node unschedulable without evicting its pods | node/node01 cordoned |
| kubectl drain <node> --ignore-daemonsets | Cordons a node and evicts its non-DaemonSet pods | node/node01 cordoned |
| kubectl uncordon <node> | Marks a node as schedulable again | node "node01" uncordoned |
For more information on node maintenance, draining, and cordoning, refer to the official Kubernetes documentation on safely draining a node.