This article covers cluster maintenance tasks including checking nodes, draining, uncordoning, and managing applications during maintenance.
In this lesson, we walk through a cluster maintenance lab. We begin by exploring the current cluster environment and inspecting its nodes before proceeding to maintenance tasks on node01.
First, verify the cluster nodes by setting an alias for kubectl and running the following commands:
root@controlplane:~# alias k=kubectl
root@controlplane:~# k get nodes
NAME           STATUS   ROLES                  AGE   VERSION
controlplane   Ready    control-plane,master   12m   v1.20.0
node01         Ready    <none>                 12m   v1.20.0
root@controlplane:~#
There are two nodes in the cluster: the control plane and node01.
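As a side note, if you want tab completion to keep working with the short alias, the kubectl documentation suggests registering the alias with bash completion (this assumes kubectl's bash completion has already been sourced in the shell):

# Make bash completion work for the "k" alias
# (requires: source <(kubectl completion bash) earlier in the session)
complete -o default -F __start_kubectl k

Next, check the deployments running in the default namespace: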
root@controlplane:~# k get deploy
NAME   READY   UP-TO-DATE   AVAILABLE   AGE
blue   3/3     3            3           7s
root@controlplane:~#
The output shows that the “blue” deployment has three replicas, all of them ready and available.
It is now time to temporarily remove node01 from scheduling for maintenance. Draining a node marks it unschedulable and evicts its non-DaemonSet pods. Running the drain command without additional flags fails, because the node hosts DaemonSet-managed pods:
root@controlplane:~# k drain node01
node/node01 cordoned
error: unable to drain node "node01", aborting command...

There are pending nodes to be drained:
 node01
error: cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/kube-flannel-ds-4k7hj, kube-system/kube-proxy-5hcnr
root@controlplane:~#
To drain the node successfully, use the --ignore-daemonsets flag. This tells kubectl to proceed while leaving DaemonSet-managed pods in place; evicting them would be pointless, since the DaemonSet controller would immediately recreate them on the node.
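As an aside, in real maintenance windows it is common to bound the drain so that it fails fast rather than hanging on a slow eviction. A sketch with two additional flags (the values shown are arbitrary examples, not part of this lab):

# Allow each pod up to 60s to terminate, and abort if the drain exceeds two minutes
k drain node01 --ignore-daemonsets --grace-period=60 --timeout=120s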
Running the command with the flag evicts the pods under the “blue” deployment:
root@controlplane:~# k drain node01 --ignore-daemonsets
node/node01 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-4k7hj, kube-system/kube-proxy-5hcw
evicting pod default/blue-746c87566d-vq7pq
evicting pod default/blue-746c87566d-bn5gc
evicting pod default/blue-746c87566d-t2tgs
Once the evictions complete, confirm that all three pods have been rescheduled onto the control plane:
root@controlplane:~# k get pods -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE
blue-746c87566d-mp45f   1/1     Running   0          63s   10.244.0.5   controlplane   <none>
blue-746c87566d-mp8t8   1/1     Running   0          63s   10.244.0.6   controlplane   <none>
blue-746c87566d-vdx2b   1/1     Running   0          63s   10.244.0.4   controlplane   <none>
root@controlplane:~#
Also, verify the node status for node01:
root@controlplane:~# k get nodes
NAME           STATUS                     ROLES                  AGE   VERSION
controlplane   Ready                      control-plane,master   15m   v1.20.0
node01         Ready,SchedulingDisabled   <none>                 14m   v1.20.0
To allow node01 to accept new pods, uncordon it. This command restores the node to a normal schedulable state:
root@controlplane:~# k get nodes
NAME           STATUS                     ROLES                  AGE   VERSION
controlplane   Ready                      control-plane,master   15m   v1.20.0
node01         Ready,SchedulingDisabled   <none>                 15m   v1.20.0
root@controlplane:~# k uncordon node01
node/node01 uncordoned
root@controlplane:~# k get nodes
NAME           STATUS   ROLES                  AGE   VERSION
controlplane   Ready    control-plane,master   15m   v1.20.0
node01         Ready    <none>                 15m   v1.20.0
Note that even after uncordoning, the existing pods remain on the control plane: the scheduler only places new pods and never moves running ones. Confirm that node01 currently has no application pods:
root@controlplane:~# k get pods -o wide
NAME                    READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE
blue-746c87566d-mp45f   1/1     Running   0          2m14s   10.244.0.5   controlplane   <none>
blue-746c87566d-mpkt8   1/1     Running   0          2m14s   10.244.0.6   controlplane   <none>
blue-746c87566d-vdx2b   1/1     Running   0          2m14s   10.244.0.4   controlplane   <none>
root@controlplane:~#
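Keep in mind that k get pods without a namespace flag only covers the default namespace; the DaemonSet pods (kube-flannel, kube-proxy) are still running on node01. To list everything bound to node01 across all namespaces, a field selector can be used (a verification sketch, not part of the lab transcript):

# List every pod scheduled onto node01, in all namespaces
k get pods --all-namespaces -o wide --field-selector spec.nodeName=node01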
Typically, control plane nodes are tainted to prevent regular application pods from being scheduled on them. In this scenario, the control plane has no such taint, which is why the blue pods could land there. To check for taints, inspect the node's details with kubectl describe node controlplane; the relevant fields are:
Taints:             <none>
Unschedulable:      false
Because running pods are never rescheduled automatically, the blue pods will not migrate back to node01 after it is uncordoned; they stay on the untainted control plane until they are deleted or evicted.
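If you want to restore the conventional behavior, where application pods avoid the control plane, the standard control plane taint can be re-applied. A sketch, assuming the kubeadm taint key of this cluster's era (v1.20 used node-role.kubernetes.io/master; newer releases use node-role.kubernetes.io/control-plane):

# Inspect the current taints on the control plane node
k describe node controlplane | grep Taints

# Re-apply the standard NoSchedule taint so new application pods avoid this node
k taint nodes controlplane node-role.kubernetes.io/master=:NoSchedule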
In a subsequent maintenance window, another drain command is executed on node01:
root@controlplane:~# kubectl drain node01 --ignore-daemonsets
node/node01 cordoned
error: unable to drain node "node01", aborting command...

There are pending nodes to be drained:
 node01
error: cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): default/hr-app
root@controlplane:~#
A new pod, “hr-app,” is running on node01 and is not controlled by a higher-level controller. Its details are:
root@controlplane:~# k get pods -o wide
NAME                    READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE
blue-746c87566d-mp45f   1/1     Running   0          5m34s   10.244.0.5   controlplane   <none>
blue-746c87566d-mpkt8   1/1     Running   0          5m34s   10.244.0.6   controlplane   <none>
blue-746c87566d-vdx2b   1/1     Running   0          5m34s   10.244.0.4   controlplane   <none>
hr-app                  1/1     Running   0          78s     10.244.1.5   node01         <none>
root@controlplane:~#
Because “hr-app” isn’t managed by a Deployment (or an equivalent controller), nothing would recreate it after an eviction, so the drain is blocked to prevent accidental data loss.
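You can verify this from the pod's metadata: a controller-managed pod carries an ownerReferences entry pointing at its ReplicaSet (or other controller), while a bare pod has none. A quick check (a sketch, not part of the lab transcript):

# Empty output means no controller owns this pod,
# which is why drain refuses to evict it without --force
k get pod hr-app -o jsonpath='{.metadata.ownerReferences}'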
Forcing the drain with --force permanently deletes such pods, along with any data stored only inside them. Exercise caution when using this option.
If you force the drain, the pod is deleted as shown:
root@controlplane:~# kubectl drain node01 --ignore-daemonsets --force
WARNING: deleting Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: default/hr-app; ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-4k7hj, kube-system/kube-proxy-5hcwr
evicting pod default/hr-app
pod/hr-app evicted
node/node01 drained
root@controlplane:~#
To prevent such disruptions for critical applications, the HR app is redeployed as a Deployment. This ensures that it is managed by a ReplicaSet, which automatically recreates its pods if they are evicted.
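The lab does not show how the redeployment was performed; a minimal imperative sketch, with a placeholder image name, might look like this:

# Recreate hr-app as a Deployment so a ReplicaSet replaces evicted pods
k create deployment hr-app --image=<hr-app-image> --replicas=1

Verify the updated deployments: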
root@controlplane:~# k get deploy
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
blue     3/3     3            3           9m36s
hr-app   1/1     1            1           17s
root@controlplane:~#
Confirm that pods are now scheduled correctly:
root@controlplane:~# k get pods -o wide
NAME                       READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE
blue-746c87566d-mp45f      1/1     Running   0          8m29s   10.244.0.5   controlplane   <none>
blue-746c87566d-mpkt8      1/1     Running   0          8m29s   10.244.0.6   controlplane   <none>
blue-746c87566d-vdx2b      1/1     Running   0          8m29s   10.244.0.4   controlplane   <none>
hr-app-764d475c57d-9vmts   1/1     Running   0          24s     10.244.1.6   node01         <none>
root@controlplane:~#
Because node01 is scheduled for maintenance and hosts the critical HR app, it is important to prevent new pods from being scheduled on it unintentionally. Although the HR app is now managed by a Deployment, new pods could still land on node01 while it remains schedulable. To prevent this, mark node01 as unschedulable (cordon it) without affecting the pods already running on it:
root@controlplane:~# k get nodes
NAME           STATUS   ROLES                  AGE   VERSION
controlplane   Ready    control-plane,master   23m   v1.20.0
node01         Ready    <none>                 22m   v1.20.0
root@controlplane:~# k cordon node01
node/node01 cordoned
root@controlplane:~# k get nodes
NAME           STATUS                     ROLES                  AGE   VERSION
controlplane   Ready                      control-plane,master   23m   v1.20.0
node01         Ready,SchedulingDisabled   <none>                 22m   v1.20.0
This ensures that while the HR app continues running on node01, no new pods will be scheduled there until maintenance is complete. This concludes the lab on cluster maintenance, covering draining nodes, uncordoning, and managing critical applications during maintenance windows.
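For quick reference, the three node-maintenance commands used throughout this lab:

k cordon node01                       # mark unschedulable; running pods stay put
k drain node01 --ignore-daemonsets    # cordon + evict non-DaemonSet pods
k uncordon node01                     # make the node schedulable again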