CKA Certification Course - Certified Kubernetes Administrator
Troubleshooting
Control Plane Failure
In this guide, we explore effective methods for troubleshooting control plane failures in your Kubernetes cluster. The process begins with checking the health of the nodes, followed by verifying the status of control plane components—whether they are deployed as pods or native services—and finally, reviewing detailed logs to identify any issues.
1. Verify Cluster Node Health
Begin by ensuring all cluster nodes are operational. Execute the command below to list the status of each node:
kubectl get nodes
This step helps quickly identify any node-related issues that might impact the control plane.
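If a node reports a NotReady status, inspect its conditions and recent events for more detail. The node name below is a placeholder; substitute a node name from your own cluster:
kubectl describe node node01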
2. Confirm Control Plane Component Status
Control plane components can be deployed either as pods (common with kubeadm setups) or as native system services. Depending on your configuration, follow the appropriate checks.
Note
If your control plane components are deployed as pods in the kube-system namespace, ensure each pod is healthy. For configurations using native services, verify that the Kubernetes API server, controller manager, scheduler, and the kube-proxy service (on worker nodes) are running properly.
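On kubeadm-based clusters, a quick way to perform this check is to list the pods in the kube-system namespace and confirm that the API server, controller manager, scheduler, and etcd pods all report a Running status:
kubectl get pods -n kube-system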
a. Checking the Kubernetes API Server
For systems using native services, inspect the API server status with:
service kube-apiserver status
● kube-apiserver.service - Kubernetes API Server
Loaded: loaded (/etc/systemd/system/kube-apiserver.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2019-03-20 07:57:25 UTC; 1 weeks 1 days ago
Docs: https://github.com/kubernetes/kubernetes
Main PID: 15767 (kube-apiserver)
Tasks: 13 (limit: 2362)
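On distributions that use systemd, the same information is available through systemctl (assuming the service is installed under the unit name kube-apiserver, as shown in the output above):
systemctl status kube-apiserver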
b. Checking the Controller Manager
Next, verify that the kube-controller-manager is up and running:
service kube-controller-manager status
● kube-controller-manager.service - Kubernetes Controller Manager
Loaded: loaded (/etc/systemd/system/kube-controller-manager.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2019-03-20 07:57:25 UTC; 1 weeks 1 days ago
Docs: https://github.com/kubernetes/kubernetes
Main PID: 15771 (kube-controller)
Tasks: 10 (limit: 2362)
c. Checking the Scheduler
Also, verify the status of the kube-scheduler service by running:
service kube-scheduler status
Confirm that the service reports an active (running) state, just as with the API server and controller manager.
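On worker nodes, apply the same check to the node-level services mentioned earlier. For example, if kube-proxy runs as a native service rather than as a pod, verify it with:
service kube-proxy status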
3. Review Control Plane Logs
After confirming the service statuses, proceed to review the logs for deeper insights.
a. Retrieving Pod Logs (When Using kubeadm)
If the control plane components are deployed as pods, fetch the logs for the API server from the kube-system namespace:
kubectl logs kube-apiserver-master -n kube-system
An example excerpt from the logs might appear as follows:
I0401 13:45:38.190735 1 server.go:703] external host was not specified, using 172.17.0.117
I0401 13:45:38.194290 1 server.go:145] Version: v1.11.3
I0401 13:45:38.819075 1 plugins.go:158] Loaded 8 mutating admission controller(s) successfully in the following order:
NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,Priority,DefaultTolerationSeconds,DefaultStorageClass,MutatingAdmissionWebhook.
...
W0401 13:45:41.381736 1 genericapiserver.go:319] Skipping API scheduling.k8s.io/v1alpha1 because it has no resources.
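If the API server pod itself fails to start, kubectl logs may not work, since the command depends on the API server being reachable. In that case you can read the container logs directly on the control plane node through the container runtime; the example below assumes a containerd-based node with crictl available, and the container ID is a placeholder:
crictl ps -a | grep kube-apiserver
crictl logs <container-id>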
b. Using the Journal for Native Service Logs
For control plane components configured as native services on the master nodes, use the journalctl utility to view the logs:
sudo journalctl -u kube-apiserver
This command shows the logs specific to the kube-apiserver service, helping you pinpoint where the failure is occurring.
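To narrow down a large journal, you can limit the output to a recent time window or follow new entries as they are written; both are standard journalctl options:
sudo journalctl -u kube-apiserver --since "1 hour ago"
sudo journalctl -u kube-apiserver -f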
4. Additional Resources
For further details and advanced troubleshooting techniques, refer to the official Kubernetes Documentation.
By following these steps, you can systematically diagnose and address control plane issues within your Kubernetes cluster, ensuring a stable and resilient environment.