Kubernetes Troubleshooting for Application Developers
Troubleshooting Scenarios
Case of the Missing Pods
In this guide, we explore a common Kubernetes issue—missing pods. You will learn how to deploy applications in the staging namespace and troubleshoot why the expected pods fail to start.
Deploying the API Application
We begin by creating a deployment for a simple web application named "api" in the staging namespace. The deployment manifest specifies that five replicas should run.
Before applying the new deployment, inspect the current resources in the staging namespace. At this point, only one deployment, "data-processor", exists, with three running pods:
controlplane ~ ➜ cat api-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: staging
spec:
  replicas: 5
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: kodekloud/webapp-color
        ports:
        - containerPort: 8080
controlplane ~ ➜ k get all -n staging
NAME                                READY   STATUS    RESTARTS   AGE
pod/data-processor-75597df6-6kkst   1/1     Running   0          2m41s
pod/data-processor-75597df6-bzd1q   1/1     Running   0          2m41s
pod/data-processor-75597df6-gnthx   1/1     Running   0          2m41s

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/data-processor   3/3     3            3           2m41s

NAME                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/data-processor-75597df6   3         3         3       2m41s
Apply the API deployment with the following command:
controlplane ~ ➜ k apply -f api-deployment.yml
deployment.apps/api created
Monitor the pods as they start up:
controlplane ~ ➜ k get pods -n staging --watch
NAME                            READY   STATUS    RESTARTS   AGE
api-7548899bdb-7nbgb            1/1     Running   0          12s
api-7548899bdb-hmqkz            1/1     Running   0          12s
data-processor-75597df6-6kkst   1/1     Running   0          3m10s
data-processor-75597df6-bzd1q   1/1     Running   0          3m10s
data-processor-75597df6-gnthx   1/1     Running   0          3m10s
Even though the API deployment requests five replicas, only two pods are running. None of the pods are stuck in Pending or ContainerCreating, which rules out node resource shortages or taints as the cause. The deployment is still attempting to create the remaining pods, but three replicas stay unavailable.
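The gap the deployment is trying to close is simple arithmetic. As a minimal sketch, the values below are hard-coded from the listing above; on a live cluster they would come from `kubectl get deployment -n staging api -o jsonpath='{.spec.replicas} {.status.availableReplicas}'`:

```shell
# Values taken from the listing above (assumed, not queried live)
desired=5
available=2
echo "unavailable replicas: $((desired - available))"
# prints: unavailable replicas: 3
```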
Investigating the Deployment Status
Gather more details by describing the deployment:
controlplane ~ ➜ k describe deployment -n staging api
Name:                   api
Namespace:              staging
CreationTimestamp:      Sat, 18 May 2024 00:08:41 +0000
Labels:                 <none>
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=api
Replicas:               5 desired | 2 updated | 2 total | 2 available | 3 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=api
  Containers:
   api:
    Image:      kodekloud/webapp-color
    Port:       8080/TCP
    Host Port:  0/TCP
Conditions:
  Type            Status  Reason
  ----            ------  ------
  Available       False   MinimumReplicasUnavailable
  ReplicaFailure  True    FailedCreate
  Progressing     True    ReplicaSetUpdated
OldReplicaSets:  <none>
NewReplicaSet:   api-7548899bdb (2/5 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  85s   deployment-controller  Scaled up replica set api-7548899bdb to 5
The output shows that the deployment controller scaled the replica set to five, and the ReplicaFailure condition reports FailedCreate. Inspecting the namespace events (for example, with `k get events -n staging`) surfaces the underlying resource-quota error:
2m31s Warning FailedCreate replicaset/api-7548899bdb Error creating: pods "api-7548899bdb-n4k9j" is forbidden: exceeded quota: pod-quota, requested: pods=1, used: 5, limited: pods=5
This error indicates that the "pod-quota" resource quota is capping the number of pods in the staging namespace at five. Since there are already five pods running (including those from the data processor deployment), the API deployment cannot create additional pods.
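The arithmetic behind the error message can be sketched directly. The `used`, `hard`, and `requested` figures below are the ones reported in the FailedCreate event, hard-coded here for illustration:

```shell
# Quota figures from the FailedCreate event (assumed values)
hard=5        # the quota's hard limit on pods
used=5        # pods already running in the namespace
requested=1   # each pod creation requests one unit
if [ $((used + requested)) -gt "$hard" ]; then
  echo "forbidden: exceeded quota (used: $used, limited: $hard)"
fi
# prints: forbidden: exceeded quota (used: 5, limited: 5)
```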
Adjusting the Namespace Resource Quota
First, confirm the existing resource quota for the staging namespace:
controlplane ~ ➜ k describe ns staging
Name:         staging
Labels:       kubernetes.io/metadata.name=staging
Annotations:  <none>
Status:       Active

Resource Quotas
  Name:     pod-quota
  Resource  Used  Hard
  --------  ---   ---
  pods      5     5
Tip
To support additional deployments, consider increasing the pod quota in the namespace.
Edit the resource quota to raise the hard limit—for example, to 10 pods:
controlplane ~ ➜ k edit resourcequota -n staging pod-quota
Update the manifest as shown below:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: pod-quota
  namespace: staging
spec:
  hard:
    pods: "10"
status:
  hard:
    pods: "5"
  used:
    pods: "5"
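As a quick sanity check that the new limit actually accommodates both workloads, the replica counts below are the ones from the two manifests in this guide:

```shell
# Desired replicas from the manifests in this guide (assumed values)
data_processor=3
api=5
quota=10
total=$((data_processor + api))
echo "total desired: $total, quota: $quota, headroom: $((quota - total))"
# prints: total desired: 8, quota: 10, headroom: 2
```

The headroom of 2 also leaves room for the single-replica analytics deployment introduced later.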
After updating the quota, the ReplicaSet controller would eventually retry the failed pod creations on its own; to trigger reconciliation immediately, restart the API deployment:
controlplane ~ ➜ k rollout restart deployment -n staging api
Verify that new pods are created by listing them:
controlplane ~ ➜ k get pods -n staging
NAME                            READY   STATUS        RESTARTS   AGE
api-7548899bdb-7nbgb            1/1     Running       0          5m10s
api-7548899bdb-hmqkz            1/1     Terminating   0          5m10s
api-7548899bdb-srcvg            1/1     Terminating   0          7s
api-7548899bdb-t88x7            1/1     Terminating   0          7s
api-79744dcd-4b4j2              1/1     Running       0          7s
api-79744dcd-fgktm              1/1     Running       0          7s
api-79744dcd-fgtmg              1/1     Running       0          7s
data-processor-75597df6-6kkst   1/1     Running       0          8m8s
data-processor-75597df6-bzd1q   1/1     Running       0          8m8s
data-processor-75597df6-gnthx   1/1     Running       0          8m8s
The older pods are terminating as the new ones come up, and eventually, five API pods are running.
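The mix of Running and Terminating pods follows from the deployment's RollingUpdate strategy (25% max surge, 25% max unavailable). Kubernetes rounds maxSurge up and maxUnavailable down; a minimal sketch of that rounding for five replicas:

```shell
replicas=5
pct=25
# maxSurge rounds up; integer ceiling via (a*b + 99) / 100
max_surge=$(( (replicas * pct + 99) / 100 ))
# maxUnavailable rounds down; plain integer division
max_unavailable=$(( replicas * pct / 100 ))
echo "at most $((replicas + max_surge)) pods, at least $((replicas - max_unavailable)) available"
# prints: at most 7 pods, at least 4 available
```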
Deploying the Analytics Application
Next, deploy another web application, "analytics," with a single replica. With the updated namespace quota, this deployment should not encounter quota issues. The deployment manifest is as follows:
controlplane ~ ➜ cat analytics-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: analytics
  namespace: staging
spec:
  replicas: 1
  selector:
    matchLabels:
      name: web-dashboard
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: web-dashboard
    spec:
      serviceAccountName: analytics-service-account
      containers:
      - name: web-dashboard
        image: kodekloud/webapp-contest
        imagePullPolicy: Always
Apply the deployment:
controlplane ~ ➜ k apply -f analytics-deployment.yml
deployment.apps/analytics created
Monitor the pods:
controlplane ~ ➜ k get pods -n staging --watch
NAME                            READY   STATUS    RESTARTS   AGE
api-79744dcd-4b4j2              1/1     Running   0          3m44s
api-79744dcd-fttpj              1/1     Running   0          3m3s
api-79744dcd-ktmzk              1/1     Running   0          3m44s
api-79744dcd-tgmzr              1/1     Running   0          3m44s
api-79744dcd-v7rph              1/1     Running   0          3m3s
data-processor-75597df6-6kkst   1/1     Running   0          11m
data-processor-75597df6-bzd1q   1/1     Running   0          11m
data-processor-75597df6-gnthx   1/1     Running   0          11m
However, the analytics pod does not appear to be created. Describing the deployment indicates that the desired replica is unavailable:
controlplane ~ ➜ k describe deployments.apps -n staging analytics
Name:                   analytics
Namespace:              staging
CreationTimestamp:      Sat, 18 May 2024 00:17:16 +0000
Labels:                 <none>
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               name=web-dashboard
Replicas:               1 desired | 0 updated | 0 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           name=web-dashboard
  Service Account:  analytics-service-account
  Containers:
   web-dashboard:
    Image:  kodekloud/webapp-contest
Conditions:
  Type            Status  Reason
  ----            ------  ------
  Progressing     True    NewReplicaSetCreated
  Available       False   MinimumReplicasUnavailable
  ReplicaFailure  True    FailedCreate
OldReplicaSets:  <none>
NewReplicaSet:   analytics-7dd875747b (0/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  57s   deployment-controller  Scaled up replica set analytics-7dd875747b to 1
Event logs further indicate an error related to the service account:
... Error creating: pods "analytics-7dd875747b-" is forbidden: error looking up service account staging/analytics-service-account: ...
Examine the service accounts in the staging namespace:
controlplane ~ ➜ k get sa -n staging
NAME      SECRETS   AGE
default   0         13m
Since the service account "analytics-service-account" is missing, create it using the following command:
controlplane ~ ➜ k create sa analytics-service-account -n staging
serviceaccount/analytics-service-account created
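A defensive pattern for this class of failure is to verify the service account before deploying. The sketch below hard-codes the existing accounts to match the listing above; a live script would instead read them from `kubectl get sa -n staging -o name`:

```shell
# Simulated output of `kubectl get sa -n staging -o name` (assumed: only
# the default service account exists, as in the listing above)
existing="serviceaccount/default"
needed="analytics-service-account"
if printf '%s\n' "$existing" | grep -qx "serviceaccount/$needed"; then
  echo "found: $needed"
else
  # a live script would run: kubectl create sa "$needed" -n staging
  echo "missing: $needed"
fi
# prints: missing: analytics-service-account
```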
Restart the analytics deployment so the controller retries pod creation with the now-existing service account:
controlplane ~ ➜ k rollout restart deployment -n staging analytics
deployment.apps/analytics restarted
Check the pods again to verify that the analytics pod is created:
controlplane ~ ➜ k get pods -n staging
NAME                            READY   STATUS              RESTARTS   AGE
analytics-5cfbc5fc7b-lr8wd      0/1     ContainerCreating   0          6s
api-79744dcd-4b4j2              1/1     Running             0          11m
api-79744dcd-fttpj              1/1     Running             0          11m
api-79744dcd-ktmzk              1/1     Running             0          11m
api-79744dcd-tgmzr              1/1     Running             0          11m
api-79744dcd-v7rph              1/1     Running             0          11m
data-processor-75597df6-6kkst   1/1     Running             0          19m
data-processor-75597df6-bzd1q   1/1     Running             0          19m
data-processor-75597df6-gnthx   1/1     Running             0          19m
After a short period, the analytics pod transitions from ContainerCreating to Running.
Conclusion
In this guide, we addressed two common issues that can lead to missing pods in a Kubernetes cluster:
- A resource quota that restricts the creation of new pods in a namespace.
- A missing dependency—in this case, a required service account.
Key Takeaways
- Check resource quotas imposed on the namespace if pods are not being created as expected.
- Verify that all required service accounts and other dependencies are present.
- Use "kubectl describe" to access detailed event logs and error messages.
By increasing the pod quota and creating the missing service account, the deployments functioned as intended, ensuring proper pod creation in the staging namespace.
For more in-depth Kubernetes troubleshooting, consider reviewing the Kubernetes Documentation for additional best practices.