Kubernetes Troubleshooting for Application Developers
Troubleshooting Scenarios
Create Container Errors
In this guide, we explore common Kubernetes container creation errors—specifically, CreateContainerConfigError and CreateContainerError. We will walk through the Kubernetes container lifecycle, highlighting the stages—from image pulling to container startup—and identify where these errors occur.
Below is an initial pod status output displaying both errors:
NAME                                READY  STATUS                      RESTARTS  CPU  MEM  IP             NODE
demo-a-deployment-75bf694876-2rxs4  0/1    CreateContainerConfigError  0         0    0    10.244.192.23  node01
demo-b-deployment-f959db884-kvvpc   0/1    CreateContainerError        0         0    0    10.244.192.20  node01
An extended output with additional flags looks as follows:
NAME                                PF  READY  STATUS                      RESTARTS  CPU  MEM  %CPU/R  %CPU/L  %MEM/R  %MEM/L  IP             NODE
demo-a-deployment-75bf694876-2rxs4  ●   0/1    CreateContainerConfigError  0         0    0    n/a     n/a     n/a     n/a     10.244.192.23  node01
demo-b-deployment-f959db884-kvvpc   ●   0/1    CreateContainerError        0         0    0    n/a     n/a     n/a     n/a     10.244.192.20  node01
Container Creation Lifecycle in Kubernetes
Understanding the container lifecycle is crucial for troubleshooting these issues. The container creation process is divided into four main stages:
Image Pulling
Kubernetes pulls the container image (e.g., nginx:latest) from the image registry to the node. With correct registry credentials, this step completes successfully.
Container Configuration
Kubernetes creates the container configuration by setting environment variables, command arguments, resource limits, volume mounts, network settings, security contexts, and more.
Note: A CreateContainerConfigError is raised if required configurations (such as a referenced Secret or ConfigMap) are missing.
Container Creation
The container runtime (e.g., containerd or Docker) creates the container using the pulled image. This involves establishing the filesystem and Linux namespaces.
Warning: A CreateContainerError is typically due to issues encountered by the container runtime while setting up the container.
Container Start
Finally, the container's process starts by executing the defined command or entry point. Errors at this stage are reported as RunContainerError and signal a problem with process startup.
The CreateContainerConfigError usually points to misconfigurations such as referencing a Secret or a ConfigMap that doesn't exist, while the CreateContainerError suggests that the container runtime could not complete the container creation.
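To tie the four stages to concrete fields, the annotated Pod manifest below is a minimal sketch and is not taken from the demo deployments. The Pod name lifecycle-demo is hypothetical, the image, Secret name, and key mirror the demo A example, and the command is an assumption about what a working nginx container would run.

```yaml
# Illustrative sketch only; the Pod name is hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: example-container
    # Stage 1 (image pulling): the kubelet pulls this image onto the node.
    image: nginx:latest
    # Stage 2 (container configuration): a reference to a Secret or ConfigMap
    # that does not exist surfaces here as CreateContainerConfigError.
    env:
    - name: KEY_A
      valueFrom:
        secretKeyRef:
          name: demo-a-secret
          key: some_key
    # Stage 3 (container creation): if neither the image nor the Pod defines a
    # command, the runtime cannot build the container (CreateContainerError).
    # Stage 4 (container start): if the command names an executable that is
    # missing from the image, startup fails (RunContainerError).
    command: ["nginx", "-g", "daemon off;"]  # assumed valid for the nginx image
```

Reading a failing Pod against these comments is one way to decide which part of the spec to inspect first.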
Troubleshooting CreateContainerConfigError
Let’s begin with demo A, which encounters a CreateContainerConfigError. The pod description for demo A shows the error in the container state:
Describe(default/demo-a-deployment-75bf694876-2rxs4)
Annotations:
  pod-template-hash: 75bf694876
Status: Pending
IP: 10.244.192.23
IPs:
  IP: 10.244.192.23
Controlled By: ReplicaSet/demo-a-deployment-75bf694876
Containers:
  example-container:
    Container ID:
    Image: nginx:latest
    Image ID:
    Port: <none>
    Host Port: <none>
    State: Waiting
      Reason: CreateContainerConfigError
    Ready: False
    Restart Count: 0
    Environment:
      KEY_A: <set to the key 'some_key' in secret 'demo-a-secret'> Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-trh8x (ro)
Conditions:
  Type: PodReadyToStartContainers
  Status: True
Reviewing the Events section reveals the root cause:
Describe(default/demo-a-deployment-75bf694876-2rxs4)
Events:
  Type     Reason     Age    From               Message
  ----     ------     ----   ----               -------
  Normal   Scheduled  7m27s  default-scheduler  Successfully assigned default/demo-a-deployment-75bf694876-2rxs4 to node01
  Normal   Pulled     7m21s  kubelet            Successfully pulled image "nginx:latest" in 2.973s (4.377s including waiting)
  ...
  Warning  Failed     5m59s  kubelet            Error: secret "demo-a-secret" not found
  ...
  Normal   Pulling    2m17s  kubelet            Pulling image "nginx:latest"
The pod references a Secret named demo-a-secret that does not exist in the cluster. The environment variable KEY_A is configured to take its value from this Secret (key: some_key). Without the Secret, Kubernetes cannot inject the environment variable, which leads to the CreateContainerConfigError.
Below is the relevant portion of the pod’s YAML configuration:
  name: demo-a-deployment-75bf694876-2rxs4
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: demo-a-deployment-75bf694876
    uid: bfb8961f-93a0-46b1-8a0b-30167dd9017
  resourceVersion: "1772"
  uid: d2547de8-626c-4c91-8a37-5f4106463bf6
spec:
  containers:
  - env:
    - name: KEY_A
      valueFrom:
        secretKeyRef:
          key: some_key
          name: demo-a-secret
    image: nginx:latest
    imagePullPolicy: Always
    name: example-container
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
To resolve this error, create the missing Secret. In the session below it is applied from a manifest file named missing-secret.yaml (k appears to be a shell alias for kubectl):
controlplane ~ ➜ k9s
controlplane ~ ➜ k apply -f missing-secret.yaml
secret/demo-a-secret created
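The contents of missing-secret.yaml are not shown here. A minimal sketch of what such a manifest could look like follows; the Secret name, namespace, and key come from the pod spec above, while the value is a placeholder assumption and should be replaced with the real one.

```yaml
# Sketch of missing-secret.yaml; the value below is a placeholder assumption.
apiVersion: v1
kind: Secret
metadata:
  name: demo-a-secret      # must match secretKeyRef.name in the pod spec
  namespace: default       # must be the same namespace as the pod
type: Opaque
stringData:                # stringData takes plain text; the API server stores it base64-encoded under data
  some_key: placeholder-value
```

The Secret has to live in the same namespace as the pod and contain the exact key the pod references; a Secret with the right name but without some_key would still leave the container unable to start.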
After creating the Secret, the environment variable can be properly injected and the pod transitions from the error state. An updated pod status might look like this:
NAME                                READY  STATUS                RESTARTS  CPU  MEM  IP            NODE    AGE
demo-a-deployment-75bf694876-vcftf  1/1    Running               0         0    17   10.244.192.4  node01  118s
demo-b-deployment-f959db884-kvvpc   0/1    CreateContainerError  0         0    0    n/a           n/a     12m
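If the application can run without KEY_A, an alternative to creating the Secret is to mark the reference as optional; this is the setting reported as Optional: false in the pod description. The fragment below is a sketch of the container's env section under that assumption, and whether it is appropriate depends on how the application handles an unset variable.

```yaml
# Fragment of the container spec; assumes the app tolerates KEY_A being unset.
env:
- name: KEY_A
  valueFrom:
    secretKeyRef:
      name: demo-a-secret
      key: some_key
      optional: true   # the container starts even if the Secret or key is missing
```

With optional set to true, a missing Secret leaves KEY_A undefined instead of blocking container configuration.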
Troubleshooting CreateContainerError
Next, let's investigate demo B, which shows a CreateContainerError. The pod list confirms that demo B is still failing while demo A is now running:
Context: kubernetes-admin@kubernetes
Cluster: kubernetes
User: kubernetes-admin
K9s Rev: v0.32.5
K8s Rev: v1.30.0
CPU: 0%
MEM: 1%
NAME                                PF  READY  STATUS                RESTARTS  CPU  MEM  %CPU/R  %CPU/L  %MEM/R  %MEM/L  IP             NODE    AGE
demo-a-deployment-75bf694876-vcftf  ●   1/1    Running               0         0    0    n/a     n/a     n/a     n/a     10.244.192.4   node01  2m
demo-b-deployment-f959db884-kvvpc   ●   0/1    CreateContainerError  0         0    0    n/a     n/a     n/a     n/a     10.244.192.20  node01  12m
The detailed pod description for the failing container points out the issue:
Describe(default/demo-b-deployment-f959db884-kvvpc)
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations:
  node.kubernetes.io/not-ready: NoExecute op=Exists for 300s
  node.kubernetes.io/unreachable: NoExecute op=Exists for 300s
Events:
  Type     Reason     Age  From               Message
  ----     ------     ---  ----               -------
  Normal   Scheduled  12m  default-scheduler  Successfully assigned default/demo-b-deployment-f959db884-kvvpc to node01
  Normal   Pulling    12m  kubelet            Pulling image "ngheith/no-entry-point:v5"
  Normal   Pulled     12m  kubelet            Successfully pulled image "ngheith/no-entry-point:v5" in 548ms (1.422s including waiting).
  Warning  Failed     12m  kubelet            Error: failed to generate spec: no command specified
The image ngheith/no-entry-point:v5 does not define an entry point, and the pod configuration does not override this by providing a command. To resolve this issue, update the deployment to include a valid command that keeps the container running, for example:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: demo-b
    spec:
      containers:
      - image: ngheith/no-entry-point:v5
        imagePullPolicy: IfNotPresent
        name: example-container
        command: ["sleep", "3600"]
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  conditions:
  - lastTransitionTime: "2024-08-04T22:09:40Z"
    lastUpdateTime: "2024-08-04T22:09:40Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2024-08-04T22:19:41Z"
    lastUpdateTime: "2024-08-04T22:19:41Z"
    message: ReplicaSet "demo-b-deployment-f959db884" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
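After updating the deployment, a new pod should start without the creation error. The listing below appears to have been captured before the rollout replaced the old pod, so the original demo B pod still reports CreateContainerError; the replacement pod shows as Running in the listings in the next section:
NAME                                READY  STATUS                RESTARTS  CPU  MEM  IP            NODE    AGE
demo-a-deployment-75bf694876-vcftf  1/1    Running               0         0    17   10.244.192.4  node01  3m
demo-b-deployment-f959db884-kvvpc   0/1    CreateContainerError  0         0    0    n/a           n/a     n/a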
Demonstrating a Run Container Error
In some cases, the container is created successfully but its process fails to start because the command is invalid. For example, a deployment might define a command that does not exist in the image (jibberish in this demo), so the container is created but crashes on startup:
controlplane ~ ➜ k apply -f demo
The resulting pod status shows a CrashLoopBackOff:
NAME                                PF  READY  STATUS            RESTARTS  CPU  MEM  %CPU/R  %CPU/L  %MEM/R  %MEM/L  IP             NODE    AGE
demo-a-deployment-75bf694876-vcftf  ●   1/1    Running           0         0    17   n/a     n/a     n/a     n/a     10.244.192.4   node01  6m52s
demo-b-deployment-9cbbbd94f-l2xsc   ●   1/1    Running           0         0    1    n/a     n/a     n/a     n/a     10.244.192.23  node01  3m1s
demo-c-deployment-5f47f795c-ghkl4   ●   0/1    CrashLoopBackOff  1         0    0    n/a     n/a     n/a     n/a     10.244.192.20  node01  8s
Describing the failing demo C pod shows that the configured command (jibberish) cannot be found in the container's $PATH:
Describe(default/demo-c-deployment-5f47f795c-ghkl4)
...
    Command: jibberish
    State: Waiting
      Reason: RunContainerError
    Last State: Terminated
      Reason: StartError
      Message: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "jibberish": executable file not found in $PATH: unknown
      Exit Code: 128
      Started: Thu, 01 Jan 1970 00:00:00 +0000
      Finished: Sun, 04 Aug 2024 22:26:55 +0000
    Ready: False
    Restart Count: 2
...
To resolve this RunContainerError, update the deployment with a valid command. For example, to keep the container active, modify the deployment snippet as follows:
# Updated deployment snippet for demo-c
spec:
  containers:
  - name: example-container
    image: ngheith/no-entry-point:v5
    command: ["sleep", "3600"]
    imagePullPolicy: IfNotPresent
After applying this change, checking the pod statuses should reveal that the container runs without errors:
NAME                                PF  READY  STATUS       RESTARTS  CPU  MEM  %CPU/R  %CPU/L  %MEM/R  %MEM/L  IP             NODE    AGE
demo-a-deployment-75bf694876-vcftf  ●   1/1    Running      0         0    17   n/a     n/a     n/a     n/a     10.244.192.4   node01  8m25s
demo-b-deployment-9cbbbd94f-l2xsc   ●   1/1    Running      0         0    0    n/a     n/a     n/a     n/a     10.244.192.23  node01  4m34s
demo-c-deployment-5f47f795c-ghkl4   ●   0/1    Terminating  4         0    0    n/a     n/a     n/a     n/a     n/a            node01  10s
demo-c-deployment-86d746c6f9-sbss9  ●   1/1    Running      0         0    0    n/a     n/a     n/a     n/a     10.244.192.24  node01  2s
Final Status Summary
Below is a summary of the final pod status after troubleshooting all container creation steps:
NAME                                PF  READY  STATUS   RESTARTS  CPU  MEM  %CPU/R  %CPU/L  %MEM/R  %MEM/L  IP             NODE    AGE
demo-a-deployment-75bf694876-vcftf  ●   1/1    Running  0         0    17   n/a     n/a     n/a     n/a     10.244.192.4   node01  8m45s
demo-b-deployment-9cbbbd94f-l2xsc   ●   1/1    Running  0         0    0    n/a     n/a     n/a     n/a     10.244.192.23  node01  4m54s
demo-c-deployment-86d746c6f9-sbss9  ●   1/1    Running  0         0    0    n/a     n/a     n/a     n/a     10.244.192.24  node01  22s
By understanding the container lifecycle and the specific error messages, you can more effectively narrow down troubleshooting efforts in Kubernetes environments.
For more detailed troubleshooting techniques, check out the Kubernetes Documentation and resources on Container Runtime Troubleshooting.