Kubernetes is an incredibly powerful platform that allows developers to deploy and manage containerized applications at scale. While Kubernetes has many benefits, it can be challenging to troubleshoot issues that arise with individual pods. Fortunately, there are several best practices that developers can follow to quickly and efficiently troubleshoot Kubernetes pods.
Use kubectl describe to gather information
When troubleshooting Kubernetes pods, the first step is to gather as much information as possible. The kubectl describe command is an invaluable tool for this task. This command provides detailed information about the state of a particular pod, including its current status, any events that have occurred, and any containers that are running within the pod.
To use the kubectl describe command, run the following, replacing <pod-name> with the name of the pod you are investigating:
kubectl describe pod <pod-name>
This will provide you with a wealth of information about the current state of the pod, including any error messages that may be present.
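If the describe output is long, the Events section at the bottom is usually the quickest place to spot scheduling failures, image pull errors, or failed probes. As a rough sketch (my-pod and my-namespace are placeholder names), you can also pull the events for a single pod directly:
kubectl describe pod my-pod -n my-namespace
kubectl get events -n my-namespace --field-selector involvedObject.name=my-pod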
Check the logs
The next step in troubleshooting Kubernetes pods is to check the logs of the containers running within the pod. Kubernetes captures each container's stdout and stderr and exposes it through the API, so you can retrieve logs from any container in the cluster with kubectl (cluster-wide log aggregation still requires a separate logging stack).
To view the logs of a particular container, you can use the kubectl logs command. For example, to view the logs of the container running within a pod named my-pod, you can run the following command:
kubectl logs my-pod
This will display the logs of the specified container, allowing you to identify any issues that may be present.
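A few kubectl logs flags are worth knowing. As a sketch (my-pod and my-container are placeholder names), the following assumes a pod that may have more than one container:
kubectl logs my-pod -c my-container   # select a specific container in a multi-container pod
kubectl logs my-pod --tail=100 -f     # show the last 100 lines and keep streaming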
Check resource allocation
Kubernetes allows developers to allocate resources to individual pods, including CPU and memory. If a pod is experiencing issues, it’s important to check the resource allocation to ensure that the pod has enough resources to operate correctly.
To check the resource usage of a particular pod, you can use the kubectl top command. This command provides near real-time information about the CPU and memory usage of each pod within a cluster; note that it requires the metrics server to be installed.
For example, to view the resource usage of a pod named my-pod, you can run the following command:
kubectl top pod my-pod
This will display the CPU and memory usage of the specified pod, allowing you to identify any resource allocation issues.
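kubectl top only shows live usage; to compare it against what the pod actually asked for, you can inspect the requests and limits in the pod spec and what is already committed on the node. A minimal sketch, where my-pod and <node-name> are placeholders:
kubectl get pod my-pod -o jsonpath='{.spec.containers[*].resources}'
kubectl describe node <node-name>    # the "Allocated resources" section shows how much of the node is already committed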
Check network connectivity
Finally, it’s important to check the network connectivity of a pod if it’s experiencing issues. Kubernetes provides several networking options, including service discovery and load balancing, that can be used to ensure that pods can communicate with each other.
To check the network connectivity of a pod, you can use the kubectl exec command to execute commands within the pod’s container. For example, to check the network connectivity of a pod named my-pod, you can run the following command:
kubectl exec my-pod -- curl http://<service-name>
This will execute the curl command within the specified container, allowing you to check the connectivity of the pod.
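Before reaching for curl, it is often worth confirming that the Service actually has healthy endpoints behind it. A quick sketch (my-service is a placeholder name):
kubectl get service my-service
kubectl get endpoints my-service    # an empty ENDPOINTS column means no ready pods match the Service selector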
In conclusion, Kubernetes is a powerful platform for managing containerized applications, but troubleshooting individual pods can be challenging. By following these best practices, developers can quickly and efficiently troubleshoot Kubernetes pods, ensuring that their applications are running smoothly and without interruption.
Based on the guidelines above, let's walk through the steps.
Everyone wants a healthy Pod. Your applications rely on a healthy Pod state to deliver services to consumers. Just as life has its challenges, sometimes you may run into issues that put your pods into one of these states:
Pending: Pods can be in a pending state if there are insufficient resources in the cluster to schedule the pod, or if there is a scheduling issue due to resource constraints, node affinity/anti-affinity, or pod affinity/anti-affinity.
CrashLoopBackOff: Pods can be in a CrashLoopBackOff state if the container in the pod is crashing repeatedly. This can be due to issues with the container image, configuration, dependencies, or resources.
Error: Pods can be in an Error state if there is an issue with the pod’s configuration or if the container is unable to start or run due to issues with the container image, configuration, or dependencies.
Check pod status: The first step in troubleshooting Kubernetes pods is to check the pod status. You can use the kubectl get pods command to view the status of all pods in a given namespace. If a pod is in a Pending, CrashLoopBackOff, or Error state, it indicates that there is an issue that needs to be resolved.
kubectl get pods
# kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
nginx-585449566-4rqvm   1/1     Running   0          59s
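On a busy cluster you can narrow the listing to pods that are not healthy. A rough sketch, assuming the default namespace; note that a pod in CrashLoopBackOff still reports the Running phase, so check the STATUS column as well:
kubectl get pods -o wide                                   # adds node and pod IP columns
kubectl get pods --field-selector=status.phase!=Running    # list pods that are not in the Running phase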
Check container logs: Once you have identified the pod with issues, the next step is to check the container logs. You can use the kubectl logs <pod-name> command to view the logs of the container running in the pod.
kubectl logs <pod-name>
# kubectl logs nginx-585449566-4rqvm
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2023/03/06 11:04:49 [notice] 1#1: using the "epoll" event method
2023/03/06 11:04:49 [notice] 1#1: nginx/1.23.3
2023/03/06 11:04:49 [notice] 1#1: built by gcc 10.2.1 20210110 (Debian 10.2.1-6)
2023/03/06 11:04:49 [notice] 1#1: OS: Linux 5.15.0-56-generic
2023/03/06 11:04:49 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2023/03/06 11:04:49 [notice] 1#1: start worker processes
2023/03/06 11:04:49 [notice] 1#1: start worker process 29
2023/03/06 11:04:49 [notice] 1#1: start worker process 30
#
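This pod started cleanly, but for a pod stuck in CrashLoopBackOff the current container may have no logs yet; the logs of the previous, crashed attempt are usually more useful. A sketch using the pod name from above:
kubectl logs nginx-585449566-4rqvm --previous   # logs from the last terminated container instance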
Check container configuration: If the container logs do not provide any clues, the next step is to check the container configuration. You can use the kubectl describe pod <pod-name> command to view the pod's configuration, including the container image, ports, environment variables, volume mounts, and recent events.
kubectl describe pod <pod-name>
Name:             nginx-585449566-4rqvm
Namespace:        default
Priority:         0
Service Account:  default
Node:             vectra-worker2/172.18.0.2
Start Time:       Mon, 06 Mar 2023 11:04:30 +0000
Labels:           app=nginx
                  pod-template-hash=585449566
Annotations:      <none>
Status:           Running
IP:               10.244.2.2
IPs:
  IP:  10.244.2.2
Controlled By:  ReplicaSet/nginx-585449566
Containers:
  nginx:
    Container ID:   containerd://470ecc0771e2fd3a828e228016677c1084792d8a26ad9d100337d9dcc6086597
    Image:          nginx:latest
    Image ID:       docker.io/library/nginx@sha256:aa0afebbb3cfa473099a62c4b32e9b3fb73ed23f2a75a65ce1d4b4f55a5c2ef2
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Mon, 06 Mar 2023 11:04:48 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6h4xk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-6h4xk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  8m24s  default-scheduler  Successfully assigned default/nginx-585449566-4rqvm to vectra-worker2
  Normal  Pulling    8m24s  kubelet            Pulling image "nginx:latest"
  Normal  Pulled     8m7s   kubelet            Successfully pulled image "nginx:latest" in 17.21068267s
  Normal  Created    8m7s   kubelet            Created container nginx
  Normal  Started    8m7s   kubelet            Started container nginx
#
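If you need the configuration in a machine-readable form, or want to diff it against what you intended to deploy, you can dump the full pod manifest. A sketch using the same pod:
kubectl get pod nginx-585449566-4rqvm -o yaml
kubectl get pod nginx-585449566-4rqvm -o jsonpath='{.spec.containers[0].image}'   # print just the container image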
Check cluster events: If the container configuration looks correct, the next step is to check the cluster events. You can use the kubectl get events command to view the events in the cluster. This can help you identify any issues or changes in the cluster that may be affecting the pod.
kubectl get events
LAST SEEN   TYPE      REASON                    OBJECT                        MESSAGE
13m         Normal    Scheduled                 pod/nginx-585449566-4rqvm     Successfully assigned default/nginx-585449566-4rqvm to vectra-worker2
13m         Normal    Pulling                   pod/nginx-585449566-4rqvm     Pulling image "nginx:latest"
13m         Normal    Pulled                    pod/nginx-585449566-4rqvm     Successfully pulled image "nginx:latest" in 17.21068267s
13m         Normal    Created                   pod/nginx-585449566-4rqvm     Created container nginx
13m         Normal    Started                   pod/nginx-585449566-4rqvm     Started container nginx
13m         Normal    SuccessfulCreate          replicaset/nginx-585449566    Created pod: nginx-585449566-4rqvm
13m         Normal    ScalingReplicaSet         deployment/nginx              Scaled up replica set nginx-585449566 to 1
18m         Normal    Starting                  node/vectra-control-plane     Starting kubelet.
18m         Normal    NodeHasSufficientMemory   node/vectra-control-plane     Node vectra-control-plane status is now: NodeHasSufficientMemory
18m         Normal    NodeHasNoDiskPressure     node/vectra-control-plane     Node vectra-control-plane status is now: NodeHasNoDiskPressure
18m         Normal    NodeHasSufficientPID      node/vectra-control-plane     Node vectra-control-plane status is now: NodeHasSufficientPID
18m         Normal    NodeAllocatableEnforced   node/vectra-control-plane     Updated Node Allocatable limit across pods
17m         Normal    Starting                  node/vectra-control-plane     Starting kube-proxy.
17m         Normal    RegisteredNode            node/vectra-control-plane     Node vectra-control-plane event: Registered Node vectra-control-plane in Controller
18m         Normal    Starting                  node/vectra-worker            Starting kubelet.
17m         Normal    NodeHasSufficientMemory   node/vectra-worker            Node vectra-worker status is now: NodeHasSufficientMemory
17m         Normal    NodeHasNoDiskPressure     node/vectra-worker            Node vectra-worker status is now: NodeHasNoDiskPressure
17m         Normal    NodeHasSufficientPID      node/vectra-worker            Node vectra-worker status is now: NodeHasSufficientPID
18m         Normal    NodeAllocatableEnforced   node/vectra-worker            Updated Node Allocatable limit across pods
17m         Warning   Rebooted                  node/vectra-worker            Node vectra-worker has been rebooted, boot id: fd9ad342-d276-46dd-a64e-c852092e755b
17m         Normal    Starting                  node/vectra-worker            Starting kube-proxy.
17m         Normal    RegisteredNode            node/vectra-worker            Node vectra-worker event: Registered Node vectra-worker in Controller
18m         Normal    Starting                  node/vectra-worker2           Starting kubelet.
17m         Normal    NodeHasSufficientMemory   node/vectra-worker2           Node vectra-worker2 status is now: NodeHasSufficientMemory
17m         Normal    NodeHasNoDiskPressure     node/vectra-worker2           Node vectra-worker2 status is now: NodeHasNoDiskPressure
17m         Normal    NodeHasSufficientPID      node/vectra-worker2           Node vectra-worker2 status is now: NodeHasSufficientPID
18m         Normal    NodeAllocatableEnforced   node/vectra-worker2           Updated Node Allocatable limit across pods
17m         Warning   Rebooted                  node/vectra-worker2           Node vectra-worker2 has been rebooted, boot id: fd9ad342-d276-46dd-a64e-c852092e755b
17m         Normal    Starting                  node/vectra-worker2           Starting kube-proxy.
17m         Normal    RegisteredNode            node/vectra-worker2           Node vectra-worker2 event: Registered Node vectra-worker2 in Controller
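The raw event list is noisy; sorting by time and filtering for warnings usually surfaces the interesting entries first. A rough sketch:
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl get events --field-selector type=Warning   # only warning events, such as the node reboots above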
Check network connectivity: If the pod is still experiencing issues, it may be a network connectivity problem. You can use the kubectl exec -it <pod-name> -- bash command to open an interactive shell inside the container and test connectivity from there.
kubectl exec -it <pod-name> -- bash
# kubectl exec -it nginx-585449566-h8htf bash
nginx-585449566-h8htf:/# ls -al
total 88
drwxr-xr-x    1 root root 4096 Mar  6 11:52 .
drwxr-xr-x    1 root root 4096 Mar  6 11:52 ..
drwxr-xr-x    2 root root 4096 Feb 27 00:00 bin
drwxr-xr-x    2 root root 4096 Dec  9 19:15 boot
drwxr-xr-x    5 root root  360 Mar  6 11:52 dev
drwxr-xr-x    1 root root 4096 Mar  1 18:43 docker-entrypoint.d
-rwxrwxr-x    1 root root 1616 Mar  1 18:42 docker-entrypoint.sh
drwxr-xr-x    1 root root 4096 Mar  6 11:52 etc
drwxr-xr-x    2 root root 4096 Dec  9 19:15 home
drwxr-xr-x    1 root root 4096 Feb 27 00:00 lib
drwxr-xr-x    2 root root 4096 Feb 27 00:00 lib64
drwxr-xr-x    2 root root 4096 Feb 27 00:00 media
drwxr-xr-x    2 root root 4096 Feb 27 00:00 mnt
drwxr-xr-x    2 root root 4096 Feb 27 00:00 opt
dr-xr-xr-x  432 root root    0 Mar  6 11:52 proc
drwx------    1 root root 4096 Mar  6 11:57 root
drwxr-xr-x    1 root root 4096 Mar  6 11:52 run
drwxr-xr-x    2 root root 4096 Feb 27 00:00 sbin
drwxr-xr-x    2 root root 4096 Feb 27 00:00 srv
dr-xr-xr-x   13 root root    0 Mar  6 11:52 sys
drwxrwxrwt    1 root root 4096 Mar  1 18:43 tmp
drwxr-xr-x    1 root root 4096 Feb 27 00:00 usr
drwxr-xr-x    1 root root 4096 Feb 27 00:00 var
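From inside the container you can go beyond ls and test the pod's own view of the network. A sketch, assuming my-service is a placeholder Service name and that the image ships curl (minimal images may not):
kubectl exec nginx-585449566-h8htf -- cat /etc/resolv.conf        # confirm the cluster DNS server is configured
kubectl exec nginx-585449566-h8htf -- curl -sv http://my-service  # test name resolution and connectivity to a Service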
Check storage: Finally, if the pod uses storage, make sure that the volumes are correctly mounted and accessible by the container. You can use the kubectl describe pod <pod-name> command and review the Volumes section and each container's Mounts.
kubectl describe pod <pod-name>
# kubectl describe pod nginx-585449566-4rqvm
Name:             nginx-585449566-4rqvm
Namespace:        default
Priority:         0
Service Account:  default
Node:             vectra-worker2/172.18.0.2
Start Time:       Mon, 06 Mar 2023 11:04:30 +0000
Labels:           app=nginx
                  pod-template-hash=585449566
Annotations:      <none>
Status:           Running
IP:               10.244.2.2
IPs:
  IP:  10.244.2.2
Controlled By:  ReplicaSet/nginx-585449566
Containers:
  nginx:
    Container ID:   containerd://470ecc0771e2fd3a828e228016677c1084792d8a26ad9d100337d9dcc6086597
    Image:          nginx:latest
    Image ID:       docker.io/library/nginx@sha256:aa0afebbb3cfa473099a62c4b32e9b3fb73ed23f2a75a65ce1d4b4f55a5c2ef2
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Mon, 06 Mar 2023 11:04:48 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6h4xk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-6h4xk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age  From               Message
  ----    ------     ---- ----               -------
  Normal  Scheduled  29m  default-scheduler  Successfully assigned default/nginx-585449566-4rqvm to vectra-worker2
  Normal  Pulling    29m  kubelet            Pulling image "nginx:latest"
  Normal  Pulled     28m  kubelet            Successfully pulled image "nginx:latest" in 17.21068267s
  Normal  Created    28m  kubelet            Created container nginx
  Normal  Started    28m  kubelet            Started container nginx
#
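The example pod above only mounts the default service account token, but for a pod that uses a PersistentVolumeClaim, the Volumes section will name the claim, and you can check its status separately. A sketch (my-pvc is a placeholder claim name):
kubectl get pvc                # the STATUS column should show Bound for claims in use
kubectl describe pvc my-pvc    # events here reveal provisioning or attachment failures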
By following these steps, you should be able to identify and resolve most issues with Kubernetes pods.