Kubernetes is an incredibly powerful platform that allows developers to deploy and manage containerized applications at scale. While Kubernetes has many benefits, it can be challenging to troubleshoot issues that arise with individual pods. Fortunately, there are several best practices that developers can follow to quickly and efficiently troubleshoot Kubernetes pods.
Use kubectl describe to gather information
When troubleshooting Kubernetes pods, the first step is to gather as much information as possible. The kubectl describe command is an invaluable tool for this task. This command provides detailed information about the state of a particular pod, including its current status, any events that have occurred, and any containers that are running within the pod.
To use the kubectl describe command, run the following, replacing <pod-name> with the name of the pod you are investigating:
kubectl describe pod <pod-name>
This will provide you with a wealth of information about the current state of the pod, including any error messages that may be present.
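If the describe output is long, the Events section at the bottom is usually the quickest place to spot scheduling failures, image pull errors, or failed probes. As a rough sketch (my-pod and my-namespace are placeholder names), you can also pull the events for a single pod directly:
kubectl describe pod my-pod -n my-namespace
kubectl get events -n my-namespace --field-selector involvedObject.name=my-pod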
Check the logs
The next step in troubleshooting Kubernetes pods is to check the logs of the containers running within the pod. Kubernetes captures each container's stdout and stderr and exposes it through the API, so you can retrieve logs from any container in the cluster with kubectl (cluster-wide log aggregation still requires a separate logging stack).
To view the logs of a particular container, you can use the kubectl logs command. For example, to view the logs of the container running within a pod named my-pod, you can run the following command:
kubectl logs my-pod
This will display the logs of the specified container, allowing you to identify any issues that may be present.
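A few kubectl logs flags are worth knowing. As a sketch (my-pod and my-container are placeholder names), the following assumes a pod that may have more than one container:
kubectl logs my-pod -c my-container   # select a specific container in a multi-container pod
kubectl logs my-pod --tail=100 -f     # show the last 100 lines and keep streaming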
Check resource allocation
Kubernetes allows developers to allocate resources to individual pods, including CPU and memory. If a pod is experiencing issues, it’s important to check the resource allocation to ensure that the pod has enough resources to operate correctly.
To check the resource usage of a particular pod, you can use the kubectl top command. This command provides near real-time information about the CPU and memory usage of each pod within a cluster; note that it requires the metrics server to be installed.
For example, to view the resource usage of a pod named my-pod, you can run the following command:
kubectl top pod my-pod
This will display the CPU and memory usage of the specified pod, allowing you to identify any resource allocation issues.
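kubectl top only shows live usage; to compare it against what the pod actually asked for, you can inspect the requests and limits in the pod spec and what is already committed on the node. A minimal sketch, where my-pod and <node-name> are placeholders:
kubectl get pod my-pod -o jsonpath='{.spec.containers[*].resources}'
kubectl describe node <node-name>    # the "Allocated resources" section shows how much of the node is already committed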
Check network connectivity
Finally, it’s important to check the network connectivity of a pod if it’s experiencing issues. Kubernetes provides several networking options, including service discovery and load balancing, that can be used to ensure that pods can communicate with each other.
To check the network connectivity of a pod, you can use the kubectl exec command to execute commands within the pod’s container. For example, to check the network connectivity of a pod named my-pod, you can run the following command:
kubectl exec my-pod -- curl http://<service-name>
This will execute the curl command within the specified container, allowing you to check the connectivity of the pod.
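Before reaching for curl, it is often worth confirming that the Service actually has healthy endpoints behind it. A quick sketch (my-service is a placeholder name):
kubectl get service my-service
kubectl get endpoints my-service    # an empty ENDPOINTS column means no ready pods match the Service selector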
In conclusion, Kubernetes is a powerful platform for managing containerized applications, but troubleshooting individual pods can be challenging. By following these best practices, developers can quickly and efficiently troubleshoot Kubernetes pods, ensuring that their applications are running smoothly and without interruption.
Based on the guidelines above, let's walk through the steps.
Everyone wants a healthy Pod. Your applications rely on a healthy Pod state to deliver services to consumers. Just as life has its challenges, sometimes you may run into issues that put your pods into one of these states:
Pending: Pods can be in a pending state if there are insufficient resources in the cluster to schedule the pod, or if there is a scheduling issue due to resource constraints, node affinity/anti-affinity, or pod affinity/anti-affinity.
CrashLoopBackOff: Pods can be in a CrashLoopBackOff state if the container in the pod is crashing repeatedly. This can be due to issues with the container image, configuration, dependencies, or resources.
Error: Pods can be in an Error state if there is an issue with the pod’s configuration or if the container is unable to start or run due to issues with the container image, configuration, or dependencies.
Check pod status: The first step in troubleshooting Kubernetes pods is to check the pod status. You can use the kubectl get pods command to view the status of all pods in a given namespace. If a pod is in a Pending, CrashLoopBackOff, or Error state, it indicates that there is an issue that needs to be resolved.
kubectl get pods
# kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
nginx-585449566-4rqvm   1/1     Running   0          59s
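On a busy cluster you can narrow the listing to pods that are not healthy. A rough sketch, assuming the default namespace; note that a pod in CrashLoopBackOff still reports the Running phase, so check the STATUS column as well:
kubectl get pods -o wide                                   # adds node and pod IP columns
kubectl get pods --field-selector=status.phase!=Running    # list pods that are not in the Running phase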
Check container logs: Once you have identified the pod with issues, the next step is to check the container logs. You can use the kubectl logs <pod-name> command to view the logs of the container running in the pod.
kubectl logs <pod-name>
# kubectl logs nginx-585449566-4rqvm
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2023/03/06 11:04:49 [notice] 1#1: using the "epoll" event method
2023/03/06 11:04:49 [notice] 1#1: nginx/1.23.3
2023/03/06 11:04:49 [notice] 1#1: built by gcc 10.2.1 20210110 (Debian 10.2.1-6)
2023/03/06 11:04:49 [notice] 1#1: OS: Linux 5.15.0-56-generic
2023/03/06 11:04:49 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2023/03/06 11:04:49 [notice] 1#1: start worker processes
2023/03/06 11:04:49 [notice] 1#1: start worker process 29
2023/03/06 11:04:49 [notice] 1#1: start worker process 30
#
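This pod started cleanly, but for a pod stuck in CrashLoopBackOff the current container may have no logs yet; the logs of the previous, crashed attempt are usually more useful. A sketch using the pod name from above:
kubectl logs nginx-585449566-4rqvm --previous   # logs from the last terminated container instance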
Check container configuration: If the container logs do not provide any clues, the next step is to check the container configuration. You can use the kubectl describe pod <pod-name> command to view the pod's configuration, including the container image, ports, environment variables, volume mounts, and recent events.
kubectl describe pod <pod-name>
Name:             nginx-585449566-4rqvm
Namespace:        default
Priority:         0
Service Account:  default
Node:             vectra-worker2/172.18.0.2
Start Time:       Mon, 06 Mar 2023 11:04:30 +0000
Labels:           app=nginx
                  pod-template-hash=585449566
Annotations:      <none>
Status:           Running
IP:               10.244.2.2
IPs:
  IP:  10.244.2.2
Controlled By:  ReplicaSet/nginx-585449566
Containers:
  nginx:
    Container ID:   containerd://470ecc0771e2fd3a828e228016677c1084792d8a26ad9d100337d9dcc6086597
    Image:          nginx:latest
    Image ID:       docker.io/library/nginx@sha256:aa0afebbb3cfa473099a62c4b32e9b3fb73ed23f2a75a65ce1d4b4f55a5c2ef2
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Mon, 06 Mar 2023 11:04:48 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6h4xk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-6h4xk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  8m24s  default-scheduler  Successfully assigned default/nginx-585449566-4rqvm to vectra-worker2
  Normal  Pulling    8m24s  kubelet            Pulling image "nginx:latest"
  Normal  Pulled     8m7s   kubelet            Successfully pulled image "nginx:latest" in 17.21068267s
  Normal  Created    8m7s   kubelet            Created container nginx
  Normal  Started    8m7s   kubelet            Started container nginx
#
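If you need the configuration in a machine-readable form, or want to diff it against what you intended to deploy, you can dump the full pod manifest. A sketch using the same pod:
kubectl get pod nginx-585449566-4rqvm -o yaml
kubectl get pod nginx-585449566-4rqvm -o jsonpath='{.spec.containers[0].image}'   # print just the container image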
Check cluster events: If the container configuration looks correct, the next step is to check the cluster events. You can use the kubectl get events command to view the events in the cluster. This can help you identify any issues or changes in the cluster that may be affecting the pod.
kubectl get events
LAST SEEN   TYPE      REASON                    OBJECT                        MESSAGE
13m         Normal    Scheduled                 pod/nginx-585449566-4rqvm     Successfully assigned default/nginx-585449566-4rqvm to vectra-worker2
13m         Normal    Pulling                   pod/nginx-585449566-4rqvm     Pulling image "nginx:latest"
13m         Normal    Pulled                    pod/nginx-585449566-4rqvm     Successfully pulled image "nginx:latest" in 17.21068267s
13m         Normal    Created                   pod/nginx-585449566-4rqvm     Created container nginx
13m         Normal    Started                   pod/nginx-585449566-4rqvm     Started container nginx
13m         Normal    SuccessfulCreate          replicaset/nginx-585449566    Created pod: nginx-585449566-4rqvm
13m         Normal    ScalingReplicaSet         deployment/nginx              Scaled up replica set nginx-585449566 to 1
18m         Normal    Starting                  node/vectra-control-plane     Starting kubelet.
18m         Normal    NodeHasSufficientMemory   node/vectra-control-plane     Node vectra-control-plane status is now: NodeHasSufficientMemory
18m         Normal    NodeHasNoDiskPressure     node/vectra-control-plane     Node vectra-control-plane status is now: NodeHasNoDiskPressure
18m         Normal    NodeHasSufficientPID      node/vectra-control-plane     Node vectra-control-plane status is now: NodeHasSufficientPID
18m         Normal    NodeAllocatableEnforced   node/vectra-control-plane     Updated Node Allocatable limit across pods
17m         Normal    Starting                  node/vectra-control-plane     Starting kube-proxy.
17m         Normal    RegisteredNode            node/vectra-control-plane     Node vectra-control-plane event: Registered Node vectra-control-plane in Controller
18m         Normal    Starting                  node/vectra-worker            Starting kubelet.
17m         Normal    NodeHasSufficientMemory   node/vectra-worker            Node vectra-worker status is now: NodeHasSufficientMemory
17m         Normal    NodeHasNoDiskPressure     node/vectra-worker            Node vectra-worker status is now: NodeHasNoDiskPressure
17m         Normal    NodeHasSufficientPID      node/vectra-worker            Node vectra-worker status is now: NodeHasSufficientPID
18m         Normal    NodeAllocatableEnforced   node/vectra-worker            Updated Node Allocatable limit across pods
17m         Warning   Rebooted                  node/vectra-worker            Node vectra-worker has been rebooted, boot id: fd9ad342-d276-46dd-a64e-c852092e755b
17m         Normal    Starting                  node/vectra-worker            Starting kube-proxy.
17m         Normal    RegisteredNode            node/vectra-worker            Node vectra-worker event: Registered Node vectra-worker in Controller
18m         Normal    Starting                  node/vectra-worker2           Starting kubelet.
17m         Normal    NodeHasSufficientMemory   node/vectra-worker2           Node vectra-worker2 status is now: NodeHasSufficientMemory
17m         Normal    NodeHasNoDiskPressure     node/vectra-worker2           Node vectra-worker2 status is now: NodeHasNoDiskPressure
17m         Normal    NodeHasSufficientPID      node/vectra-worker2           Node vectra-worker2 status is now: NodeHasSufficientPID
18m         Normal    NodeAllocatableEnforced   node/vectra-worker2           Updated Node Allocatable limit across pods
17m         Warning   Rebooted                  node/vectra-worker2           Node vectra-worker2 has been rebooted, boot id: fd9ad342-d276-46dd-a64e-c852092e755b
17m         Normal    Starting                  node/vectra-worker2           Starting kube-proxy.
17m         Normal    RegisteredNode            node/vectra-worker2           Node vectra-worker2 event: Registered Node vectra-worker2 in Controller
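The raw event list is noisy; sorting by time and filtering for warnings usually surfaces the interesting entries first. A rough sketch:
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl get events --field-selector type=Warning   # only warning events, such as the node reboots above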
Check network connectivity: If the pod is still experiencing issues, it may be a network connectivity problem. You can use the kubectl exec -it <pod-name> -- bash command to open an interactive shell inside the container and test connectivity from there.
kubectl exec -it <pod-name> -- bash
# kubectl exec -it nginx-585449566-h8htf bash
nginx-585449566-h8htf:/# ls -al
total 88
drwxr-xr-x    1 root root 4096 Mar  6 11:52 .
drwxr-xr-x    1 root root 4096 Mar  6 11:52 ..
drwxr-xr-x    2 root root 4096 Feb 27 00:00 bin
drwxr-xr-x    2 root root 4096 Dec  9 19:15 boot
drwxr-xr-x    5 root root  360 Mar  6 11:52 dev
drwxr-xr-x    1 root root 4096 Mar  1 18:43 docker-entrypoint.d
-rwxrwxr-x    1 root root 1616 Mar  1 18:42 docker-entrypoint.sh
drwxr-xr-x    1 root root 4096 Mar  6 11:52 etc
drwxr-xr-x    2 root root 4096 Dec  9 19:15 home
drwxr-xr-x    1 root root 4096 Feb 27 00:00 lib
drwxr-xr-x    2 root root 4096 Feb 27 00:00 lib64
drwxr-xr-x    2 root root 4096 Feb 27 00:00 media
drwxr-xr-x    2 root root 4096 Feb 27 00:00 mnt
drwxr-xr-x    2 root root 4096 Feb 27 00:00 opt
dr-xr-xr-x  432 root root    0 Mar  6 11:52 proc
drwx------    1 root root 4096 Mar  6 11:57 root
drwxr-xr-x    1 root root 4096 Mar  6 11:52 run
drwxr-xr-x    2 root root 4096 Feb 27 00:00 sbin
drwxr-xr-x    2 root root 4096 Feb 27 00:00 srv
dr-xr-xr-x   13 root root    0 Mar  6 11:52 sys
drwxrwxrwt    1 root root 4096 Mar  1 18:43 tmp
drwxr-xr-x    1 root root 4096 Feb 27 00:00 usr
drwxr-xr-x    1 root root 4096 Feb 27 00:00 var
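From inside the container you can go beyond ls and test the pod's own view of the network. A sketch, assuming my-service is a placeholder Service name and that the image ships curl (minimal images may not):
kubectl exec nginx-585449566-h8htf -- cat /etc/resolv.conf        # confirm the cluster DNS server is configured
kubectl exec nginx-585449566-h8htf -- curl -sv http://my-service  # test name resolution and connectivity to a Service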
Check storage: Finally, if the pod uses storage, make sure that the volumes are correctly mounted and accessible by the container. You can use the kubectl describe pod <pod-name> command and review the Volumes section and each container's Mounts.
kubectl describe pod <pod-name>
# kubectl describe pod nginx-585449566-4rqvm
Name:             nginx-585449566-4rqvm
Namespace:        default
Priority:         0
Service Account:  default
Node:             vectra-worker2/172.18.0.2
Start Time:       Mon, 06 Mar 2023 11:04:30 +0000
Labels:           app=nginx
                  pod-template-hash=585449566
Annotations:      <none>
Status:           Running
IP:               10.244.2.2
IPs:
  IP:  10.244.2.2
Controlled By:  ReplicaSet/nginx-585449566
Containers:
  nginx:
    Container ID:   containerd://470ecc0771e2fd3a828e228016677c1084792d8a26ad9d100337d9dcc6086597
    Image:          nginx:latest
    Image ID:       docker.io/library/nginx@sha256:aa0afebbb3cfa473099a62c4b32e9b3fb73ed23f2a75a65ce1d4b4f55a5c2ef2
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Mon, 06 Mar 2023 11:04:48 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6h4xk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-6h4xk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age  From               Message
  ----    ------     ---- ----               -------
  Normal  Scheduled  29m  default-scheduler  Successfully assigned default/nginx-585449566-4rqvm to vectra-worker2
  Normal  Pulling    29m  kubelet            Pulling image "nginx:latest"
  Normal  Pulled     28m  kubelet            Successfully pulled image "nginx:latest" in 17.21068267s
  Normal  Created    28m  kubelet            Created container nginx
  Normal  Started    28m  kubelet            Started container nginx
#
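The example pod above only mounts the default service account token, but for a pod that uses a PersistentVolumeClaim, the Volumes section will name the claim, and you can check its status separately. A sketch (my-pvc is a placeholder claim name):
kubectl get pvc                # the STATUS column should show Bound for claims in use
kubectl describe pvc my-pvc    # events here reveal provisioning or attachment failures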
By following these steps, you should be able to identify and resolve most issues with Kubernetes pods.