kubernetes, kubectl, kubelet

Dev k8s master is under heavy load and kubectl get pods returns no output


My dev k8s master is under heavy load, and as a result kubectl get pods returns no output:

admin@ip-172-20-49-150:~$ kubectl get po -n cog-stage

^C
admin@ip-172-20-49-150:~$
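
One way to confirm whether the API server is responding at all (this was not part of my original run; the timeout and verbosity values are arbitrary):

# fail fast instead of hanging, and print each HTTP request kubectl makes
kubectl get po -n cog-stage --request-timeout=5s -v=6

If that call times out, the problem is on the API server side rather than with kubectl or the kubeconfig.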

admin@ip-172-20-49-150:~$ top

top - 04:36:52 up 2 min,  2 users,  load average: 14.39, 4.43, 1.55
Tasks: 140 total,   2 running, 138 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.2 sy,  0.0 ni,  0.0 id, 99.6 wa,  0.0 hi,  0.0 si,  0.2 st
KiB Mem:   3857324 total,  3778024 used,    79300 free,      192 buffers
KiB Swap:        0 total,        0 used,        0 free.    15680 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   32 root      20   0       0      0      0 S   2.4  0.0   0:03.75 kswapd0
 1263 root      20   0   97388  19036      0 S   1.3  0.5   0:01.06 kube-controller
 1224 root      20   0   28764  11380      0 S   0.7  0.3   0:01.86 etcd
 1358 root      20   0   46192  10608      0 S   0.7  0.3   0:00.69 kube-scheduler
 1243 root      20   0  372552 343024      0 S   0.6  8.9   0:10.51 etcd
  695 root      20   0  889180  52352      0 S   0.4  1.4   0:05.34 dockerd
  752 root      20   0  205800  13756      0 S   0.4  0.4   0:00.56 protokube
  816 root      20   0  449964  30804      0 S   0.4  0.8   0:02.26 kubelet
 1247 root      20   0 3207664 2.856g      0 S   0.4 77.6   0:55.90 kube-apiserver
 1279 root      20   0   40848   8900      0 S   0.4  0.2   0:00.46 kube-proxy
    1 root      20   0   28788   1940      0 R   0.2  0.1   0:02.06 systemd
  157 root       0 -20       0      0      0 S   0.2  0.0   0:00.06 kworker/1:1H
 1562 admin     20   0   78320   1092      0 S   0.2  0.0   0:00.04 sshd
 1585 admin     20   0   23660    540      0 R   0.2  0.0   0:00.11 top
 1758 admin     20   0   33512    320     32 D   0.2  0.0   0:00.04 kubectl
 1779 root      20   0   39368    436      0 D   0.2  0.0   0:00.01 docker-containe
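
The top output itself points at memory pressure: kube-apiserver alone is using about 2.9 GiB (77.6%) of the node's 3.8 GiB of RAM, kswapd0 is busy, and the CPU spends 99.6% of its time in iowait, so the master appears to be thrashing. A rough sketch of commands to confirm this on the master (the sampling interval and grep patterns are my own choices, not from the original output):

# sample memory and iowait for a few seconds
vmstat 1 5
free -m
# check whether the kernel has started OOM-killing processes
dmesg -T | grep -i -E 'oom|out of memory'
# per-container memory usage straight from Docker, since kubectl is unusable
docker stats --no-stream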

Please let me know how to troubleshoot this issue!

Update: kubelet logs on the master (admin@ip-172-20-49-150:~$ journalctl -u kubelet -f):

Jan 06 05:41:44 ip-172-20-49-150 kubelet[819]: E0106 05:41:44.454586     819 pod_workers.go:182] Error syncing pod 685c903f9066f69a2e17c802cb043ed6 ("etcd-server-events-ip-172-20-49-150.us-west-1.compute.internal_kube-system(685c903f9066f69a2e17c802cb043ed6)"), skipping: failed to "StartContainer" for "etcd-container" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=etcd-container pod=etcd-server-events-ip-172-20-XX-XXX.us-west-1.compute.internal_kube-system(685c903f906b043ed6)"
Jan 06 05:41:45 ip-172-20-49-150 kubelet[819]: I0106 05:41:45.454266     819 kuberuntime_manager.go:500] Container {Name:kube-controller-manager Image:gcr.io/google_containers/kube-controller-manager:v1.8.4 Command:[/bin/sh -c /usr/local/bin/kube-controller-manager --allocate-node-cidrs=true --attach-detach-reconcile-sync-period=1m0s --cloud-provider=aws --cluster-cidr=100.96.0.0/11 --cluster-name=uw1b.k8s.ops.goldenratstud.io --cluster-signing-cert-file=/srv/kubernetes/ca.crt --cluster-signing-key-file=/srv/kubernetes/ca.key --configure-cloud-routes=true --kubeconfig=/var/lib/kube-controller-manager/kubeconfig --leader-elect=true --root-ca-file=/srv/kubernetes/ca.crt --service-account-private-key-file=/srv/kubernetes/server.key --use-service-account-credentials=true --v=2 2>&1 | /bin/tee -a /var/log/kube-controller-manager.log] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI}]} VolumeMounts:[{Name:etcssl ReadOnly:true MountPath:/etc/ssl SubPath: MountPropagation:<nil>} {Name:etcpkitls ReadOnly:true MountPath:/etc/pki/tls SubPath: MountPropagation:<nil>} {Name:etcpkica-trust ReadOnly:true MountPath:/etc/pki/ca-trust SubPath: MountPropagation:<nil>} {Name:usrsharessl ReadOnly:true MountPath:/usr/share/ssl SubPath: MountPropagation:<nil>} {Name:usrssl ReadOnly:true MountPath:/usr/ssl SubPath: MountPropagation:<nil>} {Name:usrlibssl ReadOnly:true MountPath:/usr/lib/ssl SubPath: MountPropagation:<nil>} {Name:usrlocalopenssl ReadOnly:true MountPath:/usr/local/openssl SubPath: MountPropagation:<nil>} {Name:varssl ReadOnly:true MountPath:/var/ssl SubPath: MountPropagation:<nil>} {Name:etcopenssl ReadOnly:true MountPath:/etc/openssl SubPath: MountPropagation:<nil>} {Name:srvkube ReadOnly:true MountPath:/srv/kubernetes SubPath: MountPropagation:<nil>} {Name:logfile ReadOnly:false MountPath:/var/log/kube-controller-manager.log SubPath: MountPropagation:<nil>} {Name:varlibkcm ReadOnly:true MountPath:/var/lib/kube-controller-manager SubPath: MountPropagation:<nil>}] Live
Jan 06 05:41:45 ip-172-20-49-150 kubelet[819]: nessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthz,Port:10252,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:15,TimeoutSeconds:15,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jan 06 05:41:45 ip-172-20-49-150 kubelet[819]: I0106 05:41:45.454658     819 kuberuntime_manager.go:739] checking backoff for container "kube-controller-manager" in pod "kube-controller-manager-ip-172-20-49-150.us-west-1.compute.internal_kube-system(ef6f03ef0b14d853dd38e4c2a5f426dc)"
Jan 06 05:41:45 ip-172-20-49-150 kubelet[819]: I0106 05:41:45.454781     819 kuberuntime_manager.go:749] Back-off 5m0s restarting failed container=kube-controller-manager pod=kube-controller-manager-ip-172-20-49-150.us-west-1.compute.internal_kube-system(ef6f03ef0b14d853dd38e4c2a5f426dc)
Jan 06 05:41:45 ip-172-20-49-150 kubelet[819]: E0106 05:41:45.454813     819 pod_workers.go:182] Error syncing pod ef6f03ef0b14d853dd38e4c2a5f426dc ("kube-controller-manager-ip-172-20-49-150.us-west-1.compute.internal_kube-system(ef6f03ef0b14d853dd38e4c2a5f426dc)"), skipping: failed to "StartContainer" for "kube-controller-manager" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-controller-manager pod=kube-controller-manager-ip-172-20-49-150.us-west-1.compute.internal_kube-system(ef6f03ef0b14d853dd38e4c2a5f426dc)"
Jan 06 05:41:47 ip-172-20-49-150 kubelet[819]: I0106 05:41:47.432074     819 container.go:471] Failed to update stats for container "/kubepods/burstable/pod2a5faee9437283d8ac7f396d86d07a03/0f62ea06693a7d4aaf6702d8ca370f2d5d2f1f3c4fdeab09aede15ea5311e47c": unable to determine device info for dir: /var/lib/docker/overlay/ce30183e915076727e708ed10b2ada4d55d1fe6d5c989c1cffc3e29cc00dad94: stat failed on /var/lib/docker/overlay/ce30183e915076727e708ed10b2ada4d55d1fe6d5c989c1cffc3e29cc00dad94 with error: no such file or directory, continuing to push stats
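
The kubelet log shows the static control-plane pods (etcd-server-events, kube-controller-manager) in CrashLoopBackOff. Since kubectl is unusable, one way to get at their logs is to go through Docker directly on the master; the container ID below is a placeholder:

# list the crash-looping control-plane containers
docker ps -a | grep -E 'etcd|kube-controller-manager'
# read the last lines of a crashed container's log (replace <container-id> with a real ID)
docker logs --tail 50 <container-id>
# static pod manifests the kubelet keeps restarting (standard location on kops masters)
ls /etc/kubernetes/manifests/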

Solution

  • I replaced the old k8s dev master node with a new one but still got the same issue. After vertically scaling the k8s master from c4.large to c4.xlarge, it works fine. A rough sketch of the resize commands follows below.
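
For reference, the protokube process in the top output suggests a kops-managed cluster, so resizing the master instance group would look roughly like this. The instance-group name and state-store URL are assumptions for this cluster; the cluster name is taken from the kube-controller-manager flags in the log above:

# edit the master instance group and change spec.machineType: c4.large -> c4.xlarge
kops edit ig master-us-west-1b --name uw1b.k8s.ops.goldenratstud.io --state s3://<state-store>
# apply the change and roll the master onto a new, larger instance
kops update cluster uw1b.k8s.ops.goldenratstud.io --state s3://<state-store> --yes
kops rolling-update cluster uw1b.k8s.ops.goldenratstud.io --state s3://<state-store> --yes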