Master can't connect to cluster

After a cluster upgrade, one of my three masters can't rejoin the cluster. I have an HA cluster running in us-east-1a, us-east-1b and us-east-1c, and the master running in us-east-1a is the one that can't rejoin.

I tried scaling the master-us-east-1a instance group down to zero nodes and back up to one, but the new EC2 machine starts with the same problem and still can't rejoin the cluster. It seems to start from a backup or something similar.

I also connected to the master and tried restarting services such as protokube and docker, but that didn't solve the problem either.

While connected to the master via SSH, I noticed that the flannel service is not running on this machine. I tried to run it manually via docker, without success. It seems that flannel is the network service that should be running and is not.

  • Can I reset the master in us-east-1a and recreate it from scratch?
  • Any ideas on how to get the flannel service running on this master?

Thanks in advance.


> kubectl get nodes
NAME                             STATUS     ROLES    AGE   VERSION
ip-xxx-xxx-xxx-xxx.ec2.internal  Ready      node     33d   v1.11.9
ip-xxx-xxx-xxx-xxx.ec2.internal  Ready      master   33d   v1.11.9
ip-xxx-xxx-xxx-xxx.ec2.internal  Ready      node     33d   v1.11.9
ip-xxx-xxx-xxx-xxx.ec2.internal  Ready      master   33d   v1.11.9
ip-xxx-xxx-xxx-xxx.ec2.internal  Ready      node     33d   v1.11.9


> sudo systemctl status kubelet

Jan 10 21:00:55 ip-xxx-xxx-xxx-xxx kubelet[2502]: I0110 21:00:55.026553    2502 kubelet_node_status.go:441] Recording NodeHasSufficientPID event message for node ip-xxx-xxx-xxx-xxx.ec2.internal
Jan 10 21:00:55 ip-xxx-xxx-xxx-xxx kubelet[2502]: I0110 21:00:55.027005    2502 kubelet_node_status.go:79] Attempting to register node ip-xxx-xxx-xxx-xxx.ec2.internal
Jan 10 21:00:55 ip-xxx-xxx-xxx-xxx kubelet[2502]: E0110 21:00:55.027764    2502 kubelet_node_status.go:103] Unable to register node "ip-xxx-xxx-xxx-xxx.ec2.internal" with API server: Post dial tcp connect: connection refused


> sudo docker logs k8s_kube-apiserver_kube-apiserver-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_134d55c1b1c3bf3583911989a14353da_16

F0110 20:59:35.581865       1 storage_decorator.go:57] Unable to create storage backend: config (&{etcd3 /registry []    true false 1000 0xc42013c480 <nil> 5m0s 1m0s}), err (dial tcp connect: connection refused)


> sudo docker version

 Version:      17.03.2-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   f5ec1e2
 Built:        Tue Jun 27 02:31:19 2017
 OS/Arch:      linux/amd64

 Version:      17.03.2-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   f5ec1e2
 Built:        Tue Jun 27 02:31:19 2017
 OS/Arch:      linux/amd64
 Experimental: false


> kubectl version

Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.9", GitCommit:"16236ce91790d4c75b79f6ce96841db1c843e7d2", GitTreeState:"clean", BuildDate:"2019-03-25T06:40:24Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server was refused - did you specify the right host or port?


> sudo docker images

REPOSITORY                           TAG                 IMAGE ID            CREATED             SIZE
protokube                            1.15.0              6b00e7216827        7 weeks ago         288 MB
                                     v1.11.9             e18fcce798b8        9 months ago        98.1 MB
                                     v1.11.9             634ccbd18a0f        9 months ago        155 MB
                                     v1.11.9             ef9a84756d40        9 months ago        187 MB
                                     v1.11.9             e00d30bd3a71        9 months ago        56.9 MB
                                     3.0                 99e59f495ffa        3 years ago         747 kB
kopeio/etcd-manager                  3.0.20190930        7937b67f722f        50 years ago        656 MB


> sudo docker ps

CONTAINER ID        IMAGE                                                                                                        COMMAND                  CREATED             STATUS              PORTS               NAMES
b4eb0ec9e6a2            "/bin/sh -c 'mkfif..."   15 hours ago        Up 15 hours                             k8s_kube-scheduler_kube-scheduler-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_105cd5bac4edf48f265f31eb756b971a_0
8f827dc0eade        kopeio/etcd-manager@sha256:cb0ed7c56dadbc0f4cd515906d72b30094229d6e0a9fcb7aa44e23680bf9a3a8                  "/bin/sh -c 'mkfif..."   15 hours ago        Up 15 hours                             k8s_etcd-manager_etcd-manager-main-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_a6a467f6b78a7c7bc15ec1f64799516d_0
5bebb169b8b3   "/bin/sh -c 'mkfif..."   15 hours ago        Up 15 hours                             k8s_kube-controller-manager_kube-controller-manager-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_564bccf38cd14aa0f647593e69b159ab_0
4467d550824e                "/bin/sh -c 'mkfif..."   15 hours ago        Up 15 hours                             k8s_kube-proxy_kube-proxy-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_22cd6fe287e6f4bae556504b3245f385_0
0a5c23006e18        kopeio/etcd-manager@sha256:cb0ed7c56dadbc0f4cd515906d72b30094229d6e0a9fcb7aa44e23680bf9a3a8                  "/bin/sh -c 'mkfif..."   15 hours ago        Up 15 hours                             k8s_etcd-manager_etcd-manager-events-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_9f2a8de168741a0263161532f42e97b4_0
3efa9ae55618                                                                                   "/pause"                 15 hours ago        Up 15 hours                             k8s_POD_kube-proxy-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_22cd6fe287e6f4bae556504b3245f385_0
4e451bc007ac                                                                                   "/pause"                 15 hours ago        Up 15 hours                             k8s_POD_kube-scheduler-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_105cd5bac4edf48f265f31eb756b971a_0
7c5c301e034a                                                                                   "/pause"                 15 hours ago        Up 15 hours                             k8s_POD_kube-apiserver-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_134d55c1b1c3bf3583911989a14353da_0
d88f075fa61f                                                                                   "/pause"                 15 hours ago        Up 15 hours                             k8s_POD_etcd-manager-main-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_a6a467f6b78a7c7bc15ec1f64799516d_0
69e8844e9c14                                                                                   "/pause"                 15 hours ago        Up 15 hours                             k8s_POD_kube-controller-manager-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_564bccf38cd14aa0f647593e69b159ab_0
05e67c2e8f98                                                                                   "/pause"                 15 hours ago        Up 15 hours                             k8s_POD_etcd-manager-events-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_9f2a8de168741a0263161532f42e97b4_0
eee0a4d563c0        protokube:1.15.0                                                                                             "/usr/bin/protokub..."   15 hours ago        Up 15 hours                             hungry_shirley


  • The kubelet is trying to register the us-east-1a master node with an API server endpoint that refuses the connection. I believe it should be using the API server endpoint of one of the other two masters. The kubelet uses the kubelet.conf file to talk to the API server when registering the node. Change the server in the kubelet.conf file located in /etc/kubernetes to point to one of the following:

    1. Elastic IP or public IP of the master node in us-east-1b or us-east-1c, e.g. https://xx.xx.xx.xx:6443
    2. Private IP of the master node in us-east-1b or us-east-1c, e.g. https://xx.xx.xx.xx:6443
    3. FQDN of the master endpoint, if you have a load balancer in front of your master nodes running the Kubernetes API server.

    After changing kubelet.conf, restart the kubelet.
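
A minimal sketch of the kubelet.conf change above, assuming the file follows the usual kubeconfig layout with a `server:` line and that GNU sed is available on the master. The function name `set_kubelet_server` is hypothetical; the path and the placeholder address are the ones from the steps above.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the "server:" entry of a kubeconfig-style file.
# Assumes the new URL contains no "|" or "&" characters.
# Usage: set_kubelet_server <file> <new-url>
set_kubelet_server() {
    conf="$1"
    url="$2"
    # Keep a backup so the change can be reverted.
    cp "$conf" "$conf.bak" || return 1
    # Replace the value of every "server:" key, preserving its indentation.
    sed -i "s|^\([[:space:]]*server:[[:space:]]*\).*|\1$url|" "$conf"
}
```

Run as root, this could be called as `set_kubelet_server /etc/kubernetes/kubelet.conf https://xx.xx.xx.xx:6443`, followed by `sudo systemctl restart kubelet`. Checking `sudo systemctl status kubelet` afterwards should show whether the "connection refused" registration error is gone.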

    Edit: Since you are using etcd-manager, can you try the "Kubernetes service unavailable / flannel issues" troubleshooting step described here?