Tags: ubuntu, kubernetes, microk8s

Microk8s stopped working. Status says not running, inspect just returns four services


Two of my microk8s clusters running version 1.21 just stopped working.

Running kubectl locally returns: The connection to the server 127.0.0.1:16443 was refused - did you specify the right host or port?

microk8s.status says not running, and microk8s.inspect just checks four services:

Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-apiserver-kicker is running
  Service snap.microk8s.daemon-kubelite is running

The apiserver is not mentioned, and it is not running (checking its status separately says "Will not run along with kubelite").

I didn't change anything on any of the machines.

I tried upgrading microk8s to 1.22 - no change.

journal.log for apiserver says:

Oct 18 07:57:05 myserver microk8s.daemon-kubelite[30037]: I1018 07:57:05.143264   30037 daemon.go:65] Starting API Server
Oct 18 07:57:05 myserver microk8s.daemon-kubelite[30037]: Flag --insecure-port has been deprecated, This flag has no effect now and will be removed in v1.24.
Oct 18 07:57:05 myserver microk8s.daemon-kubelite[30037]: I1018 07:57:05.144650   30037 server.go:654] external host was not specified, using 192.168.1.10
Oct 18 07:57:05 myserver microk8s.daemon-kubelite[30037]: W1018 07:57:05.144719   30037 authentication.go:507] AnonymousAuth is not allowed with the AlwaysAllow authorizer. Resetting AnonymousAuth to false. You should use a different authorizer

snap services:

Service                               Startup  Current   Notes
microk8s.daemon-apiserver             enabled  inactive  -
microk8s.daemon-apiserver-kicker      enabled  active    -
microk8s.daemon-cluster-agent         enabled  active    -
microk8s.daemon-containerd            enabled  active    -
microk8s.daemon-control-plane-kicker  enabled  inactive  -
microk8s.daemon-controller-manager    enabled  inactive  -
microk8s.daemon-etcd                  enabled  inactive  -
microk8s.daemon-flanneld              enabled  inactive  -
microk8s.daemon-kubelet               enabled  inactive  -
microk8s.daemon-kubelite              enabled  active    -
microk8s.daemon-proxy                 enabled  inactive  -
microk8s.daemon-scheduler             enabled  inactive  -

It's not this issue (https://github.com/ubuntu/microk8s/issues/2486); both info.yaml and cluster.yaml have the correct contents.

All machines are Ubuntu virtual machines running on Hyper-V in a Windows Server cluster.


Solution

  • It turned out that there were two different problems, one in each cluster, and my claim that I hadn't changed anything was not entirely true.

    Single-node cluster:

    cluster.yaml was not correct: it was empty. Copying the contents of localnode.yaml to cluster.yaml fixed the problem.
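
A minimal sketch of that copy step, in case it helps someone in the same situation. The directory in BACKEND_DIR is an assumption (it is where the dqlite backend files usually live inside the snap, but check your own system), and restore_cluster_yaml is a hypothetical helper name, not part of MicroK8s. It only copies when cluster.yaml is missing or empty, and keeps a backup of the old file.

```shell
#!/bin/sh
# ASSUMPTION: usual location of the dqlite backend files in the microk8s snap.
BACKEND_DIR="${BACKEND_DIR:-/var/snap/microk8s/current/var/kubernetes/backend}"

# Hypothetical helper: copy localnode.yaml over cluster.yaml,
# but only when cluster.yaml is missing or empty.
restore_cluster_yaml() {
    dir="$1"
    # [ -s FILE ] is true only when FILE exists and is non-empty
    if [ -s "$dir/cluster.yaml" ]; then
        echo "cluster.yaml is not empty, leaving it alone"
        return 0
    fi
    if [ ! -s "$dir/localnode.yaml" ]; then
        echo "localnode.yaml is missing or empty, nothing to copy" >&2
        return 1
    fi
    # Keep a backup of the old (empty) file if there is one
    if [ -e "$dir/cluster.yaml" ]; then
        cp -p "$dir/cluster.yaml" "$dir/cluster.yaml.bak"
    fi
    cp "$dir/localnode.yaml" "$dir/cluster.yaml"
    echo "cluster.yaml restored from localnode.yaml"
}
```

One cautious way to use it would be to stop MicroK8s first (microk8s stop), run restore_cluster_yaml "$BACKEND_DIR" as root, and then start it again (microk8s start).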

    Multi-node cluster:

    One node had gone offline (microk8s not running) because an automatic refresh of the microk8s snap had failed and gotten stuck.

    I had also temporarily shut down another node for a couple of days. That left only one node to vote on the dqlite master, and the vote failed. By the time the shut-down node was turned back on, the cluster had already failed. Unsticking the auto-refresh on the third node fixed the cluster.
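
The failed vote comes down to simple majority arithmetic, sketched below. dqlite uses Raft, which needs more than half of the voting nodes to be reachable. The sketch assumes all three nodes were voters; quorum_needed and has_quorum are hypothetical helper names used only for illustration.

```shell
#!/bin/sh
# Raft-style majority: more than half of the voters must be reachable.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum VOTERS ONLINE: succeeds when enough voters are online.
has_quorum() {
    voters="$1"
    online="$2"
    [ "$online" -ge "$(quorum_needed "$voters")" ]
}

# ASSUMPTION: three voters, only one node online (the situation above)
if has_quorum 3 1; then echo "quorum"; else echo "no quorum"; fi   # prints "no quorum"

# Three voters, two nodes online
if has_quorum 3 2; then echo "quorum"; else echo "no quorum"; fi   # prints "quorum"
```

With three voters the majority is two, so a single surviving node cannot elect a leader on its own, while bringing any second node back is enough.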