
Kubernetes: Frequently gets "Error adding network: no IP addresses available in network: cbr0"


I set up a single-node Kubernetes cluster, using kubeadm, on Ubuntu 16.04 LTS with flannel.

Most of the time everything works well, but every couple of days the cluster gets into a state where it can't schedule new pods: the pods are stuck in the "Pending" state, and when I run kubectl describe pod on them, I get error messages like these:

Events:
  FirstSeen LastSeen    Count   From                SubObjectPath   Type        Reason      Message
  --------- --------    -----   ----                -------------   --------    ------      -------
  2m        2m      1   {default-scheduler }                Normal      Scheduled   Successfully assigned dex-1939802596-zt1r3 to superserver-03
  1m        2s      21  {kubelet superserver-03}            Warning     FailedSync  Error syncing pod, skipping: failed to "SetupNetwork" for "somepod-1939802596-zt1r3_somenamespace" with SetupNetworkError: "Failed to setup network for pod \"somepod-1939802596-zt1r3_somenamespace(167f8345-faeb-11e6-94f3-0cc47a9a5cf2)\" using network plugins \"cni\": no IP addresses available in network: cbr0; Skipping pod"

I've found this Stack Overflow question and the workaround suggested there. It does help the cluster recover (though it takes several minutes), but the problem comes back after a while...

I've also come across this open issue and recovered using the workaround suggested there, but again, the problem comes back. Also, it's not exactly my case, and the issue was closed as soon as a workaround was found... :\

Technical details:

kubeadm version: version.Info{Major:"1", Minor:"6+", GitVersion:"v1.6.0-alpha.0.2074+a092d8e0f95f52", GitCommit:"a092d8e0f95f5200f7ae2cba45c75ab42da36537", GitTreeState:"clean", BuildDate:"2016-12-13T17:03:18Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Kubernetes Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.3", GitCommit:"029c3a408176b55c30846f0faedf56aae5992e9b", GitTreeState:"clean", BuildDate:"2017-02-15T06:34:56Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Started the cluster with these commands:

kubeadm init --pod-network-cidr 10.244.0.0/16 --api-advertise-addresses 192.168.1.200

kubectl taint nodes --all dedicated-

kubectl -n kube-system apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Some syslog logs that may be relevant (I got many of those):

Feb 23 11:07:49 server-03 kernel: [  155.480669] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Feb 23 11:07:49 server-03 dockerd[1414]: time="2017-02-23T11:07:49.735590817+02:00" level=warning msg="Couldn't run auplink before unmount /var/lib/docker/aufs/mnt/89bb7abdb946d858e175d80d6e1d2fdce0262af8c7afa9c6ad9d776f1f5028c4-init: exec: \"auplink\": executable file not found in $PATH"
Feb 23 11:07:49 server-03 kernel: [  155.496599] aufs au_opts_verify:1597:dockerd[24704]: dirperm1 breaks the protection by the permission bits on the lower branch
Feb 23 11:07:49 server-03 systemd-udevd[29313]: Could not generate persistent MAC address for vethd4d85eac: No such file or directory
Feb 23 11:07:49 server-03 kubelet[1228]: E0223 11:07:49.756976    1228 cni.go:255] Error adding network: no IP addresses available in network: cbr0
Feb 23 11:07:49 server-03 kernel: [  155.514994] IPv6: eth0: IPv6 duplicate address fe80::835:deff:fe4f:c74d detected!
Feb 23 11:07:49 server-03 kernel: [  155.515380] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Feb 23 11:07:49 server-03 kernel: [  155.515588] device vethd4d85eac entered promiscuous mode
Feb 23 11:07:49 server-03 kernel: [  155.515643] cni0: port 34(vethd4d85eac) entered forwarding state
Feb 23 11:07:49 server-03 kernel: [  155.515663] cni0: port 34(vethd4d85eac) entered forwarding state
Feb 23 11:07:49 server-03 kubelet[1228]: E0223 11:07:49.757001    1228 cni.go:209] Error while adding to cni network: no IP addresses available in network: cbr0
Feb 23 11:07:49 server-03 kubelet[1228]: E0223 11:07:49.757056    1228 docker_manager.go:2201] Failed to setup network for pod "somepod-752955044-58g59_somenamespace(5d6c28e1-f8dd-11e6-9843-0cc47a9a5cf2)" using network plugins "cni": no IP addresses available in network: cbr0; Skipping pod

Many thanks!

Edit:

I am able to reproduce it. It looks like the IP addresses in the kubelet's pod CIDR are being exhausted. Findings:

  • First, the podCIDR of the node (obtained with kubectl get node -o yaml) is podCIDR: 10.244.0.0/24 (BTW, why not /16, like the cluster CIDR I set in the kubeadm command?).

  • Second:

    $ sudo ls -la /var/lib/cni/networks/cbr0 | wc -l

    256 (that is, 256 IPs are assigned, right?)

  • But that happens even though I currently have fewer than 256 running Kubernetes pods and services:

    $ kubectl get all --all-namespaces | wc -l

    180

    (Yes, this includes not only pods and services, but also jobs, deployments and replicasets)

So, how come the IP addresses are exhausted? How can I fix that? Those workarounds can't be the only way...
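A note on the counting: ls -la | wc -l also counts the "total" line and the "." and ".." entries, so 256 lines means at most 253 entries in the directory. A /24 holds 256 addresses, and some of those (typically the network, broadcast and gateway addresses) are not handed out to pods. The sketch below is a more careful count, assuming the IPAM plugin writes one file per reserved address, named after the IP. The function names are hypothetical, chosen for illustration.

```shell
#!/bin/sh
# Count reservation files in a CNI network directory.
# Assumption: one file per reserved address, named after the IPv4 address.
# Bookkeeping files (for example last_reserved_ip) are not counted.
count_reserved_ips() {
    dir="$1"
    ls -1 "$dir" 2>/dev/null | grep -cE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'
}

# Number of addresses in an IPv4 prefix of the given length (24 gives 256).
addresses_in_prefix() {
    echo $(( 1 << (32 - $1) ))
}
```

On the node this could be run as count_reserved_ips /var/lib/cni/networks/cbr0 (with sudo if the directory is not readable) and compared against addresses_in_prefix 24. If the count sits near the size of the node's podCIDR while far fewer pods are actually running, reservations are most likely not being released.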

Thanks again.

Edit (2):

Another related issue: https://github.com/containernetworking/cni/issues/306


Solution

  • For now, this is the best workaround I've found:

    https://github.com/kubernetes/kubernetes/issues/34278#issuecomment-254686727

    I've set up a cron job to run this script on @reboot.

    It seems that the issue was addressed with a temporary fix that garbage-collects pods when the Docker daemon restarts, but that feature was probably not enabled in my cluster.

    A few days ago, a better long-term fix was merged, so I hope this issue will be fixed in the upcoming Kubernetes 1.6.0 release.
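To see which reservations are actually stale before cleaning anything up, something like the following could help. It is a minimal sketch and not the script from the linked comment. It assumes each reservation file under /var/lib/cni/networks/cbr0 is named after the IP and contains the ID of the container that holds it on its first line. The function name and the live-ID file are hypothetical.

```shell
#!/bin/sh
# Print reservation files whose recorded container ID is not in a list of
# known container IDs.
#   $1: reservation directory (e.g. /var/lib/cni/networks/cbr0)
#   $2: file with one known container ID per line
# Assumption: each file is named after the IP and holds the container ID.
list_stale_reservations() {
    dir="$1"
    live_ids="$2"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        case "$(basename "$f")" in
            *[!0-9.]*) continue ;;   # skip helper files that are not IPs
        esac
        id="$(head -n 1 "$f" | tr -d '[:space:]')"
        [ -n "$id" ] || continue
        # Report the file if no line of the live list equals the ID.
        grep -qxF -e "$id" "$live_ids" || echo "$f"
    done
}
```

The list of known IDs could come from docker ps -aq --no-trunc > /tmp/live_ids, followed by list_stale_reservations /var/lib/cni/networks/cbr0 /tmp/live_ids. The function only prints candidates. Removing the files is left as a separate, deliberate step, ideally with the kubelet stopped so it does not race with new allocations.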