Tags: docker, kubernetes, kubeadm, flannel, weave

Kubernetes container creation with Flannel gets stuck in "ContainerCreating" state


Context

I installed Docker on my Ubuntu 18.04 LTS (Server) by following this instruction and later installed Kubernetes via kubeadm. After initializing the cluster (kubeadm init --pod-network-cidr=10.10.10.10/24) and joining a second node (a two-node cluster to start with), I cannot get coredns or the Web UI (Dashboard), which I applied later, into status Running.

As the pod network I tried both Flannel (kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml) and Weave Net, but nothing changed. The pods still show status ContainerCreating, even after hours of waiting:

[Screenshot: pod list with the pods stuck in ContainerCreating]

Question

Why doesn't the container creation work as expected, and what might be the root cause? And most importantly: how do I solve this?

Edit

Summing up my answer below, here are the reasons why:

  • Docker used the cgroupfs cgroup driver instead of systemd
  • I did not configure iptables correctly
  • I used a wrong kubeadm init, since Flannel's standard YAML requires --pod-network-cidr to be 10.244.0.0/16

Solution

  • Since answering this question took me a lot of time, I want to share what got me out of this. There might be more code than necessary, but I also want it all in one place in case I or someone else has to redo these steps.



    First it all started with Docker...

    I figured out that it presumably all started with the way I installed Docker. Following the linked online instructions, I installed Docker with sudo apt-get install docker.io and added my user to the docker group with sudo usermod -aG docker $USER. This left Docker running with the cgroupfs cgroup driver.

    Well, according to the official instructions from Kubernetes, this was a mistake: systemd is the recommended cgroup driver!

    So I completely purged everything I had ever done with Docker by following these great instructions from Mayur Bhandare:

    sudo apt-get purge -y docker-engine docker docker.io docker-ce  
    sudo apt-get autoremove -y --purge docker-engine docker docker.io docker-ce  
    sudo rm -rf /var/lib/docker /etc/docker
    sudo rm /etc/apparmor.d/docker
    sudo groupdel docker
    sudo rm -rf /var/run/docker.sock
    
    # Reboot to be sure
    

    Afterwards I reinstalled Docker the official way (keep in mind that this might change in the future):

    # Install Docker CE (these commands have no sudo, so run them as root, e.g. after sudo -i)
    ## Set up the repository:
    ### Install packages to allow apt to use a repository over HTTPS
    apt-get update && apt-get install -y \
      apt-transport-https ca-certificates curl software-properties-common gnupg2
    
    ### Add Docker’s official GPG key
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
    
    ### Add Docker apt repository.
    add-apt-repository \
      "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
      $(lsb_release -cs) \
      stable"
    
    ## Install Docker CE.
    apt-get update && apt-get install -y \
      containerd.io=1.2.10-3 \
      docker-ce=5:19.03.4~3-0~ubuntu-$(lsb_release -cs) \
      docker-ce-cli=5:19.03.4~3-0~ubuntu-$(lsb_release -cs)
    
    # Setup daemon.
    cat > /etc/docker/daemon.json <<EOF
    {
      "exec-opts": ["native.cgroupdriver=systemd"],
      "log-driver": "json-file",
      "log-opts": {
        "max-size": "100m"
      },
      "storage-driver": "overlay2"
    }
    EOF
    
    mkdir -p /etc/systemd/system/docker.service.d
    
    # Restart docker.
    systemctl daemon-reload
    systemctl restart docker
    

    Note that this explicitly uses systemd!
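To double-check which cgroup driver is configured, something like the following can help. It is only a minimal sketch and was not part of my original steps: the helper name configured_cgroup_driver is made up for illustration, and it assumes daemon.json is laid out like the file written above.

```shell
# Hypothetical helper: print the cgroup driver set via "exec-opts" in a
# Docker daemon.json file. Prints nothing if no native.cgroupdriver is set.
configured_cgroup_driver() {
  file="${1:-/etc/docker/daemon.json}"
  sed -n 's/.*native\.cgroupdriver=\([a-z]*\).*/\1/p' "$file"
}

# What the running daemon actually uses (requires a running Docker daemon):
#   docker info --format '{{.CgroupDriver}}'
```

If the function prints systemd but docker info still reports cgroupfs, the daemon most likely has not been restarted since the file was changed.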



    ... and then it went on with Flannel...

    Above I wrote that my sudo kubeadm init was run with --pod-network-cidr=10.10.10.10/24, since 10.10.10.10 was the IP of my master. Well, as pointed out here, not using the officially recommended --pod-network-cidr=10.244.0.0/16 results in errors, for example with kubectl proxy or during container creation, when using the provided kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml. This is because 10.244.0.0/16 is hard-coded in the .yaml and therefore mandatory, unless you change it in the .yaml yourself.
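One way to catch this mismatch before running kubeadm init is to read the network out of a downloaded copy of the manifest and compare it with the CIDR you plan to use. This is a rough sketch under assumptions: the helper names are hypothetical, the manifest is assumed to be saved locally as kube-flannel.yml, and the "Network" entry is assumed to sit on a single line of its net-conf.json section.

```shell
# Hypothetical helper: print the first "Network" value found in a Flannel manifest.
flannel_network() {
  sed -n 's/.*"Network"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: succeed only if the manifest's network equals the given CIDR.
cidr_matches_manifest() {
  manifest="$1"
  cidr="$2"
  [ "$(flannel_network "$manifest")" = "$cidr" ]
}

# Example usage (assumes kube-flannel.yml is in the current directory):
#   cidr_matches_manifest kube-flannel.yml 10.244.0.0/16 && echo "CIDR matches"
```

The comparison is a plain string match, so it only tells you whether both sides are written identically, not whether two differently written ranges overlap.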

    To get rid of the faulty configuration, a reset is needed. This can be done with sudo kubeadm reset and by deleting the config with sudo rm -r ~/.kube/config. However, since I had broken so much, I went further and uninstalled and reinstalled kubeadm, making sure the iptables setup was correct this time (which I had also forgotten to do before...).

    Here is a nice link on how to fully uninstall all kubeadm parts:

    sudo kubeadm reset
    sudo apt-get purge kubeadm kubectl kubelet kubernetes-cni kube*   
    sudo apt-get autoremove  
    sudo rm -rf ~/.kube
    

    For the sake of completeness, here is the reinstall as well:

    # ensure legacy binaries are installed
    sudo apt-get install -y iptables arptables ebtables
    
    # switch to legacy versions
    sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
    sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
    sudo update-alternatives --set arptables /usr/sbin/arptables-legacy
    sudo update-alternatives --set ebtables /usr/sbin/ebtables-legacy
    
    # Install Kubernetes with kubeadm
    sudo apt-get update && sudo apt-get install -y apt-transport-https curl
    curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
    cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
    deb https://apt.kubernetes.io/ kubernetes-xenial main
    EOF
    sudo apt-get update
    sudo apt-get install -y kubelet kubeadm kubectl
    sudo apt-mark hold kubelet kubeadm kubectl
    
    #reboot
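To see whether the switch to the legacy binaries took effect, the version string can be inspected: as far as I know, iptables 1.8 and later print the active backend in parentheses, for example (legacy) or (nf_tables). The snippet below is a small sketch with a hypothetical helper name. Older iptables versions print no backend at all, which it reports as unknown.

```shell
# Hypothetical helper: classify the backend from an `iptables --version` string.
iptables_backend() {
  case "$1" in
    *nf_tables*) echo "nf_tables" ;;
    *legacy*)    echo "legacy" ;;
    *)           echo "unknown" ;;
  esac
}

# Example usage (requires iptables to be installed):
#   iptables_backend "$(iptables --version)"
```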
    



    ... and finally it worked!

    After the clean reinstallation I did the following:

    # Initialize with correct cidr
    sudo kubeadm init --pod-network-cidr=10.244.0.0/16
    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config
    
    kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
    

    And then be astounded by the result:

    kubectl get pods --all-namespaces
    

    [Screenshot: output of kubectl get pods --all-namespaces]

    On a side note: This also resolved the "/run/flannel/subnet.env: no such file or directory" error I had encountered before these steps when describing the coredns pod that would not start.
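Instead of reading the pod table by eye, the output can also be filtered for pods that are still stuck. This is a minimal sketch, not something I ran at the time: the helper name pods_not_ready is made up, and it assumes the default column order of kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: read `kubectl get pods --all-namespaces --no-headers`
# output on stdin and print "namespace/name status" for every pod whose
# STATUS column is neither Running nor Completed.
pods_not_ready() {
  awk 'NF >= 4 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Example usage (requires a working kubectl context):
#   kubectl get pods --all-namespaces --no-headers | pods_not_ready
```

An empty result means every listed pod is Running or Completed. Any pod that still shows up can then be inspected with kubectl describe pod.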