Tags: networking, kubernetes, kubeadm, cilium

"Error while initializing daemon" error="exit status 2" subsys=daemon in Cilium CNI on Kubernetes


In our Kubernetes cluster we use the Cilium CNI, but it fails on the worker node. The error message is shown below.

$ kubectl get pod -n kube-system
NAME                               READY   STATUS    RESTARTS   AGE
cilium-dpd4k                       0/1     Running   2          97s
cilium-operator-55658fb5c4-qpdqb   1/1     Running   0          6m30s
cilium-sc7x6                       1/1     Running   0          6m30s
coredns-6955765f44-2tjf8           1/1     Running   0          6m31s
coredns-6955765f44-h96c4           1/1     Running   0          6m31s
etcd-store                         1/1     Running   0          6m26s
kube-apiserver-store               1/1     Running   0          6m26s
kube-controller-manager-store      1/1     Running   0          6m26s
kube-proxy-8xz8n                   1/1     Running   0          97s
kube-proxy-gxgfv                   1/1     Running   0          6m30s
kube-scheduler-store               1/1     Running   0          6m26s


$ kubectl logs -f cilium-dpd4k -n kube-system
level=info msg="Skipped reading configuration file" reason="Config File \"ciliumd\" Not Found in \"[/root]\"" subsys=daemon
level=info msg="  --access-log=''" subsys=daemon
level=info msg="  --agent-labels=''" subsys=daemon
level=info msg="  --allow-localhost='auto'" subsys=daemon
level=info msg="  --annotate-k8s-node='true'" subsys=daemon
level=info msg="  --auto-create-cilium-node-resource='true'" subsys=daemon
level=info msg="  --auto-direct-node-routes='false'" subsys=daemon
level=info msg="  --blacklist-conflicting-routes='true'" subsys=daemon
level=info msg="  --bpf-compile-debug='false'" subsys=daemon
level=info msg="  --bpf-ct-global-any-max='262144'" subsys=daemon
level=info msg="  --bpf-ct-global-tcp-max='524288'" subsys=daemon
level=info msg="  --bpf-ct-timeout-regular-any='1m0s'" subsys=daemon
level=info msg="  --bpf-ct-timeout-regular-tcp='6h0m0s'" subsys=daemon
level=info msg="  --bpf-ct-timeout-regular-tcp-fin='10s'" subsys=daemon
.
.
.
.

level=warning msg="+ ip -6 rule del fwmark 0xA00/0xF00 pref 10 lookup 2005" subsys=daemon
level=warning msg="+ true" subsys=daemon
level=warning msg="+ sed -i /ENCAP_GENEVE/d /var/run/cilium/state/globals/node_config.h" subsys=daemon
level=warning msg="+ sed -i /ENCAP_VXLAN/d /var/run/cilium/state/globals/node_config.h" subsys=daemon
level=warning msg="+ '[' vxlan = vxlan ']'" subsys=daemon
level=warning msg="+ echo '#define ENCAP_VXLAN 1'" subsys=daemon
level=warning msg="+ '[' vxlan = vxlan -o vxlan = geneve ']'" subsys=daemon
level=warning msg="+ ENCAP_DEV=cilium_vxlan" subsys=daemon
level=warning msg="+ ip link show cilium_vxlan" subsys=daemon
level=warning msg="37450: cilium_vxlan: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000" subsys=daemon
level=warning msg="    link/ether 7e:53:e8:db:1d:ef brd ff:ff:ff:ff:ff:ff" subsys=daemon
level=warning msg="+ setup_dev cilium_vxlan" subsys=daemon
level=warning msg="+ local -r NAME=cilium_vxlan" subsys=daemon
level=warning msg="+ ip link set cilium_vxlan up" subsys=daemon
level=warning msg="RTNETLINK answers: Address already in use" subsys=daemon
level=error msg="Error while initializing daemon" error="exit status 2" subsys=daemon
level=fatal msg="Error while creating daemon" error="exit status 2" subsys=daemon
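
The failing step is 'ip link set cilium_vxlan up', and "RTNETLINK answers: Address already in use" here usually means another VXLAN device or socket is already bound to the UDP port that cilium_vxlan wants, often one left behind by a previously installed CNI. As a quick check, the existing VXLAN devices and the ports they use can be listed on the worker node with:

$ ip -d link show type vxlan   # -d shows details: UDP dstport, VNI, underlying device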

Cluster information:

$ uname -a
Linux STORE 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1+deb9u5 (2019-08-11) x86_64 GNU/Linux

$ kubeadm version 
kubeadm version: &version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:27:49Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}

$ ip address
60: weave: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UP group default qlen 1000
    link/ether 7e:18:01:f4:7d:8b brd ff:ff:ff:ff:ff:ff
    inet 10.36.0.0/12 brd 10.47.255.255 scope global weave
       valid_lft forever preferred_lft forever
    inet6 fe80::7c18:1ff:fef4:7d8b/64 scope link 
       valid_lft forever preferred_lft forever
61: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 5a:24:ec:73:cd:7f brd ff:ff:ff:ff:ff:ff
63: vethwe-datapath@vethwe-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master datapath state UP group default 
    link/ether 66:2d:12:49:83:7a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::642d:12ff:fe49:837a/64 scope link 
       valid_lft forever preferred_lft forever
64: vethwe-bridge@vethwe-datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 5e:9e:39:10:31:1a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5c9e:39ff:fe10:311a/64 scope link 
       valid_lft forever preferred_lft forever
65: vxlan-6784: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65485 qdisc noqueue master datapath state UNKNOWN group default qlen 1000
    link/ether ae:63:29:ac:de:fd brd ff:ff:ff:ff:ff:ff
    inet6 fe80::ac63:29ff:feac:defd/64 scope link 
       valid_lft forever preferred_lft forever
37365: vethe0a862c@if37364: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default 
    link/ether 1e:70:54:f4:ad:a0 brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::1c70:54ff:fef4:ada0/64 scope link 
       valid_lft forever preferred_lft forever
37369: veth6628311@if37368: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default 
    link/ether 1a:ed:12:31:a2:31 brd ff:ff:ff:ff:ff:ff link-netnsid 4
    inet6 fe80::18ed:12ff:fe31:a231/64 scope link 
       valid_lft forever preferred_lft forever
37372: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0

I have already searched around and found this issue, but I still can't resolve the problem. What are the steps to solve it, and what is meant by the following:

Since we no longer delete the old cilium_host/cilium_net veth pair if they already exist, 'ip route add' will complain of existing routes. Fix this by using 'ip route replace' instead.
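
In iproute2 terms, the difference being referred to is that 'ip route add' fails with "File exists" when a matching route is already installed, whereas 'ip route replace' either creates the route or updates the existing one in place, so re-running the setup does not fail. A hypothetical example (the prefix is just a placeholder):

ip route add 10.0.1.0/24 dev cilium_host      # errors if the route already exists
ip route replace 10.0.1.0/24 dev cilium_host  # creates the route, or updates it if present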


Solution

  • You can do any one of the following:

    1. You can follow these steps to uninstall flannel (a scripted version of this cleanup is sketched after this list):

    rm -rf /var/lib/cni/
    rm -rf /run/flannel
    rm -rf /etc/cni/
    

    Remove the interfaces related to flannel. First list them:

    ip link 
    

    For each flannel interface, do the following:

    ifconfig <name of interface from ip link> down
    ip link delete <name of interface from ip link>
    

    2. You can have flannel and Cilium simultaneously in your cluster. You need to follow this doc to configure flannel and Cilium. Note that this is a beta feature and is not yet recommended for production use.
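
A minimal sketch of the cleanup from option 1, run as root on the affected worker node. The interface name patterns are assumptions (flannel-style names plus the weave devices visible in the question's 'ip address' output); adjust the grep pattern to whatever 'ip link' actually shows on your node:

    #!/bin/sh
    # Remove stale CNI state left by the old network plugin.
    rm -rf /var/lib/cni/ /run/flannel /etc/cni/

    # Bring down and delete leftover CNI interfaces. Deleting one end of a veth
    # pair removes both ends, so some of the later deletes may report
    # "Cannot find device"; that is harmless.
    for iface in $(ip -o link show | awk -F': ' '{print $2}' | cut -d'@' -f1 |
                   grep -E '^(flannel|cni0|weave|vethwe|vxlan-6784)'); do
        ip link set "$iface" down
        ip link delete "$iface"
    done

After the cleanup, restart kubelet (or reboot the node) and delete the failing cilium pod so it gets rescheduled and re-runs its init script with a clean set of interfaces.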