I have Cilium integrated into my k3d cluster, but I get the following error messages all over the place:
root@k3d-k8s-daemon01-dev-local01-agent-0:/home/cilium# cilium monitor -t drop
Listening for events on 16 CPUs with 64x4096 of shared memory
Press Ctrl-C to quit
level=info msg="Initializing dissection cache..." subsys=monitor
xx drop (Missed tail call) flow 0xc75cc925 to endpoint 0, ifindex 3, file bpf_host.c:794, , identity world->unknown: 10.43.0.1:443 -> 10.42.0.92:35152 tcp SYN, ACK
xx drop (Missed tail call) flow 0x6933b0fe to endpoint 0, ifindex 3, file bpf_host.c:794, , identity world->unknown: 10.43.0.1:443 -> 10.42.0.92:35152 tcp SYN, ACK
xx drop (Missed tail call) flow 0x877a22ae to endpoint 0, ifindex 3, file bpf_host.c:794, , identity world->unknown: 10.43.0.1:443 -> 10.42.0.92:35152 tcp SYN, ACK
xx drop (Missed tail call) flow 0xdbbeb5ee to endpoint 0, ifindex 3, file bpf_host.c:794, , identity world->unknown: 10.43.0.1:443 -> 10.42.0.92:35152 tcp SYN, ACK
Some connections seem to work, but in particular connections from Kubernetes to some of the inner pods do not appear to be possible.
What could be the reason?
I am using the most recent Cilium:
root@k3d-k8s-daemon01-dev-local01-agent-0:/home/cilium# cilium version
Client: 1.14.0-rc.1 53f97a7b 2023-07-17T00:45:13-07:00 go version go1.20.5 linux/amd64
Daemon: 1.14.0-rc.1 53f97a7b 2023-07-17T00:45:13-07:00 go version go1.20.5 linux/amd64
The setup of the cluster is as follows:
export CLUSTERNAME=k8s-daemon01-dev-local01
k3d cluster create $CLUSTERNAME \
-a 1 \
... \
--image rancher/k3s:v1.27.3-k3s1
# fixes bpf for k3d:
docker exec -it k3d-$CLUSTERNAME-agent-0 mount bpffs /sys/fs/bpf -t bpf
docker exec -it k3d-$CLUSTERNAME-agent-0 mount --make-shared /sys/fs/bpf
docker exec -it k3d-$CLUSTERNAME-server-0 mount bpffs /sys/fs/bpf -t bpf
docker exec -it k3d-$CLUSTERNAME-server-0 mount --make-shared /sys/fs/bpf
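# optional sanity check: confirm bpffs is actually mounted inside both
# node containers before Cilium gets deployed
docker exec k3d-$CLUSTERNAME-agent-0 sh -c 'mount | grep /sys/fs/bpf'
docker exec k3d-$CLUSTERNAME-server-0 sh -c 'mount | grep /sys/fs/bpf'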
# this deploys Cilium -- in a quite std way:
ansible-playbook site.yml -i inventory/k8s-daemon01-dev-local01/hosts.ini -t cilium
# wait till cilium-operator is ready
kubectl wait --for=condition=Ready pod -l app.kubernetes.io/name=cilium-operator -n kube-system --timeout=300s
# wait a bit more for cilium to be ready
sleep 30
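# additionally -- assuming the Cilium CLI (cilium-cli) is installed locally and
# points at this cluster -- one can wait until Cilium itself reports readiness
# instead of relying only on a fixed sleep:
cilium status --wait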
# fixes bpf for cilium:
kubectl get nodes -o custom-columns=NAME:.metadata.name --no-headers=true |
xargs -I {} docker exec {} mount bpffs /sys/fs/bpf -t bpf
kubectl get nodes -o custom-columns=NAME:.metadata.name --no-headers=true |
xargs -I {} docker exec {} mount --make-shared /sys/fs/bpf
kubectl get nodes -o custom-columns=NAME:.metadata.name --no-headers=true |
xargs -I {} docker exec {} mount --make-shared /run/cilium/cgroupv2
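To double-check the effect of these mounts, one can inspect the mount entries and their propagation flags inside every node container (just a sanity check, e.g.):
kubectl get nodes -o custom-columns=NAME:.metadata.name --no-headers=true |
xargs -I {} docker exec {} sh -c 'grep -E "/sys/fs/bpf|/run/cilium/cgroupv2" /proc/self/mountinfo'
# each matching line should list the mount point and carry a "shared:<id>" flag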
Drops of type Missed tail call are a bug in Cilium and should never happen. You should report it to the Cilium project, with a Cilium sysdump.
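For reference, assuming the Cilium CLI (cilium-cli) is installed on a machine with kubeconfig access to the cluster, a sysdump can be collected with:
cilium sysdump
This writes a zip archive with agent logs and cluster state that can be attached to the bug report.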
Explanation.
Cilium relies on multiple BPF programs to process packets. It uses "BPF tail calls" to jump from one program to another. Those tail calls happen by looking up the destination program in a key-value map using a given index (e.g. id=42 => program 87). Drops of type Missed tail call happen when the map lookup fails and doesn't return anything. That should never happen.
In the past, it did happen a few times, for example when the map was not properly updated during an upgrade, leading to a few transient drops.
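To make the mechanism more concrete: assuming bpftool is available inside the node or the Cilium agent container, the tail call maps can be inspected directly (map names and the id below are illustrative and version-dependent):
# Cilium's tail call maps are BPF program arrays, typically named cilium_calls_*
bpftool map list | grep prog_array
# dump one by id: each entry maps an index (key) to a BPF program id (value);
# a "Missed tail call" drop means this lookup found no entry for the requested index
bpftool map dump id <MAP_ID>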