I want to attach an eBPF sockops program to a specific Kubernetes pod. I am using libbpf's bpf_prog_attach()
function as follows:
err = bpf_prog_attach(sockops_prog_fd, cgroup_fd, BPF_CGROUP_SOCK_OPS, 0);
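For context, the cgroup_fd in this call is a file descriptor for the cgroup's directory. A minimal sketch of how such a descriptor could be obtained is shown here; the helper name open_cgroup_dir is hypothetical and not part of libbpf, and the path is whatever cgroup directory you want to attach to.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes O_DIRECTORY on strict compilers */
#endif
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not a libbpf API): opens a cgroup directory and
 * returns its file descriptor, or -1 with errno set on failure.
 * O_DIRECTORY makes the call fail if the path is not a directory. */
static int open_cgroup_dir(const char *path)
{
    return open(path, O_RDONLY | O_DIRECTORY);
}

/* Intended use (assumes sockops_prog_fd is an already loaded program):
 *   int cgroup_fd = open_cgroup_dir("/sys/fs/cgroup/unified");
 *   err = bpf_prog_attach(sockops_prog_fd, cgroup_fd, BPF_CGROUP_SOCK_OPS, 0);
 */
```

Checking the returned descriptor before attaching helps separate "wrong path" errors from attach errors, although it cannot tell you whether the directory is the cgroup your pod's processes actually live in.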
And here is the BPF program that I attach to the SOCKOPS hook:
#include <linux/in.h>
#include <linux/tcp.h>
#include <linux/bpf.h>
#include <sys/socket.h>
#include <bpf/bpf_endian.h>
#include <bpf/bpf_helpers.h>
char LICENSE[] SEC("license") = "GPL";
// `sock_key' is a key for the sockmap
struct sock_key {
    __u32 sip4;
    __u32 dip4;
    __u32 sport;
    __u32 dport;
} __attribute__((packed));
// sock_ops_map maps the sock_ops key to a socket descriptor
struct {
    __uint(type, BPF_MAP_TYPE_SOCKHASH);
    __uint(max_entries, 65535);
    __type(key, struct sock_key);
    __type(value, __u64);
} sock_ops_map SEC(".maps");
// `sk_extract_key' extracts the key from the `bpf_sock_ops' struct
static inline void sk_extract_key(struct bpf_sock_ops *ops,
                                  struct sock_key *key) {
    key->dip4 = ops->remote_ip4;
    key->sip4 = ops->local_ip4;
    // local_port is in host byte order, remote_port is in network byte order
    key->sport = (bpf_htonl(ops->local_port) >> 16);
    key->dport = ops->remote_port >> 16;
}
SEC("sockops")
int bpf_add_to_sockhash(struct bpf_sock_ops *skops) {
    __u32 family, op;
    family = skops->family;
    op = skops->op;
    bpf_printk("Got new operation %d for socket.\n", op);
    switch (op) {
    case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
    case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
        if (family == AF_INET) {
            struct sock_key key = {};
            sk_extract_key(skops, &key);
            int ret = bpf_sock_hash_update(skops, &sock_ops_map, &key, BPF_NOEXIST);
            if (ret != 0) {
                bpf_printk("Failed to update sockmap: %d\n", ret);
            } else {
                bpf_printk("Added new socket to sockmap\n");
            }
        }
        break;
    default:
        break;
    }
    return 0;
}
With the above, when I provide a cgroup_fd for the /sys/fs/cgroup/unified
cgroup, the program works: the eBPF program gets loaded and the print statement produces output.
However, when I use the cgroup of a specific Kubernetes pod (a cgroup_fd for
/sys/fs/cgroup/unified/kubepods-burstable-podad4348c2_ac53_4c09_a9dc_c207a6c68dec.slice:cri-containerd:30a47e8e847277317a29ff7bdcf5bf03391ff79b847be647120d285f62a0f7e6),
the program still attaches successfully, but I don't get the print statements.
Is there a problem with attaching to the SOCKOPS hook for a child cgroup? Or is the cgroup for a specific Kubernetes pod different from the one under unified/?
It seems the issue was the directory name. On my system, each Kubernetes pod had two corresponding cgroup directories. For example, the pod with UID ad4348c2-ac53-4c09-a9dc-c207a6c68dec
had the following two cgroup directories:
$ ls | grep ad4348c2_ac53_4c09_a9dc_c207a6c68dec
kubepods-burstable-podad4348c2_ac53_4c09_a9dc_c207a6c68dec.slice:cri-containerd:30a47e8e847277317a29ff7bdcf5bf03391ff79b847be647120d285f62a0f7e6
kubepods-burstable-podad4348c2_ac53_4c09_a9dc_c207a6c68dec.slice:cri-containerd:3a4af0e09c0e7e506fef59b92cbeb008b0a3e66d442e54e5ca5ded642841a335
The correct cgroup can be found using the following command (or by inspecting the pod's JSON output):
$ kubectl get pods -A -o custom-columns=PodName:.metadata.name,PodUID:.metadata.uid,ContainerID:.status.containerStatuses[0].containerID
PodName                    PodUID                                 ContainerID
frontend-b74f77687-sd8rf   ad4348c2-ac53-4c09-a9dc-c207a6c68dec   containerd://3a4af0e09c0e7e506fef59b92cbeb008b0a3e66d442e54e5ca5ded642841a335
Hence, the cgroup kubepods-burstable-podad4348c2_ac53_4c09_a9dc_c207a6c68dec.slice:cri-containerd:3a4af0e09c0e7e506fef59b92cbeb008b0a3e66d442e54e5ca5ded642841a335 is the correct one to attach to in order to inspect the pod's socket messages.
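The mapping from the kubectl columns to the directory name can be sketched in C as follows. This is only an illustration of the naming seen on my system: the helper name pod_cgroup_dirname is hypothetical, and the "kubepods-burstable-pod ... .slice:cri-containerd:" layout is an assumption that will differ for other QoS classes, container runtimes, or cgroup drivers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the cgroup directory name for one container
 * from the PodUID and ContainerID values printed by kubectl.
 * Assumes the burstable/cri-containerd naming shown in this answer.
 * Returns 0 on success, -1 if an input or the output buffer is too small. */
static int pod_cgroup_dirname(const char *pod_uid, const char *container_id,
                              char *out, size_t out_len)
{
    char uid[64];
    const char *sep;
    size_t i, n = strlen(pod_uid);
    int written;

    if (n >= sizeof(uid))
        return -1;
    /* The directory name uses '_' where the pod UID uses '-'. */
    for (i = 0; i < n; i++)
        uid[i] = (pod_uid[i] == '-') ? '_' : pod_uid[i];
    uid[n] = '\0';

    /* Drop the runtime scheme prefix, e.g. "containerd://". */
    sep = strstr(container_id, "://");
    if (sep)
        container_id = sep + 3;

    written = snprintf(out, out_len,
                       "kubepods-burstable-pod%s.slice:cri-containerd:%s",
                       uid, container_id);
    return (written < 0 || (size_t)written >= out_len) ? -1 : 0;
}
```

The resulting name can be appended to the cgroup mount point (/sys/fs/cgroup/unified in my case) and opened to obtain the cgroup_fd passed to bpf_prog_attach(). It is still worth confirming with ls that the directory exists before attaching.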