Tags: linux, docker, linux-kernel, bpf, ebpf

Why does my BPF_PROG_TYPE_CGROUP_SKB program not work in a container?


I have written the following eBPF program to count packets:

#include <linux/version.h>
#include <uapi/linux/bpf.h>

#include "include/bpf_map.h"
#include "include/bpf_helpers.h"

struct bpf_map_def SEC("maps/count") count_map = {
    .type = BPF_MAP_TYPE_ARRAY,
    .key_size = sizeof(int),
    .value_size = sizeof(__u64),
    .max_entries = 1024,
};

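SEC("cgroup/skb")
int count_packets(struct __sk_buff *skb) {
    // Emit a trace line so each invocation shows up in trace_pipe.
    char debug[] = "count_packets\n";
    bpf_trace_printk(debug, sizeof(debug));

    int packets_key = 0;
    __u64 *packets = 0;

    packets = bpf_map_lookup_elem(&count_map, &packets_key);
    if (packets == 0)
        return 1; // lookup failed: still allow the packet (returning 0 would drop it)

    // Atomic increment, since the program may run on several CPUs at once.
    __sync_fetch_and_add(packets, 1);

    // allow access
    return 1;
}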

char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;

I also have a user space component that loads the program as a BPF_PROG_TYPE_CGROUP_SKB, attaches it to a cgroup v2 directory (/sys/fs/cgroup/unified/foo) with attach type BPF_CGROUP_INET_EGRESS, adds its own PID to that cgroup, and then starts generating network traffic.

When I run this user space component outside of a container, it works as expected: running cat /sys/kernel/debug/tracing/trace_pipe shows my program being called.

However, when I run it inside a container, I do not see any output.

I am running the container as follows:

docker run -it \
        --privileged \
        --pid=host \
        --net=host \
        -v /sys/fs/cgroup/unified:/sys/fs/cgroup/unified \
        ${IMAGE}

I am using the host network and PID namespaces to rule out any issues that separate namespaces might cause.

Why does my program not seem to work from within a container?

uname -a: Linux ubuntu-bionic 4.18.0-16-generic #17~18.04.1-Ubuntu SMP Tue Feb 12 13:35:51 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux


Solution

  • This happened because Docker was using the net_prio and net_cls cgroup v1 controllers, which overwrite the per-socket data used for cgroup2 matching. To quote the linked source:

    While userland may start using net_prio or net_cls at any time, once either is used, cgroup2 matching no longer works.

    My solution was to disable these controllers with the kernel boot parameter cgroup_no_v1=net_prio,net_cls. A better solution would be to stop Docker from using them, but I couldn't see how to do that.
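As a rough diagnostic, one could check whether net_cls or net_prio show up as v1 controllers for a process by scanning /proc/self/cgroup, whose lines have the form "ID:controllers:path". The sketch below does that; the function names are hypothetical. It assumes that a listed controller means the v1 hierarchy is mounted for that process, which only hints that the controller may be in use and does not prove it.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if one /proc/<pid>/cgroup line ("ID:controllers:path")
 * lists net_cls or net_prio among its controllers, else 0. */
static int line_has_net_controller(const char *line)
{
    const char *start = strchr(line, ':');
    const char *end;

    if (!start)
        return 0;
    start++;                        /* first char of the controller list */
    end = strchr(start, ':');
    if (!end)
        return 0;

    /* Walk the comma-separated controller names between the two colons. */
    while (start < end) {
        const char *comma = memchr(start, ',', (size_t)(end - start));
        const char *tok_end = comma ? comma : end;
        size_t len = (size_t)(tok_end - start);

        if ((len == 7 && memcmp(start, "net_cls", 7) == 0) ||
            (len == 8 && memcmp(start, "net_prio", 8) == 0))
            return 1;
        start = tok_end + 1;
    }
    return 0;
}

/* Scans a cgroup membership file (normally /proc/self/cgroup).
 * Returns 1 if net_cls or net_prio is listed, 0 if not,
 * and -1 if the file cannot be opened. */
static int file_has_net_controller(const char *path)
{
    char line[512];
    int found = 0;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (line_has_net_controller(line)) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Calling file_has_net_controller("/proc/self/cgroup") before and after booting with cgroup_no_v1=net_prio,net_cls would be one way to see whether the boot parameter took effect for that process.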