Tags: linux, linux-kernel, perf, ebpf, libbpf

eBPF program execution context


I'm experimenting with writing eBPF programs using libbpf, but the documentation is very sparse and I'm having trouble understanding even some basic things about eBPF program execution. I'm mostly interested in the BPF_PROG_TYPE_PERF_EVENT program type, in case an answer depends on the program type, but I'd also appreciate references to where the corresponding information can be found for the other program types.

  1. How does an eBPF program get executed relative to the task/thread (AFAIU, these are the same in the eBPF context) that triggered the event to which the eBPF program is attached? Is the eBPF program executed on the same CPU on which the triggering task/thread has been running (with the task/thread paused until the program finishes), or can they run in parallel?
  2. Where can I find out exactly what is passed to an eBPF program as its argument (context)? I know this is program-type-dependent, and for BPF_PROG_TYPE_SOCKET_FILTER this is even documented on the bpf(2) man page, but what about the other program types?
  3. Does the return value of an eBPF program have an impact on anything? I figure every eBPF program, in C terms, has to return a 64-bit integer because the eBPF register responsible for storing the return value has to be filled on program exit, but does the return value actually mean something to Linux? Again, if this is program-type-dependent, where can I find any information about this?

I'd really appreciate not only the answers, but also references to some official sources where the answers to these and similar questions can be found.


Solution

  • How does an eBPF program get executed relative to the task/thread

    It depends on the program type. Some program types are triggered as a direct result of something a thread does: probes on syscalls or LSM hooks are triggered by a task, so the TID or even the task struct is known. But program types such as BPF_PROG_TYPE_SCHED_CLS are executed in the network stack, often in softirq or kernel-thread context, so no userspace thread is necessarily associated with them.

    (AFAIU, these are the same in the eBPF context) that triggered the event to which the eBPF program is attached?

    Typically, the context of an eBPF program refers to the data that is given to the BPF program via its "arguments". But knowing when a program is called by the kernel is also an important aspect to consider.

    Is eBPF executed on the same CPU on which the triggering task/thread has been running

    Yes. BPF programs always run on the same logical CPU on which the triggering code runs. BPF programs are guaranteed to never migrate from that CPU.

    (and the task/thread is paused until the program finishes), or they can run in parallel?

    BPF programs run synchronously with the triggering code: the triggering code calls the program and only continues once the program has returned. You can think of them as function calls that may take some time to return.

    Where can I find out exactly what is passed to an eBPF program as its argument (context)? I know this is program-type-dependent, and for BPF_PROG_TYPE_SOCKET_FILTER this is even documented on the bpf(2) man page, but what about the other program types?

    There is some documentation out there, such as https://docs.kernel.org/bpf/index.html or https://ebpf-docs.dylanreimerink.nl/linux/program-type/, but the most foolproof method is to read the kernel source code. For most program types there are examples in https://github.com/torvalds/linux/tree/master/tools/testing/selftests/bpf/prog_tests or https://github.com/torvalds/linux/tree/master/samples/bpf.

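    If that fails, you can search the source code for the const struct bpf_verifier_ops definition for your program type, then follow its .is_valid_access function, which will typically allude to the context type. For example, for perf events this is perf_event_verifier_ops in kernel/trace/bpf_trace.c; its .is_valid_access callback, pe_prog_is_valid_access, bounds every access by sizeof(struct bpf_perf_event_data), so that struct is the context type.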
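
    To make the perf event case concrete, here is a minimal sketch of a program whose argument is that context struct. It assumes the UAPI header <linux/bpf_perf_event.h> is installed; the SEC() macro below is a simplified stand-in for the one in libbpf's <bpf/bpf_helpers.h>, and the function name, file name, and "has an address" policy are made up for illustration.

```c
/* Sketch of a BPF_PROG_TYPE_PERF_EVENT program. Build as a BPF object with
 * something like: clang -O2 -g -target bpf -c on_sample.bpf.c
 * (on_sample.bpf.c is a hypothetical file name). */
#include <linux/types.h>            /* __u64 */
#include <linux/bpf_perf_event.h>   /* UAPI: struct bpf_perf_event_data */

/* Simplified stand-in for SEC() from libbpf's <bpf/bpf_helpers.h>. */
#define SEC(name) __attribute__((section(name), used))

/* "perf_event" is the section name libbpf maps to BPF_PROG_TYPE_PERF_EVENT. */
SEC("perf_event")
int on_sample(struct bpf_perf_event_data *ctx)
{
    /* Fields of the context:
     *   ctx->regs          register snapshot (bpf_user_pt_regs_t, arch-specific)
     *   ctx->sample_period sampling period of the event
     *   ctx->addr          address associated with the sample, if any
     */
    __u64 addr = ctx->addr;

    /* Illustrative policy only: return 1 when the sample carries an
     * address and 0 otherwise. What the kernel does with this value is
     * decided at the call site that runs the program. */
    return addr != 0;
}

char LICENSE[] SEC("license") = "GPL";
```

    Depending on the toolchain, building for the BPF target may need extra include paths for the architecture-specific headers that define the register layout.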

    Does the return value of an eBPF program have an impact on anything? I figure every eBPF program, in C terms, has to return a 64-bit integer because the eBPF register responsible for storing the return value has to be filled on program exit, but does the return value actually mean something to Linux? Again, if this is program-type-dependent, where can I find any information about this?

    Yes, for most program types the return value has meaning. The meaning differs per program type: sometimes it's interpreted as an enum value, sometimes as an error code, and other times as a number such as a packet length. Some program types are only allowed to return "valid" values (the verifier rejects anything else), and for other program types the value doesn't matter.
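
    As a rough sketch of two of these flavours: a socket filter returns the number of bytes of the packet to keep (0 drops it), while an XDP program returns a value from enum xdp_action. The struct and enum names come from <linux/bpf.h>; SEC() is again a simplified stand-in for libbpf's macro, and SNAP_LEN and the function names are hypothetical.

```c
/* Two program types with differently interpreted return values.
 * Build as a BPF object with something like: clang -O2 -g -target bpf -c */
#include <linux/bpf.h>   /* UAPI: struct __sk_buff, struct xdp_md, enum xdp_action */

/* Simplified stand-in for SEC() from libbpf's <bpf/bpf_helpers.h>. */
#define SEC(name) __attribute__((section(name), used))

/* Hypothetical snapshot length, chosen only for illustration. */
#define SNAP_LEN 64

/* Socket filter: the return value is a length. The packet is trimmed to
 * that many bytes; returning 0 drops it. */
SEC("socket")
int keep_headers(struct __sk_buff *skb)
{
    return skb->len < SNAP_LEN ? skb->len : SNAP_LEN;
}

/* XDP: the return value is an enum xdp_action verdict. */
SEC("xdp")
int pass_all(struct xdp_md *ctx)
{
    (void)ctx;          /* the context is not inspected in this sketch */
    return XDP_PASS;    /* hand the packet on to the normal network stack */
}

char LICENSE[] SEC("license") = "GPL";
```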

    The meaning of the return value per program type is typically also documented on the same pages as the context type, but to find the related kernel code you have to look at how it's used at the bpf_prog_run call site.
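
    To show what "looking at the call site" tells you, here is a small user-space model of the pattern. It is not kernel code and every name in it is hypothetical. It mirrors the shape of the perf event case as far as I can tell from the kernel versions I have read (bpf_overflow_handler in kernel/events/core.c): the hook calls the program like an ordinary function, waits for it to return, and skips the default handling of the sample when the program returns 0.

```c
/* User-space model of a hook's call site. NOT kernel code: all names are
 * hypothetical and only mirror the shape of the logic. */
#include <stddef.h>

typedef int (*prog_fn)(void *ctx);

struct hook {
    prog_fn prog;        /* the attached "program" */
    int default_runs;    /* how many times default handling ran */
};

/* Called when the event fires. */
void on_event(struct hook *h, void *ctx)
{
    int ret = h->prog(ctx);   /* plain function call: the hook waits here */

    if (!ret)
        return;               /* 0: skip default handling of this event */

    h->default_runs++;        /* nonzero: default handling proceeds */
}

/* Two trivial "programs" that differ only in their return value. */
int keep_sample(void *ctx) { (void)ctx; return 1; }
int drop_sample(void *ctx) { (void)ctx; return 0; }
```

    Firing the hook with drop_sample attached leaves default_runs unchanged, while keep_sample increments it; the equivalent check in the kernel source is what gives the return value its meaning for a given program type.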