
How to make eBPF program sleepable


I've been reading about sleepable eBPF programs; this article in particular provides a nice introduction. However, I am struggling to find any documentation or examples on how to actually achieve this in code. Any tips or links to documentation are greatly appreciated.


Solution

  • Note: This answer was last updated on 29-03-2023. Sleepable programs are still subject to change, so this answer might not be accurate for kernels newer than v6.3.

    At the time of writing, no official documentation for this feature exists. The closest you can get is the LWN article and the commit messages.

    Sleepable programs were added in v5.10.

    eBPF programs that are "sleepable" cannot actively sleep the way a userspace program can (e.g. os.sleep(100ms)). Rather, "sleepable" is a property that allows the program to call certain helper functions that might sleep and are otherwise not available, for example bpf_copy_from_user.

    Non-sleepable eBPF programs are guaranteed not to migrate between CPUs and, on non-RT (real-time) kernels, will not be preempted by the scheduler. However, some operations, like reading from userspace memory, might be IO bound, for example if that userspace memory is file-backed and has to be paged in first. Such functions might "sleep", i.e. block until the IO is done, and it makes sense for the kernel to do something else in the meantime. That is why sleepable programs were introduced.

    This has some implications:

    The non-sleepable programs are relying on implicit rcu_read_lock() and migrate_disable() to protect life time of programs, maps that they use and per-cpu kernel structures used to pass info between bpf programs and the kernel. The sleepable programs cannot be enclosed into rcu_read_lock(). migrate_disable() maps to preempt_disable() in non-RT kernels, so the progs should not be enclosed in migrate_disable() as well. Therefore rcu_read_lock_trace is used to protect the life time of sleepable progs.

    To load a program as sleepable, the BPF_F_SLEEPABLE flag has to be set in the prog_flags field of the BPF_PROG_LOAD syscall command. For authors using libbpf, the easiest way to tell the loader to do so is to append .s to the program type at the start of the section name. For an LSM hook on file_mprotect, the section name becomes lsm.s/file_mprotect; for a uprobe it could be, for example, uprobe.s//proc/self/exe:trigger_func3.
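    For the raw-syscall route, the following sketch only prepares the union bpf_attr for a sleepable LSM program; it assumes a Linux system whose <linux/bpf.h> comes from v5.10 or newer. The two-instruction body (r0 = 0; exit) is a placeholder, and attach_btf_id must be the BTF ID of bpf_lsm_file_mprotect (or another allowed hook) looked up in /sys/kernel/btf/vmlinux, which is not shown. The function names prepare_sleepable_lsm_load() and load_prog() are hypothetical, and actually loading the program needs root privileges and a kernel built with CONFIG_BPF_LSM.

```c
#define _GNU_SOURCE
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef BPF_F_SLEEPABLE
#define BPF_F_SLEEPABLE (1U << 4) /* value from include/uapi/linux/bpf.h */
#endif

/* Placeholder program body for illustration: r0 = 0; exit. */
static const struct bpf_insn prog_insns[] = {
	{ .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 },
	{ .code = BPF_JMP | BPF_EXIT },
};

/* Hypothetical helper: fill attr for loading a sleepable LSM program.
 * attach_btf_id is the BTF ID of the target hook function (for example
 * bpf_lsm_file_mprotect) in the kernel's vmlinux BTF. */
static void prepare_sleepable_lsm_load(union bpf_attr *attr, uint32_t attach_btf_id)
{
	memset(attr, 0, sizeof(*attr));
	attr->prog_type = BPF_PROG_TYPE_LSM;
	attr->expected_attach_type = BPF_LSM_MAC;
	attr->attach_btf_id = attach_btf_id;
	attr->insns = (uint64_t)(uintptr_t)prog_insns;
	attr->insn_cnt = sizeof(prog_insns) / sizeof(prog_insns[0]);
	attr->license = (uint64_t)(uintptr_t)"GPL";
	/* This flag is what makes the program sleepable. */
	attr->prog_flags = BPF_F_SLEEPABLE;
}

/* Hypothetical wrapper: returns a program fd, or -1 with errno set. */
static int load_prog(union bpf_attr *attr)
{
	return (int)syscall(SYS_bpf, BPF_PROG_LOAD, attr, sizeof(*attr));
}
```

    With libbpf, the .s suffix sets this same flag for you. If the decision has to be made at runtime instead, bpf_program__set_flags() can add BPF_F_SLEEPABLE to a program before the object is loaded.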

    In v5.10, only fentry/fexit/fmod_ret and LSM programs can be sleepable, and not all LSM hooks are allowed to be sleepable either. The patch set adds a special list of sleepable hooks:

    /* non exhaustive list of sleepable bpf_lsm_*() functions */
    BTF_SET_START(btf_sleepable_lsm_hooks)
    #ifdef CONFIG_BPF_LSM
    BTF_ID(func, bpf_lsm_file_mprotect)
    BTF_ID(func, bpf_lsm_bprm_committed_creds)
    #endif
    BTF_SET_END(btf_sleepable_lsm_hooks)
    
    static int check_sleepable_lsm_hook(u32 btf_id)
    {
        return btf_id_set_contains(&btf_sleepable_lsm_hooks, btf_id);
    }
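    In the kernel, resolve_btfids fills in the BTF IDs of such a set at build time and sorts them, which is why btf_id_set_contains() can use a binary search. The following userspace sketch mirrors that lookup with bsearch(); struct id_set, id_set_contains() and the numeric IDs are made up for illustration and are not real BTF IDs.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel's struct btf_id_set:
 * a count followed by IDs sorted in ascending order. */
struct id_set {
	uint32_t cnt;
	uint32_t ids[8];
};

static int id_cmp(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;
	/* Compare instead of subtracting so large IDs cannot overflow. */
	return (x > y) - (x < y);
}

/* Same idea as btf_id_set_contains(): binary search over the sorted IDs. */
static bool id_set_contains(const struct id_set *set, uint32_t id)
{
	return bsearch(&id, set->ids, set->cnt, sizeof(set->ids[0]), id_cmp) != NULL;
}

/* Made-up IDs standing in for bpf_lsm_file_mprotect and
 * bpf_lsm_bprm_committed_creds; real values come from vmlinux BTF. */
static const struct id_set sleepable_hooks = { .cnt = 2, .ids = { 1017, 2240 } };
```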
    

    In v5.18, support for sleepable iterator programs was added in this patch set.

    In v6.0, support for sleepable uprobes and uretprobes was added in this patch set.

    In v6.3, support for sleepable sockops programs was added in this patch set.

    Lastly, in v5.10, sleepable programs are only allowed to use pre-allocated BPF_MAP_TYPE_HASH, BPF_MAP_TYPE_LRU_HASH and BPF_MAP_TYPE_ARRAY maps.

    In v5.12, this patch allows sleepable programs to use the per-CPU map types BPF_MAP_TYPE_PERCPU_HASH, BPF_MAP_TYPE_PERCPU_ARRAY and BPF_MAP_TYPE_LRU_PERCPU_HASH, as well as the map-in-map types BPF_MAP_TYPE_ARRAY_OF_MAPS and BPF_MAP_TYPE_HASH_OF_MAPS.

    In v5.12, this patch set allows sleepable programs to use ring buffers: BPF_MAP_TYPE_RINGBUF.

    In v5.17, this patch set allows sleepable programs to use the storage maps BPF_MAP_TYPE_INODE_STORAGE, BPF_MAP_TYPE_SK_STORAGE and BPF_MAP_TYPE_TASK_STORAGE.

    When user ring buffers were introduced in v6.1 in this patch set, sleepable programs got access to BPF_MAP_TYPE_USER_RINGBUF as well.

    And in v6.2, this patch set gave sleepable programs access to BPF_MAP_TYPE_CGRP_STORAGE.
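    To check the version history above against a particular kernel, it can be restated as a small lookup table. This is only a transcription of the versions listed in this answer, not something queried from the kernel; KVER, sleepable_supports() and the short names are hypothetical, and the table ignores the per-hook allowlist and the pre-allocation requirement for hash maps. On a real system the verifier has the final say: a sleepable program that uses an unsupported map type is rejected at load time.

```c
#include <stdbool.h>
#include <string.h>

/* Encode a kernel version as a single comparable number. */
#define KVER(major, minor) ((major) * 256 + (minor))

struct sleepable_feature {
	const char *name; /* program type, or map type without BPF_MAP_TYPE_ */
	int since;        /* first kernel version, as listed in this answer */
};

static const struct sleepable_feature features[] = {
	/* Program types */
	{ "fentry", KVER(5, 10) },
	{ "fexit", KVER(5, 10) },
	{ "fmod_ret", KVER(5, 10) },
	{ "lsm", KVER(5, 10) },        /* only allowlisted hooks */
	{ "iter", KVER(5, 18) },
	{ "uprobe", KVER(6, 0) },
	{ "uretprobe", KVER(6, 0) },
	{ "sockops", KVER(6, 3) },
	/* Map types */
	{ "HASH", KVER(5, 10) },       /* pre-allocated only */
	{ "LRU_HASH", KVER(5, 10) },
	{ "ARRAY", KVER(5, 10) },
	{ "PERCPU_HASH", KVER(5, 12) },
	{ "PERCPU_ARRAY", KVER(5, 12) },
	{ "LRU_PERCPU_HASH", KVER(5, 12) },
	{ "ARRAY_OF_MAPS", KVER(5, 12) },
	{ "HASH_OF_MAPS", KVER(5, 12) },
	{ "RINGBUF", KVER(5, 12) },
	{ "INODE_STORAGE", KVER(5, 17) },
	{ "SK_STORAGE", KVER(5, 17) },
	{ "TASK_STORAGE", KVER(5, 17) },
	{ "USER_RINGBUF", KVER(6, 1) },
	{ "CGRP_STORAGE", KVER(6, 2) },
};

/* True if, according to the table, `name` is usable by a sleepable program
 * on kernel major.minor. Names not in the table return false. */
static bool sleepable_supports(const char *name, int major, int minor)
{
	for (size_t i = 0; i < sizeof(features) / sizeof(features[0]); i++)
		if (strcmp(features[i].name, name) == 0)
			return KVER(major, minor) >= features[i].since;
	return false;
}
```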