
Why are the first 8 bytes of cpumap_enqueue_ctx not accessible by bpf code?


While reading some eBPF examples that attach to tracepoints, I noticed that every context struct starts with padding like this (from samples/bpf/xdp_redirect_cpu_kern.c):

/* Tracepoint: /sys/kernel/debug/tracing/events/xdp/xdp_cpumap_enqueue/format
 * Code in:         kernel/include/trace/events/xdp.h
 */
struct cpumap_enqueue_ctx {
        u64 __pad;              // First 8 bytes are not accessible by bpf code
        int map_id;             //      offset:8;  size:4; signed:1;
        u32 act;                //      offset:12; size:4; signed:0;
        int cpu;                //      offset:16; size:4; signed:1;
        unsigned int drops;     //      offset:20; size:4; signed:0;
        unsigned int processed; //      offset:24; size:4; signed:0;
        int to_cpu;             //      offset:28; size:4; signed:1;
};

The only explanation I found is this comment, which says that the first 8 bytes can't be accessed by bpf code, but I don't understand why.


Solution

  • From this mailing list:

    The first 8 bytes of the tracepoint context struct are not accessible by the bpf code. This is a choice that dates back to the original inclusion of this code.

    See explanation in: commit 98b5c2c65c29 ("perf, bpf: allow bpf programs attach to tracepoints")

    And from commit 98b5c2c65c29:

    introduce BPF_PROG_TYPE_TRACEPOINT program type and allow it to be attached
    to the perf tracepoint handler, which will copy the arguments into
    the per-cpu buffer and pass it to the bpf program as its first argument.
    The layout of the fields can be discovered by doing
    'cat /sys/kernel/debug/tracing/events/sched/sched_switch/format'
    prior to the compilation of the program with exception that first 8 bytes
    are reserved and not accessible to the program. This area is used to store
    the pointer to 'struct pt_regs' which some of the bpf helpers will use:
    +---------+
    | 8 bytes | hidden 'struct pt_regs *' (inaccessible to bpf program)
    +---------+
    | N bytes | static tracepoint fields defined in tracepoint/format (bpf readonly)
    +---------+
    | dynamic | __dynamic_array bytes of tracepoint (inaccessible to bpf yet)
    +---------+
    
    Note that all of the fields are already dumped to user space via perf ring buffer
    and broken application access it directly without consulting tracepoint/format.
    Same rule applies here: static tracepoint fields should only be accessed
    in a format defined in tracepoint/format. The order of fields and
    field sizes are not an ABI.
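
To make the layout rule concrete, here is a minimal sketch that mirrors the struct from the question in plain userspace C and checks its field offsets against the values listed in the tracepoint format comments. The fixed-width types stand in for the kernel's `u64`/`u32`, and `first_visible_offset` is a hypothetical helper added only for illustration; none of this is kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct cpumap_enqueue_ctx from
 * samples/bpf/xdp_redirect_cpu_kern.c. Fixed-width types replace the
 * kernel's u64/u32 so the layout is the same on any host. */
struct cpumap_enqueue_ctx {
    uint64_t __pad;      /* offset:0;  size:8; reserved, hidden pt_regs pointer */
    int32_t  map_id;     /* offset:8;  size:4; signed:1; */
    uint32_t act;        /* offset:12; size:4; signed:0; */
    int32_t  cpu;        /* offset:16; size:4; signed:1; */
    uint32_t drops;      /* offset:20; size:4; signed:0; */
    uint32_t processed;  /* offset:24; size:4; signed:0; */
    int32_t  to_cpu;     /* offset:28; size:4; signed:1; */
};

/* Compile-time checks: the padding pushes the first real field to offset 8,
 * which is where the format file says it lives. */
_Static_assert(offsetof(struct cpumap_enqueue_ctx, map_id) == 8,
               "map_id must follow the 8 reserved bytes");
_Static_assert(offsetof(struct cpumap_enqueue_ctx, act) == 12, "act offset");
_Static_assert(offsetof(struct cpumap_enqueue_ctx, cpu) == 16, "cpu offset");
_Static_assert(offsetof(struct cpumap_enqueue_ctx, drops) == 20, "drops offset");
_Static_assert(offsetof(struct cpumap_enqueue_ctx, processed) == 24, "processed offset");
_Static_assert(offsetof(struct cpumap_enqueue_ctx, to_cpu) == 28, "to_cpu offset");

/* Hypothetical helper: offset of the first field a program would read. */
static size_t first_visible_offset(void)
{
    return offsetof(struct cpumap_enqueue_ctx, map_id);
}
```

Without the `__pad` member, `map_id` would sit at offset 0 and every field would be read 8 bytes too early, so the padding is what keeps the struct aligned with the offsets in the format file.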
    

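    So the first 8 bytes are not accessible because they hold a pointer to 'struct pt_regs', which some BPF helpers rely on. Keeping that area hidden prevents a program from corrupting the pointer or leaking a kernel address, which is why every tracepoint context struct starts with 8 bytes of padding.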
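
As a rough illustration of the rule described above, the following is a simplified userspace model of a context-access check: reads below offset 8 are rejected, and so is any write, since the commit message describes the static fields as read-only. All names here (`model_ctx_access_ok`, `MODEL_HIDDEN_BYTES`, `MODEL_MAX_TRACE_SIZE`) are hypothetical, and the upper bound and the alignment requirement are assumptions added for the sketch rather than values taken from the answer. It is not the kernel's verifier code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Size of the hidden area at the start of the context (the document's
 * "first 8 bytes", which hold the struct pt_regs pointer). */
#define MODEL_HIDDEN_BYTES   8
/* Assumed upper bound on the context size, for illustration only. */
#define MODEL_MAX_TRACE_SIZE 2048

enum model_access { MODEL_READ, MODEL_WRITE };

/* Returns true if a load or store of `size` bytes at offset `off` into the
 * tracepoint context would be allowed under this simplified model. */
static bool model_ctx_access_ok(size_t off, size_t size, enum model_access type)
{
    if (off < MODEL_HIDDEN_BYTES)        /* hidden pt_regs pointer */
        return false;
    if (off >= MODEL_MAX_TRACE_SIZE)     /* past the assumed end of the buffer */
        return false;
    if (type != MODEL_READ)              /* static fields are read-only */
        return false;
    if (size == 0 || off % size != 0)    /* assumed: accesses must be aligned */
        return false;
    return true;
}
```

Under this model, `model_ctx_access_ok(8, 4, MODEL_READ)` (reading `map_id`) is allowed, while `model_ctx_access_ok(0, 8, MODEL_READ)` (reading `__pad`) is refused.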