
"invalid bpf_context access" when trying to read `regs` parameter


Depending on how the syscall is described in /sys/kernel/btf/vmlinux, reading the struct pt_regs *regs parameter in an fentry/fexit program fails with an "invalid bpf_context access" error:

SEC("fentry/__x64_sys_recvfrom")
int BPF_PROG(fentry_syscall, struct pt_regs *regs) {
  struct event t;

  bpf_get_current_comm(t.comm, TASK_COMM_LEN);

  u64 id = bpf_get_current_pid_tgid();
  t.pid = id >> 32;

  // This causes an error in some environments.
  t.fd = PT_REGS_PARM1_CORE(regs);

  bpf_printk("comm: %s, pid: %d, fd: %d", t.comm, t.pid, t.fd);

  return 0;
}

$ sudo ./output
2022/07/01 03:33:01 loading objects: field FentrySyscall: program fentry_syscall: load program: permission denied:
        arg#0 type is not a struct
        Unrecognized arg#0 type PTR
        ; int BPF_PROG(fentry_syscall, struct pt_regs *regs) {
        0: (79) r6 = *(u64 *)(r1 +0)
        func '__x64_sys_recvfrom' arg0 type FWD is not a struct
        invalid bpf_context access off=0 size=8
        processed 1 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

This seems to occur when the BTF type behind the parameter is only a forward declaration (FWD):

$ bpftool btf dump file /sys/kernel/btf/vmlinux format raw
...
[13362] FWD 'pt_regs' fwd_kind=struct
[13363] CONST '(anon)' type_id=13362
[13364] PTR '(anon)' type_id=13363
[13365] FUNC_PROTO '(anon)' ret_type_id=36 vlen=1
        '__unused' type_id=13364
...
[13608] FUNC '__x64_sys_recvmsg' type_id=13365 linkage=static
...

Meanwhile, syscalls/environments without the error have a concrete STRUCT definition, like:

$ bpftool btf dump file /sys/kernel/btf/vmlinux format raw
[1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
...
[226] STRUCT 'pt_regs' size=168 vlen=21
        'r15' type_id=1 bits_offset=0
        'r14' type_id=1 bits_offset=64
        'r13' type_id=1 bits_offset=128
        'r12' type_id=1 bits_offset=192
        'bp' type_id=1 bits_offset=256
        'bx' type_id=1 bits_offset=320
        'r11' type_id=1 bits_offset=384
        'r10' type_id=1 bits_offset=448
        'r9' type_id=1 bits_offset=512
        'r8' type_id=1 bits_offset=576
        'ax' type_id=1 bits_offset=640
        'cx' type_id=1 bits_offset=704
        'dx' type_id=1 bits_offset=768
        'si' type_id=1 bits_offset=832
        'di' type_id=1 bits_offset=896
        'orig_ax' type_id=1 bits_offset=960
        'ip' type_id=1 bits_offset=1024
        'cs' type_id=1 bits_offset=1088
        'flags' type_id=1 bits_offset=1152
        'sp' type_id=1 bits_offset=1216
        'ss' type_id=1 bits_offset=1280
...
[5183] CONST '(anon)' type_id=226
...
[5189] PTR '(anon)' type_id=5183
...
[5321] FUNC_PROTO '(anon)' ret_type_id=42 vlen=1
        '__unused' type_id=5189
...
[17648] FUNC '__x64_sys_recvmsg' type_id=5321 linkage=static
...
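
The difference between the two dumps comes down to what the PTR -> CONST chain of the argument ends in. Below is a minimal sketch in plain C that walks that chain over a simplified in-memory copy of the records shown above. The enum, struct, and function names are hypothetical and are not kernel or libbpf APIs; the sketch only mirrors what the verifier log suggests (it rejects the access when the chain ends in FWD rather than STRUCT).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the BTF kinds seen in the dumps (hypothetical names). */
enum demo_kind { DEMO_STRUCT, DEMO_FWD, DEMO_CONST, DEMO_PTR };

struct demo_rec {
  unsigned id;          /* type ID as printed by bpftool */
  enum demo_kind kind;
  unsigned ref;         /* type_id this record refers to; 0 if none */
};

/* Records copied from the failing dump. */
static const struct demo_rec failing[] = {
  { 13362, DEMO_FWD,   0     },
  { 13363, DEMO_CONST, 13362 },
  { 13364, DEMO_PTR,   13363 },
};

/* Records copied from the working dump. */
static const struct demo_rec working[] = {
  { 226,  DEMO_STRUCT, 0    },
  { 5183, DEMO_CONST,  226  },
  { 5189, DEMO_PTR,    5183 },
};

/* Linear lookup of a record by its type ID; NULL if it is not in the table. */
static const struct demo_rec *demo_find(const struct demo_rec *recs, size_t n,
                                        unsigned id)
{
  for (size_t i = 0; i < n; i++)
    if (recs[i].id == id)
      return &recs[i];
  return NULL;
}

/*
 * Follow PTR and CONST links starting at id and return the kind the chain
 * ends in, or -1 if a referenced ID is missing or the chain loops.
 */
static int demo_resolve(const struct demo_rec *recs, size_t n, unsigned id)
{
  assert(recs != NULL);
  for (size_t hops = 0; hops <= n; hops++) {
    const struct demo_rec *r = demo_find(recs, n, id);
    if (r == NULL)
      return -1;
    if (r->kind != DEMO_PTR && r->kind != DEMO_CONST)
      return (int)r->kind;
    id = r->ref;
  }
  return -1;
}
```

Calling demo_resolve(failing, 3, 13364) yields DEMO_FWD, while demo_resolve(working, 3, 5189) yields DEMO_STRUCT. The starting IDs are the type_id values of the '__unused' argument in each FUNC_PROTO.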

I've tested several distributions and found that how regs is defined depends on the distribution/kernel/syscall combination. Why is this so inconsistent? How can I avoid this error and make my eBPF program run on any (recent) Linux environment?

I've created a GitHub repo for this issue.


Solution

  • This is essentially a bug and has been fixed in Linux 5.15.78.

    This is what the commit log says:

        With just the forward declaration of the 'struct pt_regs' in
        syscall_wrapper.h, the syscall stub functions:
        
          __[x64|ia32]_sys_*(struct pt_regs *regs)
        
        will have different definition of 'regs' argument in BTF data
        based on which object file they are defined in.
        
        If the syscall's object includes 'struct pt_regs' definition,
        the BTF argument data will point to a 'struct pt_regs' record,
        like:
        
          [226] STRUCT 'pt_regs' size=168 vlen=21
                 'r15' type_id=1 bits_offset=0
                 'r14' type_id=1 bits_offset=64
                 'r13' type_id=1 bits_offset=128
          ...
        
        If not, it will point to a fwd declaration record:
        
          [15439] FWD 'pt_regs' fwd_kind=struct
        
        and make bpf tracing program hooking on those functions unable
        to access fields from 'struct pt_regs'.
        
        Include asm/ptrace.h directly in syscall_wrapper.h to make sure all
        syscalls see 'struct pt_regs' definition. This then results in BTF for
        '__*_sys_*(struct pt_regs *regs)' functions to point to the actual
        struct, not just the forward declaration.
    

    Replacing the forward declaration struct pt_regs; in syscall_wrapper.h with #include <asm/ptrace.h>, which provides the actual definition, fixes the issue.
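
The distinction the commit log draws can be reproduced in ordinary C. The following is a small sketch, with struct demo_regs as a hypothetical stand-in for struct pt_regs: a function can take a pointer to a struct that is only forward-declared, but it cannot see any fields until the full definition is in scope. An object file compiled in the first state has nothing but the struct's name to describe in its type information.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pt_regs: forward declaration only. */
struct demo_regs;

/*
 * This compiles: a pointer to an incomplete type is itself a complete type.
 * Writing regs->di here would be a compile error, because the layout of
 * struct demo_regs is not known at this point in the file.
 */
static int stub_sees_only_fwd(const struct demo_regs *regs)
{
  return regs != NULL;
}

/* From here on the full definition is visible, as after adding the #include. */
struct demo_regs {
  unsigned long si;
  unsigned long di;
};

/* With the definition in scope, field access is allowed. */
static unsigned long stub_sees_definition(const struct demo_regs *regs)
{
  assert(regs != NULL);
  return regs->di;
}
```

Both functions accept the same pointer; only the second one is compiled with knowledge of the struct's fields.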