Depending on how the syscall is defined in /sys/kernel/btf/vmlinux, reading the struct pt_regs *regs parameter in fentry/fexit programs fails with an invalid bpf_context access error:
SEC("fentry/__x64_sys_recvfrom")
int BPF_PROG(fentry_syscall, struct pt_regs *regs) {
    struct event t;

    bpf_get_current_comm(t.comm, TASK_COMM_LEN);
    u64 id = bpf_get_current_pid_tgid();
    t.pid = id >> 32;
    // This causes an error in some environments.
    t.fd = PT_REGS_PARM1_CORE(regs);
    bpf_printk("comm: %s, pid: %d, fd: %d", t.comm, t.pid, t.fd);
    return 0;
}
$ sudo ./output
2022/07/01 03:33:01 loading objects: field FentrySyscall: program fentry_syscall: load program: permission denied:
arg#0 type is not a struct
Unrecognized arg#0 type PTR
; int BPF_PROG(fentry_syscall, struct pt_regs *regs) {
0: (79) r6 = *(u64 *)(r1 +0)
func '__x64_sys_recvfrom' arg0 type FWD is not a struct
invalid bpf_context access off=0 size=8
processed 1 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
This seems to occur when the parameter's type is recorded in BTF as a FWD (forward declaration) rather than a full struct:
$ bpftool btf dump file /sys/kernel/btf/vmlinux format raw
...
[13362] FWD 'pt_regs' fwd_kind=struct
[13363] CONST '(anon)' type_id=13362
[13364] PTR '(anon)' type_id=13363
[13365] FUNC_PROTO '(anon)' ret_type_id=36 vlen=1
'__unused' type_id=13364
...
[13608] FUNC '__x64_sys_recvmsg' type_id=13365 linkage=static
...
Meanwhile, syscalls/environments with no errors have a concrete type definition like:
$ bpftool btf dump file /sys/kernel/btf/vmlinux format raw
[1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
...
[226] STRUCT 'pt_regs' size=168 vlen=21
'r15' type_id=1 bits_offset=0
'r14' type_id=1 bits_offset=64
'r13' type_id=1 bits_offset=128
'r12' type_id=1 bits_offset=192
'bp' type_id=1 bits_offset=256
'bx' type_id=1 bits_offset=320
'r11' type_id=1 bits_offset=384
'r10' type_id=1 bits_offset=448
'r9' type_id=1 bits_offset=512
'r8' type_id=1 bits_offset=576
'ax' type_id=1 bits_offset=640
'cx' type_id=1 bits_offset=704
'dx' type_id=1 bits_offset=768
'si' type_id=1 bits_offset=832
'di' type_id=1 bits_offset=896
'orig_ax' type_id=1 bits_offset=960
'ip' type_id=1 bits_offset=1024
'cs' type_id=1 bits_offset=1088
'flags' type_id=1 bits_offset=1152
'sp' type_id=1 bits_offset=1216
'ss' type_id=1 bits_offset=1280
...
[5183] CONST '(anon)' type_id=226
...
[5189] PTR '(anon)' type_id=5183
...
[5321] FUNC_PROTO '(anon)' ret_type_id=42 vlen=1
'__unused' type_id=5189
...
[17648] FUNC '__x64_sys_recvmsg' type_id=5321 linkage=static
...
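The comparison between the two dumps can be automated by following the type chain FUNC -> FUNC_PROTO -> first parameter -> PTR/CONST -> FWD or STRUCT. Below is a minimal sketch in plain C, assuming the raw `bpftool btf dump` text has already been read into a string. The helper names (`field_on_line`, `find_record`, `first_arg_kind`) are hypothetical, and the code does simple text matching on the output format shown above; it does not use libbpf's BTF API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse the number that follows `key` on the line starting at `line`.
 * Returns -1 if `key` does not occur before the end of that line. */
static long field_on_line(const char *line, const char *key)
{
    const char *end = strchr(line, '\n');
    const char *hit = strstr(line, key);

    if (!hit || (end && hit > end))
        return -1;
    return strtol(hit + strlen(key), NULL, 10);
}

/* Find the record whose line starts with "[id] "; NULL if absent. */
static const char *find_record(const char *dump, long id)
{
    char needle[32];
    const char *p = dump;

    snprintf(needle, sizeof(needle), "[%ld] ", id);
    while ((p = strstr(p, needle)) != NULL) {
        if (p == dump || p[-1] == '\n')
            return p;
        p++;
    }
    return NULL;
}

/* Store the BTF kind behind the first argument of `func` in `kind`
 * (caller provides at least 32 bytes), e.g. "FWD" or "STRUCT".
 * Returns 0 on success, -1 if the chain cannot be followed. */
static int first_arg_kind(const char *dump, const char *func, char *kind)
{
    char needle[128];
    const char *rec, *nl;
    long id;
    int hops;

    assert(dump && func && kind);

    /* FUNC record -> id of its FUNC_PROTO */
    snprintf(needle, sizeof(needle), " FUNC '%s' ", func);
    rec = strstr(dump, needle);
    if (!rec || (id = field_on_line(rec, " type_id=")) < 0)
        return -1;

    /* FUNC_PROTO record -> the next line describes the first parameter */
    rec = find_record(dump, id);
    if (!rec || !(nl = strchr(rec, '\n')) || nl[1] == '[' || nl[1] == '\0')
        return -1;
    id = field_on_line(nl + 1, " type_id=");

    /* Skip pointer/qualifier/typedef records until a concrete kind */
    for (hops = 0; hops < 16 && id > 0; hops++) {
        rec = find_record(dump, id);
        if (!rec || sscanf(rec, "[%*d] %31s", kind) != 1)
            return -1;
        if (strcmp(kind, "PTR") && strcmp(kind, "CONST") &&
            strcmp(kind, "VOLATILE") && strcmp(kind, "TYPEDEF"))
            return 0;
        id = field_on_line(rec, " type_id=");
    }
    return -1;
}
```

Calling `first_arg_kind(dump, "__x64_sys_recvmsg", kind)` on the first dump above should leave "FWD" in `kind`, and "STRUCT" on the second. Because the `...` lines elide records, the sketch only works if every record in the chain is present in the text it is given.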
I've tested on several distributions and found that how regs is defined depends on the distribution/kernel/syscall combination. Why is this so complicated? How can I avoid this error and make my eBPF program run on any recent Linux environment?
I've created a GitHub repo for this issue.
This is essentially a bug and has been fixed in Linux 5.15.78.
This is what the commit log says:
With just the forward declaration of the 'struct pt_regs' in
syscall_wrapper.h, the syscall stub functions:
__[x64|ia32]_sys_*(struct pt_regs *regs)
will have different definition of 'regs' argument in BTF data
based on which object file they are defined in.
If the syscall's object includes 'struct pt_regs' definition,
the BTF argument data will point to a 'struct pt_regs' record,
like:
[226] STRUCT 'pt_regs' size=168 vlen=21
'r15' type_id=1 bits_offset=0
'r14' type_id=1 bits_offset=64
'r13' type_id=1 bits_offset=128
...
If not, it will point to a fwd declaration record:
[15439] FWD 'pt_regs' fwd_kind=struct
and make bpf tracing program hooking on those functions unable
to access fields from 'struct pt_regs'.
Include asm/ptrace.h directly in syscall_wrapper.h to make sure all
syscalls see 'struct pt_regs' definition. This then results in BTF for
'__*_sys_*(struct pt_regs *regs)' functions to point to the actual
struct, not just the forward declaration.
Replacing the forward declaration struct pt_regs; in syscall_wrapper.h with #include <asm/ptrace.h>, which provides the actual definition, fixes the issue.