eBPF kprobe argument does not match the syscall argument

I'm learning eBPF and experimenting with it while following the docs, but there's something I can't get to work.

I have this very simple program that exits with status 5.

#include <stdlib.h>  /* declares exit() */

int main() {
   exit(5);
   return 0;  /* never reached */
}

The exit function in the code above calls the exit_group syscall, as strace shows. Yet in my Python code, which uses eBPF through bcc, bpf_trace_printk prints the value 208682672 instead of the value 5 that exit_group is called with.

from bcc import BPF

def main():
    bpftext = """
    #include <uapi/linux/ptrace.h>

    void my_exit(struct pt_regs *ctx, int status){
        bpf_trace_printk("%d", status);
    }
    """

    bpf = BPF(text=bpftext)
    fname = bpf.get_syscall_fnname('exit_group')
    bpf.attach_kprobe(event=fname, fn_name='my_exit')

    while True:
        print(bpf.trace_fields())


if __name__ == '__main__':
    main()

I've searched online and investigated this problem for a few days now, but I couldn't find a solution.

I'd truly appreciate any help. Thank you!

Solution

Fix

You need to rename your function from my_exit to syscall__exit_group.

Why does this matter? BPF programs named in this way get special handling from BCC. Here's what the documentation says:

8. system call tracepoints

Syntax: syscall__SYSCALLNAME

syscall__ is a special prefix that creates a kprobe for the system call name provided as the remainder. You can use it by declaring a normal C function, then using the Python BPF.get_syscall_fnname(SYSCALLNAME) and BPF.attach_kprobe() to associate it.

Arguments are specified on the function declaration: syscall__SYSCALLNAME(struct pt_regs *ctx, [, argument1 ...]).

For example:
int syscall__execve(struct pt_regs *ctx,
    const char __user *filename,
    const char __user *const __user *__argv,
    const char __user *const __user *__envp)
{
    [...]
}
This instruments the execve system call.

Source: BCC reference guide.
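As a rough illustration of the naming convention quoted above, the sketch below builds a handler name and matching BPF text for a given syscall. The helper make_syscall_probe is hypothetical (it is not part of BCC), and it only generates strings, so it runs without BCC installed.

```python
def make_syscall_probe(syscall, c_args, body):
    """Return (fn_name, bpf_text) for a handler named syscall__SYSCALLNAME.

    Hypothetical helper for illustration only; not part of BCC.
    """
    fn_name = "syscall__" + syscall
    # The first parameter is always the pt_regs context; syscall arguments follow.
    params = ", ".join(["struct pt_regs *ctx"] + list(c_args))
    bpf_text = (
        "#include <uapi/linux/ptrace.h>\n\n"
        f"int {fn_name}({params}) {{\n"
        f"    {body}\n"
        "    return 0;\n"
        "}\n"
    )
    return fn_name, bpf_text


fn_name, bpf_text = make_syscall_probe(
    "exit_group", ["int status"], 'bpf_trace_printk("%d", status);'
)
print(fn_name)
print(bpf_text)
```

The returned fn_name would be what you pass as fn_name to attach_kprobe(), and bpf_text what you pass to BPF(text=...).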

Corrected Code

from bcc import BPF

def main():
    bpftext = """
    #include <uapi/linux/ptrace.h>

    void syscall__exit_group(struct pt_regs *ctx, int status){
        bpf_trace_printk("%d", status);
    }
    """

    bpf = BPF(text=bpftext)
    fname = bpf.get_syscall_fnname('exit_group')
    bpf.attach_kprobe(event=fname, fn_name='syscall__exit_group')

    while True:
        print(bpf.trace_fields())


if __name__ == '__main__':
    main()

Output from the sample program exiting:

(b'<...>', 14896, 0, b'd...1', 3996.079261, b'5')
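If you want the status as a number rather than the raw tuple, a minimal sketch like the one below unpacks it. It assumes the tuple layout documented for trace_fields(), namely (task, pid, cpu, flags, ts, msg), and uses the tuple above as hard-coded sample data so it runs without BCC.

```python
# Sample tuple copied from the output above; in the real script it would
# come from bpf.trace_fields().
fields = (b'<...>', 14896, 0, b'd...1', 3996.079261, b'5')

# Assumed layout: (task, pid, cpu, flags, ts, msg).
task, pid, cpu, flags, ts, msg = fields

# msg is the bytes string produced by bpf_trace_printk("%d", status).
status = int(msg.decode())
print(f"pid {pid} called exit_group({status})")
```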

How it Works

BCC rewrites your BPF program before compiling it, and the syscall__ prefix changes how the function's arguments are read. You can use bpf = BPF(text=bpftext, debug=bcc.DEBUG_PREPROCESSOR) to see how your code is transformed (this needs import bcc in addition to from bcc import BPF).

Here's what happens without the syscall__ prefix:

void my_exit(struct pt_regs *ctx){
 int status = ctx->di;
        ({ char _fmt[] = "%d"; bpf_trace_printk_(_fmt, sizeof(_fmt), status); });
    }

This reads the RDI register and treats its value directly as the syscall's first argument.

On the other hand, here's what happens if it's named syscall__exit_group:

void syscall__exit_group(struct pt_regs *ctx){
#if defined(CONFIG_ARCH_HAS_SYSCALL_WRAPPER) && !defined(__s390x__)
 struct pt_regs * __ctx = ctx->di;
 int status; bpf_probe_read(&status, sizeof(status), &__ctx->di);
#else
 int status = ctx->di;
#endif

        ({ char _fmt[] = "%d"; bpf_trace_printk_(_fmt, sizeof(_fmt), status); });
    }

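If CONFIG_ARCH_HAS_SYSCALL_WRAPPER is defined (it is on x86_64), the RDI register is instead interpreted as a pointer to a second struct pt_regs, which holds the registers as they were at syscall entry. The RDI field of that inner struct is the first argument to exit_group(). Without the prefix, the pointer itself is truncated to an int and printed, which is where a value like 208682672 comes from.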

On systems without syscall wrappers, this does the same thing as the previous example.
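The two readings can be imitated in user space with ctypes. This is only a simulation under simplifying assumptions: the PtRegs class below is a hypothetical one-field stand-in for the kernel's struct pt_regs, and the function names are made up for illustration. It shows why reading ctx->di directly yields a truncated pointer while following it yields 5.

```python
import ctypes


class PtRegs(ctypes.Structure):
    # Stand-in with only the field this example needs; the real x86_64
    # struct pt_regs has many more registers.
    _fields_ = [("di", ctypes.c_uint64)]


# Inner registers: what user space passed, i.e. exit_group(5).
user_regs = PtRegs(di=5)

# Outer registers: what the kprobe on the syscall wrapper sees.
# Its di holds a pointer to the inner registers.
wrapper_regs = PtRegs(di=ctypes.addressof(user_regs))


def read_plain(ctx):
    """Mimic `int status = ctx->di;` (keep only the low 32 bits, signed)."""
    return ctypes.c_int32(ctx.di & 0xFFFFFFFF).value


def read_via_wrapper(ctx):
    """Mimic following ctx->di to the inner pt_regs and reading its di."""
    inner = PtRegs.from_address(ctx.di)
    return ctypes.c_int32(inner.di & 0xFFFFFFFF).value


plain = read_plain(wrapper_regs)
unwrapped = read_via_wrapper(wrapper_regs)
print("plain read:    ", plain)      # low 32 bits of an address, varies per run
print("via inner regs:", unwrapped)  # the actual argument
```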