I'm learning eBPF and experimenting with it while following the docs, but there's one thing I can't get to work.
I have this very simple program that exits with status 5:
#include <stdlib.h>

int main() {
    exit(5);
    return 0;
}
The exit function in the code above calls the exit_group syscall, as we can see with strace (image below). Yet in my Python code, which uses eBPF through bcc, bpf_trace_printk prints the value 208682672 and not the value 5 that exit_group is called with.
from bcc import BPF

def main():
    bpftext = """
#include <uapi/linux/ptrace.h>

void my_exit(struct pt_regs *ctx, int status){
    bpf_trace_printk("%d", status);
}
"""
    bpf = BPF(text=bpftext)
    fname = bpf.get_syscall_fnname('exit_group')
    bpf.attach_kprobe(event=fname, fn_name='my_exit')
    while True:
        print(bpf.trace_fields())

if __name__ == '__main__':
    main()
I've been investigating this for a few days and have read everything I could find online, but I haven't found a solution.
I'd truly appreciate any help. Thank you!
You need to rename your function from my_exit to syscall__exit_group.
Why does this matter? BPF programs named in this way get special handling from BCC. Here's what the documentation says:
8. system call tracepoints
Syntax: syscall__SYSCALLNAME
syscall__ is a special prefix that creates a kprobe for the system call name provided as the remainder. You can use it by declaring a normal C function, then using the Python BPF.get_syscall_fnname(SYSCALLNAME) and BPF.attach_kprobe() to associate it.
Arguments are specified on the function declaration: syscall__SYSCALLNAME(struct pt_regs *ctx, [, argument1 ...]).
For example:
int syscall__execve(struct pt_regs *ctx, const char __user *filename, const char __user *const __user *__argv, const char __user *const __user *__envp) { [...] }
This instruments the execve system call.
Applied to your script, the rename looks like this:
from bcc import BPF

def main():
    bpftext = """
#include <uapi/linux/ptrace.h>

void syscall__exit_group(struct pt_regs *ctx, int status){
    bpf_trace_printk("%d", status);
}
"""
    bpf = BPF(text=bpftext)
    fname = bpf.get_syscall_fnname('exit_group')
    bpf.attach_kprobe(event=fname, fn_name='syscall__exit_group')
    while True:
        print(bpf.trace_fields())

if __name__ == '__main__':
    main()
Output from the sample program exiting:
(b'<...>', 14896, 0, b'd...1', 3996.079261, b'5')
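If you want the status as an integer rather than a raw tuple, the fields can be unpacked. This is a minimal sketch, assuming the tuple order (task, pid, cpu, flags, timestamp, message) that trace_fields() produced above; parse_exit_status is a hypothetical helper name, and the sample tuple is copied from the output line rather than read from a live trace.

```python
def parse_exit_status(fields):
    """Return (pid, status) from a tuple shaped like bpf.trace_fields() output."""
    task, pid, cpu, flags, ts, msg = fields
    # msg is the bytes payload written by bpf_trace_printk("%d", status)
    return pid, int(msg.decode())

# Sample tuple copied from the output shown above (not a live trace).
sample = (b'<...>', 14896, 0, b'd...1', 3996.079261, b'5')
pid, status = parse_exit_status(sample)
print(f"pid {pid} called exit_group({status})")
```

In the tracing loop, the same helper could be called on each value returned by bpf.trace_fields().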
The name matters because BCC rewrites your BPF program before compiling it, and the rewrite interprets the function's arguments differently when the name starts with syscall__. To see how your code is transformed, import bcc and pass the preprocessor debug flag: bpf = BPF(text=bpftext, debug=bcc.DEBUG_PREPROCESSOR).
Here's what happens without the syscall__ prefix:
void my_exit(struct pt_regs *ctx){
    int status = ctx->di;
    ({ char _fmt[] = "%d"; bpf_trace_printk_(_fmt, sizeof(_fmt), status); });
}
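This reads the RDI register and interprets its value directly as the syscall argument. On x86_64 that register holds a pointer rather than the status, which is presumably why you see an arbitrary-looking number such as 208682672 instead of 5.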
On the other hand, here's what happens if it's named syscall__exit_group:
void syscall__exit_group(struct pt_regs *ctx){
#if defined(CONFIG_ARCH_HAS_SYSCALL_WRAPPER) && !defined(__s390x__)
    struct pt_regs * __ctx = ctx->di;
    int status; bpf_probe_read(&status, sizeof(status), &__ctx->di);
#else
    int status = ctx->di;
#endif
    ({ char _fmt[] = "%d"; bpf_trace_printk_(_fmt, sizeof(_fmt), status); });
}
If CONFIG_ARCH_HAS_SYSCALL_WRAPPER is defined (it is on x86_64), the RDI register is interpreted as a pointer to a second struct pt_regs, and the RDI value stored in that struct is read with bpf_probe_read(). That inner value is the first argument to exit_group().
On systems without syscall wrappers, this does the same thing as the previous example.
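The difference between the two reads can be imitated in user space with ctypes. This is only a toy model, not kernel or BCC code: PtRegs is a hypothetical one-field stand-in for struct pt_regs, and read_naive and read_wrapped are illustrative names for the two generated code paths shown above.

```python
import ctypes

class PtRegs(ctypes.Structure):
    # Toy stand-in for struct pt_regs: only the di (RDI) field is modeled.
    _fields_ = [("di", ctypes.c_uint64)]

def read_naive(ctx):
    # Like my_exit: treat ctx->di itself as the int argument (truncated to 32 bits).
    return ctypes.c_int32(ctx.di & 0xFFFFFFFF).value

def read_wrapped(ctx):
    # Like syscall__exit_group with syscall wrappers:
    # ctx->di is a pointer to a second pt_regs; read di from that one.
    inner = PtRegs.from_address(ctx.di)
    return ctypes.c_int32(inner.di & 0xFFFFFFFF).value

# The inner struct holds the real syscall argument, 5.
user_regs = PtRegs(di=5)
# The struct the probe sees holds only a pointer to the inner struct.
kprobe_regs = PtRegs(di=ctypes.addressof(user_regs))

print("naive read:  ", read_naive(kprobe_regs))    # some address-derived number
print("wrapped read:", read_wrapped(kprobe_regs))  # the real argument
```

The naive read prints whatever the low 32 bits of the address happen to be, which mirrors the unexpected value in the question, while the wrapped read follows the pointer and recovers the argument.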