c linux linux-kernel kprobe

Confused about how the sys_stat and sys_statfs syscalls work


I'm trying to set a kprobe on the stat syscall to capture some information. /proc/kallsyms lists many similar names, and it is confusing which one is the right one for me to use.

I'm trying to find the right syscall used to get the data shown in output of commands like stat filename or stat dir.

First I tried __x64_sys_stat, but my handlers were not called. Then I tried __do_sys_stat and __x64_sys_newstat, but none of the handlers was called for those either.

Below is output of /proc/kallsyms,

user@xubun2204:~$ cat /proc/kallsyms | grep newstat
0000000000000000 t __do_sys_newstat
0000000000000000 T __x64_sys_newstat
0000000000000000 T __ia32_sys_newstat
0000000000000000 t __do_compat_sys_newstat
0000000000000000 T __ia32_compat_sys_newstat
0000000000000000 T __x64_compat_sys_newstat
0000000000000000 d event_exit__newstat
0000000000000000 d event_enter__newstat
0000000000000000 d __syscall_meta__newstat
0000000000000000 d args__newstat
0000000000000000 d types__newstat
0000000000000000 d __event_exit__newstat
0000000000000000 d __event_enter__newstat
0000000000000000 d __p_syscall_meta__newstat
0000000000000000 d _eil_addr___x64_compat_sys_newstat
0000000000000000 d _eil_addr___ia32_compat_sys_newstat
0000000000000000 d _eil_addr___ia32_sys_newstat
0000000000000000 d _eil_addr___x64_sys_newstat
user@xubun2204:~$ cat /proc/kallsyms | grep do_stat
0000000000000000 T proc_do_static_key
0000000000000000 T do_statx
0000000000000000 t do_statfs_native
0000000000000000 t do_statfs64
user@xubun2204:~$ cat /proc/kallsyms | grep sys_stat
0000000000000000 t __do_sys_stat
0000000000000000 T __x64_sys_stat
0000000000000000 T __ia32_sys_stat
0000000000000000 T __x64_sys_statx
0000000000000000 T __ia32_sys_statx
0000000000000000 t __do_sys_statfs
0000000000000000 T __x64_sys_statfs
0000000000000000 T __ia32_sys_statfs
0000000000000000 t __do_sys_statfs64
0000000000000000 T __x64_sys_statfs64
0000000000000000 T __ia32_sys_statfs64
0000000000000000 t __do_compat_sys_statfs
0000000000000000 T __ia32_compat_sys_statfs
0000000000000000 T __x64_compat_sys_statfs
0000000000000000 T kcompat_sys_statfs64
0000000000000000 T __ia32_compat_sys_statfs64
0000000000000000 T __x64_compat_sys_statfs64
0000000000000000 d _eil_addr___ia32_sys_statx
0000000000000000 d _eil_addr___x64_sys_statx
0000000000000000 d _eil_addr___ia32_sys_stat
0000000000000000 d _eil_addr___x64_sys_stat
0000000000000000 d _eil_addr___x64_compat_sys_statfs64
0000000000000000 d _eil_addr___ia32_compat_sys_statfs64
0000000000000000 d _eil_addr___x64_compat_sys_statfs
0000000000000000 d _eil_addr___ia32_compat_sys_statfs
0000000000000000 d _eil_addr___ia32_sys_statfs64
0000000000000000 d _eil_addr___x64_sys_statfs64
0000000000000000 d _eil_addr___ia32_sys_statfs
0000000000000000 d _eil_addr___x64_sys_statfs

Then I tried __x64_sys_statfs and now my handlers are getting called!

As I understand it, sys_stat returns information about files and directories, while sys_statfs returns information about a filesystem.

But in this case, no matter what I throw at the stat command line (a filesystem, a specific file, or a directory), __x64_sys_statfs is the one getting called!

The problem starts when I want to read what is being returned by this syscall, should I expect struct kstat or struct statfs?

Here is the code with which I managed to trigger the kretprobe handler:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kprobes.h>
#include <linux/statfs.h>
#include <linux/slab.h>
#include <linux/fs.h>

static struct kretprobe my_kretprobe;

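/* Runs on entry to the probed function; returning 0 keeps the return handler armed. */
static int entry_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
    printk(KERN_INFO "entry_handler\n");
    return 0;
}

/* Runs when the probed function returns. */
static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
    printk(KERN_INFO "ret_handler\n");
    return 0;
}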

static int __init my_module_init(void)
{
    int ret;

    my_kretprobe.kp.symbol_name = "__x64_sys_statfs";
    my_kretprobe.entry_handler = entry_handler;
    my_kretprobe.handler = ret_handler;
    my_kretprobe.maxactive = 20;

    ret = register_kretprobe(&my_kretprobe);
    if (ret < 0) {
        printk(KERN_INFO "register_kretprobe failed, returned %d\n", ret);
        return ret;
    }

    printk(KERN_INFO "Kretprobe registered for __x64_sys_statfs\n");
    return 0;
}

static void __exit my_module_exit(void)
{
    unregister_kretprobe(&my_kretprobe);
    printk(KERN_INFO "Kretprobe unregistered\n");
}

module_init(my_module_init);
module_exit(my_module_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("jelal");
MODULE_DESCRIPTION("simple kret lkm");

and the Makefile,

obj-m += statdata.o

all:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

I'm experimenting and do not have a final product in mind; the only goal is to be able to access, from the kernel, the data shown by the stat terminal command.

  • What's the difference between sys_stat, sys_statfs, sys_newstat and sys_statx?
  • How can I know which syscall is the right one to install a probe on?
  • Why does sys_stat seem not to be used?

Thanks


Solution

  • When you use the command-line stat utility, it depends on the version of the utility whether it uses:

    • sys_stat,
    • open, then sys_fstat,
    • sys_lstat,
    • opendir, then sys_newfstatat
    • sys_statx - a new stat API with flags about what information to retrieve and what information was retrieved.

    These all return much the same information about the named object (statx uses a different structure), for slightly different programmatic inputs, which the command-line tool does not expose.

    In your case it looks like statx is linked into the symbol fixups for the application, so likely this is the API actually being used.

    In many cases on older distributions the stat application will first call lstat, or sys_newfstatat with the AT_SYMLINK_NOFOLLOW flag set, because the stat command wants to report that it saw the link. Having determined that the named resource is not a link, it has no need to call stat again, though it might do so if it were a link.

    ltrace and strace can be useful tools to see which library and system calls are actually being made by an application.

    Generally, the dynamic library call is a thin wrapper around the system call. For the stat family, however, glibc has historically exported entry points with an x in the name (for example __xstat, __lxstat and __fxstat) that take an extra argument specifying the version of the stat structure that the application expects. The library implementation can choose to translate the syscall result to earlier versions of the structure, or fail the call without making the syscall if the version is not recognised. However, I don't think this versioning feature was ever really used in anger. Library call names also often have 64 appended (for example stat64), to support 32-bit programs that used the 64-bit interface in the transitional period and that are now recompiled as 64-bit programs.

    Those are all the stat functions.

    Additionally, the OP reminds me that they asked about statfs. statfs returns some "interesting" information about the file system that is serving that file, but not about the file itself.

    The command-line stat program has an option --file-system to show this information, which is why you can see a binding to the statfs library call, but it doesn't make the call or display the info by default.
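
    To make the stat versus statfs distinction concrete, here is a minimal userspace sketch. The helper names fs_type_of and inode_of are invented for this example. The idea is that two different paths on the same file system have different inode numbers according to stat, but the same file system type according to statfs.

```c
#include <sys/stat.h>   /* stat(), struct stat */
#include <sys/vfs.h>    /* statfs(), struct statfs (Linux-specific) */

/*
 * Hypothetical helper: file system type magic number of the file system
 * that serves `path`, or -1 if statfs() fails.
 */
long fs_type_of(const char *path)
{
    struct statfs fs;

    if (statfs(path, &fs) != 0)
        return -1;
    return (long)fs.f_type;
}

/*
 * Hypothetical helper: inode number of the object named by `path`,
 * or 0 if stat() fails.
 */
unsigned long long inode_of(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;
    return (unsigned long long)st.st_ino;
}
```

    For instance, /proc and /proc/version would be expected to give the same fs_type_of result but different inode_of results, because statfs describes the file system while stat describes the individual object.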