
eBPF program load fails without verifier log


I am trying to write an eBPF program, but I am stuck. The documentation says that user-defined functions are valid and callable. However, even in this very simple example (which also uses a map), I get an error when attaching the program to the interface with tc, and the verifier emits nothing at all.

This is my example program. I have also included the necessary BPF helper definitions so that it compiles as is:

#include <linux/bpf.h>
#include <linux/pkt_cls.h>

#ifndef __BPF_HELPERS_H
#define __BPF_HELPERS_H

/* helper macro to place programs, maps, license in
 * different sections in elf_bpf file. Section names
 * are interpreted by elf_bpf loader
 */
#define SEC(NAME) __attribute__((section(NAME), used))

/* a helper structure used by eBPF C program
 * to describe map attributes to elf_bpf loader
 */
struct bpf_map_def {
    unsigned int type;
    unsigned int key_size;
    unsigned int value_size;
    unsigned int max_entries;
    unsigned int map_flags;
    unsigned int inner_map_idx;
    unsigned int numa_node;
};

/* helper functions called from eBPF programs written in C */
static void *(*bpf_map_lookup_elem)(void *map, const void *key) =
(void *) BPF_FUNC_map_lookup_elem;
static int (*bpf_clone_redirect)(void *ctx, int ifindex, int flags) =
(void *) BPF_FUNC_clone_redirect;

#endif

struct bpf_map_def SEC("maps") my_map = {
    .type        = BPF_MAP_TYPE_ARRAY,
    .key_size    = sizeof(__u32),
    .value_size  = sizeof(int),
    .max_entries = 256,
};

static int foo(int value) { return value == 1; }

SEC("action")
int filter_and_redirect(struct __sk_buff* skb)
{
    int key = 0;
    int *value = (int*)bpf_map_lookup_elem(&my_map, &key);
    if (!value) return TC_ACT_SHOT;

    if (foo(*value)) bpf_clone_redirect(skb, 1, 1);
    // if (*value == 1) bpf_clone_redirect(skb, 1, 1);
    return TC_ACT_SHOT;
}

SEC("classifier")
int cls_main(struct __sk_buff* skb) { return -1; }

I compile it with the following command, which is pretty standard:

clang -O2 -fno-inline-functions -emit-llvm -c example.c -o - | llc -march=bpf -filetype=obj -o example.o

I then load it onto the interface with:

sudo tc filter add dev foo0 parent ffff: bpf obj example.o sec classifier flowid ffff:1 action bpf obj example.o sec action ok

If I comment out the line with the function call and uncomment the next one, it works perfectly. With the original program, however, the load fails with:

Error fetching program/map!
bad action parsing
parse_action: bad value (6:bpf)!
Illegal "action"

There should be a log specifying exactly why the verifier considers my program invalid, but there is none, so I'm really stuck. Why am I not allowed to use the function? Thanks for any advice!

In case you'd like to take a look at the result of llvm-objdump -S command, you can find it here.

[EDIT]

According to the accepted answer, there seems to be a bug(?) in tc that rejects called functions which do not use any map. As a temporary workaround, I have changed the function in the example to:

struct bpf_map_def SEC("maps") dummy_map = {
    .type        = BPF_MAP_TYPE_ARRAY,
    .key_size    = sizeof(int),
    .value_size  = sizeof(int),
    .max_entries = 1,
};

int foo(int value) { 
    int key = 0;
    int *v = (int*)bpf_map_lookup_elem(&dummy_map, &key);

    return value == 1; 
}

Now it works, but at the cost of an additional, unneeded map operation, so I am looking forward to more news about this issue.


Solution

  • There is no log from the verifier, and there should not be one, because the verifier never gets a chance to check your program. The error you get comes from tc, which fails to shape the bytecode into a program for the kernel, so no attempt to load the program into the kernel occurs.

    You can check that with strace -e bpf tc filter ...: no bpf(BPF_PROG_LOAD, ...) call is made.

    It fails because of the function call, and because of what looks like a bug in tc.

    tc (part of iproute2) gained support for eBPF-to-eBPF function calls in commit b5cb33aec65c ("bpf: implement bpf to bpf calls support"). The commit log mentions that, in order to add support,

    First step is processing of map related relocation entries
    for .text section
    

    This translates into:

    @@ -2120,10 +2192,18 @@ static int bpf_fetch_prog_relo(struct bpf_elf_ctx *ctx, const char *section,
     static int bpf_fetch_prog_sec(struct bpf_elf_ctx *ctx, const char *section)
     {
        bool lderr = false, sseen = false;
    +   struct bpf_elf_prog prog;
        int ret = -1;
     
    -   if (bpf_has_map_data(ctx))
    -       ret = bpf_fetch_prog_relo(ctx, section, &lderr, &sseen);
    +   if (bpf_has_call_data(ctx)) {
    +       ret = bpf_fetch_prog_relo(ctx, ".text", &lderr, NULL,
    +                     &ctx->prog_text);
    +       if (ret < 0)
    +           return ret;
    +   }
    +
    +   if (bpf_has_map_data(ctx) || bpf_has_call_data(ctx))
    +       ret = bpf_fetch_prog_relo(ctx, section, &lderr, &sseen, &prog);
        if (ret < 0 && !lderr)
            ret = bpf_fetch_prog(ctx, section, &sseen);
        if (ret < 0 && !sseen)
    

    In other words, bpf_fetch_prog_relo() is called to perform relocation on the .text section, where your function foo is located. The purpose is to insert the file descriptor of the map that the function (foo) uses, if any. If any? Well, no: it seems to be called every time, even if the code in .text uses no map and needs no relocation. If looking up the relocation information then fails (which happens when there is nothing to relocate, because the dedicated relocation section is missing), the function returns an error, which is propagated upwards. Backtrace:

    bpf_fetch_prog_relo()
    bpf_fetch_prog_sec()
    bpf_obj_open()
    bpf_do_load()
    bpf_load_common()
    bpf_parse_and_load_common()
    bpf_parse_opt()
    parse_action()
    bpf_parse_opt()
    tc_filter_modify()
    do_filter()
    do_cmd()
    main()
    

    This eventually makes the tc command fail, with functions like parse_action() and a few others printing the error messages that you see. Again, nothing to do with the kernel verifier.
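The failure path described above can be sketched as a small user-space model. This is only an illustration under stated assumptions: struct toy_obj, toy_fetch_prog_relo() and toy_fetch_prog_sec() are hypothetical names invented here, not iproute2 types or functions, and the model keeps only the one behaviour discussed (the relocation step for .text runs whenever .text is present, and its failure aborts the load).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an ELF object: it only records
 * which sections have relocation entries. Not an iproute2 type. */
struct toy_obj {
    const char *relocated[4];   /* names of sections that have relocations */
    size_t n_relocated;
    bool has_text;              /* object contains code in .text */
};

/* Models the relocation step: it succeeds only if relocation data
 * exists for the requested section. */
static int toy_fetch_prog_relo(const struct toy_obj *obj, const char *section)
{
    for (size_t i = 0; i < obj->n_relocated; i++)
        if (strcmp(obj->relocated[i], section) == 0)
            return 0;
    return -1;  /* no relocation section found for `section` */
}

/* Models the behaviour described in the answer: when .text is present,
 * its relocation step runs unconditionally and a failure aborts the load. */
static int toy_fetch_prog_sec(const struct toy_obj *obj, const char *section)
{
    if (obj->has_text) {
        int ret = toy_fetch_prog_relo(obj, ".text");
        if (ret < 0)
            return ret;  /* propagated upwards; the kernel is never reached */
    }
    /* A real loader would go on to relocate and load `section` here. */
    (void)section;
    return 0;
}
```

In this model, an object whose only relocated section is "action" (the program from the question, where foo uses no map) fails as soon as .text is present, while the same object with a relocation recorded for ".text", or with no .text at all (foo inlined), gets past this step.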

    How to fix it? If I am correct and this is a bug in iproute2, it should be fixed upstream; I'll check with the author to see what he thinks of it. In the meantime, you could either patch iproute2 or find a way to use a map in the function you call. I made your program load successfully by using:

    static int foo(int value)
    {
        int key = 0;
        int *beep;
    
        beep = (int*)bpf_map_lookup_elem(&my_map, &key);
        if (!beep)
            return 0;
        return value == *beep;
    }
    

    (Thanks for the stand-alone reproducer by the way, much appreciated.) This example is not super useful, but it seems to confirm that the relocation is mandatory for .text.
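To check whether a given object file carries relocation data for .text at all, one option is to look for a relocation section whose sh_info field points at .text. The following is a minimal sketch, not iproute2 code: has_relocs_for() is a hypothetical helper, it assumes a 64-bit ELF image already read into memory with the same byte order as the host, and it assumes that matching a relocation section to its target through sh_info is close enough to what the loader does.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the ELF64 image in buf has a SHT_REL/SHT_RELA section whose
 * sh_info points at the section named `target`, 0 if it has none, and -1 if
 * the image is malformed or not ELF64. Hypothetical helper for illustration. */
static int has_relocs_for(const unsigned char *buf, size_t len, const char *target)
{
    Elf64_Ehdr eh;
    Elf64_Shdr sh, tgt, strs;

    if (len < sizeof(eh))
        return -1;
    memcpy(&eh, buf, sizeof(eh));
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shstrndx >= eh.e_shnum ||
        eh.e_shoff > len ||
        (len - eh.e_shoff) / sizeof(Elf64_Shdr) < eh.e_shnum)
        return -1;

    /* Section-name string table; require a terminating NUL so that the
     * strcmp() below cannot run past the end of the table. */
    memcpy(&strs, buf + eh.e_shoff + eh.e_shstrndx * sizeof(strs), sizeof(strs));
    if (strs.sh_size == 0 || strs.sh_offset > len ||
        strs.sh_size > len - strs.sh_offset ||
        buf[strs.sh_offset + strs.sh_size - 1] != '\0')
        return -1;

    for (unsigned int i = 0; i < eh.e_shnum; i++) {
        memcpy(&sh, buf + eh.e_shoff + i * sizeof(sh), sizeof(sh));
        if (sh.sh_type != SHT_REL && sh.sh_type != SHT_RELA)
            continue;
        /* For relocation sections, sh_info is the index of the section
         * the relocations apply to. */
        if (sh.sh_info >= eh.e_shnum)
            return -1;
        memcpy(&tgt, buf + eh.e_shoff + sh.sh_info * sizeof(tgt), sizeof(tgt));
        if (tgt.sh_name >= strs.sh_size)
            return -1;
        if (strcmp((const char *)buf + strs.sh_offset + tgt.sh_name, target) == 0)
            return 1;
    }
    return 0;
}
```

To use it, read example.o into a buffer and call has_relocs_for(buf, len, ".text"). A result of 0 for the original program and 1 for the map-using variant of foo would be consistent with the explanation above. readelf -r example.o lists the same relocation sections without any custom code.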