
XDP program ipheader, data, nh_off confusion


I'm studying XDP code right now and I'm confused about how programs reach certain parts of a packet header. When I look at code that gets the IP protocol of a packet, it goes like this:

static inline int parse_ipv4(void *data, u64 nh_off, void *data_end) {
    struct iphdr *iph = data + nh_off;

    if ((void*)&iph[1] > data_end)
        return 0;
    return iph->protocol;
}

Now, here are the things that confuse me:

  1. In struct iphdr *iph = data + nh_off; I thought nh_off was the offset to the next header, so shouldn't data + nh_off take you to the next packet? To my understanding, adding the next-header offset to data should land on the next packet waiting to be processed.

  2. What does

    (void*)&iph[1]

    do exactly? I have spent a few days trying to work out what this line does, but I still have no clue.

I am sorry if my questions are too absurd or vague. These things have been bothering me for a while, and I would greatly appreciate it if someone could share their knowledge with me. Thank you so much in advance.


Solution

  • It all depends on your code, since I don't see how nh_off is defined in your case. But most of the time, it does point to the next header, so we would have:

    1. nh_off is the offset of the next header once the Ethernet header has been parsed, i.e. nh_off is the offset of the IP header in the packet (typically, it's set to 14 at this stage, the number of bytes in the Ethernet header if no VLAN/encap is used).

      Setting struct iphdr *iph = data + nh_off; declares and initialises iph as a struct iphdr pointer, so we can reuse it afterwards to easily reach each field from the IPv4 header. It points to data + nh_off, i.e. the beginning of the packet plus the offset at which the IPv4 header begins in the packet.

      The next packet to be processed is not accessible from within your eBPF program. When that packet arrives, the BPF program is called again with a new ctx whose data pointer points to it; within one call you only ever see a single packet.

    2. So iph points to the beginning of your IPv4 header. We can use that pointer to easily reach the individual fields (e.g. iph->protocol to get the L4 protocol). But before we do that, we must ensure that the packet is long enough to actually contain those fields. Otherwise we could do an out-of-bounds access (which is why the verifier would reject a program without this check in the first place). This is the check we do here: if ((void*)&iph[1] > data_end) return 0;

      In that check, (void*)&iph[1] means: i) Treat iph as a pointer to the first element of an array of struct iphdr. Indexing binds tighter than &, so &iph[1] is &(iph[1]), not (&iph)[1]. ii) Take the address of the second element of that array, i.e. iph + 1, i.e. the address of the byte that starts right after the first struct iphdr in the packet. Only an address is computed here; nothing is read from the packet. And iii) cast it to void * so we can compare it with data_end. In other words, this is a way to compare data_end (the address in memory right after the last byte of the packet) with the address of the byte right after the IPv4 header (so possibly the first byte of L4, if the packet is long enough). If (void*)&iph[1] is bigger than data_end, then the IPv4 header we considered extends past the end of the actual packet we got, and we cannot afford to dereference iph to try to reach e.g. the protocol field.
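The pointer arithmetic behind &iph[1] can be checked in ordinary user-space C. The following is only a minimal sketch: struct ipv4_hdr is a hypothetical stand-in that mirrors the 20-byte layout of the kernel's struct iphdr (no IP options), and the two helper functions are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct iphdr:
 * 20 bytes, i.e. an IPv4 header without options. */
struct ipv4_hdr {
    uint8_t  ver_ihl;
    uint8_t  tos;
    uint16_t tot_len;
    uint16_t id;
    uint16_t frag_off;
    uint8_t  ttl;
    uint8_t  protocol;
    uint16_t check;
    uint32_t saddr;
    uint32_t daddr;
};

static_assert(sizeof(struct ipv4_hdr) == 20,
              "stand-in header must be 20 bytes");

/* Distance in bytes between iph and &iph[1]. Only addresses are
 * computed; iph[1] itself is never read. */
static size_t bytes_to_next(const struct ipv4_hdr *iph)
{
    return (size_t)((const char *)&iph[1] - (const char *)iph);
}

/* Returns 1 if &iph[1] and iph + 1 denote the same address. */
static int same_as_plus_one(const struct ipv4_hdr *iph)
{
    return (const void *)&iph[1] == (const void *)(iph + 1);
}
```

Called on any valid header pointer, bytes_to_next() yields sizeof(struct ipv4_hdr), which is why comparing (void*)&iph[1] with data_end amounts to asking whether a whole header fits before the end of the packet.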

    With a diagram, maybe:

    Packet data
    
    | Ethernet     | IPv4               | IPv4 data (e.g. L4, data)       |
    +--------------+--------------------+------ ... ----------------------+
    ^              ^                    ^                                 ^
    data           data + nh_off        |                                 data_end
                   iph                  |
                   &iph[0]              &iph[1]
    

    We would have a problem accessing iph->protocol if we had the following instead (this is why we return 0 when the comparison is true):

    Packet data
    
    | Ethernet     | <something>   | End of packet
    +--------------+----------------    +
    ^              ^               ^    ^
    data           data + nh_off   |    |
                   iph             |    |
                   &iph[0]         |    &iph[1]
                                   data_end
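Both situations from the diagrams can be reproduced in user space. This is a sketch under stated assumptions, not an XDP program: struct ipv4_hdr is a hypothetical byte-only stand-in for struct iphdr (so it can be overlaid on a plain byte buffer without alignment concerns), parse_ipv4_sketch() copies the logic from the question, and parse_with_len() is a made-up helper that fakes data and data_end over a local buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN 14  /* Ethernet header length without VLAN tags */

/* Hypothetical byte-only stand-in for struct iphdr: 20 bytes,
 * alignment 1, so it can sit at any offset in a byte buffer. */
struct ipv4_hdr {
    uint8_t ver_ihl;
    uint8_t tos;
    uint8_t tot_len[2];
    uint8_t id[2];
    uint8_t frag_off[2];
    uint8_t ttl;
    uint8_t protocol;
    uint8_t check[2];
    uint8_t saddr[4];
    uint8_t daddr[4];
};

static_assert(sizeof(struct ipv4_hdr) == 20,
              "stand-in header must be 20 bytes");

/* Same logic as parse_ipv4() in the question: return the L4 protocol,
 * or 0 if a full IPv4 header does not fit before data_end. */
static int parse_ipv4_sketch(void *data, uint64_t nh_off, void *data_end)
{
    struct ipv4_hdr *iph =
        (struct ipv4_hdr *)((unsigned char *)data + nh_off);

    if ((void *)&iph[1] > data_end)
        return 0;
    return iph->protocol;
}

/* Hypothetical helper: pretend the packet is pkt_len bytes long
 * (pkt_len must be <= 64) and run the parser on it. The backing
 * buffer is always 64 bytes, so every computed address stays inside it. */
static int parse_with_len(size_t pkt_len, uint8_t proto)
{
    unsigned char buf[64] = {0};

    buf[ETH_HDR_LEN + 9] = proto;  /* protocol is byte 9 of the IPv4 header */
    return parse_ipv4_sketch(buf, ETH_HDR_LEN, buf + pkt_len);
}
```

With a 34-byte packet (14 bytes of Ethernet plus 20 bytes of IPv4), &iph[1] equals data_end and the protocol is returned, as in the first diagram. With a 33-byte packet, &iph[1] lies past data_end and the function returns 0 without touching iph->protocol, as in the second diagram.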