I'm studying XDP code right now and I'm confused about how programs access certain parts of a packet header. For example, when I look at the code that parses the IPv4 header of a packet, it goes like this:
```c
static inline int parse_ipv4(void *data, u64 nh_off, void *data_end) {
    struct iphdr *iph = data + nh_off;

    if ((void*)&iph[1] > data_end)
        return 0;
    return iph->protocol;
}
```
Here are the things that confuse me:

1. `struct iphdr *iph = data + nh_off;`
   I thought `nh_off` is the offset to the next header, so shouldn't `data + nh_off` take you to the next packet? As I understand it, if you add the next-header offset to `data`, there should be a next packet waiting to be processed!

2. What does `(void*)&iph[1]` do exactly? I've been trying to figure out what this line does for a few days, but I still have no clue.
I'm sorry if my questions are absurd or vague. These things have been bothering me for a while, and I would greatly appreciate it if someone could share their knowledge with me. Thank you so much in advance.
It all depends on your code, since I don't see how `nh_off` is defined in your case. But most of the time, it does point to the next header (the next header inside the same packet, not the next packet), so we would have:

- `nh_off` being the offset of the next header after the Ethernet header has been parsed, i.e. `nh_off` is the offset of the IP header in the packet (typically it is set to 14 at this stage, the number of bytes in the Ethernet header if no VLAN tag or other encapsulation is used).
- `struct iphdr *iph = data + nh_off;` declares and initialises `iph` as a `struct iphdr` pointer, so we can reuse it afterwards to easily reach each field of the IPv4 header. It points to `data + nh_off`, i.e. the beginning of the packet plus the offset at which the IPv4 header begins in the packet.

The next packet to be processed is not accessible from within your eBPF program. When that packet is processed, the BPF program is called again with a new `ctx` whose `data` pointer points to it; you only ever see one packet at a time.
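To make the role of `nh_off` concrete, here is a minimal userspace sketch of how it is commonly computed, assuming the packet sits in an ordinary byte buffer and that at most one 802.1Q VLAN tag may follow the Ethernet header. The helper `compute_nh_off` and the `struct vlan_tag` layout are hypothetical (the kernel's own `struct vlan_hdr` is not part of the userspace UAPI headers); `struct ethhdr` and `ETH_P_8021Q` come from `<linux/if_ether.h>`.

```c
#include <arpa/inet.h>      /* ntohs() */
#include <linux/if_ether.h> /* struct ethhdr, ETH_P_8021Q */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte VLAN tag layout (assumption: mirrors the kernel's
 * struct vlan_hdr, which is not exported to userspace). */
struct vlan_tag {
    uint16_t tci;         /* priority + VLAN ID */
    uint16_t encap_proto; /* EtherType of what follows the tag */
};

/* Hypothetical helper: returns the offset of the header that follows
 * Ethernet (and an optional single 802.1Q tag), or -1 if those headers
 * do not fit in [data, data_end). Other encapsulations are ignored. */
static long compute_nh_off(void *data, void *data_end)
{
    char *start = data;
    char *end = data_end;
    struct ethhdr *eth = data;
    size_t nh_off = sizeof(*eth);           /* 14 bytes (ETH_HLEN) */

    if (start + nh_off > end)
        return -1;                          /* not even a full Ethernet header */

    if (ntohs(eth->h_proto) == ETH_P_8021Q) {
        /* A VLAN tag sits between Ethernet and IP: skip it as well. */
        if (start + nh_off + sizeof(struct vlan_tag) > end)
            return -1;
        nh_off += sizeof(struct vlan_tag);  /* 18 bytes */
    }

    /* data + nh_off is still inside this packet's buffer: it is where
     * the network header starts, not where another packet starts. */
    return (long)nh_off;
}
```

For a 64-byte buffer whose EtherType bytes (offsets 12 and 13) are `0x08 0x00`, `compute_nh_off(pkt, pkt + 64)` returns 14; with `0x81 0x00` (802.1Q) it returns 18.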
So `iph` points to the beginning of your IPv4 header. We can use that pointer to easily reach the individual fields (e.g. `iph->protocol` to get the L4 protocol). But before we do that, we must ensure that the packet is long enough to actually contain those fields. Otherwise we could perform an out-of-bounds access, which is why the verifier would reject the program if this check were missing. This is the check we do here: `if ((void*)&iph[1] > data_end) return 0;`
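To see this check in action outside the kernel, here is a minimal userspace sketch that keeps the logic of `parse_ipv4` but runs it on an ordinary byte buffer. Assumptions: `u64` is replaced by `uint64_t`, pointer arithmetic is done on `char *` (arithmetic on `void *` is a GCC extension that BPF C code commonly relies on), and the name `parse_ipv4_user` is hypothetical. In a real XDP program, `data` and `data_end` come from `struct xdp_md`, and it is the verifier that insists on the check.

```c
#include <linux/ip.h> /* struct iphdr */
#include <stdint.h>

/* Same logic as parse_ipv4(), on a plain buffer. Returns the L4
 * protocol number, or 0 if the IPv4 header would extend past data_end. */
static int parse_ipv4_user(void *data, uint64_t nh_off, void *data_end)
{
    struct iphdr *iph = (struct iphdr *)((char *)data + nh_off);

    /* &iph[1] is the first byte after the fixed-size IPv4 header. */
    if ((void *)&iph[1] > data_end)
        return 0;          /* packet too short: do not dereference iph */

    return iph->protocol;  /* safe: the whole header is in bounds */
}
```

With a buffer where byte `14 + 9` (the `protocol` field, at offset 9 inside the IPv4 header) is 17, `parse_ipv4_user(pkt, 14, pkt + 34)` returns 17 (UDP), while `parse_ipv4_user(pkt, 14, pkt + 33)` returns 0 because the 20-byte header would end at offset 34.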
In that check, `(void*)&iph[1]` means: i) consider `iph` as a pointer to the first element of an array of `struct iphdr` (since `[]` binds more tightly than `&`, the expression is `&(iph[1])`, not `(&iph)[1]`); ii) take the second element of that array, `iph[1]`, and get its address, i.e. the address of the byte that starts right after the first `struct iphdr` in the packet (the same address as `iph + 1`, which is `sizeof(struct iphdr)`, 20 bytes, past `iph`); and iii) cast it to `void *` so that we can compare it with `data_end`, which is a `void *`. In other words, this is a way to compare `data_end` (the address in memory right after the last byte of the packet) with the address of the byte right after the IPv4 header (so possibly the first byte of the L4 header, if the packet is long enough and the IPv4 header carries no options). If `(void*)&iph[1]` is greater than `data_end`, then the IPv4 header we are considering would extend past the end of the packet we actually got, and we cannot afford to dereference `iph` to reach, e.g., the `protocol` field.
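The claim that `&iph[1]`, `iph + 1`, and "`sizeof(struct iphdr)` bytes after `iph`" are the same address can be checked with a small userspace sketch. The function name `end_of_iphdr_forms_agree` is hypothetical, and the sketch assumes the UAPI definition of `struct iphdr` from `<linux/ip.h>`, whose size is 20 bytes (the fixed header, without IP options).

```c
#include <linux/ip.h> /* struct iphdr */
#include <stdbool.h>
#include <stddef.h>

/* Returns true if three ways of writing "one struct iphdr past iph"
 * all yield the same address. iph must point into an array with at
 * least one element after *iph (or to its one-past-the-end position). */
static bool end_of_iphdr_forms_agree(struct iphdr *iph)
{
    void *a = &iph[1];                            /* address of the 2nd element */
    void *b = iph + 1;                            /* pointer arithmetic: +1 element */
    void *c = (char *)iph + sizeof(struct iphdr); /* +20 bytes, counted by hand */
    return a == b && b == c;
}
```

For any `struct iphdr hdrs[2]`, `end_of_iphdr_forms_agree(&hdrs[0])` returns true.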
With a diagram, maybe:
```
Packet data
| Ethernet     | IPv4               | IPv4 data (e.g. L4, data)       |
+--------------+--------------------+------ ... ----------------------+
^              ^                    ^                                 ^
data           data + nh_off        |                                 data_end
               iph                  |
               &iph[0]              &iph[1]
```
We would have a problem accessing `iph->protocol` if we had the following instead (this is why we return 0 when the comparison `&iph[1] > data_end` is true):
```
Packet data
| Ethernet     | <something>     | End of packet
+--------------+-----------------+
^              ^                 ^  ^
data           data + nh_off     |  |
               iph               |  |
               &iph[0]           |  &iph[1]
                                 data_end
```
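The two diagrams can also be read as byte offsets relative to `data`. This sketch assumes an untagged Ethernet frame (`nh_off` = 14) and uses a hypothetical helper `iphdr_fits` that performs the same comparison as `(void *)&iph[1] > data_end`, but on lengths instead of pointers: `&iph[1]` sits at offset `nh_off + sizeof(struct iphdr)`, i.e. 34, and `data_end` sits at offset `pkt_len`.

```c
#include <linux/ip.h> /* struct iphdr */
#include <stdbool.h>
#include <stddef.h>

/* Offset-based version of the bounds check: true when the whole fixed
 * IPv4 header, from nh_off to nh_off + sizeof(struct iphdr), lies
 * within the first pkt_len bytes of the packet. */
static bool iphdr_fits(size_t nh_off, size_t pkt_len)
{
    size_t end_of_iphdr = nh_off + sizeof(struct iphdr); /* offset of &iph[1] */
    return end_of_iphdr <= pkt_len;                      /* !(&iph[1] > data_end) */
}
```

For the first diagram (say a 60-byte frame), `iphdr_fits(14, 60)` is true; for the second (say a 30-byte frame), `data_end` lands before offset 34, so `iphdr_fits(14, 30)` is false and the program should return without reading `iph->protocol`.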