
perl bitwise AND and bitwise shifting


I was reading some example code snippet for the module Net::Pcap::Easy, and I came across this piece of code

my $l3protlen = ord substr $raw_bytes, 14, 1;
my $l3prot = $l3protlen & 0xf0 >> 2;    # the protocol part
return unless $l3prot == 4;    # return unless IPv4
my $l4prot = ord substr $packet, 23, 1;
return unless $l4prot == '7';

After doing a full hex dump of the raw packet $raw_bytes, I can see that this is an Ethernet frame, and not a TCP/UDP packet. Can someone please explain what the above code does?


Solution

  • For parsing the frame, I looked up this page.

    Now onto the Perl...

    my $l3protlen =  ord substr $raw_bytes, 14, 1;
    

    Extract the 15th byte (character) from $raw_bytes, and convert to its ordinal value (e.g. a character 'A' would be converted to an integer 65 (0x41), assuming the character set is ASCII). This is how Perl can handle binary data as if it were a string (e.g. passing it to substr) but then let you get the binary values back out and handle them as numbers. (But remember TMTOWTDI.)
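
    As a small illustration of the TMTOWTDI point, here is a sketch of three ways to get one byte out of a binary string as a number. The 3-byte $raw_bytes is made up for the example, and reading the second byte (offset 1) is arbitrary; none of this comes from the module's sample code.

    ```perl
    use strict;
    use warnings;

    # Hypothetical 3-byte sample; a real $raw_bytes would come from the capture.
    my $raw_bytes = "\x41\x45\xff";

    my $via_ord    = ord substr $raw_bytes, 1, 1;   # 1-char substring, then its ordinal
    my $via_unpack = unpack 'x1 C', $raw_bytes;     # skip 1 byte, read one unsigned byte
    my $via_vec    = vec($raw_bytes, 1, 8);         # element 1 of a vector of 8-bit values

    printf "0x%02x 0x%02x 0x%02x\n", $via_ord, $via_unpack, $via_vec;
    # prints "0x45 0x45 0x45"
    ```

    All three treat the string as raw bytes; unpack tends to read best once you need more than one field.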

    In an Ethernet frame carrying IPv4, the first 14 bytes are the MAC header (6 bytes each for destination and source MAC address, followed by the 2-byte Ethertype, which was probably 0x0800 for IPv4 - you could have checked this). Following this, the 15th byte is the start of the Ethernet data payload: the first byte of this contains Version (upper 4 bits) and Header Length in DWORDs (lower 4 bits).
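
    The layout described above can be sketched with unpack. The frame here is built by hand with made-up MAC addresses, and the variable names are mine, not the module's; it also assumes an untagged frame (no VLAN tag), so the Ethertype sits at offset 12.

    ```perl
    use strict;
    use warnings;

    # Hypothetical frame start: made-up MACs, Ethertype 0x0800,
    # then the first IPv4 header byte 0x45.
    my $raw_bytes = pack 'H12 H12 n C',
        'ffffffffffff', '001122334455', 0x0800, 0x45;

    # a6 = 6 raw bytes, n = 16-bit big-endian, C = one unsigned byte
    my ($dst_mac, $src_mac, $ethertype, $ver_ihl) = unpack 'a6 a6 n C', $raw_bytes;

    my $version = $ver_ihl >> 4;      # upper 4 bits
    my $ihl     = $ver_ihl & 0x0f;    # lower 4 bits, counted in 32-bit words

    printf "ethertype=0x%04x version=%d header=%d bytes\n",
        $ethertype, $version, $ihl * 4;
    # prints "ethertype=0x0800 version=4 header=20 bytes"
    ```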

    Now it looks to me like there is a bug in the next line of this sample code, but it may well normally work by a fluke!

    my $l3prot    = $l3protlen & 0xf0 >> 2; # the protocol part
    

    In Perl, >> has higher precedence than &, so this will be equivalent to

    my $l3prot    = $l3protlen & (0xf0 >> 2);
    

    or if you prefer

    my $l3prot    = $l3protlen & 0x3c;
    

    So this extracts bits 2 - 5 from the $l3protlen value: the mask value 0x3c is 0011 1100 in binary. So for example a value of 0x86 (in binary, 1000 0110) would become 0x04 (binary 0000 0100). In fact a 'normal' IPv4 value is 0x45, i.e. protocol version 4, header length 5 dwords. Mask that with 0x3c and you get... 4! But only by fluke: you have tested the low 2 bits of the version and the top 2 bits of the length, not the version itself!

    This line should surely be

    my $l3prot = ($l3protlen & 0xf0) >> 4;
    

    (note brackets for precedence and a shift of 4 bits, not 2). (I found this same mistake in the CPAN documentation so I guess it's probably quite widely spread.)
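
    To see the difference, here is a quick comparison of the expression as written against the corrected one. The two sub names are just labels I made up, and the test bytes are chosen by hand: 0x45 (plain IPv4), 0x4f (IPv4 with the maximum header length), 0x60 (IPv6) and the 0x86 example from above.

    ```perl
    use strict;
    use warnings;

    # Same expression as the sample: parsed as $byte & (0xf0 >> 2), i.e. & 0x3c
    sub version_as_written { my ($byte) = @_; return $byte & 0xf0 >> 2 }

    # Mask the upper nibble first, then shift it down
    sub version_fixed { my ($byte) = @_; return ($byte & 0xf0) >> 4 }

    for my $byte (0x45, 0x4f, 0x60, 0x86) {
        printf "0x%02x: as written %2d, fixed %d\n",
            $byte, version_as_written($byte), version_fixed($byte);
    }
    # 0x45: as written  4, fixed 4
    # 0x4f: as written 12, fixed 4
    # 0x60: as written 32, fixed 6
    # 0x86: as written  4, fixed 8
    ```

    So the original expression would reject a valid IPv4 packet with a long header (0x4f) and accept a first byte of 0x86, which is not IPv4 at all.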

    return unless $l3prot == 4; # return unless IPv4
    

    For IPv4 we expect this value to be 4 - if it isn't, jump out of the function right away. (So the wrong code above gives the result which lets this be interpreted as an IPv4 packet, but only by luck.)

    my $l4prot = ord substr $packet, 23, 1;
    

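    Now extract the 24th byte (offset 23) and convert it to its ordinal value in the same way. Note that this line reads from $packet rather than $raw_bytes; presumably both refer to the same buffer. Offset 23 is 14 bytes of MAC header plus 9 bytes into the IP header, which is the Protocol byte: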

    return unless $l4prot == '7';
    

    We expect this to be 7 - if it isn't, jump out of the function right away. (The quoted '7' is compared numerically by ==, so it behaves the same as 7.) (According to IANA, 7 is "Core-based trees"... but I guess you know which protocols you are interested in!)
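
    Putting the pieces together, here is a rough sketch of the same checks with the version test corrected. ipv4_protocol is a hypothetical helper of mine, not part of Net::Pcap::Easy, and it assumes an untagged Ethernet frame (no VLAN tag). The sample frame is hand-built with made-up addresses and protocol 6 (TCP), just to have something to run.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: returns the IPv4 Protocol field, or nothing when
    # the frame is too short or does not carry IPv4.
    sub ipv4_protocol {
        my ($raw_bytes) = @_;
        return if length($raw_bytes) < 14 + 20;       # MAC header + minimal IPv4 header

        my ($ethertype, $ver_ihl) = unpack 'x12 n C', $raw_bytes;
        return unless $ethertype == 0x0800;           # IPv4 Ethertype
        return unless ($ver_ihl & 0xf0) >> 4 == 4;    # version nibble

        return unpack 'x23 C', $raw_bytes;            # Protocol byte at offset 23
    }

    # Made-up frame: MACs, Ethertype, 0x45, 8 filler bytes, protocol 6, 10 filler bytes
    my $frame = pack 'H12 H12 n C x8 C x10',
        'ffffffffffff', '001122334455', 0x0800, 0x45, 6;

    my %name  = (1 => 'ICMP', 6 => 'TCP', 17 => 'UDP');
    my $proto = ipv4_protocol($frame);
    printf "protocol %d (%s)\n", $proto, $name{$proto} // 'other';
    # prints "protocol 6 (TCP)"
    ```

    The Protocol byte is in the fixed part of the IP header, so the constant offset 23 holds regardless of the header length; only fields after the IP header need the length to locate them.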