Tags: udp, wireshark, pcap

Detecting UDP vs. non-UDP packets in PCAP


I'm parsing a Wireshark PCAP file in C++, not using any external libraries. The Wireshark app shows that most packets in that file are UDP but some are ARP, STP, or other.

I'm only interested in the UDP. Where in the packet (or packet header) does it denote the protocol?

When it is UDP, byte 23 in the packet is set to 17, however in non-UDP, byte 23 doesn't have the same meaning - thus one can't rely on that byte.

Didn't find any docs on that, including PCAP format docs.


Solution

  • When it is UDP, byte 23 in the packet is set to 17, however in non-UDP, byte 23 doesn't have the same meaning

    It's more complicated than that.

    A pcap file has, in the file header, a field that indicates the type of link-layer header that the packets in the file have. The list of link-layer header types shows what different values in that field mean.
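
    As a minimal sketch of reading that field without any library: this assumes the classic pcap layout (a 24-byte file header with the link-layer type in the last 4 bytes, and a magic number of 0xa1b2c3d4 or 0xa1b23c4d whose byte order tells you how the file was written). The function name pcap_link_type is made up for illustration, and pcapng files use a different layout that this does not handle.

```cpp
#include <cstddef>
#include <cstdint>

// Read a 32-bit value stored least-significant byte first.
static std::uint32_t read_le32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Read a 32-bit value stored most-significant byte first.
static std::uint32_t read_be32(const std::uint8_t* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
           static_cast<std::uint32_t>(p[3]);
}

// Hypothetical helper: extracts the link-layer header type from the first
// 24 bytes of a classic pcap file. Returns false if the buffer is too short
// or the magic number is not recognized.
bool pcap_link_type(const std::uint8_t* hdr, std::size_t len,
                    std::uint32_t& link_type) {
    const std::size_t kFileHeaderLen = 24;
    const std::uint32_t kMagicUsec = 0xa1b2c3d4u;  // microsecond timestamps
    const std::uint32_t kMagicNsec = 0xa1b23c4du;  // nanosecond timestamps
    if (len < kFileHeaderLen) return false;

    const std::uint32_t le = read_le32(hdr);
    const std::uint32_t be = read_be32(hdr);
    if (le == kMagicUsec || le == kMagicNsec) {
        link_type = read_le32(hdr + 20);  // file written little-endian
    } else if (be == kMagicUsec || be == kMagicNsec) {
        link_type = read_be32(hdr + 20);  // file written big-endian
    } else {
        return false;                     // not a classic pcap file
    }
    return true;
}
```

    If this returns true and link_type is 1, the packets start with an Ethernet header; any other value needs its own link-layer parsing.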

    If the field has the value 1, then the packet begins with an Ethernet header, which begins with a 6-byte Ethernet destination address, followed by a 6-byte Ethernet source address, followed by a 2-byte type/length field.

    If the type/length field has the value 0x0800, then the packet is an IPv4 packet. IPv4 is described by RFC 791; the header of an IPv4 packet has a 1-byte protocol field, which contains an Internet Protocol Number value. A value of 6 means TCP; a value of 17 means UDP.

    In a packet with an Ethernet header that has a type/length field value of 0x0800, the first 14 bytes are an Ethernet header, and the next 20 bytes are the fixed portion of an IPv4 header. The Protocol field in the IPv4 header is at an offset of 9 from the beginning of the IPv4 header, so, in an IPv4-over-Ethernet packet, the Protocol field is at an offset of 14+9 = 23 bytes from the beginning of the packet.

    If the packet does not begin with an Ethernet header, then, unless the link-layer header is also 14 bytes long and is immediately followed by the payload, byte 23 is not the IPv4 Protocol field.

    Furthermore, if the packet is not an IPv4 packet, even if the packet does begin with an Ethernet header, byte 23 is not the IPv4 Protocol field.
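
    Putting the IPv4-over-Ethernet case together, a rough sketch of the check might look like the following. It assumes the file's link-layer type is 1 (Ethernet) and that there is no VLAN tag; the function name is_ipv4_udp_over_ethernet is hypothetical, and pkt/caplen stand for the captured bytes of one packet and their count.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true only if the packet is Ethernet + IPv4 + UDP.
// Assumes the pcap link-layer type is Ethernet and there is no VLAN tag.
bool is_ipv4_udp_over_ethernet(const std::uint8_t* pkt, std::size_t caplen) {
    const std::size_t kEthHeaderLen = 14;    // dst(6) + src(6) + type/length(2)
    const std::size_t kIpv4FixedLen = 20;    // fixed portion of the IPv4 header
    const std::size_t kIpv4ProtoOffset = 9;  // Protocol field within IPv4 header
    const std::uint8_t kProtoUdp = 17;

    // Only bytes that were actually captured may be examined.
    if (caplen < kEthHeaderLen + kIpv4FixedLen) return false;

    // The type/length field is big-endian and sits after the two addresses.
    const std::uint16_t ethertype =
        static_cast<std::uint16_t>((pkt[12] << 8) | pkt[13]);
    if (ethertype != 0x0800) return false;   // ARP, STP, IPv6, ... are rejected

    // The high nibble of the first IPv4 byte is the IP version.
    if ((pkt[kEthHeaderLen] >> 4) != 4) return false;

    // Byte 14 + 9 = 23 is the Protocol field, but only now is that meaningful.
    return pkt[kEthHeaderLen + kIpv4ProtoOffset] == kProtoUdp;
}
```

    The point of the ordering is that byte 23 is only compared against 17 after the Ethernet type has confirmed that an IPv4 header is really there.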

    UDP - and TCP, and so on - can also run on top of IPv6. IPv6 is described by RFC 8200. The IPv6 header has a Next Header field, which contains an Internet Protocol Number value. However, there's no guarantee that this will be a value for a protocol such as TCP or UDP; it might be a value for an "extension header". The "extension header" will contain its own Next Header field, which could be for a protocol such as TCP or UDP or for another "extension header". Eventually, there will be a last "extension header", the Next Header field of which will contain a value specifying a protocol such as TCP or UDP.

    If the packet has an Ethernet header, and the type/length field has the value 0x86dd, then an IPv6 header, not an IPv4 header, comes after the Ethernet header.
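
    The extension-header chain can be walked with a loop along these lines. This is only a sketch: it assumes ip6 points at the start of the IPv6 header (that is, just past the Ethernet header), it handles only the Hop-by-Hop (0), Routing (43), Destination Options (60) and Fragment (44) headers, and it treats anything else as the final protocol. The name ipv6_upper_protocol is made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: follows the Next Header chain of an IPv6 packet and
// returns the first value that is not one of the extension headers handled
// here (17 would mean UDP), or -1 if the data is truncated or not IPv6.
int ipv6_upper_protocol(const std::uint8_t* ip6, std::size_t len) {
    const std::size_t kIpv6FixedLen = 40;
    if (len < kIpv6FixedLen || (ip6[0] >> 4) != 6) return -1;

    std::uint8_t next = ip6[6];       // Next Header field of the fixed header
    std::size_t off = kIpv6FixedLen;  // invariant: off <= len

    for (;;) {
        switch (next) {
        case 0:     // Hop-by-Hop Options
        case 43:    // Routing
        case 60: {  // Destination Options
            if (len - off < 8) return -1;
            // Second byte is the length in 8-byte units, not counting the
            // first 8 bytes.
            const std::size_t ext_len =
                (static_cast<std::size_t>(ip6[off + 1]) + 1) * 8;
            if (len - off < ext_len) return -1;
            next = ip6[off];
            off += ext_len;
            break;
        }
        case 44:    // Fragment header: always 8 bytes
            if (len - off < 8) return -1;
            next = ip6[off];
            off += 8;
            break;
        default:
            // Either an upper-layer protocol or a header this sketch does
            // not understand (for example ESP or AH).
            return next;
        }
    }
}
```

    For a packet whose Ethernet type/length field is 0x86dd, something like ipv6_upper_protocol(pkt + 14, caplen - 14) == 17 would then identify UDP, provided caplen is at least 14. Note that for a non-initial fragment the UDP header itself is not present even though the chain ends in 17.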

    If your network is an Ethernet network with VLANs, then the type/length field may have the value 0x8100, in which case the Ethernet header will be followed by a 2-byte VLAN tag, which is followed by another 2-byte type/length field.
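
    A sketch of skipping VLAN tags before looking at the real type/length field, assuming an Ethernet link-layer type and only the 0x8100 tag type mentioned above. The struct EtherPayload and the function skip_vlan_tags are illustrative names, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative result type: the effective type/length value and the offset
// at which the data following the (possibly tagged) Ethernet header begins.
struct EtherPayload {
    bool ok;                  // false if the packet was too short
    std::uint16_t ethertype;  // type/length value after any VLAN tags
    std::size_t offset;       // offset of the first byte after the header
};

static std::uint16_t read_be16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Hypothetical helper: steps over any 0x8100 VLAN tags.
EtherPayload skip_vlan_tags(const std::uint8_t* pkt, std::size_t caplen) {
    EtherPayload r;
    r.ok = false;
    r.ethertype = 0;
    r.offset = 0;
    if (caplen < 14) return r;

    std::size_t off = 12;                     // skip the two 6-byte addresses
    std::uint16_t type = read_be16(pkt + off);
    off += 2;

    // Each 0x8100 is followed by a 2-byte VLAN tag and then another 2-byte
    // type/length field, which may itself be 0x8100 again.
    while (type == 0x8100) {
        if (caplen - off < 4) return r;
        type = read_be16(pkt + off + 2);
        off += 4;
    }

    r.ok = true;
    r.ethertype = type;
    r.offset = off;
    return r;
}
```

    With one VLAN tag the IPv4 header starts at offset 18 rather than 14, so the Protocol field moves from byte 23 to byte 27; using the returned offset avoids hard-coding either number.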

    The tcpdump source file with the Ethernet dissector has ~600 lines of source code; the tcpdump source files with the IPv4 dissector and with the IPv6 dissector both have ~500 lines of source code. Wireshark's dissection is even more complicated than that. Writing code to properly analyze raw network packets from a capture file - whether pcap, pcapng, Sniffer, snoop, Microsoft Network Monitor, or another packet analyzer - isn't a simple process.