c, operating-system, kernel, buffer, packet

How does pcap unix buffering work?


Hypothetical scenario: A UDP packet stream arrives at machine X, which is running two programs: one that is listening for the packets with recv(), and another that is running pcap.

In this case, as I understand it, the packets are held in the interface until it is polled by the kernel, which then moves them into a buffer in the kernel's memory and copies them into two further buffers: one for the program listening with recv(), and one for the program listening with pcap. The packets are removed from the respective buffer when they are read, either by pcap_next() or recv(), the next time the process scheduler runs those programs (I assume the calls are blocking in this case). Is this correct? Are there really 4 buffers used, or is it handled some other way?

I'm looking for a description, as detailed as possible, of which buffers are really involved in this case, and how packets move from one to the other (e.g. does a packet get copied to pcap's buffer before it goes to the recv buffer, after, or is the order undefined?).

I know this seems like a big question, but all I really care about is where the packet gets stored, and how long it stays there. Bullet points are fine. Ideally I'd like a general answer, but if it varies between OSes I'm most interested in Linux.


Solution

  • Linux case (the BSDs are probably somewhat similar, using mbufs instead of skbuffs):

    Linux uses skbuffs (socket buffers) to buffer network data. An skbuff holds metadata about some network data, plus pointers to that data.

    Taps (pcap users) create clones of skbuffs. A clone is a new skbuff, but it points to the same data. When someone needs to modify data shared by several skbuffs (the original skbuff and its clones), it first needs to create a fresh copy (copy-on-write).

    When someone doesn't need an skbuff anymore, it calls kfree_skb() on it. kfree_skb() decrements a reference count, and when that reference count reaches zero, the skbuff is freed. The real logic is slightly more complicated in order to account for clones, but this is the general idea.
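
The clone, copy-on-write, and reference-count behaviour described above can be sketched in user space. This is only a toy model, not kernel code: the names pkt_data, pkt_handle, pkt_alloc(), pkt_clone(), pkt_make_writable(), and pkt_free() are hypothetical and merely stand in for the skbuff, its shared data, cloning, and kfree_skb(). The fixed 64-byte payload is also an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space model; this is NOT the kernel's struct sk_buff. */
struct pkt_data {
    int refs;                 /* number of handles sharing this payload */
    size_t len;               /* number of valid bytes in bytes[] */
    unsigned char bytes[64];  /* the packet payload itself */
};

struct pkt_handle {
    int users;                /* references held on this handle */
    struct pkt_data *data;    /* possibly shared payload */
};

/* Allocate a handle with a fresh, private payload copied from src. */
struct pkt_handle *pkt_alloc(const void *src, size_t len)
{
    struct pkt_handle *h = malloc(sizeof(*h));
    struct pkt_data *d = malloc(sizeof(*d));

    if (!h || !d || len > sizeof(d->bytes)) {
        free(h);
        free(d);
        return NULL;
    }
    d->refs = 1;
    d->len = len;
    memcpy(d->bytes, src, len);
    h->users = 1;
    h->data = d;
    return h;
}

/* Clone: a new handle that points at the same payload (what a tap holds). */
struct pkt_handle *pkt_clone(struct pkt_handle *orig)
{
    struct pkt_handle *c = malloc(sizeof(*c));

    if (!c)
        return NULL;
    c->users = 1;
    c->data = orig->data;
    c->data->refs++;
    return c;
}

/* Copy-on-write: make the payload private before modifying it.
 * Returns 0 on success, -1 if the copy could not be allocated. */
int pkt_make_writable(struct pkt_handle *h)
{
    struct pkt_data *d;

    if (h->data->refs == 1)
        return 0;             /* nobody else shares it; nothing to copy */
    d = malloc(sizeof(*d));
    if (!d)
        return -1;
    *d = *h->data;            /* fresh copy of the shared payload */
    d->refs = 1;
    h->data->refs--;          /* drop our share of the old payload */
    h->data = d;
    return 0;
}

/* Drop one reference on the handle. The handle is freed when its count
 * reaches zero, and the payload is freed only when no handle shares it.
 * Returns 1 if the payload was freed by this call, 0 otherwise. */
int pkt_free(struct pkt_handle *h)
{
    int freed_payload = 0;

    assert(h->users > 0);
    if (--h->users > 0)
        return 0;
    if (--h->data->refs == 0) {
        free(h->data);
        freed_payload = 1;
    }
    free(h);
    return freed_payload;
}
```

In this model, a tap would call pkt_clone() on an incoming packet and later pkt_free() its clone; the payload stays alive until both the clone and the original handle have been released. A caller that wants to modify the bytes would call pkt_make_writable() first, so that the other holder still sees the original data.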