python pcap tshark

Reading Large PCAP with Tshark in Smaller Parts


I have a PCAP file that I want to read with the tshark command, but it is too large to process in memory: the file is 9 GB, yet reading it packet by packet fills up the 35 GB available on Google Colab after about 30 million packets.

Therefore, I would like to split it into four parts that I can read and process separately. I have tried to split it by filtering on a frame's time using the line below. However, this continues scanning over all packets, so it takes too long.

!tshark -Y "(tcp or udp) && (frame.time <= \"2019-10-21 05:00:01\")"  -r $file_name -l -T fields -e $FIELDS

What is the best way to process a PCAP file that is too large for memory? How can I split it without losing any packets?


Solution

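  • To my knowledge, tshark reads and dissects every packet in the file before it can evaluate a display filter (-Y), so filtering on frame.time does not let it skip the rest of the capture.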

    You should consider using tcpdump instead, whose packet analysis is lighter.

    Something like this should be (a little) faster:

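    tcpdump -r "Your_file" -w "output_filename" -C 2250

    Where 2250 is the size (in millions of bytes) of each of the 4 new output files. tcpdump names the first file output_filename and appends a number to each later one (output_filename1, output_filename2, and so on).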
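
Since the question is tagged python and runs in Colab, here is a minimal sketch of how the split-then-process idea could be driven from Python. The helper names (chunk_size_mb, split_command, tshark_command, iter_rows, process_in_parts) and the file names are hypothetical, chosen only for illustration; the tcpdump and tshark flags are the ones already used above. It assumes both tools are installed and on the PATH.

```python
import math
import subprocess
from pathlib import Path


def chunk_size_mb(total_bytes, parts):
    """Value for tcpdump -C (units of 1,000,000 bytes), rounded up."""
    return math.ceil(total_bytes / (parts * 1_000_000))


def split_command(src, dst_prefix, size_mb):
    """Build the tcpdump call that rewrites src into size-limited files."""
    # -r reads a capture file, -w writes one, -C starts a new file by size.
    return ["tcpdump", "-r", str(src), "-w", str(dst_prefix), "-C", str(size_mb)]


def tshark_command(pcap, fields, display_filter="tcp or udp"):
    """Build the tshark call that prints the chosen fields, one packet per line."""
    cmd = ["tshark", "-r", str(pcap), "-Y", display_filter, "-l", "-T", "fields"]
    for field in fields:
        cmd += ["-e", field]
    return cmd


def iter_rows(pcap, fields):
    """Yield one list of field values per packet, reading tshark's output lazily."""
    with subprocess.Popen(
        tshark_command(pcap, fields), stdout=subprocess.PIPE, text=True
    ) as proc:
        for line in proc.stdout:
            # -T fields separates values with tabs by default.
            yield line.rstrip("\n").split("\t")


def process_in_parts(src, dst_prefix, parts, fields):
    """Split src with tcpdump, then stream every resulting part through tshark."""
    src, dst_prefix = Path(src), Path(dst_prefix)
    size_mb = chunk_size_mb(src.stat().st_size, parts)
    subprocess.run(split_command(src, dst_prefix, size_mb), check=True)
    # tcpdump writes dst_prefix, dst_prefix1, dst_prefix2, ...
    # Plain sorting is enough while there are fewer than ten numbered parts.
    for part in sorted(dst_prefix.parent.glob(dst_prefix.name + "*")):
        for row in iter_rows(part, fields):
            yield part.name, row
```

A call such as process_in_parts("capture.pcap", "part", 4, ["frame.time_epoch", "ip.src"]) would then yield rows one at a time, so the Python side never holds more than one row in memory. Because a new tshark process is started for each part, whatever state tshark accumulates is released between parts. The number of files produced may differ slightly from the requested count, since the size passed to -C is only an upper bound per file.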