python machine-learning scapy packet-capture

Analyzing packet captures: What is the right approach?


My goal is to build a packet capture analyzer:

Input: a pcap file (or any other capture file). The file could contain hundreds or thousands of packets.

Output: a set of statistics about the traffic streams:

- How many TCP streams are there?
- How many UDP streams are there?
- For a given client (source IP):
     o How many TCP connections were opened?
     o How many TCP connections were open concurrently?
     o What were the longest and shortest sessions?
     o What is the retransmission ratio for a given stream?
- Given a protocol (say HTTP), how many streams carried it?
- etc
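
Most of the per-client questions in this list reduce to small computations once each connection has been summarised. The following is a minimal sketch, assuming a hypothetical `Session` record (client IP plus first and last packet timestamps) and data segments given as `(seq, payload_len)` pairs for one direction of a stream; none of these names come from a library. The retransmission check is a deliberate simplification: it counts a segment as retransmitted when the same sequence number and length were already seen, so it misses partial overlaps and sequence-number wraparound.

```python
from collections import namedtuple

# Hypothetical record: one TCP connection of a client, with the timestamps
# of its first and last packet (seconds).
Session = namedtuple("Session", "client start end")


def max_concurrent(sessions):
    """Peak number of sessions open at the same instant (sweep-line)."""
    events = []
    for s in sessions:
        events.append((s.start, 1))   # connection opens
        events.append((s.end, -1))    # connection closes
    # At equal timestamps, closes (-1) sort before opens (+1), so a session
    # ending exactly when another starts is not counted as overlapping.
    events.sort()
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def session_extremes(sessions):
    """Return (shortest, longest) session duration; needs >= 1 session."""
    durations = [s.end - s.start for s in sessions]
    return min(durations), max(durations)


def retransmission_ratio(segments):
    """Fraction of data segments whose (seq, payload_len) was seen before.

    `segments` holds (seq, payload_len) pairs for ONE direction of a stream.
    Segments without payload (pure ACKs) are ignored.
    """
    seen = set()
    data = retransmitted = 0
    for seq, length in segments:
        if length == 0:
            continue
        data += 1
        if (seq, length) in seen:
            retransmitted += 1
        seen.add((seq, length))
    return retransmitted / data if data else 0.0
```

Filtering the `Session` list by `client` before calling these functions answers the questions for a single source IP.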

One obvious way to solve this problem is:

  1. Read the capture and store the packets in a data structure of choice.
  2. For each of the above queries, write a specialized function that gathers the data (i.e., walks all streams and collects the stats).
  3. Dump the output.
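
The three steps above can be sketched as follows. This is only an illustration under stated assumptions: `Pkt`, `flow_key`, `group_flows`, `count_by_proto`, and `load_with_scapy` are hypothetical names, and a "stream" is approximated as all packets sharing a direction-independent 5-tuple, so a port pair reused later in the capture would be merged into one stream. The loader uses Scapy's `PcapReader` and imports it lazily, so the rest of the code runs without Scapy installed.

```python
from collections import defaultdict, namedtuple

# Hypothetical minimal packet record used by the analysis functions.
Pkt = namedtuple("Pkt", "ts proto src sport dst dport")


def flow_key(p):
    """Direction-independent key: both directions map to the same stream."""
    a, b = (p.src, p.sport), (p.dst, p.dport)
    return (p.proto,) + (a + b if a <= b else b + a)


def group_flows(packets):
    """Step 1: store packets in a dict keyed by stream."""
    flows = defaultdict(list)
    for p in packets:
        flows[flow_key(p)].append(p)
    return flows


def count_by_proto(flows):
    """Step 2 (one query): number of streams per transport protocol."""
    counts = defaultdict(int)
    for key in flows:
        counts[key[0]] += 1
    return dict(counts)


def load_with_scapy(path):
    """Yield Pkt records from a capture file (requires Scapy; IPv4 only)."""
    from scapy.all import PcapReader, IP, TCP, UDP
    with PcapReader(path) as reader:
        for pkt in reader:
            if IP not in pkt:
                continue
            for layer, name in ((TCP, "TCP"), (UDP, "UDP")):
                if layer in pkt:
                    yield Pkt(float(pkt.time), name,
                              pkt[IP].src, pkt[layer].sport,
                              pkt[IP].dst, pkt[layer].dport)
                    break


if __name__ == "__main__":
    import sys
    # Step 3: dump the output for a capture file given on the command line.
    if len(sys.argv) > 1:
        print(count_by_proto(group_flows(load_with_scapy(sys.argv[1]))))
```

Reading with `PcapReader` rather than loading the whole file at once keeps memory use proportional to the stored records, which matters once captures grow beyond a few thousand packets.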

I was planning to use Scapy (a Python library) for this.

Before I jump into the implementation, I am curious to learn about other possible approaches to the problem:

  1. Are there other frameworks or libraries that would make the job easier?

  2. Is there a completely different approach that leverages AI/ML-based frameworks? [I have no prior experience with AI/ML.]

  3. What is the best way to structure a framework in which I can pose a few questions about the data set and implement a function to answer each one?

[I am very proficient in Python and C, but I am open to other options.]

Update (10 Nov): I found https://github.com/vichargrave/espcap to be a very useful starting point for what I want to do.


Solution

  • I recommend PyShark, a Python wrapper for tshark. It supports all of tshark's display filters and protocol dissectors and is easy to use. It is a good package for parsing .pcap files and also for live capture.

    https://pypi.python.org/pypi/pyshark

    >>> import pyshark
    >>> cap = pyshark.FileCapture('/root/log.cap')
    >>> cap
    <FileCapture /root/log.cap>
    >>> print(cap[0])
    Packet (Length: 698)
    Layer ETH:
            Destination: BLANKED
            Source: BLANKED
            Type: IP (0x0800)
    Layer IP:
            Version: 4
            Header Length: 20 bytes
            Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
            Total Length: 684
            Identification: 0x254f (9551)
            Flags: 0x00
            Fragment offset: 0
            Time to live: 1
            Protocol: UDP (17)
            Header checksum: 0xe148 [correct]
            Source: BLANKED
            Destination: BLANKED
      ...
    >>> dir(cap[0])
    ['__class__', '__contains__', '__delattr__', '__dict__', '__dir__', '__doc__', '__format__', '__getattr__', '__getattribute__', '__getitem__', '__getstate__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_packet_string', 'bssgp', 'captured_length', 'eth', 'frame_info', 'gprs-ns', 'highest_layer', 'interface_captured', 'ip', 'layers', 'length', 'number', 'pretty_print', 'sniff_time', 'sniff_timestamp', 'transport_layer', 'udp']
    >>> cap[0].layers
    [<ETH Layer>, <IP Layer>, <UDP Layer>, <GPRS-NS Layer>, <BSSGP Layer>]
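
To connect this to the stream-counting questions, here is a hedged sketch that relies on the stream index tshark assigns to each TCP and UDP conversation (the `tcp.stream` and `udp.stream` fields, reached here as `pkt.tcp.stream` and `pkt.udp.stream`). The function names `count_streams` and `count_streams_in_file` are hypothetical. `count_streams` only needs objects with a `transport_layer` attribute and a matching lowercase layer attribute, so it can be tried without a capture file; pyshark is imported lazily inside the file-based helper.

```python
from collections import defaultdict


def count_streams(packets):
    """Count distinct TCP and UDP streams using tshark's stream indices."""
    streams = defaultdict(set)
    for pkt in packets:
        proto = pkt.transport_layer          # 'TCP', 'UDP', or None
        if proto in ("TCP", "UDP"):
            layer = getattr(pkt, proto.lower())   # pkt.tcp or pkt.udp
            streams[proto].add(layer.stream)
    return {proto: len(ids) for proto, ids in streams.items()}


def count_streams_in_file(path, display_filter=None):
    """Run count_streams over a capture file (requires pyshark and tshark)."""
    import pyshark
    # keep_packets=False avoids holding every parsed packet in memory.
    cap = pyshark.FileCapture(path, display_filter=display_filter,
                              keep_packets=False)
    try:
        return count_streams(cap)
    finally:
        cap.close()
```

Passing a display filter such as `display_filter='http'` restricts the count to streams that contain at least one packet tshark dissected as HTTP, which is one way to approach the "how many streams had this protocol" question.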