Search code examples
javaalgorithmsearchpacket-capturejpcap

Efficient Searching Algorithm for Capture File


I am currently developing a tool in java that will help track and interpret data being sent across an ethernet connection. I have already successfully developed both the packet sniffer and the packet data interpreter.

I run into a problem when trying to navigate to specific packets within the trace file. Each packet has an associated time stamp, and I would like to be able to navigate to a specific time window. My current method to do this is below.

public ArrayList<Packet> getTimeWindow(double time, int window) {
    ArrayList<Packet> packets = new ArrayList<Packet>();
    double start = time - window;
    double end = time + window;

    JpcapCaptor captor = null;
    try {
        captor = JpcapCaptor.openFile(this.traceFile); 
    } catch (IOException e) {e.printStackTrace();}

    Packet p = captor.getPacket();
    while(packet != null) {
        if(f.timestamp > end) return packets;
        if(p.timestamp >= start) packets.add(p);    
        packet=captor.getPacket();
    }
    return packets;
}

This works fine for small traces, but can get pretty slow when we're dealing with millions of packets. I would like to implement some form of binary search algorithm, but I can't figure out a way to navigate to the middle of the packets without preprocessing them. The packets are not neatly organized by line, and even if I jump to a random point in the file, I can't guarantee I'm at the start of a packet.

In summary: I am looking to develop an efficient way to search for a specific packet in a capture (.pcap or .cap) file. I've scoured the net, and I haven't been able to find anything that can do quite what I'm asking.

If anyone has any ideas / solutions you could suggest, it would be greatly appreciated.

Thanks!


Solution

  • An easy, small solution is to build a simple index for the files in question. For instance, you can record the offset in the file of the start of every 1000th packet. Store this information (just a sequence of 64-bit indexes into the original trace file) in a small index file. Then when you're doing binary search, you can use this index, together with the original file, to find (within 1000 packets) the correct point to start reading.

    Of course, this requires preprocessing (or processing while generating) the trace files.