java multithreading file-io random-access

Java - Reading A Binary File In Parallel


I have a binary file that contains blocks of information (I'll refer to them as packets from here on). Each packet consists of a fixed-length header and a variable-length body, and I have to determine the length of the body from the packet header itself. My task is to read these packets from the file and perform some operation on them. Currently I'm doing this as follows:

  • Open the file as a random access file and seek to a user-specified start position. Read the first packet from this position and perform the operation on it.
  • Then, in a loop:
    • read the next packet
    • perform my operation on it

    This goes on until I hit the end-of-file marker.
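
For reference, the serial approach described above could be sketched roughly as follows. The header layout is not given in the question, so this sketch assumes a hypothetical 4-byte big-endian header that holds only the body length; the class name `SerialPacketReader` and the `handler` callback are likewise made up for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.function.Consumer;

class SerialPacketReader {
    // Assumption: the fixed-length header is a 4-byte big-endian body length.
    static final int HEADER_SIZE = 4;

    /**
     * Reads packets one after another from startPos to the end of the file,
     * hands each body to the handler, and returns the number of packets read.
     */
    static int readAll(String path, long startPos, Consumer<byte[]> handler) throws IOException {
        int count = 0;
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(startPos);
            long fileLength = file.length();
            // Stop when there is no room left for another complete header.
            while (file.getFilePointer() + HEADER_SIZE <= fileLength) {
                int bodyLength = file.readInt();   // parse the header
                byte[] body = new byte[bodyLength];
                file.readFully(body);              // throws EOFException on a truncated body
                handler.accept(body);              // the per-packet operation
                count++;
            }
        }
        return count;
    }
}
```

With a real header format, only the two lines that parse the header and size the body would need to change.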

As you can guess, when the file is huge, reading and processing each packet serially is time-consuming. I want to parallelize this somehow: generate the packets, put them in some blocking queue, and then retrieve each packet from the queue in parallel and perform my operation.

Can someone suggest how I can generate these packets in parallel?


Solution

  • You should have only one thread read the file sequentially, since I'm assuming the file lives on a single drive. Reading the file is limited by your IO speed, so there's no point in parallelizing it across CPU threads. In fact, reading non-sequentially will significantly decrease your performance, since regular hard drives are designed for sequential IO. For each packet the reader thread reads in, it should put that object into a thread-safe queue.

    Now you can start parallelizing the processing of the packets. Create multiple threads and have each of them take packets from the queue. Each thread should do its processing and put the result into some "finished" queue.

    Once the IO thread has finished reading the file, it should set a flag so that the worker threads stop once the queue is empty.
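
A minimal sketch of this layout, under some assumptions: the single reader thread is fed from an `Iterable<byte[]>` that stands in for the sequential file read, and the names `PacketPipeline`, `run`, and `operation`, the queue capacity of 1024, and the 50 ms poll timeout are all illustrative choices, not anything prescribed above. Each worker reads the "done" flag before polling, so an empty poll after the flag was already set really does mean no more packets are coming.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;

class PacketPipeline {
    /** Runs one reader thread and workerCount workers; returns results in completion order. */
    static <R> List<R> run(Iterable<byte[]> packets, int workerCount,
                           Function<byte[], R> operation) throws InterruptedException {
        BlockingQueue<byte[]> pending = new ArrayBlockingQueue<>(1024); // bounded: reader blocks when full
        ConcurrentLinkedQueue<R> finished = new ConcurrentLinkedQueue<>(); // the "finished" queue
        AtomicBoolean readerDone = new AtomicBoolean(false);

        // The only thread that touches the input; in practice this would read the file.
        Thread reader = new Thread(() -> {
            try {
                for (byte[] packet : packets) {
                    pending.put(packet);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                readerDone.set(true); // signal that no more packets will arrive
            }
        });

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            workers.add(new Thread(() -> {
                try {
                    while (true) {
                        boolean done = readerDone.get(); // read the flag BEFORE polling
                        byte[] packet = pending.poll(50, TimeUnit.MILLISECONDS);
                        if (packet != null) {
                            finished.add(operation.apply(packet));
                        } else if (done) {
                            return; // queue empty and reader had already finished
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }

        reader.start();
        for (Thread worker : workers) worker.start();
        reader.join();
        for (Thread worker : workers) worker.join();
        return new ArrayList<>(finished);
    }
}
```

Results land in the finished queue in completion order, not file order; if order matters, each packet would need to carry its index or file offset. A common alternative to the flag is to have the reader enqueue one sentinel ("poison pill") object per worker.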