
BufferedReader in a multi-core environment


I have 8 files, each about 1.7 GB. I'm reading each file into a byte array, and that operation is fast enough.

Each file is then read as follows:

BufferedReader br = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(data)));

When a single core processes the files sequentially, each one takes about 60 seconds to complete. However, when the computation is distributed over 8 separate cores, it takes far longer than 60 seconds per file.

Since the data are all in memory and no I/O operations are performed, I would have presumed that processing one file per core should take no longer than 60 seconds, so all 8 files should complete in just over 60 seconds. But this is not the case.

Am I missing something about the behaviour of BufferedReader, or of any of the other readers used in the code above?

It might be worth mentioning that I load the files into memory first with this code:

byte[] content = org.apache.commons.io.FileUtils.readFileToByteArray(new File(filePath));

Overall, the code looks like this:

For each file
    read the file into a byte[]
    add the byte[] to a list
end For
For each item in the list
    create a thread and pass its byte[] to it
end For
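As a concrete reading of the pseudocode above, here is a minimal Java sketch under stated assumptions: it substitutes the standard-library `Files.readAllBytes` for `FileUtils.readFileToByteArray`, the file paths are hypothetical placeholders, and counting lines stands in for whatever the real per-file work is.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelFileProcessing {

    // Decode one in-memory buffer and count its lines; this is a
    // placeholder for the real per-file processing.
    static long process(byte[] data) {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            long lines = 0;
            while (br.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (IOException e) {
            // In-memory streams should not actually throw here.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException, IOException {
        // Hypothetical paths; substitute the real 1.7 GB files.
        List<Path> paths = List.of(Path.of("file1.txt"), Path.of("file2.txt"));

        // Phase 1: read each file into a byte[] (stdlib stand-in for
        // FileUtils.readFileToByteArray).
        List<byte[]> buffers = new ArrayList<>();
        for (Path p : paths) {
            if (Files.exists(p)) { // skip placeholder paths that are absent
                buffers.add(Files.readAllBytes(p));
            }
        }

        // Phase 2: one task per buffer; each task owns its own reader,
        // so no state is shared between threads.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, buffers.size()));
        for (byte[] data : buffers) {
            pool.submit(() -> System.out.println(process(data) + " lines"));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```

Note that the reader here passes an explicit charset; the original code uses the platform default, which is one more variable worth pinning down when comparing runs.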

Solution

  • How are you actually "distributing the computation"? Is there synchronization involved? Are you simply creating 8 threads to read the 8 files?

    What platform are you running on (Linux, Windows, etc.)? I have seen seemingly strange behavior from the Windows scheduler before, where it moves a single process from core to core to try to balance the load among the cores. This ended up causing slower performance than just allowing a single core to do more of the work than the rest.
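    One way to answer those questions concretely is to time each buffer's processing inside its own thread, with no shared readers or collections: if the per-thread time itself grows when all the threads run at once, the slowdown is coming from memory bandwidth, GC, or scheduling rather than lock contention in the code. A minimal sketch, where the demo buffer and line-counting loop are stand-ins for the real data and work:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class PerThreadTiming {

    // Decode and scan one buffer, returning elapsed wall-clock milliseconds.
    static long timeOneBuffer(byte[] data) {
        long start = System.nanoTime();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            while (br.readLine() != null) {
                // placeholder for the real per-line work
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        // Read-only demo buffer shared by all threads; each thread still
        // builds its own reader, so nothing mutable is shared.
        byte[] demo = "line\n".repeat(100_000).getBytes(StandardCharsets.UTF_8);
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() ->
                    System.out.println(Thread.currentThread().getName()
                            + " took " + timeOneBuffer(demo) + " ms"));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
    }
}
```

    Comparing these per-thread times against a single-threaded run of the same buffer would show whether the work itself slows down under contention or whether the extra time is spent elsewhere.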