Tags: java, multithreading, memory-mapped-files, mappedbytebuffer

Java memory-mapped files: multithreaded read/write


I have two threads that concurrently access the same large file (.txt).

The first thread reads from the file. The second thread writes to the file.

Both threads access the same block, e.g. (start: 0, blocksize: 10), but with different channel and buffer instances.

Reader:

{
     int BLOCK_SIZE = 10;
     byte[] bytesArr = new byte[BLOCK_SIZE];
     File file = new File("/db.txt");
     // Open the file read-only and map its first BLOCK_SIZE bytes
     RandomAccessFile randomFile = new RandomAccessFile(file, "r");
     FileChannel channel = randomFile.getChannel();
     MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, BLOCK_SIZE);
     // Copy the mapped block into bytesArr
     map.get(bytesArr, 0, BLOCK_SIZE);
     channel.close();
}

Writer:

{
     int BLOCK_SIZE = 10;
     File file = new File("/db.txt");
     // Open the file read-write and map its first BLOCK_SIZE bytes
     RandomAccessFile randomFile = new RandomAccessFile(file, "rw");
     FileChannel channel = randomFile.getChannel();
     MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_WRITE, 0, BLOCK_SIZE);
     // bytesToWrite is the byte[] to store (its declaration is not shown)
     map.put(bytesToWrite);
     channel.close();
}

I know that if both start at the same time, I will get overlapping exceptions. What I would like to know is at which point exactly the overlap happens. In other words, when exactly does the "lock" occur? Example: let's say the writer gets access first. If the reader then tries to access the block, at which point is that possible?

 FileChannel channel = randomFile.getChannel();
 // 1- can reader access here?
 MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_WRITE, 0, BLOCK_SIZE);
 // 2- can reader access here?
 map.put(bytesToWrite);
 // 3- can reader access here?
 channel.close();
 // 4- can reader access here?

1, 2, 3 or 4?

Point 4 is certain, because the channel has been closed. What about the other points?

Thanks!


Solution

  • I am summing up a few notes from a chat conversation with the OP. The OP had the mental model (like most of us) that once a thread writes to a data structure, the change is immediately visible to all other threads. In the OP's tests using memory-mapped files, he had confirmed that this appeared to be true on a single-socket Intel CPU.

    Unfortunately this is not true in general, and it is an area where Java can and does expose the underlying behaviour of the hardware. Java is designed to assume that code is single-threaded, and it can therefore be optimised as such until it is told otherwise. What that means in practice differs by hardware and by version of HotSpot (and the statistics that HotSpot has collected). This complexity, together with running only on a single-socket Intel CPU, invalidated the OP's test.

    For further information, reading up on the 'Java Memory Model' will help build a deeper understanding. Note in particular that synchronized does not just mean 'mutual exclusion'; in hardware terms it is also about 'data visibility' and 'instruction ordering', two topics that single-threaded code takes for granted.
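
    As a rough illustration of the visibility and ordering side of synchronized, here is a minimal sketch. The class `Handoff`, its methods and the `-1` sentinel are hypothetical names invented for this example; they do not come from the OP's code. Because both `publish` and `poll` synchronize on the same object, a reader that sees `ready == true` is also guaranteed to see the payload that was written before it.

```java
// Hypothetical example: one thread publishes a value, another polls for it.
final class Handoff {
    private int payload;     // plain field, only accessed while holding the lock
    private boolean ready;   // plain field, only accessed while holding the lock

    // Writer side: both assignments happen under the lock.
    synchronized void publish(int value) {
        payload = value;
        ready = true;
    }

    // Reader side: returns the payload once published, or -1 if nothing is there yet.
    // (-1 is an arbitrary sentinel chosen for this sketch.)
    synchronized int poll() {
        return ready ? payload : -1;
    }

    public static void main(String[] args) throws InterruptedException {
        Handoff handoff = new Handoff();

        Thread reader = new Thread(() -> {
            int seen;
            // Each poll() acquires the same lock the writer releases,
            // so the published value is guaranteed to become visible.
            while ((seen = handoff.poll()) == -1) {
                Thread.yield();
            }
            System.out.println("reader saw " + seen); // prints "reader saw 42"
        });

        reader.start();
        handoff.publish(42);
        reader.join();
    }
}
```

    Without the synchronized keyword on both methods, the Java Memory Model would allow the reader to spin forever or to see the flag without the payload.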

    Do not worry if this takes time to sink in, or if you feel overwhelmed at first. We all felt like that. Java does an amazing job of hiding this complexity, if and only if you follow one simple rule: when a thread reads or modifies a shared data structure, it must do so within a synchronized block that uses the same lock. That applies to both the writing thread and the reading thread. Obviously I am simplifying, but follow that rule and the reads and writes will behave predictably. Break it only if you have a very deep understanding of the Java Memory Model, memory barriers and how they relate to different hardware. Even then, concurrency experts avoid breaking that rule if they can. Going single-threaded is often much simpler and can be surprisingly fast; many low-latency systems are designed to be mostly single-threaded for this reason.
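
    The rule above could look roughly like this in code. This is only a sketch: `SharedBlock` is a hypothetical class standing in for any shared data structure, and the 10-byte size simply mirrors the block size in the question.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical shared data structure: a fixed-size block of bytes.
final class SharedBlock {
    private final byte[] data;

    SharedBlock(int size) {
        data = new byte[size];
    }

    // Writer: modifies the shared array only while holding the lock on 'this'.
    synchronized void write(byte[] src) {
        System.arraycopy(src, 0, data, 0, Math.min(src.length, data.length));
    }

    // Reader: takes the same lock and returns a private copy of the block.
    synchronized byte[] read() {
        return data.clone();
    }

    public static void main(String[] args) throws InterruptedException {
        SharedBlock block = new SharedBlock(10);
        byte[] bytesToWrite = "0123456789".getBytes(StandardCharsets.US_ASCII);

        Thread writer = new Thread(() -> block.write(bytesToWrite));
        Thread reader = new Thread(() -> {
            // The snapshot is either all zeros or the complete new content,
            // never a half-written mixture.
            byte[] snapshot = block.read();
            System.out.println("reader got " + snapshot.length + " bytes");
        });

        writer.start();
        reader.start();
        writer.join();
        reader.join();

        // After join() the write is complete and visible to this thread.
        System.out.println(new String(block.read(), StandardCharsets.US_ASCII));
    }
}
```

    Returning a copy from `read()` is a deliberate choice in this sketch: it keeps the caller from touching the shared array outside the lock.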


    To directly answer the OP's question: the sample code from the question has no locks in it. There are no memory barriers and no concurrency controls at all, so nothing blocks the reader at any of points 1 to 4. The way the reads and writes interact is therefore not defined. They may work, they may not, or they may work most of the time. x86 CPUs such as Intel's have one of the strongest memory models among mainstream CPUs, so running the test cases on a single-socket Intel CPU would miss a lot of subtle bugs. Sun was caught out by this too before Java 5 and JSR 133 came out (read the article on why Double Checked Locking was broken in Java for more detail).
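
    For the code in the question, one possible way to add the missing concurrency control is sketched below. It is not the only option and rests on several assumptions. `GuardedMappedBlock` and its `lock` field are hypothetical names, and a temporary file stands in for /db.txt. The lock only coordinates threads inside one JVM. The sketch also assumes the platform keeps separate mappings of the same file coherent, which is typical but is left operating-system dependent by the `MappedByteBuffer` documentation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical wrapper: every access to the mapped block goes through one lock.
final class GuardedMappedBlock {
    static final int BLOCK_SIZE = 10;

    private final Object lock = new Object(); // shared by reader and writer
    private final Path path;

    GuardedMappedBlock(Path path) {
        this.path = path;
    }

    void write(byte[] bytesToWrite) throws IOException {
        synchronized (lock) {
            try (FileChannel channel = FileChannel.open(
                    path, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer map =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, BLOCK_SIZE);
                map.put(bytesToWrite, 0, Math.min(bytesToWrite.length, BLOCK_SIZE));
            }
        }
    }

    byte[] read() throws IOException {
        synchronized (lock) {
            try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
                MappedByteBuffer map =
                        channel.map(FileChannel.MapMode.READ_ONLY, 0, BLOCK_SIZE);
                byte[] bytesArr = new byte[BLOCK_SIZE];
                map.get(bytesArr, 0, BLOCK_SIZE);
                return bytesArr;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Temporary file used instead of /db.txt; pre-filled so the read-only map fits.
        Path tmp = Files.createTempFile("db", ".txt");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "AAAAAAAAAA".getBytes(StandardCharsets.US_ASCII));
        GuardedMappedBlock block = new GuardedMappedBlock(tmp);

        Thread writer = new Thread(() -> {
            try {
                block.write("BBBBBBBBBB".getBytes(StandardCharsets.US_ASCII));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        Thread reader = new Thread(() -> {
            try {
                // Sees the old or the new block, depending on who got the lock first.
                System.out.println("reader saw "
                        + new String(block.read(), StandardCharsets.US_ASCII));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });

        writer.start();
        reader.start();
        writer.join();
        reader.join();
        System.out.println("final "
                + new String(block.read(), StandardCharsets.US_ASCII));
    }
}
```

    If another process also accesses the file, an in-JVM lock like this is not enough; `FileChannel.lock` is the usual starting point for that case.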