Tags: java, parallel-processing, java-11, openjdk-11

How data is stored/accessed and preventing race conditions in maps, java


We have a case like this.

class foo {
    // Map with a lot of entries
    private HashMap<String, String> dataMap;

    public void updater() {
        // updates dataMap
        // takes several milliseconds
    }

    public void someAction() {
        // needs to perform read on dataMap
        // several times, in a long process
        // which takes several milliseconds
    }
}

The issue is that someAction and updater can be called simultaneously, and someAction is called more frequently. When updater is called, it can replace a lot of values in dataMap. We need consistency within someAction: if the method starts with the old dataMap, then all of its reads should happen against the old dataMap. This is what we are considering:

class foo {
    // Map with a lot of entries
    private HashMap<String, String> dataMap;

    public void updater() {
        var updateDataMap = clone(dataMap); // some way to clone data from map
        // updates updateDataMap instead of dataMap
        // takes several milliseconds
        this.dataMap = updateDataMap;
    }

    public void someAction() {
        var readDataMap = dataMap;
        // reads from readDataMap instead of dataMap
        // several times, in a long process
        // which takes several milliseconds
    }
}

Will this ensure consistency? I believe that the clone method will allocate a separate map in memory, and that readers will pick up the new reference only after it has been assigned. Are there going to be any performance impacts? And will the memory of the old map be released once nothing is reading from it any more?

If this is the correct way, is there any other, more efficient way to achieve the same?


Solution

  • I believe your approach will work, because all changes from updater() are made to a (deep) copy, and are not visible to someAction() until the reference is updated in a single operation.

    I understand that you do not care about whether someAction() sees the latest version of the map's contents, as long as the map is consistent, that is, it is not observed while it is in the middle of being updated. In this case, there is no way for your someAction() to look at an incomplete map.

    Beware that at most one thread should be able to run updater() at a time. If two threads ran it concurrently, each would copy the same starting map, and whichever assigned its copy last would silently discard the other's changes. I recommend the following changes:

    // no synchronization needed at this level, but volatile is important
    private volatile HashMap<String, String> dataMap = new HashMap<>();

    // if two threads attempt to call this at once, one blocks until the other finishes
    public synchronized void updater() {
        // copy the current map; a shallow copy is enough here because
        // String keys and values are immutable
        var writeDataMap = new HashMap<>(dataMap);

        // update writeDataMap - guaranteed no other thread updating
        // ... long operation

        dataMap = writeDataMap;             // switch visible map with the updated one
    }

    public void someAction() {
        var readDataMap = dataMap;          // read the field once, then use only this local

        // process readDataMap - guaranteed not to change while being read
        // ... long operation
    }
    

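    The important keyword here is volatile: a write to a volatile field happens-before any later read of that field, so a thread that sees the new reference is also guaranteed to see the fully updated contents of the map as soon as updater() finishes its job. The use of synchronized simply prevents multiple updater() threads from interfering with each other, and is mostly defensive. Once no running someAction() still holds a reference to the old map, it becomes eligible for garbage collection.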
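
    To make the pattern above easier to try out, here is a minimal self-contained sketch of the same copy-then-swap idea. The class name SnapshotStore and the methods update() and snapshot() are hypothetical names chosen for illustration; they are not from the question. The sketch also assumes that publishing an unmodifiable view is acceptable, so that a reader cannot accidentally modify a snapshot it is holding.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical holder class illustrating the copy-then-swap pattern.
class SnapshotStore {
    // volatile: a reader that sees the new reference also sees the fully built map
    private volatile Map<String, String> dataMap;

    SnapshotStore(Map<String, String> initial) {
        this.dataMap = Collections.unmodifiableMap(new HashMap<>(initial));
    }

    // synchronized so that concurrent updaters cannot discard each other's changes
    synchronized void update(Consumer<Map<String, String>> changes) {
        // shallow copy is enough because String keys and values are immutable
        Map<String, String> copy = new HashMap<>(dataMap);
        // the (possibly long) modification happens on the private copy only
        changes.accept(copy);
        // publish the finished map with a single volatile write
        dataMap = Collections.unmodifiableMap(copy);
    }

    // Readers call this once and use the returned map for the whole operation.
    Map<String, String> snapshot() {
        return dataMap;
    }
}
```

    A reader would call snapshot() once at the start of someAction() and perform every lookup on that local reference. An update that completes in the meantime only affects readers that take their snapshot afterwards. Note that the callback passed to update() must not keep a reference to the copy it receives, otherwise it could modify a map that has already been published.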