
Concurrency at the data row level


I have a HashMap and want to synchronize each entry separately to maximize concurrency: many threads should be able to access the HashMap at the same time, but no two threads should be able to access the same entry at the same time.

I did the following in my code but I'm not sure if it's correct or not:

/* Lock/synchronize the data to this key, (skey is a key of type String) */
synchronized (aHashMap.get(skey)) {

    /* write the data (data is Integer) */
    aHashMap.put(skey, data);

}

Solution

  • The appropriate solution depends very much on your particular problem. If all your threads can update any of the entries in the Map, then the first thing to try is ConcurrentHashMap.

    In this case, the operation you described would be replaced with:

    data = ... compute ...
    aHashMap.replace(skey, data);
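
    To make the behaviour of replace concrete, here is a minimal sketch. The class name ReplaceDemo and the keys "present" and "absent" are made up for illustration. It shows that replace(key, value) only overwrites an existing mapping and never adds a new one.

    ```java
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Hypothetical demo class; none of these names come from the question.
    class ReplaceDemo {

        static ConcurrentMap<String, Integer> demo() {
            ConcurrentMap<String, Integer> aHashMap =
                    new ConcurrentHashMap<String, Integer>();
            aHashMap.put("present", 1);

            // The key is mapped, so the value is atomically replaced.
            aHashMap.replace("present", 2);

            // The key is not mapped, so this call changes nothing.
            aHashMap.replace("absent", 2);

            return aHashMap;
        }

        public static void main(String[] args) {
            System.out.println(demo());
        }
    }
    ```

    If the key might not be in the map yet, put or putIfAbsent is probably the better fit for that first write.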
    

    Using ConcurrentHashMap solves the data race, but one problem remains. If another thread updates the same key at the same time, one of the two updates is lost. If you are OK with this, great. Otherwise, you can:

    boolean success;  // declared outside the loop so the while condition can see it
    do {
      oldData = aHashMap.get(skey);
      data = ... compute (maybe based on oldData) ... 
      success = aHashMap.replace(skey, oldData, data);
    } while (!success);
    

    In this case, replace succeeds only if the value has not changed since the get, and the check-and-replace is atomic. If it fails, the do-while loop tries again, possibly based on the updated value.
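
    For reference, here is a runnable sketch of that retry loop, assuming the values are Integers and the computation is a simple addition. CasRetryDemo, addAtomically and the key "row-1" are hypothetical names. The putIfAbsent branch is my addition to handle a key that is not in the map yet, since replace cannot create a mapping.

    ```java
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Hypothetical demo class; none of these names come from the question.
    class CasRetryDemo {

        // Adds delta to the value under key, retrying if another thread got in first.
        static int addAtomically(ConcurrentMap<String, Integer> map, String key, int delta) {
            while (true) {
                Integer oldData = map.get(key);
                if (oldData == null) {
                    // No mapping yet: try to install the first value.
                    if (map.putIfAbsent(key, delta) == null) {
                        return delta;
                    }
                    continue; // another thread inserted first, so retry
                }
                // Side-effect-free computation that builds a brand new value.
                Integer data = oldData + delta;
                // Succeeds only if the map still holds a value equal to oldData.
                if (map.replace(key, oldData, data)) {
                    return data;
                }
            }
        }

        // Starts several threads that all increment the same key.
        static int runDemo(int threads, final int incrementsPerThread) {
            final ConcurrentMap<String, Integer> map =
                    new ConcurrentHashMap<String, Integer>();
            Thread[] workers = new Thread[threads];
            for (int i = 0; i < threads; i++) {
                workers[i] = new Thread(new Runnable() {
                    public void run() {
                        for (int j = 0; j < incrementsPerThread; j++) {
                            addAtomically(map, "row-1", 1);
                        }
                    }
                });
                workers[i].start();
            }
            try {
                for (Thread t : workers) {
                    t.join();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return map.get("row-1");
        }

        public static void main(String[] args) {
            System.out.println(runDemo(4, 1000));
        }
    }
    ```

    With a plain get followed by put, some of the increments could be lost. With the conditional replace, every increment is applied exactly once, at the cost of occasionally repeating the computation.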

    Also, be careful not to have any side effects between the map get and the replace. That computation should only create a brand new "data" object. If you update the "oldData" object or some other shared data, you will get unexpected results, because the computation may run more than once.

    If you do have side effects, one approach is to use a key-level lock like this:

    synchronized(skey) {
      data = ... compute ... 
      aHashMap.replace(skey, data);
    }
    

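    Even in this case, ConcurrentHashMap is still needed. Also, this will not stop other code from updating that key in the map: all code that updates the key needs to lock on it. Note that the lock is on the String object itself, so it only works if every thread uses the same String instance for a given key. Two equal but distinct String objects are two separate locks.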

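    Also, this will not be thread-safe if you update oldData in "... compute ..." and the values are not unique within the map, because the same value object can then be reached through a different key under a different lock. If you do want to update oldData there, guard that update with another synchronized block.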
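
    One way to make the key-level lock independent of String identity is to keep a dedicated lock object per key. This is a variation on the snippet above rather than the same technique, and KeyLockDemo, lockFor and increment are hypothetical names. A minimal sketch, assuming Integer values:

    ```java
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Hypothetical demo class; none of these names come from the question.
    class KeyLockDemo {
        private final ConcurrentMap<String, Object> locks =
                new ConcurrentHashMap<String, Object>();
        private final ConcurrentMap<String, Integer> data =
                new ConcurrentHashMap<String, Integer>();

        // Returns the one lock object for this key, creating it on first use.
        Object lockFor(String key) {
            Object lock = locks.get(key);
            if (lock == null) {
                Object fresh = new Object();
                // putIfAbsent returns the existing lock if another thread won the race.
                lock = locks.putIfAbsent(key, fresh);
                if (lock == null) {
                    lock = fresh;
                }
            }
            return lock;
        }

        // Read-modify-write on one entry while holding that entry's lock.
        int increment(String key) {
            synchronized (lockFor(key)) {
                Integer old = data.get(key);
                int updated = (old == null) ? 1 : old + 1;
                data.put(key, updated);
                return updated;
            }
        }

        public static void main(String[] args) {
            KeyLockDemo demo = new KeyLockDemo();
            demo.increment("row-1");
            // A different String instance with the same content uses the same lock.
            System.out.println(demo.increment(new String("row-1")));
        }
    }
    ```

    Threads working on different keys do not block each other, while threads working on equal keys do. The lock map in this sketch never shrinks, so it suits a bounded set of keys. As before, it only protects code that goes through the same lock.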

    If this does the trick and you're content with the performance, look no further.

    If the threads only update values and do not change the keys, then you might try converting your pairs to objects and using something other than a Map. For example, you could split the set of objects into several sets and then feed them to your threads. Or maybe use ParallelArray. But I might be digressing here... :)
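
    To illustrate the splitting idea, here is a rough sketch in which each thread gets its own slice of the objects, so no entry is ever touched by two threads and no per-entry locking is needed. PartitionDemo, Row and doubleAll are hypothetical names, and doubling the value is only a placeholder for the real update.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Hypothetical demo class; none of these names come from the question.
    class PartitionDemo {

        // A key/value pair turned into a plain object.
        static class Row {
            final String key;
            int value;
            Row(String key, int value) { this.key = key; this.value = value; }
        }

        // Splits rows into contiguous chunks and updates each chunk in one task.
        static void doubleAll(final List<Row> rows, int threads) {
            int chunk = Math.max(1, (rows.size() + threads - 1) / threads);
            List<Callable<Void>> tasks = new ArrayList<Callable<Void>>();
            for (int start = 0; start < rows.size(); start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, rows.size());
                tasks.add(new Callable<Void>() {
                    public Void call() {
                        for (int i = from; i < to; i++) {
                            rows.get(i).value *= 2; // placeholder update
                        }
                        return null;
                    }
                });
            }
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            try {
                // Waiting on every Future makes the updates visible to the caller.
                for (Future<Void> f : pool.invokeAll(tasks)) {
                    f.get();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            } finally {
                pool.shutdown();
            }
        }
    }
    ```

    This only works when the set of objects is fixed while the tasks run and each object belongs to exactly one slice.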