Search code examples
javajava.util.concurrentconcurrenthashmap

Consequences of updating other key(s) in ConcurrentHashMap#computeIfAbsent


Javadoc from ConcurrentHashMap#computeIfAbsent says

The computation should be short and simple, and must not attempt to update any other mappings of this map.

But, from what I see, using remove() and clear() methods inside mappingFunction works fine. For example this

Key element = elements.computeIfAbsent(key, e -> {
    if (usages.size() == maxSize) {
        elements.remove(oldest);
    }
    return loader.load(key);
});

What bad consequences of using remove() method inside mappingFunction could be?


Solution

  • The javadoc explains clearly the cause :

    Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.

    You have not to forget that ConcurrentHashMap is designed to provide a way of using a thread safe Map without being as locking as it is the case for older thread safe Map classes as HashTable.
    When a modification on the map occurs, it locks only the concerned mapping and not the whole map.

    ConcurrentHashMap is a hash table supporting full concurrency of retrievals and high expected concurrency for updates.

    computeIfAbsent() is a new method added in Java 8.
    If badly used, that is, if in the body of computeIfAbsent() that already locks the mapping of the key passed to the method, you lock another key, you enter in a path where you may defeat the purpose of the ConcurrentHashMap as finally you will lock two mapping.
    Imagine the problem if you lock more mapping inside computeIfAbsent() and that the method is not short at all. Concurrency access on the map would become slow.

    So the javadoc of computeIfAbsent() stresses on this potential issue by reminding the principles of ConcurrentHashMap : keep it simple and fast.


    Here is example code illustrating the issue.
    Suppose we have an instance of ConcurrentHashMap<Integer, String>.

    We will start two threads that use it :

    • The first thread : thread1 that invokes computeIfAbsent() with the key 1
    • The second thread : thread2 that invokes computeIfAbsent() with the key 2

    thread1 executes a fast enough task but it doesn't follow advise of the computeIfAbsent() javadoc : it updates the key 2 in computeIfAbsent(), that is another mapping of which one used in the current context of the method (that is key 1).

    thread2 executes a long enough task. It invokes computeIfAbsent() with the key 2 by following advises of the javadoc : it doesn't update any other mapping in the implementation of it.
    To simulate the long task, we can use the Thread.sleep() method with 5000 as parameter.

    For this specific situation, if thread2 starts before thread1, the invocation ofmap.put(2, someValue); in thread1 will be blocked while thread2 is not returned of computeIfAbsent() that locks the mapping of the key 2.

    Finally, we get a ConcurrentHashMap instance that blocks the mapping of the key 2 during 5 seconds while computeIfAbsent() is invoked with the mapping of the key 1.
    It is misleading, not effective and it goes against the ConcurrentHashMap intention and the computeIfAbsent() description which the intention is computing the value for the current key :

    if the specified key is not already associated with a value, attempts to compute its value using the given mapping function and enters it into this map unless null

    The sample code :

    import java.util.concurrent.ConcurrentHashMap;
    
    public class BlockingCallOfComputeIfAbsentWithConcurrentHashMap {
    
      public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<>();
    
        Thread thread1 = new Thread() {
            @Override
            public void run() {
                map.computeIfAbsent(1, e -> {
                    String valueForKey2 = map.get(2);
                    System.out.println("thread1 : get() returns with value for key 2 = " + valueForKey2);
                    String oldValueForKey2 = map.put(2, "newValue");
                    System.out.println("thread1 : after put() returns, previous value for key 2 = " + oldValueForKey2);
                    return map.get(2);
                });
            }
        };
    
        Thread thread2 = new Thread() {
            @Override
            public void run() {
              map.computeIfAbsent(2, e -> {
                try {
                  Thread.sleep(5000);
                } catch (Exception e1) {
                  e1.printStackTrace();
                }
                String value = "valueSetByThread2";
                System.out.println("thread2 : computeIfAbsent() returns with value for key 2 = " + value);
                return value;
              });
            }
        };
    
        thread2.start();
        Thread.sleep(1000);
        thread1.start();
      }
    }
    

    As output we always get :

    thread1 : get() returns with value for key 2 = null

    thread2 : computeIfAbsent() returns with value for key 2 = valueSetByThread2

    thread1 : after put() returns, previous value for key 2 = valueSetByThread2

    This is written fast as reads on the ConcurrentHashMap are not blocking :

    thread1 : get() returns with value for key 2 = null

    but this :

    thread1 : after put() returns, previous value for key 2 = valueSetByThread2

    is output only when thread2 is returned of computeIfAbsent().