Search code examples
javamultithreadinghashmapthread-safetyconcurrenthashmap

ConcurrentHashMap thread-safety without using putIfAbsent


I'am trying to clarify HashMap vs ConcurrentHashMap regarding type-safety and also performance. I came across a lot of good articles, but still getting troubles figuring it all out.

Let's take the following example using a ConcurrentHashMap, where I will try to add a value for a key not already there and returning it, the new way of doing it would be:

    private final Map<K,V> map = new ConcurrentHashMap<>();
    return map.putIfAbsent(k, new Object());

let's assume we don't want to use the putIfAbsent method, the above code should look something like this:

    private final Map<K,V> map = new ConcurrentHashMap<>();
    synchronized (map) {
        V value = map.get(key); //Edit adding the value fetch inside synchronized block 
        if (!nonNull(value)) {
            map.put(key, new Object());
        }
    }
    return map.get(key)

Is the problem with this approach the fact that the whole map is locked whereas in first approach the putIfAbsent method only synchronizes on the bucket on which the hash of the key is, and thus leading to less performance ? Would the second approach work fine with just a HashMap ?


Solution

  • Is the problem with this approach the fact that the whole map is locked

    There are two problems with this approach.

    It's not intrinsic

    The fact that you've acquired the lock on the map reference has zero effect whatsoever, except in regards to any other code that (tries) to acquire this lock. Crucially, ConcurrentHashmap itself does not acquire this lock.

    So, if, during that second snippet (with synchronized), some other thread does this:

    map.putIfAbsent(key, new Object());
    

    Then it may occur that your map.get(key) call returns null, and nevertheless your followup map.put call ends up overwriting. In other words, that both your thread, and that hypothetical thread running putIfAbsent, both decided to write.

    Presumably, if that is just fine in your book, that'd be weird. Why use putIfAbsent and check if map.get returns null in the first place?

    Had the other thread done this:

    synchronized (map) {
      map.putIfAbsent(key, new Object());
    }
    

    then there'd be no problem; either your get-check-if-null-then-set code will set and the putIfAbsent call is a noop, or vice versa, but they couldn't possibly both 'decide to write'.

    Which leads us to;

    This is pointless

    There are two different ways to achieve concurrency with maps: Intrinsic and extrinsic. There is zero point in doing both, and they do not interact.

    If you have structure whereby all access (both read and write) out of a plain old entirely non-multicore capable java.util.HashMap goes through some shared lock (the hashmap instance itself, or any other lock, long as all threads that interact with that particular map instance use the same one), then that works fine and there is therefore no reason or point to using ConcurrentHashMap instead.

    The point of ConcurrentHashMap is to streamline concurrent processes without the use of extrinsic locking: To let the map do the locking.

    One of the reasons you want this is that the ConcurrentHashMap impl is significantly faster at the jobs it is capable of doing; these jobs are spelled out explicitly: It's the methods that ConcurrentHashMap has.

    Atomicity

    The central problem of your code snippet is that it lacks atomicity. Check-then-act is fundamentally broken in concurrent models (in your case: Check: Is key 'k' associated with no value or null?, then Act: Set the mapping of key 'k' to value 'v'). This is broken because what if the thing you checked changes in between? What if you have two threads that both 'check-and-act' and then run simultaneously; then they both check first, then both act first, and broken things ensue: One of the two threads will be acting upon a state that isn't equal to the state as it was when you checked, which means your check's broken.

    The right model is act-then-check: Act first, and then check the result of the operation. Of course, this requires redefining, and integrating, the code you wrote explicitly in your snippet, into the very definition of your 'act' phase.

    In other words, putIfAbsent is not a convenience method! is a fundamental operation! It's the only way (short of extrinsic locking) to convey the notion of: "Perform the action of associating 'v' with 'k', but only if there is no association yet. I'll check the results of this operation next". There is no way to break that down into if (!map.containsKey(key)) map.put(key, v); because check-then-act does not work in concurrent modelling.

    Conclusions

    Either get rid of concurrenthashmap, or get rid of synchronized. Having code that uses both is probably broken and even if it isn't, it's error prone, confusing, and I can guarantee you there's a much better way to write it (better in that it is more idiomatic, easier to read, more flexible in the face of future change requests, easier to test, and less likely to have hard-to-test-for bugs in it).

    If you can state all operations you need to perform 100% in terms of the methods that CHM has, then do that, because CHM is vastly superior. It even has mechanisms for arbitrary operations: For example, unlike basic hashmaps, you can iterate through a CHM even if other threads are also messing with it, whereas with a normal hashmap you need to hold the lock for the entire duration of the operation, which means any other thread trying to do anything to that hashmap, even just 'ask for its size', need to wait. Hence, for most use cases, CHM results in orders of magnitude better performance.