Search code examples
javajava-8java.util.concurrentconcurrenthashmap

Why does this code get stuck in infinite loop


public void bad() {

        final ConcurrentMap<String, Integer> chm = new ConcurrentHashMap<>();

        final String key = "1";

        chm.computeIfAbsent(key, __ -> {
            chm.remove(key);
            return 1;
        });

    }

Of course i understand this looks very silly. It is just a super simplified version of some problematic code i was dealing with. I understand it makes no sense to do this but i am trying to understand the behaviour it causes.

When running this code you get stuck in an infinite loop on line 1107 after invoking http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/concurrent/ConcurrentHashMap.java#l1096

I am finding it very difficult to understand exactly what is happening which is causing this. Same behaviour when done on a seperate thread but waiting

public void bad2() {

        final ConcurrentMap<String, Integer> chm = new ConcurrentHashMap<>();

        final String key = "1";

        Thread worker = new Thread(() -> chm.remove("1"));

        chm.computeIfAbsent(key, __ -> {
            worker.start();
            try {
                worker.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            return 1;
        });
    }

Why in both cases is it not that chm.remove(key) completes normally returning null and then the value of 1 is set?

Interestingly this was addressed at some point and the first example throws java.lang.IllegalStateException: Recursive update when i ran with java 17.


Solution

  • This is called a 'contract error'. This happens when javadoc explicitly tells you NOT to do thing X, and leaves it at that; it does not specify what happens if you do X, just that you shouldn't do that. In this case, X is 'update the map in your passed-in function', and the javadoc explicitly spells out: DO NOT.

    so the computation should be short and simple, and must not attempt to update any other mappings of this map.

    When you perform such a contract error, anything can happen. The spec is clear: Don't. So, if you do, the spec essentially claims the ability to do anything. Hard-crash, whistle a dixie tune from the speakers, you name it.

    Hence why the behaviour changed (ordinarily, java does not change its behaviours without quite a big ordeal about breaking compatibility, but here, the spec behaviour has not changed, because the spec merely says 'do not do this, this thing does not perform according to spec if you fail to heed this warning' and 'loop endlessly' and 'throw an exception' are both just 'doing unspecified broken stuff' in this regard.

    Okay, but why does it endlessly loop?

    Because concurrent hashmap is 'smart' and uses a retry/CAS update model. Instead of acquiring a bunch of stuff, it just tries the operation without doing that, but will then check during/afterwards if it actually succeeded, or if, due to other threads modifying the same map at the same time in the same general area, its write got overwritten or otherwise didn't apply, in which case it'll do it again. In this case, removing the key is essentially 'eliminating a marker', which makes CHM think it updated a thing concurrently with another thing and therefore it should try again. Forever and ever.

    That's what the 'cas' in line 1656 in your linked source file (casTabAt) stands for: Compare-And-Set. This is a concurrency primitive that can be a lot faster than locks: "If the current value is X, then set it to Y. Otherwise, do not set it at all. Tell me whether you set it or not" - all that, in one atomic operation, which is speedy because CPUs tend to support it as barebones machine code. No lock acquiry required. The general principle is to check what the current value is, do some bookkeeping, then set the new value, using CAS to ensure that the 'state' you checked is still the state we're in. If not, some other thread so happened to also be updating stuff, so, start over.

    That's just one implementation. Tomorrow, it can change. You cannot rely on 'it will endlessly loop' because the spec do not guarantee it, and indeed, in JDK17, you get an exception instead.