
Producing ConcurrentModificationException with Java HashSet


I am just testing a few things out to improve my understanding, and I wrote the following code to produce a ConcurrentModificationException:

public static void main(String[] args) throws InterruptedException {
    Set<String> set = new HashSet<>();

    Runnable updateList = () -> {
        while (true) {
            set.add("Hello");
        }
    };
    ExecutorService executorService1 = Executors.newSingleThreadExecutor();
    executorService1.execute(updateList);

    Runnable printList = () -> {
        while (true) {
            if (set.contains("Hello")) {
                System.out.println(set);
                set.add("Bye");
            }
        }
    };
    ExecutorService executorService2 = Executors.newSingleThreadExecutor();
    executorService2.execute(printList);

    Thread.sleep(1000);
    executorService1.shutdown();
    executorService2.shutdown();
}

However, it's not producing the exception like I thought it would. When I replace the HashSet with an ArrayList, I get the exception. Any idea why?


Solution

  • ConcurrentModificationException has nothing to do with threads at all.

    In fact, you are not guaranteed to get CoModEx in (thread) concurrent situations, period [1]. The 'concurrent' in ConcurrentModificationException is not referring to 'concurrency' in the sense of 'multiple threads'!

    This code will produce a CoModEx, because this snippet shows the 'concurrent' that CoModEx is referring to:

    Set<String> s = new HashSet<String>();
    s.add("Hello"); s.add("World!"); s.add("Foobar");
    
    for (String elem : s) {
      s.remove(elem);
    }
    

    Notably, this code is guaranteed to throw CoModEx: It will do so, every time, on all JVMs, on all OSes, on all architectures, necessarily so. If the above code does not produce a CoModEx, your JVM is broken.

    With your snippet, by contrast, CoModEx is merely one of a million things that could happen.

    CoModEx refers specifically to this sequence of events:

    1. You create an iterator by invoking .iterator() on some Collection. Note that for (String x : collection) is syntax sugar and invokes .iterator().
    2. You modify the underlying collection (the one you invoked .iterator() on), and not via that iterator, i.e. not via invoking .remove() on the iterator. For example, you invoke collection.remove(), or collection.clear(), or collection.add().
    3. You touch the iterator; you invoke .hasNext() or .next() (which, again, for (String x : collection) does inherently).

    That sequence of events leads to a CoModEx.
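
    The three steps above can be sketched in single-threaded code with an explicit iterator. This is only an illustration: the class and method names (IteratorSteps, modifyBehindIterator, removeViaIterator) are made up for this example, and the second method shows the contrast case from step 2, where the modification goes through the iterator itself.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical demo class; the names are illustrative, not from the question.
public class IteratorSteps {

    // Walks through steps 1-3 and reports whether CoModEx was thrown.
    static boolean modifyBehindIterator() {
        Set<String> set = new HashSet<>();
        set.add("Hello");
        set.add("World!");

        Iterator<String> it = set.iterator(); // step 1: create the iterator
        set.add("Foobar");                    // step 2: modify the set, not via the iterator
        try {
            it.next();                        // step 3: touch the iterator
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes every element through the iterator, which is allowed,
    // and returns the size of the set afterwards.
    static int removeViaIterator() {
        Set<String> set = new HashSet<>();
        set.add("Hello");
        set.add("World!");
        set.add("Foobar");

        Iterator<String> it = set.iterator();
        while (it.hasNext()) {
            it.next();
            it.remove(); // modification via the iterator: no CoModEx
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("CoModEx thrown: " + modifyBehindIterator());
        System.out.println("Size after iterator removal: " + removeViaIterator());
    }
}
```

    No second thread is involved anywhere in this sketch; the exception comes purely from the order of the three calls.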

    So what happens in a thread-concurrency clash?

    Everything. Nothing. Magic. Who knows??

    Any time one thread writes to a field and another thread reads from it, the reading thread can legally observe either value: the one from before the update or the one from after it. The JVM decides, and it is free to decide differently every time. The exception is when the writing code has an established Happens-Before relationship with the reading code; the Java Language Specification enumerates precisely which actions establish one (it generally involves volatile, synchronized, or other thread concurrency primitives; games that HashSet does not play!). The reason is that the JVM targets multiple architectures and is meant to run faster than 'horribly slowly', so its behaviour in such tricky scenarios has to be 'whatever is fastest on local hardware', and what is fastest depends on the hardware.
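
    To make 'Happens-Before' a bit more concrete, here is a minimal sketch of one way to establish it, using a volatile field. The class, field, and method names (VolatileHandoff, payload, ready, handoff) are invented for this example. The write to payload comes before the volatile write to ready, and the reader only reads payload after it has seen ready become true, so the reader is guaranteed to see 42.

```java
// Hypothetical demo class; the names are illustrative only.
public class VolatileHandoff {
    static int payload;            // plain field, no guarantees on its own
    static volatile boolean ready; // volatile write/read establishes Happens-Before

    static int handoff() throws InterruptedException {
        payload = 0;
        ready = false;
        int[] observed = new int[1];

        Thread reader = new Thread(() -> {
            while (!ready) {
                // spin until the volatile write becomes visible
            }
            observed[0] = payload; // ordered after the writer's payload = 42
        });
        reader.start();

        payload = 42; // ordinary write...
        ready = true; // ...published by the volatile write that follows it

        reader.join(); // join also orders the reader's write to observed[0] before this read
        return observed[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(handoff());
    }
}
```

    Remove the volatile keyword and the guarantee is gone: the reader could legally spin forever or read a stale payload.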

    In practice, when you call e.g. someSet.add(foo), no doubt some fields end up being set somewhere during the execution of HashSet's add method. Exactly which ones are an implementation detail that intentionally is not specced and intentionally is not defined behaviour as per the java spec. So, some field(s) are touched by this but you cannot rely on any particular definition. Nevertheless, as per spec, any fields modified will 'convey' their changes, or not, on the whim of the JVM you are running this on, to other threads. So what actually happens?

    Everything. Nothing. Magic. Who knows??

    CoModEx might happen. Or the collection answers with lies (for a map, x.containsKey(y) returns false but x.get(y) returns a non-null value, or vice versa). Or x.get(y) hangs forever. Or it throws a StackOverflowError. Or perhaps, depending on the phase of the moon, the JVM feels particularly favourable to the EU and Ode to Joy blasts from your speakers.

    Causing concurrent field access without explicit protective measures, i.e. without establishing Happens-Before relationships as per the Java Memory Model part of the Java Language Specification, is broken code that cannot be tested, because its behaviour is unspecified.

    The only way to win that game is not to play. Do not write code like that.

    Specifically, do not write code that reads from and writes to the same hash set from multiple threads and expects a ConcurrentModificationException to result. Don't write code that expects the opposite either: you're into unspecced behaviour. Don't do it at all.
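
    If you really do need several threads to share a set, one option is a set implementation that is specced for concurrent use. The sketch below uses ConcurrentHashMap.newKeySet() from the JDK; the class and method names (SafeSharedSet, fillFromTwoThreads) are made up for this example, and it is just one possible approach, not the only one.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; the names are illustrative only.
public class SafeSharedSet {

    // Two threads add 1000 distinct strings each to a thread-safe set.
    static int fillFromTwoThreads() throws InterruptedException {
        Set<String> set = ConcurrentHashMap.newKeySet(); // documented as safe for concurrent use

        Thread a = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                set.add("a" + i);
            }
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                set.add("b" + i);
            }
        });

        a.start();
        b.start();
        a.join(); // join() makes each thread's writes visible to this thread
        b.join();

        return set.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(fillFromTwoThreads()); // 2000
    }
}
```

    With a plain HashSet in place of the concurrent set, the result of the same code would be unspecified, which is the whole point of the answer.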

    So what does CoModEx mean?

    That you interacted with an iterator that was created before the most recent modification to its collection. That is all, and it is only guaranteed behaviour if all of this happens within one thread. Do it from multiple threads and the JMM rules kick in. Combine those with the fact that HashSet and co are intentionally specced without describing in excruciating detail how they work (so that JVM implementations can provide something that runs optimally on local hardware), and you end up at: "The behaviour of your java app is arbitrary and unspecified".


    [1] Some subtypes, such as CopyOnWriteArrayList, have docs that explicitly spec what happens when you do that, and for such implementations you can rely on the classes doing what their specs say they must. However, Set itself has no such spec, and unless a type explicitly adds one, the behaviour is not guaranteed once you interact with it from multiple threads. In particular, HashSet makes no guarantees about either how it works or what it does when you access it from multiple threads. Therefore, unless you want to check the exact source code for the exact JVM release and version you are running on, anything could happen. Maybe your computer asplodes. That'd be bad, but the JVM would not be 'breaking spec'.
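
    As an illustration of a type that does spec this behaviour: java.util.concurrent.CopyOnWriteArrayList documents that its iterators work on a snapshot and never throw ConcurrentModificationException. A small single-threaded sketch (the class and method names SnapshotIteration and iterateWhileAdding are invented for this example):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class; the names are illustrative only.
public class SnapshotIteration {

    // Adds to the list while iterating over it.
    // Returns {elements seen by the loop, final list size}.
    static int[] iterateWhileAdding() {
        List<String> list = new CopyOnWriteArrayList<>();
        list.add("Hello");
        list.add("World!");

        int seen = 0;
        for (String s : list) {
            list.add(s + "!"); // with an ArrayList this would lead to CoModEx
            seen++;
        }
        // The iterator walked the 2-element snapshot; the list itself grew to 4.
        return new int[] { seen, list.size() };
    }

    public static void main(String[] args) {
        int[] result = iterateWhileAdding();
        System.out.println(result[0] + " seen, " + result[1] + " in list");
    }
}
```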