Search code examples
javamultithreadingarraylistvolatile

Is externally synchronized ArrayList thread safe if its fields are not volatile?


Let's assume this:

  1. There is single ArrayList
  2. List is accessed by multiple threads. Thread can add element and iterate over all elements.
  3. All access is externally synchronized. So it's not possible that two threads access list at the same time.

Looking at ArrayList source code we can see that size and elementData fields aren't volatile:

    transient Object[] elementData; // non-private to simplify nested class access

    private int size;

Also, let's look at add method:

    /**
     * This helper method split out from add(E) to keep method
     * bytecode size under 35 (the -XX:MaxInlineSize default value),
     * which helps when add(E) is called in a C1-compiled loop.
     */
    private void add(E e, Object[] elementData, int s) {
        if (s == elementData.length)
            elementData = grow();
        elementData[s] = e;
        size = s + 1;
    }

    /**
     * Appends the specified element to the end of this list.
     *
     * @param e element to be appended to this list
     * @return {@code true} (as specified by {@link Collection#add})
     */
    public boolean add(E e) {
        modCount++;
        add(e, elementData, size);
        return true;
    }

Does something like that can happen?

  1. Let's assume that list has 4 elements.
  2. Thread A adds new element. Size is updated to 5.
  3. Thread B adds new element. Size is cached and thread B sees its old value (4). So instead of adding new element, last element is overwritten.

Could similar situation happen with elementData?


Solution

  • TL;DR: the problems you describe are impossible, with correct synchronization, because synchronization ensures atomicity and visibility of the operations.


    The way a JVM executes Java code is rather complicated. It is free to reorder the instructions corresponding to the expressions and statements in the Java code, in order to execute them more efficiently, provided that you can't tell that a thread has reordered its operations.

    In essence, it's like a boss who says "I don't care how you get the work done, just have [the work] done by [some time]".

    The difficulty of this is that whilst it says that you mustn't be able to see the reordering within a thread, it doesn't say that different threads can't see each other doing things in a different order.

    This is head-spinning stuff. The simplifying concept is the idea of happens-before. There are certain things you can do in the two threads to ensure that things done by one thread appear to have happened already by the time the other thread tries to use the result of them. Literally, things in one thread have "happened before" things in the other. (Continuing with the work analogy, this is like having to hand over your completed work to a colleague for them to be able to do theirs: they can take what you've completed and do their work, irrespective of how you completed it).

    There are a number of well-known things which create happens-before relationships. The relevant ones in terms of this question are:

    • A write of a volatile variable happens before a read of the same variable. This is often described in terms of "the data is always written to and read from main memory, instead of being cached by the thread".
    • Exiting a synchronized block happens before entering a synchronized block with the same monitor. In other words, writes which happen inside a synchronized block by one thread are visible to other threads executing code synchronized in the same thing

    So, volatile and synchronized are both ways of creating the happens-before that is required to guarantee that [something] done by one thread is seen by the other.

    But there is a difference between the two:

    • Volatile gives you visibility: it ensures that a write is visible.
    • Synchronization gives you visibility and atomicity: it ensures that the write is visible, but it additionally ensures that nobody else is doing something at the same time, for as long as a particular monitor is held.

    In the case of adding to an ArrayList, atomicity is required, because you are doing more than one thing: increasing the size and assigning the new array element.

    Making the variables volatile as well would serve no purpose in terms of the correctness, but it would make the code slower in the modal case, which is where ArrayList is only ever accessed from a single thread.

    So, provided your code is correctly synchronized - that is, all accesses to the list are synchronized on the same thing, for example on the list itself - the situation you describe cannot happen, because of the atomicity and visibility properties of synchronization.