Search code examples
javaiteratorclonecopyonwritearraylistjava-failsafe

Why CopyOnWriteArrayList needs copies for both write and read operations?


Coming from this article, it says:

When we are using any of the modify methods – such as add() or remove() – the whole content of the CopyOnWriteArrayList is copied into the new internal copy.

Due to this simple fact, we can iterate over the list in a safe way, even when concurrent modification is happening.

When we're calling the iterator() method on the CopyOnWriteArrayList, we get back an Iterator backed up by the immutable snapshot of the content of the CopyOnWriteArrayList.

Its content is an exact copy of data that is inside an ArrayList from the time when the Iterator was created. Even if in the meantime some other thread adds or removes an element from the list, that modification is making a fresh copy of the data that will be used in any further data lookup from that list.

Simple question to ask myself next is why both? Basically, from my understanding write operations are made on a new copy, while read operation are done on a clone of a collection.

If for example writes are done on a new copy, that means I can iterate over "original" collection - meaning it won't be affected. So why add overhead of storing elements inside another copy (snapshot)? Or the opposite direction, if I store elements inside copy (snapshot), why writes need to be done on a copy, when I am literally iterating over clone and not "original" collection (meaning snapshot will never change)?

I hope the question is legal to ask, as I literally checked every possible source on the internet and not a single article helped me clear this confusion. What am I missing here?


Solution

  • CopyOnWriteArrayList does not create a copy of the array when you call iterator, as the docs says:

    The "snapshot" style iterator method uses a reference to the state of the array at the point that the iterator was created.

    Note the word "reference".

    This sentence is worded rather poorly:

    Its content is an exact copy of data that is inside an ArrayList from the time when the Iterator was created.

    It doesn't mean that a copy of the array is created when you call iterator(). It should have said:

    Its content is the same as the data that is inside an ArrayList from the time when the Iterator was created.

    The more important point of that paragraph is:

    Even if in the meantime some other thread adds or removes an element from the list, that modification is making a fresh copy of the data that will be used in any further data lookup from that list.

    What this means is that if you create an iterator, then proceeds to mutate the list in some way, the iterator won't see those changes. Why? Because the mutations are done by making a new array that has the mutations, but the iterator is iterating through the old array, which doesn't have the mutations. This is why we say that the iterator takes a "snapshot".

    Here's some code from OpenJDK to illustrate.

    In iterator(), it simply creates a COWIterator with getArray(), which gets the snapshot by returning the volatile array field:

    final Object[] getArray() {
        return array;
    }
    
    ...
    
    public Iterator<E> iterator() {
        return new COWIterator<E>(getArray(), 0);
    }
    

    And mutator methods, such as add, sets the array field:

    final void setArray(Object[] a) {
        array = a;
    }
    
    ...
    
    public boolean add(E e) {
        Object[] elements = getArray();
        int len = elements.length;
        Object[] newElements = Arrays.copyOf(elements, len + 1);
        newElements[len] = e;
        setArray(newElements);
        return true;
    }
    

    I've removed the (un)locking code to make it easier to see what's happening.