Search code examples
javaspliterator

Spliterator on modified underlying Collection


I know that this should never occur in production, but I'm trying to understand some intricate details about Spliterators and bumped into the following "puzzler" (at least a puzzler for me):

(Snippet 1)

List<Integer> list = new ArrayList<>() {{ add(1); add(2); add(3); add(4); }};
Spliterator<Integer> spl1 = list.spliterator();
list.add(5);
list.add(6);
Spliterator<Integer> spl2 = s1.trySplit();
s1.forEachRemaining(System.out::print);
s2.forEachRemaining(System.out::print);

This code prints 456123, as expected (cough I already expected a ConcurrentModificationException, but I understand the behaviour cough), i.e., it creates a Spliterator on the list, which will be split when the list has 6 elements, etc. pp. So far so good.

What I don't understand is the following:

(Snippet 2)

List<Integer> list = new ArrayList<>() {{ add(1); add(2); add(3); add(4); }};
Spliterator<Integer> spl1 = list.spliterator();
Spliterator<Integer> spl2 = s1.trySplit();
list.add(5);
list.add(6);
s1.forEachRemaining(System.out::print);
s2.forEachRemaining(System.out::print);

I expect this code to fail, and it does, with a ConcurrentModificationException on the line for s1.forEachRemaining, it will however print 34 onto the output too. If it's changed to System.err::println, one sees that the values 3 and 4 are in this respective order put onto the PrintStream before the Exception.

Now the crazy part:

(Snippet 3)

List<Integer> list = new ArrayList<>() {{ add(1); add(2); add(3); add(4); }};
Spliterator<Integer> spl1 = list.spliterator();
Spliterator<Integer> spl2 = s1.trySplit();
list.add(5);
list.add(6);
s2.forEachRemaining(System.out::print);
s1.forEachRemaining(System.out::print);

Note that the only change between Snippet 2 and 3 is the order in which we access s1 and s2. Snippet 3 still fails with a ConcurrentModificationException, but the values printed are 1 and 2. This is, because the Exception happens now on the line for s2.forEachRemaining!

If I understand correctly, what happens is:

  • the Spliterator is initialized
  • the split is done
  • the iteration happens
    • during the iteration it is observed that there has been a modification on the underlying collection, after the last split has been done

Does this mean that Spliterators are also "lazy", just like Streams? However, this argument doesn't really hold up, when experimenting with multiple splits, i.e.,

(Snippet 4)

List<Integer> list = new ArrayList<>() {{ add(1); add(2); add(3); add(4); add(5); add(6); add(7); add(8); add(9); add(10); }};
Spliterator<Integer> s1 = list.spliterator();
Spliterator<Integer> s2 = s1.trySplit();
list.add(11);
list.add(12);
Spliterator<Integer> s3 = s2.trySplit();
s1.forEachRemaining(s -> System.err.println("1 " + s));
s2.forEachRemaining(s -> System.err.println("2 " + s));
s3.forEachRemaining(s -> System.err.println("3 " + s));

Should then evaluate s1 without problems and throw an exception during the processing of s2, but it already throws and exception during the processing of s1!

Any help or pointers are appreciated.

Details: I run the snippets on AdoptOpenJDK 11.0.4+11 (64-bit) on Windows in Eclipse 2019-06 (4.12.0), if it matters.


Solution

  • With your first snippet (minor errors corrected)

    List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
    Spliterator<Integer> spl1 = list.spliterator();
    list.add(5);
    list.add(6);
    Spliterator<Integer> spl2 = spl1.trySplit();
    spl1.forEachRemaining(System.out::print);
    spl2.forEachRemaining(System.out::print);
    

    you are observing a documented behavior of supporting the changes:

    A Spliterator that does not report IMMUTABLE or CONCURRENT is expected to have a documented policy concerning: when the spliterator binds to the element source; and detection of structural interference of the element source detected after binding. A late-binding Spliterator binds to the source of elements at the point of first traversal, first split, or first query for estimated size, rather than at the time the Spliterator is created. A Spliterator that is not late-binding binds to the source of elements at the point of construction or first invocation of any method. Modifications made to the source prior to binding are reflected when the Spliterator is traversed. After binding a Spliterator should, on a best-effort basis, throw ConcurrentModificationException if structural interference is detected. Spliterators that do this are called fail-fast. The bulk traversal method (forEachRemaining()) of a Spliterator may optimize traversal and check for structural interference after all elements have been traversed, rather than checking per-element and failing immediately.

    So here, we see a late-binding spliterator in action. Changes made prior any traversal are reflected during the traversal. This is the basis for the similar behavior of a Stream which is settled on a Spliterator:

    Unless the stream source is concurrent, modifying a stream's data source during execution of a stream pipeline can cause exceptions, incorrect answers, or nonconformant behavior. For well-behaved stream sources, the source can be modified before the terminal operation commences and those modifications will be reflected in the covered elements. For example, consider the following code:

    List<String> l = new ArrayList(Arrays.asList("one", "two"));
    Stream<String> sl = l.stream();
    l.add("three");
    String s = sl.collect(joining(" "));
    

    First a list is created consisting of two strings: "one"; and "two". Then a stream is created from that list. Next the list is modified by adding a third string: "three". Finally the elements of the stream are collected and joined together. Since the list was modified before the terminal collect operation commenced the result will be a string of "one two three".

    For your other snippets, the sentence applies

    A late-binding Spliterator binds to the source of elements at the point of first traversal, first split, or first query for estimated size…

    So the spliterator is bound at the first (successful) trySplit call and adding elements afterwards to the source list will invalidate it and all spliterators split off from it.

    So with

    List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
    Spliterator<Integer> spl1 = list.spliterator();
    Spliterator<Integer> spl2 = spl1.trySplit();
    list.add(5);
    list.add(6);
    spl1.forEachRemaining(System.out::print);
    spl2.forEachRemaining(System.out::print);
    

    you get the behavior described as optimized fail-fast:

    The bulk traversal method (forEachRemaining()) of a Spliterator may optimize traversal and check for structural interference after all elements have been traversed, rather than checking per-element and failing immediately.

    So you see the elements of the first spliterator you call forEachRemaining(System.out::print) on, followed by a ConcurrentModificationException. Changing the order of the last two statements just changes which elements are printed before the exception is thrown, perfectly matching the description.

    Your last snippet only demonstrates that trySplit doesn’t perform a check on its own. It successfully splits an already invalidated spliterator, resulting in another invalid spliterator. The behavior of all resulting spliterators stays the same, the first spliterator will detect the interference, but due to performance considerations only after the traversal.

    We can verify that all created spliterators behave the same:

    List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
    Spliterator<Integer> s1 = list.spliterator();
    Spliterator<Integer> s2 = s1.trySplit();
    list.add(11);
    list.add(12);
    Spliterator<Integer> s3 = s2.trySplit();
    try {            
        s1.forEachRemaining(s -> System.out.println("1 " + s));
    } catch(Exception ex) {
        System.out.println("1 "+ex);
    }
    try {            
        s2.forEachRemaining(s -> System.out.println("2 " + s));
    } catch(Exception ex) {
        System.out.println("2 "+ex);
    }
    try {            
        s3.forEachRemaining(s -> System.out.println("3 " + s));
    } catch(Exception ex) {
        System.out.println("3 "+ex);
    }
    
    1 6
    1 7
    1 8
    1 9
    1 10
    1 java.util.ConcurrentModificationException
    2 3
    2 4
    2 5
    2 java.util.ConcurrentModificationException
    3 1
    3 2
    3 java.util.ConcurrentModificationException