Search code examples
javajava-streamlazy-evaluationshort-circuiting

Weird streams behavior with sorted() and concat()


Stream evaluation is usually lazy (by default), unless statefull operations exist as part of the pipeline. I encountered a case where the lazyness is violated due to stateful operation and I don't understand why it happens.

Consider the following code:

List<Integer> l1 = List.of(4, 5, 3, 1, 2);
List<Integer> l2 = List.of(6, 7, 8, 9, 10);

Stream
    .concat(
        l1.stream()
            .map(i -> {
                System.out.println("in the map for " + i);
                if (i % 3 != 0) {
                    return null;
                }
                return i;
            }),
        l2.stream())
    .filter(i -> {
        System.out.println("in the filter " + i);
        return i != null;
    })
    .findAny();

In details:

I have two steams constructed out of integer lists (l1 & l2). Both streams are concatenated to form a new stream.

The l1 stream goes through some mapping that converts every item not divisible by 3 to null; the l2 stream is taken as is. On the concatenated stream, I am adding a filter (filters the non-null values only --> so from the first stream, only the items divided to 3 will go through the pipeline) and finally a terminal operation findAny which triggers the stream's pipeline (and will effectively deliver back the first item divisible by 3 and stop the stream processing).

This code works as expected: first all l1 items are consumed before l2 items are reached. The output shows how l1 mapping function is called followed by the concatenated-stream-filter function for the first two l1 items, and the whole stream is finished when the 3rd item of l1 is not converted to null and thus survives the filter:

in the map for 4
in the filter null
in the map for 5
in the filter null
in the map for 3
in the filter 3

The problem (or the thing I don't understand) starts when the l1 stream is modified with the .sorted operation:

Stream
    .concat(
        l1.stream()
            .sorted()
            .map(i -> {
                System.out.println("in the map for " + i);
                if (i % 3 != 0) {
                    return null;
                }
                return i;
            }),
        l2.stream())
    .filter(i -> {
        System.out.println("in the filter " + i);
        return i != null;
    })
    .findAny();

... now things look different:

in the map for 1
in the map for 2
in the map for 3
in the map for 4
in the map for 5
in the filter null
in the filter null
in the filter 3

as sorted is statefull operation, I know that it first needs to consume the entire l1 stream to sort its values. My surprise came as it seems that it also affects the remaining of the l1 pipeline as the map function is called eagerly before any of the concatenated-stream-filter method invocations, as was before.

I read Java Streams - Buffering huge streams and Why filter() after flatMap() is "not completely" lazy in Java streams?, and I am already running on java 17 and working with Stream.concat() and I am not using flatMap() (at least not explicitly).

Can you explain why ? What am I missing here ?


Solution

  • This is caused by JDK-8277306: stream with sorted() and concat() causes some ops not to be lazy, which was closed as “Won’t fix” with the following comment:

    Stream.concat takes the spliterator from each input stream and combines them to create a new spliterator from which a new stream is constructed. Thereby it binds to the sources of each stream to concatenate.

    It is currently not possible propagate the short circuiting property of the stream pipeline after the concatenation to pipeline before the concatenation. It comes down to resolving the push vs. pull differences across the spliterator boundary. That's a tricky problem, and one that Ilikely requires significant effort that I find hard to justify given the scope of the problem.