java java-stream time-complexity collectors

Runtime complexity of Collectors#toList

In the Java library source code, the Collectors#toList method is defined like this:

public static <T>
Collector<T, ?, List<T>> toList() {
    return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add,
                               (left, right) -> { left.addAll(right); return left; },
                               CH_ID);
}

We see BinaryOperator as third parameter of CollectorImpl's contructor, which merges two sub-results in linear time.

Does it mean, that in case of frequent usage of this function by Stream#collect method, we can gain square computation time?

Consider this code:

List<Integer> desc = Stream.iterate(n, k -> k - 1).limit(n + 1)
    .collect(Collectors.toList());

desc.parallelStream()
    .map(k -> {
        try {
            Thread.sleep(k * 500);
        } catch (InterruptedException ignored) {
        }
        return k;
    })
    .collect(Collectors.toList());

Elements of second stream happened to be computed in descend order. The easiest what collect method can do is for each number wrap it into List and add all next numbers to it, with square complexity in total, how sad.

Solution

In this case the input desc list will be divided to the several independent parts depending on number of hardware threads your system has. Usually it's 16 parts on 4-core system (though it's not specified and may change). Each part will be processed independently using the accumulator, then results will be merged together using the combiner. So it won't drop to quadratic complexity, but yes, many unnecessary copying will be done.

Actually it's much more efficient to use toArray() method. It checks the stream source characteristics and in your case it's especially optimized as the source is SIZED and SUBSIZED, so the result can be written into single array without any additional copying. If you need the List, you may consider using Arrays.asList(desc.parallelStream()....toArray(Integer[]::new))