Search code examples
java-8distinctgroupingjava-streamunordered

How "unordered" helps "distinct()" and "groupingBy"


I am doing the Oracle's Stream API Java 1.8 course and I stumbled upon this while perusing the lecture notes:

unordered():

– Inherited from BaseStream

– Returns a stream that is unordered (used internally)

– Can improve efficiency of operations like distinct() and groupingBy()

Here is my question. How can the property of being unordered lead to more efficient calculation of distinct() and groupingBy()


Solution

  • It only has significance in case of parallel streams. In case of ordered parallel streams, the distinct() operation has to do extra work in order to keep it's stability guarantee, that is,

    for duplicated elements, the element appearing first in the encounter order is preserved

    (see the API Note part in the javadoc for Stream.distinct().

    In case of unordered parallel streams, no such guarantee needs to be kept, as the stream is already unordered. This way, removing the ordered characteristic from an ordered parallel stream can greatly improve the performance of the distinct() operation.

    Similarly, for the groupingBy() operation, lifting the requirement that the stream order should be preserved can greatly improve the efficiency of the operation in case of parallel streams as the reduction itself can be performed concurrently. Note that this will only happen while collecting from parallel streams with concurrent collectors, with either the collector or the stream itself being unordered. Practically, you'll need to use Stream.collect(groupingByConcurrent(..)) instead of Stream.collect(groupingBy(..)). See the javadoc for Stream.collect() and Collector for more details.