
Why is using reduce on a parallel stream producing a larger value than on a sequential stream?


Can someone tell me why this is happening, and whether it's expected behaviour or a bug?

List<Integer> a = Arrays.asList(1,1,3,3);

a.parallelStream().filter(Objects::nonNull)
        .filter(value -> value > 2)
        .reduce(1, Integer::sum);

Answer: 10

But if I use stream() instead of parallelStream(), I get the expected answer, 7.


Solution

  • The first argument to reduce is called "identity" and not "initialValue".

    1 is not an identity for addition: sum(1, x) is x + 1, not x. 1 is the identity for multiplication.

    So you need to provide 0 if you want to sum the elements.
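As a minimal sketch of that fix (the class and method names here are made up for illustration), the same pipeline with 0 as the identity gives the same result on a sequential and on a parallel stream. If the extra 1 is really wanted, it can be added once, outside the reduction:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original question.
final class ReduceIdentityDemo {

    // Sums the elements greater than 2, using 0, the identity for addition.
    static int sumAboveTwo(List<Integer> values, boolean parallel) {
        Stream<Integer> stream = parallel ? values.parallelStream() : values.stream();
        return stream.filter(Objects::nonNull)
                .filter(value -> value > 2)
                .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 1, 3, 3);
        System.out.println(sumAboveTwo(a, false));     // 6
        System.out.println(sumAboveTwo(a, true));      // 6
        // The offset is applied exactly once, after the reduction.
        System.out.println(1 + sumAboveTwo(a, true));  // 7
    }
}
```

Because 0 changes nothing when it is added, it does not matter how many parts the stream is split into.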


    Java uses "identity" instead of "initialValue" because this little trick allows reduce to be parallelized easily.


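    In parallel execution, each thread runs the reduce on a part of the stream, starting from the identity each time, and when the threads are done, their partial results are combined using the very same reduce function. If the identity is not neutral for that function, it gets counted once per part.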

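    So it will look something like this (how the source is split is up to the runtime; here each element ends up in its own part):

    mainThread:
      split the source into parts and hand them to worker threads;
      wait till all of them are finished;
    
    part1 [1]:
      return 1; // the element is filtered out, only the "identity" is left
    
    part2 [1]:
      return 1;
    
    part3 [3]:
      return sum(1, 3); // your reduce function applied to a part of the stream
    
    part4 [3]:
      return sum(1, 3);
    
    // when all parts are finished:
    mainThread:
      return sum(sum(resultOfPart1, resultOfPart2), sum(resultOfPart3, resultOfPart4));
      = sum(sum(1, 1), sum(4, 4))
      = sum(2, 8)
      = 10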
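The effect can be reproduced deterministically with a hand-rolled simulation. This is only a sketch: the class name, the chunkedSum helper and its chunkSize parameter are invented for illustration, and the fixed-size chunking stands in for whatever split the runtime actually chooses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of a chunked reduction; not how the JDK splits streams.
final class ChunkedReduceSketch {

    // Reduces each chunk separately, seeding every chunk with the same
    // "identity", then combines the partial results with the same function.
    static int chunkedSum(List<Integer> values, int identity, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        List<Integer> partials = new ArrayList<>();
        for (int start = 0; start < values.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, values.size());
            int partial = values.subList(start, end).stream()
                    .filter(value -> value > 2)
                    .reduce(identity, Integer::sum);
            partials.add(partial);
        }
        // Combine step: no identity is added here, only the partial results.
        return partials.stream().reduce(Integer::sum).orElse(identity);
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 1, 3, 3);
        System.out.println(chunkedSum(a, 1, 4)); // 7  (one chunk, like stream())
        System.out.println(chunkedSum(a, 1, 2)); // 8  (two chunks)
        System.out.println(chunkedSum(a, 1, 1)); // 10 (four chunks)
        System.out.println(chunkedSum(a, 0, 1)); // 6  (a real identity is harmless)
    }
}
```

With 1 as the seed, the result is the true sum plus one extra 1 per chunk, so it changes with the split. With 0 it is 6 however the list is chunked.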
    

    I hope you can see what happens and why the result is not what you expected.