Tags: c#, .net, multithreading, parallel-processing, plinq

ParallelEnumerable.Aggregate for several methods


I'm starting to learn multithreading. I have three methods that calculate the sum, the average, and the product of square roots of an array.

At first, I made three separate blocking calls using PLINQ. Then I thought it would be nice to do it in a single call and return an object holding the sum, product, and average at the same time. I read that ParallelEnumerable.Aggregate can help with this, but I don't know how to use it.

I would be really grateful for an explanation of how to use this function in my case, and of the good and bad aspects of this approach.

public static double Average(double[] array, string tool)
{
    if (array == null) throw new ArgumentNullException(nameof(array));
    double sum = Enumerable.Sum(array);
    double result = sum / array.Length;
    Print(tool, result);
    return result;
}

public static double Sum(double[] array, string tool)
{
    if (array == null) throw new ArgumentNullException(nameof(array));
    double sum = Enumerable.Sum(array);
    Print(tool, sum);
    return sum;
}

public static void ProductOfSquareRoots(double[] array, string tool)
{
    if (array == null) throw new ArgumentNullException(nameof(array));
    double result = 1;
    foreach (var number in array)
    {
        result = result * Math.Sqrt(number);
    }
    Print(tool, result);
}

Solution

  • The three aggregated values you want (average, sum, and product of square roots) can each be computed in a single pass over the numbers. Instead of making three passes, one per value, you can make one pass and update all three values inside the loop, which should save time.

    The average is the sum divided by the count, and since you are already computing the sum you only need the count as well. If you know the size of the input you don't even have to count the items, but here I assume the size is not known in advance.

    If you want to use LINQ you can use Aggregate:

    var aggregate = numbers.Aggregate(
        // Starting value for the accumulator.
        (Count: 0, Sum: 0D, ProductOfSquareRoots: 1D),
        // Update the accumulator with a specific number.
        (accumulator, number) =>
        {
            accumulator.Count += 1;
            accumulator.Sum += number;
            accumulator.ProductOfSquareRoots *= Math.Sqrt(number);
            return accumulator;
        });
    

    The variable aggregate is a ValueTuple<int, double, double> with the items Count, Sum and ProductOfSquareRoots. Before C# 7 you would use an anonymous type. However, anonymous types are immutable reference types, so every step would have to allocate a new instance for each value in the input sequence, slowing down the aggregation. A mutable value tuple avoids those allocations, so the aggregation should be faster.
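
    To show how the average falls out of Count and Sum, here is a minimal self-contained sketch that wraps the expression above. The class name SinglePassStats and the method Compute are made up for illustration and are not part of your code.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical helper class (name chosen for this example only).
    public static class SinglePassStats
    {
        // Computes sum, average, and product of square roots in one pass.
        public static (double Sum, double Average, double ProductOfSquareRoots) Compute(
            IEnumerable<double> numbers)
        {
            if (numbers == null) throw new ArgumentNullException(nameof(numbers));

            var aggregate = numbers.Aggregate(
                // Starting value for the accumulator.
                (Count: 0, Sum: 0D, ProductOfSquareRoots: 1D),
                // Update the accumulator with a specific number.
                (accumulator, number) =>
                {
                    accumulator.Count += 1;
                    accumulator.Sum += number;
                    accumulator.ProductOfSquareRoots *= Math.Sqrt(number);
                    return accumulator;
                });

            // For an empty sequence Count is 0, so Average becomes NaN (0.0 / 0).
            return (
                Sum: aggregate.Sum,
                Average: aggregate.Sum / aggregate.Count,
                ProductOfSquareRoots: aggregate.ProductOfSquareRoots);
        }
    }
    ```

    For example, SinglePassStats.Compute(new[] { 1.0, 4.0, 9.0 }) yields a sum of 14, an average of 14 / 3, and a product of square roots of 6.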

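    Aggregate is also available in PLINQ: if numbers is of type ParallelQuery<T> rather than IEnumerable<T>, the call binds to ParallelEnumerable.Aggregate. Note, however, that the seed-and-function overload used above has no function for merging accumulators produced by different partitions, so the accumulation itself cannot be split across threads. For that, use the overload that also takes a function combining two accumulators and a result selector. Parallel aggregation requires the operation to be both associative (e.g. (a + b) + c = a + (b + c)) and commutative (e.g. a + b = b + a), which is true in your case (up to floating-point rounding).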
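
    For the accumulation itself to run on several partitions, PLINQ needs a way to merge the per-partition accumulators. Here is a sketch, assuming the ParallelEnumerable.Aggregate overload that takes a seed, an update function, a combine function, and a result selector. The class name ParallelStats and the method Compute are made up for illustration.

    ```csharp
    using System;
    using System.Linq;

    // Hypothetical helper class (name chosen for this example only).
    public static class ParallelStats
    {
        public static (double Sum, double Average, double ProductOfSquareRoots) Compute(
            double[] numbers)
        {
            if (numbers == null) throw new ArgumentNullException(nameof(numbers));

            return numbers.AsParallel().Aggregate(
                // Seed: the identity for each operation. It is a value type,
                // so every partition starts from its own copy.
                (Count: 0, Sum: 0D, ProductOfSquareRoots: 1D),
                // Update a partition's accumulator with one number.
                (accumulator, number) =>
                {
                    accumulator.Count += 1;
                    accumulator.Sum += number;
                    accumulator.ProductOfSquareRoots *= Math.Sqrt(number);
                    return accumulator;
                },
                // Combine the accumulators of two partitions.
                (left, right) => (
                    Count: left.Count + right.Count,
                    Sum: left.Sum + right.Sum,
                    ProductOfSquareRoots: left.ProductOfSquareRoots * right.ProductOfSquareRoots),
                // Turn the final accumulator into the result.
                total => (
                    Sum: total.Sum,
                    Average: total.Sum / total.Count,
                    ProductOfSquareRoots: total.ProductOfSquareRoots));
        }
    }
    ```

    Because partitions are combined in no particular order, floating-point results can differ in the last digits from a sequential run.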

    PLINQ has overhead, so depending on the number of elements in your sequence and the complexity of the calculation it might not outperform single-threaded LINQ. You will have to measure this yourself to determine whether PLINQ speeds things up. The function that updates the accumulator is the same in LINQ and PLINQ, so switching from single-threaded to parallel mostly comes down to inserting AsParallel() in the right place and, for the parallel overload, supplying a function that combines accumulators.
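
    One rough way to measure is with Stopwatch. The sketch below is only an assumption about how you might set this up: AggregateTiming, Measure, and Compare are made-up names, and the sum of square roots stands in for your real aggregation. For serious numbers, a benchmarking library would be more reliable than a single timed run.

    ```csharp
    using System;
    using System.Diagnostics;
    using System.Linq;

    // Hypothetical helper class (name chosen for this example only).
    public static class AggregateTiming
    {
        // Runs the work once and returns its result with the elapsed wall-clock time.
        public static (T Result, TimeSpan Elapsed) Measure<T>(Func<T> work)
        {
            var stopwatch = Stopwatch.StartNew();
            T result = work();
            stopwatch.Stop();
            return (result, stopwatch.Elapsed);
        }

        // Prints timings for a sequential and a parallel run over the same input.
        public static void Compare(double[] numbers)
        {
            // Warm-up runs so JIT compilation is not included in the timings.
            Measure(() => numbers.Sum(x => Math.Sqrt(x)));
            Measure(() => numbers.AsParallel().Sum(x => Math.Sqrt(x)));

            var sequential = Measure(() => numbers.Sum(x => Math.Sqrt(x)));
            var parallel = Measure(() => numbers.AsParallel().Sum(x => Math.Sqrt(x)));

            Console.WriteLine($"Sequential: {sequential.Elapsed.TotalMilliseconds:F1} ms");
            Console.WriteLine($"Parallel:   {parallel.Elapsed.TotalMilliseconds:F1} ms");
        }
    }
    ```

    Run Compare with arrays of different sizes. For small inputs the parallel version will often be slower because of the partitioning and merging overhead.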