Tags: c#, .net, parallel-processing, parallel.foreach

Parallel.ForEach slower than normal foreach


I'm playing around with Parallel.ForEach in a C# console application, but I can't seem to get it right. I'm creating an array of random numbers, and I have a sequential foreach and a Parallel.ForEach that find the largest value in the array. With roughly the same code in C++ I started to see a benefit from using multiple threads at around 3M values in the array. But here the Parallel.ForEach is twice as slow even at 100M values. What am I doing wrong?

class Program
{
    static void Main(string[] args)
    {
        dostuff();

    }

    static void dostuff() {
        Console.WriteLine("How large do you want the array to be?");
        int size = int.Parse(Console.ReadLine());

        int[] arr = new int[size];
        Random rand = new Random();
        for (int i = 0; i < size; i++)
        {
            arr[i] = rand.Next(0, int.MaxValue);
        }

        var watchSeq = System.Diagnostics.Stopwatch.StartNew();
        var largestSeq = FindLargestSequentially(arr);
        watchSeq.Stop();
        var elapsedSeq = watchSeq.ElapsedMilliseconds;
        Console.WriteLine("Finished sequential in: " + elapsedSeq + "ms. Largest = " + largestSeq);

        var watchPar = System.Diagnostics.Stopwatch.StartNew();
        var largestPar = FindLargestParallel(arr);
        watchPar.Stop();
        var elapsedPar = watchPar.ElapsedMilliseconds;
        Console.WriteLine("Finished parallel in: " + elapsedPar + "ms Largest = " + largestPar);

        dostuff();
    }

    static int FindLargestSequentially(int[] arr) {
        int largest = arr[0];
        foreach (int i in arr) {
            if (largest < i) {
                largest = i;
            }
        }
        return largest;
    }

    static int FindLargestParallel(int[] arr) {
        int largest = arr[0];
        Parallel.ForEach<int, int>(arr, () => 0, (i, loop, subtotal) =>
        {
            if (i > subtotal)
                subtotal = i;
            return subtotal;
        },
        (finalResult) => {
            Console.WriteLine("Thread finished with result: " + finalResult);
            if (largest < finalResult) largest = finalResult;
        }
        );
        return largest;
    }
}

Solution

  • The slowdown comes from the performance ramifications of a very small delegate body: the per-element delegate invocation overhead dwarfs the tiny amount of work done for each element.

    We can achieve better performance by using range partitioning (Partitioner.Create, from System.Collections.Concurrent), so that each invocation of the body delegate processes a large chunk of the data instead of a single element:

    static int FindLargestParallelRange(int[] arr)
    {
        object locker = new object();
        int largest = arr[0];
        Parallel.ForEach(Partitioner.Create(0, arr.Length), () => arr[0], (range, loop, subtotal) =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                if (arr[i] > subtotal)
                    subtotal = arr[i];
            return subtotal;
        },
        (finalResult) =>
        {
            lock (locker)
                if (largest < finalResult)
                    largest = finalResult;
        });
        return largest;
    }
    

    Pay attention to synchronizing the localFinally delegate (hence the lock above). Also note the need for proper initialization of localInit: () => arr[0] instead of () => 0, otherwise the result would be wrong for an array containing only negative values.
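
    For example (a minimal sketch, assuming the FindLargestParallelRange method above is in scope; the "negatives" array is purely illustrative), the initialization matters as soon as the input contains only negative values:

    int[] negatives = { -5, -2, -9 };
    // With localInit () => arr[0] and a locked localFinally, this prints the true maximum, -2.
    Console.WriteLine(FindLargestParallelRange(negatives));
    // Had localInit been () => 0, every thread-local subtotal would have stayed at 0,
    // and the method would have incorrectly returned 0.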

    Partitioning with PLINQ:

    static int FindLargestPlinqRange(int[] arr)
    {
        return Partitioner.Create(0, arr.Length)
            .AsParallel()
            .Select(range =>
            {
                int largest = arr[0];
                for (int i = range.Item1; i < range.Item2; i++)
                    if (arr[i] > largest)
                        largest = arr[i];
                return largest;
            })
            .Max();
    }
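
    For comparison, here is a minimal sketch of the simplest PLINQ approach, which lets PLINQ choose the partitioning itself (FindLargestPlinqSimple is just an illustrative name; it may not match the hand-partitioned version's performance, but it is a useful baseline):

    static int FindLargestPlinqSimple(int[] arr)
    {
        // PLINQ handles chunking and aggregation internally (requires using System.Linq).
        return arr.AsParallel().Max();
    }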
    

    I highly recommend the free book Patterns of Parallel Programming by Stephen Toub.