
TPL Data Parallelism Issue


I need to process a set of data in parallel and, at the end, know how many items in total were processed successfully. I came up with the following dummy code, based on the samples at http://msdn.microsoft.com/en-us/library/dd460703.aspx and https://web.archive.org/web/20230128065739/http://reedcopsey.com/2010/01/22/parallelism-in-net-part-4-imperative-data-parallelism-aggregation/

    public void DoWork2()
    {
        int sum = 0;
        Parallel.For<int>(0, 10,
            () => 0,
            (i, loopState, localState) =>
            {
                DummyEntity entity = DoWork3(i);
                if (entity != null)
                {
                    Console.WriteLine("Processed {0}, sum need to be increased by 1.", i);
                    return 1;
                }
                else
                {
                    Console.WriteLine("Processed {0}, sum need to be increased by 0.", i);
                    return 0;
                }
            },
            localState =>
            {
                lock (syncRoot)
                {
                    Console.WriteLine("Increase sum {0} by {1}", sum, localState);
                    sum += localState;
                }
            }
            );
        Console.WriteLine("Total items {0}", sum);
    }

    private DummyEntity DoWork3(int i)
    {
        if (i % 2 == 0)
        {
            return new DummyEntity();
        }
        else
        {
            return null;
        }
    }

However, the result changes every time I run it. I think there is something wrong with the code, but I cannot figure out what.


Solution

  • Your problem is in how you use the overload you chose. It gives each task its own local state so that access to shared state is kept to a minimum, yet your loop body never uses that local state.

    Note that the example you linked to uses the subtotal (what you've called localState) in the body of the loop:

    subtotal += nums[j];
    return subtotal;
    

    Compare this to your code (made a bit more concise):

    if (entity != null)
    {
        return 1;
    }
    else
    {
        return 0;
    }
    

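    The body never reads localState, so each iteration overwrites the task's running count with 1 or 0, and only the result of the last iteration a task handles reaches the final delegate. The earlier counts are thrown away, which is also why the total changes from run to run: it depends on how the iterations happen to be split across tasks. Change the body to read: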

    if (entity != null)
    {
        return localState + 1;
    }
    else
    {
        return localState;
    }
    

    With that change, the last line of output for this problem is:

    Total items 5
    

    Local state is used this way to reduce access to shared state: each task accumulates its own count and takes the lock only once, when its subtotal is added to sum.
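
Putting the change together, a minimal self-contained sketch of the corrected loop might look like the following. The question does not show `DummyEntity` or `syncRoot`, so the versions here are assumptions, and the `ParallelCounter` class and `CountProcessed` wrapper are hypothetical names added only to make the snippet runnable.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for the entity type, which the question does not show.
public class DummyEntity { }

public static class ParallelCounter
{
    // Assumed lock object; the question uses syncRoot without declaring it.
    private static readonly object syncRoot = new object();

    // Same stub as in the question: even indices succeed, odd ones fail.
    private static DummyEntity DoWork3(int i)
    {
        return i % 2 == 0 ? new DummyEntity() : null;
    }

    // Counts the successfully processed items in [fromInclusive, toExclusive).
    public static int CountProcessed(int fromInclusive, int toExclusive)
    {
        int sum = 0;
        Parallel.For<int>(fromInclusive, toExclusive,
            () => 0, // each task starts its own count at 0
            (i, loopState, localState) =>
            {
                DummyEntity entity = DoWork3(i);
                // Carry the task's running count forward instead of overwriting it.
                return entity != null ? localState + 1 : localState;
            },
            localState =>
            {
                // Runs once per task, so the lock is taken rarely.
                lock (syncRoot)
                {
                    sum += localState;
                }
            });
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine("Total items {0}", CountProcessed(0, 10));
    }
}
```

Returning the count from a method, rather than only printing it, makes the result easy to check for different ranges.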

    Here is an excerpt of the output when using 0..50 as the range:

    Processed 22, sum need to be increased by 1.
    Processed 23, sum need to be increased by 0.
    Increase sum 0 by 1
    Processed 8, sum need to be increased by 1.
    Processed 9, sum need to be increased by 0.
    Processed 10, sum need to be increased by 1.
    Processed 11, sum need to be increased by 0.
    Increase sum 1 by 2
    Increase sum 3 by 8
    Increase sum 11 by 10
    Processed 16, sum need to be increased by 1.
    Processed 17, sum need to be increased by 0.
    Processed 18, sum need to be increased by 1.
    Increase sum 21 by 4
    Total items 25
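
In the excerpt above, the line "Increase sum 1 by 2" appears to come from a task that handled items 8 through 11. To illustrate why the original body lost counts, the sketch below simulates sequentially how a single task threads its local state from one iteration to the next. It is not how Parallel.For is implemented; `LocalStateSimulation`, `RunChunk`, and the chosen chunk of indices are hypothetical and exist only for this illustration.

```csharp
using System;

public static class LocalStateSimulation
{
    // Sequentially mimics what one task does with its local state:
    // start from the initial value, then feed each body result into the next call.
    public static int RunChunk(int[] indices, Func<int, int, int> body)
    {
        int localState = 0; // what the initializer (() => 0) would return
        foreach (int i in indices)
        {
            localState = body(i, localState);
        }
        return localState; // the value the final delegate would receive
    }

    public static void Main()
    {
        int[] chunk = { 8, 9, 10, 11 }; // hypothetical indices handled by one task

        // Body from the question: ignores the local state, so only the last item counts.
        int original = RunChunk(chunk, (i, local) => i % 2 == 0 ? 1 : 0);

        // Corrected body: adds to the running count.
        int corrected = RunChunk(chunk, (i, local) => i % 2 == 0 ? local + 1 : local);

        Console.WriteLine("original: {0}, corrected: {1}", original, corrected);
        // prints "original: 0, corrected: 2"
    }
}
```

With the original body, the contribution of a chunk depends only on its last index, so the total depends on where the chunk boundaries fall in a given run.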