Search code examples
c#concurrencythread-safetyparallel.foreach

Parallel.Foreach and for each producing different results: Why is my code unsafe?


I have a text file which I read to a string content. To identify the textbody which I want to process further I am getting the indexes of keywords in the string and then set the "starting" index to the smallest index found.

I tried this with Parallel.ForEach ...

ConcurrentBag<int> indexes = new();
int index;

switch (Case)
{
    case 1:
        Parallel.ForEach(KeywordTypes.GetImplementedNamedObjects(), inos =>
        {
            index = content.IndexOf($"/begin {inos}");
            index = index == -1 ? content.Length : index;
            indexes.Add(index);
        });
        index = indexes.Min();
        return index;

... and with foreach:

foreach (string inos in KeywordTypes.GetImplementedNamedObjects())
{
    index = content.IndexOf($"/begin {inos}");
    index = index == -1 ? content.Length : index;
    indexes.Add(index);
}

index = indexes.Min();
return index;

where foreach produces the expected result but Parallel.ForEach does not.

Why is my code not thread-safe?


Solution

  • There is only one index variable here, because it is "captured". This means that multiple threads can be squabbling over it, instead of having their own version each.

    Consider:

    • thread A computes index = content.IndexOf($"/begin {inos}");
    • thread B computes index = content.IndexOf($"/begin {inos}"); - oops, thread A's version just got overwritten
    • thread A computes index = index == -1 ? content.Length : index; using the index that B just updated
    • etc

    the point is: a value got lost due to thread contention.

    Simply move the declaration of index to fix this:

    Parallel.ForEach(KeywordTypes.GetImplementedNamedObjects(), inos =>
    {
        var index = content.IndexOf($"/begin {inos}");
        ...
    

    Fundamentally, the scope of a variable is defined by where it is declared. If a variable is declared outside of a local method / lambda, the compiler respects what you've asked, and that variable is shared between all uses of that local method / lambda; if it is declared inside the local method / lambda, then the lifetime is local to that invocation, and no state is shared between the callers.

    If you want to be absolutely sure you're not accidentally leaking state, the static modifier on a lambda achieves this, although it also prevents indexes being accessed, so ... probably not what you need here.