Search code examples
openmprace-condition

openmp: race condition in for loop and critical section


I'm getting unpredictable errors in my program when running with a lot of threads and am trying to find where the problem sits by inserting critical sections. Now I've got to the following

void A::func()
{
 #pragma omp parallel for
 for (int i=0; i<100; ++i)
  {
    #pragma omp critical
    results[i] = compute(i);
  }
}

This still gives me the problem that I'm trying to find. The only way I can get it to disappear is by removing the parallelisation of the for loop (not very desirable!).

Note that the compute function is complicated and actually uses some caches. I wouldn't be surprised if there is some problem in that code, but as the whole thing is now in a critical section, I can't understand what's going on.

The actual for loop that I'm trying to debug is at https://github.com/UCL/STIR/blob/49c65340a2116834fea6a9a5cb17f73eb4a6f259/src/scatter_buildblock/ScatterSimulation.cxx#L179-L198. The test version that still has the problem (close to the example above) is here

Note: I had first posted this in a more complicated scenario but I've deleted that post as 1) my simple example there doesn't exhibit the problem and 2) this scenario is even more puzzling.

Some more info added. I'm trying this on Windows 10 with a Ryzen 9 5900. My test code shows occasional problems with gcc-8 and gcc-9 on WSL2/Ubuntu 20.04. (I observe no problems with VS 2019 on windows nor clang-13 on WSL2/Ubuntu 20.04, but that doesn't mean a lot I guess).


Solution

  • This is the code snippet in question from github.

    #ifdef STIR_OPENMP
    #pragma omp parallel for reduction(+:total_scatter) schedule(dynamic)
    #endif
    
        for (int i = 0; i < static_cast<int>(all_bins.size()); ++i)
        {
            const Bin bin = all_bins[i];
            const double scatter_ratio = scatter_estimate(bin);
    
    #if defined STIR_OPENMP 
    #  if _OPENMP >= 201107
    #    pragma omp atomic write
    #  else
    #    pragma omp critical(ScatterSimulationByBin_process_data_for_view_segment_num)
    #  endif
    #endif
            viewgram[bin.axial_pos_num()][bin.tangential_pos_num()] =
                    static_cast<float>(scatter_ratio);
            total_scatter += static_cast<double>(scatter_ratio);
        } // end loop over bins
    

    These are the possible causes of your problem:

    1. scatter_estimate is not threadsafe. Using #pragma omp critical before the function would have solved it if this is the case.
    2. bin.axial_pos_num() or bin.tangential_pos_num() is not threadsafe. Please check these as well.
    3. The result depends on the order of execution of your loop. If different i gives the same indices (i.e. bin.axial_pos_num() and bin.tangential_pos_num()) then different execution order may give different result, because you overwrite the same element of viewgram array in different order. Note that, if this is the case it may be a flaw in your algorithm.