Search code examples
c++loopsc++11openmp

Trouble understanding some unexpected behaviour in code with OpenMP


Question regarding OpenMP parallelization. I have included a stripped down version of my function below. The problem is that, the contents of the for loop are not getting evaluated for all values of uiIndex, although not always.

  1. I use the buffer vec_succ_status to check if all values of uiIndex are getting evaluated. It turns out that it is not.
  2. My code does not crash, it just exits from the function compute_Lagr_shortest_paths_from_source, without encountering any of the exit(-1) statements in the function definition below.
  3. I am using g++ 7.4.0 version on Ubunutu 14, and every time it has failed, there is exactly one value of uiIndex that was skipped. There is no consistency to the uiIndex for which the function fails to evaluate.
  4. For the programs I have been testing, the size of vec_group is always 1, so only the first if statement inside the for loop will evaluate.
  5. In my main function, I included the line omp_set_num_threads(4). Apart from that, I did not set any other settings (such as scheduler type) for OpenMP.
  6. Also, I can assure that no 2 values of uiIndex lead to the same uiRobot value, so no 2 threads will ever have to access the same vec_cf_graphs[uiRobot] array through the lieftime of the function.

I wonder if I am making some wrong assumptions about OpenMp. I require all objects such as vec_cf_graphs, vec_succ_status to be shared across all threads. I am wondering if I need to explicitly mention them as shared, as it usually the recommended approach. Anyways, I thought the way I have implemented also suffices. However, it seems rather strange to me that certain uiIndex values can get skipped altogether. I must point out that, I repeatedly call the function shown, but only sometimes certain uiIndex values are getting skipped from evaluation. If someone can point me to potential issues with my approach, that would be great. I am happy to provide additional information. Thanks.

bool compute_Lagr_shortest_paths_from_source(std::vector<Robot_CF_Graph>& vec_cf_graphs, const std::vector<std::vector<size_t>>& vec_robot_groups)
{
    size_t uiIndex;
    std::vector<bool> vec_succ_status(vec_robot_groups.size(), false);

    #pragma omp parallel for default(shared) private(uiIndex)
    for(uiIndex = 0; uiIndex < vec_robot_groups.size(); uiIndex++)
    {
        vec_succ_status[uiIndex] = false;
        const auto& vec_group = vec_robot_groups[uiIndex];

        if(1 == vec_group.size())
        {
        size_t uiRobot = vec_group[0];
        vec_cf_graphs[uiRobot].compute_shortest_path("ABC"); 
        vec_succ_status[uiIndex] = true;  
        }
        else          
        {            
        std::cout<< "Tag: Code should not have entered this block"<<endl;
        exit(-1);                       
        }

        if(false == vec_succ_status[uiIndex])
        {
        std::cout<< "It is not possible for this to happen \n";
        exit(-1);
        }
    }   

    return true;     
}

Solution

  • You concurrently write to a vector<bool> which is not a 'normal' vector. It has an internal memory optimization. This is undefined behaviour.

    See detailed reasoning here:

    Write concurrently vector<bool>

    How vector<bool> is different from other vectors can be found here:

    https://en.cppreference.com/w/cpp/container/vector_bool

    Just using a vector<char> with 0 or 1 representing true or false is the easiest way to solve this. Other options are discussed here, if you want to have more elegant code:

    Alternative to vector<bool>