Hello fellow Stack Overflow users,
I am currently working on a larger project in the area of image processing. I am developing with Visual Studio 2013 (not negotiable). Without bothering you with further details, here is my problem:
I have two actions that have to run in parallel:
The iterative solution of a system of linear equations (using 1-2 threads)
A fairly complex process involving image-to-image registrations (using all remaining threads)
In order to know which images need to be registered, an approximate solution of the system of linear equations is required, so the two tasks need to run simultaneously. (Thanks to Z boson for pointing out the absence of this information.) The iterative solution runs constantly and is informed after every successful image registration.
The code is going to run on a 24-cored system.
At the moment the image registration is implemented with OpenMP using a "#pragma omp parallel for". The iterative solution is started on a std::thread and also uses an OpenMP "#pragma omp parallel for" internally.
Now I know that, according to the OpenMP documentation, an OpenMP thread that encounters a nested parallel region will use its thread team to execute it. But I would think that this does not apply in my case, since it is a std::thread that starts the second OpenMP parallel region.
For better understanding, here is some example code:
#include <omp.h>
#include <thread>

void IterativeSolution();
int GetImageFromApproximateSolution();
void RegisterImages(int a, int b);

int main()
{
    std::thread solverThread(&IterativeSolution);
    #pragma omp parallel for
    for(int a = 0; a < 100; a++)
    {
        int b = GetImageFromApproximateSolution();
        RegisterImages(a, b);
        // Inform IterativeSolution about result of registration
    }
    solverThread.join();
}

void IterativeSolution()
{
    #pragma omp parallel for
    for(int i = 0; i < 2; i++)
    {
        //SolveColumn(i);
    }
}

void RegisterImages(int a, int b)
{
    // Do Registration
}
My question at this point is: Will the above code create too many threads? If so, would the following code solve the problem?
#include <omp.h>
#include <algorithm>
#include <thread>

void IterativeSolution();
int GetImageFromApproximateSolution();
void RegisterImages(int a, int b);

int main()
{
    // The max is to avoid having less than 1 thread
    int numThreads = std::max(omp_get_max_threads() - 2, 1);
    std::thread solverThread(&IterativeSolution);
    #pragma omp parallel for num_threads(numThreads)
    for(int a = 0; a < 100; a++)
    {
        int b = GetImageFromApproximateSolution();
        RegisterImages(a, b);
        // Inform IterativeSolution about result of registration
    }
    solverThread.join();
}

void IterativeSolution()
{
    #pragma omp parallel for num_threads(2)
    for(int i = 0; i < 2; i++)
    {
        //SolveColumn(i);
    }
}

void RegisterImages(int a, int b)
{
    // Do Registration
}
This produces behavior that is undefined in terms of the OpenMP standard, which does not cover interoperation with other threading models. Most implementations I have tested will create 24 threads for each of the two parallel regions in your first example, for a total of 48. The second example should not create too many threads, but since it relies on undefined behavior it may do anything from crashing to turning your computer into a jelly-like substance without warning.
Since you're already using OpenMP, I would recommend making the code standards-compliant by simply removing the std::thread and using nested OpenMP parallel regions instead. You can do so like this:
#include <omp.h>
#include <algorithm>

void IterativeSolution();
int GetImageFromApproximateSolution();
void RegisterImages(int a, int b);

int main()
{
    // The max is to avoid having less than 1 thread
    int numThreads = std::max(omp_get_max_threads() - 2, 1);
    #pragma omp parallel num_threads(2)
    {
        if(omp_get_thread_num() > 0){
            IterativeSolution();
        }else{
            #pragma omp parallel for num_threads(numThreads)
            for(int a = 0; a < 100; a++)
            {
                int b = GetImageFromApproximateSolution();
                RegisterImages(a, b);
                // Inform IterativeSolution about result of registration
            }
        }
    }
}

void IterativeSolution()
{
    #pragma omp parallel for num_threads(2)
    for(int i = 0; i < 2; i++)
    {
        //SolveColumn(i);
    }
}

void RegisterImages(int a, int b)
{
    // Do Registration
}
Chances are that you will need to set the environment variables OMP_NESTED=true and OMP_MAX_ACTIVE_LEVELS=2 (or more) to enable nested regions. This version has the advantage of being completely defined in OpenMP, and should work portably on any environment that supports nested parallel regions. If your implementation does not support nested OpenMP parallel regions, then your suggested solution may be the best remaining option.