OpenMP omp_get_num_threads() vs. omp_get_max_threads()


I do not understand the difference between omp_get_num_threads() and omp_get_max_threads(). I copied the following demo code.

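    #include <iostream>
    #include <omp.h>

    int main()
    {
        omp_set_nested(1);               // enable nested parallelism (deprecated since OpenMP 5.0)
        omp_set_max_active_levels(10);
        omp_set_dynamic(0);              // do not let the runtime adjust team sizes
        omp_set_num_threads(2);          // request 2 threads for the outer team
        #pragma omp parallel
        {
            omp_set_num_threads(3);      // request 3 threads for each inner team

            #pragma omp parallel
            {
                omp_set_num_threads(4);  // only affects regions nested deeper than this one
                #pragma omp single
                {
                    std::cout << omp_get_max_active_levels() << " " << omp_get_num_threads() << " "
                    << omp_get_max_threads() << std::endl;
                }
            }

            #pragma omp barrier
            #pragma omp single
            {
                std::cout << omp_get_max_active_levels() << " " << omp_get_num_threads() << " "
                    << omp_get_max_threads() << std::endl;
            }
        }
        return 0;
    }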

And then I got the following output.

    10 3 4
    10 3 4
    10 3 4
    10 3 3

I have checked the official documentation, but I am still confused.


Solution

  • From the documentation:

    omp_get_num_threads

    The omp_get_num_threads routine returns the number of threads in the team executing the parallel region to which the routine region binds. If called from the sequential part of a program, this routine returns 1.

    omp_get_max_threads

    The value returned by omp_get_max_threads is the value of the first element of the nthreads-var ICV of the current task. This value is also an upper bound on the number of threads that could be used to form a new team if a parallel region without a num_threads clause were encountered after execution returns from this routine.

    Your output may be incorrect; I could not reproduce it with either clang + libomp or gcc + libgomp. The figure below illustrates the flow of threads.

    [Figure: flow of threads through the nested parallel regions]

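    omp_get_max_threads always returns the number of threads that a new parallel construct could create if no thread count is specified on it. After you call omp_set_num_threads(4) in the inner parallel region, a new team started from there could have up to 4 threads, but the inner team itself has 3, because its size was set by the enclosing region. Likewise, in the outer parallel region the maximum is 3 while 2 threads are in use. On a runtime that honors these settings, the expected output is therefore two lines of "10 3 4" (one per inner team) followed by a single line of "10 2 3".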
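The nested case described above can be checked with a small probe. This is only a sketch: the struct `NestedReport`, the function `probe_nested`, and the fallback stubs are hypothetical names introduced here, and it assumes a runtime where raising max-active-levels is enough to enable nesting (on older runtimes you may also need omp_set_nested(1), as in the question).

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Fallback stubs (an assumption of this sketch, not part of OpenMP) so the
// file still builds, and runs serially, when OpenMP is not enabled.
static int stub_nthreads = 1;
static void omp_set_num_threads(int n) { stub_nthreads = n; }
static int omp_get_max_threads() { return stub_nthreads; }
static int omp_get_num_threads() { return 1; }
static void omp_set_dynamic(int) {}
static void omp_set_max_active_levels(int) {}
#endif

// Team size and max-threads value observed at each nesting level.
struct NestedReport {
    int outer_num;  // omp_get_num_threads() in the outer region
    int outer_max;  // omp_get_max_threads() in the outer region
    int inner_num;  // omp_get_num_threads() in an inner region
    int inner_max;  // omp_get_max_threads() in an inner region
};

NestedReport probe_nested()
{
    NestedReport r{0, 0, 0, 0};
    omp_set_max_active_levels(2);   // allow two active nesting levels
    omp_set_dynamic(0);             // ask the runtime not to shrink teams
    omp_set_num_threads(2);         // size requested for the outer team
    #pragma omp parallel
    {
        omp_set_num_threads(3);     // size requested for teams created from here
        #pragma omp single
        {
            r.outer_num = omp_get_num_threads();
            r.outer_max = omp_get_max_threads();
        }
        #pragma omp parallel
        {
            omp_set_num_threads(4); // affects only deeper regions, not this team
            #pragma omp critical
            {
                // Every inner thread writes the same values; critical avoids a data race.
                r.inner_num = omp_get_num_threads();
                r.inner_max = omp_get_max_threads();
            }
        }
    }
    return r;
}
```

Built with OpenMP enabled (for example `g++ -fopenmp`), calling `probe_nested()` from `main` should typically report an outer team of 2 with a maximum of 3, and an inner team of 3 with a maximum of 4. The team sizes are requests, so a runtime with a lower thread limit may deliver fewer threads.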

    In serial code, outside any parallel region, the number of threads is 1, but the maximum is the system default (usually the number of cores) unless you have changed it via omp_set_num_threads or the OMP_NUM_THREADS environment variable.
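To see the serial case next to the parallel one, a minimal sketch follows. The names `Snapshot`, `snapshot_serial`, and `snapshot_parallel` are hypothetical, and the stubs exist only so the file still compiles without OpenMP.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Fallback stubs (an assumption of this sketch, not part of OpenMP) so the
// file still builds, and runs serially, when OpenMP is not enabled.
static int stub_nthreads = 1;
static void omp_set_num_threads(int n) { stub_nthreads = n; }
static int omp_get_max_threads() { return stub_nthreads; }
static int omp_get_num_threads() { return 1; }
#endif

// Pair of values returned by the two routines at one point in the program.
struct Snapshot {
    int num_threads;  // omp_get_num_threads(): size of the current team
    int max_threads;  // omp_get_max_threads(): upper bound for a new team
};

// Values seen in sequential code, outside any parallel region.
Snapshot snapshot_serial(int requested)
{
    omp_set_num_threads(requested);
    return { omp_get_num_threads(), omp_get_max_threads() };
}

// Values seen by one thread inside a parallel region of the requested size.
Snapshot snapshot_parallel(int requested)
{
    Snapshot s{0, 0};
    omp_set_num_threads(requested);
    #pragma omp parallel
    {
        #pragma omp single
        {
            s.num_threads = omp_get_num_threads();
            s.max_threads = omp_get_max_threads();
        }
    }
    return s;
}
```

With OpenMP enabled, `snapshot_serial(3)` should report a team of 1 and a maximum of 3, while `snapshot_parallel(3)` should report a team of up to 3 threads. If omp_set_num_threads is never called, the maximum falls back to OMP_NUM_THREADS or the implementation default.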