
How do OpenMP macros work behind the scenes, in collaboration with the preprocessor/compiler and the library itself?


I'm trying to implement similar functionality in one of my projects and I was wondering how it works.

For example, I was wondering how #pragma omp parallel default(shared) private(iam, np) works in the following example from the compiler's/preprocessor's perspective. I mention the compiler because I have read that #pragma directives exist to give side information to the compiler. But if I take into account that all macros are handled by the preprocessor, it gets really confusing to me.

How is the macro expanded, and how does the OpenMP library get access to the information in those macros? Is there a specific compiler extension that OpenMP uses to fetch that information for every compiler it supports, or is it just simple macro invocation?

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int iam = 0, np = 1;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);

    #pragma omp parallel default(shared) private(iam, np)
    {
        np = omp_get_num_threads();
        iam = omp_get_thread_num();
        printf("Hybrid: Hello from thread %d out of %d from process %d out of %d on %s\n",
                iam, np, rank, numprocs, processor_name);
    }

    MPI_Finalize();

    return 0;
}

I got this example from here.


Solution

  • For example, I was wondering how #pragma omp parallel default(shared) private(iam, np) works in the following example from the compiler's/preprocessor's perspective?

    This is strongly dependent on the compiler implementation. In practice, for Clang and GCC (and probably ICC), the pragma annotation gives information to compiler passes, enabling them to transform the code in a front-end pass. Put simply, the front-end of a compiler does preprocessing, tokenization, syntactic analysis and semantic analysis, while the back-end does optimizations and code generation.

    For most steps, mainstream compilers let you inspect the intermediate output. For example, Clang and GCC have the -E flag for the preprocessor and -S for code generation. The low-level intermediate representation (IR) is more dependent on the compiler implementation, so the flags are not the same (nor are the optimizations and the intermediate language). GCC uses the GENERIC/GIMPLE languages for the high-level IR, while Clang uses the LLVM IR language. AFAIK, the GIMPLE code can be dumped using the -fdump-* flags. For Clang, -emit-llvm can be used to dump the IR code.

    In Clang, the transformation is done after the AST generation but before the first IR generation. Note that some other compilers do an AST transformation, while others do it in later steps. When OpenMP is enabled (with -fopenmp), Clang replaces the pragma region with a call to __kmpc_fork_call and generates a function for the region, which is passed to the KMP function. KMP is the prefix of the IOMP runtime shared by both Clang and ICC. GCC has its own runtime called GOMP. There are many other runtimes, but the mainstream ones are GOMP and IOMP. Also note that GCC uses a similar strategy, calling GOMP_parallel with a compiler-generated function provided at runtime. The IOMP/GOMP runtimes take care of initializing the region and the ICVs before calling the compiler-generated function.

    Note that the preprocessor is not aware of the use of OpenMP (at least not in any OpenMP implementation I am aware of).
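    To make this outlining transformation concrete, here is a hand-written sketch of what a compiler might generate for the parallel region in the question. This is not actual compiler output: mock_fork_call is an invented stand-in for __kmpc_fork_call/GOMP_parallel (which the real runtimes implement with a thread team, not a serial loop), and the struct-based argument passing is just one plausible lowering.

    ```c
    #include <stdio.h>

    /* Shared state captured from the enclosing scope; real runtimes pass
     * something similar (or individual pointers) to the outlined function. */
    struct region_args {
        int rank;   /* shared: read by every "thread" */
    };

    /* The compiler-generated "outlined" function for the pragma's body.
     * Variables listed in private(...) become plain locals here. */
    static void outlined_region(void *arg, int thread_id, int num_threads) {
        struct region_args *shared = arg;
        int iam = thread_id;    /* private(iam) */
        int np  = num_threads;  /* private(np)  */
        printf("Hello from thread %d out of %d (process rank %d)\n",
               iam, np, shared->rank);
    }

    /* Invented stand-in for __kmpc_fork_call / GOMP_parallel: a real runtime
     * would spawn and synchronize a thread team; here the body simply runs
     * once per "thread", serially, to show the control flow. */
    static void mock_fork_call(void (*fn)(void *, int, int),
                               void *arg, int num_threads) {
        for (int i = 0; i < num_threads; i++)
            fn(arg, i, num_threads);
    }

    int main(void) {
        struct region_args args = { .rank = 0 };
        /* What used to be "#pragma omp parallel ..." becomes a runtime call: */
        mock_fork_call(outlined_region, &args, 4);
        return 0;
    }
    ```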

    How is the macro expanded, and how does the OpenMP library get access to the information in those macros?

    Note that pragma annotations are not macros; they are more powerful than that: they provide information to the compiler, which can perform non-trivial changes during any compilation step. For example, a pragma can change the way code generation is performed, which is impossible with preprocessor macros (e.g. #pragma GCC unroll n for loop unrolling in GCC, or #pragma ivdep for telling ICC that there are no loop-carried dependencies, enabling auto-vectorization).
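    One way to see the difference: the preprocessor expands macros away entirely, while a pragma directive survives preprocessing and is consumed by the compiler proper. The standard _Pragma operator even lets a macro emit a pragma, as in this small sketch (the reduction clause is there so the code is also correct when compiled with -fopenmp; without it the pragma is simply ignored):

    ```c
    #include <stdio.h>

    /* A macro is pure text substitution, fully handled by the preprocessor: */
    #define SQUARE(x) ((x) * (x))

    /* A macro can *produce* a pragma via the standard _Pragma operator, but
     * the resulting directive is handled by the compiler, not expanded away: */
    #define PAR_FOR _Pragma("omp parallel for reduction(+:sum)")

    int main(void) {
        int sum = 0;
        PAR_FOR                    /* becomes: #pragma omp parallel for reduction(+:sum) */
        for (int i = 1; i <= 10; i++)
            sum += SQUARE(i);      /* the preprocessor rewrites this to ((i) * (i)) */
        printf("%d\n", sum);       /* sum of squares 1..10 */
        return 0;
    }
    ```

    Running gcc -E on this file shows SQUARE gone but the omp directive still present, which is exactly why the preprocessor alone cannot implement OpenMP.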

    The information is passed to the main runtime fork function (i.e. __kmpc_fork_call or GOMP_parallel) as arguments, along with the compiler-generated user function.

    Is there a specific compiler extension that OpenMP uses to fetch that information for every compiler it supports, or is it just simple macro invocation?

    It is not just a simple macro invocation, and AFAIK there is no external module for GCC and Clang: the support is directly integrated into the compiler (though it may be modular, especially in Clang). This is important because compilers need to analyse the pragma annotations at compile time. The pragmas are not just a way to automatically generate runtime calls and abstract them behind a standard language/interface; they also impact the compiler passes. For example, #pragma omp simd impacts the auto-vectorization optimization steps of compilers (back-end passes).

    AFAIK, there are some (research) OpenMP implementations based on source-to-source compilation, so as to be compiler-independent, but I am not sure they support all OpenMP features (especially the SIMD ones).