
OpenMP offloading with Intel oneAPI DPC++ compiler to NVIDIA GPU


I'm on a mission to write a program with OpenMP offloading to a GPU. At the moment I compile my code with the Intel oneAPI DPC++ compiler icpx v2022.1.0 and aim to use an NVIDIA Tesla V100 as the backend. Please find below the relevant parts of my Makefile:

MKLROOT   = /lustre/system/local/apps/intel/oneapi/2022.2.0/mkl/latest

CXX       = icpx
INC       =-I"${MKLROOT}/include"
CXXFLAGS  =-qopenmp -fopenmp-targets=spir64 ${INC} --gcc-toolchain=/lustre/system/local/apps/gcc9/9.3.0
LDFLAGS   =-qopenmp -fopenmp-targets=spir64 -fsycl -L${MKLROOT}/lib/intel64
LDLIBS    =-lmkl_sycl -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lsycl -lOpenCL -lstdc++ -lpthread -lm -ldl

${EXE}: ${OBJ}
    ${CXX} ${CXXFLAGS} $^ ${LDFLAGS} ${LDLIBS} -o $@

The code compiles without errors and warnings, but I'm not entirely sure it does use the GPU when it runs.

  1. How can I verify that? Can I use an Intel or an NVIDIA profiler to check that?
  2. Is my assumption correct, that Intel compiler supports offloading to an NVIDIA GPU?
  3. Or should I better use an NVIDIA compiler to enable OpenMP offloading to NVIDIA graphics cards?

Solution

  • How can I verify that? Can I use an Intel or an NVIDIA profiler to check that?

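    On systems with Nvidia GPUs such as the V100, you can run nvidia-smi while the program executes to check the state of the GPU (utilisation, memory use, and the processes currently using the device). You can also use profilers such as the Nsight suite (or the older, deprecated nvvp).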
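    Besides external tools, the program can ask the OpenMP runtime directly. The following is a minimal sketch, not a definitive check: it uses the standard OpenMP routines omp_get_num_devices() and omp_is_initial_device(), while the function names visible_offload_devices, target_region_ran_on_device and report_offload_status are hypothetical helpers introduced only for illustration.

```cpp
// Sketch: ask the OpenMP runtime where a target region actually executed.
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of offload devices the OpenMP runtime can see (0 when built without OpenMP).
int visible_offload_devices() {
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

// True only if a target region executed on an offload device rather than on the host.
bool target_region_ran_on_device() {
    int on_host = 1;  // assume host fallback until the target region says otherwise
#ifdef _OPENMP
    #pragma omp target map(from: on_host)
    {
        // Nonzero when the region runs on the host (the "initial device").
        on_host = omp_is_initial_device();
    }
#endif
    return on_host == 0;
}

// Hypothetical helper: print a one-line summary of the two checks above.
void report_offload_status() {
    std::printf("devices visible: %d, target region ran on device: %s\n",
                visible_offload_devices(),
                target_region_ran_on_device() ? "yes" : "no");
}
```

    Calling report_offload_status() at the start of main() shows whether the runtime silently fell back to the host. In addition, OpenMP 5.0 defines the environment variable OMP_TARGET_OFFLOAD; setting it to MANDATORY should make the program abort instead of falling back, provided the runtime in use implements it.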

  • Is my assumption correct, that Intel compiler supports offloading to an NVIDIA GPU?

    According to Intel, it is supported:

    The OpenMP* Offload to GPU feature of the Intel® oneAPI DPC++/C++ Compiler and the Intel® Fortran Compiler compiles OpenMP source files for a wide range of accelerators. Only the icx and ifx compilers support the OpenMP Offload feature.

    As far as I understand, they generate either a Clang-based intermediate code for the GPU, or a SPIR64 binary.

    The former can certainly be used on Nvidia GPUs according to Nvidia (despite the lack of information provided by Intel and Nvidia).

    The latter is related to the SPIR standard. Indeed, AFAIK, DPC++ is an implementation of the open SYCL standard which can produce code for the SPIR-V ecosystem. SPIR stands for Standard Portable Intermediate Representation. It is meant to let high-level languages produce one unified, portable code for many back ends. Hardware vendors then only have to support SPIR for all high-level languages and tools to work on their hardware. Thus, the vendor does not have to support each high-level language or tool directly.

    While I did not find any information from Nvidia about supporting SPIR-V directly, SPIR code can be executed on devices supporting recent versions (>=1.2) of OpenCL and Vulkan. Fortunately, Nvidia recently claimed to support OpenCL 3.0.

    In short, it should work on the target Nvidia GPU, though it might not be simple to set up yet.

  • Or should I better use an NVIDIA compiler to enable OpenMP offloading to NVIDIA graphics cards?

    The mainstream Nvidia compiler wrapper nvcc targets CUDA code, which essentially runs only on Nvidia GPUs (where it is very well supported). LLVM should support Nvidia GPUs (using the CUDA ecosystem), but the setup can be a bit tricky (and you need a recent version of the toolchain to avoid many issues). GCC, when built with the right flags and dependencies, supports OpenACC offloading to Nvidia PTX since version 5 and OpenMP offloading to PTX since version 7. Besides, while Nvidia does not support OpenMP offloading in its compiler wrapper nvcc, it also distributes the nvc and nvc++ compilers (formerly known as the PGI HPC compilers), which support both OpenMP and OpenACC offloading.
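    To compare these toolchains, it helps to have a small kernel that any of them can build. The sketch below is only an illustration: saxpy is a hypothetical example function, and the build lines in the comments are assumptions about typical installations that should be checked against each compiler's documentation. Without any OpenMP flag the pragma is ignored and the loop simply runs on the host.

```cpp
// Possible build lines (assumptions; verify against your compiler's documentation):
//   clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda saxpy.cpp
//   nvc++   -mp=gpu saxpy.cpp
//   g++     -fopenmp -foffload=nvptx-none saxpy.cpp
#include <cstddef>
#include <vector>

// Computes y[i] = a * x[i] + y[i] for every element of x.
// Assumes y has at least as many elements as x.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size();
    const float* xp = x.data();  // raw pointers so the map clauses can use array sections
    float* yp = y.data();

    // Copy x to the device, copy y both ways, and spread the loop over teams and threads.
    #pragma omp target teams distribute parallel for map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

    Running such a kernel under nvidia-smi or an Nsight profiler then shows whether a given compiler really placed the loop on the GPU.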

    Note that OpenMP offloading is still quite new and rather experimental, though some vendors appear to provide good support so far.