compiler-errors compilation lapack nvcc magma

How to properly use the -Xnvlink compiler option with 2 NVIDIA/CUDA GPU cards connected by an NVLink hardware component

On Debian 10, I have 2 RTX A6000 GPU cards with an NVLink hardware component, and I would like to benefit from the potential combined power of both cards.

Currently, I have the following magma.make, invoked by a Makefile:

CXX = nvcc -std=c++17 -O3
LAPACK = /opt/intel/oneapi/mkl/latest
LAPACK_ANOTHER=/opt/intel/mkl/lib/intel64
MAGMA = /usr/local/magma
INCLUDE_CUDA=/usr/local/cuda/include
LIBCUDA=/usr/local/cuda/lib64

SEARCH_DIRS_INCL=-I${MAGMA}/include -I${INCLUDE_CUDA} -I${LAPACK}/include
SEARCH_DIRS_LINK=-L${LAPACK}/lib/intel64 -L${LAPACK_ANOTHER} -L${LIBCUDA} -L${MAGMA}/lib

CXXFLAGS = -c -DMAGMA_ILP64 -DMKL_ILP64 -m64 ${SEARCH_DIRS_INCL}

LDFLAGS = ${SEARCH_DIRS_LINK} -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lcuda -lcudart -lcublas -lmagma -lpthread -lm -ldl -Xnvlink

SOURCES = main_magma.cpp XSAF_C_magma.cpp
EXECUTABLE = main_magma.exe

As you can see, I have used the last flag -Xnvlink, but it generates the following error at compilation:

/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/Scrt1.o: in function `_start':
(.text+0x20): undefined reference to `main'
collect2: error: ld returned 1 exit status
make: *** [Makefile:10: main_magma.exe] Error 1

How do I use the right flag or options so that the executable uses the combined power of the 2 GPUs with NVLink?

Solution

I have used the last flag -Xnvlink ...

Let's consult some documentation:

The following table lists some useful nvlink options which can be specified with nvcc option --nvlink-options.

4.2.9.2.1. --disable-warnings (-w)
Inhibit all warning messages.

4.2.9.2.2. --preserve-relocs (-preserve-relocs)
Preserve resolved relocations in linked executable.

4.2.9.2.3. --verbose (-v)
Enable verbose mode which prints code generation statistics.

4.2.9.2.4. --warning-as-error (-Werror)
Make all warnings into errors.

4.2.9.2.5. --suppress-arch-warning (-suppress-arch-warning)
Suppress the warning that otherwise is printed when object does not contain code for target arch.

4.2.9.2.6. --suppress-stack-size-warning (-suppress-stack-size-warning)
Suppress the warning that otherwise is printed when stack size cannot be determined.

4.2.9.2.7. --dump-callgraph (-dump-callgraph)
Dump information about the callgraph and register usage.

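That text makes it clear that this option controls the behaviour of the device linker (nvlink) at build time, and that none of this has anything to do with NVLink, which is a hardware interconnect technology. Note also that -Xnvlink expects a value (the options to forward to nvlink). Given bare, it most likely swallowed the next token on the command line, plausibly the object file that defines main, which would explain the undefined reference to `main'.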

How do I use the right flag or options so that the executable uses the combined power of the 2 GPUs with NVLink?

There is no such flag or option, and there is no compiler-assisted multi-GPU support. You have to write your own multi-GPU code, or use a library in which someone has written it for you. If such multi-GPU code is present in your executable, it will work without any special compiler options or flags.
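To illustrate what "writing your own multi-GPU code" can look like, here is a minimal sketch using only the CUDA runtime API. It is not taken from the question's code: the kernel scale, the helper slice_begin and the file name multi_gpu.cu are hypothetical names chosen for the example. It splits a vector across all visible GPUs, lets each GPU process its own slice, and enables peer access where the hardware supports it. Peer access is what allows direct device-to-device transfers (over NVLink when it is present); this particular example does not perform any such transfer, it only shows where the runtime call goes.

```cuda
// multi_gpu.cu (hypothetical file name): each GPU scales its own slice.
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Multiply every element of x[0..n) by a.
__global__ void scale(double *x, size_t n, double a) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) x[i] *= a;
}

// Hypothetical helper: first index of the slice owned by device `dev`
// when n elements are split over `ngpu` devices. Device `dev` owns
// [slice_begin(n, ngpu, dev), slice_begin(n, ngpu, dev + 1)).
size_t slice_begin(size_t n, int ngpu, int dev) {
    return n * static_cast<size_t>(dev) / static_cast<size_t>(ngpu);
}

int main() {
    int ngpu = 0;
    CHECK(cudaGetDeviceCount(&ngpu));
    if (ngpu < 1) {
        std::fprintf(stderr, "no CUDA device found\n");
        return EXIT_FAILURE;
    }

    // Enable peer access between every pair of GPUs that supports it.
    // This only matters for direct device-to-device transfers.
    for (int a = 0; a < ngpu; ++a) {
        CHECK(cudaSetDevice(a));
        for (int b = 0; b < ngpu; ++b) {
            if (a == b) continue;
            int ok = 0;
            CHECK(cudaDeviceCanAccessPeer(&ok, a, b));
            if (ok) CHECK(cudaDeviceEnablePeerAccess(b, 0));
            std::printf("GPU %d -> GPU %d peer access: %s\n",
                        a, b, ok ? "yes" : "no");
        }
    }

    const size_t n = 1 << 20;
    std::vector<double> host(n, 1.0);
    std::vector<double *> dev(ngpu, nullptr);

    // Copy one slice to each GPU and launch its kernel. Kernel launches
    // are asynchronous, so a GPU keeps computing while the host moves on
    // to the next device.
    for (int d = 0; d < ngpu; ++d) {
        const size_t lo = slice_begin(n, ngpu, d);
        const size_t len = slice_begin(n, ngpu, d + 1) - lo;
        CHECK(cudaSetDevice(d));
        CHECK(cudaMalloc(&dev[d], len * sizeof(double)));
        CHECK(cudaMemcpy(dev[d], host.data() + lo, len * sizeof(double),
                         cudaMemcpyHostToDevice));
        scale<<<static_cast<unsigned>((len + 255) / 256), 256>>>(dev[d], len, 2.0);
        CHECK(cudaGetLastError());
    }

    // Collect the results; each blocking copy waits for that GPU's kernel.
    for (int d = 0; d < ngpu; ++d) {
        const size_t lo = slice_begin(n, ngpu, d);
        const size_t len = slice_begin(n, ngpu, d + 1) - lo;
        CHECK(cudaSetDevice(d));
        CHECK(cudaMemcpy(host.data() + lo, dev[d], len * sizeof(double),
                         cudaMemcpyDeviceToHost));
        CHECK(cudaFree(dev[d]));
    }

    std::printf("host[0] = %g, host[n-1] = %g\n", host[0], host[n - 1]);
    return 0;
}
```

Assuming the file is saved as multi_gpu.cu, it builds with a plain nvcc -std=c++17 -O3 multi_gpu.cu -o multi_gpu, with no NVLink-related flag: which devices are used, and how data moves between them, is decided entirely by runtime calls such as cudaSetDevice and cudaDeviceEnablePeerAccess.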