CUDA nvcc building chain of libraries

My goal: library2.so uses library1.so, and mycode.o is linked against library2.so (and possibly library1.so).

The source code is as follows (the one-line header files are omitted):

library1.cu:

__device__ void func1_lib1(void){}

library2.cu:

#include "library1.h"
__global__ void func1_lib2(void)
{
    func1_lib1();
}
extern "C"
void func2_lib2(void)
{
    func1_lib2<<<1,1>>>();
}

mycode.c:

#include "library2.h"
int main(void)
{
    func2_lib2();
}

I'm building the shared libraries with the following Makefile:

broken:
    rm -f *.o *.so
    nvcc -arch=sm_30 --compiler-options '-fPIC' -dc library1.cu
    nvcc -arch=sm_30 --compiler-options '-fPIC' -dlink -o cuda-lib1.o library1.o
    gcc  -shared -Wl,-soname,library1.so -o library1.so library1.o cuda-lib1.o
    nvcc -arch=sm_30 --compiler-options '-fPIC' -dc library2.cu
    nvcc -arch=sm_30 --compiler-options '-fPIC' -dlink -o cuda-lib2.o library2.o -lrary1
    gcc  -shared -Wl,-soname,library2.so -o library2.so library2.o cuda-lib2.o
    gcc  -c mycode.c
    gcc  -o mycode -L. -lrary2 -lrary1 mycode.o

working:
    rm -f *.o *.so
    nvcc -arch=sm_30 --compiler-options '-fPIC' -dc library1.cu
    nvcc -arch=sm_30 --compiler-options '-fPIC' -dc library2.cu
    nvcc -arch=sm_30 --compiler-options '-fPIC' -dlink -o cuda-lib.o library1.o library2.o
    gcc  -shared -Wl,-soname,library.so -o library.so library1.o library2.o cuda-lib.o
    gcc  -c -fPIC mycode.c                                                      
    gcc  -o mycode -L. -lrary  -L/usr/local/cuda/lib64 -lcuda -lcudart mycode.o

make working succeeds without any problems, but it does not produce a chain of libraries: library1.cu and library2.cu end up in the same .so file.

make broken fails with

nvcc -arch=sm_30 --compiler-options '-fPIC' -dlink -o cuda-lib2.o library2.o -lrary1
nvlink error   : Undefined reference to '_Z10func1_lib1v' in 'library2.o'

If I inspect library1.so with nm, it does contain a text-section symbol (T) _Z10func1_lib1v.

Solution

In your "broken" approach, you are attempting to create a library1.so (a shared library) which contains only a __device__ function:

__device__ void func1_lib1(void){}

Any other object that wishes to use that __device__ function must make use of relocatable device code/separate compilation and linking, which of course you are trying to do.

However, with respect to libraries, device linking only supports functions contained in static libraries. Note these statements from the nvcc manual:

The device linker has the ability to read the static host library formats (.a on Linux and Mac OS X, .lib on Windows). It ignores any dynamic (.so or .dll) libraries.

and:

Note that only static libraries are supported by the device linker.

So your general strategy won't work. A possible workaround would be to place the library1.cu code in a static library:

rm -f *.o *.so
nvcc -arch=sm_30 --compiler-options '-fPIC' -dc library1.cu
nvcc -arch=sm_30 --lib -o cuda-lib1.a library1.o
nvcc -arch=sm_30 --compiler-options '-fPIC' -dc library2.cu
nvcc -arch=sm_30 --compiler-options '-fPIC' -dlink -o cuda-lib2.o library2.o cuda-lib1.a
gcc  -shared -Wl,-soname,library2.so -o library2.so library2.o cuda-lib2.o cuda-lib1.a -L/usr/local/cuda/lib64 -lcuda -lcudart
gcc  -c mycode.c
gcc  -o mycode mycode.o -L. -lrary2

Alternatively, create a sequence of .so libraries that do not require device linking across library boundaries, which is more or less what your "working" case demonstrates.