I would like to build and test my application in a CI/CD process. To reduce costs, I want to run the build and the test as two separate jobs on two different machines: the build machine should only have a CPU, while the test machine will also have a GPU. The base image is from nvidia/cuda. The problem is that the CUDA libraries seem to be available only when using the NVIDIA Docker runtime. I get the following warnings and errors when running without the NVIDIA runtime:
/usr/bin/ld: warning: libcuda.so.1, needed by mylib.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libnvidia-ml.so.1, needed by mylib.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: mylib.so: undefined reference to `nvmlInit_v2'
/usr/bin/ld: mylib.so: undefined reference to `cuGetErrorString'
/usr/bin/ld: mylib.so: undefined reference to `nvmlSystemGetDriverVersion'
/usr/bin/ld: mylib.so: undefined reference to `nvmlDeviceGetArchitecture'
/usr/bin/ld: mylib.so: undefined reference to `nvmlDeviceGetCount_v2'
/usr/bin/ld: mylib.so: undefined reference to `cuCtxPushCurrent_v2'
/usr/bin/ld: mylib.so: undefined reference to `cuStreamSynchronize'
/usr/bin/ld: mylib.so: undefined reference to `cuCtxPopCurrent_v2'
/usr/bin/ld: mylib.so: undefined reference to `cuCtxCreate_v2'
/usr/bin/ld: mylib.so: undefined reference to `cuDeviceGetCount'
/usr/bin/ld: mylib.so: undefined reference to `cuMemcpy2DAsync_v2'
/usr/bin/ld: mylib.so: undefined reference to `nvmlDeviceGetHandleByIndex_v2'
/usr/bin/ld: mylib.so: undefined reference to `cuCtxGetCurrent'
I am able to build this library successfully on a GPU machine with the same Docker container by adding --gpus all to the docker run command; on that same GPU machine, removing the flag produces the same errors as above.
Is there a way to make libcuda, libnvidia-ml, etc. available inside the Docker container when the host does not have access to a GPU? Again, I don't want to actually run the application on the build machine; it will be copied to another machine that does have a GPU for testing.
It turns out the CUDA Docker image ships a set of stub libraries for this exact purpose. They are in /usr/local/cuda/lib64/stubs. I was unable to get CMake to find them, so a symlink did the trick.
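The exact symlink is not spelled out above. One plausible reading, given that the linker warnings ask for libcuda.so.1 and libnvidia-ml.so.1, is that the stubs directory only holds the unversioned names (libcuda.so, libnvidia-ml.so) and the versioned names have to be linked to them. The sketch below rests on that assumption; the function name link_cuda_stubs is hypothetical, and the library list is taken from the two warnings in the question.

```shell
#!/bin/sh
# Hypothetical helper: give each stub library the versioned name that the
# linker asks for, e.g. libcuda.so.1 -> libcuda.so.
# Assumption: the stubs directory contains the unversioned files
# libcuda.so and libnvidia-ml.so.
link_cuda_stubs() {
    # Default to the stubs location named in the answer; allow an override.
    stubs_dir="${1:-/usr/local/cuda/lib64/stubs}"
    for lib in libcuda.so libnvidia-ml.so; do
        # Only link stubs that are actually present.
        if [ -e "$stubs_dir/$lib" ]; then
            # -f replaces a stale link; the relative target keeps the
            # link valid if the directory is copied elsewhere.
            ln -sf "$lib" "$stubs_dir/$lib.1"
        fi
    done
}
```

Called with no argument, link_cuda_stubs operates on /usr/local/cuda/lib64/stubs. The linker still has to be told to look there, for instance through the -rpath-link option that the warnings themselves suggest (-Wl,-rpath-link,/usr/local/cuda/lib64/stubs when linking through gcc). The stubs are meant for linking only; on the GPU test machine, the real libraries provided by the NVIDIA runtime should be the ones found at run time.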