c++ docker build nvidia cicd

Separate GPU-based application build and test with Docker


I would like to build and test my application in a CI/CD process. To reduce costs, I want to run the build and the test as two separate jobs on two different machines: the build machine has only a CPU, while the test machine also has a GPU. The base image is from nvidia/cuda. The problem is that the CUDA libraries seem to be available only when using the NVIDIA Docker runtime. I get the following warnings and errors when running without the NVIDIA runtime:

/usr/bin/ld: warning: libcuda.so.1, needed by mylib.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libnvidia-ml.so.1, needed by mylib.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: mylib.so: undefined reference to `nvmlInit_v2'
/usr/bin/ld: mylib.so: undefined reference to `cuGetErrorString'
/usr/bin/ld: mylib.so: undefined reference to `nvmlSystemGetDriverVersion'
/usr/bin/ld: mylib.so: undefined reference to `nvmlDeviceGetArchitecture'
/usr/bin/ld: mylib.so: undefined reference to `nvmlDeviceGetCount_v2'
/usr/bin/ld: mylib.so: undefined reference to `cuCtxPushCurrent_v2'
/usr/bin/ld: mylib.so: undefined reference to `cuStreamSynchronize'
/usr/bin/ld: mylib.so: undefined reference to `cuCtxPopCurrent_v2'
/usr/bin/ld: mylib.so: undefined reference to `cuCtxCreate_v2'
/usr/bin/ld: mylib.so: undefined reference to `cuDeviceGetCount'
/usr/bin/ld: mylib.so: undefined reference to `cuMemcpy2DAsync_v2'
/usr/bin/ld: mylib.so: undefined reference to `nvmlDeviceGetHandleByIndex_v2'
/usr/bin/ld: mylib.so: undefined reference to `cuCtxGetCurrent'

I can build this library successfully on a GPU machine with the same Docker container by adding --gpus all to the docker run command. On that same GPU machine, removing the flag produces the same errors as above.

Is there a way to make libcuda, libnvidia-ml, etc. available inside the Docker container when the host does not have a GPU? Again, I don't want to run the application on the build machine; it will be copied to another machine that does have a GPU for testing.


Solution

  • It turns out the CUDA Docker image ships a set of stub libraries for exactly this purpose, in /usr/local/cuda/lib64/stubs. I was unable to get CMake to find them there, so I created a symlink, which did the trick.
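
The answer does not say which symlink was created, so the following is only a sketch of one plausible version. It assumes the stubs directory contains libcuda.so and libnvidia-ml.so, and that the linker is looking for the versioned names libcuda.so.1 and libnvidia-ml.so.1 shown in the warnings. The function name link_cuda_stubs and its two arguments are hypothetical, not part of the CUDA image.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the CUDA image): expose the driver
# stubs under the versioned names the linker asked for in the warnings.
#   $1  directory containing the stubs (assumed default: the path from the answer)
#   $2  directory in which to create the links (assumed default: same directory)
link_cuda_stubs() {
    stubs_dir="${1:-/usr/local/cuda/lib64/stubs}"
    link_dir="${2:-$stubs_dir}"

    for lib in libcuda.so libnvidia-ml.so; do
        if [ ! -f "$stubs_dir/$lib" ]; then
            echo "stub not found: $stubs_dir/$lib" >&2
            return 1
        fi
        # libcuda.so.1 -> <stubs>/libcuda.so, and likewise for libnvidia-ml
        ln -sf "$stubs_dir/$lib" "$link_dir/$lib.1"
    done
}
```

In a build job this might be called once as link_cuda_stubs before running CMake, with the link directory then passed to the linker, for example via -Wl,-rpath-link as the ld warning itself suggests. The stubs only satisfy the link step; on the GPU test machine the real driver libraries are expected to come from the NVIDIA runtime, so it is probably safest to keep these links out of any directory the dynamic loader searches at run time.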