I am trying to compile CUDA with clang, but the code I am trying to compile depends on a specific nvcc flag (-default-stream per-thread). How can I tell clang to pass the flag to nvcc?
For example, I can compile with nvcc and everything works fine:
nvcc -default-stream per-thread *.cu -o app
But when I compile with clang, the program does not behave correctly because I cannot pass the default-stream flag:
clang++ --cuda-gpu-arch=sm_35 -L/usr/local/cuda/lib64 *.cu -o app -lcudart_static -ldl -lrt -pthread
How do I get clang to pass flags to nvcc?
It looks like it may not be possible.
Behind the scenes, nvcc calls the host compiler (gcc or clang) with some custom generated flags, then calls ptxas and a few other tools to create the binary.
For example:
nvcc -default-stream per-thread foo.cu
# Behind the scenes
gcc -custom-nvcc-generated-flag -DCUDA_API_PER_THREAD_DEFAULT_STREAM=1 -o foo.ptx
ptxas foo.ptx -o foo.cubin
When compiling CUDA with clang, clang compiles directly to PTX itself and then calls ptxas:
clang++ foo.cu -o app -lcudart_static -ldl -lrt -pthread
# Behind the scenes
clang++ -triple nvptx64-nvidia-cuda foo.cu -o foo.ptx
ptxas foo.ptx -o foo.cubin
clang never actually calls nvcc. It just targets ptx and calls the ptx assembler.
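One way to check this on your own machine is to ask each driver to print the commands it would run instead of running them. This is a sketch that assumes clang and the CUDA toolkit are both installed; `-###` is clang's print-only driver option, `--dryrun` is nvcc's, and `foo.cu` is a placeholder file name.

```shell
# Print the sub-commands clang would run, without executing them.
# Look for the device-side compile and the ptxas step; nvcc should not appear.
clang++ -### --cuda-gpu-arch=sm_35 -c foo.cu

# Print the sub-commands nvcc would run, including any macros
# it passes to the host compiler for -default-stream per-thread.
nvcc --dryrun -default-stream per-thread -c foo.cu
```

Comparing the two listings shows which definitions nvcc adds on its own, and those are the ones you would have to supply by hand when building with clang.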
Unless you know what custom backend flags will be produced by nvcc and manually include them when calling clang, I'm not sure you can automatically pass an nvcc flag from clang.
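As a sketch of that manual approach: NVIDIA documents that defining CUDA_API_PER_THREAD_DEFAULT_STREAM before the CUDA headers are included enables per-thread default streams, and that is the macro shown in the nvcc example above. clang includes the CUDA headers ahead of your own code, so the define would have to go on the command line rather than in the source. This has not been checked against every clang version, and kernel launches written with <<<...>>> may still use the legacy stream, so treat it as something to try, not a guaranteed equivalent of the nvcc flag.

```shell
# Same clang command as in the question, with the macro that
# nvcc's -default-stream per-thread is shown to define added by hand.
clang++ --cuda-gpu-arch=sm_35 -DCUDA_API_PER_THREAD_DEFAULT_STREAM=1 \
    -L/usr/local/cuda/lib64 *.cu -o app -lcudart_static -ldl -lrt -pthread
```

If the program still misbehaves, passing explicit streams to the affected calls and kernel launches avoids depending on the default stream at all.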