I wrote a bunch of small CUDA programs. Most of them compile fine in debug and release builds. However, a few fail when compiled in Release mode because various GCC intrinsics are being given the wrong types of pointers. But I'm not actually using intrinsics. This program partially reproduces my problem:
#include <iostream>
#include <cuda_runtime.h> // To pacify the syntax highlighter
#include <immintrin.h> // NOTE: I don't ever include this header in my real code
__global__ void kernel() {
// Do nothing
}
using namespace std;
int main() {
kernel<<<1, 1>>>();
cout << "Hello, world!" << endl;
return 0;
}
The problem, however, is that in my actual code I do not include <immintrin.h> or use GCC intrinsics of any kind. It's possible that some library code I use does, but I don't know for sure. If I remove <immintrin.h> from this example, the program compiles and runs fine.
The actual offenders are here and here, if you want to see them.
I am using the following software:
nvcc version 8.0.44
gcc version 5.4.1
cmake version 3.8.20170418
The projects build and run perfectly fine in Debug mode, including the sample program above.
/usr/bin/g++-5 -std=c++11 -fopenmp -O3 -DNDEBUG -rdynamic CMakeFiles/DotProduct.dir/DotProduct_generated_main.cu.o CMakeFiles/DotProduct.dir/DotProduct_intermediate_link.o -o DotProduct -Wl,-Bstatic -lcudart_static -Wl,-Bdynamic -lpthread -ldl -lrt ../../Common/libCommon.a -Wl,-Bstatic -lcudart_static -Wl,-Bdynamic -lpthread -ldl -lrt -Wl,-Bstatic -lcudadevrt -Wl,-Bdynamic -L/usr/lib/x86_64-linux-gnu -lSDL2 -lSDL2_ttf -lSDL2 -lGLEW -lGLU -lGL
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9533): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h(9542): error: argument of type "void *" is incompatible with parameter of type "long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(54): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(62): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(70): error: argument of type "const void *" is incompatible with parameter of type "const long long *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(78): error: argument of type "const void *" is incompatible with parameter of type "const int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h(86): error: argument of type "void *" is incompatible with parameter of type "const long long *"
<random> -> <opt_random.h> -> <x86intrin.h> -> <immintrin.h> -> (every other header mentioned in the error log). My new goal is now to enable all the usual optimizations except those which use intrinsics.
It turns out that this is likely an nvcc bug stemming from CUDA's stated lack of support for my particular system configuration. I filed a report here (you need to be logged in to see it).
For now, I worked around it by not using anything that requires intrinsics. In my case I used Thrust's random number generators instead of the standard library's. Someone I talked to suggested that I could also separate my host and device code more carefully such that the source files processed by nvcc don't ever include <immintrin.h>. Haven't tried it but for those in the future who see this, it's worth a shot.
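For anyone who wants to see roughly what the Thrust workaround looks like, here is a minimal sketch. It is not my actual project code: make_uniform_samples is a made-up helper name, and the seed and sample count are arbitrary. The only point is that the random numbers come from <thrust/random.h> rather than <random>, so the <random> -> <opt_random.h> -> <x86intrin.h> chain above is not pulled in by this file. I am assuming the rest of your includes don't reach <immintrin.h> some other way.

```cuda
// Sketch only: make_uniform_samples is a hypothetical helper for illustration.
#include <cstdio>
#include <vector>
#include <thrust/random.h>  // Thrust's engines and distributions, used in place of <random>

// Fill a host vector with n pseudo-random floats drawn from a uniform
// distribution over the interval 0 to 1.
std::vector<float> make_uniform_samples(unsigned int seed, int n) {
    thrust::default_random_engine rng(seed);
    thrust::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> samples(n);
    for (int i = 0; i < n; ++i) {
        samples[i] = dist(rng);
    }
    return samples;
}

__global__ void kernel() {
    // Do nothing, as in the reproduction program
}

int main() {
    std::vector<float> samples = make_uniform_samples(1234u, 4);
    for (std::size_t i = 0; i < samples.size(); ++i) {
        std::printf("%f\n", samples[i]);
    }
    kernel<<<1, 1>>>();
    cudaDeviceSynchronize();  // wait for the (empty) kernel to finish
    return 0;
}
```

The same engine and distribution types can also be used inside device code, which is part of why they were a convenient replacement in my case. Whether this avoids the error on your setup depends on what else your headers include, so treat it as something to try, not a guaranteed fix.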