I'm developing a custom op for TensorFlow that needs GPU support, following the guide in the TensorFlow documentation. While tracing the errors in my own code, I went back to the documentation and tried to compile the code example it references:
#if GOOGLE_CUDA
#define EIGEN_USE_GPU
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"

__global__ void AddOneKernel(const int* in, const int N, int* out) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N;
       i += blockDim.x * gridDim.x) {
    out[i] = in[i] + 1;
  }
}

void AddOneKernelLauncher(const int* in, const int N, int* out) {
  AddOneKernel<<<32, 256>>>(in, N, out);
}

#endif
using the command suggested in the docs:
nvcc -std=c++11 -c -o cuda_op_kernel.cu.o cuda_op_kernel.cu.cc \
-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC
with $TF_INC replaced by the TensorFlow include path. Unfortunately, this yields a lot of errors:
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1294): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1300): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1306): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1312): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1318): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1324): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1330): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1336): error: expression must have arithmetic, unscoped enum, or pointer type
and many more like these.
I found that this might be related to an unsupported nvcc / gcc / OS combination. I did not set up the machine myself (and do not have sudo rights). I have nvcc version 7.5.17 and gcc version 4.9.3 on Ubuntu 16.04.2. Ubuntu 16.04.2 is NOT listed among the supported systems for CUDA 7.5. This might be an issue, but I found many people claiming that it works on 16.04. In addition, I successfully compiled TensorFlow with GPU support on this machine.
Further, these errors are related to the Tensor #include in the code: the code compiles successfully without this line. I haven't tested whether the demo op works without this include, but my own op failed with
2017-06-01 09:36:14.679685: E tensorflow/stream_executor/cuda/cuda_driver.cc:1067] could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_FAILED :: No stack trace available
2017-06-01 09:36:14.679777: F tensorflow/core/common_runtime/gpu/gpu_util.cc:370] GPU sync failed
Two questions:
OK, for those who come across the same problem: you can set the host compiler for nvcc using the -ccbin argument, as pointed out in this answer. Just set it to gcc-4.9.
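As a rough sketch, this is the compile command from the question with -ccbin added. The HOST_CC and NVCC_CMD variables and the placeholder include path are my own additions for illustration, and it assumes gcc-4.9 is on your PATH (otherwise pass the full path to the compiler):

```shell
#!/bin/sh
# Host compiler handed to nvcc; gcc-4.9 is the value that worked here.
# HOST_CC is a hypothetical variable name, override it if your binary differs.
HOST_CC="${HOST_CC:-gcc-4.9}"

# TF_INC must point at the TensorFlow include directory, as in the question.
# The default below is only a placeholder, not a real path.
TF_INC="${TF_INC:-/path/to/tensorflow/include}"

# Same command as in the question, plus -ccbin to select the host compiler.
# Note: building the command as a string breaks if TF_INC contains spaces.
NVCC_CMD="nvcc -std=c++11 -c -o cuda_op_kernel.cu.o cuda_op_kernel.cu.cc -I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -ccbin $HOST_CC"

# Show the command, then run it only if nvcc and the source file are present.
echo "$NVCC_CMD"
if command -v nvcc >/dev/null 2>&1 && [ -f cuda_op_kernel.cu.cc ]; then
  $NVCC_CMD
else
  echo "nvcc or cuda_op_kernel.cu.cc not found, skipping compilation"
fi
```

The paths in the error messages above point into /usr/lib/gcc/x86_64-linux-gnu/5/, which suggests nvcc was picking up the gcc 5 headers by default; passing -ccbin makes the choice of host compiler explicit.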