Tags: c++, ubuntu, cuda, nvcc, tensorflow

Compile custom tensorflow op for CUDA


I'm developing a custom op for TensorFlow that needs GPU support, following the guide in the TensorFlow documentation. While tracing the errors in my own code, I went back to the example from the documentation and tried to compile the referenced code example:

#if GOOGLE_CUDA
#define EIGEN_USE_GPU
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"

__global__ void AddOneKernel(const int* in, const int N, int* out) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N;
       i += blockDim.x * gridDim.x) {
    out[i] = in[i] + 1;
  }
}

void AddOneKernelLauncher(const int* in, const int N, int* out) {
  AddOneKernel<<<32, 256>>>(in, N, out);
}

#endif

using the command suggested in the docs:

nvcc -std=c++11 -c -o cuda_op_kernel.cu.o cuda_op_kernel.cu.cc \
-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC

with $TF_INC replaced by the TensorFlow include path. Unfortunately, this yields a lot of errors:

/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1294): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1300): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1306): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1312): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1318): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1324): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1330): error: expression must have arithmetic, unscoped enum, or pointer type
/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h(1336): error: expression must have arithmetic, unscoped enum, or pointer type

and many more like these.

I found that this might be related to an unsupported nvcc/gcc/OS combination. I did not set up the machine myself (and I don't have sudo rights). I have nvcc 7.5.17 and gcc 4.9.3 on Ubuntu 16.04.2. Ubuntu 16.04.2 is NOT listed among the supported systems for CUDA 7.5. This might be an issue, but I found many people claiming that it works on 16.04. In addition, I successfully compiled TensorFlow with GPU support on this machine.
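A quick way to check whether a given gcc is within what CUDA 7.5 accepts is sketched below. It assumes the limit enforced by CUDA 7.5's host_config.h, which rejects gcc versions newer than 4.9. The function name gcc_ok_for_cuda75 is hypothetical, and it takes the version string printed by `gcc -dumpversion`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) if the gcc version string in $1
# is at most 4.9, the newest host gcc that CUDA 7.5 accepts.
gcc_ok_for_cuda75() {
  local major minor
  # Split "4.9.3" into major=4, minor=9 and discard the patch level.
  IFS=. read -r major minor _ <<< "$1"
  # Newer gcc releases may print only the major version (e.g. "7").
  minor="${minor:-0}"
  (( major < 4 || (major == 4 && minor <= 9) ))
}
```

For example, `gcc_ok_for_cuda75 "$(gcc -dumpversion)" || echo "default gcc is too new for CUDA 7.5"` tests the compiler on PATH, and the same call with `gcc-4.9` tests an explicitly versioned one.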

Furthermore, these errors come from the Eigen Tensor #include in the code, and the file compiles successfully without that line. I haven't tested whether the demo op works without the include, but my own op failed at runtime with:

2017-06-01 09:36:14.679685: E tensorflow/stream_executor/cuda/cuda_driver.cc:1067] could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_FAILED :: No stack trace available
2017-06-01 09:36:14.679777: F tensorflow/core/common_runtime/gpu/gpu_util.cc:370] GPU sync failed

Two questions:

  1. Why do I need to include this Eigen Tensor header, even though the demo op does not actually use Eigen Tensor?
  2. Where do these errors come from, and how can I resolve them? Could they be caused by the unsupported system configuration?

Solution

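  • OK, for anyone who runs into the same problem: you can set the host compiler for nvcc with the -ccbin argument, as pointed out in this answer. Just set it to gcc-4.9. (The include paths in the error messages, /usr/lib/gcc/x86_64-linux-gnu/5/include/..., show that nvcc was using the system's gcc 5 headers rather than gcc 4.9, and CUDA 7.5 supports host gcc versions only up to 4.9.)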
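Applied to the command from the docs, the fix could look like the sketch below. It assumes gcc-4.9 is on PATH. The function name compile_cuda_op and the HOST_CXX and DRY_RUN variables are hypothetical conveniences for this sketch, not part of nvcc or TensorFlow.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the documented nvcc command, adding -ccbin so
# nvcc uses gcc-4.9 (or $HOST_CXX) as its host compiler.
compile_cuda_op() {
  local tf_inc="$1"                        # TensorFlow include directory
  local host_cxx="${HOST_CXX:-gcc-4.9}"    # host compiler name or full path
  local -a cmd
  cmd=(nvcc -std=c++11 -ccbin "$host_cxx" -c -o cuda_op_kernel.cu.o
       cuda_op_kernel.cu.cc -I "$tf_inc" -D GOOGLE_CUDA=1 -x cu
       -Xcompiler -fPIC)
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"              # print the command instead of running it
  else
    "${cmd[@]}"
  fi
}
```

Typical use: `TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')`, then `compile_cuda_op "$TF_INC"`. Building the command as an array keeps include paths with spaces intact. Running it with `DRY_RUN=1` first shows exactly which host compiler will be passed to nvcc.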