c++ tensorflow ubuntu cuda nvcc

Adding a custom TensorFlow op


I am trying to use ronghanghu's TensorFlow implementation of compact bilinear pooling, since it is used in the implementation of the "Learning Rich Features for Image Manipulation Detection" paper. ronghanghu builds sequential_batch_fft.so with TensorFlow 1.12.0, CUDA 8.0 and g++ 5.4.0. However, they do say that sequential_batch_fft.so can be rebuilt with a different TensorFlow version (2.4.0 in my case), a different compiler (g++ 7.5.0) and a different CUDA version (11.0). When I try to build sequential_batch_fft.so using the commands in compile.sh below

TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')

# Use 0 if the TensorFlow binary is built with GCC 4.x
# see https://docs.computecanada.ca/wiki/GCC_C%2B%2B_Dual_ABI for details
USE_CXX11_ABI=0

nvcc -std=c++11 -c -o sequential_batch_fft_kernel.cu.o \
sequential_batch_fft_kernel.cu.cc \
-D_GLIBCXX_USE_CXX11_ABI=$USE_CXX11_ABI -DNDEBUG \
-L$TF_LIB -ltensorflow_framework \
-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC

g++ -std=c++11 -shared -o ./build/sequential_batch_fft.so \
sequential_batch_fft_kernel.cu.o \
sequential_batch_fft.cc \
-D_GLIBCXX_USE_CXX11_ABI=$USE_CXX11_ABI -DNDEBUG \
-L$TF_LIB -ltensorflow_framework \
-I $TF_INC -fPIC \
-lcudart -lcufft -L/usr/local/cuda/lib64

rm -rf sequential_batch_fft_kernel.cu.o

the only output I get in the terminal is this: [image: compile.sh output]

The problem is that nothing else happens after that. No errors are reported and the build never finishes. I left it running for hours and still nothing. I have no idea why this happens. I later tried building one of the examples that TensorFlow provides in adding_an_op and got the same result. What could be the problem here? It is really confusing because there are no errors, just a program that never ends.


Solution

  • It turns out the problem was with libtensorflow_framework.so, the library that -ltensorflow_framework links against. I was able to fix this by creating a symbolic link with ln -s libtensorflow_framework.so.2 libtensorflow_framework.so in ../site-packages/tensorflow. Also, I think my project path was unsuitable because it contains spaces. I am not sure about this one, but when I used a TensorFlow installation on a path with no spaces, everything worked fine.
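
The symlink step can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name link_framework_lib and the throwaway demo directory are hypothetical, and the demo uses an empty placeholder file rather than a real TensorFlow install. Every path is quoted so that a directory name containing spaces is handled as a single argument.

```shell
#!/usr/bin/env bash

# Hypothetical helper: create an unversioned libtensorflow_framework.so
# symlink in the given directory when only a versioned file
# (for example libtensorflow_framework.so.2) is present.
link_framework_lib() {
  local lib_dir="$1"
  local versioned

  # The unversioned name already exists: nothing to do.
  if [ -e "$lib_dir/libtensorflow_framework.so" ]; then
    return 0
  fi

  for versioned in "$lib_dir"/libtensorflow_framework.so.*; do
    if [ -e "$versioned" ]; then
      # Relative link target, equivalent to running `ln -s` inside lib_dir.
      ln -s "$(basename "$versioned")" "$lib_dir/libtensorflow_framework.so"
      return 0
    fi
  done

  echo "no libtensorflow_framework.so.* found in $lib_dir" >&2
  return 1
}

# Dry run on a throwaway directory whose path contains a space.
# The .so.2 file here is an empty placeholder, not a real library.
demo_dir="$(mktemp -d)/site packages/tensorflow"
mkdir -p "$demo_dir"
touch "$demo_dir/libtensorflow_framework.so.2"
link_framework_lib "$demo_dir"
ls -l "$demo_dir"
```

Against a real installation, the argument would presumably be the directory that compile.sh stores in TF_LIB, passed as "$TF_LIB". Quoting "$TF_LIB" and "$TF_INC" in the nvcc and g++ commands in the same way might also address the spaces issue, although that was not verified here.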