docker tensorflow jetson-xavier

Building TensorFlow on Jetson Xavier fails to find CUDA


I'm trying to compile the TensorFlow 2.3 C API for Xavier in a Docker image. I'm using nvcr.io/nvidia/l4t-base:r32.5.0 as the base image, which seems to have the correct version of CUDA installed, but the build fails with the following message:

ERROR: no such package '@local_config_cuda//cuda': Traceback (most recent call last):
#9 51.98    File "/tensorflow/third_party/gpus/cuda_configure.bzl", line 1369
#9 51.98        _create_local_cuda_repository(<1 more arguments>)
#9 51.98    File "/tensorflow/third_party/gpus/cuda_configure.bzl", line 955, in _create_local_cuda_repository
#9 51.98        _get_cuda_config(repository_ctx, <1 more arguments>)
#9 51.98    File "/tensorflow/third_party/gpus/cuda_configure.bzl", line 657, in _get_cuda_config
#9 51.98        find_cuda_config(repository_ctx, <2 more arguments>)
#9 51.98    File "/tensorflow/third_party/gpus/cuda_configure.bzl", line 635, in find_cuda_config
#9 51.98        _exec_find_cuda_config(<3 more arguments>)
#9 51.98    File "/tensorflow/third_party/gpus/cuda_configure.bzl", line 629, in _exec_find_cuda_config
#9 51.98        execute(repository_ctx, <1 more arguments>)
#9 51.98    File "/tensorflow/third_party/remote_config/common.bzl", line 208, in execute
#9 51.98        fail(<1 more arguments>)
#9 51.98 Repository command failed
#9 51.98 Could not find any libcudart.so.10* in any subdirectory:
#9 51.98         ''
#9 51.98         'lib64'
#9 51.98         'lib'
#9 51.98         'lib/*-linux-gnu'
#9 51.98         'lib/x64'
#9 51.98         'extras/CUPTI/*'
#9 51.98 of:
#9 51.98         '/usr/local/cuda-10.2'

Here are the relevant parts of my Dockerfile for reference:

FROM nvcr.io/nvidia/l4t-base:r32.5.0

# ... setup bazel etc

# Tensorflow
ENV TF_NEED_CUDA=1 \
    GCC_HOST_COMPILER_PATH=/usr/bin/gcc \
    TF_CUDA_VERSION=10.2 \
    CUDA_TOOLKIT_PATH=/usr/local/cuda-10.2 \
    TF_CUDNN_VERSION=8 \
    CUDNN_INSTALL_PATH=/usr/local/cuda-10.2 \
    TF_CUDA_COMPUTE_CAPABILITIES=7.2,7.5 \
    CC_OPT_FLAGS="--copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-msse4.2 --copt=-mfpmath=both --config=cuda" \
    PYTHON_BIN_PATH="/usr/bin/python" \
    USE_DEFAULT_PYTHON_LIB_PATH=1 \
    TF_NEED_JEMALLOC=1 \
    TF_NEED_GCP=0 \
    TF_NEED_HDFS=0 \
    TF_ENABLE_XLA=0 \
    TF_NEED_OPENCL=0

RUN cd / && git clone https://github.com/tensorflow/tensorflow

# The bazel build in the next line fails
RUN cd /tensorflow && git checkout r2.3 && bazel build -c opt //tensorflow/tools/lib_package:libtensorflow

Am I missing some compile options, or do I have to do some extra steps to properly set up CUDA?


Solution

  • It seems that building TensorFlow 2.3 for 64-bit ARM with CUDA isn't possible. TensorFlow 2.3 needs CUDA 10.2, but the CUDA toolkit isn't supported on ARM until version 11 [1], and TensorFlow doesn't support CUDA 11 until version 2.4 [2].

    [1] https://developer.nvidia.com/cuda-toolkit/arm

    [2] https://www.tensorflow.org/install/gpu
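
    Before starting the long bazel build, it can help to confirm what the error reports: that no libcudart.so.10* exists under the configured toolkit path while the image is being built. The following is a minimal sketch, not TensorFlow's actual find_cuda_config logic. It only globs the subdirectories listed in the error message; the function name find_cudart and the fallback path are assumptions made for illustration.

```python
import glob
import os

# Subdirectory patterns copied from the error message in the question.
SEARCH_SUBDIRS = ["", "lib64", "lib", "lib/*-linux-gnu", "lib/x64", "extras/CUPTI/*"]


def find_cudart(toolkit_path, major="10"):
    """Return sorted paths matching libcudart.so.<major>* under toolkit_path.

    Hypothetical helper: it approximates the search described in the error
    message, it is not the code TensorFlow's configure step runs.
    """
    matches = set()
    for subdir in SEARCH_SUBDIRS:
        pattern = os.path.join(toolkit_path, subdir, "libcudart.so." + major + "*")
        matches.update(glob.glob(pattern))
    return sorted(matches)


if __name__ == "__main__":
    # CUDA_TOOLKIT_PATH is the variable set in the question's Dockerfile;
    # the fallback value is the path shown in the error message.
    path = os.environ.get("CUDA_TOOLKIT_PATH", "/usr/local/cuda-10.2")
    found = find_cudart(path)
    if found:
        print("Found CUDA runtime:")
        for lib in found:
            print("  " + lib)
    else:
        print("No libcudart.so.10* found under " + path)
```

    Running this in a RUN step just before the bazel build shows whether the runtime library is visible inside the image at build time. An empty result means the configure step will fail in the same way regardless of the compile options passed to bazel.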