Tags: python, tensorflow, ubuntu, tensorrt

CUDA & TensorRT issue, I'd appreciate any insights


On Ubuntu 22.04, I followed all the directions to install CUDA, cuDNN, Tensorflow, and TensorRT. The CUDA and cuDNN tests run successfully, but the post-installation Tensorflow test fails. When I run "import tensorflow as tf" in Python 3, it says "TF-TRT Warning: Could not find TensorRT" (see Exhibit 1 below for the detailed error message). When I then run "hello = tf.constant('Hello, TensorFlow!')", it says "failed call to cuInit: CUDA_ERROR_SYSTEM_DRIVER_MISMATCH: system has unsupported display driver / cuda driver combination" (see Exhibit 2 below for the detailed error message).

NVCC INFO:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2021 NVIDIA Corporation
    Built on Thu_Nov_18_09:45:30_PST_2021
    Cuda compilation tools, release 11.5, V11.5.119
    Build cuda_11.5.r11.5/compiler.30672275_0

EXHIBIT 1:

import tensorflow as tf

Output:

2023-03-25 23:46:25.078082: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-03-25 23:46:25.101349: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-25 23:46:25.470624: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

EXHIBIT 2:

hello = tf.constant('Hello, TensorFlow!')

Output:

2023-03-25 23:46:34.600824: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:266] failed call to cuInit: CUDA_ERROR_SYSTEM_DRIVER_MISMATCH: system has unsupported display driver / cuda driver combination
2023-03-25 23:46:34.600845: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:168] retrieving CUDA diagnostic information for host: aham-US-Desktop-Codex-R
2023-03-25 23:46:34.600848: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:175] hostname: aham-US-Desktop-Codex-R
2023-03-25 23:46:34.600902: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:199] libcuda reported version is: 510.108.3
2023-03-25 23:46:34.600912: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:203] kernel reported version is: 530.30.2
2023-03-25 23:46:34.600915: E tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:312] kernel version 530.30.2 does not match DSO version 510.108.3 – cannot find working devices in this configuration 

I have meticulously followed all instructions from the NVIDIA install docs and have also checked many forums for a potential solution.


Solution

  • This has happened because you have a mixture of Ubuntu and Nvidia packages on your system. From the last line of your second log:

    kernel version 530.30.2 does not match DSO version 510.108.3 – cannot find working devices in this configuration 
    

    "510" is the latest Nvidia driver version in the official Ubuntu 22.04 repositories that works with CUDA. "530" is the Nvidia driver version you get when you use Nvidia's own driver repository. You have parts of both drivers installed, and they are not compatible.
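    The failure mode is mechanical: libcuda (the userspace library) must come from the same driver release as the kernel module, so you can spot this class of problem just by comparing the two versions the log reports. A minimal sketch, using the numbers from the log above:

```python
# Versions copied from the cuda_diagnostics log lines above
kernel_version = "530.30.2"    # "kernel reported version"
libcuda_version = "510.108.3"  # "libcuda reported version"

# Userspace and kernel driver components are only compatible within the
# same release, so differing major versions guarantee cuInit will fail.
kernel_major = int(kernel_version.split(".")[0])
libcuda_major = int(libcuda_version.split(".")[0])
print(kernel_major == libcuda_major)  # False -> mixed driver install
```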

    Nvidia's repository is notorious for causing problems like this, so it is recommended to never use it. You will need to remove all Nvidia repository references and purge all the packages they installed. The easiest way to do this is usually to do a full re-installation of Ubuntu.
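    If you want to try purging in place before resorting to a full re-install, the shape of it is roughly the following. The repository filenames and package-name patterns here are illustrative; check what is actually present in /etc/apt/sources.list.d/ on your machine first:

```shell
# Remove the Nvidia repository entries (filenames vary by how they were added)
sudo rm /etc/apt/sources.list.d/cuda*.list /etc/apt/sources.list.d/nvidia*.list
# Purge everything those repositories installed
sudo apt-get remove --purge '^nvidia-.*' '^cuda.*' '^libcudnn.*' '^libnvinfer.*'
sudo apt-get autoremove --purge
sudo apt-get update
```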

    After you have fully removed the Nvidia packages, re-install the nvidia-510 driver from "Additional Drivers" along with nvidia-cuda-toolkit and nvidia-cudnn apt packages from the official Ubuntu repositories.
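    On the command line this amounts to something like the following (package names as they appear in the Ubuntu 22.04 archives; "Additional Drivers" installs the same driver package graphically):

```shell
sudo apt-get install nvidia-driver-510 nvidia-cuda-toolkit nvidia-cudnn
# Reboot, then confirm the kernel module and userspace library agree
nvidia-smi
```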

    To get TensorRT working, download the tar installer (the first one in the list of downloads, called "TensorRT 8.6 EA for Linux x86_64 and CUDA 11.0, 11.1, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7 and 11.8 TAR Package"), unpack it to /opt and then add it to your LD_LIBRARY_PATH, eg:

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/TensorRT-8.6.0.12/lib
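    That export only lasts for the current shell, so append it to your ~/.bashrc to make it persistent, then re-check the import:

```shell
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/TensorRT-8.6.0.12/lib' >> ~/.bashrc
source ~/.bashrc
# The "TF-TRT Warning: Could not find TensorRT" line should be gone now
python3 -c "import tensorflow as tf"
```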

    You don't need to install the tensorrt Python package at all. Tensorflow uses the C++ libraries directly. And anyway, the tensorrt package only works with CUDA 12, which is only available on Ubuntu 22.04 if you use Nvidia's repositories.

    If you run into a problem where cuDNN is too old, then you should likewise download the cuDNN TAR package, unpack it in /opt, and add it to your LD_LIBRARY_PATH.
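    A sketch of that cuDNN route, assuming a hypothetical 8.x archive built for CUDA 11 (substitute the exact filename of whatever you download from Nvidia's cuDNN page):

```shell
# Archive name is illustrative -- match it to your actual download
sudo tar -xf cudnn-linux-x86_64-8.9.0.131_cuda11-archive.tar.xz -C /opt
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cudnn-linux-x86_64-8.9.0.131_cuda11-archive/lib
```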

    If you run into a problem where Tensorflow cannot find libdevice, you need to set up a symlink:

    sudo mkdir -p /usr/local/cuda/nvvm
    sudo ln -s /usr/lib/nvidia-cuda-toolkit/libdevice/ /usr/local/cuda/nvvm/libdevice
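    An alternative to the symlink, if you prefer not to touch /usr/local, is to point XLA at the toolkit directory directly via its --xla_gpu_cuda_data_dir flag (XLA looks for nvvm/libdevice under that directory):

```shell
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/lib/nvidia-cuda-toolkit
```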
    

    After performing these steps I was able to run Tensorflow on my RTX 2070S on Ubuntu 22.04. Unfortunately it now runs slower than the CPU version on my i7 6700. (This is probably because my model is very small.)