I have been struggling to install all the drivers required for the tensorflow-gpu library. I want to compile my model using the GPU instead of the CPU. I am using Linux Mint. This is my neofetch output:
OS: Linux Mint 21.3 x86_64
Kernel: 5.15.0-101-generic
Uptime: 2 hours, 33 mins
Packages: 3307 (dpkg), 13 (flatpak)
Shell: bash 5.1.16
Resolution: 1920x1080
DE: Cinnamon
WM: Mutter (Muffin)
WM Theme: WhiteSur-Dark (Sweet-Dark-v40)
Theme: Sweet-Dark-v40 [GTK2/3]
Icons: candy-icons [GTK2/3]
Terminal: gnome-terminal
CPU: Intel i5-3570 (4) @ 3.800GHz
GPU: NVIDIA GeForce GTX 1060 6GB
Memory: 2070MiB / 7883MiB
And this is my nvidia-smi output:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 6GB    Off |   00000000:01:00.0  On |                  N/A |
| 25%   40C    P8              7W /  120W |     316MiB /   6144MiB |     11%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
| GPU   GI   CI        PID   Type   Process name                               GPU Memory |
|       ID   ID                                                                Usage      |
|=========================================================================================|
|    0   N/A  N/A     1142     G    /usr/lib/xorg/Xorg                             148MiB |
|    0   N/A  N/A     1881     G    cinnamon                                        45MiB |
|    0   N/A  N/A     9746     G    /app/extra/viber/Viber                          27MiB |
|    0   N/A  N/A    14935     G    ...seed-version=20240322-165906.502000          90MiB |
+-----------------------------------------------------------------------------------------+
I also installed TensorRT, cuDNN and tensorflow-gpu. Most installations were made using pip. Here is my TensorRT version:
import tensorrt
print(tensorrt.__version__)  # prints 8.6.1
assert tensorrt.Builder(tensorrt.Logger())
The error I am receiving is the following:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-03-25 12:49:24.151959: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-25 12:49:24.939265: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-25 12:49:24.973806: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
I am unsure whether the problem is a version mismatch or the library path. When I echo the path, I get:
echo $LD_LIBRARY_PATH
:/home/vuk/miniconda3/lib/python3.1/site-packages/tensorrt_libs
This has been bugging me for months now...
I tried installing and uninstalling the libraries multiple times, configuring the path, and installing TensorFlow GPU through Docker, but nothing has worked so far. The issue might be a mismatch between the versions of the libraries I am using, but I am unsure.
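One way to narrow down whether this is a path problem or a missing library is to try loading each shared library directly, which is roughly what the "Cannot dlopen some GPU libraries" warning refers to. The following is a minimal sketch, not an official diagnostic: the helper name `check_libs` is hypothetical, and the sonames in `CANDIDATE_LIBS` are assumptions for a CUDA 12 / cuDNN 8 build, so adjust them to whatever your TensorFlow build expects.

```python
import ctypes

# Assumed sonames for a CUDA 12 / cuDNN 8 build of TensorFlow.
# These are illustrative; other builds expect other versions.
CANDIDATE_LIBS = [
    "libcudart.so.12",
    "libcublas.so.12",
    "libcufft.so.11",
    "libcusparse.so.12",
    "libcudnn.so.8",
]


def check_libs(names):
    """Return {soname: bool}, True when the dynamic loader can open it."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # uses the same search rules as dlopen()
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in check_libs(CANDIDATE_LIBS).items():
        print(f"{name}: {'found' if ok else 'not found'}")
```

If a library shows as not found here even though it is installed, its directory is probably not on the loader's search path, which points at LD_LIBRARY_PATH rather than at a version mismatch.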
I managed to solve this by creating two bash scripts for my conda environment. Inside your conda environment directory, go to the etc/conda/activate.d and etc/conda/deactivate.d directories, creating them if they do not exist. Conda sources the scripts in these directories whenever the environment is activated or deactivated. Then create a script file (e.g., set_env_vars.sh) in each of the two directories.
The first one, in activate.d, looks like this:
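#!/bin/sh
# Directory holding the pip-installed nvidia packages (two levels above nvidia.cudnn's __file__).
export NVIDIA_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)")))
# Prepend every nvidia/*/lib directory, colon-separated, to any existing LD_LIBRARY_PATH.
export LD_LIBRARY_PATH=$(echo ${NVIDIA_DIR}/*/lib/ | sed -r 's/\s+/:/g')${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}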
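The LD_LIBRARY_PATH line is dense, so here is a rough Python equivalent of what it computes, for illustration only. The function name `build_ld_library_path` is hypothetical, and unlike the shell version this sketch drops the trailing slash on each entry and skips matches that are not directories.

```python
import glob
import os


def build_ld_library_path(nvidia_dir, existing=""):
    """Join every <nvidia_dir>/*/lib directory with ':' and append `existing`.

    Mirrors the glob + sed pipeline in the activate script: the glob is
    sorted alphabetically, as the shell does, and the previous value is
    only appended when it is non-empty.
    """
    pattern = os.path.join(nvidia_dir, "*", "lib")
    parts = [d for d in sorted(glob.glob(pattern)) if os.path.isdir(d)]
    if existing:
        parts.append(existing)
    return ":".join(parts)


if __name__ == "__main__":
    # NVIDIA_DIR is only set after the activate script has run.
    nvidia_dir = os.environ.get("NVIDIA_DIR", "")
    if nvidia_dir:
        print(build_ld_library_path(nvidia_dir))
```

Appending the old value only when it is non-empty matters: a leading or trailing colon creates an empty entry, which the dynamic loader treats as the current directory.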
The second one, in deactivate.d, contains:
#!/bin/sh
unset NVIDIA_DIR
# Note: this also discards any LD_LIBRARY_PATH value that was set before activation.
unset LD_LIBRARY_PATH
Then I made both scripts executable with:
chmod +x /path/to/your/conda/env/etc/conda/activate.d/set_env_vars.sh
chmod +x /path/to/your/conda/env/etc/conda/deactivate.d/set_env_vars.sh
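After re-activating the environment, it can be worth confirming that LD_LIBRARY_PATH really points at existing directories before re-running the TensorFlow GPU check from the question. A small sketch, with hypothetical helper names `split_ld_library_path` and `missing_entries`:

```python
import os


def split_ld_library_path(value):
    """Split a colon-separated LD_LIBRARY_PATH value, dropping empty entries."""
    return [entry for entry in value.split(":") if entry]


def missing_entries(value):
    """Return the entries of `value` that are not existing directories."""
    return [e for e in split_ld_library_path(value) if not os.path.isdir(e)]


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print("entries:", split_ld_library_path(current))
    print("missing:", missing_entries(current))
```

Any path listed under "missing" is ignored by the loader, so a typo or a stale directory there would leave the GPU libraries undiscoverable.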