Error "Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice" in the TensorFlow Object Detection API

Windows version: Windows 10 Pro 21H2 19044.1706. GPU: RTX 2070.

import tensorflow as tf
import torch
print(torch.__version__) #1.10.1+cu113
print(torch.version.cuda) #11.3
print(tf.__version__) #2.9.1

When I run

python .\object_detection\builders\model_builder_tf2_test.py

I get the result 'Ran 24 tests in 18.279s OK (skipped=1)'.

But when I want to train my model, I use

feature_extractor {
   type: 'faster_rcnn_inception_resnet_v2_keras'
}

in my pipeline_config, and I run

python .\object_detection\model_main_tf2.py --logtostderr --pipeline_config_path=LOCATION_OF_MY_PIPECONFIG --model_dir=LOCATION_OF_MY_MODEL_DIR

I then get the error quoted in the title. 'CUDA_DIR' is set as a system environment variable and can be accessed.

Solution

I had the same problem and just fixed it. The library can't find the folder even if you set "CUDA_DIR", because it does not read that variable, or any of the others I tried. This post is helpful in understanding the issue. The only solution I was able to find is to copy the required files.

Steps for a quick fix:

1. Find the CUDA installation directory that contains the nvvm folder (for me it is "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6").
2. Find the working directory of your script (the directory you run the script from).
3. Copy the entire nvvm folder into that working directory. Your script should then work.
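
The steps above can also be scripted. This is only a minimal sketch, assuming Python 3.8 or newer; copy_nvvm is a hypothetical helper name of my own, not part of TensorFlow or CUDA, and the paths you pass in are whatever applies on your machine.

```python
import shutil
from pathlib import Path


def copy_nvvm(cuda_root, working_dir):
    """Copy <cuda_root>/nvvm into <working_dir>/nvvm and return the new path.

    Hypothetical helper: it only automates the manual copy described above.
    """
    source = Path(cuda_root) / "nvvm"
    # libdevice is the subfolder named in the error message, so check it first.
    if not (source / "libdevice").is_dir():
        raise FileNotFoundError(f"No libdevice folder under {source}")
    target = Path(working_dir) / "nvvm"
    # dirs_exist_ok (Python 3.8+) lets the copy be re-run without failing.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

For my installation the call would be copy_nvvm(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6", "."), run from the directory where the training script is started.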

This is not a great solution, but until someone posts a better answer you can at least run your code.