Tags: gpu, google-colaboratory, tensorflow2.0, python-3.10, mask-rcnn

TensorFlow 2.14.0 Fails to Detect GPU on Google Colab


I'm trying to use Mask-RCNN with a GPU on Google Colab because I need a more powerful GPU than my local machine has; I've even subscribed to Google Colab Pro for this purpose. I've implemented a repository that runs Mask-RCNN on TensorFlow 2.14.0 and Python 3.10.12, versions that should be compatible with Google Colab: https://github.com/z-mahmud22/Mask-RCNN_TF2.14.0

However, I'm facing an issue where TensorFlow 2.14.0 fails to detect the GPU in Google Colab.

I verified with nvidia-smi that the GPU is available; it reports CUDA version 12.2. The other GPU runtime options also report CUDA 12.2.
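A note on reading that output: the "CUDA Version" field in the nvidia-smi header is the highest CUDA version the installed driver supports, not necessarily the version of the CUDA toolkit or libraries that TensorFlow actually loads. A minimal sketch for pulling both numbers out of the header (the function name is hypothetical, and it assumes the usual "Driver Version: ... CUDA Version: ..." header layout):

```python
import re
import subprocess


def parse_nvidia_smi_header(text: str) -> dict:
    """Extract the driver version and the driver's maximum supported CUDA version."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    return {
        "driver_version": driver.group(1) if driver else None,
        # This is what the driver supports, not the installed toolkit version.
        "driver_max_cuda": cuda.group(1) if cuda else None,
    }


if __name__ == "__main__":
    # Needs a GPU runtime; on a CPU-only runtime nvidia-smi is typically missing.
    try:
        out = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=True
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print("nvidia-smi unavailable:", exc)
    else:
        print(parse_nvidia_smi_header(out))
```

Because a newer driver can run applications built against an older CUDA release, a driver reporting 12.2 does not by itself rule out a CUDA 11.x build of TensorFlow.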

When I run the following code to check the GPU availability in TensorFlow:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

The output is always:

Num GPUs Available: 0

When I run the training code, it is no faster than running it on the CPU, which confirms that the GPU is not being used.
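When TensorFlow cannot load the CUDA libraries it was built against, it logs a warning and skips registering the GPU, so `tf.config.list_physical_devices('GPU')` comes back empty and training runs on the CPU. One way to see which libraries the loader cannot find is to try opening them with `ctypes`. This is a minimal sketch; the sonames in `CUDA11_LIBS` are an assumption for a CUDA 11.x / cuDNN 8 build and are not TensorFlow's exact list:

```python
import ctypes

# Assumed sonames for a CUDA 11.x / cuDNN 8 build (illustrative, not exhaustive).
CUDA11_LIBS = ["libcudart.so.11.0", "libcublas.so.11", "libcudnn.so.8"]


def missing_libraries(names):
    """Return the shared libraries the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError if the library cannot be found/loaded
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing:", missing_libraries(CUDA11_LIBS) or "none")
```

If this reports CUDA 11 libraries as missing on a runtime that only ships CUDA 12, that is consistent with the GPU being invisible to TensorFlow 2.14.0.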

The setup details are:

  • Environment: Google Colab (Pro)

  • TensorFlow version: 2.14.0

  • CUDA version: 12.2 (as shown by nvidia-smi)

  • Python version: 3.10.12

Is this issue caused by an incompatibility between TensorFlow 2.14.0 and CUDA 12.2? If so, what would be a viable way to train Mask-RCNN on Google Colab?
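To make the version question concrete, the check can be written down as a small function. The `TESTED_BUILDS` values below are an assumption transcribed from TensorFlow's published "tested build configurations" table (for example CUDA 11.8 / cuDNN 8.7 for 2.14.0 and CUDA 12.2 / cuDNN 8.9 for 2.15.0) and should be verified against the official page; `diagnose` is a hypothetical helper, not part of TensorFlow:

```python
# tf_version: (cuda, cudnn). Assumed values; verify against TensorFlow's official table.
TESTED_BUILDS = {
    "2.12.0": ("11.8", "8.6"),
    "2.13.0": ("11.8", "8.6"),
    "2.14.0": ("11.8", "8.7"),
    "2.15.0": ("12.2", "8.9"),
}


def parse_version(text):
    """'12.2' -> (12, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def diagnose(tf_version, driver_max_cuda, installed_cuda):
    """List likely reasons a TF GPU build cannot use the GPU (empty list = looks OK)."""
    if tf_version not in TESTED_BUILDS:
        return [f"no tested CUDA version recorded for TensorFlow {tf_version}"]
    build_cuda, build_cudnn = TESTED_BUILDS[tf_version]
    problems = []
    # Compare majors only: CUDA 11+ drivers roughly support newer minors of the same major.
    if parse_version(driver_max_cuda)[0] < parse_version(build_cuda)[0]:
        problems.append(
            f"driver supports CUDA {driver_max_cuda}, older than the CUDA {build_cuda} "
            f"that TF {tf_version} was built with"
        )
    # The runtime libraries must come from the same CUDA major release as the build.
    if parse_version(installed_cuda)[0] != parse_version(build_cuda)[0]:
        problems.append(
            f"installed CUDA {installed_cuda} libraries, but TF {tf_version} expects "
            f"CUDA {build_cuda} (with cuDNN {build_cudnn})"
        )
    return problems


if __name__ == "__main__":
    # TF 2.14.0 and driver CUDA 12.2 come from the question; the installed toolkit
    # version (12.2 here) is an assumption, check it with `nvcc --version`.
    for message in diagnose("2.14.0", "12.2", "12.2"):
        print("-", message)
```

Under these assumptions the driver is fine and the mismatch is between the CUDA 12 libraries on the runtime and the CUDA 11.8 libraries that TensorFlow 2.14.0 expects, which is the mismatch the steps below address.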


Solution

  • I've posted a detailed answer in the issue you opened on my repo. Here I'm only summarizing the steps to downgrade to CUDA 11.8; after that, you can use the repo directly on Google Colab. Please follow these steps:

    1. Remove any existing CUDA installation:

      !sudo apt-get --allow-change-held-packages --purge remove "*cublas*" "cuda*" "nsight*"
      
    2. Install CUDA 11.8 via:

      !wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
      !chmod +x cuda_11.8.0_520.61.05_linux.run
      !./cuda_11.8.0_520.61.05_linux.run --silent --toolkit
      
    3. Set environment variables to point to the new CUDA installation:

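      import os
      # Put the CUDA 11.8 binaries (nvcc, etc.) first on PATH
      os.environ['PATH'] = "/usr/local/cuda-11.8/bin:" + os.environ['PATH']
      os.environ['CUDA_HOME'] = "/usr/local/cuda-11.8"
      # Prepend the CUDA 11.8 libraries while keeping any existing entries
      os.environ['LD_LIBRARY_PATH'] = (
          "/usr/local/cuda-11.8/lib64:" + os.environ.get('LD_LIBRARY_PATH', '')
      )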
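After these steps, a quick sanity check is to list the versioned CUDA directories under `/usr/local` and ask `nvcc` which release it is. A minimal sketch with hypothetical helper names; it assumes the runfile installed into `/usr/local/cuda-11.8` as in step 2. One caveat: changes to `os.environ` are inherited by child processes (`subprocess.run`, `!` commands), but the dynamic loader of an already-running process reads `LD_LIBRARY_PATH` when that process starts, so libraries TensorFlow opens inside the current kernel may not pick up the new value.

```python
import glob
import os
import re
import subprocess


def installed_cuda_dirs(root="/usr/local"):
    """List versioned CUDA toolkit directories such as /usr/local/cuda-11.8."""
    return sorted(glob.glob(os.path.join(root, "cuda-*")))


def nvcc_release(text):
    """Pull the 'release X.Y' number out of `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print("CUDA toolkits:", installed_cuda_dirs())
    try:
        # subprocess resolves "nvcc" via os.environ["PATH"], so step 3's directory wins.
        out = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=True
        ).stdout
        print("nvcc release:", nvcc_release(out))
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print("nvcc not runnable:", exc)
```

If the steps worked, `/usr/local/cuda-11.8` should appear in the list and `nvcc_release` should return `"11.8"`.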