
CUDA version issue while using Detectron2 in Google Colab


I am trying to run Detectron2 on Google Colab with CUDA 10.0, but since today I have been running into problems caused by mismatched CUDA compiler versions.

The output of !nvidia-smi is:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.36.06    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   36C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                 ERR! |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

And the output of !nvcc --version is:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

I don't understand the reason for the mismatch. The output of !python -m detectron2.utils.collect_env is:

----------------------  ----------------------------------------------------------------------------
sys.platform            linux
Python                  3.6.9 (default, Apr 18 2020, 01:56:04) [GCC 8.4.0]
numpy                   1.18.5
detectron2              0.1.3 @/content/gdrive/My Drive/Data/Table_Struct/detectron2_repo/detectron2
Compiler                GCC 7.5
CUDA compiler           CUDA 10.1
detectron2 arch flags   sm_60
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 1.4.0+cu100 @/usr/local/lib/python3.6/dist-packages/torch
PyTorch debug build     False
GPU available           True
GPU 0                   Tesla K80
CUDA_HOME               /usr/local/cuda
Pillow                  7.0.0
torchvision             0.5.0+cu100 @/usr/local/lib/python3.6/dist-packages/torchvision
torchvision arch flags  sm_35, sm_50, sm_60, sm_70, sm_75
fvcore                  0.1.1
cv2                     4.1.2
----------------------  ----------------------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CUDA Runtime 10.0
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.3
  - Magma 2.5.1
  - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 
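
In a report like this, the two entries worth comparing are the "CUDA compiler" row (the toolkit the Detectron2 extension was compiled with) and the "CUDA Runtime" line under "PyTorch built with". A minimal sketch of that comparison follows. The function names cuda_versions and is_mismatched are hypothetical helpers, not part of Detectron2, and the regular expressions assume the report layout shown above.

```python
import re


def cuda_versions(env_text):
    """Return (detectron2 compiler version, PyTorch runtime version) as strings.

    Either element is None if the corresponding line is not found.
    Assumes the collect_env layout shown in this question.
    """
    compiler = re.search(
        r"^\s*CUDA compiler\s+CUDA\s+(\d+\.\d+)", env_text, re.MULTILINE
    )
    runtime = re.search(r"CUDA Runtime\s+(\d+\.\d+)", env_text)
    return (
        compiler.group(1) if compiler else None,
        runtime.group(1) if runtime else None,
    )


def is_mismatched(env_text):
    """True only when both versions were found and they differ."""
    compiler, runtime = cuda_versions(env_text)
    return compiler is not None and runtime is not None and compiler != runtime


# Three lines copied from the report above.
report = """\
CUDA compiler           CUDA 10.1
PyTorch                 1.4.0+cu100 @/usr/local/lib/python3.6/dist-packages/torch
  - CUDA Runtime 10.0
"""

print(cuda_versions(report))
print(is_mismatched(report))
```

For the report above this flags a mismatch: the extension was compiled with CUDA 10.1 while PyTorch was built against the CUDA 10.0 runtime.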

My guess is that the CUDA version on Colab doesn't match the one my Detectron2 build was compiled with. If so, what can I change to make this work on Google Colab?
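
One point that may help when reading the two command outputs: the "CUDA Version" field in the nvidia-smi header reports the highest CUDA version the installed driver supports, while nvcc --version reports the toolkit that is actually installed, so the two numbers can legitimately differ. The sketch below pulls both numbers out of the text shown above. The helper names are hypothetical, and the patterns assume the output formats in this question.

```python
import re


def nvcc_release(nvcc_output):
    """Return the toolkit release (e.g. '10.0') from `nvcc --version` text, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def driver_cuda_version(smi_output):
    """Return the 'CUDA Version' field from the `nvidia-smi` header, or None."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_output)
    return match.group(1) if match else None


# Lines copied from the outputs in this question.
nvcc_text = "Cuda compilation tools, release 10.0, V10.0.130"
smi_text = "| NVIDIA-SMI 450.36.06    Driver Version: 418.67       CUDA Version: 10.1     |"

print("toolkit (nvcc):", nvcc_release(nvcc_text))
print("driver supports up to:", driver_cuda_version(smi_text))
```

In a Colab cell, the same functions could be fed the captured output of the two commands instead of the hard-coded strings.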


Solution

  • The problem was the CUDA version that Detectron2 had been compiled against. Once I recompiled Detectron2, the error was resolved.

    Here is the output of !python -m detectron2.utils.collect_env after recompiling:

    ----------------------  ----------------------------------------------------------------------------
    sys.platform            linux
    Python                  3.6.9 (default, Apr 18 2020, 01:56:04) [GCC 8.4.0]
    numpy                   1.18.5
    detectron2              0.1.3 @/content/gdrive/My Drive/Data/Table_Struct/detectron2_repo/detectron2
    Compiler                GCC 7.5
    CUDA compiler           CUDA 10.0
    detectron2 arch flags   sm_75
    DETECTRON2_ENV_MODULE   <not set>
    PyTorch                 1.4.0+cu100 @/usr/local/lib/python3.6/dist-packages/torch
    PyTorch debug build     False
    GPU available           True
    GPU 0                   Tesla T4
    CUDA_HOME               /usr/local/cuda
    Pillow                  7.0.0
    torchvision             0.5.0+cu100 @/usr/local/lib/python3.6/dist-packages/torchvision
    torchvision arch flags  sm_35, sm_50, sm_60, sm_70, sm_75
    fvcore                  0.1.1
    cv2                     4.1.2
    ----------------------  ----------------------------------------------------------------------------
    PyTorch built with:
      - GCC 7.3
      - Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
      - Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
      - OpenMP 201511 (a.k.a. OpenMP 4.5)
      - NNPACK is enabled
      - CUDA Runtime 10.0
      - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
      - CuDNN 7.6.3
      - Magma 2.5.1
      - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
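
The answer does not show how the recompilation was done. One possible way, sketched here under assumptions, is to delete the stale build products from the local clone and then reinstall it in editable mode, which is the approach the Detectron2 installation notes describe for rebuilding a local clone. The helpers clean_build_artifacts and reinstall_command are hypothetical names, and the sketch assumes Detectron2 was installed from the clone at the path shown in the report.

```python
import shutil
import sys
from pathlib import Path


def clean_build_artifacts(repo_dir):
    """Delete build/ and compiled *.so files under repo_dir; return removed paths."""
    repo = Path(repo_dir)
    removed = []
    build_dir = repo / "build"
    if build_dir.is_dir():
        shutil.rmtree(build_dir)
        removed.append(build_dir)
    # Compiled extension modules left over from the previous build.
    for so_file in sorted(repo.rglob("*.so")):
        so_file.unlink()
        removed.append(so_file)
    return removed


def reinstall_command(repo_dir):
    """Build the pip command for an editable reinstall of the local clone."""
    return [sys.executable, "-m", "pip", "install", "-e", str(repo_dir)]


# Usage (not executed here; the path is taken from the report above):
#   import subprocess
#   repo = "/content/gdrive/My Drive/Data/Table_Struct/detectron2_repo"
#   clean_build_artifacts(repo)
#   subprocess.run(reinstall_command(repo), check=True)
```

After a rebuild like this, the "CUDA compiler" row in the collect_env report should match the CUDA runtime PyTorch was built with, as it does in the output above. Since Colab can assign a different GPU in each session (the reports above show a K80 and a T4), a rebuild may be needed again whenever the "detectron2 arch flags" row no longer covers the current GPU.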