python machine-learning cuda rapids

rapids cannot import cudf: Error at driver init: Call to cuInit results in CUDA_ERROR_NO_DEVICE (100)


To install RAPIDS, I have already installed WSL2.

But I still get the following error when I import cudf:

/home/zy-wsl/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/cudf/utils/_ptxcompiler.py:61: UserWarning: Error getting driver and runtime versions:

stdout:



stderr:

Traceback (most recent call last):
  File "/home/zy-wsl/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 258, in ensure_initialized
    self.cuInit(0)
  File "/home/zy-wsl/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 331, in safe_cuda_api_call
    self._check_ctypes_error(fname, retcode)
  File "/home/zy-wsl/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 399, in _check_ctypes_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [100] Call to cuInit results in CUDA_ERROR_NO_DEVICE

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 4, in <module>
  File "/home/zy-wsl/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 296, in __getattr__
    self.ensure_initialized()
  File "/home/zy-wsl/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 262, in ensure_initialized
    raise CudaSupportError(f"Error at driver init: {description}")
...


Not patching Numba
  warnings.warn(msg, UserWarning)
Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...
---------------------------------------------------------------------------
CudaSupportError                          Traceback (most recent call last)
/mnt/d/learn-rapids/Untitled.ipynb Cell 4 line 1
----> 1 import cudf

File ~/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/cudf/__init__.py:26
     20 from cudf.api.extensions import (
     21     register_dataframe_accessor,
     22     register_index_accessor,
     23     register_series_accessor,
     24 )
     25 from cudf.api.types import dtype
---> 26 from cudf.core.algorithms import factorize
     27 from cudf.core.cut import cut
     28 from cudf.core.dataframe import DataFrame, from_dataframe, from_pandas, merge

File ~/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/cudf/core/algorithms.py:10
      8 from cudf.core.copy_types import BooleanMask
      9 from cudf.core.index import RangeIndex, as_index
---> 10 from cudf.core.indexed_frame import IndexedFrame
     11 from cudf.core.scalar import Scalar
     12 from cudf.options import get_option

File ~/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/cudf/core/indexed_frame.py:59
     57 from cudf.core.dtypes import ListDtype
...
    302 if USE_NV_BINDING:
    303     return self._cuda_python_wrap_fn(fname)

CudaSupportError: Error at driver init: 
Call to cuInit results in CUDA_ERROR_NO_DEVICE (100):

I tried the latest install command below:

conda create --solver=libmamba -n rapids-23.12 -c rapidsai-nightly -c conda-forge -c nvidia  \
    cudf=23.12 cuml=23.12 python=3.10 cuda-version=12.0 \
    jupyterlab

nvidia-smi output:

| NVIDIA-SMI 545.23.05              Driver Version: 545.84       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A6000               On  | 00000000:01:00.0  On |                  Off |
| 30%   53C    P3              54W / 300W |   1783MiB / 49140MiB |     10%      Default |
|                                         |                      |                  N/A

cudf is also present in the conda env:

cudf                      23.12.00a       cuda12_py310_231028_g2a923dfff8_124    rapidsai-nightly
cuml                      23.12.00a       cuda12_py310_231028_gff635fc25_31    rapidsai-nightly

I also ran numba -s in the WSL env and got the following:

__CUDA Information__
CUDA Device Initialized                       : False
CUDA Driver Version                           : ?
CUDA Runtime Version                          : ?
CUDA NVIDIA Bindings Available                : ?
CUDA NVIDIA Bindings In Use                   : ?
CUDA Minor Version Compatibility Available    : ?
CUDA Minor Version Compatibility Needed       : ?
CUDA Minor Version Compatibility In Use       : ?
CUDA Detect Output:
None
CUDA Libraries Test Output:
None

__Warning log__
Warning (cuda): CUDA device initialisation problem. Message:Error at driver init: Call to cuInit results in CUDA_ERROR_NO_DEVICE (100)
Exception class: <class 'numba.cuda.cudadrv.error.CudaSupportError'>
Warning (no file): /sys/fs/cgroup/cpuacct/cpu.cfs_quota_us
Warning (no file): /sys/fs/cgroup/cpuacct/cpu.cfs_period_us

It seems CUDA is not initialized in WSL, but when I run the same command in a Windows command prompt, it returns:

__CUDA Information__
CUDA Device Initialized                       : True
CUDA Driver Version                           : ?
CUDA Runtime Version                          : ?
CUDA NVIDIA Bindings Available                : ?
CUDA NVIDIA Bindings In Use                   : ?
CUDA Minor Version Compatibility Available    : ?
CUDA Minor Version Compatibility Needed       : ?
CUDA Minor Version Compatibility In Use       : ?
CUDA Detect Output:
Found 1 CUDA devices
id 0     b'NVIDIA RTX A6000'                              [SUPPORTED]
                      Compute Capability: 8.6
                           PCI Device ID: 0
                              PCI Bus ID: 1
                                    UUID: GPU-17e7be94-251e-a2d9-3924-d167c0e59a56
                                Watchdog: Enabled
                            Compute Mode: WDDM
             FP32/FP64 Performance Ratio: 32
Summary:
        1/1 devices are supported

CUDA Libraries Test Output:
None
__Warning log__
Warning (cuda): Probing CUDA failed (device and driver present, runtime problem?)
(cuda) <class 'FileNotFoundError'>: Could not find module 'cudart.dll' (or one of its dependencies). Try using the full path with constructor syntax.

Solution

  • The problem has been solved. Inside the WSL instance, open .bashrc in nano:

    nano ~/.bashrc
    

    Add the following lines:

    export LD_LIBRARY_PATH="/usr/lib/wsl/lib/"  
    export NUMBA_CUDA_DRIVER="/usr/lib/wsl/lib/libcuda.so.1"
    

    And then:

    source ~/.bashrc
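
To confirm the change took effect, a quick check from Python may help before importing cudf again. This is only a minimal sketch and not part of the original answer: check_cuda_driver is a hypothetical helper, and it assumes the driver library sits at the path used in the exports above. It loads libcuda directly with ctypes and calls cuInit, the same driver call that failed in the traceback.

```python
import ctypes
import os

# Path taken from the exports above (assumption: adjust if your WSL install differs).
WSL_LIBCUDA = "/usr/lib/wsl/lib/libcuda.so.1"


def check_cuda_driver(path=WSL_LIBCUDA):
    """Hypothetical helper: return a short status string for the driver at `path`."""
    if not os.path.isfile(path):
        return f"missing: {path}"
    try:
        lib = ctypes.CDLL(path)
    except OSError as exc:
        return f"load failed: {exc}"
    # CUresult cuInit(unsigned int Flags); 0 is CUDA_SUCCESS,
    # 100 is CUDA_ERROR_NO_DEVICE (the code seen in the question).
    lib.cuInit.argtypes = [ctypes.c_uint]
    lib.cuInit.restype = ctypes.c_int
    code = lib.cuInit(0)
    return "ok" if code == 0 else f"cuInit failed with code {code}"


if __name__ == "__main__":
    # Show what the current shell actually exported, then probe the driver.
    print("LD_LIBRARY_PATH   =", os.environ.get("LD_LIBRARY_PATH"))
    print("NUMBA_CUDA_DRIVER =", os.environ.get("NUMBA_CUDA_DRIVER"))
    print("driver check      :", check_cuda_driver())
```

If the driver check does not report ok, or the two variables print as None, the shell probably has not picked up the new .bashrc yet. Opening a new terminal and restarting the Jupyter kernel from it may be needed before import cudf works.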