python cuda cuml wsl2

CUDA_ERROR_NO_DEVICE in WSL2


I am trying to run cuML in WSL2 on Windows.

  • Ubuntu 22.04
  • amd64
  • NVIDIA RTX A2000 8GB Laptop GPU
  • CUDA 12.6

main.py

import logging

import cudf
from cuml.ensemble import RandomForestClassifier
from cuml.preprocessing import StandardScaler
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

logger = logging.getLogger(__name__)


def main() -> None:
    # Load the iris dataset
    iris = load_iris()
    x = iris.data
    y = iris.target

    # Split the data
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.2, random_state=42
    )

    # Convert to cuDF DataFrames
    X_train_cudf = cudf.DataFrame(x_train)
    X_test_cudf = cudf.DataFrame(x_test)
    y_train_cudf = cudf.Series(y_train)
    y_test_cudf = cudf.Series(y_test)

    # Scale the features
    scaler = StandardScaler()
    x_train_scaled = scaler.fit_transform(X_train_cudf)
    x_test_scaled = scaler.transform(X_test_cudf)

    # Create and train the model
    rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
    rf_classifier.fit(x_train_scaled, y_train_cudf)

    # Make predictions
    y_pred_cudf = rf_classifier.predict(x_test_scaled)

    # Convert predictions back to CPU for evaluation
    y_pred = y_pred_cudf.values_host
    y_test = y_test_cudf.values_host

    # Print results
    logger.info("cuML Results:")
    logger.info(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
    logger.info("\nClassification Report:")
    logger.info(classification_report(y_test, y_pred, target_names=iris.target_names))


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s - %(levelname)s - %(message)s",
    )
    main()

pyproject.toml

[project]
name = "hm-cuml"
version = "1.0.0"
requires-python = "~=3.12.0"
dependencies = [
  "cudf-cu12==24.12.0",
  "cuml-cu12==24.12.0",
  "scikit-learn==1.6.1",
]

Currently, running uv run main.py gives this error:

hm-cuml/.venv/lib/python3.12/site-packages/cudf/utils/_ptxcompiler.py:64: UserWarning: Error getting driver and runtime versions:

stdout:

stderr:

Traceback (most recent call last):
  File "hm-cuml/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/cudadrv/driver.py", line 254, in ensure_initialized
    self.cuInit(0)
  File "hm-cuml/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/cudadrv/driver.py", line 304, in safe_cuda_api_call
    self._check_ctypes_error(fname, retcode)
  File "hm-cuml/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/cudadrv/driver.py", line 372, in _check_ctypes_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [100] Call to cuInit results in CUDA_ERROR_NO_DEVICE

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 4, in <module>
  File "hm-cuml/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/cudadrv/driver.py", line 269, in __getattr__
    self.ensure_initialized()
  File "hm-cuml/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/cudadrv/driver.py", line 258, in ensure_initialized
    raise CudaSupportError(f"Error at driver init: {description}")
numba.cuda.cudadrv.error.CudaSupportError: Error at driver init: Call to cuInit results in CUDA_ERROR_NO_DEVICE (100)


Not patching Numba
  warnings.warn(msg, UserWarning)
Traceback (most recent call last):
  File "hm-cuml/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/cudadrv/driver.py", line 254, in ensure_initialized
    self.cuInit(0)
  File "hm-cuml/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/cudadrv/driver.py", line 304, in safe_cuda_api_call
    self._check_ctypes_error(fname, retcode)
  File "hm-cuml/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/cudadrv/driver.py", line 372, in _check_ctypes_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [100] Call to cuInit results in CUDA_ERROR_NO_DEVICE

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "hm-cuml/src/main.py", line 58, in <module>
    main()
  File "hm-cuml/src/main.py", line 32, in main
    x_train_scaled = scaler.fit_transform(X_train_cudf)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/cuml/_thirdparty/sklearn/utils/skl_dependencies.py", line 162, in fit_transform
    return self.fit(X, **fit_params).transform(X)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
    ret = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/cuml/_thirdparty/sklearn/preprocessing/_data.py", line 678, in fit
    return self.partial_fit(X, y)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
    ret = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/cuml/_thirdparty/sklearn/preprocessing/_data.py", line 707, in partial_fit
    X = self._validate_data(X, accept_sparse=('csr', 'csc'),
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/cuml/_thirdparty/sklearn/utils/skl_dependencies.py", line 111, in _validate_data
    X = check_array(X, **check_params)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/cuml/thirdparty_adapters/adapters.py", line 322, in check_array
    X, n_rows, n_cols, dtype = input_to_cupy_array(
                               ^^^^^^^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/nvtx/nvtx.py", line 116, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/cuml/internals/input_utils.py", line 465, in input_to_cupy_array
    X = X.values
        ^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/cudf/utils/performance_tracking.py", line 51, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/cudf/core/frame.py", line 420, in values
    return self.to_cupy()
           ^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/cudf/utils/performance_tracking.py", line 51, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/cudf/core/frame.py", line 542, in to_cupy
    return self._to_array(
           ^^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/cudf/utils/performance_tracking.py", line 51, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/cudf/core/frame.py", line 507, in _to_array
    matrix[:, i] = to_array(col, dtype)
                   ^^^^^^^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/cudf/core/frame.py", line 471, in to_array
    array = get_array(col)
            ^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/cudf/core/frame.py", line 543, in <lambda>
    lambda col: col.values,
                ^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/cudf/core/column/column.py", line 233, in values
    return cupy.asarray(self.data_array_view(mode="write"))
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/cudf/core/column/column.py", line 135, in data_array_view
    return cuda.as_cuda_array(obj).view(self.dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/api.py", line 76, in as_cuda_array
    return from_cuda_array_interface(obj.__cuda_array_interface__,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/cudadrv/devices.py", line 231, in _require_cuda_context
    with _runtime.ensure_context():
         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hongbo-miao/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/cudadrv/devices.py", line 121, in ensure_context
    with driver.get_active_context():
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/cudadrv/driver.py", line 472, in __enter__
    driver.cuCtxGetCurrent(byref(hctx))
    ^^^^^^^^^^^^^^^^^^^^^^
  File "hm-cuml/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/cudadrv/driver.py", line 269, in __getattr__
    self.ensure_initialized()
  File "hm-cuml/.venv/lib/python3.12/site-packages/numba_cuda/numba/cuda/cudadrv/driver.py", line 258, in ensure_initialized
    raise CudaSupportError(f"Error at driver init: {description}")
numba.cuda.cudadrv.error.CudaSupportError: Error at driver init: Call to cuInit results in CUDA_ERROR_NO_DEVICE (100)

Many people have reported the same issue here: https://forums.developer.nvidia.com/t/installation-on-wsl2-windows-11-problem-cant-see-gpu/237895/9 but no solution was given.

Any guidance would be appreciated!


Solution

  • Correct Solution - Uninstall Incorrect Driver

    Graham Markall from NVIDIA pointed out on GitHub that export LD_LIBRARY_PATH=/usr/lib/wsl/lib is only a workaround. Thanks!

    In my case, running this in WSL2:

    from numba import cuda
    cuda.cudadrv.libs.test()
    

    prints:

    Finding driver from candidates:
            libcuda.so
            libcuda.so.1
            /usr/lib/libcuda.so
            /usr/lib/libcuda.so.1
            /usr/lib64/libcuda.so
            /usr/lib64/libcuda.so.1
    Using loader <class 'ctypes.CDLL'>
            Trying to load driver...        ok
                    Loaded from libcuda.so
            Mapped libcuda.so paths:
                    /usr/lib/x86_64-linux-gnu/libcuda.so.535.183.01
    Finding nvvm from System
            Located at /usr/local/cuda/nvvm/lib64/libnvvm.so.4.0.0
            Trying to open library...       ok
    Finding nvrtc from System
            Located at /usr/local/cuda/lib64/libnvrtc.so.12.6.85
            Trying to open library...       ok
    Finding cudart from System
            Located at /usr/local/cuda/lib64/libcudart.so.12.6.77
            Trying to open library...       ok
    Finding cudadevrt from System
            Located at /usr/local/cuda/lib64/libcudadevrt.a
            Checking library...     ok
    Finding libdevice from System
            Located at /usr/local/cuda/nvvm/libdevice/libdevice.10.bc
            Checking library...     ok
    

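    The mapped path /usr/lib/x86_64-linux-gnu/libcuda.so.535.183.01 means the Linux x64 (AMD64/EM64T) Display Driver was installed inside WSL2, but it should not be. In WSL2, libcuda.so is supplied by the Windows NVIDIA driver and exposed under /usr/lib/wsl/lib; a Linux display driver installed inside the distribution can take precedence over it, as it did here.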
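The same check can be scripted without Numba. This is a minimal sketch under stated assumptions: it relies on Linux's /proc/self/maps listing every file mapped into the process, and on the WSL2 driver living under /usr/lib/wsl/lib. The helper names mapped_libcuda_paths, is_wsl_driver and report are hypothetical.

```python
import ctypes
from pathlib import Path

WSL_LIB_DIR = "/usr/lib/wsl/lib/"  # where WSL2 exposes the Windows driver's libcuda


def mapped_libcuda_paths(maps_text: str) -> list[str]:
    """Return unique pathnames of libcuda mappings found in /proc/<pid>/maps text."""
    paths: list[str] = []
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) == 6 and "libcuda.so" in fields[5]:
            path = fields[5].strip()
            if path not in paths:
                paths.append(path)
    return paths


def is_wsl_driver(path: str) -> bool:
    """True if the mapped libcuda comes from the WSL2 driver directory."""
    return path.startswith(WSL_LIB_DIR)


def report() -> None:
    """Load libcuda in the same name order Numba tries first, then show the mapped file."""
    for name in ("libcuda.so", "libcuda.so.1"):
        try:
            ctypes.CDLL(name)
            break
        except OSError:
            continue
    else:
        print("libcuda could not be loaded")
        return
    for path in mapped_libcuda_paths(Path("/proc/self/maps").read_text()):
        print(path, "(WSL driver)" if is_wsl_driver(path) else "(not the WSL driver)")
```

Calling report() on the affected setup would be expected to flag /usr/lib/x86_64-linux-gnu/libcuda.so.535.183.01 as not the WSL driver. /proc/self/maps shows the resolved file, so symlinks such as libcuda.so are reported by their real target, as in Numba's "Mapped libcuda.so paths" output.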

    To uninstall the Linux x64 (AMD64/EM64T) Display Driver, I made two attempts. Which method you need depends on how the driver was originally installed.

    Attempt 1

    I downloaded the Linux x64 (AMD64/EM64T) Display Driver from https://www.nvidia.com/en-us/drivers/details/226764/ and ran:

    sudo sh ./NVIDIA-Linux-x86_64-xxx.xx.run --uninstall
    

    It printed:

    There is no NVIDIA driver currently installed.

    This means I had not installed the Linux x64 (AMD64/EM64T) Display Driver with this .run installer, so I had probably installed it through my Linux distribution's packages instead.
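On Debian/Ubuntu, dpkg -S reports which installed package owns a given file, which would confirm a distribution-package install before purging anything. The sketch below is a hypothetical wrapper around that command (owning_packages and parse_dpkg_search are made-up names); it assumes the usual "package[:arch][, package2]: /path" output format and skips dpkg's diversion lines.

```python
import shutil
import subprocess


def parse_dpkg_search(output: str) -> list[str]:
    """Extract package names from `dpkg -S <path>` output."""
    packages: list[str] = []
    for line in output.splitlines():
        if line.startswith("diversion "):
            continue  # diversion notes are not ownership records
        # Split on the first ": "; an arch suffix like ":amd64" has no space after it
        owners, sep, _path = line.partition(": ")
        if not sep:
            continue
        for owner in owners.split(", "):
            name = owner.strip()
            if name and name not in packages:
                packages.append(name)
    return packages


def owning_packages(path: str) -> list[str]:
    """Return the package(s) owning `path`, or [] if none own it or dpkg is unavailable."""
    if shutil.which("dpkg") is None:
        return []
    result = subprocess.run(["dpkg", "-S", path], capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []  # dpkg -S exits non-zero when no package owns the path
    return parse_dpkg_search(result.stdout)
```

For example, owning_packages("/usr/lib/x86_64-linux-gnu/libcuda.so.535.183.01") should name the package to remove. Targeting that package avoids relying only on a broad wildcard purge.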

    Attempt 2

    I then uninstalled it with:

    sudo apt-get --purge remove "*nvidia*"
    

    which removed these packages:

    ...
    The following packages will be REMOVED:
      libcuinj64-11.5* libnvidia-compute-495* libnvidia-compute-510* libnvidia-compute-535* libnvidia-ml-dev*
      nvidia-cuda-dev* nvidia-cuda-gdb* nvidia-cuda-toolkit* nvidia-cuda-toolkit-doc* nvidia-opencl-dev*
      nvidia-profiler* nvidia-visual-profiler*
    0 upgraded, 0 newly installed, 12 to remove and 219 not upgraded.
    

    Now I can confirm that cuML works directly, without the export LD_LIBRARY_PATH=/usr/lib/wsl/lib workaround! ☺️

    Old Workaround Solution

    I found that in WSL2, simply setting the environment variable with export LD_LIBRARY_PATH=/usr/lib/wsl/lib before running the Python code helps.