Tags: python, dependencies, numba, cudf

cudf and numba version conflict


I have cudf and numba installed. My *.py file does not use numba directly. Before I installed the cudf-related packages, my code worked fine. After installing them, running python3 -m cudf.pandas my_py_101.py fails with the following error:

[Actual outcome]

/usr/local/lib/python3.10/dist-packages/cudf/utils/_ptxcompiler.py:61: UserWarning: Error getting driver and runtime versions:

stdout:

stderr:

Traceback (most recent call last):
  File "<string>", line 7, in <module>
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 111, in get_version
    self.cudaRuntimeGetVersion(ctypes.byref(rtver))
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 65, in __getattr__
    self._initialize()
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 51, in _initialize
    self.lib = open_cudalib('cudart')
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/libs.py", line 63, in open_cudalib
    path = get_cudalib(lib)
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/libs.py", line 55, in get_cudalib
    libdir = get_cuda_paths()[dir_type].info
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cuda_paths.py", line 223, in get_cuda_paths
    'nvvm': _get_nvvm_path(),
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cuda_paths.py", line 201, in _get_nvvm_path
    candidates = find_lib('nvvm', path)
  File "/usr/local/lib/python3.10/dist-packages/numba/misc/findlib.py", line 44, in find_lib
    return find_file(regex, libdir)
  File "/usr/local/lib/python3.10/dist-packages/numba/misc/findlib.py", line 56, in find_file
    entries = os.listdir(ldir)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/nvvm/lib64'


Not patching Numba
  warnings.warn(msg, UserWarning)
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/usr/local/lib/python3.10/dist-packages/cudf/__init__.py", line 10, in <module>
    validate_setup()
  File "/usr/local/lib/python3.10/dist-packages/cudf/utils/gpu_utils.py", line 95, in validate_setup
    cuda_runtime_version = runtimeGetVersion()
  File "/usr/local/lib/python3.10/dist-packages/rmm/_cuda/gpu.py", line 88, in runtimeGetVersion
    major, minor = numba.cuda.runtime.get_version()
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 111, in get_version
    self.cudaRuntimeGetVersion(ctypes.byref(rtver))
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 65, in __getattr__
    self._initialize()
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 51, in _initialize
    self.lib = open_cudalib('cudart')
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/libs.py", line 63, in open_cudalib
    path = get_cudalib(lib)
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/libs.py", line 55, in get_cudalib
    libdir = get_cuda_paths()[dir_type].info
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cuda_paths.py", line 223, in get_cuda_paths
    'nvvm': _get_nvvm_path(),
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cuda_paths.py", line 201, in _get_nvvm_path
    candidates = find_lib('nvvm', path)
  File "/usr/local/lib/python3.10/dist-packages/numba/misc/findlib.py", line 44, in find_lib
    return find_file(regex, libdir)
  File "/usr/local/lib/python3.10/dist-packages/numba/misc/findlib.py", line 56, in find_file
    entries = os.listdir(ldir)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/nvvm/lib64'

[What I did]

My Docker image is built from the following Dockerfile:

FROM ubuntu:22.04
# The FROM above only starts an unused build stage; the final image is based on the line below.
FROM nvidia/cuda:12.0.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y wget curl unzip python3-pip
ENV PATH=$PATH:~/.local/bin:~/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
RUN pip install --extra-index-url=https://pypi.nvidia.com cudf-cu12==23.12.* dask-cudf-cu12==23.12.* cuml-cu12==23.12.* cugraph-cu12==23.12.*
RUN pip install numpy==1.24.3 pandas==1.5.3 Cython==3.0.6 scikit-learn==1.3.2 swifter==1.3.4 requests==2.28.2 numba==0.57.1 scikit-learn-intelex==2024.0.1
RUN pip install torch torchvision torchaudio

  1. The error seems to be related to the numba package. I checked the dependency page and found that cudf requires numba>=0.57,<0.58, and I have numba==0.57.1. Note that my script contains no numba-related code.
  2. cudf requires CUDA 12.0, and I'm using CUDA 12.0.1, which is the closest version.
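
The numba pin from point 1 can also be checked from inside the container. This is a minimal sketch, assuming the constraint is numba>=0.57,<0.58 as stated above; the helper names parse_release and in_range are hypothetical, and only the leading numeric release components are compared (suffixes such as rc1 are dropped).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(ver: str) -> tuple:
    """Return the leading numeric components of a version, e.g. '0.57.1' -> (0, 57, 1)."""
    parts = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # e.g. '0rc1': keep 0, drop the suffix and anything after it
    return tuple(parts)


def in_range(ver: str, lower: tuple, upper: tuple) -> bool:
    """True if lower <= ver < upper, comparing release tuples element by element."""
    return lower <= parse_release(ver) < upper


if __name__ == "__main__":
    # Hypothetical check mirroring the stated cudf pin: numba>=0.57,<0.58
    try:
        installed = version("numba")
    except PackageNotFoundError:
        print("numba is not installed")
    else:
        ok = in_range(installed, (0, 57), (0, 58))
        print(f"numba {installed}: {'satisfies' if ok else 'violates'} >=0.57,<0.58")
```

With numba==0.57.1 this reports that the pin is satisfied, which is consistent with point 1: the version pin alone does not explain the error.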

The Kubernetes Job YAML I use to run the container is:

apiVersion: batch/v1
kind: Job
metadata:
        name: test-cuda
        namespace: tom # job and pvc should be in the same namespace
spec:
        template:
                metadata:
                        labels:
                                app: test-cuda
                spec:
                        containers:
                        - name: test-cuda
                          image: <my_url>/tom/valid:cudf
                          command: ["bash", "-c", "tail /proc/cpuinfo -n 28 &>> job.log; python3 -m cudf.pandas my_py_101.py &>> job.log; echo 'test my_py & GPU' &>> job.log; mkdir result_my_py_20231229 ; mv job.log result_my_py_20231229/ ; tar -cjf result_my_py_20231229.bz2 result_my_py_20231229/ ; ls *.bz2; pwd ; aws s3 cp --endpoint http://<my_url> /result_my_py_20231229.bz2  s3://mybucket01/"]
                          resources:
                                requests:
                                        cpu: 9
                                        memory: 128Gi
                                limits:
                                        cpu: 12
                                        memory: 256Gi
                          imagePullPolicy: IfNotPresent #Always
                        restartPolicy: Never

How can I fix it?


Solution

  • As a cuDF developer, I have run into this issue before. You can likely fix it by changing one line in your Dockerfile: build your image from the "devel" flavor of the CUDA containers instead of the "runtime" flavor:

    FROM nvidia/cuda:12.0.1-devel-ubuntu22.04
    

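    When you import cudf, it imports numba as a dependency. During import, cudf also validates your setup by asking Numba for the CUDA runtime version (the validate_setup and runtimeGetVersion frames in your traceback). To answer, Numba looks up the CUDA Toolkit components it needs, including NVVM, and fails because it finds only some of them. The runtime CUDA images are fairly minimal and don't include the NVVM pieces Numba needs, which is why /usr/local/cuda/nvvm/lib64 does not exist.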

    Background: cuDF supports user-defined functions (UDFs) for features like df.apply. To execute user-defined Python code on the GPU, cuDF calls Numba to perform just-in-time (JIT) CUDA compilation. Numba needs some pieces of the CUDA Toolkit to do this, including NVVM. The CUDA Toolkit in the nvidia/cuda "runtime" image does not include all of them, because NVVM and the related tools Numba uses are considered compilers. The "runtime" images aim for a minimal size that can run pre-built CUDA code, so compilers are excluded. The "devel" flavor contains NVVM and all the other components needed to build CUDA code, which covers Numba's JIT compilation.
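
    To check whether an image has the NVVM directory that the traceback shows Numba failing to list, you can inspect it yourself. This is a minimal sketch, not Numba's own lookup code: it assumes the toolkit lives under /usr/local/cuda (optionally overridden by CUDA_HOME, an assumption of this sketch), and the helper name nvvm_libraries and the libnvvm* filename filter are illustrative. In a runtime image it would be expected to report the directory as missing; in a devel image it would be expected to list the NVVM library.

```python
import os


def nvvm_libraries(cuda_home: str = "/usr/local/cuda") -> list:
    """List files starting with 'libnvvm' under <cuda_home>/nvvm/lib64.

    Raises FileNotFoundError if the directory does not exist, the same error
    os.listdir raised in the traceback.
    """
    libdir = os.path.join(cuda_home, "nvvm", "lib64")
    # Illustrative filter: NVVM shared libraries are assumed to be named libnvvm*
    return sorted(name for name in os.listdir(libdir) if name.startswith("libnvvm"))


if __name__ == "__main__":
    home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    try:
        libs = nvvm_libraries(home)
    except FileNotFoundError as err:
        print(f"NVVM not found ({err}); this looks like a runtime-only CUDA image")
    else:
        print("NVVM libraries found:", libs or "none")
```

    Running it inside the container (for example, as an extra step in the Job command before python3 -m cudf.pandas) would show whether the image is missing NVVM before cudf fails at import.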