apache-spark, pyspark, nvidia

'Could not load cudf jni library' when trying to run pyspark with GPU support in Windows 10


I am trying to run pyspark on Windows 10 with GPU support, but I am stuck on an error saying the cudf JNI library could not be loaded. I am launching pyspark with the following command:

pyspark --jars "${SPARK_HOME}/jars/rapids-4-spark_2.12-23.12.2.jar,${SPARK_HOME}/jars/cudf-23.12.1.jar" --conf spark.plugins=com.nvidia.spark.SQLPlugin --conf spark.rapids.sql.incompatibleOps.enabled=true

When I run the above command, I get the following error:

Python 3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/02/08 15:55:25 WARN RapidsPluginUtils: RAPIDS Accelerator 23.12.2 using cudf 23.12.1.
24/02/08 15:55:25 WARN RapidsPluginUtils: RAPIDS Accelerator is enabled, to disable GPU support set `spark.rapids.sql.enabled` to false.
24/02/08 15:55:25 WARN RapidsPluginUtils: spark.rapids.sql.explain is set to `NOT_ON_GPU`. Set it to 'NONE' to suppress the diagnostics logging about the query placement on the GPU.
24/02/08 15:55:25 ERROR NativeDepsLoader: Could not load cudf jni library...
java.io.IOException: Error loading dependencies

The application then prints a stacktrace and exits.

If I run pyspark without any parameters, I get a pyspark prompt without any issues. I have also been able to run spark-submit with a Python file, and it executed without errors.

I am using rapids-4-spark_2.12-23.12.2.jar and cudf-23.12.1.jar in the jars directory of my spark installation.

Running nvidia-smi indicates that I am using:

  • NVIDIA GeForce RTX 3090
  • Driver Version: 551.23
  • CUDA Version: 12.4

One possible issue: I have seen some references to a "GPU discovery script", but I can't find any information about what it looks like or where to download it.


Solution

  • A list of supported hardware and Linux distributions for Spark-RAPIDS is available here: https://nvidia.github.io/spark-rapids/docs/download.html

    Currently the supported OSes are Ubuntu 20.04, Ubuntu 22.04, CentOS 7, and Rocky Linux 8. Native Windows is not supported by Spark-RAPIDS.

    However, RAPIDS itself is supported under Windows Subsystem for Linux 2 (WSL2), and some users have reported success running Spark-RAPIDS under WSL2 (rather than native Windows), though to my knowledge that setup is not officially supported either.
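    As for the "GPU discovery script" mentioned in the question: Spark uses such a script to report available GPU addresses as JSON when GPU resource scheduling is enabled, and the Apache Spark source tree ships an example (`examples/src/main/scripts/getGpusResources.sh`). Below is a minimal sketch along the same lines; it assumes `nvidia-smi` is on the PATH, and the fallback to index `0` is purely illustrative:

    ```shell
    #!/usr/bin/env bash
    # Sketch of a Spark GPU discovery script, modeled on the
    # getGpusResources.sh example in the Apache Spark source tree.
    # Spark expects output like: {"name": "gpu", "addresses":["0","1"]}

    to_spark_json() {
      # Join newline-separated GPU indices into Spark's resource JSON.
      local addrs
      addrs=$(echo "$1" | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/","/g')
      echo "{\"name\": \"gpu\", \"addresses\":[\"$addrs\"]}"
    }

    # Query the GPU indices; fall back to "0" if nvidia-smi is unavailable
    # (fallback is illustrative only -- a real script should fail instead).
    indices=$(nvidia-smi --query-gpu=index --format=csv,noheader 2>/dev/null || echo 0)
    to_spark_json "$indices"
    ```

    The script path is then passed to Spark via `spark.worker.resource.gpu.discoveryScript` (or `spark.driver.resource.gpu.discoveryScript` in local mode), but note this only matters once you are on a supported OS, such as a Linux install or WSL2.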