Tags: python, apache-spark, pyspark

No module named 'pyspark.resource' when running pyspark command


I am trying to set up a PySpark environment for the first time on my system. I followed all the instructions carefully while installing Apache Spark. I am using a Windows 11 system.

When I run the pyspark command, I get this error:

Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep  5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "D:\SoftwareInstallations\spark-3.5.1\python\pyspark\shell.py", line 31, in <module>
    import pyspark
  File "D:\SoftwareInstallations\spark-3.5.1\python\pyspark\__init__.py", line 59, in <module>
    from pyspark.rdd import RDD, RDDBarrier
  File "D:\SoftwareInstallations\spark-3.5.1\python\pyspark\rdd.py", line 78, in <module>
    from pyspark.resource.requests import ExecutorResourceRequests, TaskResourceRequests
ModuleNotFoundError: No module named 'pyspark.resource'

These are all the environment variables I have set:

HADOOP_HOME = D:\SoftwareInstallations\hadoop-winutils\hadoop-3.3.5
PYTHONPATH = %SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-0.10.9.7-src.zip
SPARK_HOME = D:\SoftwareInstallations\spark-3.5.1
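
A quick sanity check (not part of the original setup instructions, just a sketch) is to confirm that a fresh Python session actually sees these variables, since a terminal opened before they were set keeps the old values:

import os

# Print the variables listed above as the interpreter sees them; a value of
# None means the variable is not visible to this session.
for name in ("SPARK_HOME", "HADOOP_HOME", "PYTHONPATH"):
    print(f"{name} = {os.environ.get(name)}")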
 

I also tried reinstalling PySpark with pip install pyspark, but I still face the same issue.
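
As a rough diagnostic (a sketch, not an official Spark tool), it also helps to check which pyspark package Python actually resolves, and whether the resource subpackage the traceback complains about exists at that location:

import importlib.util
import os

# Locate the pyspark package Python would import (without importing it) and
# check whether the 'resource' subpackage from the traceback is present there.
spec = importlib.util.find_spec("pyspark")
print("pyspark resolves to:", spec.origin if spec else None)

if spec and spec.origin:
    pkg_dir = os.path.dirname(spec.origin)
    print("'resource' subpackage present:",
          os.path.isdir(os.path.join(pkg_dir, "resource")))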


Solution

  • I was finally able to resolve this issue. The problem was with the SPARK_HOME environment variable.

    Initially, the SPARK_HOME variable was pointing to the Spark installation folder:

    SPARK_HOME = D:\SoftwareInstallations\spark-3.5.1
    

    After changing it to the pyspark directory inside the site-packages folder, it worked as expected:

    SPARK_HOME=C:\Users\my-user\AppData\Local\Programs\Python\Python312\Lib\site-packages\pyspark
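
    If the pip-installed location is not obvious, the following sketch (assuming pyspark was installed with pip install pyspark into the active interpreter and is the copy Python actually imports) prints the directory SPARK_HOME should point to; the exact path will differ per machine:

    import os
    import pyspark

    # pyspark.__file__ points at ...\site-packages\pyspark\__init__.py;
    # its parent directory is the value to use for SPARK_HOME.
    print(os.path.dirname(pyspark.__file__))

    Alternatively, pip show pyspark prints a Location field (the site-packages directory); appending \pyspark to it gives the same path.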