I am trying to set up the PySpark environment for the first time on my system. I followed all the instructions carefully while installing Apache Spark. I am using a Windows 11 system.
When I run the pyspark command, I get this error:
Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep 5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
File "D:\SoftwareInstallations\spark-3.5.1\python\pyspark\shell.py", line 31, in <module>
import pyspark
File "D:\SoftwareInstallations\spark-3.5.1\python\pyspark\__init__.py", line 59, in <module>
from pyspark.rdd import RDD, RDDBarrier
File "D:\SoftwareInstallations\spark-3.5.1\python\pyspark\rdd.py", line 78, in <module>
from pyspark.resource.requests import ExecutorResourceRequests, TaskResourceRequests
ModuleNotFoundError: No module named 'pyspark.resource'
These are all the environment variables I have set:
HADOOP_HOME = D:\SoftwareInstallations\hadoop-winutils\hadoop-3.3.5
PYTHONPATH = %SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-0.10.9.7-src.zip
SPARK_HOME = D:\SoftwareInstallations\spark-3.5.1
I also tried reinstalling pyspark using pip install pyspark, but I still face this issue.
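For anyone hitting the same error, a quick way to see which copy of pyspark Python actually resolves, and whether that copy ships the resource subpackage, is a small diagnostic like the sketch below (this is my own check, not part of Spark's tooling):

# Diagnostic sketch: find which pyspark package Python resolves first,
# without importing it, and check whether its resource subpackage exists on disk.
import importlib.util
from pathlib import Path

spec = importlib.util.find_spec("pyspark")
if spec is None or spec.origin is None:
    print("pyspark is not importable at all")
else:
    pkg_dir = Path(spec.origin).parent
    print("pyspark resolves to:", pkg_dir)
    print("has resource subpackage:", (pkg_dir / "resource" / "__init__.py").exists())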
I was finally able to resolve this issue. The problem was with the SPARK_HOME environment variable.
Initially, the SPARK_HOME variable was pointing to the downloaded Spark folder:
SPARK_HOME = D:\SoftwareInstallations\spark-3.5.1
After changing it to the pyspark directory inside the site-packages folder, it worked as expected:
SPARK_HOME=C:\Users\my-user\AppData\Local\Programs\Python\Python312\Lib\site-packages\pyspark
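After making this change (in a new terminal, so the updated variable is picked up), a minimal smoke test along these lines can confirm that the shell and the library agree; the app name and sample data here are arbitrary:

# Smoke test: start a local Spark session, build a tiny DataFrame, and show it.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("smoke-test").master("local[*]").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.show()
spark.stop()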