python · apache-spark · pyspark

Environment Variable Error when running Python/PySpark script


Is there an easy way to fix this error:

Missing Python executable 'python3', defaulting to 'C:\Users\user1\Anaconda3\Lib\site-packages\pyspark\bin\..' for SPARK_HOME environment variable. Please install Python or specify the correct Python executable in PYSPARK_DRIVER_PYTHON or PYSPARK_PYTHON environment variable to detect SPARK_HOME safely.

Would I have to modify the PATH system variable? Or export/create the environment variables PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON? I have Python 3.8.8.


Solution

  • You need to add an environment variable called SPARK_HOME; it should contain the path to the installed pyspark library.

    In my case, pyspark is installed under my home directory, so this is the content of the variable:

    SPARK_HOME=/home/zied/.local/lib/python3.8/site-packages/pyspark
    

    You also need another variable called PYSPARK_PYTHON, which holds the Python version you are using, like this:

    PYSPARK_PYTHON=python3.8
    

    EDIT: for Windows, use

    PYSPARK_PYTHON=python
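
    If you prefer not to set these system-wide, you can also set them from the driver script itself before creating the SparkSession. This is a minimal sketch, assuming pyspark was installed with pip and is importable; adjust the executable name ("python" on Windows, "python3.8" on Linux) to your own setup:

    import os
    import pyspark

    # Point SPARK_HOME at the directory of the installed pyspark package.
    os.environ["SPARK_HOME"] = os.path.dirname(pyspark.__file__)

    # Tell Spark which Python executable to launch; on Windows this is
    # usually just "python", on Linux something like "python3.8".
    os.environ["PYSPARK_PYTHON"] = "python"
    os.environ["PYSPARK_DRIVER_PYTHON"] = "python"

    from pyspark.sql import SparkSession

    # Quick sanity check that Spark starts with the variables above.
    spark = SparkSession.builder.master("local[*]").appName("env-check").getOrCreate()
    print(spark.version)
    spark.stop()

    Setting the variables through Windows' System Properties (or exporting them in your shell profile on Linux) has the same effect and persists across sessions.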