
Error from PySpark code to show a DataFrame: py4j.protocol.Py4JJavaError


I was running this code to show a DataFrame with df.show():


import os
import sys

from pyspark.sql import *
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession

# Make the Python workers use the same interpreter as the driver
os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable

# Start a local Spark session with 2 worker threads
spark = SparkSession.builder \
    .appName("Hello Spark") \
    .master("local[2]") \
    .getOrCreate()

def spark_practice():
    date_list = [("Ravi", 28),
                 ("David", 45),
                 ("Mani", 27)]

    # Build the DataFrame, name its columns, and display it
    df = spark.createDataFrame(date_list).toDF("Name", "Age")
    df.printSchema()
    df.show()

spark_practice()

However, I got the following error:

File "C:\Program Files\Hadoop\spark-3.5.1\python\lib\py4j-0.10.9.7-src.zip\py4j\protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o46.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (Prince-PC executor driver): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)

I have tried setting the environment variable PYSPARK_DRIVER_PYTHON to the latest version of Python, which is the same interpreter the project uses, but it did not help.
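To confirm which interpreter and version the driver is actually running, and whether both PySpark variables point at it, a small stand-alone check like the following may help. It is only a sketch: the helper names interpreter_report and env_points_at_current_interpreter are hypothetical, and it does not start Spark. Note that a matching interpreter path does not rule out a problem with the Python version itself.

```python
import os
import sys


def interpreter_report():
    """Collect the interpreter details that PySpark's driver and workers depend on."""
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # The two variables set in the question; None means "not set".
        "PYSPARK_PYTHON": os.environ.get("PYSPARK_PYTHON"),
        "PYSPARK_DRIVER_PYTHON": os.environ.get("PYSPARK_DRIVER_PYTHON"),
    }


def env_points_at_current_interpreter(report):
    """True if both PySpark variables name the interpreter running this script."""
    return (report["PYSPARK_PYTHON"] == report["executable"]
            and report["PYSPARK_DRIVER_PYTHON"] == report["executable"])


if __name__ == "__main__":
    # Mirror the question's setup, then print what Spark would pick up.
    os.environ["PYSPARK_PYTHON"] = sys.executable
    os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
    report = interpreter_report()
    for key, value in report.items():
        print(f"{key}: {value}")
    print("env matches interpreter:", env_points_at_current_interpreter(report))
```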


Solution

  • Downgrading Python from 3.12.1 to 3.11.8 should resolve this issue. Also, avoid importing everything from pyspark.sql; you only need:

    from pyspark.sql.session import SparkSession
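One way to catch this before Spark starts is a version guard at the top of the script. This is a hedged sketch, not an official PySpark check: the cutoff NEWEST_WORKING = (3, 11) is an assumption taken only from this answer (3.11.8 works, 3.12.1 crashed the worker in this Spark 3.5.1 setup), and python_version_ok is a hypothetical helper name. It deliberately does not import pyspark, so it can run in any environment.

```python
import sys

# Assumption: per this answer, Python 3.11.x works with this Spark 3.5.1 setup
# while 3.12.1 crashed the Python worker. Adjust if your setup differs.
NEWEST_WORKING = (3, 11)


def python_version_ok(version_info=sys.version_info, newest=NEWEST_WORKING):
    """Return True if the (major, minor) version is at or below `newest`."""
    return tuple(version_info[:2]) <= tuple(newest)


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if python_version_ok():
        print(f"Python {major}.{minor}: within the range reported to work")
    else:
        print(f"Python {major}.{minor}: consider a 3.11.x interpreter for PySpark 3.5.1")
```

Calling the guard before SparkSession.builder lets the script fail with a readable message instead of a Py4JJavaError.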