Tags: python, dataframe, apache-spark, hadoop, pyspark

Unable to create Dataframe


I am trying to run a simple PySpark program as a test.

Here is my code:

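from pyspark.sql import SparkSession

if __name__ == "__main__":
    # Start a local Spark session with two worker threads
    spark = SparkSession.builder \
        .appName("Welcome Spark") \
        .master("local[2]") \
        .getOrCreate()

    data_list = [("Aishwarya", 21), ("Jhanavi", 19), ("Maithree", 23)]

    # Build the DataFrame from the list, then name its columns
    df = spark.createDataFrame(data_list).toDF("Name", "Age")
    df.show()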

I am trying to load the list into a DataFrame, but I get an error when creating it:

data_list = [("Aishwarya", 21), ("Jhanavi", 19), ("Maithree", 23)]
df = spark.createDataFrame(data_list).toDF("Name", "Age")
df.show()

Solution

  • You can try either of the two methods below; both work for me:

    # option 1
    
    data_list = [("Aishwarya", 21),("Jhanavi", 19),("Maithree", 23),]
    
    new_dfdf = spark.createDataFrame(data_list).toDF("Name", "Age")
    new_dfdf.show(3)
    
    # option 2
    
    op_dfdf = spark.createDataFrame(data_list, ("Name", "Age"))
    op_dfdf.show(3)
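
A third variant, not part of the answer above, is to pass an explicit schema so that Spark does not have to infer the column types from the data. The sketch below is only an illustration: `validate_rows` and `build_people_df` are hypothetical helper names, and the choice of `StringType` and `LongType` for the two columns is an assumption based on the sample data. It assumes an existing `spark` session like the one in the question.

```python
# Sketch only: validate_rows and build_people_df are hypothetical helpers,
# not part of PySpark or of the original answer.

COLUMNS = ("Name", "Age")


def validate_rows(rows, columns=COLUMNS):
    """Return rows as a list, raising ValueError if any row has the wrong width."""
    checked = list(rows)
    for i, row in enumerate(checked):
        if len(row) != len(columns):
            raise ValueError(
                f"row {i} has {len(row)} values, expected {len(columns)}"
            )
    return checked


def build_people_df(spark, rows):
    """Create a DataFrame with an explicit schema instead of inferred types."""
    # Imported here so validate_rows can be used without Spark installed.
    from pyspark.sql.types import LongType, StringType, StructField, StructType

    # Assumed column types: names are strings, ages are integers.
    schema = StructType([
        StructField("Name", StringType(), True),
        StructField("Age", LongType(), True),
    ])
    return spark.createDataFrame(validate_rows(rows), schema=schema)
```

With a running session, `build_people_df(spark, data_list).show()` should give a DataFrame with the columns `Name` and `Age`. Checking the row width up front turns a malformed row into a plain Python `ValueError` before any Spark job is started.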