Tags: python, pandas, apache-spark, pyspark, rdd

PySpark - how to add a column to a Spark DataFrame from a list


I'm looking for a way to add a new column to a Spark DataFrame from a list. With the pandas approach it is very easy, but in Spark it seems to be relatively difficult. Please find an example below:

#pandas approach
list_example = [1,3,5,7,8]
df['new_column'] = list_example

#spark ?

Could you please help me solve this (ideally with the simplest possible solution)?


Solution

  • You could try something like:

    import pyspark.sql.functions as F

    list_example = [1, 3, 5, 7, 8]

    # F.array(...) builds an array literal, so every row of new_df gets the
    # whole list [1, 3, 5, 7, 8] in "new_column" (an array column),
    # rather than one list element per row.
    new_df = df.withColumn("new_column", F.array([F.lit(x) for x in list_example]))
    new_df.show()
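
  • Note that this puts the same array in every row. If you want element-wise assignment like the pandas version (row i gets list_example[i]), one possible sketch is to join on a generated row index. This is not part of the original answer but an assumption about what you might need: the "row_idx" name and the sample df are illustrative, and row_number() over an unpartitioned window pulls all data onto a single partition, so it only suits reasonably small DataFrames.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.getOrCreate()

    # Illustrative DataFrame standing in for your df.
    df = spark.createDataFrame([("a",), ("b",), ("c",), ("d",), ("e",)], ["col1"])
    list_example = [1, 3, 5, 7, 8]

    # Give each existing row a sequential 1-based index.
    w = Window.orderBy(F.monotonically_increasing_id())
    df_indexed = df.withColumn("row_idx", F.row_number().over(w))

    # Build a small DataFrame from the list with matching indices.
    list_df = spark.createDataFrame(
        [(i + 1, v) for i, v in enumerate(list_example)],
        ["row_idx", "new_column"],
    )

    # Join on the index and drop the helper column.
    result = df_indexed.join(list_df, on="row_idx").drop("row_idx")
    result.show()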