
How to write a file to csv when a column is 'struct' type?


I have an output Spark DataFrame that needs to be written to CSV. One of its columns is of 'struct' type, which the CSV writer does not support. I have tried converting the column to a string and converting the DataFrame to a pandas DataFrame, but neither works.

from pyspark.sql.functions import explode

userRecs1 = userRecs.withColumn("recommendations", explode(userRecs.recommendations))

# Fails because 'recommendations' is a struct column:
#userRecs1.write.csv('/user-home/libraries/Sampled_data/datasets/rec_per_user.csv')

Expected result: the recommendations column as string type, so that it can be split into two separate columns and written to CSV.

Actual result: the recommendations column is struct type and cannot be written to CSV:

+-------+-----------------+
| ID_CTE|  recommendations|
+-------+-----------------+
|3974081| [2229,0.8915096]|
|3974081| [2224,0.8593609]|
|3974081| [2295,0.8577902]|
|3974081|[2248,0.29922757]|
|3974081|[2299,0.28952467]|
+-------+-----------------+

Solution

  • Selecting 'recommendations.*' expands the struct's fields into separate top-level columns named after those fields, which the CSV writer can handle:

    userRecs1 \
      .select('ID_CTE', 'recommendations.*') \
      .write.csv('/user-home/libraries/Sampled_data/datasets/rec_per_user.csv')