
Write to CSV and read it back into a DataFrame


I need to read fairly large data from MySQL, write it to a file to optimize further work, and then work with that file.

But Spark creates not a single file, but a whole folder. I can figure out the exact name of the part file inside it, but maybe Spark has a proper way to get a DataFrame back from the file it just wrote?


Solution

  • Spark will always create a folder, but you can force it to write all the data into a single part file inside that folder by calling coalesce(1), which collapses the DataFrame into one partition before writing:

    df.coalesce(1).write.csv("file_name")