I need to read a rather large dataset from MySQL and write it to a file for optimized further work, then work with that file.
But Spark creates not a single file, but a whole folder. I can figure out the exact name of the part file inside it, but maybe Spark has a proper way to get a DataFrame back from the file it has just written?
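For context, a minimal sketch of the MySQL read side, assuming a SparkSession named spark; the URL, table name, and credentials are placeholders, and the MySQL Connector/J JAR must be on the classpath:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder connection details -- replace with your own.
# Requires the MySQL JDBC driver (Connector/J) on the classpath.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/mydb")
      .option("dbtable", "my_table")
      .option("user", "user")
      .option("password", "password")
      .load())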
Spark will always create a folder, but you can force it to put all the data into a single part file inside that folder by using coalesce(1), which repartitions the DataFrame into one partition before writing:
df.coalesce(1).write.csv("file_name")
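Putting it together, a minimal round trip (the output path "file_name" and the header/schema options are placeholders): the output path is still a folder, but after coalesce(1) it contains exactly one part-* file, and spark.read.csv accepts the folder path directly, so you never need to know that file's name:

# Write everything as one part file inside the "file_name" folder.
(df.coalesce(1)
   .write.mode("overwrite")
   .option("header", True)
   .csv("file_name"))

# Point the reader at the folder itself; Spark picks up the part file(s) inside.
df2 = (spark.read
       .option("header", True)
       .option("inferSchema", True)
       .csv("file_name"))

This also answers the second part of the question: reading the folder back is the proper way to get the DataFrame that was just written.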