This is in continuation of this how to save dataframe into csv pyspark thread.
I'm trying to save my pyspark data frame df in my pyspark 3.0.1. So I wrote
df.coalesce(1).write.csv('mypath/df.csv)
But after executing this, I'm seeing a folder named df.csv in mypath which contains 4 following files
1._committed_..
2._started_...
3._Success
4. part-00000-.. .csv
Can you suggest to me how do I save all data in df.csv
?
You can use .coalesce(1)
to save the file in just 1 csv partition, then rename this csv and move it to the desired folder.
Here is a function that does that:
df
: Your df
fileName
: Name you want to for the csv file
filePath
: Folder where you want to save to
def export_csv(df, fileName, filePath):
filePathDestTemp = filePath + ".dir/"
df\
.coalesce(1)\
.write\
.csv(filePathDestTemp) # use .csv to save as csv
listFiles = dbutils.fs.ls(filePathDestTemp)
for subFiles in listFiles:
if subFiles.name[-4:] == ".csv":
dbutils.fs.cp (filePathDestTemp + subFiles.name, filePath + fileName+ '.csv')
dbutils.fs.rm(filePathDestTemp, recurse=True)