pyspark

How to save pyspark data frame in a single csv file


This is a continuation of the "how to save dataframe into csv pyspark" thread.

I'm trying to save my PySpark DataFrame df as a single CSV file in PySpark 3.0.1. So I wrote:

df.coalesce(1).write.csv('mypath/df.csv')

But after executing this, I see a folder named df.csv in mypath that contains the following four files:

1. _committed_...
2. _started_...
3. _SUCCESS
4. part-00000-....csv

Can you suggest how I can save all the data in a single file df.csv?


Solution

  • You can use .coalesce(1) to write the output as a single CSV partition, then rename the resulting file and move it to the desired folder.

    Here is a function that does that:

    df: Your DataFrame
    fileName: Name you want for the csv file
    filePath: Folder where you want to save it

    def export_csv(df, fileName, filePath):
      # Write to a temporary directory first; Spark always writes a folder,
      # not a single file.
      filePathDestTemp = filePath + ".dir/"

      df\
        .coalesce(1)\
        .write\
        .csv(filePathDestTemp)  # use .csv to save as csv

      # Find the single part file and copy it to the final destination.
      listFiles = dbutils.fs.ls(filePathDestTemp)
      for subFiles in listFiles:
        if subFiles.name[-4:] == ".csv":
          dbutils.fs.cp(filePathDestTemp + subFiles.name, filePath + fileName + '.csv')

      # Remove the temporary directory along with its marker files
      # (_SUCCESS, _committed_..., _started_...).
      dbutils.fs.rm(filePathDestTemp, recurse=True)
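Note that dbutils is only available on Databricks. Outside Databricks, the same move-and-clean-up step can be done with Python's standard library once Spark has written the temporary folder. The sketch below assumes a local filesystem path; promote_part_file is a hypothetical helper name, not part of any library:

```python
import glob
import os
import shutil

def promote_part_file(temp_dir, dest_path):
    """Move the single part-*.csv that Spark wrote into temp_dir
    to dest_path, then delete temp_dir and its marker files."""
    # Spark names the data file part-00000-<uuid>.csv; locate it.
    matches = glob.glob(os.path.join(temp_dir, "part-*.csv"))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one part file in {temp_dir}, found {len(matches)}"
        )
    shutil.move(matches[0], dest_path)
    # Remove the temporary directory and its _SUCCESS/_committed_/_started_ files.
    shutil.rmtree(temp_dir)
```

After df.coalesce(1).write.csv(temp_dir) finishes, a single call such as promote_part_file(temp_dir, "mypath/df.csv") leaves just the one CSV file behind.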