apache-spark, pyspark, databricks

Trying to save pyspark dataframe with double quotes


I have a pyspark dataframe that looks like this

"col1"      "col2"      "col3"

"value1"    "value2"    value3

"value4"    "value5"    value6

I want to save it as a CSV file, so I tried the following options:

df.write.format('csv').option('delimitor',',').option("quote",'').save(path)

It works fine for the data rows, but not for the header.

The output looks like this:

"""col1""","""col2""","""col3"""

"value1","value2",value3

"value4","value5",value6

The output should look like this:

"col1","col2","col3"

"value1","value2",value3

"value4","value5",value6

Extra double quotes are added in the header; the data rows look fine.

Any suggestions on what I'm missing here? I tried quoteAll but it didn't work.


Solution

  • You have a typo in your code: it should be option('delimiter'), not delimitor. You can also make things easier on yourself by using the header option:

    df.write.format('csv').option('delimiter', ',').option('quote', '').option('header', 'true').save(path)
    

    When header is set to 'true', the first row of the output file will contain the column names. When set to 'false', the column names will not be included in the output file.
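    As for why the tripled quotes appeared in the first place: the column names themselves contain literal double quotes ("col1"), and a standard CSV writer escapes an embedded quote by doubling it and then wraps the whole field in quotes. Python's built-in csv module follows the same convention, so a minimal sketch (independent of Spark) illustrates the effect:

    ```python
    import csv
    import io

    # Column names that contain literal double quotes, as in the question.
    header = ['"col1"', '"col2"', 'col3']

    buf = io.StringIO()
    # Default settings: fields containing the quote character get wrapped
    # in quotes, and embedded quotes are escaped by doubling them.
    csv.writer(buf).writerow(header)

    print(buf.getvalue().strip())  # -> """col1""","""col2""",col3
    ```

    Disabling quoting on the writer (as the quote='' option does in Spark) only helps if the underlying values do not already contain quote characters; if the quotes are part of the column names themselves, renaming the columns to strip them is the cleaner fix.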