Tags: python-3.x, dataframe, apache-spark, pyspark, aws-glue

How add double quotes to all columns in my dataframe and save into csv


I need some help with dataframes.

I need to save a CSV file where every column value is enclosed in double quotes at the beginning and at the end.

This dataframe is created after reading a set of parquet files, something like:

emp = [(1,"Smith",-1,"2018","10","M",3000), \
    (2,"Rose",1,"2010","20","M",4000), \
    (3,"Williams",1,"2010","10","M",1000), \
    (4,"Jones",2,"2005","10","F",2000), \
    (5,"Brown",2,"2010","40","",-1), \
    (6,"lara",2,"2010","30","",-1), \
    (7,"mario",2,"2010","10","",-1), \
    (8,"bruno",2,"2010","40","",-1), \
    (9,"luis",2,"2010","20","",-1) \
  ]


empDF = spark.createDataFrame(data=emp)
empDF.show()


empDF.coalesce(1).write.format('csv') \
    .option('quote', '') \
    .option('header', 'true') \
    .option('delimiter', '|') \
    .save(path_destination, mode='overwrite')

the result must be something like:

_1|_2|_3|_4|_5|_6|_7
"1"|"Smith"|"-1"|"2018"|"10"|"M"|"3000"
"2"|"Rose"|"1"|"2010"|"20"|"M"|"4000"
"3"|"Williams"|"1"|"2010"|"10"|"M"|"1000"
...
...
...

I'm using option('quote', ''), but there is no way to save the CSV file the way I want.

Can somebody help me?


Solution

  • Use option('quoteAll', 'true') instead of option('quote', ''). The quoteAll option tells the CSV writer to enclose every value in quotes, not only the values that contain special characters.

    Option Guide:

    https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option