I need help with something related to DataFrames.
I need to save a CSV file where every value in every column is wrapped in double quotes, at the beginning and at the end.
The DataFrame is created after reading a set of Parquet files, something like:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample rows standing in for the data read from the Parquet files
emp = [(1,"Smith",-1,"2018","10","M",3000),
       (2,"Rose",1,"2010","20","M",4000),
       (3,"Williams",1,"2010","10","M",1000),
       (4,"Jones",2,"2005","10","F",2000),
       (5,"Brown",2,"2010","40","",-1),
       (6,"lara",2,"2010","30","",-1),
       (7,"mario",2,"2010","10","",-1),
       (8,"bruno",2,"2010","40","",-1),
       (9,"luis",2,"2010","20","",-1)]
# No schema is given, so Spark names the columns _1 ... _7
empDF = spark.createDataFrame(data=emp)
empDF.show()

# path_destination is the output directory (defined elsewhere)
(empDF.coalesce(1).write.format('csv')
    .option('quote', '')
    .option('header', 'true')
    .option('delimiter', '|')
    .save(path_destination, mode='overwrite'))
The result should look like this:
_1|_2|_3|_4|_5|_6|_7
"1"|"Smith"|"-1"|"2018"|"10"|"M"|"3000"
"2"|"Rose"|"1"|"2010"|"20"|"M"|"4000"
"3"|"Williams"|"1"|"2010"|"10"|"M"|"1000"
...
...
...
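In CSV-writer terms, the target format is "quote every field" with a `|` delimiter. As an illustration only (plain Python, not Spark), the standard library's `csv` module calls this `csv.QUOTE_ALL`. The sketch below uses the first three sample rows and writes the header by hand so it stays unquoted, as in the expected output above.

```python
import csv
import io

# First three rows of the question's sample data.
rows = [
    (1, "Smith", -1, "2018", "10", "M", 3000),
    (2, "Rose", 1, "2010", "20", "M", 4000),
    (3, "Williams", 1, "2010", "10", "M", 1000),
]

buf = io.StringIO()
# Header written manually (unquoted) to mirror the expected output.
buf.write("|".join(f"_{i}" for i in range(1, 8)) + "\n")
# QUOTE_ALL wraps every field, numbers included, in double quotes.
writer = csv.writer(buf, delimiter="|", quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerows(rows)

output = buf.getvalue()
print(output, end="")
```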
I'm using option('quote', ''), but I can't find a way to save the CSV file the way I want.
Can somebody help me?
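Use option('quoteAll', 'true') and drop option('quote', ''). quoteAll encloses every value in the quote character, which defaults to the double quote. Setting quote to an empty string does not give you that default: for writing, Spark uses the null character (u0000) instead.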
CSV data source option guide:
https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option