I have a DataFrame whose values contain double quotes (") and commas. I am trying to export the DataFrame to CSV, but unfortunately the double quotes are not written properly in the exported CSV.
I managed to handle all other special characters by setting "quoteAll" to true.
In CSV, if you replace each single double quote with two double quotes, it works fine. But when I export to CSV with the code below, Spark escapes the quote with its default backslash escape instead (e.g. `\"`), so the doubled quotes do not come out as expected.
%scala
import org.apache.spark.sql.functions.regexp_replace
val df = Seq((1, "A,B,C,\"DEF\""), (2, "DEF")).toDF("ID", "Val")
val updatedDf = df.columns.foldLeft(df)((acc, colname) => acc.withColumn(colname,regexp_replace(acc(s"`$colname`"), "\"", "\"\"")))
updatedDf.coalesce(1).write
.option("header", true)
.option("encoding", "utf-8")
.option("quoteAll", true)
.mode("Overwrite").csv("[Location to store csv]")
Output:
Expected Output:
How can I handle this?
You haven't provided the exact plaintext CSV you expect, so it's hard to guess. Are you looking for this?
val df = Seq((1, "A,B,C,\"DEF\""), (2, "DEF")).toDF("ID", "Val")
df.coalesce(1).write
.option("header", true)
.option("encoding", "utf-8")
.option("quoteAll", true)
.option("escape", "\"") // escape embedded quotes with " instead of the default \, which doubles them
.mode("Overwrite").csv("xxx")
Resulting csv:
"ID","Val"
"1","A,B,C,""DEF"""
"2","DEF"
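For reference, setting the escape character to `"` makes Spark emit the standard RFC 4180 doubled-quote convention, so no manual `regexp_replace` is needed. A minimal plain-Scala sketch of that convention (the helper name `quoteField` is mine, not a Spark API):

```scala
// RFC 4180 quoting: wrap the field in quotes and double any embedded quotes.
object Rfc4180 {
  def quoteField(s: String): String =
    "\"" + s.replace("\"", "\"\"") + "\""

  def main(args: Array[String]): Unit = {
    // A,B,C,"DEF"  ->  "A,B,C,""DEF"""  (matches the "Val" column above)
    println(quoteField("A,B,C,\"DEF\""))
  }
}
```

This is exactly the transformation the question tried to do by hand; letting the writer's `escape` option do it avoids double-escaping.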