apache-spark, spark-streaming, parquet, spark-structured-streaming

spark structured streaming parquet overwrite


I would like to be able to overwrite my output path in Parquet format, but "overwrite" is not among the available output modes (append, complete, update). Is there another solution here?

val streamDF = sparkSession.readStream
  .schema(schema)
  .option("header", "true")
  .parquet(rawData)

val query = streamDF.writeStream
  .outputMode("overwrite")
  .format("parquet")
  .option("checkpointLocation", checkpoint)
  .start(target)
query.awaitTermination()

Solution

  • Apache Spark only supports Append mode for the File Sink; see the output sinks section of the Structured Streaming programming guide.

    You need to write code that deletes the path/folder/files from the file system before writing the data; a sketch of this follows below.

    Alternatively, a ForeachWriter (there are Stack Overflow answers covering it) lets you implement custom write logic for your case; a skeleton is sketched after the deletion example.
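
A minimal sketch of the first approach: clear the output directory with the Hadoop FileSystem API before starting the query. The paths, schema, and column names here are hypothetical stand-ins for the values in the question; the stream itself still writes in append mode, which is the only mode the file sink accepts.

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._

    val sparkSession = SparkSession.builder().appName("parquet-stream").getOrCreate()

    // Hypothetical paths and schema standing in for the ones in the question.
    val rawData    = "/data/raw"
    val target     = "/data/output"
    val checkpoint = "/data/checkpoint"
    val schema     = new StructType().add("id", StringType).add("value", DoubleType)

    // Delete the previous output before the query starts, since the file sink
    // can only append. Optionally clear the checkpoint too for a clean restart.
    val fs = FileSystem.get(sparkSession.sparkContext.hadoopConfiguration)
    fs.delete(new Path(target), true)        // recursive delete of the old output
    // fs.delete(new Path(checkpoint), true) // optional: also reset the checkpoint

    val streamDF = sparkSession.readStream
      .schema(schema)
      .option("header", "true")
      .parquet(rawData)

    val query = streamDF.writeStream
      .outputMode("append")                  // the file sink supports Append only
      .format("parquet")
      .option("checkpointLocation", checkpoint)
      .start(target)

    query.awaitTermination()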
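
And a skeleton of the ForeachWriter approach, assuming the same streamDF and checkpoint as above. What process does with each row (writing Parquet yourself, upserting into an external store, etc.) is up to your use case; the println here is only a placeholder.

    import org.apache.spark.sql.{ForeachWriter, Row}

    val query = streamDF.writeStream
      .foreach(new ForeachWriter[Row] {
        // Called once per partition per trigger; return false to skip the partition.
        override def open(partitionId: Long, epochId: Long): Boolean = true

        // Called for every row; your custom write/overwrite logic goes here.
        override def process(row: Row): Unit = println(row)

        // Called when the partition finishes (or on error).
        override def close(errorOrNull: Throwable): Unit = ()
      })
      .option("checkpointLocation", checkpoint)
      .start()

    query.awaitTermination()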