I am using the file source in Spark Structured Streaming and want to delete the files after I process them.
I am reading a directory of JSON files (1.json, 2.json, etc.) and writing them out as Parquet files. I want each input file to be removed after it has been processed successfully.
The documentation points to the cleanSource option:
cleanSource: option to clean up completed files after processing.
Available options are "archive", "delete", "off". If the option is not provided, the default value is "off".
See: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#input-sources
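
For reference, here is a minimal PySpark sketch of how I understand the option is meant to be set on the file source. The helper names (file_source_options, start_json_to_parquet), the directory arguments, and the DDL schema string are placeholders of mine, not from the documentation. As far as I can tell from the same guide, "archive" additionally requires the sourceArchiveDir option, so the helper checks for that.

```python
def file_source_options(clean_source="delete", archive_dir=None):
    """Build reader options for the documented cleanSource setting.

    Hypothetical helper: only the option names and values come from the
    Structured Streaming guide; the function itself is illustrative.
    """
    allowed = {"archive", "delete", "off"}
    if clean_source not in allowed:
        raise ValueError(f"cleanSource must be one of {sorted(allowed)}")

    options = {"cleanSource": clean_source}
    if clean_source == "archive":
        # The guide says sourceArchiveDir must be set when archiving.
        if archive_dir is None:
            raise ValueError("archive mode needs an archive directory")
        options["sourceArchiveDir"] = archive_dir
    return options


def start_json_to_parquet(spark, input_dir, output_dir, checkpoint_dir, schema_ddl):
    """Start a JSON -> Parquet streaming query with cleanSource=delete.

    `spark` is an existing SparkSession; all paths and the schema string
    (e.g. "id INT, name STRING") are assumptions for illustration.
    """
    stream = (
        spark.readStream
        .format("json")
        .schema(schema_ddl)  # streaming file sources need an explicit schema
        .options(**file_source_options("delete"))
        .load(input_dir)
    )
    return (
        stream.writeStream
        .format("parquet")
        .option("path", output_dir)
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

With this, start_json_to_parquet(spark, "/data/in", "/data/out", "/data/chk", "id INT, name STRING") would be the call I expect to read the JSON files and clean them up afterwards, where the paths are again made up.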