
Delete files after processing with Spark Structured Streaming


I am using the file source in Spark Structured Streaming and want to delete the files after I process them.

I am reading a directory of JSON files (1.json, 2.json, etc.) and writing them out as Parquet files. I want to remove each input file once it has been processed successfully.


Solution

  • Set the file source's cleanSource option to "delete". The documentation describes the option as follows:

    cleanSource: option to clean up completed files after processing.
    Available options are "archive", "delete", "off". If the option is not provided, the default value is "off".
    

    Reference: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#input-sources
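
For the JSON-to-Parquet case in the question, the option could be wired up roughly as below. This is a minimal PySpark sketch, not a tested production job: the helper names (file_source_options, start_json_to_parquet) and all parameters are made up for illustration, and it assumes Spark 3.0 or later, where cleanSource is available. The same documentation page also says that "archive" needs a sourceArchiveDir option and that cleanup is best effort, so a few processed files may be left behind.

```python
from typing import Dict, Optional


def file_source_options(clean_source: str = "delete",
                        archive_dir: Optional[str] = None) -> Dict[str, str]:
    """Build reader options for the file source (hypothetical helper)."""
    if clean_source not in ("archive", "delete", "off"):
        raise ValueError(f"unsupported cleanSource value: {clean_source!r}")
    options = {"cleanSource": clean_source}
    if clean_source == "archive":
        # Per the Spark docs, "archive" also requires sourceArchiveDir.
        if not archive_dir:
            raise ValueError('cleanSource="archive" needs an archive directory')
        options["sourceArchiveDir"] = archive_dir
    return options


def start_json_to_parquet(spark, input_dir, output_dir, checkpoint_dir, schema_ddl):
    """Stream JSON files from input_dir to Parquet and delete completed inputs.

    `spark` is an existing SparkSession; `schema_ddl` is a DDL string such as
    "id INT, name STRING" (streaming file sources need an explicit schema
    unless schema inference is enabled).
    """
    stream = (
        spark.readStream
        .format("json")
        .schema(schema_ddl)
        .options(**file_source_options("delete"))
        .load(input_dir)
    )
    return (
        stream.writeStream
        .format("parquet")
        .option("path", output_dir)
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

To run it, create a SparkSession, call start_json_to_parquet with your own directories and schema, and then call awaitTermination() on the returned query. Keeping the option building in a separate function makes it easy to switch to "archive" later without touching the pipeline code.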