apache-spark, apache-spark-sql, spark-structured-streaming

How to add new files to a Spark Structured Streaming DataFrame


I receive daily files in a folder on a Linux server. How should I add these to my Spark Structured Streaming DataFrame? (Delta Update)


Solution

  • Have you read the documentation? A file source already does exactly this:

    File source - Reads files written in a directory as a stream of data. Supported file formats are text, csv, json, parquet. See the docs of the DataStreamReader interface for a more up-to-date list, and supported options for each file format. Note that the files must be atomically placed in the given directory, which in most file systems, can be achieved by file move operations.

    https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#input-sources