apache-spark, pyspark

How does one actually set maxFilesPerTrigger in Spark?


I want to limit the number of files per trigger (maxFilesPerTrigger) to tune micro-batching in my streaming workflow. I read the documentation, but I can't figure out whether it can be set with spark.conf.set(...) or only through .readStream.

Can I set it through spark.conf.set when setting up a Spark session?


Solution

  • Here you have all the properties you can set when creating the SparkSession. There is no property named maxFilesPerTrigger among them.

    maxFilesPerTrigger is a DataStreamReader option, not a Spark configuration property, so you set it on .readStream, as shown in the sketch below.
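For illustration, here is a minimal PySpark sketch of setting maxFilesPerTrigger through readStream. The JSON source format, the schema, the batch-size value, and the input directory are placeholders, not details from the original question:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType

    spark = SparkSession.builder.appName("file-stream-demo").getOrCreate()

    # Streaming file sources require an explicit schema.
    schema = StructType([StructField("value", StringType())])

    # maxFilesPerTrigger is passed as a reader option, not via spark.conf.set.
    stream_df = (
        spark.readStream
             .format("json")                    # placeholder source format
             .schema(schema)
             .option("maxFilesPerTrigger", 10)  # cap each micro-batch at 10 new files
             .load("/tmp/input")                # placeholder input directory
    )

    # Write to the console sink just to demonstrate the running stream.
    query = stream_df.writeStream.format("console").start()
    query.awaitTermination()

Because the option belongs to the reader rather than the session, it can differ per stream: two readStream definitions in the same application can use different maxFilesPerTrigger values.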