Tags: apache-spark, spark-structured-streaming

Spark structured streaming asynchronous batch blocking


I'm using Apache Spark Structured Streaming to read from Kafka. Sometimes my micro-batches take longer to process than the specified interval because of heavy write I/O. I was wondering whether there is an option to start the next batch before the previous one has finished, but have the second batch blocked by the first?

I mean that if the first batch takes 7 seconds and the interval is set to 5 seconds, the second batch would start at the fifth second. But if the second batch finishes first, it would be blocked so that it doesn't write before its previous batch has (because I want to keep the correct message order).


Solution

  • No. The next batch only starts once the previous one has completed; I think the term you mean is the trigger interval. Allowing batches to overlap would become a mess otherwise, since ordering could no longer be guaranteed.

    See https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers
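
    For reference, the trigger interval described in that guide is set with `Trigger.ProcessingTime`. Below is a minimal sketch of such a query; the Kafka address, topic name, and output paths are placeholders, not values from the question:

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.Trigger

    val spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

    // Read from Kafka (placeholder broker address and topic).
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "my-topic")
      .load()

    // A 5-second processing-time trigger. If a batch takes 7 seconds,
    // the next batch starts immediately after it completes, not at the
    // 5-second mark; batches never overlap.
    val query = df.writeStream
      .format("parquet")
      .option("path", "/tmp/out")
      .option("checkpointLocation", "/tmp/ckpt")
      .trigger(Trigger.ProcessingTime("5 seconds"))
      .start()
    ```

    In other words, the trigger interval is a lower bound on the gap between batch starts, never a deadline: a slow batch simply delays the next one, which preserves ordering by construction.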