apache-spark, spark-streaming

Spark Streaming: how to guarantee the order of multiple foreachRDD calls


I would like to perform a sequence of actions on a DStream. Action N+1 must be executed after action N. What is the difference between these implementations?

val myDStream = ???

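// action1, action2, action3 are placeholders for functions of type RDD[T] => Unit

// version 1: three separate output operations
myDStream.foreachRDD(rdd => action1(rdd))
myDStream.foreachRDD(rdd => action2(rdd))
myDStream.foreachRDD(rdd => action3(rdd))

// version 2: one output operation that runs the three actions in sequence
myDStream.foreachRDD { rdd =>
  action1(rdd)
  action2(rdd)
  action3(rdd)
}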


Solution

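  • If we assume that each action operates on the complete RDD, as in action(rdd), then the two versions produce their results in the same order. Within a single foreachRDD, the statements in the closure run sequentially on the driver. With separate foreachRDD calls, Spark Streaming by default executes output operations one at a time, in the order they are defined in the application.

    At the execution level, the first version registers three output operations, so it generates three streaming jobs per batch interval, while the second version registers one output operation and generates a single streaming job. The number of Spark jobs that actually run still depends on how many RDD actions the closures invoke.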
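As a rough illustration of the single-foreachRDD variant, here is a minimal sketch, not a definitive implementation. It assumes a local master and uses queueStream as a stand-in test source. The names OrderedActions, record, log, action1, action2 and action3 are hypothetical placeholders, not part of the original question. Caching the RDD is an optional extra so that the three actions do not each recompute the batch.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

object OrderedActions {
  // Driver-side log of the order in which the actions ran (illustration only).
  val log = mutable.ArrayBuffer.empty[String]
  private def record(entry: String): Unit = log.synchronized { log += entry }

  // Hypothetical stand-ins for "action 1..3"; each one triggers a blocking RDD action.
  def action1(rdd: RDD[Int]): Unit = record(s"action 1: count=${rdd.count()}")
  def action2(rdd: RDD[Int]): Unit = record(s"action 2: sum=${rdd.fold(0)(_ + _)}")
  def action3(rdd: RDD[Int]): Unit = record(s"action 3: max=${rdd.max()}")

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads, 1-second batches.
    val conf = new SparkConf().setMaster("local[2]").setAppName("foreachRDD-order")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumption: a queue holding one RDD replaces a real input source.
    val queue = mutable.Queue[RDD[Int]](ssc.sparkContext.parallelize(1 to 10))
    val myDStream = ssc.queueStream(queue)

    // One output operation: the closure body runs top to bottom on the driver.
    myDStream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {   // later batches are empty once the queue is drained
        rdd.cache()           // optional: avoid recomputing the batch per action
        action1(rdd)
        action2(rdd)
        action3(rdd)
        rdd.unpersist()
      }
    }

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000)  // let a few batches run, then shut down
    ssc.stop(stopSparkContext = true, stopGracefully = false)
    log.foreach(println)
  }
}
```

Because each of these actions blocks until its Spark job finishes, action2 cannot start before action1 has completed. The ordering here comes from ordinary sequential execution of the closure rather than from the streaming scheduler.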