I would like to perform a sequence of actions on a DStream. Action N+1 must be executed after action N. What is the difference between these implementations?
val myDStream = ???
//version 1
myDStream.foreachRDD(rdd => action 1)
myDStream.foreachRDD(rdd => action 2)
myDStream.foreachRDD(rdd => action 3)
//version 2
myDStream.foreachRDD{rdd =>
action 1
action 2
action 3
}
If we assume that each action
operates on the complete RDD, such as action(rdd)
, then the two expressions should be equivalent in the order of the results.
At execution level, the top version will generate three spark jobs, while the bottom version will generate only one.