I use Spark 1.6.0 with Cloudera 5.8.3.
I have a DStream object with a number of transformations defined on top of it:
val stream = KafkaUtils.createDirectStream[...](...)
val mappedStream = stream.transform { ... }.map { ... }
mappedStream.foreachRDD { ... }
mappedStream.foreachRDD { ... }
mappedStream.map { ... }.foreachRDD { ... }
Is there a way to register a final foreachRDD
that is guaranteed to execute last, and only after all of the foreachRDD
calls above have finished?
In other words, at the moment the Spark UI shows the job as complete - that is when I want to execute a lightweight function.
Is there something in the API that allows me to achieve that?
Thanks
Using a streaming listener should solve the problem for you
(sorry, it's a Java example):
import org.apache.spark.streaming.scheduler.StreamingListener;
import org.apache.spark.streaming.scheduler.StreamingListenerBatchCompleted;

ssc.addStreamingListener(new JobListener());
// ...

class JobListener implements StreamingListener {
    @Override
    public void onBatchCompleted(StreamingListenerBatchCompleted batchCompleted) {
        // Called once the batch has finished processing, i.e. after its
        // output operations (the foreachRDD calls) have run.
        System.out.println("Batch completed, total delay: "
                + batchCompleted.batchInfo().totalDelay().get().toString() + " ms");
    }

    /*
    snipped other methods
    */
}
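Since the rest of the question is in Scala, here is a rough Scala sketch of the same idea (assuming `ssc` is your `StreamingContext`). `onBatchCompleted` is posted once a batch has finished processing, so it should be a reasonable place to hook in the lightweight post-job function:

```scala
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Invoked after each completed batch; runs the lightweight follow-up work here.
class JobListener extends StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    // totalDelay is an Option[Long] in milliseconds; it may be empty.
    val delayMs = batchCompleted.batchInfo.totalDelay.getOrElse(-1L)
    println(s"Batch completed, total delay: $delayMs ms")
  }
}

ssc.addStreamingListener(new JobListener)
```

In Scala you only need to override the callbacks you care about, since `StreamingListener` is a trait with empty default implementations.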