I have a huge file to process, loaded into an RDD, and I am performing some validations on its lines using a map function. There is a set of errors that are fatal for the whole file if encountered on even one line. So I would like to abort all other processing (all launched mappers across the whole cluster) as soon as that validation fails on any line, to save time.
Is there a way to achieve this?
Thank you.
PS: Using Spark 1.6, Java API
Well, after further searching and a better understanding of the laziness of Spark transformations, it turns out I just have to do something like:
rdd.filter(checkFatal).take(1)
Thanks to laziness, the processing stops on its own as soon as one record matching that rule is found :)
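To illustrate why this works without spinning up a cluster: `take(1)` is a short-circuiting action, so the filter's predicate stops being evaluated once a match is found. Below is a local analogy using plain Java Streams, not Spark; the class name, the `firstFatal` helper, and the `"FATAL"` prefix predicate are made up for illustration and stand in for the question's `checkFatal`:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ShortCircuitDemo {
    // Returns the first "fatal" line, examining no more input than necessary.
    // startsWith("FATAL") stands in for the question's checkFatal predicate.
    static Optional<String> firstFatal(List<String> lines) {
        return lines.stream()
                .filter(l -> l.startsWith("FATAL")) // like rdd.filter(checkFatal)
                .findFirst();                       // like take(1): short-circuits
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("ok:1", "ok:2", "FATAL:3", "ok:4");
        // Lines after the first match are never pulled through the pipeline.
        System.out.println(firstFatal(lines).orElse("none")); // prints FATAL:3
    }
}
```

One caveat for the Spark case: `take(1)` evaluates partitions incrementally (it runs a job on the first partition, then on growing batches only if nothing was found), so it avoids scanning the whole file, but it does not instantly kill tasks that are already mid-flight on other partitions.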