I am building a lambda architecture and need the Spark batch layer to restart itself, either at regular intervals or right after it finishes, or to have the restart triggered by a Spark Streaming job. I've looked around, but I probably don't fully understand the SparkContext: can I simply put the batch computation inside a loop in the driver? Any quick guidance would be appreciated.

A second question: given that data is continually being added to HBase, which is where Spark will be reading from, is there any use for caching? Thanks in advance for the help.
Edit: if I implement a SparkListener and call collect() when a job ends, would all the computations be redone?
It turned out to be easier than I thought. I had suspected that while loops wouldn't work outside RDD functions because of Spark's lazy execution, but I was wrong. This example hinted that it is possible: https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/JavaPageRank.java
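For anyone who finds this later, here is a minimal sketch of what I mean by driving the batch job from a plain loop. The input path, the filter, and the one-hour interval are placeholders (I'm using textFile as a stand-in for the actual HBase scan); the point is just that the driver is ordinary Java, so a while loop around the transformations and the action re-runs the whole pipeline on every iteration.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class BatchLoop {
  public static void main(String[] args) throws InterruptedException {
    SparkConf conf = new SparkConf().setAppName("BatchLoop");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // The driver is plain Java: a while loop outside the RDD functions
    // simply re-runs the whole batch pipeline each time around.
    while (true) {
      // Placeholder input; in my case this would be a fresh read from HBase
      // so that rows added since the last pass are picked up.
      JavaRDD<String> input = sc.textFile("hdfs:///data/batch-input");

      // The action at the end forces the lazy transformations to execute.
      long count = input.filter(line -> !line.isEmpty()).count();
      System.out.println("Batch pass finished, records processed: " + count);

      // Sleep between passes; the interval here is arbitrary.
      Thread.sleep(60 * 60 * 1000L); // one hour
    }
  }
}
```

Because the read happens inside the loop (and nothing is cached), each pass sees whatever is in the source at that moment, which is what I wanted for the continually growing HBase table.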