I have an object in Spark Scala that reads an HDFS file and exports it to a local file within my cluster. I created the function inside an object along with a SparkSession, and the function correctly returns what I want when run with the following command:
ReadFiles.main(Array("hdfs://.../info.log"))
But I want this function to run every 5 minutes. Is there a way to execute the command every 5 minutes, or some setting in the SparkSession that does this?
Thanks
You can do this with a scheduled executor, as below:

import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit.SECONDS

// Wrap the job in a Runnable so the executor can invoke it repeatedly
def fileReaderTask() = new Runnable {
  override def run(): Unit = {
    ReadFiles.main(Array("hdfs://.../info.log"))
  }
}

// First run starts immediately (initial delay 0), then 300 seconds
// (5 minutes) after each run finishes
Executors.newSingleThreadScheduledExecutor.scheduleWithFixedDelay(fileReaderTask(), 0L, 300L, SECONDS)
Call newSingleThreadScheduledExecutor only once, from a separate main; it will then keep invoking your read-files method at the fixed interval. Note that scheduleWithFixedDelay measures the 5 minutes from the end of one run to the start of the next; if you want a fixed period regardless of how long each run takes, use scheduleAtFixedRate instead. The executor's worker thread is non-daemon, so the JVM stays alive between runs without any extra blocking code.
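To see the whole thing in one place, here is a minimal self-contained sketch. ScheduledReader, the run counter, and the shortened 1-second demo interval are my own illustrative choices, not from the question; the demo stops itself after three runs so it can be executed quickly, whereas the real job would use a 5-minute interval and never shut down:

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object ScheduledReader {
  // Counts completed runs so the demo can verify and stop itself;
  // a real scheduled job would simply run forever.
  val runs = new AtomicInteger(0)

  def main(args: Array[String]): Unit = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val done = new CountDownLatch(3) // demo only: stop after three runs
    val task = new Runnable {
      override def run(): Unit = {
        // ReadFiles.main(Array("hdfs://.../info.log"))  // the real job would go here
        runs.incrementAndGet()
        done.countDown()
      }
    }
    // Demo interval: 1 second; for the real job use 5L, TimeUnit.MINUTES
    scheduler.scheduleWithFixedDelay(task, 0L, 1L, TimeUnit.SECONDS)
    done.await()          // block until the task has run three times
    scheduler.shutdown()  // release the worker thread so the JVM can exit
  }
}
```

Swapping the commented line back in for the counter, dropping the latch, and changing the interval to TimeUnit.MINUTES gives the production version.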