I'm developing a Spring Boot application that looks for given keywords in given websites and scrapes the web pages if a match is found. I am writing a cron job to refresh the results every 5 minutes, as follows:
@Scheduled(cron = "* */5 * * * *")
public void fetchLatestResults() throws Exception {
    LOG.debug("Fetching latest results >>>");
    List<Keyword> keywords = keywordService.findOldestSearched10();
    keywordService.updateLastSearchDate(keywords);
    searchResultService.fetchLatestResults(keywords);
    LOG.debug("<<< Latest results fetched");
}
The database has 100 keywords, and in the cron job I first select the 10 keywords whose results were fetched least recently. So, for example, the first run should use the keywords with ids 1 to 10, the second run ids 11 to 20, and so on; the 11th run should use ids 1 to 10 again, and the cycle continues.
Now, the problem is that executing the search takes much longer than 5 minutes. So, even though I've set the cron job to run every 5 minutes, the second run doesn't start until the first has completed. As a result, completing the search takes hours. How can I make this process multithreaded, so that multiple instances of the cron job can run simultaneously, since they operate on different lists of keywords?
I suggest you make the execution of your cron job asynchronous.
Create an executor class that hands your cron job to a thread pool, so that it runs on a separate thread:
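import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import javax.annotation.PostConstruct; // jakarta.annotation.PostConstruct on Spring Boot 3+
import org.springframework.stereotype.Component;

@Component
public class YourCronJobExecutor {

    // Maximum number of job runs that can be in progress at the same time
    private int threadsNumber = 10;
    private ExecutorService executorService;

    @PostConstruct
    private void init() {
        executorService = Executors.newFixedThreadPool(threadsNumber);
    }

    /**
     * Submits the job to the thread pool and returns immediately.
     * @param runnable - runnable instance.
     */
    public void start(Runnable runnable) {
        try {
            executorService.execute(runnable);
        } catch (RejectedExecutionException e) {
            // In practice a fixed pool rejects tasks only after it has been
            // shut down, so recreate the pool and retry once
            init();
            executorService.execute(runnable);
        }
    }
}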
Create a processor class that will contain the logic of your cron job:
@Component
public class CronJobProcessor {

    //logger
    //autowired beans

    public void executeYouCronJob() {
        LOG.debug("Fetching latest results >>>");
        List<Keyword> keywords = keywordService.findOldestSearched10();
        keywordService.updateLastSearchDate(keywords);
        searchResultService.fetchLatestResults(keywords);
        LOG.debug("<<< Latest results fetched");
    }
}
And finally, your cron job class will look like this:
@Component
public class YourCronJobClass {

    private final YourCronJobExecutor yourCronJobExecutor;
    private final CronJobProcessor cronJobProcessor;

    @Autowired
    public YourCronJobClass(YourCronJobExecutor yourCronJobExecutor,
                            CronJobProcessor cronJobProcessor) {
        this.yourCronJobExecutor = yourCronJobExecutor;
        this.cronJobProcessor = cronJobProcessor;
    }

    // Spring cron expressions start with a seconds field: "0 */5 * * * *" fires once
    // every 5 minutes, whereas "* */5 * * * *" fires every second of each such minute
    @Scheduled(cron = "0 */5 * * * *")
    public void fetchLatestResults() {
        // Only hands the work off; the job itself runs on a pool thread
        yourCronJobExecutor.start(cronJobProcessor::executeYouCronJob);
    }
}
This way, each scheduled invocation takes only a couple of milliseconds, while a separate thread, which actually performs the job, runs for as long as it needs to.
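To illustrate why the scheduled method returns so quickly, here is a minimal plain-Java sketch without Spring. The names HandOffDemo and runTwoJobs are made up for this illustration, and Thread.sleep merely stands in for the real search:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of the application above
class HandOffDemo {

    /**
     * Hands two slow jobs to a pool, the way two consecutive scheduler ticks would.
     * Returns {millis spent submitting, millis until both jobs finished}.
     */
    static long[] runTwoJobs(long jobMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch done = new CountDownLatch(2);
        Runnable slowJob = () -> {
            try {
                Thread.sleep(jobMillis); // stands in for the long-running search
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                done.countDown();
            }
        };

        long start = System.nanoTime();
        pool.execute(slowJob); // first "tick": returns immediately
        pool.execute(slowJob); // second "tick": does not wait for the first
        long submitMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        try {
            done.await(); // wait for both jobs only to measure the total time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long totalMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        pool.shutdown();
        return new long[] {submitMillis, totalMillis};
    }

    public static void main(String[] args) {
        long[] timings = runTwoJobs(500);
        System.out.println("submitting took " + timings[0] + " ms, both jobs finished after "
                + timings[1] + " ms");
    }
}
```

Because both jobs run on pool threads at the same time, the total should be close to the duration of one job rather than two. In the Spring version the pool is created once, inside YourCronJobExecutor, instead of on every call.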
You might also want to run the search for each keyword in its own thread, but that's a bit of a different story.
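If you do want to go down that road, one possible approach is ExecutorService.invokeAll. The sketch below is only an assumption about how it could look: PerKeywordSearch and searchAll are hypothetical names, keywords are plain strings rather than your Keyword entity, and the search itself is passed in as a function, since I don't know what searchResultService does internally:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of the application above
class PerKeywordSearch {

    /** Runs the search once per keyword on a fixed pool and returns the results in keyword order. */
    static <R> List<R> searchAll(List<String> keywords, Function<String, R> search, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One task per keyword
            List<Callable<R>> tasks = new ArrayList<>();
            for (String keyword : keywords) {
                tasks.add(() -> search.apply(keyword));
            }
            List<R> results = new ArrayList<>();
            // invokeAll blocks until every task is done and returns the futures in task order
            for (Future<R> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while searching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A keyword search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("spring", "cron", "executor");
        // The lambda stands in for the real per-keyword search
        List<String> results = searchAll(keywords, k -> "results for " + k, 3);
        System.out.println(results); // prints [results for spring, results for cron, results for executor]
    }
}
```

In a real application you would probably reuse one shared pool instead of creating one per call, and decide how a failure for a single keyword should affect the others.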