I am using Spark Streaming and reading a stream from Kafka. After reading the stream, I add its records to a Hazelcast map.
The issue is that I need to filter the values of that map right after reading the stream from Kafka.
I am using the code below to parallelize the map values.
// copy the current Hazelcast map values into a list and parallelize them as an RDD
List<MyCompObj> list = CacheManager.getInstance().getMyMap().values().stream().collect(Collectors.toList());
JavaRDD<MyCompObj> myObjRDD = sparkContext.parallelize(list);
But with this logic I end up creating the JavaRDD inside JavaInputDStream.foreachRDD, and this causes serialization issues.
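To make the problem clearer, here is a simplified sketch of the nesting I described; the Kafka setup and the filter predicate are only placeholders, and CacheManager/MyCompObj are my own classes from the snippet above:

import java.util.List;
import java.util.stream.Collectors;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.api.java.JavaInputDStream;

public class StreamingSketch {

    void process(JavaInputDStream<ConsumerRecord<String, String>> kafkaStream,
                 JavaSparkContext sparkContext) {

        kafkaStream.foreachRDD(rdd -> {
            // records from Kafka are written into the Hazelcast map here (omitted)

            // then I read the whole map back and parallelize it inside foreachRDD
            List<MyCompObj> list = CacheManager.getInstance().getMyMap().values()
                    .stream().collect(Collectors.toList());
            JavaRDD<MyCompObj> myObjRDD = sparkContext.parallelize(list);

            // the filtering I actually need; the predicate is only a placeholder
            JavaRDD<MyCompObj> filteredRDD = myObjRDD.filter(obj -> obj != null);
            System.out.println("filtered count: " + filteredRDD.count());
        });
    }
}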
My first question is: how can I run my Spark job in an event-driven way?
Second, I would like some opinions about scheduled Spark jobs: what is the best practice for scheduling a Spark job to run at specific time(s)?
I solved my issue by separating the streaming and batch processing into two parts, as it should be.
I am using Quartz and SparkLauncher to trigger a new job (example).
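A minimal sketch of this approach, assuming Quartz 2.x and the SparkLauncher API; the cron expression, jar path and class names are placeholders for my real setup:

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;
import org.quartz.CronScheduleBuilder;
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class SparkBatchScheduler {

    // Quartz job that submits the Spark batch application as a separate process
    public static class SparkBatchJob implements Job {
        @Override
        public void execute(JobExecutionContext context) throws JobExecutionException {
            try {
                SparkAppHandle handle = new SparkLauncher()
                        .setAppResource("/path/to/my-batch-job.jar") // placeholder
                        .setMainClass("com.example.BatchFilterJob")  // placeholder
                        .setMaster("yarn")
                        .setDeployMode("cluster")
                        .startApplication();
                // the handle can be used to monitor the submitted job's state if needed
            } catch (Exception e) {
                throw new JobExecutionException(e);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        JobDetail job = JobBuilder.newJob(SparkBatchJob.class)
                .withIdentity("sparkBatchJob", "spark")
                .build();

        // run every night at 02:00; replace with whatever schedule you need
        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("nightlyTrigger", "spark")
                .withSchedule(CronScheduleBuilder.cronSchedule("0 0 2 * * ?"))
                .build();

        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        scheduler.start();
        scheduler.scheduleJob(job, trigger);
    }
}

This keeps the streaming application untouched and lets Quartz decide when the batch job that filters the Hazelcast data gets submitted.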