Search code examples
scalaapache-sparkspark-streamingsolace

How can I set alert of spark streaming job if no records are being pushed in last 1 hour?


I have a spark streaming job that reads and processes data from the solace queue. I want to set an alert on it if no data is consumed in last one hour. Currently, I have set a batch window as 1 minute. How can add an alert if no data is consumed continuously for an hour so that source can be notified?

enter image description here


Solution

  • You can keep track of it by saving the timestamp of the last received record in a hdfs file. And then in while processing micro-batch, if rdd is empty and the difference in current timestamp and timestamp in hdfs is more than an hour you can send a mail using your mailing service. If you receive some records in your micro batch you can update the timestamp in hdfs file accordingly. Your code will look something like below where you need to implement getTimeStampFromHDFS() which will return timestamp in your hdfs file and updateTimestampHDFS(currentTimestamp) in which you will update the timestamp when you received record in your micro batch.

    dstream.foreachRDD{rdd => 
        if(rdd.isEmpty) {
            if((System.currentTimeMillis - getTimeStampFromHDFS()) / (1000 * 60 * 60) >= 1) sendMailAlert()
        }
        else {
            updateTimestampHDFS(System.currentTimeMillis)
        }
    }