Tags: apache-spark, pyspark, spark-streaming, spark-streaming-kafka

I want to keep jobs running with Spark Streaming


Is it possible to keep the streaming job running all the time? After about 24 hours it throws the error below and stops processing. I'm not sure how to handle this.

21/01/01 00:03:30 WARN KafkaOffsetReader [stream execution thread for [id =17bf-45aa-a9cd-2f77ec14df61, runId = 43c1-a932-d9f790996a6e]]: Retrying to fetch latest offsets because of incorrect offsets
21/01/01 07:17:04 ERROR RawSocketSender [MdsLoggerSenderThread]: org.fluentd.logger.sender.RawSocketSender
java.net.SocketException: Broken pipe (Write failed)

ssc.awaitTermination()

Shouldn't the call above block forever and keep the job running?
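
For context, that call usually sits at the end of a job along these lines. This is only a minimal sketch assuming the Spark 2.x DStream Kafka integration; the topic name and broker address are placeholders:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils  # Spark 2.x API

    sc = SparkContext(appName="KafkaStreamingJob")
    ssc = StreamingContext(sc, 10)  # 10-second batch interval

    # "events" and the broker address are placeholder values
    stream = KafkaUtils.createDirectStream(
        ssc, ["events"], {"metadata.broker.list": "kafka:9092"})
    stream.foreachRDD(lambda rdd: print(rdd.count()))

    ssc.start()
    ssc.awaitTermination()  # blocks until the job stops or an error is raised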


Solution

  • Reason: there are no messages in your Kafka topic available for consumption.

    Fix: increase the maximum wait time passed to awaitTermination().

    e.g. 300000 milliseconds = wait up to 5 minutes for a message:

      ssc.awaitTermination(300000)


    Note: change the value to suit your environment; it should be the maximum interval within which at least one new message is expected to arrive on the Kafka topic. A keep-alive sketch follows below the list.
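
Below is a minimal keep-alive sketch of that idea in PySpark. Two caveats: the PySpark API takes the timeout in seconds (it converts to milliseconds for the JVM), whereas the Scala awaitTermination(long) overload takes milliseconds; and the loop uses awaitTerminationOrTimeout(), a variant of the same technique, so the driver does not exit just because the timeout elapsed. The ssc variable is assumed to be an already-configured StreamingContext.

    # Keep-alive sketch: `ssc` is assumed to be a configured StreamingContext.
    # PySpark timeouts are in SECONDS; the Scala API takes milliseconds.
    ssc.start()

    # awaitTerminationOrTimeout() returns True once the context has stopped,
    # raises the reported error if the stream failed, and returns False if
    # the timeout elapsed while the job is still running. Looping on False
    # keeps the driver alive through quiet periods on the Kafka topic.
    while not ssc.awaitTerminationOrTimeout(300):  # wait 5 minutes at a time
        pass  # still running: wait again instead of exiting

With this loop, a quiet topic never causes the driver to exit; the job only stops if the context is stopped explicitly or the stream raises an error (such as the Broken pipe above).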