Search code examples
javaapache-sparktwitter4jspark-streaming

Twitter receiver not running in spark 1.6.0


For some reason if I include a twitter receiver and start the streaming context, I get the below exception not sure why Can someone let me know if anyone has already encountered this issue or am I doing something wrong?

java.lang.ArithmeticException: / by zero 
        at org.apache.spark.streaming.Duration.isMultipleOf(Duration.scala:59) 
        at org.apache.spark.streaming.dstream.DStream.isTimeValid(DStream.scala:324) 
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:344) 
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:344) 
        at scala.Option.orElse(Option.scala:257) 
        at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:341) 
        at org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:47) 
        at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:115) 
        at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:114) 
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) 
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) 
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) 
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) 
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) 
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) 
        at org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:114) 
        at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:248) 
        at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:246) 
        at scala.util.Try$.apply(Try.scala:161) 
        at org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:246) 
        at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:181) 
        at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87) 
        at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86) 
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) 
2016-03-28 10:08:00,112 ERROR [DefaultQuartzScheduler_Worker-8] org.quartz.core.JobRunShell 
Job sample_TwitterListener.SocialMedia_sample threw an unhandled Exception: 

Solution

  • Finally found what the issue was, in the code I was initializing two windows on the stream as below

    JavaReceiverInputDStream<Status> inputStream = TwitterUtils.createStream(jssc, getAuth());
    JavaDStream<String> batchedInput = rawInput.window(new Duration(windowInterval), new Duration(slideInterval));
    processStreamData(batchedInput);
    
    private void processStreamData(JavaDStream<String> _input) {
        JavaDStream<String> input = _input.window(new Duration(windowInterval), new Duration(slideInterval));
    }
    

    So, the first windowed stream used to get the correct sliding interval, but somehow the second window got the sliding interval of 0 ms, which was causing / by zero exception. After removing the second window operation I was able receive the twitter data.