I am new to Scala and Spark, and I am working on Spark Streaming with Twitter data. I flatMapped the stream into individual words. Now, before processing them, I need to remove words that start with # or @, as well as the word RT. I know this should be easy. I wrote a filter for it, but it is not working. Can anyone help? My code is:
val sparkConf = new SparkConf().setMaster("local[2]")
val ssc = new StreamingContext(sparkConf, Seconds(2))
val stream = TwitterUtils.createStream(ssc, None)
//val lanFilter = stream.filter(status => status.getLang == "en")
val RDD1 = stream.flatMap(status => status.getText.split(" "))
val filterRDD = RDD1.filter(word =>(word !=word.startsWith("#")))
Also, the language filter (the commented-out line) gives an error.
Thank you.
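Is your lambda expression correct? The expression word != word.startsWith("#") compares a String with a Boolean, which is always true, so nothing gets filtered out. I think you want: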
val filterRDD = RDD1.filter(word => !word.startsWith("#"))
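Since the question also asks about @ words and RT, here is a minimal sketch of one predicate covering all three cases. The object name TweetWordFilter and the method keepWord are hypothetical names chosen for illustration, not part of Spark or Twitter4J. The check runs on a plain Scala collection so it can be tried without a streaming context.

```scala
// Hypothetical helper (not from the original post): decides whether a
// token from a tweet should be kept for further processing.
object TweetWordFilter {
  // Keep a word only if it is non-empty, is not a hashtag (#...),
  // is not a mention (@...), and is not the retweet marker "RT".
  def keepWord(word: String): Boolean =
    word.nonEmpty &&
      !word.startsWith("#") &&
      !word.startsWith("@") &&
      word != "RT"

  def main(args: Array[String]): Unit = {
    // Plain-collection check of the predicate; no Spark needed here.
    val sample = "RT @user: learning #spark streaming".split(" ").toSeq
    val kept = sample.filter(keepWord)
    println(kept.mkString(" ")) // prints "learning streaming"
  }
}
```

Assuming RDD1 is the word stream from the question, the same predicate could be passed to its filter, for example val filterRDD = RDD1.filter(TweetWordFilter.keepWord). Note that this compares "RT" case-sensitively and only looks at the first character of each token, so a word like "(#spark" would still pass.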