Search code examples
apache-sparkapache-kafkaspark-streamingsentiment-analysisspark-streaming-kafka

Spark streaming and Kafka intergration


I'm new to Apache Spark and I've been doing a project related to sentiment analysis on twitter data which involves spark streaming and kafka integration. I have been following the github code (link provided below)

https://github.com/sridharswamy/Twitter-Sentiment-Analysis-Using-Spark-Streaming-And-Kafka However, in the last stage, that is during the integration of Kafka with Apache Spark, the following errors were obtained

py4j.protocol.Py4JError: An error occurred while calling o24.createDirectStreamWithoutMessageHandler. Trace:
py4j.Py4JException: Method createDirectStreamWithoutMessageHandler([class org.apache.spark.streaming.api.java.JavaStreamingContext, class java.util.HashMap, class java.util.HashSet, class java.util.HashMap]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:272)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)

Command used: bin/spark-submit --packages org.apache.spark:spark-streaming-kafka_2.10:1.5.1 twitterStream.py

Apache Spark version: spark-2.1.0-bin-hadoop2.4

Kafka version: kafka_2.11-0.10.1.1

I haven't been able to debug this and any help would be much appreciated.


Solution

  • The example you are trying to run is desinged for running in spark 1.5. You should either download spark 1.5 or run the spark-submit from spark 2.1.0 but with kafka package related to 2.1.0, for example: ./bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.1.0.