Tags: apache-spark, sbt, apache-kafka, spark-streaming

KafkaUtils java.lang.NoClassDefFoundError Spark Streaming


I am attempting to print messages consumed from Kafka via Spark streaming. However, I keep running into the following error:

16/09/04 16:03:33 ERROR ApplicationMaster: User class threw exception: java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$

A few questions on StackOverflow address this very issue, e.g. https://stackoverflow.com/questions/27710887/kafkautils-class-not-found-in-spark-streaming

The answers given there have not resolved the issue for me. I have also tried building an "uber jar" with sbt assembly, but that did not work either.

Contents of sbt file:

name := "StreamKafka"

version := "1.0"

scalaVersion := "2.10.5"


libraryDependencies ++= Seq(
    "org.apache.kafka" % "kafka_2.10" % "0.8.2.1" % "provided",
    "org.apache.spark" % "spark-streaming_2.10" % "1.6.1" % "provided",
    "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.6.1" % "provided",
    "org.apache.spark" % "spark-core_2.10" % "1.6.1" % "provided" exclude("com.esotericsoftware.minlog", "minlog") exclude("com.esotericsoftware.kryo", "kryo")
)

resolvers ++= Seq(
    "Maven Central" at "https://repo1.maven.org/maven2/"
)


assemblyMergeStrategy in assembly := {
  // Scala tries match cases top to bottom, so the catch-all must come last;
  // in the original file it sat above the pom.properties and default cases,
  // making them unreachable.
  case m if m.toLowerCase.endsWith("manifest.mf")               => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$")           => MergeStrategy.discard
  case "log4j.properties"                                       => MergeStrategy.discard
  case m if m.toLowerCase.startsWith("meta-inf/services/")      => MergeStrategy.filterDistinctLines
  case "reference.conf"                                         => MergeStrategy.concat
  case PathList(ps @ _*) if ps.last.endsWith("pom.properties")  => MergeStrategy.discard
  case _                                                        => MergeStrategy.first
}
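Note that Scala evaluates match cases top to bottom, so a `case _` catch-all placed before more specific cases silently swallows every path, and the later cases never run. A toy illustration (the `pick` function and its strings are made up for this example):

```scala
object MatchOrder {
  // Cases are tried in order; a catch-all placed first would shadow the rest.
  def pick(path: String): String = path match {
    case p if p.endsWith("pom.properties") => "discard"
    case "reference.conf"                  => "concat"
    case _                                 => "first" // catch-all goes last
  }
}
```

Here `pick("META-INF/maven/pom.properties")` yields `"discard"` and `pick("reference.conf")` yields `"concat"`; with the catch-all moved to the top, every input would yield `"first"`.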

Solution

  • Posting the answer from the comments so that it will be easier for others to solve the issue.

    You have to remove "provided" from the Kafka dependencies, so that they are bundled into the assembled jar:

    "org.apache.kafka" % "kafka_2.10" % "0.8.2.1",
    "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.6.1"
    

    For the dependencies to be bundled into the jar, you have to run the command sbt assembly.

    Also make sure that you are submitting the right jar file. You can find the correct jar file name in the output of the sbt assembly command.
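One quick way to confirm that KafkaUtils actually made it into the assembled jar is to scan the jar's entries for the class file. A minimal sketch using the JDK's `java.util.jar` API (helper name is my own):

```scala
import java.util.jar.JarFile
import scala.collection.JavaConverters._

object JarCheck {
  // Returns true if the jar at `jarPath` contains an entry with the given name.
  def containsEntry(jarPath: String, entry: String): Boolean = {
    val jar = new JarFile(jarPath)
    try jar.entries.asScala.exists(_.getName == entry)
    finally jar.close()
  }
}
```

Once "provided" is removed and the jar rebuilt, `containsEntry("target/scala-2.10/StreamKafka-assembly-1.0.jar", "org/apache/spark/streaming/kafka/KafkaUtils$.class")` should return true (the jar path here is a guess based on the project name and version above; use the path printed by sbt assembly).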