I'm new to Spark (using v2.4.5) and am still trying to figure out the correct way to add external dependencies. When I tried to add Kafka streaming to my project, my build.sbt looked like this:
name := "Stream Handler"
version := "1.0"
scalaVersion := "2.11.12"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.5" % "provided",
  "org.apache.spark" % "spark-streaming_2.11" % "2.4.5" % "provided",
  "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.4.5"
)
This builds successfully, but when I run it with spark-submit, I get a java.lang.NoClassDefFoundError for KafkaUtils.
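For reference, my streaming code boils down to something like the sketch below (the topic name and broker address are placeholders, not my actual setup):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

val conf = new SparkConf().setAppName("Stream Handler")
val ssc = new StreamingContext(conf, Seconds(5))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",   // placeholder broker address
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "stream-handler"
)

// This call is where the NoClassDefFoundError surfaces at runtime when
// the spark-streaming-kafka-0-10 jar is missing from the classpath.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("my-topic"), kafkaParams)
)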
I was able to get my code working by passing the dependency in through the --packages option, like this:
$ spark-submit [other_args] --packages "org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.5"
Ideally I would like to set up all the dependencies in build.sbt, but I'm not sure what I'm doing wrong. Any advice would be appreciated!
your "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.4.5"
is wrong.
change that to below like mvnrepo.. https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.4.5"