
NoClassDefFoundError while executing spark-submit for KafkaProducer


I've written a Kafka producer in Scala using IntelliJ and passed two files as arguments. I've used the following code:

    package kafkaProducer

    import java.util.Properties

    import org.apache.kafka.clients.producer._
    import org.apache.spark._

    import scala.io.Source

    object kafkaProducerScala extends App {

      val conf = new SparkConf().
        setMaster(args(0)).
        setAppName("kafkaProducerScala")
      val sc = new SparkContext(conf)
      sc.setLogLevel("ERROR")

      // Kafka producer configuration
      val props = new Properties()
      props.put("bootstrap.servers", "localhost:9092")
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      val producer = new KafkaProducer[String, String](props)

      val topic = "KafkaTopics"

      // args(2): file with the threshold value(s); args(1): file with the data values
      for (line2 <- Source.fromFile(args(2)).getLines) {
        val c = line2.toInt

        for (line <- Source.fromFile(args(1)).getLines) {
          val a = line.toInt
          if (a > c) {
            val d = a
            println(d)
            // send every value greater than the threshold to the topic
            val record = new ProducerRecord[String, String](topic, d.toString)
            producer.send(record)
          }
        }
        producer.close()
      }
    }

Following is my build.sbt file:

name := "KafkaProducer"

version := "0.1"

scalaVersion := "2.12.7"

libraryDependencies += "org.apache.kafka" %% "kafka" % "2.0.1"
resolvers += Resolver.mavenLocal

// https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "2.0.1"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0"

My goal is to get the output in a Kafka consumer, and I'm getting it perfectly. I then created a .jar file for spark-submit.

I've used the following spark-submit command:

C:\spark-2.3.1-bin-hadoop2.7\bin>spark-submit --class kafkaProducer.kafkaProducerScala C:\Users\Shaheel\IdeaProjects\KafkaProducer\target\scala-2.12\kafkaproducer_2.12-0.1.jar local C:\Users\Shaheel\Desktop\demo.txt C:\Users\Shaheel\Desktop\condition.properties

But I'm getting the following error:

2018-11-28 17:53:58 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/kafka/clients/producer/KafkaProducer
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Unknown Source)
        at java.lang.Class.privateGetMethodRecursive(Unknown Source)
        at java.lang.Class.getMethod0(Unknown Source)
        at java.lang.Class.getMethod(Unknown Source)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:42)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.clients.producer.KafkaProducer
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 11 more
2018-11-28 17:53:58 INFO  ShutdownHookManager:54 - Shutdown hook called
2018-11-28 17:53:58 INFO  ShutdownHookManager:54 - Deleting directory C:\Users\Shaheel\AppData\Local\Temp\spark-96060579-36cc-4c68-b85e-429acad4fd38

How can I resolve this?


Solution

  • You are using Scala version 2.12.7, whereas Spark is still built against Scala version 2.11.

    Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). It’s easy to run locally on one machine — all you need is to have java installed on your system PATH, or the JAVA_HOME environment variable pointing to a Java installation.

    Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.4.0 uses Scala 2.11. You will need to use a compatible Scala version (2.11.x).

    Note that support for Java 7, Python 2.6 and old Hadoop versions before 2.6.5 were removed as of Spark 2.2.0. Support for Scala 2.10 was removed as of 2.3.0.

    The above excerpt is taken directly from the documentation page of Apache Spark (v2.4.0). Change your Scala version to 2.11.12 and add the sbt-assembly plugin to your plugins.sbt file (a sketch is shown below). Then all you need to do is run the command sbt assembly in the project's root (the location where src and build.sbt reside together), and the jar created will contain the kafka-clients dependency.
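
    For reference, a minimal project/plugins.sbt could look like the sketch below; the sbt-assembly version here is only an assumption, so use whichever release matches your sbt version:

    // project/plugins.sbt — pulls in the sbt-assembly plugin used to build a fat jar
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.9")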

    The corrected build.sbt would be as follows:

    val sparkVersion = "2.4.0"
    
    name := "KafkaProducer"
    
    version := "0.1"
    
    scalaVersion := "2.11.12"
    
    libraryDependencies ++= Seq(
      "org.apache.kafka" % "kafka-clients" % "2.0.1",
      "org.apache.spark" %% "spark-core" % sparkVersion % Provided
    )
    

    The dependencies of Apache Spark are always used in the Provided scope, as Spark provides them to the code at runtime.
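
    Once the assembly jar is built, the spark-submit invocation would look roughly like the line below; the jar name follows sbt-assembly's default <name>-assembly-<version>.jar convention and the file paths are copied from the question, so adjust both to your setup:

    C:\spark-2.3.1-bin-hadoop2.7\bin>spark-submit --class kafkaProducer.kafkaProducerScala C:\Users\Shaheel\IdeaProjects\KafkaProducer\target\scala-2.11\KafkaProducer-assembly-0.1.jar local C:\Users\Shaheel\Desktop\demo.txt C:\Users\Shaheel\Desktop\condition.properties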