Tags: scala, intellij-idea, scala-spark

How to setup and run scala-spark in intellij?


I am trying to use IntelliJ to build Spark applications written in Scala. I get the following error when I execute the Scala program:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
    at Main$.main(Main.scala:10)
    at Main.main(Main.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkConf
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 2 more

The line throwing the error is the following:

val conf = new SparkConf().setAppName("Sample Spark Scala Application")

I do not get any error if I just import SparkConf without executing the line above.

The following are the contents of my sbt file:

ThisBuild / version := "0.1.0-SNAPSHOT"

ThisBuild / scalaVersion := "2.12.15"

val sparkVersion = "3.2.4"

// Note the dependencies are provided
libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided"

lazy val root = (project in file("."))
  .settings(
    name := "untitled4"
  )

The versions in the sbt file match what I see when I open the Spark shell.

The following are the contents of my PATH, JAVA_HOME and SPARK_HOME variables:

ray@Rayquaza-ASUS-TUF-Gaming-F15-FX506HEB-FX566HEB:~$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin:/usr/lib/jvm/java-8-openjdk-amd64/bin:/usr/spark/spark-3.2.4-bin-hadoop2.7/bin
ray@Rayquaza-ASUS-TUF-Gaming-F15-FX506HEB-FX566HEB:~$ echo $JAVA_HOME
/usr/lib/jvm/java-8-openjdk-amd64
ray@Rayquaza-ASUS-TUF-Gaming-F15-FX506HEB-FX566HEB:~$ echo $SPARK_HOME
/usr/spark/spark-3.2.4-bin-hadoop2.7

I am able to run code properly in the Spark shell, so Spark and Scala seem to be set up correctly. But IntelliJ isn't able to use them.

I tried following a few online guides, but they weren't helpful. I would appreciate help solving this problem. Also, let me know if any more information or details are required.


Solution

  • If you want to run the code from inside IntelliJ, the Spark dependencies are not "provided". With the default run configuration, your application runs as a plain JVM application and must be given all of its dependencies on the classpath. Something like the following happens by default when you press the Run button in a JVM project:

    java -classpath /path/to/dependency:/path/to/dependency_2 YourMain
    

    So my guess is that when you run the code from IntelliJ, your Spark dependencies, because they are marked as provided, are nowhere to be found by java on the classpath. You should change the scope of the Spark packages to compile in your sbt file, rebuild the project, and try again.
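
    Concretely, the build.sbt could look like the sketch below (based on the file in the question). Note that provided is still the right scope if you later submit the jar to a cluster with spark-submit, since Spark supplies those jars at runtime there; for local runs from IntelliJ, the default compile scope is what you want:

    ```scala
    ThisBuild / version := "0.1.0-SNAPSHOT"

    ThisBuild / scalaVersion := "2.12.15"

    val sparkVersion = "3.2.4"

    // Default (compile) scope: the Spark jars end up on the runtime
    // classpath, so IntelliJ's plain Run configuration can find them.
    libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion
    libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion
    libraryDependencies += "org.apache.spark" %% "spark-mllib" % sparkVersion

    lazy val root = (project in file("."))
      .settings(
        name := "untitled4"
      )
    ```

    After editing the file, reload the sbt project in IntelliJ so the new scopes take effect.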