In a Spark job, I don't know how to import and use the jars shared via the method SparkContext.addJar(). It seems that this method is able to move the jars to some place that is accessible by the other nodes in the cluster, but I do not know how to import them.
This is an example:
package utils;

public class addNumber {
    public int addOne(int i) {
        return i + 1;
    }

    public int addTwo(int i) {
        return i + 2;
    }
}
I create a class called addNumber and package it into a jar file, utils.jar.
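(For reference, I build the jar roughly like this; the exact commands are not important:)

    javac utils/addNumber.java   # compile the class
    jar cf utils.jar utils/      # package it into utils.jar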
Then I create a Spark job with the code below:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object TestDependencies {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf
    val sc = new SparkContext(sparkConf)
    sc.addJar("/path/to/utils.jar")

    val data = (1 to 100).toList
    val rdd = sc.makeRDD(data)
    val rdd_1 = rdd.map(x => {
      val handler = new utils.addNumber
      handler.addOne(x)
    })
    rdd_1.collect().foreach { x => print(x + "||") }
  }
}
The error java.lang.NoClassDefFoundError: utils/addNumber was raised after submitting the job through spark-submit.
I know that addJar() does not guarantee the jars are included in the classpath of the Spark job. If I want to use the jar files, I have to copy all of the dependencies to the same path on each node of the cluster. But if I have to copy and include all of the jars myself, what is the use of the method addJar()?
I am wondering if there is a way to use the jars imported by the method addJar(). Thanks in advance.
Did you try setting the path of the jar with the "local:" prefix? From the documentation:
public void addJar(String path)
Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. The path passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
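For example (the path below is only a placeholder; local: works only if you have already copied utils.jar to that location on every worker node yourself):

    // local: tells Spark the jar is already present at this path on every worker,
    // so nothing is uploaded -- you must distribute the file yourself.
    sc.addJar("local:/opt/spark-libs/utils.jar")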
You can also try setJars on the SparkConf:
val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("tmp")
  .setJars(Array("/path1/one.jar", "/path2/two.jar"))
val sc = new SparkContext(conf)
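Applied to your example, a minimal sketch could look like this (the jar path is a placeholder for wherever utils.jar lives on the machine you submit from):

    import org.apache.spark.{SparkConf, SparkContext}

    object TestDependencies {
      def main(args: Array[String]): Unit = {
        // setJars ships utils.jar with the application, so it ends up on the
        // driver and executor classpaths (same effect as the spark.jars setting).
        val sparkConf = new SparkConf()
          .setAppName("TestDependencies")
          .setJars(Array("/path/to/utils.jar")) // placeholder path
        val sc = new SparkContext(sparkConf)

        val rdd_1 = sc.makeRDD((1 to 100).toList).map(x => new utils.addNumber().addOne(x))
        rdd_1.collect().foreach(x => print(x + "||"))
        sc.stop()
      }
    }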
Also take a look at the spark.jars option in the Spark configuration, and set the --jars parameter in spark-submit:
--jars /path/1.jar,/path/2.jar
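For instance, a full submit command might look like this (the master and paths are placeholders; TestDependencies is the class from your example):

    spark-submit \
      --class TestDependencies \
      --master yarn \
      --jars /path/to/utils.jar \
      /path/to/your-spark-job.jar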
or edit conf/spark-defaults.conf:
spark.driver.extraClassPath /path/1.jar:/fullpath/2.jar
spark.executor.extraClassPath /path/1.jar:/fullpath/2.jar
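Note that the extraClassPath entries are not copied anywhere by Spark: the jars must already exist at those paths on the driver and on every executor node. The same settings can also be passed per job at submit time, for example:

    spark-submit \
      --conf spark.driver.extraClassPath=/path/1.jar:/fullpath/2.jar \
      --conf spark.executor.extraClassPath=/path/1.jar:/fullpath/2.jar \
      ...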