I am transforming an existing package to make it run on Spark. In order to serialize the third-party classes with Kryo, I used the following code:
SparkConf conf = new SparkConf()
        .setAppName("my.app.spark")
        .setMaster("local")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryo.registrationRequired", "true");
try {
    conf.registerKryoClasses(new Class<?>[]{
            Class.forName("my.thirdparty.classes"),
            Class.forName("my.thirdparty.classes2")
    });
} catch (ClassNotFoundException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

JavaSparkContext context = new JavaSparkContext(conf);

List<File> txtFiles = new ArrayList<File>();
for (File file : input.listFiles(filter)) {
    txtFiles.add(file);
}

JavaRDD<File> distText = context.parallelize(txtFiles);
distText.foreach(new VoidFunction<File>() {
    public void call(File file) {
        processFile(file);
    }
});

context.close();
When I submit the job with the following command:
spark-submit --class "mypackage.RunWithSpark" --master yarn --driver-memory 6g mypackage.jar
I get the following error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.SparkConf.registerKryoClasses([Ljava/lang/Class;)Lorg/apache/spark/SparkConf;
...
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I am new to Spark. Could you please help me with this?
Thanks
The method org.apache.spark.SparkConf.registerKryoClasses was added in Spark 1.2.0. Your cluster only runs Spark 1.0.0, hence the error.
You should either upgrade your cluster to a newer release (e.g. 1.6.1), or build your program against Spark 1.0.0 and register the Kryo classes through the 1.0.0 API (sketched below). The rule is to always link your program against the same version of Spark that your cluster runs; otherwise you'll run into all sorts of problems.
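If you stay on Spark 1.0.0, the registration goes through the spark.kryo.registrator property and the KryoRegistrator interface rather than registerKryoClasses. Here is a minimal sketch under that assumption, reusing the class names from your snippet; MyKryoRegistrator and the mypackage location are hypothetical and should be adapted to your project:

import com.esotericsoftware.kryo.Kryo;
import org.apache.spark.SparkConf;
import org.apache.spark.serializer.KryoRegistrator;

// Registrator that registers the two third-party classes with Kryo.
public class MyKryoRegistrator implements KryoRegistrator {
    @Override
    public void registerClasses(Kryo kryo) {
        try {
            kryo.register(Class.forName("my.thirdparty.classes"));
            kryo.register(Class.forName("my.thirdparty.classes2"));
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}

// In the driver, point Spark at the registrator instead of calling registerKryoClasses:
SparkConf conf = new SparkConf()
        .setAppName("my.app.spark")
        .setMaster("local")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryo.registrator", "mypackage.MyKryoRegistrator");

Both spark.kryo.registrator and KryoRegistrator already exist in Spark 1.0.0 and are still supported in later releases, so this approach keeps working if you upgrade the cluster afterwards.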