I wonder why this works:
import java.util.ArrayList;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public final class JavaSparkPi {
    public static void main(String[] args) throws Exception {
        // The Spark context is created directly in main.
        SparkConf sparkConf = new SparkConf().setMaster("yarn-cluster").setAppName("mySparkApp");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);
        ArrayList<Integer> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add(i);
        }
        JavaRDD<Integer> dataSet = jsc.parallelize(list)
                .map(s -> 2 * s)
                .map(s -> 5 * s);
        int weirdStuff = dataSet.reduce((a, b) -> (a + b) / 2);
        System.out.println("stuff is " + weirdStuff);
        jsc.stop();
    }
}
and why this does not:
import java.util.ArrayList;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public final class JavaSparkPi {
    // The Spark context is created inside an instance method instead of main.
    private void startWorkingOnMicroSpark() {
        SparkConf sparkConf = new SparkConf().setMaster("yarn-cluster").setAppName("mySparkApp");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);
        ArrayList<Integer> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add(i);
        }
        JavaRDD<Integer> dataSet = jsc.parallelize(list)
                .map(s -> 2 * s)
                .map(s -> 5 * s);
        int weirdStuff = dataSet.reduce((a, b) -> (a + b) / 2);
        System.out.println("weirdStuff is " + weirdStuff);
        jsc.stop();
    }

    public static void main(String[] args) throws Exception {
        JavaSparkPi jsp = new JavaSparkPi();
        jsp.startWorkingOnMicroSpark();
    }
}
I'm working with Spark on EMR. The only difference I found between the two projects is that one has the Spark code written directly in main and the other does not. I launched both as Spark applications on EMR with the --class JavaSparkPi argument.
Here is the failing status:
Status: FAILED
Reason:
Log file: s3://mynewbucket/Logs/j-3AKSZXK7FKMX6/steps/s-2MT0SB910U3TE/stderr.gz
Details: Exception in thread "main" org.apache.spark.SparkException: Application application_1501228129826_0003 finished with failed status
JAR location: command-runner.jar
Main class: None
Arguments: spark-submit --deploy-mode cluster --class JavaSparkPi s3://mynewbucket/Code/SparkAWS.jar
Action on failure: Continue
and here is the successful one:
JAR location: command-runner.jar
Main class: None
Arguments: spark-submit --deploy-mode cluster --class JavaSparkPi s3://mynewbucket/Code/SparkAWS.jar
Action on failure: Continue
Move the Spark initialization statements into main, so that the context is created there rather than inside the instance method:
SparkConf sparkConf = new SparkConf().setMaster("yarn-cluster").setAppName("mySparkApp");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
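Separately from where the context is created, it can help to know what value the job should print before reading the cluster logs. The sketch below is plain Java with no Spark dependency and replays the arithmetic of the question's pipeline. The class name `LocalReduceCheck` and its helper methods are hypothetical, and the two-partition fold is only a simplified stand-in for how a distributed reduce might combine partial results, not Spark's actual scheduling. Spark documents `reduce` as expecting a commutative and associative operator, and `(a + b) / 2` is not associative, so the result can depend on how the data is split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original project: replays the
// question's map and reduce steps locally, without a Spark context.
final class LocalReduceCheck {

    // Builds 0..9 and applies both map steps from the question (x2, then x5).
    static List<Integer> mappedValues() {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            out.add(5 * (2 * i));
        }
        return out;
    }

    // Left-to-right fold with the question's operator (integer division).
    static int foldLeft(List<Integer> values) {
        int acc = values.get(0);
        for (int i = 1; i < values.size(); i++) {
            acc = (acc + values.get(i)) / 2;
        }
        return acc;
    }

    // Assumption: a simplified model of two partitions, each folded on its
    // own and then merged once with the same operator.
    static int foldInTwoPartitions(List<Integer> values, int split) {
        int left = foldLeft(values.subList(0, split));
        int right = foldLeft(values.subList(split, values.size()));
        return (left + right) / 2;
    }

    public static void main(String[] args) {
        List<Integer> values = mappedValues();
        System.out.println("single partition: " + foldLeft(values));
        System.out.println("two partitions:   " + foldInTwoPartitions(values, 5));
    }
}
```

Run locally, this prints 80 for the single fold and 55 for the two-partition model, so a different number in the step's output does not by itself mean the job misbehaved.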