EMR Spark works in a Java main, but not in a Java method


I wonder why this works:

import java.util.ArrayList;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public final class JavaSparkPi {

    public static void main(String[] args) throws Exception {
        // The Spark context is created directly in main.
        SparkConf sparkConf = new SparkConf().setMaster("yarn-cluster").setAppName("mySparkApp");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        ArrayList<Integer> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add(i);
        }

        JavaRDD<Integer> dataSet = jsc.parallelize(list)
                .map(s -> 2 * s)
                .map(s -> 5 * s);

        int weirdStuff = dataSet.reduce((a, b) -> (a + b) / 2);
        System.out.println("stuff is " + weirdStuff);
        jsc.stop();
    }
}

and why this does not:

import java.util.ArrayList;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public final class JavaSparkPi {

    // The Spark context is created inside an instance method instead of main.
    private void startWorkingOnMicroSpark() {
        SparkConf sparkConf = new SparkConf().setMaster("yarn-cluster").setAppName("mySparkApp");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        ArrayList<Integer> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add(i);
        }

        JavaRDD<Integer> dataSet = jsc.parallelize(list)
                .map(s -> 2 * s)
                .map(s -> 5 * s);

        int weirdStuff = dataSet.reduce((a, b) -> (a + b) / 2);
        System.out.println("weirdStuff is " + weirdStuff);
        jsc.stop();
    }

    public static void main(String[] args) throws Exception {
        JavaSparkPi jsp = new JavaSparkPi();
        jsp.startWorkingOnMicroSpark();
    }
}

I'm working with Spark on EMR. The only difference I found between these two projects is that one has the Spark code written in main and the other does not. I launched both of them as Spark applications on EMR with the --class JavaSparkPi argument.

Here is the status of the failing step:

Status: FAILED

Reason:

Log file: s3://mynewbucket/Logs/j-3AKSZXK7FKMX6/steps/s-2MT0SB910U3TE/stderr.gz

Details: Exception in thread "main" org.apache.spark.SparkException: Application application_1501228129826_0003 finished with failed status

JAR location: command-runner.jar

Main class: None

Arguments: spark-submit --deploy-mode cluster --class JavaSparkPi s3://mynewbucket/Code/SparkAWS.jar

Action on failure: Continue

and here is the successful one:

JAR location: command-runner.jar
Main class: None
Arguments: spark-submit --deploy-mode cluster --class JavaSparkPi s3://mynewbucket/Code/SparkAWS.jar
Action on failure: Continue

Solution

  • Move the Spark initialization statements into main:

    SparkConf sparkConf = new SparkConf().setMaster("yarn-cluster").setAppName("mySparkApp");
    JavaSparkContext jsc = new JavaSparkContext(sparkConf);
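
One way to follow this suggestion while still keeping the work in a separate method is to create the SparkConf and JavaSparkContext in main and pass the context to the method. The following is only a minimal sketch under that assumption: the jsc parameter, the int return value and the try/finally block are illustrative changes that do not appear in the original answer, and it has not been verified here that this removes the failure on a given EMR cluster.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public final class JavaSparkPi {

    // Assumption for illustration: the method receives an existing context
    // instead of creating its own, and returns the result to the caller.
    int startWorkingOnMicroSpark(JavaSparkContext jsc) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add(i);
        }

        JavaRDD<Integer> dataSet = jsc.parallelize(list)
                .map(s -> 2 * s)
                .map(s -> 5 * s);

        // Same reduce function as in the question.
        return dataSet.reduce((a, b) -> (a + b) / 2);
    }

    public static void main(String[] args) throws Exception {
        // Spark initialization happens in main, as the answer suggests.
        SparkConf sparkConf = new SparkConf().setMaster("yarn-cluster").setAppName("mySparkApp");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);
        try {
            JavaSparkPi jsp = new JavaSparkPi();
            int weirdStuff = jsp.startWorkingOnMicroSpark(jsc);
            System.out.println("weirdStuff is " + weirdStuff);
        } finally {
            // Stop the context even if the job throws.
            jsc.stop();
        }
    }
}
```

The step can be submitted with the same arguments as in the question (spark-submit --deploy-mode cluster --class JavaSparkPi followed by the JAR location).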