
AWS EMR Spark: Error: Cannot load main class from JAR


I am trying to submit a Spark job to an AWS EMR cluster using the AWS console, but it fails with:

 Cannot load main class from JAR

The job runs successfully when I specify the main class with --class in the Arguments field under AWS EMR Console -> Add Step.

On my local machine, the job works fine when no main class is specified:

 ./spark-submit /home/astro/spark-programs/SpotEMR/MyJob.jar

I have set the JAR's main class using a run configuration. The main reason I want to avoid passing the main class with --class is that I have to run this job in AWS Data Pipeline using an EmrActivity, and AWS Data Pipeline currently offers no way to specify a main class for a submitted job.
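This next part is a possible cause. It comes from my reading of Spark's spark-submit argument handling, not from the question. When no --class is given, spark-submit reads the Main-Class attribute from the JAR manifest only if the application JAR is a local file. For a remote URI such as an s3:// path, it instead fails with "Cannot load main class from JAR ... Please specify a class through --class". Remote paths are common for EMR steps, but the question does not say where the JAR is stored, so this is an assumption.

To rule out the manifest itself, here is a minimal Python sketch that reads Main-Class from a local JAR. It assumes the JAR is an ordinary zip containing META-INF/MANIFEST.MF. The names `read_main_class` and `com.example.MyJob` are hypothetical.

```python
import zipfile
from typing import List, Optional


def read_main_class(jar_path: str) -> Optional[str]:
    """Return the Main-Class attribute of a local JAR's manifest, or None if it is missing."""
    with zipfile.ZipFile(jar_path) as jar:
        try:
            raw = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
        except KeyError:
            return None  # the JAR has no manifest at all

    # Unfold the main section: a line starting with one space continues the previous line,
    # and the first blank line ends the main section (where Main-Class must appear).
    lines: List[str] = []
    for line in raw.splitlines():
        if line == "":
            break
        if line.startswith(" ") and lines:
            lines[-1] += line[1:]
        else:
            lines.append(line)

    for line in lines:
        name, _, value = line.partition(":")
        if name.strip().lower() == "main-class":  # attribute names are case-insensitive
            return value.strip()
    return None


if __name__ == "__main__":
    import os
    import tempfile

    # Build a throwaway JAR with a hypothetical main class to exercise the function
    demo = os.path.join(tempfile.mkdtemp(), "MyJob.jar")
    with zipfile.ZipFile(demo, "w") as jar:
        jar.writestr("META-INF/MANIFEST.MF",
                     "Manifest-Version: 1.0\r\nMain-Class: com.example.MyJob\r\n\r\n")
    print(read_main_class(demo))  # com.example.MyJob
```

If this prints the expected class for MyJob.jar, the manifest is not the problem. That points back at how spark-submit receives the JAR on the cluster.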

Any help will be appreciated.


Solution

  • Actually, you can pass the job's main class with an EmrActivity in AWS Data Pipeline.

    See https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-emractivity.html for launching an EmrActivity with a step,

    and https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-submit-step.html for submitting a Spark job with a main class as an EMR step.

    The step would look as follows, with the application JAR and any arguments appended after the class name:

    command-runner.jar,spark-submit,--class,org.apache.spark.examples.SparkPi
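This rough Python sketch shows how such a step string becomes a spark-submit call. It assumes, as the EmrActivity documentation linked above describes, that the step field is the JAR to run followed by comma-separated arguments. The names `build_step` and `parse_step`, the class `com.example.MyJob`, and the S3 path are all hypothetical placeholders. The application JAR goes after `--class <name>`, mirroring spark-submit's own argument order.

```python
from typing import List, Sequence, Tuple


def build_step(main_class: str, app_jar: str, app_args: Sequence[str] = ()) -> str:
    """Assemble an EmrActivity-style step: the step JAR, then comma-separated arguments."""
    parts = ["command-runner.jar", "spark-submit", "--class", main_class, app_jar, *app_args]
    for part in parts:
        # A raw comma would be read as an argument separator, so reject it here
        if "," in part:
            raise ValueError(f"argument contains a comma: {part!r}")
    return ",".join(parts)


def parse_step(step: str) -> Tuple[str, List[str]]:
    """Split a step string into the JAR to run and the arguments passed to it."""
    jar, *args = step.split(",")
    return jar, args


if __name__ == "__main__":
    # Hypothetical class name and S3 location, for illustration only
    step = build_step("com.example.MyJob", "s3://my-bucket/jars/MyJob.jar")
    print(step)  # command-runner.jar,spark-submit,--class,com.example.MyJob,s3://my-bucket/jars/MyJob.jar
    jar, argv = parse_step(step)
    # command-runner.jar then runs: spark-submit --class com.example.MyJob s3://my-bucket/jars/MyJob.jar
    print(jar, "->", " ".join(argv))
```

Rejecting commas keeps the sketch simple. It does not attempt whatever escaping Data Pipeline may support for arguments that contain commas.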