
Hadoop: Input and Output paths in AWS EMR job

I am trying to run a Hadoop job on Amazon Elastic MapReduce (EMR). My data and JAR are stored in Amazon S3. When I set up the job flow, I pass the following JAR arguments:

s3n://my-hadoop/input s3n://my-hadoop/output

Below is my Hadoop main function:

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "MyMR");
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory (must not exist yet)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}

However, my job flow fails with the following error in stderr:

Exception in thread "main" java.lang.ClassNotFoundException: s3n://my-hadoop/input
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(
    at org.apache.hadoop.util.RunJar.main(

How do I specify my input and output paths in AWS EMR?


  • This is a classic error: the main class was not defined when the executable JAR was built. When the JAR's manifest does not name a main class, hadoop jar (RunJar, which appears in your stack trace) takes the first argument as the main class name, hence the ClassNotFoundException for s3n://my-hadoop/input.

    So make sure you specify the Main-Class in the manifest when you build the executable JAR.


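    Alternatively, pass the main class as the first step argument. When the manifest has no Main-Class, RunJar consumes that argument as the class name, so your main method still reads input and output from args[0] and args[1]. If the manifest does name a Main-Class, the class-name argument is passed through to main, and input and output move to args[1] and args[2]. Run the Hadoop step with something like the following: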

    ruby elastic-mapreduce -j $jobflow --jar s3://my-jar-location/myjar.jar --arg com.somecompany.MyMainClass --arg s3://input --arg s3://output
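The main-class resolution rule above can be modelled with plain java.util.jar. The class below is a simplified sketch, not Hadoop's RunJar source. It reads Main-Class from the JAR manifest and, when none is present, treats the first step argument as the class name. MainClassResolution, writeJar and resolve are hypothetical names for illustration, and the S3 paths are the ones from the question. Each printed array lists the class that would be loaded, followed by the arguments its main method would receive.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

/**
 * Simplified model (an assumption, not Hadoop's code) of how `hadoop jar`
 * chooses the class to run and the arguments handed to its main method.
 */
public class MainClassResolution {

    /** Writes an empty JAR whose manifest optionally names a Main-Class. */
    public static void writeJar(File jar, String mainClass) throws IOException {
        Manifest manifest = new Manifest();
        // Manifest-Version must be present, otherwise the other main
        // attributes are not written out.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        if (mainClass != null) {
            manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        }
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar), manifest)) {
            // No class files are needed for this illustration.
        }
    }

    /** Returns {classToLoad, mainArg0, mainArg1, ...}. */
    public static String[] resolve(File jar, String[] stepArgs) throws IOException {
        String mainClass = null;
        try (JarFile jarFile = new JarFile(jar)) {
            Manifest manifest = jarFile.getManifest();
            if (manifest != null) {
                mainClass = manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
            }
        }
        int first = 0;
        if (mainClass == null) {
            // No Main-Class in the manifest: the first argument is consumed as the class name.
            mainClass = stepArgs[first++];
        }
        String[] result = new String[stepArgs.length - first + 1];
        result[0] = mainClass;
        System.arraycopy(stepArgs, first, result, 1, stepArgs.length - first);
        return result;
    }

    public static void main(String[] args) throws IOException {
        String[] pathsOnly = {"s3n://my-hadoop/input", "s3n://my-hadoop/output"};
        String[] classFirst = {"com.somecompany.MyMainClass", "s3n://my-hadoop/input", "s3n://my-hadoop/output"};
        File jar = File.createTempFile("myjar", ".jar");
        jar.deleteOnExit();

        writeJar(jar, null); // JAR built without Main-Class
        // [s3n://my-hadoop/input, s3n://my-hadoop/output]: the input path would be loaded as a class
        System.out.println(Arrays.toString(resolve(jar, pathsOnly)));
        // [com.somecompany.MyMainClass, s3n://my-hadoop/input, s3n://my-hadoop/output]
        System.out.println(Arrays.toString(resolve(jar, classFirst)));

        writeJar(jar, "com.somecompany.MyMainClass"); // JAR built with Main-Class
        // [com.somecompany.MyMainClass, s3n://my-hadoop/input, s3n://my-hadoop/output]
        System.out.println(Arrays.toString(resolve(jar, pathsOnly)));
        // [com.somecompany.MyMainClass, com.somecompany.MyMainClass, s3n://my-hadoop/input, s3n://my-hadoop/output]
        System.out.println(Arrays.toString(resolve(jar, classFirst)));
    }
}
```

In practice the Main-Class attribute is usually set by the build rather than in code, for example with the jar tool's e option (jar cfe myjar.jar com.somecompany.MyMainClass ...) or a manifest file passed with m.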