Tags: hadoop, amazon-web-services, amazon-s3, mapreduce, elastic-map-reduce

Amazon EMR: "no output" found in S3


I am not getting any output in S3 when I run a job in Amazon EMR.

I specified the arguments:

-inputfile s3n://exdsyslab/data/file.txt -outputdir s3n://exdsyslab/output

When I checked the job log, I saw that the job had completed successfully. But there is no output in the output folder of my bucket exdsyslab.

I also tried chaining two jobs, specifying these arguments when creating the job flow:

-inputfile s3n://exdsyslab/data/file.txt -outputdir s3n://exdsyslab/result -outputdir1 s3n://exdsyslab/result1

The second job's input is the output of the first job.

While the program was running, the second job failed with the following exception:

The output folder, "result", already exists.

This happened because the directory was created by the first job in the chain. How do I specify the input and output for the second job in the MapReduce chain?
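As background on the exception itself: Hadoop's output format aborts when the target directory already exists, which is why a re-run fails once a previous job has created it. A common workaround when re-running a job flow is to delete the output location first. Below is a minimal sketch of that guard using the local filesystem as a stand-in for the S3 prefix (for `s3n://exdsyslab/result` you would delete the prefix with your S3 client instead; the directory name here is just illustrative):

```python
import os
import shutil

def clear_output_dir(path):
    """Remove the output directory if it already exists, so the next
    MapReduce job can create it fresh (Hadoop aborts when the target
    output directory is already present)."""
    if os.path.isdir(path):
        shutil.rmtree(path)

# Hypothetical local stand-in for s3n://exdsyslab/result
os.makedirs("result", exist_ok=True)   # simulate output left by a previous run
clear_output_dir("result")             # guard before re-submitting the job
print(os.path.exists("result"))        # prints False
```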

Why is there no output in the S3 bucket specified in the arguments?


Solution

  • For correct output, use this:

    -inputfile s3n://exdsyslab/data/file.txt -output s3n://exdsyslab/output
    

    Note that the output directory is specified by "-output".

    For chaining jobs: you can't do it the way you specified. You must add multiple steps to the job flow, one per job, so that each step runs with its own input and output arguments. This other answer may help you: https://stackoverflow.com/a/11109592/1203129

    For your specific case, the input/output directories have to look like this:

    Step 1:

     -inputfile s3n://exdsyslab/data/file.txt -output s3n://exdsyslab/result 
    

    Step 2:

     -input s3n://exdsyslab/result -output s3n://exdsyslab/result1
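
    For illustration, the two steps above could be added to a running job flow with the AWS CLI's `aws emr add-steps` command. This is a sketch, not a tested invocation: the cluster ID `j-XXXXXXXX` and the jar path `s3://mybucket/job.jar` are placeholders you would replace with your own values.

     aws emr add-steps --cluster-id j-XXXXXXXX --steps \
       Type=CUSTOM_JAR,Name=Step1,Jar=s3://mybucket/job.jar,Args=[-inputfile,s3n://exdsyslab/data/file.txt,-output,s3n://exdsyslab/result] \
       Type=CUSTOM_JAR,Name=Step2,Jar=s3://mybucket/job.jar,Args=[-input,s3n://exdsyslab/result,-output,s3n://exdsyslab/result1]

    Because the steps run in order, step 2 only starts after step 1 has written its output to s3n://exdsyslab/result, which is exactly what the chain needs.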