I am not getting any output in S3 when I run a job in Amazon EMR.
I specified the arguments:
-inputfile s3n://exdsyslab/data/file.txt -outputdir s3n://exdsyslab/output
When I check the job logs, the job shows as completed successfully, but there is no output in the output folder of my bucket exdsyslab.
I also tried one more thing: I chained two jobs, specifying these arguments when creating the job flow:
-inputfile s3n://exdsyslab/data/file.txt -outputdir s3n://exdsyslab/result -outputdir1 s3n://exdsyslab/result1
The second job's input is the output of the first job.
While the second job was running, it failed with the following exception:
The output folder, "result", already exists.
This happened because the directory was created by the first job in the chain. How do I specify the input and output for the second job in the MapReduce chain?
And why is there no output in the S3 locations specified in the arguments?
For correct output, use these arguments:
-inputfile s3n://exdsyslab/data/file.txt -output s3n://exdsyslab/output
Note that the output directory is specified by "-output", not "-outputdir".
For chaining jobs: you can't do it the way you specified. You must add multiple steps to the job flow, one per job, so each job runs with its own arguments. This other answer may help you: https://stackoverflow.com/a/11109592/1203129
For your specific case, the input/output directories have to look like this:
Step 1:
-inputfile s3n://exdsyslab/data/file.txt -output s3n://exdsyslab/result
Step 2:
-input s3n://exdsyslab/result -output s3n://exdsyslab/result1
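The two steps above can be sketched as a boto3 step list. This is a minimal illustration, not a tested EMR setup: the JAR path, step names, and failure action are assumptions (the original post doesn't say how the job flow was created), and only the directory wiring mirrors the answer, with step 2 reading what step 1 wrote.

```python
# Hypothetical location of the job's JAR; the original post never gives one.
JAR = "s3://exdsyslab/jars/myjob.jar"

# One step per job: step 1 writes to .../result, step 2 reads .../result
# and writes to .../result1, matching the directories above.
steps = [
    {
        "Name": "step-1",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": JAR,
            "Args": [
                "-inputfile", "s3n://exdsyslab/data/file.txt",
                "-output", "s3n://exdsyslab/result",
            ],
        },
    },
    {
        "Name": "step-2",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": JAR,
            "Args": [
                "-input", "s3n://exdsyslab/result",
                "-output", "s3n://exdsyslab/result1",
            ],
        },
    },
]

# Actually submitting the steps needs AWS credentials and a running
# job flow, so it is left commented out here:
# import boto3
# emr = boto3.client("emr")
# emr.add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=steps)
```

Because step 2's input is step 1's output directory, the chain avoids the "output folder already exists" error: each step writes to a fresh directory instead of reusing one.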