Tags: hadoop, amazon-ec2, amazon-web-services, elastic-map-reduce

Amazon Elastic MapReduce: Output directory


I'm working through Amazon's Elastic MapReduce example and keep getting the following error:

Error launching job , Output path already exists.

Here is the command to run the job that I am using:

C:\ruby\elastic-mapreduce-cli>ruby elastic-mapreduce --create --stream \
     --mapper  s3://elasticmapreduce/samples/wordcount/wordSplitter.py \
     --input   s3://elasticmapreduce/samples/wordcount/input \
     --output  [A path to a bucket you own on Amazon S3, such as, s3n://myawsbucket] \
     --reducer aggregate

Here is where the example comes from.

I'm following Amazon's directions for the output directory. The bucket name is s3n://mp.maptester321mark/. I've also looked through all of their troubleshooting suggestions on this URL.
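
For reference, here is roughly what the command looks like with the --output placeholder filled in for my bucket (the wordcount/run1 part is just an arbitrary subfolder name I picked, not anything Amazon requires):

ruby elastic-mapreduce --create --stream \
     --mapper  s3://elasticmapreduce/samples/wordcount/wordSplitter.py \
     --input   s3://elasticmapreduce/samples/wordcount/input \
     --output  s3n://mp.maptester321mark/wordcount/run1 \
     --reducer aggregate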

Here is my credentials.json info:

{
  "access_id": "1234123412",
  "private_key": "1234123412",
  "keypair": "markkeypair",
  "key-pair-file": "C:/Ruby/elastic-mapreduce-cli/markkeypair",
  "log_uri": "s3n://mp-mapreduce/",
  "region": "us-west-2"
}

Solution

  • Hadoop jobs won't clobber output directories that already exist. You just need to run:

    hadoop fs -rmr <output_dir>
    

     before your job, or just use the AWS S3 console to remove the directory (a sketch of both approaches follows below).
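
    For example (the bucket and wordcount/run1 prefix below are just placeholders matching the question), you could either clear the old output or give each run a fresh path:

        # Remove the existing output prefix first. Run this wherever the Hadoop
        # CLI is available (e.g. SSH'd into the job flow's master node).
        # Note: -rmr is the old form; newer Hadoop versions use `hadoop fs -rm -r`.
        hadoop fs -rmr s3n://mp.maptester321mark/wordcount/run1

        # Or sidestep the collision entirely by generating a unique output path
        # per run (a Unix-like shell is assumed for the date command):
        OUTPUT=s3n://mp.maptester321mark/wordcount/run-$(date +%Y%m%d-%H%M%S)
        ruby elastic-mapreduce --create --stream \
             --mapper  s3://elasticmapreduce/samples/wordcount/wordSplitter.py \
             --input   s3://elasticmapreduce/samples/wordcount/input \
             --output  "$OUTPUT" \
             --reducer aggregate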