I'm running through Amazon's example of running Elastic MapReduce and keep getting hit with the following error:
Error launching job , Output path already exists.
Here is the command to run the job that I am using:
C:\ruby\elastic-mapreduce-cli>ruby elastic-mapreduce --create --stream \
--mapper s3://elasticmapreduce/samples/wordcount/wordSplitter.py \
--input s3://elasticmapreduce/samples/wordcount/input \
--output [A path to a bucket you own on Amazon S3, such as, s3n://myawsbucket] \
--reducer aggregate
Here is where the example comes from.
I'm following Amazon's directions for the output directory. The bucket name is s3n://mp.maptester321mark/. I've looked through all their troubleshooting suggestions on this url.
Here is my credentials.json:
{
"access_id": "1234123412",
"private_key": "1234123412",
"keypair": "markkeypair",
"key-pair-file": "C:/Ruby/elastic-mapreduce-cli/markkeypair",
"log_uri": "s3n://mp-mapreduce/",
"region": "us-west-2"
}
Hadoop jobs won't clobber directories that already exist. You just need to run:
hadoop fs -rmr <output_dir>
before your job, or just use the AWS console to remove the directory.
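Another option is to avoid the collision entirely by generating a unique output path for each run. A minimal sketch, assuming your bucket is the one from the question (the `run-...` prefix is just an illustrative naming convention, not anything the EMR CLI requires):

```shell
# Build a unique output path per run, e.g. s3n://mp.maptester321mark/output/run-20120801-120000,
# so the job never fails on "Output path already exists".
OUTPUT="s3n://mp.maptester321mark/output/run-$(date +%Y%m%d-%H%M%S)"

ruby elastic-mapreduce --create --stream \
  --mapper s3://elasticmapreduce/samples/wordcount/wordSplitter.py \
  --input s3://elasticmapreduce/samples/wordcount/input \
  --output "$OUTPUT" \
  --reducer aggregate
```

This way each job writes to a fresh directory and you keep the results of earlier runs instead of deleting them.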