I'm trying to use Hadoop on Amazon Elastic MapReduce, where I have thousands of map tasks to perform. I'm OK with a small percentage of the tasks failing; however, Amazon shuts down the job and I lose all of the results as soon as the first mapper fails. Is there a setting I can use to increase the number of failed tasks that are allowed? Thanks.
Here's the answer for Hadoop itself, covered in the question "Is there any property to define failed mapper threshold":
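On the pre-YARN Hadoop releases that EMR runs, the properties in question are mapred.max.map.failures.percent and mapred.max.reduce.failures.percent, which set the percentage of map/reduce tasks that may fail before the whole job is declared failed (I believe newer releases rename them mapreduce.map.failures.maxpercent and mapreduce.reduce.failures.maxpercent). If you control the job's driver code, you can also set the threshold per job through the old-API JobConf; here is a minimal sketch, where the class name, paths, and the 5% value are placeholders of my own:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class TolerantJob {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(TolerantJob.class);
            conf.setJobName("tolerant-job");

            // Let up to 5% of map (and reduce) tasks fail without the whole
            // job being aborted; the default is 0, i.e. any task that exhausts
            // its retry attempts kills the job.
            conf.setMaxMapTaskFailuresPercent(5);
            conf.setMaxReduceTaskFailuresPercent(5);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);
        }
    }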
To use the setting described above on EMR, see the EMR documentation for the configure-hadoop bootstrap action. Specifically, you create an XML file (config.xml in the example below) containing the settings you want to change and apply the bootstrap action when creating the job flow:
    ./elastic-mapreduce --create \
      --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
      --args "-M,s3://myawsbucket/config.xml"
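The contents of config.xml aren't shown above; a minimal sketch, again assuming the pre-YARN property name and an arbitrary 5% threshold:

    <?xml version="1.0"?>
    <configuration>
      <!-- Allow up to 5% of map tasks to fail before the job is killed. -->
      <property>
        <name>mapred.max.map.failures.percent</name>
        <value>5</value>
      </property>
    </configuration>

The -M argument tells the configure-hadoop bootstrap action to merge that file into mapred-site.xml on every node; if I remember correctly, the lowercase -m option accepts individual key=value pairs if you'd rather not upload a file.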