amazon-web-services elastic-map-reduce

Force one reducer in AWS EMR


How do I ensure that there's only one reducer for my EMR Streaming job? Is there any way to do this from the web frontend when I'm creating a new Jobflow?


Solution

  • You can configure Hadoop in a bootstrap action by passing settings with the --arg flag. For your question specifically, set mapred.tasktracker.reduce.tasks.maximum to 1.

    elastic-mapreduce --create --alive \
          --name "Configure Jobflow" \
          --bootstrap-actions s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
          --arg mapred.tasktracker.reduce.tasks.maximum=1
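
Keep in mind that mapred.tasktracker.reduce.tasks.maximum caps how many reduce tasks run at once on each task tracker. The number of reducers a job uses is a separate, job-level property, mapred.reduce.tasks. As a rough sketch, assuming you launch a plain Hadoop streaming job yourself, the job-level setting could be passed as below. The jar path and bucket names are placeholders, not something from the question, and the mapper and reducer are just the stock cat/wc pair.

```shell
# Hypothetical jar path and bucket names; substitute your own.
STREAMING_JAR=/path/to/hadoop-streaming.jar

# -D sets a job-level property. Generic options such as -D have to come
# before the streaming options (-input, -output, -mapper, -reducer).
hadoop jar "$STREAMING_JAR" \
  -D mapred.reduce.tasks=1 \
  -input s3n://my-bucket/input \
  -output s3n://my-bucket/output \
  -mapper /bin/cat \
  -reducer /usr/bin/wc
```

Streaming also accepts -numReduceTasks 1, and older Hadoop releases take -jobconf mapred.reduce.tasks=1 in place of -D. If the web frontend lets you supply extra arguments for a streaming step, one of these options should work there as well, though that is an assumption I have not checked against the console.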