I am trying to overcome the following error in a hadoop streaming job on EMR.
Container [pid=30356,containerID=container_1391517294402_0148_01_000021] is running beyond physical memory limits
I searched for answers, but the one I found doesn't work. My job is launched as shown below.
hadoop jar ../.versions/2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
-input determinations/part-00000 \
-output determinations/aggregated-0 \
-mapper cat \
-file ./det_maker.py \
-reducer det_maker.py \
-Dmapreduce.reduce.java.opts="-Xmx5120M"
The last line above is supposed to do the trick as far as I understand, but I get the error:
ERROR streaming.StreamJob: Unrecognized option: -Dmapreduce.reduce.java.opts="-Xmx5120M"
What is the correct way to change the memory usage? Also, is there some documentation that explains these things to n00bs like me?
You haven't elaborated on which memory you are running low on: physical or virtual.
For both problems, take a look at Amazon's documentation: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/TaskConfiguration_H2.html
Usually the solution is to increase the amount of memory per mapper, and possibly to reduce the number of mappers:
s3://elasticmapreduce/bootstrap-actions/configure-hadoop -m mapreduce.map.memory.mb=4000
s3://elasticmapreduce/bootstrap-actions/configure-hadoop -m mapred.tasktracker.map.tasks.maximum=2
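As for the `Unrecognized option` error itself: Hadoop streaming expects generic options such as `-D` to come *before* the streaming-specific options (`-input`, `-mapper`, etc.), which is why the `-D` flag at the end of your command isn't recognized. A reordered invocation along these lines should at least parse (the `6144` container size is an assumed value, chosen only so that the YARN container limit sits above the 5120M heap):

```shell
# Generic -D options must precede the streaming options, or the
# streaming jar reports "Unrecognized option".
hadoop jar ../.versions/2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
    -Dmapreduce.reduce.memory.mb=6144 \
    -Dmapreduce.reduce.java.opts="-Xmx5120M" \
    -input determinations/part-00000 \
    -output determinations/aggregated-0 \
    -mapper cat \
    -file ./det_maker.py \
    -reducer det_maker.py
```

Note that raising only the Java heap (`-Xmx`) will not cure a "beyond physical memory limits" kill on its own: the YARN container size (`mapreduce.reduce.memory.mb`) must be larger than the heap, since the container also has to hold non-heap memory.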