
Hadoop heap allocation


I am having an issue where Hadoop 2.5.1 is not increasing the heap space I request. Hadoop does not seem to respect the mapred.child.java.opts property in the mapred-site.xml file.

In my job I am doing the following:

R = new int[height * width];
G = new int[height * width];
B = new int[height * width];
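For a rough sense of why this runs out of heap: three int arrays of height * width elements need about 12 bytes per pixel (three 4-byte ints, ignoring object headers). A small sketch of that arithmetic, using a hypothetical 20000 x 20000 image as an example:

```java
public class HeapEstimate {
    public static void main(String[] args) {
        int height = 20_000, width = 20_000;   // hypothetical image dimensions
        long pixels = (long) height * width;   // cast avoids int overflow
        // three int[] arrays at 4 bytes per element
        long bytes = 3L * 4L * pixels;
        System.out.println(bytes / (1024 * 1024) + " MB of heap just for R, G, B");
    }
}
```

At that size the arrays alone need roughly 4.5 GB, which easily exceeds a default task heap.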

Depending on the size of the image I pass in, the job crashes with:

Caused by: java.lang.OutOfMemoryError: Java heap space

That is understandable: I need to increase the heap space. But for some reason Hadoop does not respect the change in my mapred-site.xml file.

I added this to my mapred-site.xml and restarted:

 <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx12072m</value>
 </property>

When that didn't work, I added this to my mapred-env.sh and restarted:

export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=4000

When that didn't work, I added this to my yarn-env.sh and restarted:

JAVA_HEAP_MAX=-Xmx4000m

When that didn't work, I also added this to my yarn-env.sh and restarted:

YARN_HEAPSIZE=4000

When that didn't work, I added this to my hadoop-env.sh and restarted:

export HADOOP_HEAPSIZE=4000
export HADOOP_NAMENODE_INIT_HEAPSIZE="3000"

I have restarted using stop/start-dfs.sh, stop/start-all.sh, and stop/start-yarn.sh in various combinations. I have also rebooted the server, and I have yet to see any change make a difference.

I am at a loss as to what else I can do or change.

Is there any way I can determine the heap size from within the job so I can try to debug this?
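One way to check this is to log the JVM's own view of its heap limits from inside the task (for instance in the mapper's setup() method) and read the output in the task logs. This is plain java.lang API, nothing Hadoop-specific, so the same lines work in any Java process:

```java
public class HeapProbe {
    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory();     // effective -Xmx ceiling, in bytes
        long total = Runtime.getRuntime().totalMemory(); // heap currently committed by the JVM
        System.err.println("max heap: " + (max >> 20)
                + " MB, committed: " + (total >> 20) + " MB");
    }
}
```

If the "max heap" printed from inside a task does not match the -Xmx you configured, the setting is not reaching the task JVMs.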


Solution

  • I never determined what the actual original problem was, but it was apparently a configuration issue on my end: either a misconfiguration or a conflicting setting. What I ended up doing was scrapping the Hadoop install and starting from scratch.

    I followed the instructions for Pseudo-Distributed Operation in this guide:

    http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-common/SingleCluster.html

    In addition to the configuration settings given in those instructions, I added the following. More information is available here: http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-common/ClusterSetup.html

    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx4096m</value>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>4096</value>
    </property>
    

    I do not have a reduce phase, so I did not need to set the corresponding reduce parameters (mapreduce.reduce.java.opts and mapreduce.reduce.memory.mb).

    The job seems to complete now.
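    As a side note, when the job's driver class uses ToolRunner/GenericOptionsParser, these same properties can also be overridden per job on the command line instead of editing mapred-site.xml. A sketch (the jar name, class name, and paths here are placeholders):

    ```shell
    hadoop jar myjob.jar MyDriver \
        -D mapreduce.map.memory.mb=4096 \
        -D mapreduce.map.java.opts=-Xmx3686m \
        input output
    ```

    Note that -Xmx is set below mapreduce.map.memory.mb: the JVM heap plus its overhead must fit inside the YARN container, or the container will be killed for exceeding its memory limit.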