I'm using Hadoop 2.0.5 (Alpha) to run relatively big jobs, and I've run into this error:
Container [pid=15023,containerID=container_1378641992707_0002_01_000029] is running beyond virtual memory limits. Current usage: 492.4 MB of 1 GB physical memory used; 3.3 GB of 2.1 GB virtual memory used. Killing container.
I then learned about these two parameters:
yarn.nodemanager.vmem-pmem-ratio, which is set to 2.1 by default.
yarn.app.mapreduce.am.command-opts, which is set to -Xmx1024m (= 1 GB) by default.
That explained the limits in the error above: 1 GB of physical memory multiplied by the 2.1 ratio gives the 2.1 GB virtual memory cap.
Setting these parameters to higher values did help, but then I found another parameter: yarn.app.mapreduce.am.resource.mb, which is set to 1536 by default.
I can't quite tell the difference between the three from the descriptions in Hadoop's default XMLs, nor how I should set them properly for optimal performance.
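For context, raising the first two might look roughly like this; the file placement follows the usual Hadoop convention, and the values are just placeholders, not my exact settings:

In mapred-site.xml:

<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <!-- larger heap for the MapReduce ApplicationMaster JVM -->
  <value>-Xmx2048m</value>
</property>

In yarn-site.xml:

<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <!-- allow more virtual memory per MB of physical memory -->
  <value>3.0</value>
</property>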
An explanation or a good reference would be much appreciated.
As we know, YARN is the new architecture for governing resources in the Hadoop ecosystem.
yarn.nodemanager.vmem-pmem-ratio: This defines the ratio of virtual memory to available physical memory. The default of 2.1 means a container may use up to 2.1 times its physical memory allocation as virtual memory; with 1 GB of physical memory that is a 2.1 GB virtual memory cap, which matches the limit in your error.
yarn.app.mapreduce.am.command-opts: In YARN, the ApplicationMaster (AM) is responsible for securing the necessary resources. This property sets the JVM options (here, the heap size via -Xmx) used to launch the AM itself. Don't confuse this with the NodeManager, where the job's tasks are executed.
yarn.app.mapreduce.am.resource.mb: This property specifies how much memory to request for the container that runs the AM. With the default of 1536, only a NodeManager with at least that much memory available can be selected to host the AM for the job.
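Putting the three together: the AM runs in a container of yarn.app.mapreduce.am.resource.mb MB of physical memory, its JVM heap (the -Xmx in yarn.app.mapreduce.am.command-opts) must fit inside that container with some headroom, and the container's virtual memory cap is its physical size multiplied by yarn.nodemanager.vmem-pmem-ratio. A minimal sketch of how they might be set together (the numbers are only an example, not tuned recommendations):

In mapred-site.xml:

<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <!-- physical memory of the container that hosts the ApplicationMaster -->
  <value>2048</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <!-- AM JVM heap; kept below 2048 MB so non-heap overhead stays inside the container -->
  <value>-Xmx1536m</value>
</property>

In yarn-site.xml:

<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <!-- with a 2048 MB container this allows roughly 2048 MB * 2.1 = 4300 MB of virtual memory -->
  <value>2.1</value>
</property>

A common rule of thumb (my assumption, not something stated in the default XMLs) is to set the -Xmx heap to about 75-80% of the container size so the JVM's non-heap memory does not push the container over its physical limit.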