java · hadoop · memory · mapreduce · hadoop-yarn

Specifying mapreduce.map.java.opts without overriding memory settings?


I am using a Hadoop cluster running MapR 5.2 that has problems with Unicode character encodings. I discovered that adding the following lines to mapred-site.xml solves this issue:

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Dfile.encoding=utf-8</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Dfile.encoding=utf-8</value>
</property>

Unfortunately, this causes many jobs (that work fine without these properties) to throw errors like this:

Container [pid=63155,containerID=container_e40_1544666751235_12271_01_000004] is running beyond physical memory limits. Current usage: 8.0 GB of 8 GB physical memory used; 31.7 GB of 16.8 GB virtual memory used. Killing container.

I've tried increasing the value of mapreduce.map.memory.mb to the maximum allowed according to this error message:

Job job_1544666751235_12267 failed with state KILLED due to: MAP capability required is more than the supported max container capability in the cluster. Killing the Job. mapResourceRequest: <memory:16000, vCores:1, disks:0.5> maxContainerCapability:<memory:8192, vCores:20, disks:4.0>

But containers are still killed. Like I said, these jobs worked fine before setting the mapreduce.*.java.opts properties, so I assume they are overriding something. Is there a way to set -Dfile.encoding without overriding other Java parameters?
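(For background on why this flag matters: Java APIs such as String.getBytes() and InputStreamReader fall back to the JVM default charset when none is passed explicitly, and -Dfile.encoding controls that default on the Java 8 JVMs typical of MapR 5.2 clusters. A minimal standalone check, with a hypothetical class name, that can be run with and without the flag:)

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    public static void main(String[] args) {
        // The charset that no-argument APIs (String.getBytes(), FileReader,
        // InputStreamReader) fall back to; on Java 8 this is what
        // -Dfile.encoding controls.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Explicit charsets give stable results regardless of JVM flags:
        String s = "déjà vu";
        System.out.println("UTF-8 bytes:   " + s.getBytes(StandardCharsets.UTF_8).length);      // 9
        System.out.println("Latin-1 bytes: " + s.getBytes(StandardCharsets.ISO_8859_1).length); // 7
    }
}
```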


Solution

  • Was there a value already set for mapreduce.*.java.opts? Java memory settings such as -Xmx usually go there. Replacing that value with just -Dfile.encoding=utf-8 would have dropped those memory settings, which is likely why previously working jobs now exceed their container limits. You have two options here:

    1. Append your encoding setting to the previously existing value. Note that the encoding setting will then apply to every job that uses this mapred-site.xml:
        <property>
          <name>mapreduce.map.java.opts</name>
          <value>your_earlier_existed_java_opts_value_goes_here -Dfile.encoding=utf-8</value>
        </property>
        <property>
          <name>mapreduce.reduce.java.opts</name>
          <value>your_earlier_existed_java_opts_value_goes_here -Dfile.encoding=utf-8</value>
        </property>
    
    2. Set the value only for your job at submission time, provided your driver uses org.apache.hadoop.util.GenericOptionsParser. The encoding setting then applies only to your job:
    yarn jar <your_jar> <class> -Dmapreduce.map.java.opts="your_earlier_existed_java_opts_value_goes_here -Dfile.encoding=utf-8" -Dmapreduce.reduce.java.opts="your_earlier_existed_java_opts_value_goes_here -Dfile.encoding=utf-8"
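
    The usual way to get GenericOptionsParser behaviour is to implement the Tool interface and submit through ToolRunner, which parses -D arguments into the job Configuration before run() is called. A sketch of such a driver (the class and job names are placeholders, the job body is elided, and this only compiles against the Hadoop client libraries):

    ```java
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyJobDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            // getConf() already contains any -Dkey=value command-line
            // arguments, because ToolRunner passed them through
            // GenericOptionsParser before invoking run().
            Configuration conf = getConf();
            Job job = Job.getInstance(conf, "my-utf8-job");
            job.setJarByClass(MyJobDriver.class);
            // ... set mapper/reducer classes, input/output paths, etc. ...
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new Configuration(), new MyJobDriver(), args));
        }
    }
    ```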