I am using a Hadoop cluster running MapR 5.2 that has problems with Unicode character encodings. I discovered that adding the following properties to mapred-site.xml solved the issue:
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Dfile.encoding=utf-8</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Dfile.encoding=utf-8</value>
</property>
Unfortunately, this causes many jobs (that work fine without these properties) to throw errors like this:
Container [pid=63155,containerID=container_e40_1544666751235_12271_01_000004] is running beyond physical memory limits. Current usage: 8.0 GB of 8 GB physical memory used; 31.7 GB of 16.8 GB virtual memory used. Killing container.
I've tried increasing the value of mapreduce.map.memory.mb to the maximum allowed according to this error message:
Job job_1544666751235_12267 failed with state KILLED due to: MAP capability required is more than the supported max container capability in the cluster. Killing the Job. mapResourceRequest: <memory:16000, vCores:1, disks:0.5> maxContainerCapability:<memory:8192, vCores:20, disks:4.0>
But containers are still killed. Like I said, these jobs worked fine before setting the mapreduce.*.java.opts
properties, so I assume they are overriding something. Is there a way to set -Dfile.encoding
without overriding other Java parameters?
Was there a value already set for mapreduce.*.java.opts? The Java memory settings such as -Xmx usually go in there, so keeping only -Dfile.encoding=utf-8 would have dropped those settings, and that is most likely what is breaking the other jobs. You have two options here.

Option 1: Append -Dfile.encoding=utf-8 to the existing values in mapred-site.xml:
<property>
  <name>mapreduce.map.java.opts</name>
  <value>your_earlier_existed_java_opts_value_goes_here -Dfile.encoding=utf-8</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>your_earlier_existed_java_opts_value_goes_here -Dfile.encoding=utf-8</value>
</property>
Option 2: Pass the properties per job via org.apache.hadoop.util.GenericOptionsParser in your code, so the encoding settings apply only to your job (and similarly for mapreduce.reduce.java.opts):

yarn jar <your_jar> <class> -Dmapreduce.map.java.opts="your_earlier_existed_java_opts_value_goes_here -Dfile.encoding=utf-8"
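For that to work, your driver has to go through GenericOptionsParser so the -D options from the command line are merged into the job configuration; the usual way is to implement Tool and launch through ToolRunner. A minimal sketch, assuming a driver class of your own (MyEncodingJob and the job name are placeholders, and the mapper/reducer setup is omitted):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyEncodingJob extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains any -D options parsed by
        // GenericOptionsParser, including mapreduce.map.java.opts.
        Job job = Job.getInstance(getConf(), "my-encoding-job");
        job.setJarByClass(MyEncodingJob.class);
        // ... set mapper, reducer, input/output paths here ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner runs GenericOptionsParser before calling run(),
        // so the -D options from the yarn command above are applied.
        System.exit(ToolRunner.run(new Configuration(), new MyEncodingJob(), args));
    }
}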