Tags: hadoop, apache-pig, hadoop-yarn, cloudera, cloudera-cdh

How to set PIG_HEAPSIZE in a Cloudera cluster?


I have a Pig script that runs out of memory every time I run it from Oozie.

Error:

Pig logfile dump:

    Pig Stack Trace

    ERROR 2998: Unhandled internal error. Java heap space

    java.lang.OutOfMemoryError: Java heap space
            at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
            at java.lang.StringCoding.encode(StringCoding.java:344)
            at java.lang.StringCoding.encode(StringCoding.java:387)
            at java.lang.String.getBytes(String.java:956)

I have tried setting numerous parameters, without success.

The same Pig script runs fine from the command line if I first run export PIG_HEAPSIZE=4000 (a 4000 MB heap).

Thanks for the help!


Solution

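  • Oozie does not run Pig through the pig shell script, so PIG_HEAPSIZE has no effect there: the Pig client runs inside Oozie's launcher job, and properties prefixed with oozie.launcher. are applied to that launcher job. Add the following property to the <configuration> element of the Pig action in the Oozie workflow.xml: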

    <property>
         <name>oozie.launcher.mapred.child.java.opts</name>
         <value>-server -Xmx4G -Djava.net.preferIPv4Stack=true</value>
    </property>
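
For context, here is a sketch of where that property sits inside a complete Pig action. The action name pig-node, the script name script.pig, the ${jobTracker}/${nameNode} parameters and the end/fail transitions are hypothetical placeholders. The two extra oozie.launcher.mapreduce.* properties are an assumption for YARN (MRv2) clusters running an Oozie 4.x-style launcher, as in CDH 5: the launcher's map container has to be larger than the 4 GB heap or YARN may kill it, and 5120 MB is only an illustrative value.

    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <!-- Heap for the launcher JVM that runs the Pig client (the property from this answer) -->
                <property>
                    <name>oozie.launcher.mapred.child.java.opts</name>
                    <value>-server -Xmx4G -Djava.net.preferIPv4Stack=true</value>
                </property>
                <!-- Assumption (YARN/MRv2): same heap under the newer property name -->
                <property>
                    <name>oozie.launcher.mapreduce.map.java.opts</name>
                    <value>-server -Xmx4G -Djava.net.preferIPv4Stack=true</value>
                </property>
                <!-- Assumption (YARN/MRv2): container size in MB, kept above the 4 GB heap -->
                <property>
                    <name>oozie.launcher.mapreduce.map.memory.mb</name>
                    <value>5120</value>
                </property>
            </configuration>
            <script>script.pig</script>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>

On MRv2, mapreduce.map.java.opts takes precedence over the older mapred.child.java.opts when both are set, so if the cluster's client configuration already defines it, the old property alone may be silently ignored; setting both names avoids that.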
    
