
Oozie workflow with spark application reports out of memory


I'm trying to execute an Oozie workflow with a Spark program as its single step. I'm using a jar that executes successfully with spark-submit or spark-shell (the same code):

spark-submit --packages com.databricks:spark-csv_2.10:1.5.0  --master yarn-client --class "SimpleApp"  /tmp/simple-project_2.10-1.1.jar

The application shouldn't demand a lot of resources – it loads a single CSV file (<10 MB) into Hive using Spark.

  • Spark version: 1.6.0
  • Oozie version: 4.1.0

The workflow was created with Hue's Oozie Workflow Editor:

<workflow-app name="Spark_test" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-589f"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="spark-589f">
        <spark xmlns="uri:oozie:spark-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapreduce.map.java.opts</name>
                    <value>-XX:MaxPermSize=2g</value>
                </property>
            </configuration>
            <master>yarn</master>
            <mode>client</mode>
            <name>MySpark</name>
            <jar>simple-project_2.10-1.1.jar</jar>
            <spark-opts>--packages com.databricks:spark-csv_2.10:1.5.0</spark-opts>
            <file>/user/spark/oozie/jobs/simple-project_2.10-1.1.jar#simple-project_2.10-1.1.jar</file>
        </spark>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

I got the following logs after running the workflow:

stdout:

Invoking Spark class now >>> Invocation of Main class completed <<<

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exception invoking main(), PermGen space

stderr:

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Yarn application state monitor" Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exception invoking main(), PermGen space

syslog:

2017-03-14 12:31:19,939 ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: PermGen space

Please suggest which configuration parameters should be increased.


Solution

You have at least two options here:

  1) Increase the PermGen size for the launcher MR job by adding this to workflow.xml:

    <property>
        <name>oozie.launcher.mapreduce.map.java.opts</name>
        <value>-XX:PermSize=512m -XX:MaxPermSize=512m</value>
    </property>
    

    See details here: http://www.openkb.info/2016/07/memory-allocation-for-oozie-launcher-job.html

  2) The preferred way is to use Java 8 instead of the outdated Java 7. Java 8 removed the PermGen space entirely (class metadata moved to native Metaspace), so this class of OutOfMemoryError no longer occurs.
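For option 1, note why the original workflow's property had no effect: it set mapreduce.map.java.opts, which configures the map tasks of a regular MR job, while the OOM happens in the Oozie launcher's own JVM. The oozie.launcher. prefix is what redirects the setting to the launcher. Applied to the action above, it would look roughly like this (a sketch, keeping the original action name and structure):

```xml
<action name="spark-589f">
    <spark xmlns="uri:oozie:spark-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- the oozie.launcher. prefix sizes the launcher JVM itself,
                 not the map tasks of the submitted Spark job -->
            <property>
                <name>oozie.launcher.mapreduce.map.java.opts</name>
                <value>-XX:PermSize=512m -XX:MaxPermSize=512m</value>
            </property>
        </configuration>
        <master>yarn</master>
        <mode>client</mode>
        <name>MySpark</name>
        <jar>simple-project_2.10-1.1.jar</jar>
        <spark-opts>--packages com.databricks:spark-csv_2.10:1.5.0</spark-opts>
        <file>/user/spark/oozie/jobs/simple-project_2.10-1.1.jar#simple-project_2.10-1.1.jar</file>
    </spark>
    <ok to="End"/>
    <error to="Kill"/>
</action>
```

On Java 8 (option 2) the -XX:PermSize/-XX:MaxPermSize flags are simply ignored with a warning, so this configuration only matters on Java 7.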