apache-spark, oozie, hortonworks-data-platform

Oozie SparkAction failing


On HDP 2.3.4 I'm trying to test out Oozie's SparkAction.

The Spark (1.5.2) application I wrote is trivial and exists solely to test Oozie (4.2.0):

val tbl = sqlContext.sql("SELECT * FROM tbl")
val count = tbl.count   
log.info(s"The table has ${count} records.")

This application works when using spark-submit, both in YARN-Client and YARN-Cluster modes.

My job.properties and workflow.xml files are as follows:

job.properties

nameNode=hdfs://myhost.com:8020
jobTracker=myhost.com:8032
queueName=default
projectRoot=user/${user.name}/workflows/sparkaction-test

master=yarn-cluster
mode=cluster
class=com.myCompany.SparkActionTest
hiveSite=hive-site.xml
jars=${nameNode}/${projectRoot}/lib/sparkaction-test_2.10-1.0.jar


oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/${projectRoot}
spark.yarn.historyServer.address=http://myhost.com:18080/
spark.eventLog.dir=${nameNode}/user/spark/applicationHistory
spark.eventLog.enabled=true

workflow.xml

<workflow-app name="spark-test-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="spark-test"/>
    <action name="spark-test">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <master>${master}</master>
            <mode>${mode}</mode>
            <name>Testing Spark Action</name>
            <class>${class}</class>
            <jar>${jars}</jar>
            <spark-opts>--files ${hiveSite}</spark-opts>
        </spark>
        <ok to="end"/>
        <error to="errorcleanup" />
    </action>

    <kill name="errorcleanup">
        <message>Spark Test WF failed. [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

I cleaned out the Oozie sharelib directory for Spark (moving the other jars into a different directory) and left only the following jars in it:

datanucleus-api-jdo-3.2.6.jar
datanucleus-core-3.2.10.jar
datanucleus-rdbms-3.2.9.jar
oozie-sharelib-spark-4.2.0.2.3.4.0-3485.jar
spark-1.5.2.2.3.4.0-3485-yarn-shuffle.jar
spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar
spark-examples-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar

Unfortunately, the Oozie workflow fails. It is killed after about 20 minutes in running status, with the following error message:

Call From host.xxx.com/x.x.x.x to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

I can ping 0.0.0.0 but can't telnet to port 8032.


Solution

  • I finally resolved this by changing the ResourceManager port (yarn.resourcemanager.address, under the advanced yarn-site settings in Ambari's YARN configs) from 8050 (the Hortonworks default) to 8032. It seems that Oozie's SparkAction only works when the ResourceManager listens on port 8032.
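
For reference, the change described above would correspond to roughly the following entry in yarn-site.xml. This is only a sketch: on an Ambari-managed cluster the value is edited through the YARN config screen rather than by hand, and the host name myhost.com is an assumption taken from the jobTracker setting in the question's job.properties.

```xml
<!-- yarn-site.xml fragment (sketch, not a complete file).
     Assumption: the ResourceManager runs on myhost.com, the host
     used for jobTracker in job.properties. -->
<property>
    <name>yarn.resourcemanager.address</name>
    <!-- Was port 8050 (the Hortonworks default); changed to 8032. -->
    <value>myhost.com:8032</value>
</property>
```

With this value, the ResourceManager address matches jobTracker=myhost.com:8032 from job.properties as well as the port 8032 that appears in the error message. YARN would need to be restarted for the new address to take effect.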