hadoop apache-spark oozie

Running Spark Jobs via Oozie


Is it possible to run Spark jobs, e.g. Spark SQL jobs, via Oozie?

In the past we have used Oozie with Hadoop. Since we are now using Spark SQL on top of YARN, we are looking for a way to use Oozie to schedule these jobs.

Thanks.


Solution

  • Yes, it's possible. The procedure is the same as for other Oozie jobs: you provide Oozie with a directory containing coordinator.xml, workflow.xml, and a lib directory holding your JAR files.
    Keep in mind, though, that Oozie starts the job with a java -cp command, not with spark-submit, so running a Spark application this way takes a small trick.
    Run your JAR with spark-submit in the background, then look for that process in the process list. It will be running under a java -cp command, but with some additional JARs on the classpath that spark-submit added. Add those JARs to the classpath of your Oozie job, and that's it: you can now run your Spark applications through Oozie.

    1.  nohup spark-submit --class package.to.MainClass /path/to/App.jar &
    2.  ps aux | grep '/path/to/App.jar'
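
As a rough sketch of the second step, the classpath can be pulled out of the java command line with a small shell helper. The function name extract_classpath is hypothetical (not part of Spark or Oozie), and the sketch assumes the classpath is passed via -cp or -classpath and contains no spaces.

```shell
#!/bin/sh
# Hypothetical helper: print the value that follows -cp / -classpath
# in a java command line passed as the first argument.
# Assumes the classpath itself contains no whitespace.
extract_classpath() {
    printf '%s\n' "$1" | awk '{
        for (i = 1; i < NF; i++)
            if ($i == "-cp" || $i == "-classpath") { print $(i + 1); exit }
    }'
}

# Example against a live process, one JAR per line
# (PID is the process found with the ps/grep step):
#   extract_classpath "$(ps -o args= -p "$PID")" | tr ':' '\n'
```

The entries printed this way are the candidates to copy into the workflow's lib directory or otherwise add to the job's classpath.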
    

    Edit: You can also use a recent Oozie release, which includes a dedicated Spark action.
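
For the Spark action route, a minimal sketch of the application directory might look like the following. The function name write_spark_workflow, the workflow and action names, and the schema versions (workflow 0.5, spark-action 0.1) are assumptions to check against your installed Oozie version. The class and JAR path are the placeholders from the commands above, and ${jobTracker}, ${nameNode} and ${master} are expected to come from job.properties.

```shell
#!/bin/sh
# Hypothetical helper: create <dir>/lib and a minimal <dir>/workflow.xml
# containing a single Spark action. All names are illustrative.
write_spark_workflow() {
    app_dir=$1
    mkdir -p "$app_dir/lib"
    # The quoted 'EOF' keeps ${...} literal so Oozie, not the shell, resolves it.
    cat > "$app_dir/workflow.xml" <<'EOF'
<workflow-app name="spark-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>${master}</master>
            <name>SparkViaOozie</name>
            <class>package.to.MainClass</class>
            <jar>/path/to/App.jar</jar>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark action failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
EOF
}

# Example: write_spark_workflow ./spark-oozie-app
```

The resulting directory would then be uploaded to HDFS and submitted like any other Oozie workflow application.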