Is it possible to run Spark jobs, e.g. Spark SQL jobs, via Oozie?
In the past we have used Oozie with Hadoop. Since we are now using Spark SQL on top of YARN, we are looking for a way to use Oozie to schedule jobs.
Thanks.
Yes, it's possible. The procedure is the same as before: you give Oozie a directory structure containing coordinator.xml, workflow.xml, and a lib directory with your jar files.
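As a rough sketch of that layout, the commands below stage the directory locally. The APP_DIR name is a hypothetical placeholder, the two XML files are created empty and still need real definitions, and the copy and HDFS upload steps are left as comments because their paths depend on your cluster.

```shell
# Hypothetical local staging directory; adjust to your project.
APP_DIR="${APP_DIR:-./oozie-app}"

# lib/ holds the application jar(s) that Oozie adds to the job's classpath.
mkdir -p "$APP_DIR/lib"

# Empty placeholders only: the real coordinator and workflow definitions go here.
touch "$APP_DIR/coordinator.xml" "$APP_DIR/workflow.xml"

# Copy your jar into lib/, for example:
# cp /path/to/App.jar "$APP_DIR/lib/"

# Show the resulting layout.
find "$APP_DIR" | sort

# Then upload the directory to HDFS, for example (target path is an assumption):
# hdfs dfs -put "$APP_DIR" /user/$USER/
```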
But remember that Oozie starts the job with a java -cp command, not with spark-submit, so if you have to run it through Oozie, here is a trick.
Run your jar with spark-submit in the background.
Look for that process in the process list. It will be running under a java -cp command, but with some additional jars that spark-submit adds. Add those jars to CLASS_PATH, and that's it. Now you can run your Spark applications through Oozie.
1. nohup spark-submit --class package.to.MainClass /path/to/App.jar &
2. ps aux | grep '/path/to/App.jar'
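To make the jars from step 2 easier to read, here is a small sketch that pulls the classpath out of a java command line and prints one entry per line. The extract_classpath function is a hypothetical helper, not part of Spark or Oozie, and it assumes the classpath follows a -cp or -classpath flag and that no path contains spaces.

```shell
# Hypothetical helper: read a java command line on stdin and print
# each classpath entry on its own line.
# Assumes the classpath follows -cp or -classpath and has no spaces in paths.
extract_classpath() {
  tr ' ' '\n' \
    | awk 'found { print; exit } $0 == "-cp" || $0 == "-classpath" { found = 1 }' \
    | tr ':' '\n'
}

# Usage against the job started in step 1 (same placeholder path as above).
# The [/] in the pattern keeps grep from matching its own process.
# ps -eo args | grep '[/]path/to/App.jar' | head -n 1 | extract_classpath
```

The entries printed this way are the candidates to compare against what your Oozie job already has in lib.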
EDITED: You can also use a recent Oozie release, which has a Spark Action as well.
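For the Spark Action route, a minimal workflow might look like the sketch below, written out from the shell so it can be dropped into the application directory. The file name, the workflow and action names, the yarn-cluster master, and the ${jobTracker} / ${nameNode} properties are all assumptions to be supplied or adjusted for your setup; the class and jar reuse the placeholders from the steps above. Check the schema versions against the Oozie release you run.

```shell
# Hypothetical output file; a separate name avoids overwriting an existing workflow.xml.
WF_FILE="${WF_FILE:-./workflow-spark-example.xml}"

# The quoted 'EOF' stops the shell from expanding ${...}, so Oozie sees the
# property references unchanged. jobTracker and nameNode are assumed to be
# defined in your job.properties.
cat > "$WF_FILE" <<'EOF'
<workflow-app name="spark-sql-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>SparkSqlJob</name>
            <class>package.to.MainClass</class>
            <jar>${nameNode}/path/to/App.jar</jar>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
EOF
```

With an action like this, Oozie launches the Spark job itself, so the classpath workaround above should not be needed.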