apache-spark oozie oozie-coordinator

Loading dependency jars (different versions of the same jar for different actions/jobs) with the Oozie Spark action


My main Spark project depends on other utility jars. The possible combinations look like this:

 1. main_spark-1.0.jar will work with utils_spark-1.0.jar (some jobs use this set)
 2. main_spark-2.0.jar will work with utils_spark-2.0.jar  (and some of the jobs use this set)

The approach that worked for me in this scenario is to pass the jars through spark-opts:

Oozie Spark action, job1
<jar>main_spark-1.0.jar</jar>
<spark-opts>--jars utils_spark-1.0.jar</spark-opts>

Oozie Spark action, job2
<jar>main_spark-2.0.jar</jar>
<spark-opts>--jars utils_spark-2.0.jar</spark-opts>

I tested this configuration in two different actions and it works. My questions are:

  1. How is this different from loading the jars from the Oozie application lib path?
  2. If both jobs/actions run in parallel on the same YARN cluster, is there any possibility of a class loader issue (multiple versions of the same jar)?

My understanding is that each application runs in its own Spark context, so it should be fine, but I would appreciate expert advice.


Solution

  • If both jobs/actions run in parallel on the same YARN cluster, is there any possibility of a class loader issue (multiple versions of the same jar)?

    No (or at least it is not expected, and if it happened I would consider it a bug).

    Submitting a Spark application to a YARN cluster always results in a separate driver and a separate set of executors, which together form an environment isolated from other Spark applications. Each application's driver and executors run in their own JVMs, so the jars passed with --jars to one application are not on the classpath of the other.
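
To make the parallel case concrete, here is a minimal sketch of a workflow that forks into the two Spark actions from the question, each with its own main jar and its own --jars list. The workflow name, the main class com.example.Main, the ${jarDir} property and the node names are hypothetical placeholders, not taken from the question; the schema versions are assumptions and may need adjusting for your Oozie release.

```xml
<!-- Sketch only: names, class and ${jarDir} are hypothetical placeholders. -->
<workflow-app name="parallel-spark-jobs" xmlns="uri:oozie:workflow:0.5">
    <start to="fork-jobs"/>

    <!-- Run job1 and job2 in parallel; each becomes its own YARN application. -->
    <fork name="fork-jobs">
        <path start="job1"/>
        <path start="job2"/>
    </fork>

    <action name="job1">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>job1</name>
            <class>com.example.Main</class>
            <jar>${nameNode}${jarDir}/main_spark-1.0.jar</jar>
            <!-- Version 1.0 of the utils jar is shipped only with this application. -->
            <spark-opts>--jars ${nameNode}${jarDir}/utils_spark-1.0.jar</spark-opts>
        </spark>
        <ok to="join-jobs"/>
        <error to="fail"/>
    </action>

    <action name="job2">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>job2</name>
            <class>com.example.Main</class>
            <jar>${nameNode}${jarDir}/main_spark-2.0.jar</jar>
            <!-- Version 2.0 of the utils jar is shipped only with this application. -->
            <spark-opts>--jars ${nameNode}${jarDir}/utils_spark-2.0.jar</spark-opts>
        </spark>
        <ok to="join-jobs"/>
        <error to="fail"/>
    </action>

    <join name="join-jobs" to="end"/>

    <kill name="fail">
        <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

In this sketch the versioned jars live in a directory outside the workflow's lib/ folder, so that each action lists only the utils version it needs.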