I'm looking for a way to execute all the Scala objects in a package sequentially using spark-submit, as part of an ETL job. I have four Scala objects (say Object_1, Object_2, Object_3 and Object_4), all in one package, let's say etl. This package is then exported to a jar file (say all_four.jar).
I know I can execute each object individually with the following spark-submit command:
./spark-submit --class etl.x --jars path/to/jar/if/any.jar path/to/exported/jar/all_four.jar arg(0)......arg(n)
where x is the name of each Scala object in the package.
However, I'm looking for a way to call the package only once and have all the objects executed in the following sequence: first Object_1 and Object_2, then Object_3, then Object_4.
Is there a way to do this with spark-submit? Or is there a better, more efficient way to pull this off?
One way is to write a wrapper object (Execute) that contains the step 1, step 2 and step 3 logic and invokes everything in sequence. You can include this wrapper alongside the four objects if you have access to the source code.
A sample wrapper looks like the following; you may need to modify it to suit your needs.
import etl.{Object_1, Object_2, Object_3, Object_4}

object Execute {

  // Step 1: extract
  def extract(): Unit = {
    // Make use of Object_1 & Object_2 logic here.
  }

  // Step 2: transform
  def transform(): Unit = {
    // Make use of Object_3 logic here.
  }

  // Step 3: load
  def load(): Unit = {
    // Make use of Object_4 logic here.
  }

  def main(args: Array[String]): Unit = {
    extract()
    transform()
    load()
  }
}
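If each of the four objects already exposes its own main entry point, the wrapper can simply delegate to them in order. Below is a minimal sketch under that assumption (the main(args: Array[String]) methods on Object_1 through Object_4 are assumed to exist; adjust the calls to whatever entry methods your objects actually provide):

import etl.{Object_1, Object_2, Object_3, Object_4}

object Execute {
  def main(args: Array[String]): Unit = {
    // Step 1: extract (assumes Object_1 and Object_2 each define a main method)
    Object_1.main(args)
    Object_2.main(args)
    // Step 2: transform (assumes Object_3 defines a main method)
    Object_3.main(args)
    // Step 3: load (assumes Object_4 defines a main method)
    Object_4.main(args)
  }
}

Once the wrapper is compiled into all_four.jar, the whole pipeline can be submitted with a single spark-submit call that points --class at the wrapper: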
./spark-submit \
--class Execute \
--jars path/to/jar/if/any.jar \
path/to/exported/jar/all_four.jar arg(0)......arg(n)