Tags: scala, apache-spark, spark-submit

Executing All Objects in A Package Sequentially With Spark Submit


I'm looking for a way to execute all Scala objects in a package sequentially using spark-submit. I'm working on an ETL job with four Scala objects (say Object_1, Object_2, Object_3 and Object_4), all in one package, say etl. This package is then exported to a .jar file (say all_four.jar).

  • EXTRACTION - Object_1 and Object_2
  • TRANSFORMATION - Object_3
  • LOAD - Object_4

I know I can execute each object with the following spark-submit command:

./spark-submit --class etl.x --jars path/to/jar/if/any.jar path/to/exported/jar/all_four.jar arg(0)......arg(n)

where x stands for each Scala object in the package.

However, I'm looking for a way to call the package only once so that all objects are executed in the following sequence:

  • Step 1 - Object_1 and Object_2 (Extraction) can be executed concurrently. Both just have to complete before the next step starts
  • Step 2 - Object_3 (Transformation) is executed
  • Step 3 - Object_4 (Load) is executed

Is there a way to do this with Spark Submit? Or are there better and more efficient ways to pull this off?


Solution

  • One way is to write a wrapper object (Execute) that contains the logic for step 1, step 2 and step 3 and invokes them in sequence. If you have access to the source code, you can include this wrapper object in the jar alongside those four objects.

    A sample wrapper is shown below. You may need to modify it to suit your needs.

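    import etl.{Object_1, Object_2, Object_3, Object_4}

    object Execute {

      // Step 1: extraction.
      def extract(args: Array[String]): Unit = {
        // Make use of Object_1 & Object_2 logic here.
      }

      // Step 2: transformation. Runs only after extract() has returned.
      def transform(args: Array[String]): Unit = {
        // Make use of Object_3 logic here.
      }

      // Step 3: load. Runs only after transform() has returned.
      def load(args: Array[String]): Unit = {
        // Make use of Object_4 logic here.
      }

      // Entry point for spark-submit: the three steps run strictly in order.
      def main(args: Array[String]): Unit = {
        extract(args)
        transform(args)
        load(args)
      }

    }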
    
    Then submit the wrapper class instead of the individual objects:

    ./spark-submit \
    --class Execute \
    --jars path/to/jar/if/any.jar path/to/exported/jar/all_four.jar arg(0)......arg(n)
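
    The wrapper above runs everything strictly one after another. If you also want Object_1 and Object_2 to run concurrently in step 1, one possible approach is to start them as Scala Futures and wait for both before moving on. The following is only a minimal sketch under some assumptions: the object name ExecuteConcurrently and the runPipeline helper are hypothetical, the println calls are placeholders for your real Object_1 to Object_4 logic, and it assumes the two extractions do not each create and stop their own SparkContext (Spark accepts jobs submitted from several threads of the same application, but only one active context).

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Hypothetical wrapper: name and helper are illustrative, not part of the etl package.
object ExecuteConcurrently {

  // Runs every extractor on its own thread, waits for all of them,
  // then runs transform and load in order on the calling thread.
  def runPipeline(extractors: Seq[() => Unit],
                  transform: () => Unit,
                  load: () => Unit): Unit = {
    // Step 1: start all extractions without waiting for each other.
    val started: Seq[Future[Unit]] = extractors.map(step => Future(step()))
    // Block until every extraction has finished. If one failed, its
    // exception is rethrown here and the later steps are skipped.
    Await.result(Future.sequence(started), Duration.Inf)
    // Step 2 and step 3 run sequentially.
    transform()
    load()
  }

  def main(args: Array[String]): Unit =
    runPipeline(
      extractors = Seq(
        () => println("placeholder: Object_1 extraction logic"),
        () => println("placeholder: Object_2 extraction logic")
      ),
      transform = () => println("placeholder: Object_3 transformation logic"),
      load = () => println("placeholder: Object_4 load logic")
    )
}
```

    You would submit it the same way as before, with --class ExecuteConcurrently. Duration.Inf means the driver waits as long as the extractions take; use a finite duration if you would rather fail after a timeout.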