Executing Sqoop job shell scripts in parallel in Oozie


I have a shell script that executes a Sqoop job. The script is below.

#!/bin/bash

table=$1

sqoop job --exec "${table}"

When I pass the table name as an argument in the workflow, the Sqoop job executes successfully.

The workflow is below.

<workflow-app name="Shell_script" xmlns="uri:oozie:workflow:0.5">
    <start to="shell_script"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="shell_script">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>sqoopjob.sh</exec>
            <argument>test123</argument>
            <file>/user/oozie/sqoop/lib/sqoopjob.sh#sqoopjob.sh</file>
        </shell>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

The job executes successfully for table test123.

I now have 300 Sqoop jobs like the one above, and all the table names are in a single file. I want to execute 10 Sqoop jobs in parallel.

That is, I want to loop through the file and run the jobs for the first 10 tables in parallel, then the next 10, and so on.

How can I do this? Should I prepare 10 workflows? I am really confused.


Solution

  • As @Samson Scharfrichter mentioned, you can start parallel jobs from within the shell script itself. Write a shell function runJob() and call it several times in the background so the calls run in parallel. Use this template:

    #!/bin/bash
    
    runJob() {
        local tableName="$1"
        #add other parameters here
    
        #call sqoop here or do something else, e.g. sqoop job --exec "$tableName"
        #write command logs
        #etc.
        #return 0 on success, 1 on failure
    
        return 0
    }
    
    #Run parallel processes and wait for their completion
    
    #Add a loop here or add more calls
    runJob "$table_name" &
    runJob "$table_name2" &
    runJob "$table_name3" &
    #The trailing ampersand runs each call in the background as a parallel process
    
    #Now wait for all processes to complete
    FAILED=0
    
    for job in $(jobs -p)
    do
       echo "job=$job"
       wait "$job" || let "FAILED+=1"
    done
    
    if [ "$FAILED" != "0" ]; then
        echo "Execution FAILED!  ($FAILED)"
        #Do something here: log, send a message, etc.
    
        exit 1
    fi
    
    #All processes completed successfully
    #Do something here
    echo "Done successfully"
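The template starts every call at once. For the case in the question (table names in a file, 10 at a time), the loop could look roughly like the sketch below. It assumes the file has one saved-job name per line and that each Sqoop saved job is named after its table, as with `test123` in the question. The helper names `run_in_batches` and `wait_for_pids`, the `sqoop_<table>.log` file names and the default batch size of 10 are illustrative choices, not part of the original answer. It starts one batch in the background, waits for the whole batch, and only then reads the next tables.

```shell
#!/bin/bash
# Sketch: run the Sqoop saved jobs named in a file, BATCH_SIZE at a time.
# Assumptions (hypothetical, not from the original answer):
#   - the file lists one saved-job name per line (job name == table name)
#   - each saved job already exists, since `sqoop job --exec <name>` needs it

FAILED=0

runJob() {
    local tableName="$1"
    # One log per job so output from parallel runs does not interleave
    sqoop job --exec "$tableName" > "sqoop_${tableName}.log" 2>&1
}

# Wait for each PID given and count non-zero exits in FAILED.
# Called directly (not via $(...)) so `wait` runs in the shell that owns the jobs.
wait_for_pids() {
    local pid
    for pid in "$@"; do
        wait "$pid" || FAILED=$((FAILED + 1))
    done
}

# Usage: run_in_batches TABLE_FILE [BATCH_SIZE]   (BATCH_SIZE defaults to 10)
run_in_batches() {
    local tableFile="$1" batchSize="${2:-10}"
    local pids=() table
    FAILED=0

    while IFS= read -r table || [[ -n "$table" ]]; do
        [[ -z "$table" ]] && continue        # skip blank lines
        # </dev/null stops the job from reading the table list on stdin
        runJob "$table" < /dev/null &
        pids+=("$!")

        # Batch is full: wait for all of it before starting the next one
        if (( ${#pids[@]} >= batchSize )); then
            wait_for_pids "${pids[@]}"
            pids=()
        fi
    done < "$tableFile"

    wait_for_pids "${pids[@]}"               # last, possibly partial, batch

    echo "Failed jobs: $FAILED"
    (( FAILED == 0 ))                        # exit status 0 only if every job succeeded
}

# Only run when given arguments, e.g.: ./sqoopjobs.sh tables.txt 10
if (( $# > 0 )); then
    run_in_batches "$@"
    exit $?
fi
```

With this approach a single Oozie shell action is enough: ship the script and the table list with two `<file>` elements (the table-list path would be your own, e.g. `...#tables.txt`) and pass `tables.txt` and `10` as `<argument>`s. Because the batches are strict, one slow table holds up the next nine; on bash 4.3 or later, `wait -n` (which returns as soon as any one background job finishes) could be used instead to keep 10 jobs running at all times.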