Tags: mysql, shell, hadoop, oozie

How to pass shell script arguments to an Oozie workflow


I have a shell script (a Sqoop script) that imports data from MySQL into HDFS. I want to use Oozie to schedule the Sqoop import jobs.

The script runs the following Sqoop command:

sqoop import --connect ${domain}:${port}/${database} \
    --username ${username} --password ${password} \
    --query "select * from ${table} where \$CONDITIONS" -m 1 \
    --hive-import --hive-database ${hivedatabase} --hive-table ${table} \
    --target-dir /user/hive/warehouse/${hivedatabase}.db/${table}

I have all these arguments defined in another .sh file. Now I want to pass them to the workflow.xml file. Or should I pass them in the job.properties file instead?

The ${table} argument varies: there are 1000 tables, and I would like to run the same script for each of them in parallel.

How can I do that? Can anyone please explain?


Solution

  • Oozie does not support cycles: a workflow is a directed acyclic graph (DAG) of actions, so you cannot call the same action multiple times in a loop.

    There are multiple ways to approach your task. I would suggest the following:

    1. Create a property file listing all 1000 table names.
    2. Write a shell script or Java code that generates the Sqoop command above for each table in the property file by substituting that table name for ${table}, so you end up with 1000 executable Sqoop commands.
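Step 2 could look roughly like this in shell. This is a minimal sketch, not a production script: `tables.txt` (one table name per line, `#` for comments) and the placeholder connection values are hypothetical, and in practice you would source your own settings file before calling `generate_imports`. With the default `DRY_RUN=1` it only prints each generated command; set `DRY_RUN=0` to actually run them, one after another.

```shell
#!/usr/bin/env bash
# Sketch: build one Sqoop import per table listed in a property file.
# All values below are hypothetical placeholders; variables that are already
# set (e.g. by sourcing your own settings file) take precedence over them.
set -euo pipefail

domain="${domain:-jdbc:mysql://dbhost}"
port="${port:-3306}"
database="${database:-sales}"
username="${username:-etl}"
password="${password:-secret}"
hivedatabase="${hivedatabase:-staging}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = execute them

# Import (or just print) a single table, using the same options as the question.
import_table() {
  local table="$1"
  local cmd=(sqoop import
    --connect "${domain}:${port}/${database}"
    --username "$username" --password "$password"
    --query "select * from ${table} where \$CONDITIONS" -m 1
    --hive-import --hive-database "$hivedatabase" --hive-table "$table"
    --target-dir "/user/hive/warehouse/${hivedatabase}.db/${table}")
  if [ "$DRY_RUN" = "1" ]; then
    printf '%q ' "${cmd[@]}"   # shell-quoted, so the line can be copy-pasted
    echo
  else
    "${cmd[@]}"
  fi
}

# Loop over the table list; read trims surrounding whitespace, and blank
# lines and # comments are skipped.
generate_imports() {
  local tables_file="$1" table
  while read -r table || [ -n "$table" ]; do
    case "$table" in
      ''|\#*) continue ;;
    esac
    import_table "$table"
  done < "$tables_file"
}
```

For example, `generate_imports tables.txt` prints one `sqoop import ...` line per table; review the output before switching to `DRY_RUN=0`.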

    Once you have a shell script or Java program that dynamically generates and executes the Sqoop commands, you can create a single shell action or Java action to run the job via Oozie.
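For the workflow.xml vs. job.properties part of the question, one common pattern is to define the values in job.properties, reference them as `${...}` in the `<argument>` elements of a shell action, and read them as positional parameters in the script. A minimal sketch; the script name `import_tables.sh`, the table list `tables.txt` and the argument order are hypothetical, and the XML in the comments is an illustrative fragment rather than a complete workflow.

```shell
#!/usr/bin/env bash
# Sketch of the script a single Oozie shell action could run (hypothetical
# name import_tables.sh). Oozie passes each <argument> element to the script
# as a positional parameter, in order. A matching workflow.xml fragment
# might look like this:
#
#   <shell xmlns="uri:oozie:shell-action:0.2">
#     <job-tracker>${jobTracker}</job-tracker>
#     <name-node>${nameNode}</name-node>
#     <exec>import_tables.sh</exec>
#     <argument>${domain}</argument>
#     <argument>${port}</argument>
#     <argument>${database}</argument>
#     <argument>${username}</argument>
#     <argument>${password}</argument>
#     <argument>${hivedatabase}</argument>
#     <argument>tables.txt</argument>
#     <file>import_tables.sh#import_tables.sh</file>
#     <file>tables.txt#tables.txt</file>
#   </shell>
#
# ${domain}, ${port}, ... are defined in job.properties. Relative <file>
# paths resolve against the workflow application directory in HDFS.
set -euo pipefail

# Copy the positional parameters into named variables; fail fast on a wrong count.
parse_args() {
  if [ "$#" -ne 7 ]; then
    echo "usage: import_tables.sh domain port database username password hivedatabase tables_file" >&2
    return 1
  fi
  domain="$1"; port="$2"; database="$3"
  username="$4"; password="$5"; hivedatabase="$6"
  tables_file="$7"
}
```

At the end of the real script you would call `parse_args "$@"` and then loop over `$tables_file` as in step 2. Passing the password as an argument may expose it in the job configuration and logs; Sqoop's `--password-file` option is an alternative worth considering.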

    Also note that running too many Sqoop jobs in parallel can exhaust memory, since each Sqoop invocation starts its own JVM, and can degrade performance.
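One way to act on that warning is to cap concurrency with `xargs -P`, which GNU and BSD xargs both support. A minimal sketch, assuming a hypothetical per-table script `import_one_table.sh` that runs the Sqoop import for the table name it receives as `$1`; the default limit of 10 is an arbitrary placeholder to tune for your cluster and MySQL server.

```shell
#!/usr/bin/env bash
# Sketch: run one import per table, but never more than MAX_PARALLEL at once.
# IMPORT_CMD is a hypothetical per-table script that takes the table name as $1.
set -euo pipefail

MAX_PARALLEL="${MAX_PARALLEL:-10}"
IMPORT_CMD="${IMPORT_CMD:-./import_one_table.sh}"

run_bounded() {
  local tables_file="$1" tables
  # Drop blank lines and # comments from the table list.
  tables="$(sed -e '/^[[:space:]]*$/d' -e '/^#/d' "$tables_file")"
  [ -n "$tables" ] || return 0   # nothing to do; avoids one empty invocation
  # -P: how many processes run concurrently; -n 1: one table name per process.
  # xargs splits its input on whitespace, so table names must not contain
  # spaces or quotes. xargs exits non-zero (123 in GNU xargs) if any import fails.
  printf '%s\n' "$tables" | xargs -P "$MAX_PARALLEL" -n 1 "$IMPORT_CMD"
}
```

For example, setting `MAX_PARALLEL=20` and calling `run_bounded tables.txt` keeps at most 20 imports running at any moment, starting the next one as soon as a slot frees up.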