I have a shell script that imports data from MySQL into HDFS using Sqoop. I want to use Oozie to schedule the Sqoop import jobs.
The script runs the following sqoop command:
sqoop import --connect ${domain}:${port}/${database} --username ${username} --password ${password} --query "select * from ${table} where \$CONDITIONS" -m 1 --hive-import --hive-database ${hivedatabase} --hive-table ${table} --target-dir /user/hive/warehouse/${hivedatabase}.db/${table}
All of these arguments are defined in a separate .sh file. How should I pass them to the job: in the workflow.xml file, or in the job.properties file?
The ${table} argument varies: there are 1000 tables, and I would like to run the same script for each of them in parallel. How can I do that? Could anyone explain?
Oozie does not support cyclic operations: a workflow is a directed acyclic graph of actions, so you cannot call the same action multiple times in a loop.
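There are multiple ways to accomplish your task. I would suggest the following: write a shell script (or a Java program) that reads the list of table names and, for each table, generates and executes the sqoop command.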
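As a minimal sketch of a script that dynamically generates and executes one sqoop import per table, assuming the connection variables (`domain`, `port`, `database`, `username`, `password`, `hivedatabase`) come from the arguments .sh file in the question and the table names are listed one per line in a text file. The file names `import_args.sh` and `tables.txt` and the `DRY_RUN` switch are hypothetical, added only for illustration; this imports the tables one after another.

```shell
#!/usr/bin/env bash
# Minimal sketch, not a definitive implementation: run the question's
# sqoop import once for every table listed in a file.
# Assumptions: domain, port, database, username, password and hivedatabase
# are defined in the arguments .sh file; file names here are hypothetical.

# Run the sqoop import for one table (with DRY_RUN=1, only print the command).
import_table() {
  local table="$1"
  local cmd=(sqoop import
    --connect "${domain}:${port}/${database}"
    --username "${username}" --password "${password}"
    --query "select * from ${table} where \$CONDITIONS"
    -m 1
    --hive-import --hive-database "${hivedatabase}" --hive-table "${table}"
    --target-dir "/user/hive/warehouse/${hivedatabase}.db/${table}")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Read table names one per line, skip blank lines, import each in turn.
import_all() {
  local table
  while IFS= read -r table || [[ -n "$table" ]]; do
    [[ -z "$table" ]] && continue
    # stdin from /dev/null so a command that reads stdin cannot eat the table list
    import_table "$table" < /dev/null
  done < "$1"
}

# Usage: ./import_tables.sh import_args.sh tables.txt
if [[ $# -ge 2 ]]; then
  source "$1"        # hypothetical args file defining the connection variables
  import_all "$2"
fi
```

Running it once with `DRY_RUN=1` prints every generated command, which is a cheap way to check the arguments before anything touches MySQL or Hive.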
Once you have a shell script or Java code that dynamically generates and executes the sqoop commands, you can create a single shell action or Java action that runs it via Oozie.
Also, running too many Sqoop jobs in parallel can exhaust memory (each sqoop invocation starts its own JVM) and hurt performance, so cap the number of imports that run at the same time.
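One way to cap the number of concurrent imports in plain bash is a job counter combined with `wait -n` (bash 4.3 or later). This is a sketch under that assumption; `run_limited` is a hypothetical helper name, not part of Sqoop or Oozie, and it takes the maximum job count, a worker command, and the items to process.

```shell
#!/usr/bin/env bash
# Minimal sketch: run a worker command once per item, with at most
# max_jobs background jobs running at the same time. Needs bash 4.3+
# for `wait -n`. run_limited is a hypothetical helper, not a Sqoop/Oozie API.

run_limited() {
  local max_jobs="$1" worker="$2"
  shift 2
  local running=0 item
  for item in "$@"; do
    if (( running >= max_jobs )); then
      wait -n                  # block until one background job finishes
      running=$((running - 1))
    fi
    "$worker" "$item" &        # start the next job in the background
    running=$((running + 1))
  done
  wait                         # wait for all remaining jobs to finish
}
```

For example, with a shell function that imports a single table (such as a hypothetical `import_table` defined in the same script, so the background subshells inherit it), `mapfile -t tables < tables.txt; run_limited 10 import_table "${tables[@]}"` would keep at most 10 imports running. The limit of 10 is arbitrary; tune it to what your cluster and MySQL server can handle.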