I am building an application to ingest data from a MySQL database into Hive tables. The application will be scheduled to run every day.
The very first action reads a Hive table to load the import metadata for each table (e.g. name, type) and writes the list of tables to import to a file. Next, a Sqoop action transfers the data for each table in sequence.
Is it possible to create an Oozie shell-script action that iterates through the table list and launches an Oozie sub-workflow Sqoop action for each table in sequence? Could you provide a reference? Any suggestion of a better approach is also welcome.
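To make the first step concrete, this is roughly what I have in mind for producing the table list; it is only a sketch, and the metadata table name import_metadata and its table_name column are placeholders rather than my real schema:

#!/bin/bash
# Sketch of step 1: query the Hive metadata table and stage one file per table on HDFS.
# The table and column names are assumptions for illustration.
hive -e "SELECT table_name FROM import_metadata WHERE active = 'Y'" > /tmp/table_list.txt

while read -r tbl; do
  # Each HDFS file is named after the table and contains only the table name.
  echo "$tbl" | hadoop fs -put -f - "hdfs://quickstart.cloudera:8020/user/cloudera/workflow/table_metadata/$tbl"
done < /tmp/table_list.txt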
I have come up with the following shell script containing the Sqoop command. It works fine with some environment variable tweaking.
#!/bin/bash
hdfs_path='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/table_metadata'
table_temp_path='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/hive_temp'

# Only proceed if the metadata directory exists on HDFS.
if hadoop fs -test -e "$hdfs_path"
then
  # Each file under the metadata directory contains one table name.
  for file in $(hadoop fs -ls "$hdfs_path" | grep -o -e "$hdfs_path/*.*")
  do
    echo "$file"
    TABLENAME=$(hadoop fs -cat "$file")
    echo "$TABLENAME"
    # Import each table into its own sub-directory under the temp path.
    sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
      --table "$TABLENAME" --username=retail_dba --password=cloudera \
      --direct -m 1 --delete-target-dir \
      --target-dir "$table_temp_path/$TABLENAME"
  done
fi
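For reference, the script assumes the metadata directory holds one small file per table, each containing just the table name. The layout would look something like this (the table names below are only examples from the retail_db sample database):

hadoop fs -ls hdfs://quickstart.cloudera:8020/user/cloudera/workflow/table_metadata
# e.g. .../table_metadata/departments  -> contains the single line "departments"
# e.g. .../table_metadata/categories   -> contains the single line "categories"
hadoop fs -cat hdfs://quickstart.cloudera:8020/user/cloudera/workflow/table_metadata/departments
departments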