Tags: hadoop, oozie, oozie-coordinator

Oozie iterative workflow


I am building an application to ingest data from a MySQL database into Hive tables. The app will be scheduled to run every day.

The very first action reads a Hive table to load the import-table metadata (e.g. name, type) and creates a file listing the tables to import. Next, a Sqoop action transfers the data for each table in sequence.

Is it possible to create a shell-script Oozie action that iterates through the table list and launches an Oozie sub-workflow Sqoop action for each table in sequence? Could you provide some reference? Any suggestions for a better approach are also welcome!


Solution

  • I have come up with the following shell script containing the Sqoop action. It works fine with some environment-variable tweaking.

    #!/bin/bash
    # HDFS locations: metadata files listing the tables to import,
    # and a temp area that receives the imported data.
    hdfs_path='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/table_metadata'
    table_temp_path='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/hive_temp'
    
    if hadoop fs -test -e "$hdfs_path"
    then
      # Extract the full path of every metadata file under $hdfs_path.
      for file in $(hadoop fs -ls "$hdfs_path" | grep -o -e "$hdfs_path/*.*")
      do
        echo "$file"
        # Each metadata file contains the name of one table to import.
        TABLENAME=$(hadoop fs -cat "$file")
        echo "$TABLENAME"
        # Import the table into its own subdirectory under the temp path.
        sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
          --table "$TABLENAME" --username retail_dba --password cloudera \
          --direct -m 1 --delete-target-dir \
          --target-dir "$table_temp_path/$TABLENAME"
      done
    fi
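
    For reference, the script above can be wired into a workflow as an Oozie shell action. Below is a minimal sketch of such a workflow definition; the script name import_tables.sh and the ${wfPath} property are assumptions for illustration, not part of the original setup.

        <workflow-app name="sqoop-import-wf" xmlns="uri:oozie:workflow:0.5">
            <start to="import-tables"/>
            <action name="import-tables">
                <shell xmlns="uri:oozie:shell-action:0.2">
                    <job-tracker>${jobTracker}</job-tracker>
                    <name-node>${nameNode}</name-node>
                    <!-- import_tables.sh is a hypothetical name for the script above -->
                    <exec>import_tables.sh</exec>
                    <file>${wfPath}/import_tables.sh#import_tables.sh</file>
                </shell>
                <ok to="end"/>
                <error to="fail"/>
            </action>
            <kill name="fail">
                <message>Sqoop import failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
            </kill>
            <end name="end"/>
        </workflow-app>

    Note that running the sqoop CLI from a shell action requires the Sqoop client to be available on the node that executes the launcher. The trade-off versus Oozie's native sqoop action is that the shell script can loop over a dynamically generated table list, which a static workflow definition cannot.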