Search code examples
shellhadoophbasesqoopsqoop2

How can we automate the incremental import in Sqoop from DB to HBase using linux script


Using sqoop job we can do the incremental load to HBase using the --lastval

But how we can do the same with shell script and how we will get the --lastval when we automate the script ?

I mean how to store the --lastval and how to pass it to the next time

Thankyou in Advance for the help !!


Solution

  • how to store the --lastval and how to pass it to the next time?

    define --lastval as an alias or export variable in linux or unix. which can be retried from automation script

    once load is finish then change it to recent value, by capturing latest lastval

    export lastupdatedvalue=hive -e 'select last_value from table #selection query based on the logic.

    sqoop import --connect jdbc:mysql://localhost:3306/ydb --table <your table> --username root -P --incremental append --last-value ${lastupdatedvalue}
    

    :

    You can try a sqoop action in Oozie, it should work. Else use shell action in oozie if you prefer scripts. It is possible to schedule it using coordinator function available.

    Also have a look at

    Importing data incrementally

    To import data in increments, you use the --incremental argument with the import command. Sqoop compares the values in a check column against a reference value for the most recent import. These arguments import all rows having an id greater than 100.

    --incremental
    --check-column id
    --last-value 100
    

    If you run an incremental import from the command line, Sqoop prints the last value in a subsequent incremental import. If you run an incremental import from a saved job, Sqoop retains the last value in the saved job. To import only newer rows than those previously imported, use the --exec option. Sqoop imports only rows having an id greater than the specified row id.