mysql, hadoop, hive, sqoop

How to automatically sync a MySQL table with a Hive external table using Sqoop?


I already have a MySQL table on my local (Linux) machine, and a Hive external table with the same schema as the MySQL table.

I'm trying to import data from the MySQL table into my Hive external table, and I'm using Sqoop for this.

The problem is that whenever a new record is added to the MySQL table, the Hive external table is not updated automatically.

This is the Sqoop import command I'm using:

sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop \
  --username root -P \
  --split-by id \
  --columns id,name,age,salary \
  --table customer \
  --target-dir /user/chamith/mysqlhivetest/ \
  --fields-terminated-by "," \
  --hive-import \
  --hive-table test.customers

Am I missing something here? Or how can this be done?

Any help would be appreciated.


Solution

  • In your case, new rows are appended to the table, so you need to use the incremental append approach.

    When to use append mode?

    • Works for numerical data that is incrementing over time, such as auto-increment keys
    • When importing a table where new rows are continually being added with increasing row id values
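
    For example, the customer table here presumably has an auto-increment id column, which makes it a natural check column. A quick way to see the current high-water mark in the source (useful for --last-value below) is to query MySQL directly; this sketch assumes the sqoop database and credentials from the question:

    # find the current maximum id in the source table
    mysql -u root -p -e "SELECT MAX(id) FROM sqoop.customer;"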

    Now, here is what you need to add to the command:

    --check-column: Specifies the column to be examined when determining which rows to import.

    --incremental: Specifies how Sqoop determines which rows are new.

    --last-value: Specifies the maximum value of the check column from the previous import.
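
    Put together, a plain (non-job) incremental import would look like the sketch below. The --last-value of 5 assumes the initial load brought in rows up to id 5; on every manual run you would have to supply the new high-water mark yourself, which is exactly the bookkeeping a sqoop job avoids:

    sqoop import \
    --connect jdbc:mysql://localhost:3306/sqoop \
    --username root -P \
    --table customer \
    --check-column id \
    --incremental append \
    --last-value 5 \
    --target-dir /user/chamith/mysqlhivetest/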

    The ideal way to run this is as a sqoop job, because the Sqoop metastore then remembers the last value automatically after every run.

    Step 1: Initially load the data with a normal import command (like the one in the question).

    Step 2: Create a saved sqoop job for the incremental imports:

    sqoop job --create incrementalImportJob -- import \
    --connect jdbc:mysql://localhost:3306/sqoop \
    --username root \
    -P \
    --split-by id \
    --columns id,name,age,salary \
    --table customer \
    --incremental append \
    --check-column id \
    --last-value 5 \
    --fields-terminated-by "," \
    --target-dir hdfs://ip:8020/path/to/table/
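
    Once the job is created, each execution picks up where the previous one left off, because the Sqoop metastore updates the stored last value after every run. To make the sync automatic, execute the saved job on a schedule; the crontab line below is a hypothetical example that runs it hourly. Note that -P prompts interactively, so for unattended runs you would need --password-file instead:

    # run the saved incremental import (the metastore tracks --last-value)
    sqoop job --exec incrementalImportJob

    # inspect the job definition, including the stored last value
    sqoop job --show incrementalImportJob

    # hypothetical crontab entry: run the sync at the top of every hour
    0 * * * * sqoop job --exec incrementalImportJob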
    

    Hope this helps.