Search code examples
oracle-databasehadoophivesqoop

Sqoop Incremental import and update


I am trying to import data from sql into a hive database. The goal is to update the changes in the oracle database to hive using sqoop import. The sqoop command is as follows:

sqoop import -D mapred.child.java.opts='\-Djava.security.egd=file:/dev/../dev/urandom' 
--connect jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=ON)(FAILOVER=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=)(PORT=1545))(ADDRESS=(PROTOCOL=TCP)(HOST=)(PORT=1545)))(CONNECT_DATA=(SERVICE_NAME=)(SERVER=DEDICATED)))'
 --username abcde  
--password 1234rgtds  
--table Customer_Acc 
--columns Name,ID,Address,Date_booking ,Last_update_date
-m 1
 --target-dir /final/table 
 --hive-import 
--hive-table tesupd 
--map-column-hive Name,ID,Address,Date_booking
 --null-string '\\N' 
--null-non-string '\\N' 
--hive-delims-replacement ' ' 
--incremental lastmodified 
--check-column Last_update_date
--last-value "2009-12-31 12:14:28"

The final output should be the data greater than the last value, but in the above case it is appending the data instead of incrementally updating it.

I want the data to be updated rather than appended.


Solution

  • use --merge-key option in your sqoop-import command. This will replace the older records with the latest records.

    Alternately you can use sqoop-merge command as well but it should be done in two steps. First sqoop-import without merge-key and then sqoop-merge