I am trying to import data from sql into a hive database. The goal is to update the changes in the oracle database to hive using sqoop import. The sqoop command is as follows:
sqoop import -D mapred.child.java.opts='\-Djava.security.egd=file:/dev/../dev/urandom'
--connect jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=ON)(FAILOVER=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=)(PORT=1545))(ADDRESS=(PROTOCOL=TCP)(HOST=)(PORT=1545)))(CONNECT_DATA=(SERVICE_NAME=)(SERVER=DEDICATED)))'
--username abcde
--password 1234rgtds
--table Customer_Acc
--columns Name,ID,Address,Date_booking ,Last_update_date
-m 1
--target-dir /final/table
--hive-import
--hive-table tesupd
--map-column-hive Name,ID,Address,Date_booking
--null-string '\\N'
--null-non-string '\\N'
--hive-delims-replacement ' '
--incremental lastmodified
--check-column Last_update_date
--last-value "2009-12-31 12:14:28"
The final output should be the data greater than the last value, but in the above case it is appending the data instead of incrementally updating it.
I want the data to be updated rather than appended.
use --merge-key
option in your sqoop-import
command. This will replace the older records with the latest records.
Alternately you can use sqoop-merge
command as well but it should be done in two steps. First sqoop-import
without merge-key
and then sqoop-merge