Search code examples
hbaseclouderasqoop

Sqoop does not import NULL values to HBase


I have the following command that imports from Oracle to Hbase:

sqoop-import 
-Dsqoop.hbase.add.row.key=true 
-Dhbase.zookeeper.quorum=zk.localdomain 
--verbose 
--connect jdbc:oracle:thin:@127.0.0.1:1521:xe 
--username admin 
--password admin 
--query "select * from table WHERE \$CONDITIONS" 
--hbase-table hbase_table 
--column-family data 
--hbase-row-key ID 
--hbase-create-table 
--target-dir /user/sqoop/tmp_hbase_table 
--null-string 'empty string' 
--null-non-string '0' 
--split-by ID
-m 1

Code just works fine but columns with NULLs are not being imported to HBase.

I understand that HBase does not accept null values, but if I understand the docs correctly Sqoop should have converted them in "empty string" and "0" respectively.

Is there some magic system property I'm missing?

My Sqoop version in 1.4.6-cdh5.10.1.

Thanks in advance


Solution

  • null-string and null-non-string are the arguments supported only when importing data from SQL to Hive and not for HBase. Check the SQOOP documentaion for confirmation. https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_importing_data_into_hive https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_importing_data_into_hbase

    Hence the above problem can be solved using COALESCE operator. Instead of using * use column names and use COALESCE for those columns where the SQL field has null values by specifying default values when stored in HBase

    Eg:

       COALESCE(user_name,'xyz') as user_name, \
       COALESCE(password,'123') as password, \
       COALESCE(created_date, '9999-12-31 00:00:00.0000000') as created_date, \
       COALESCE(modified_date,'9999-12-31 00:00:00.0000000') as modified_date, \
       COALESCE(last_login_date,'9999-12-31 00:00:00.0000000') as lastlogin, \
       COALESCE(email_id,'0') as email_id, \
    

    Hope this solves your problem!!