Search code examples
sqlhadoophbasesqoop

SQOOP - Imported Failed: Can not create a Path from a null string


I am loading tables from SQL server to HBase table using SQOOP incremental update. But the null values in SQL tables are not imported into HBase. I understand that Hbase does not support null values and the field which contains null wont be present in Hbase. But my concern is when a particular column has null values for most records is being skipped even in case when the field has a value present in it for some records. Following is the SQL table structure

   CREATE TABLE [dbo].[user_test](
[user_id] [nvarchar](20) NOT NULL,
[user_name] [nvarchar](100) NULL,
[password] [varchar](128) NULL,
[created_date] [datetime2](7) NULL,
[modified_date] [datetime2](7) NULL,
[last_login_date] [datetime2](7) NULL,
[email_id] [nvarchar](100) NULL,
[security_question_id] [int] NULL,
[answered_count] [int] NULL,
[skip_count] [int] NULL,
[role_id] [smallint] NULL,
[use_yn] [char](1) NULL,
[first_login] [char](1) NULL,
[score] [int] NULL,
[secret_answer] [nvarchar](100) NULL,
PRIMARY KEY CLUSTERED 
(
[user_id] ASC
 )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, 
  ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
  ) ON [PRIMARY]

In the above table, the value for email_id is null in most of the records. But even for the records for which email_id value exists, is not being imported into Hbase table. The sqoop command was successful in getting the appended records in SQL. The SQOOP command is as follows:

   sqoop import 
   --connect "jdbc:sqlserver://107.108.32.198:1433;database=ETL_interim_DB;" 

   --username "hadoop" --password "Semco123" 
   --query "SELECT CAST(user_id AS Integer) as 
    user_id,user_name,password,modified_date,last_login_date,email_id,security_question_id,answered_count,skip_count,role_id,use_yn,first_login,score,secret_answer from 

    ETL_interim_DB.dbo.user_test WHERE \$CONDITIONS" 
    --hbase-table test2 
    --column-family cf 
    --hbase-row-key user_id 
    --split-by user_id -m 1 
    --incremental append 
    --check-column user_id 
    --last-value 10

But following error was displayed.

Note: Recompile with -Xlint:deprecation for details.
0    [main] ERROR org.apache.sqoop.tool.ImportTool  - Imported Failed: Can 
not create a Path from a null string

Could anyone please suggest how to import all values present in SQL server into HBase and in case of Null values in SQL what excatly happens while importing them to Hbase tables?


Solution

  • The COALESCE operation worked for me to import null field in SQL to HBase by giving default values. Following is the sqoop command for the same:

        sqoop import 
        --connect "jdbc:sqlserver://107.108.32.198:1433;database=ETL_interim_DB;" 
        --username "hadoop" --password "Semco123" 
        --query "SELECT CAST(user_id AS Integer) as user_id
        COALESCE(user_name,'xyz') as user_name, \
        COALESCE(password,'123') as password, \
        COALESCE(created_date, '9999-12-31 00:00:00.0000000') as created_date, \
        COALESCE(modified_date,'9999-12-31 00:00:00.0000000') as modified_date, \
        COALESCE(last_login_date,'9999-12-31 00:00:00.0000000') as lastlogin, \
        COALESCE(email_id,'0') as email_id, \
        COALESCE(security_question_id,-1) as security_question_id, \
        COALESCE(answered_count,-1) as answered_count, \
        COALESCE(skip_count,-1) as skip_count, \
        COALESCE(secret_answer, '0') as secret_answer, \
        COALESCE(role_id,0) as role_id, \
        COALESCE(use_yn,'0') as use_yn, \
        COALESCE(first_login,'0') as firstlogin, \
        COALESCE(score,-1) as score from ETL_interim_DB.dbo.ms_user_detail_test WHERE \$CONDITIONS" \
       --hbase-table test2 
       --column-family cf 
       --hbase-row-key user_id 
       --split-by user_id -m 1 
       --incremental append 
       --check-column user_id 
       --last-value 10