Search code examples
sqoop

SQOOP import error with --query without where $CONDITION


I have a dept_nopk table . this table has two columns, department_id and department_name and this table does not have any primary key. I am trying to import this table in HDFS using following SQOOP command.

sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db"
--username useranem
--password pass
--query "select * from dept_nopk where department_id <> 1000"
--target-dir /user/cloudera/departments
-m 1

but it is giving following error:

java.io.IOException: query[select * from dept_nopk where department_id <> 1000] must contain $CONDITION in where clause

My questions now are:

1) Why am I getting this error when I have already specified that the number of maps are just 1? is it necessary to put $CONDITION in my where clause?

2) The second time I changed my query as follows

sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db"
--username useranem
--password pass
--query "select * from dept_nopk where \$CONDITIONS"
--target-dir /user/cloudera/departments
--where "department_id <> 8000"            
-m 1

I used --where for the department_id<>8000 and specified \$CONDITIONS. Now I did not get any error but in the results it imported rows with department_id =8000 as well, i.e. it totally ignored the --where argument. Why?

3) I tried following and it worked perfectly.

sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db"
--username useranem
--password pass
--query "select * from dept_nopk where department_id <> 8000 and \$CONDITIONS"
--target-dir /user/cloudera/departments          
-m 1 

Could you please explain the reasoning behind the errors described in 1) and 2?


Solution

  • Question 1

    Adding $CONDITION is mandatory in Sqoop even with 1 mapper because Sqoop transform queries for 1 mapper like -

    To fetch metadata:

    select * from dept_nopk where 1 = 0
    

    replaces \$CONDITIONS with 1= 0

    to fetch all data:

    select * from dept_nopk where 1 = 1
    

    replaces \$CONDITIONS with 1= 1

    Question 2

    Check similar question : SQOOP --where is not working with --query