I have a dept_nopk
table . this table has two columns, department_id
and department_name
and this table does not have any primary key. I am trying to import this table in HDFS using following SQOOP command.
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db"
--username useranem
--password pass
--query "select * from dept_nopk where department_id <> 1000"
--target-dir /user/cloudera/departments
-m 1
but it is giving following error:
java.io.IOException: query[select * from dept_nopk where department_id <> 1000] must contain $CONDITION in where clause
My questions now are:
1) Why am I getting this error when I have already specified that the number of maps are just 1? is it necessary to put $CONDITION
in my where
clause?
2) The second time I changed my query as follows
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db"
--username useranem
--password pass
--query "select * from dept_nopk where \$CONDITIONS"
--target-dir /user/cloudera/departments
--where "department_id <> 8000"
-m 1
I used --where
for the department_id<>8000
and specified \$CONDITIONS
. Now I did not get any error but in the results it imported rows with department_id =8000
as well, i.e. it totally ignored the --where
argument. Why?
3) I tried following and it worked perfectly.
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db"
--username useranem
--password pass
--query "select * from dept_nopk where department_id <> 8000 and \$CONDITIONS"
--target-dir /user/cloudera/departments
-m 1
Could you please explain the reasoning behind the errors described in 1) and 2?
Question 1
Adding $CONDITION
is mandatory in Sqoop even with 1 mapper because Sqoop transform queries for 1 mapper like -
To fetch metadata:
select * from dept_nopk where 1 = 0
replaces \$CONDITIONS
with 1= 0
to fetch all data:
select * from dept_nopk where 1 = 1
replaces \$CONDITIONS
with 1= 1
Question 2
Check similar question : SQOOP --where is not working with --query