What is the significance of the $CONDITIONS clause in a Sqoop import command?
select col1, col2 from test_table where \$CONDITIONS
Sqoop performs highly efficient data transfers by inheriting Hadoop’s parallelism.
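To help Sqoop split your query into multiple chunks that can be transferred in parallel, you need to include the $CONDITIONS placeholder in the WHERE clause of your query. (The backslash in the query above is only there to stop the shell from expanding $CONDITIONS when the query is wrapped in double quotes.)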
Sqoop will automatically substitute this placeholder with the generated conditions specifying which slice of data should be transferred by each individual task.
While you could skip $CONDITIONS by forcing Sqoop to run only a single map task using the --num-mappers 1 parameter, such a limitation would have a severe performance impact.
For example:
If you run a parallel import, the map tasks will execute your query with different values substituted in for $CONDITIONS. One mapper may execute "select bla from foo WHERE (id >= 0 AND id < 10000)", the next mapper may execute "select bla from foo WHERE (id >= 10000 AND id < 20000)", and so on.
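
To make the substitution concrete, here is a rough Python sketch of the idea. It is not Sqoop's actual code: the function names (split_conditions, substitute) are made up for illustration, and it assumes an integer split column named id whose minimum and maximum values (0 and 30000 here) are already known. It cuts that range into one slice per mapper and drops each slice into the query in place of $CONDITIONS.

```python
def split_conditions(column, lo, hi, num_mappers):
    """Cut the range [lo, hi] into num_mappers contiguous slices and
    return one WHERE-clause fragment per slice (illustrative only)."""
    # Evenly spaced boundaries, e.g. 0, 10000, 20000, 30000 for 3 mappers.
    bounds = [lo + (hi - lo) * i // num_mappers for i in range(num_mappers + 1)]
    conditions = []
    for i in range(num_mappers):
        # Every slice excludes its upper bound except the last one,
        # so that the row holding the maximum value is not lost.
        upper_op = "<=" if i == num_mappers - 1 else "<"
        conditions.append(
            f"({column} >= {bounds[i]} AND {column} {upper_op} {bounds[i + 1]})"
        )
    return conditions


def substitute(query, condition):
    """Replace the $CONDITIONS placeholder with one mapper's slice."""
    return query.replace("$CONDITIONS", condition)


# Hypothetical values: split column "id" ranging from 0 to 30000, 3 mappers.
template = "select bla from foo WHERE $CONDITIONS"
queries = [substitute(template, c) for c in split_conditions("id", 0, 30000, 3)]

for mapper_id, q in enumerate(queries):
    print(f"mapper {mapper_id}: {q}")
```

Each mapper ends up with its own non-overlapping range, which is why the placeholder has to be present in the query text: without it there is nowhere to put the per-mapper range.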