
Can we control $CONDITIONS in Sqoop?


The $CONDITIONS placeholder breaks a free-form query into splits, and Sqoop decides the boundaries by itself. Say we have a query that returns 1000 records. By default, $CONDITIONS will break it into 4 different queries with these boundary conditions:

(1,250) (251,500) (501,750) and (751,1000).

What can we do to achieve query splits according to our requirements?


Solution

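  • You can't choose the query partition offsets yourself. You can control three things:

    • --boundary-query <statement>, which supplies the minimum and maximum values used for creating the splits.
    • --num-mappers, which controls the number of splits.
    • --split-by <column>, which names the column the splits are computed on.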
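As a rough illustration of how these options fit together, here is a minimal Python sketch that assembles a `sqoop import` argument list without running it. The helper name `build_sqoop_import`, the JDBC URL, the `orders` table, its `id` column and the target directory are all hypothetical; credentials are left out on purpose.

```python
import shlex


def build_sqoop_import(connect, query, split_by, boundary_query,
                       num_mappers, target_dir):
    """Assemble (but do not run) a `sqoop import` argument list.

    Hypothetical helper for illustration only.
    """
    # A free-form query must carry the $CONDITIONS placeholder so that
    # Sqoop can substitute a range predicate for each split.
    if "$CONDITIONS" not in query:
        raise ValueError("free-form query must contain $CONDITIONS")
    return [
        "sqoop", "import",
        "--connect", connect,
        "--query", query,
        "--split-by", split_by,              # column the ranges are built on
        "--boundary-query", boundary_query,  # returns the min and max value
        "--num-mappers", str(num_mappers),   # number of splits
        "--target-dir", target_dir,
    ]


args = build_sqoop_import(
    connect="jdbc:mysql://db.example.com/shop",  # hypothetical database
    query="SELECT id, total FROM orders WHERE $CONDITIONS",
    split_by="id",
    boundary_query="SELECT MIN(id), MAX(id) FROM orders",
    num_mappers=4,
    target_dir="/data/orders",
)

# shlex.join quotes each argument, so $CONDITIONS is not expanded by the shell.
print(shlex.join(args))
```

The printed line can be pasted into a shell on a machine where Sqoop is installed. Changing the boundary query or the mapper count changes the end points and the number of ranges, but not where the interior cut points fall.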

    Choosing the boundaries of each split yourself sounds like a good idea, but getting that kind of insight from the data is very costly.

    How would you know the split points?

    Only by iterating over all the values of that particular column and applying some logic to work out the ideal partitions.

    But you can run the Sqoop job with the default partitioning faster than you can finish that iteration.

    I guess that's why people are not much interested in this feature.
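To make the default behaviour concrete, the sketch below divides an integer range evenly, given only a minimum, a maximum and a number of splits. It is a simplified model that reproduces the ranges from the question, not Sqoop's actual splitter code, and Sqoop may place the boundaries slightly differently. The function name `integer_splits` is made up for this example.

```python
def integer_splits(lo, hi, num_splits):
    """Return inclusive (start, end) ranges covering lo..hi as evenly as possible."""
    total = hi - lo + 1
    base, extra = divmod(total, num_splits)
    ranges = []
    start = lo
    for i in range(num_splits):
        # The first `extra` ranges take one additional value each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # more splits requested than there are values
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# Boundaries 1..1000 with the default of 4 mappers, as in the question.
print(integer_splits(1, 1000, 4))
# [(1, 250), (251, 500), (501, 750), (751, 1000)]

# A narrower boundary and fewer mappers move the end points and the count.
print(integer_splits(1, 600, 3))
# [(1, 200), (201, 400), (401, 600)]
```

The only inputs are the two end points and the split count, which is why the interior offsets cannot be picked by hand.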