Search code examples
hadoophivesqoopparquetpartition

sqoop hive import with partitions


I have some sqoop jobs importing into hive that I want to partition, but I can't get it to function. The import will actually work: the table is sqooped, it's visible in hive, there's data but the partition parameters I'm expecting to see don't appear when I describe the table. I HAVE sqooped this table as a csv, created an external parquet table, and inserted the data into that (which works), but I want to be able to avoid the extra steps if possible. here's my current code. Am I missing something or am I trying to do the impossible? thanks!

sqoop import -Doraoop.import.hint=" " \
--options-file /home/[user]/pass.txt \
--verbose \
--connect jdbc:oracle:thin:@ldap://oid:389/cn=OracleContext,dc=[employer],dc=com/SQSOP051 \
--username [user]\
--num-mappers 10 \
--hive-import \
--query "select DISC_PROF_SK_ID, CLM_RT_DISC_IND, EASY_PAY_PLN_DISC_IND, TO_CHAR(L40_ATOMIC_TS,'YYYY') as YEAR, TO_CHAR(L40_ATOMIC_TS,'MM') as MONTH from ${DataSource[index]}.$TableName where \$CONDITIONS" \
--hive-database [dru_user] \
--hcatalog-partition-keys YEAR \
--hcatalog-partition-values '2015' \
--target-dir hdfs://nameservice1/data/res/warehouse/finance/[dru_user]/Claims_Data/$TableName \
--hive-table $TableName'testing' \
--split-by ${SplitBy[index]} \
--delete-target-dir \
--direct \
--null-string '\\N' \
--null-non-string '\\N' \
--as-parquetfile \

Solution

  • You can replace the options-file by --password-file. However that will not solve the partition problem. For the partition problem you can try creating the partition-ed table $TableName partitioned first before the import.

    sqoop import -Doraoop.import.hint=" "               \
      --password-file /home/[user]/pass.txt             \
      --verbose                                         \
      --connect jdbc:oracle:thin:@ldap://oid:389/cn=OracleContext,dc=[employer],dc=com/SQSOP051                              \
      --username [user]                                 \
      --num-mappers 10                                  \
      --hive-import                                     \
      --query "SELECT disc_prof_sk_id, 
           clm_rt_disc_ind, 
           easy_pay_pln_disc_ind, 
           To_char(l40_atomic_ts,'YYYY') AS year, 
           To_char(l40_atomic_ts,'MM')   AS month 
        FROM   ${DataSource[index]}.$tablename 
        WHERE  \$conditions"                            \
      --hcatalog-database [dru_user]                    \
      --hcatalog-partition-key     YEAR                 \
      --hcatalog-partition-values '2015'                \
      --target-dir hdfs://nameservice1/data/res/warehouse/finance/[dru_user]/Claims_Data/$TableName                                                   \
      --hcatalog-table $TableName                       \
      --split-by ${SplitBy[index]}                      \
      --delete-target-dir                               \
      --direct                                          \
      --null-string '\\N'                               \
      --null-non-string '\\N'                           \
      --as-parquetfile