Tags: hive, sqoop, bigdata

Sqoop Hive import


I am trying to import a table from a MySQL database into a Hive table to understand how Hive import works. The table is named device, and I have already imported it into my home directory in HDFS. I created a table in Hive using the statement below.

create table device_hive (device_num int, device_name varchar(255));

Now I am executing the Sqoop import statement below to load the data from the device table in MySQL into Hive:

sqoop import --connect jdbc:mysql://localhost/loudacre \
  --username training --password training \
  --table device --columns "device_num,device_name" \
  --hive-import --hive-database hadoopexam --hive-table device_hive \
  --fields-terminated-by '\001'

It fails, stating that the output directory device already exists. The location in the error message points to the device folder in HDFS that I imported with Sqoop earlier.

My question is: why does Sqoop go to the base directory and check for that folder? Since this is a Hive import, shouldn't Sqoop just go to the hive/warehouse directory? When I delete that folder from HDFS, the import works fine. Any suggestions?
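
For reference, this is how I removed the old import directory before re-running the job (assuming the earlier import landed in my HDFS home directory, /user/training; adjust the path if yours is different):

hdfs dfs -rm -r /user/training/device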


Solution

  • According to the documentation

    By default, imports go to a new target location. If the destination directory already exists in HDFS, Sqoop will refuse to import and overwrite that directory’s contents.

    So if you already imported the table into HDFS, the error that the device directory already exists is the normal behavior. When you add the --hive-import option, Sqoop first copies the data into HDFS (in your case, into the device directory) and then updates the Hive metastore with the data's layout.

    You could use the --hive-overwrite option to tell Sqoop to overwrite the table (a full example command is sketched at the end of this answer):

    If the Hive table already exists, you can specify the --hive-overwrite option to indicate that existing table in hive must be replaced.

    I hope this helps.
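
    For example, a minimal sketch of the re-run, reusing the options from your question, might look like the following. Note that --hive-overwrite replaces the contents of the existing Hive table; the device directory left over from your earlier plain HDFS import would still need to be removed first, as you observed, since the Hive import stages its data through that same target directory.

    sqoop import --connect jdbc:mysql://localhost/loudacre \
      --username training --password training \
      --table device --columns "device_num,device_name" \
      --hive-import --hive-overwrite \
      --hive-database hadoopexam --hive-table device_hive \
      --fields-terminated-by '\001'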