hadoop sqoop parquet

sqoop import as parquet file to target dir, but can't find the file


I have been using Sqoop to import data from MySQL into Hive. The command I used is below:

sqoop import --connect jdbc:mysql://localhost:3306/datasync \
    --username root --password 654321 \
    --query 'SELECT id,name FROM test WHERE $CONDITIONS' --split-by id \
    --hive-import --hive-database default --hive-table a \
    --target-dir /tmp/yfr --as-parquetfile

The Hive table is created and the data is inserted, but I cannot find the Parquet file under the target directory.

Does anyone know?

Best regards,

Feiran


Solution

  • A Sqoop import into Hive works in two steps:

    • Fetch the data from the RDBMS into HDFS.
    • Create the Hive table if it does not exist, then load the data into it.

    In your case,

    First, the data is stored at --target-dir, i.e. /tmp/yfr.

    Then it is loaded into the Hive table a using a

    LOAD DATA INPATH ... INTO TABLE ... command.

    As mentioned in the comments, this moves the data into the Hive warehouse directory, which is why there is no data left in --target-dir.
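
    The move-versus-copy behaviour described above can be illustrated with a small local sketch. This is only an analogy and does not touch a cluster: ordinary directories stand in for the HDFS paths, mv stands in for the move done by the load step, and the names work, warehouse_dir and data.parquet are hypothetical.

```shell
#!/bin/sh
# Local-filesystem analogy only: plain directories stand in for HDFS paths,
# and `mv` stands in for the file move performed when data is loaded into the table.
set -eu

work=$(mktemp -d)                    # hypothetical scratch area
target_dir="$work/tmp/yfr"           # stands in for --target-dir
warehouse_dir="$work/warehouse/a"    # stands in for the directory of Hive table a

mkdir -p "$target_dir" "$warehouse_dir"

# Step 1: the import writes its output under the target directory.
echo "placeholder" > "$target_dir/data.parquet"

# Step 2: the load step moves (does not copy) the files into the table directory.
mv "$target_dir"/* "$warehouse_dir"/

# Count the entries left in each location after the move.
target_count=$(ls -A "$target_dir" | wc -l | tr -d ' ')
warehouse_count=$(ls -A "$warehouse_dir" | wc -l | tr -d ' ')

echo "files in target dir: $target_count"
echo "files in table dir: $warehouse_count"

rm -rf "$work"
```

    On a real cluster, hdfs dfs -ls /tmp/yfr should show the same emptiness, and running DESCRIBE FORMATTED a; in Hive reports the table's Location, which is where the files now live.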