Search code examples
hadoophivehiveqlsqoop

Sqoop: How to export table from hive union resulting


My use case is that I try to merge two tables in hive (external table) using union all, but the problem is that this union generates two directories (SUB_DIR_1 and SUB_DIR_2) instead files (SUCCES_, 0000, 0001), it complicates the Sqoop export using (export-dir).

CREATE EXTERNAL TABLE test(id INT, name STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION /user/foo/test;

INSERT OVERWRITE TABLE test SELECT * FROM (
  SELECT * FROM test-1
  UNION ALL
  SELECT * FROM test-2
);

Test-1

 id    name
 1     coco
 2     bango

Test-2

 id    name
 3     goo
 4     boo

Test

 id     name
 1     coco
 2     bango
 3     goo
 4     boo

Sqoop command:

 sqoop export –connect jdbc:mysql://db.example.com/foo --table test --export-dir /user/foo/test

When I execute the Sqoop command I have the error:

/user/foo/test.SUB_DIR_1 is not file


Solution

  • As possible workaround you can trigger reducer step when loading table by adding DISTRIBUTE BY:

    INSERT OVERWRITE TABLE test 
    SELECT * 
    FROM 
    (
      SELECT * FROM test-1
      UNION ALL
      SELECT * FROM test-2
    ) 
    DISTRIBUTE BY ID;
    

    This will load into table location without sub-directories.