Search code examples
hdfsimpala

IMPALA: Create new partitioned table based on unpartitioned table


I want to create a new partitioned table based on another unpartitioned table. The new table should be partitioned by a column of the old table. Then I want to load all the old data in the new table.

CREATE TABLE new_table PARTITIONED BY (id) STORED AS PARQUET AS SELECT * FROM old_table

Like stated here*, id is expected to be the LAST column but it's the first column in old_table. old_table contains lots of columns, so I don't want to list all columns. What can I do?

    *-- We expect this CTAS to fail because non-key column S
-- comes after key columns YEAR and MONTH in the select list.
create table partitions_maybe partitioned by (year, month)
  as select year, month, s from partitions_no;
ERROR: AnalysisException: Partition column name mismatch: year != month

Solution

  • If you don't mind to replicate the column information at record level, you can do this

    CREATE TABLE new_table PARTITIONED BY (id_partition) STORED AS PARQUET AS SELECT *, id as id_partition FROM old_table
    

    You will not be able to do it in a different way in Impala. In Hive you have some options to select all the columns but others using regular expressions.