I want to create a new partitioned table based on another unpartitioned table. The new table should be partitioned by a column of the old table. Then I want to load all the old data in the new table.
CREATE TABLE new_table PARTITIONED BY (id) STORED AS PARQUET AS SELECT * FROM old_table
Like stated here*, id is expected to be the LAST column but it's the first column in old_table. old_table contains lots of columns, so I don't want to list all columns. What can I do?
*-- We expect this CTAS to fail because non-key column S
-- comes after key columns YEAR and MONTH in the select list.
create table partitions_maybe partitioned by (year, month)
as select year, month, s from partitions_no;
ERROR: AnalysisException: Partition column name mismatch: year != month
If you don't mind to replicate the column information at record level, you can do this
CREATE TABLE new_table PARTITIONED BY (id_partition) STORED AS PARQUET AS SELECT *, id as id_partition FROM old_table
You will not be able to do it in a different way in Impala. In Hive you have some options to select all the columns but others using regular expressions.