I'm trying to create a parquet table with partition from existing Kudu table in Impala without having to re-specify the columns with data types again.
CREATE TABLE IF NOT EXISTS db_name.parquet_table
PARTITIONED BY (`year` SMALLINT, `month` TINYINT, `day` TINYINT)
STORED AS PARQUET
As SELECT * FROM db_name.kudu_table limit 0
I get an error message saying
ParseException: Syntax error in line 4:undefined: As SELECT * FROM db_name.parquet_table limit 0 ^ Encountered: AS Expected CAUSED BY: Exception: Syntax error
But When I try to create the parquet table without partition it works using:
CREATE TABLE IF NOT EXISTS db_name.parquet_table STORED AS PARQUET
AS SELECT * FROM db_name.kudu_table LIMIT 0
But When I try to add partition using:
ALTER TABLE db_name.parquet_table ADD PARTITION(`year`=0,`month`=0,`day`=0)
I get the following error message:
AnalysisException: Table is not partitioned: db_name.parquet_table
You could try to load data with
STATIC PARTITIONING
One way to load data into a partitioned table is to use static partitioning, in which you manually define the different partitions. (The other way is to have the partitions automatically defined when you load the data, this is called Dynamic Partitioning.)
With static partitioning, you create a partition manually, using an ALTER TABLE … ADD PARTITION
statement,
and then load the data into the partition.
With your example I would try this.
First. Create the partitioned table
CREATE TABLE IF NOT EXISTS db_name.parquet_table(field1 [TYPE],...,fieldN [TYPE]) --Fields that not belongs to the partitioned columns
PARTITIONED BY (year SMALLINT, month TINYINT, day TINYINT)
STORED AS PARQUET;
Second. ALTER TABLE
statement to create the partition.
ALTER TABLE db_name.parquet_table ADD PARTITION(year = 0,month = 0,day = 0);
Notice how the partition column name and the specific value that defines this partition, are both specified in the ADD PARTITION
clause.
This creates a partition directory inside the table directory.
After the partition is created, you can add data into the partition using an INSERT … SELECT
statement
INSERT OVERWRITE TABLE db_name.parquet_table
PARTITION(year = 0, month = 0, day = 0)
SELECT field1,..., fieldn --do not include partitioned columns
FROM db_name.kudu_table
WHERE year = 0 AND month = 0 AND day = 0;
With static partitioning, you need to repeat these two steps for each partition:
first create the partition, then add data.
You can actually use any method to load the data; you need not use an INSERT
statement.
You could instead use hdfs dfs
commands or a LOAD DATA INPATH
command.
But however you load the data, you have to be careful to ensure that data is stored in the correct partition subdirectories.