
How to create partitioned hive table on dynamic hdfs directories


I am having difficulty getting Hive to discover partitions that are created in HDFS.

Here's the directory structure in HDFS:

warehouse/database/table_name/A
warehouse/database/table_name/B
warehouse/database/table_name/C
warehouse/database/table_name/D

A, B, C, D being values of the column type.

When I create a Hive table using the following syntax:

CREATE EXTERNAL TABLE IF NOT EXISTS 
table_name(`name` string, `description` string) 
PARTITIONED BY (`type` string) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
LOCATION 'hdfs:///tmp/warehouse/database/table_name'

I am unable to see any records when I query the table.

But when I create the directories in HDFS as below:

warehouse/database/table_name/type=A
warehouse/database/table_name/type=B
warehouse/database/table_name/type=C
warehouse/database/table_name/type=D

it works, and the partitions are discovered when I check using show partitions table_name.

Is there some configuration in Hive to be able to detect such dynamic directories as partitions?


Solution

  • Creating an external table on top of a directory is not enough; the partitions also need to be mounted in the metastore. The automatic partition discovery feature was added in Hive 4.0.0. For earlier versions, use MSCK REPAIR TABLE:

    MSCK [REPAIR] TABLE table_name [ADD/DROP/SYNC PARTITIONS];
    

    or its equivalent on EMR:

    ALTER TABLE table_name RECOVER PARTITIONS;
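
    Note that both commands only discover directories already named in the key=value format. For plain directory names like A, B, C, D from the question, the partitions can instead be mounted manually with explicit locations; a sketch, reusing the path from the question's DDL:

    ALTER TABLE table_name ADD IF NOT EXISTS
      PARTITION (`type` = 'A') LOCATION 'hdfs:///tmp/warehouse/database/table_name/A'
      PARTITION (`type` = 'B') LOCATION 'hdfs:///tmp/warehouse/database/table_name/B';

    This avoids renaming the directories, at the cost of one ADD PARTITION clause per value.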
    

    And when you create dynamic partitions using INSERT OVERWRITE, the partition metadata is created automatically and the partition folders are written in the key=value form.
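
    For example, a dynamic-partition load producing such key=value folders might look like this (a sketch; staging_table is a hypothetical source table with matching columns):

    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;

    INSERT OVERWRITE TABLE table_name PARTITION (`type`)
    SELECT `name`, `description`, `type`
    FROM staging_table;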