hadoop hive hdfs hive-partitions hiveddl

Hive - static partitioning - difference between creating the partition directory directly vs using alter table statement

Is there any internal or performance difference between the two statements below for creating a static partition in Hive? I have tried both, and both work without any issues after loading data into the partition.

dfs -mkdir /user/cloudera/sqoop_import/avroData/orders_part/order_month=2014-02;
alter table orders_part add partition(order_month='2014-02');

Solution

This command: dfs -mkdir /user/cloudera/sqoop_import/avroData/orders_part/order_month=2014-02; does not create a partition; it only creates a directory. That directory is not yet mounted as a table partition. A partition is a directory plus metadata about it (the partition key value and the partition directory) stored in the metastore. You can check this easily by running show partitions orders_part; after the mkdir: the new directory will not appear in the partition list.
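A minimal sketch of that check, run from the Hive CLI as in the question and assuming the table location of orders_part is /user/cloudera/sqoop_import/avroData/orders_part. The last two statements use MSCK REPAIR TABLE, which is not part of the original answer: it is one standard way to register directories that already follow the key=value layout but are missing from the metastore.

```sql
-- Create only the HDFS directory; the metastore is not touched
dfs -mkdir /user/cloudera/sqoop_import/avroData/orders_part/order_month=2014-02;

-- order_month=2014-02 should NOT be listed here
show partitions orders_part;

-- Scan the table location and add metastore entries for
-- key=value directories that have no partition record yet
msck repair table orders_part;

-- order_month=2014-02 should now be listed
show partitions orders_part;
```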

alter table orders_part add partition(order_month='2014-02'); creates the directory order_month=2014-02 (if it does not already exist) and registers it in the metastore as a partition.
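A short sketch of the alter table route, assuming the same orders_part table. The if not exists clause is an optional addition that makes the statement safe to re-run; if the directory was already created with dfs -mkdir, Hive simply reuses it.

```sql
-- Register the partition in the metastore (directory is created if missing)
alter table orders_part add if not exists partition(order_month='2014-02');

-- The partition now appears in the metastore listing
show partitions orders_part;
```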

Partitions can also be created dynamically with a statement such as

insert overwrite table orders_part partition(order_month)
select ...

In this case the directories are created automatically and mounted as partitions.
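A sketch of a fully dynamic insert. The source table orders and its columns (order_id, order_date, order_customer_id, order_status, with order_date assumed to be a string such as '2014-02-15 00:00:00.0') are hypothetical, as is the assumption that orders_part has those four non-partition columns in that order. Because no partition value is given statically, dynamic partition mode must be nonstrict, and the partition column must come last in the select list.

```sql
-- Allow dynamic partitions without any static partition key
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical source table and columns; the last select expression
-- supplies the value of the order_month partition column
insert overwrite table orders_part partition(order_month)
select order_id,
       order_date,
       order_customer_id,
       order_status,
       substr(order_date, 1, 7) as order_month  -- e.g. '2014-02'
from orders;
```

Each distinct order_month value produces its own order_month=... directory and metastore entry.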

Note also that a partition does not have to be located in a directory named 'key=value'. For example:

alter table orders_part add partition(order_month='2014-02') location '/user/cloudera/sqoop_import/avroData/orders_part/mydir';

Here the partition directory is '/user/cloudera/sqoop_import/avroData/orders_part/mydir', not .../order_month=2014-02.
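To confirm where a partition actually points, the partition metadata can be inspected. A minimal sketch, assuming the partition was added with the custom location above; the final statement is an addition not in the original answer, showing how an already existing partition can be re-pointed instead of added again.

```sql
-- Still listed under its key=value name
show partitions orders_part;

-- The "Location:" field should show .../orders_part/mydir
describe formatted orders_part partition(order_month='2014-02');

-- Re-point an existing partition to a different directory
alter table orders_part partition(order_month='2014-02')
  set location '/user/cloudera/sqoop_import/avroData/orders_part/mydir';
```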