hive hiveql partition hive-partitions hiveddl

Changing the partition spec of a hive table and move data

I have a external hive table employee which is partitioned by extract_timestamp (yyyy-mm-dd hh:mm:ss) as below.

empid   empname     extract_time
1       abc         2019-05-17 00:00:00
2       def         2019-05-18 14:21:00

I am trying to remove the partition by extract_time and change it to year,month and day partition. I am following the below method for this.

1. Create a new table employee_new with partitions year month and day

 create external table employee_new
(empid int,
 empname string
)
partitioned by (year int,month int,day int)
location '/user/emp/data/employee_new.txt';

2. insert overwrite into employee_new by selecting data from employee table

insert overwrite into employee_new as select*,year(extract_time),month(extract_time)
,day(extract_time) 
from employee

3. Drop employee and employee_new and create employee table on top of /user/emp/data/employee_new.txt

Please let me know if this method is efficient and if there are any better ways to do the same.

Solution

Partition by date yyyy-MM-dd only, if possible, if upstream process can write hour files to daily folders. For such a small table partitioning by year, month and day separately seems overkill. It will be still too many folders. If table is partitioned by date yyyy-MM-dd, partition pruning will work for your usage scenario because you are querying by day or year or month.

To filter by year in this case you will provide

where date >= '2019-01-01' and date < '2020-01-01' condition,

to filter by month:

where date >= '2019-01-01' and date < '2020-02-01'

and day: where date = '2019-01-01'

Filesystem listing will work much faster.

And if it is not possible to redesign upstream process to write to yyyy-MM-dd folders then your new design as you described in the question (yyyy/MM/dd folders) is the only solution.