Search code examples
mergehivepartitioninghadoop-partitioninghive-partitions

How to merge existing hourly partitions to daily partition in hive


My requirement is to merge existing hourly partitions to daily partition for all days.

My partition column is like:

2019_06_22_00, 2019_06_22_01, 2019_06_22_02, 2019_06_22_03..., 2019_06_22_23 => 2019_06_22
2019_06_23_00, 2019_06_23_01, 2019_06_23_02, 2019_06_23_03..., 2019_06_23_23 => 2019_06_23

Solution

  • The easy way is to extract date from current partition column and load into new table.

    Create new table:

    create table new (
    ...
    ) 
    partitioned by (partition_date date);
    

    Then insert overwrite from old table:

    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;
    
    insert overwrite table new partition(partition_date )
    select
    col1,
    col2,
    ...
    coln,
    --extract hours if you need this column
    substr('old_partition_col',12,2) hour,
    --partition column is the last one
    date(concat_ws('-',substr(old_partition_col,1,4),substr(old_partition_col,6,2),substr(old_partition_col,9,2))) as partition_date 
    from old_table;
    

    Alternatively you can extract date using unix_timestamp and from_unixtime functions:

    from_unixtime(unix_timestamp(old_partition_col,'yyyy_MM_dd_HH'),'yyyy-MM-dd') as partition_date
    

    Then drop old table and rename new.