Search code examples
hivecomparisondate-formathadoop-partitioning

Drop partitions in Hive with different date format in the same partition column


I have 2 types of value in the partition column of string datatype:

  1. yyyyMMdd
  2. yyyy-MM-dd

E.g. there are partition column values 20200301, 2020-03-05, 2020-05-07, 20200701, etc.

I need to drop partitions less than 20200501 with a DDL statement like

alter table tblnm drop partition(partcol < 20200501);

When I drop partitions using yyyy-MM-dd format, only 2020-03-05 partition is dropped. But when I drop partitions using yyyyMMdd format, 20200301 is dropped as well as all the partitions containing hyphen (-).

How to compare value by ignoring hyphen or by ignoring data that contains hyphen? I can replace hyphen in the alter table query if necessary.


Solution

  • The reason why all the partitions with hyphen get dropped is the comparison of strings in java: every 2020-XX-XX string is less than every 2020XXXX string.

    To restrict partitions to those without hyphen you should add a lower bound to the DDL in the format yyyy0101:

    alter table tblnm drop partition(partcol < 20200501, partcol >= 20200101);
    

    Beware that the partitions for the previous years won't be dropped, but you're free to run something like

    alter table tblnm drop partition(partcol <= 20191231, partcol >= 20190101);
    

    whenever you need.