Search code examples
sqldatehivecastingtimestamp

Finding the date before/after a date stored in int representation (Hive SQL)


I need to select between a range of dates centered on a specific date, but my partition column of dates is stored as int.

For example, selecting between the dates before and after '20210901' (20210831 to 20210902)

Is there any way to do that which is less convoluted than what I came up with below?

between cast(date_format(date_sub(date_format(from_unixtime(unix_timestamp(cast('20210901' as string),'yyyyMMdd')),'yyyy-MM-dd'),1),'yyyyMMdd') as int) and \
cast(date_format(date_add(date_format(from_unixtime(unix_timestamp(cast('20210901' as string),'yyyyMMdd')),'yyyy-MM-dd'),1),'yyyyMMdd');

Solution

  • unix_timestamp + from_unixtime conversion needed only when you have some date format which can not be transformed to yyyy-MM-dd format using string functions only for example 1 Jan 21 . SimpleDateFormat class used under thehood of unix_timestamp and from_unixtime is rather heavy and not simple at all, even regexp_replace is easier transformation.

    Use string manipulation if possible:

    select part_col between int(replace(date_sub(regexp_replace('20210901','^(\\d{4})(\\d{2})(\\d{2})$','$1-$2-$3'),1),'-',''))
                        and int(replace(date_add(regexp_replace('20210901','^(\\d{4})(\\d{2})(\\d{2})$','$1-$2-$3'),1),'-',''))
    

    Even Int() conversion is not necessary, Hive will convert implicitly:

    select part_col between replace(date_sub(regexp_replace('20210901','^(\\d{4})(\\d{2})(\\d{2})$','$1-$2-$3'),1),'-','')
                        and replace(date_add(regexp_replace('20210901','^(\\d{4})(\\d{2})(\\d{2})$','$1-$2-$3'),1),'-','')
    

    If you can provide argument in DATE format yyyy-MM-dd, it would be simpler:

    select part_col between replace(date_sub('2021-09-01',1),'-','')
                        and replace(date_add('2021-09-01',1),'-','')