Search code examples
hadoophiveunix-timestamp

how to verify and match different date formats in hive


I have below dates in my hive table :

Jan 2014
Oct-13
8-Nov
8-Oct
30-Nov-11

I need to convert them in the 'yyyy-MM-dd' format. I have used from_unixtime(unix_timestamp(change_log_date ,'yyyyMMdd'), 'yyyy-MM-dd') to covert date format which is working fine for 30-Nov-11 however since I have different date formats in the data so how to write generic code which will check date format and convert it into 'yyyy-MM-dd'. I need to put 0 for day/month/year if its not present.

for eg. I need to convert 8-Oct into '0000-10-08'

need help


Solution

  • If you know all possible date format, you can use case statement to apply the correct format for each line:

    CASE WHEN change_log_date rlike '[0-9]{2}-[a-zA-Z]{3}-[0-9]{2}' 
         THEN from_unixtime(unix_timestamp(change_log_date ,'yyyyMMdd'), 'yyyy-MM-dd')
         WHEN change_log_date rlike --other regex 
         THEN -- and so on
    

    If you are more familiar with java or python, you can easily write your piece of code and use it as Hive function.