Search code examples
sqlhivetimestamphiveqlunix-timestamp

HIVE converting unix timestamp for calculation


I'm trying to perform subtraction among timestamps and would like to convert the timestamps in forms that can be converted to minutes.

I used regexp_replace to convert timestamp in such form:

2020-06-20T17:25:59:378Z

The following code will convert it to seconds

unix_timestamp(regexp_replace(value,'(.*?)T(.*?):([^:]*?)Z$','$1 $2\\.$3'))  

I have other two timestamps that I wish to convert to seconds, such as:

2020-03-19 15:45:33
03-19-2020 11:07:25:103

How should I convert these two timestamp to seconds by using regexp_replace() or any other functions?

Thank you!


Solution

  • First of all, unix_timestamp returns seconds passed from unix epoch. It does ignore milliseconds. This is why if you want epoch time in seconds, you can extract only 'yyyy-MM-dd HH:mm:ss'.

    Second, If you have all these different formats in single dataset and you want to convert them all, you can use case statement to check pattern and convert accordingly:

    with your_data as ( --This is your data example
    select stack(3,
                 '2020-06-20T17:25:59:378Z',
                 '2020-03-19 15:45:33',
                 '03-19-2020 11:07:25:103'
                ) as str
    )
    
    select case when str rlike '^(\\d{4}-\\d{2}-\\d{2})[T ](\\d{2}:\\d{2}:\\d{2})' --matches first two strings
                 then unix_timestamp(regexp_replace(str,'^(\\d{4}-\\d{2}-\\d{2})[T ](\\d{2}:\\d{2}:\\d{2})','$1 $2'))
                when str rlike '^(\\d{2})-(\\d{2})-(\\d{4})[T ](\\d{2}:\\d{2}:\\d{2})' --matches third string, allows T or space after date
                 then unix_timestamp(regexp_replace(str,'^(\\d{2})-(\\d{2})-(\\d{4})[T ](\\d{2}:\\d{2}:\\d{2})','$3-$1-$2 $4'))
            end result_unix_timestamp
    from your_data
    

    Returns:

    result_unix_timestamp
    1592673959
    1584632733
    1584616045
    

    You can add more patterns to the case with corresponding conversion and in such way convert all possible cases. And of course, not necessarily all cases should use regex_replace for conversion. Though regex allows to identify and parse most complex strings.

    Also you can try to convert using one pattern and if it returns null then try to convert using another pattern and so on:

    coalesce(unix_timestamp(regexp_replace(str,'^(\\d{4}-\\d{2}-\\d{2})[T ](\\d{2}:\\d{2}:\\d{2})','$1 $2')),
             unix_timestamp(regexp_replace(str,'^(\\d{2})-(\\d{2})-(\\d{4})[T ](\\d{2}:\\d{2}:\\d{2})','$3-$1-$2 $4'))
            )