Search code examples
hadoophivesparklyr

Hive: how to convert millisecond timestamps?


I am trying to use the HIVE UDFs (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions) from Sparklyr to read-in properly some timestamps.

Unfortunately, I have not been able to parse correctly the following timestamp:

unix_timestamp('2011-03-01T00:00:04.226Z', 'yyyy-MM-ddThh:mm:ss.SSS' ) 

returns NAs..

Any ideas? What is the right pattern here? THanks!


Solution

  • You need to quote T and Z

    hive> select unix_timestamp('2011-03-01T00:00:04.226Z', "yyyy-MM-dd'T'hh:mm:ss.SSS'Z'" );
    OK
    1298959204
    

    Or try this if you are not afraid to be clumsy:

    select unix_timestamp(cast(regexp_replace('2011-03-01T00:00:04.226Z', '(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2}).(\\d{3})Z', '$1-$2-$3 $4:$5:$6.$7' ) as timestamp))
    

    To convert from EST to UTC, use the following:

    hive> select to_utc_timestamp(unix_timestamp('2011-03-01T00:00:04.226Z', "yyyy-MM-dd'T'hh:mm:ss.SSS'Z'" )*1000, 'EST');
    OK
    2011-03-01 05:00:04
    

    Multiplication with 1000 is required because, from Hive Language Manual:

    Fractional values are considered as seconds. Integer values are considered as milliseconds.. E.g to_utc_timestamp(2592000.0,'PST'), to_utc_timestamp(2592000000,'PST') and to_utc_timestamp(timestamp '1970-01-30 16:00:00','PST') all return the timestamp 1970-01-31 00:00:00