I am trying to use the HIVE UDFs
(https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions)
from Sparklyr
to read-in properly some timestamps.
Unfortunately, I have not been able to parse correctly the following timestamp:
unix_timestamp('2011-03-01T00:00:04.226Z', 'yyyy-MM-ddThh:mm:ss.SSS' )
returns NA
s..
Any ideas? What is the right pattern here? THanks!
You need to quote T
and Z
hive> select unix_timestamp('2011-03-01T00:00:04.226Z', "yyyy-MM-dd'T'hh:mm:ss.SSS'Z'" );
OK
1298959204
Or try this if you are not afraid to be clumsy:
select unix_timestamp(cast(regexp_replace('2011-03-01T00:00:04.226Z', '(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2}).(\\d{3})Z', '$1-$2-$3 $4:$5:$6.$7' ) as timestamp))
To convert from EST
to UTC
, use the following:
hive> select to_utc_timestamp(unix_timestamp('2011-03-01T00:00:04.226Z', "yyyy-MM-dd'T'hh:mm:ss.SSS'Z'" )*1000, 'EST');
OK
2011-03-01 05:00:04
Multiplication with 1000
is required because, from Hive Language Manual:
Fractional values are considered as seconds. Integer values are considered as milliseconds.. E.g to_utc_timestamp(2592000.0,'PST'), to_utc_timestamp(2592000000,'PST') and to_utc_timestamp(timestamp '1970-01-30 16:00:00','PST') all return the timestamp 1970-01-31 00:00:00