Search code examples
hadooptimestamphdfsimpala

Comparing millisecond timestamps in HDFS


I have 2 timestamp columns stored in an HDFS that I can access through Impala, hive, etc...

The timestamps that I need to compare may look like this example:

2014-04-08 00:23:21.687000000

2014-04-08 00:23:21.620000000

With differences in the milliseconds, and need to build a new column that in this example should have a value of 0.067000

I've tried using impala's built in time functions but none of them seem to make the cut. I've tried:

  1. casting the string to a timestamp and then substracting the 2 values. This returns an error "AnalysisException: Arithmetic operation requires numeric operands"
  2. using the unix_timestamp function. This truncates the values to an int that represent seconds, so subsecond values are lost.

Solution

  • While writting this question I found the answer :)

    The way to do it was using a double cast.

    Cast(cast(time_stamp) as timestamp) as double)

    this makes the times_stamp into a number without truncating sub-second values.

    Once there it becomes a trivial arithmetic operation.