Search code examples
hivetimestampunix-timestamp

Transform every row in a column to date, using first unix_timestamp


I have rows with the following format and I would like to transform then into valid Hive timestamps. Format in my data:

28/04/2017 00:00:00|20550|22/05/2017 00:00:00|

I'm only interested in the first and third column, separated with |, in MY case the format is, then:

dd/MM/yy HH:mm:ss

I've discovered this can't be used as timestamp in Hive.

I find myself unable to transform all that first and third column to the proper format using queries similar to:

select from_unixtime(unix_timestamp('28/04/2017','dd/MM/yy HH:mm:ss'),'yyyy-MM-dd') from `20170428_f_pers_pers`

I'm trying different instances of that query but since I can't access the documentation (internet is capped here at work), I can't see how to properly use this two functions, from_unixtime and unix_timestamp

I've made the following assumptions:

  1. I can reorder the days and years. If this isn't true, I have no idea how to transform my original data into proper Hive format
  2. When I do this select, it affects the whole column. Further, after doing this with success I should be able to change the format of the whole column from string to timestamp (maybe I have to create a new column for that, not sure)

I do not care about doing both columns at once, but right now when I do the query showed first I get as many nulls as data has my table, and I'm unsure my assumptions are even partially right since every example I come accross is simpler (they do not change days and years arround, for instance).

I would like to know how to apply the query to a specific column, since I haven't understood how to do that from the examples studied so far. I do not see them using any type of column ID for that, which is weird to me, using data from the column to change the column itself.

Thanks in advance.

edit: I am now trying something like

select from_unixtime(unix_timestamp(f_Date, 'dd/MM/yyyy HH:mm:ss')) from `myTable`

But I get from HUE the following error:

Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

Solution

  • The format should be completely covered by the input string.
    In other words -
    The format can be equal in length to the the input string or shorter, but not longer.

    28/04/2017 00:00:00
    |||||||||||||||||||
    dd/MM/yyyy HH:mm:ss
    

    select  from_unixtime(to_unix_timestamp('28/04/2017 00:00:00', 'dd/MM/yyyy HH:mm:ss')) 
    

    2017-04-28 00:00:00


    28/04/2017 00:00:00
    ||||||||||
    dd/MM/yyyy
    

    select  from_unixtime(to_unix_timestamp('28/04/2017 00:00:00', 'dd/MM/yyyy')) 
    

    2017-04-28 00:00:00


    The result can be converted from string to timestamp using cast

    select cast (from_unixtime(to_unix_timestamp('28/04/2017 00:00:00', 'dd/MM/yyyy HH:mm:ss')) as timestamp)