Search code examples
sqlhivehql

Shift rows of date column according to a group category in Hive


I have these columns with location and dates. I want to shift the date column down within each location and not the entire column using Hive HQL. I've tried to use LAG() but there is no support to Hive on it and also datediff() but it does not brings me the shifting date, only date differences.

Location    Dates
0000076-0   01/01/2020
0000076-0   28/01/2020
0000076-0   28/02/2020
0000076-0   01/04/2020
0000076-0   28/04/2020
0000076-0   28/05/2020
0000076-0   01/07/2020
000478-2    01/01/2020
000478-2    01/02/2020
000478-2    26/02/2020
000478-2    01/04/2020
000478-2    26/04/2020
000478-2    26/05/2020
000478-2    01/07/2020
0005655-3   01/01/2020
0005655-3   07/02/2020
0005655-3   09/03/2020
0005655-3   07/04/2020
0005655-3   07/05/2020
0005655-3   08/06/2020
0005655-3   01/07/2020
0010828-6   01/01/2020
0010828-6   31/01/2020
0010828-6   29/02/2020
0010828-6   01/04/2020
0010828-6   01/05/2020

Desired output

Location    Dates       Dates_shifted_down_per_location
0000076-0   01/01/2020  null
0000076-0   28/01/2020  01/01/2020     
0000076-0   28/02/2020  28/01/2020     
0000076-0   01/04/2020  28/02/2020     
0000076-0   28/04/2020  01/04/2020     
0000076-0   28/05/2020  28/04/2020     
0000076-0   01/07/2020  28/05/2020     
000478-2    01/01/2020  null     
000478-2    01/02/2020  01/01/2020     
000478-2    26/02/2020  01/02/2020     
000478-2    01/04/2020  26/02/2020     
000478-2    26/04/2020  01/04/2020     
000478-2    26/05/2020  26/04/2020     
000478-2    01/07/2020  26/05/2020     
0005655-3   01/01/2020  null     
0005655-3   07/02/2020  01/01/2020     
0005655-3   09/03/2020  07/02/2020     
0005655-3   07/04/2020  09/03/2020     
0005655-3   07/05/2020  07/04/2020     
0005655-3   08/06/2020  07/05/2020     
0005655-3   01/07/2020  08/06/2020     
0010828-6   01/01/2020  null     
0010828-6   31/01/2020  01/01/2020     
0010828-6   29/02/2020  31/01/2020     
0010828-6   01/05/2020  29/02/2020      
0010828-6   01/06/2020  01/05/2020

Solution

  • Hive supports lag(). The code you want is:

    select t.*,
           lag(date) over (partition by location order by date) as prev_date
    from t;