I have these columns with location and dates. I want to shift the date column down within each location and not the entire column using Hive HQL. I've tried to use LAG()
but there is no support to Hive on it and also datediff()
but it does not brings me the shifting date, only date differences.
Location Dates
0000076-0 01/01/2020
0000076-0 28/01/2020
0000076-0 28/02/2020
0000076-0 01/04/2020
0000076-0 28/04/2020
0000076-0 28/05/2020
0000076-0 01/07/2020
000478-2 01/01/2020
000478-2 01/02/2020
000478-2 26/02/2020
000478-2 01/04/2020
000478-2 26/04/2020
000478-2 26/05/2020
000478-2 01/07/2020
0005655-3 01/01/2020
0005655-3 07/02/2020
0005655-3 09/03/2020
0005655-3 07/04/2020
0005655-3 07/05/2020
0005655-3 08/06/2020
0005655-3 01/07/2020
0010828-6 01/01/2020
0010828-6 31/01/2020
0010828-6 29/02/2020
0010828-6 01/04/2020
0010828-6 01/05/2020
Desired output
Location Dates Dates_shifted_down_per_location
0000076-0 01/01/2020 null
0000076-0 28/01/2020 01/01/2020
0000076-0 28/02/2020 28/01/2020
0000076-0 01/04/2020 28/02/2020
0000076-0 28/04/2020 01/04/2020
0000076-0 28/05/2020 28/04/2020
0000076-0 01/07/2020 28/05/2020
000478-2 01/01/2020 null
000478-2 01/02/2020 01/01/2020
000478-2 26/02/2020 01/02/2020
000478-2 01/04/2020 26/02/2020
000478-2 26/04/2020 01/04/2020
000478-2 26/05/2020 26/04/2020
000478-2 01/07/2020 26/05/2020
0005655-3 01/01/2020 null
0005655-3 07/02/2020 01/01/2020
0005655-3 09/03/2020 07/02/2020
0005655-3 07/04/2020 09/03/2020
0005655-3 07/05/2020 07/04/2020
0005655-3 08/06/2020 07/05/2020
0005655-3 01/07/2020 08/06/2020
0010828-6 01/01/2020 null
0010828-6 31/01/2020 01/01/2020
0010828-6 29/02/2020 31/01/2020
0010828-6 01/05/2020 29/02/2020
0010828-6 01/06/2020 01/05/2020
Hive supports lag()
. The code you want is:
select t.*,
lag(date) over (partition by location order by date) as prev_date
from t;