Search code examples
sqlhiveimpala

find the consecutive values in impala


I have a data set below with ID, Date and Value. I want to flag the ID where three consecutive days has value 0.

id date value
1 8/10/2021 1
1 8/11/2021 0
1 8/12/2021 0
1 8/13/2021 0
1 8/14/2021 5
2 8/10/2021 2
2 8/11/2021 3
2 8/12/2021 0
2 8/13/2021 0
2 8/14/2021 6
3 8/10/2021 3
3 8/11/2021 4
3 8/12/2021 0
3 8/13/2021 0
3 8/14/2021 0

output

id date value Flag
1 8/10/2021 1 Y
1 8/11/2021 0 Y
1 8/12/2021 0 Y
1 8/13/2021 0 Y
1 8/14/2021 5 Y
2 8/10/2021 2 N
2 8/11/2021 3 N
2 8/12/2021 0 N
2 8/13/2021 0 N
2 8/14/2021 6 N
3 8/10/2021 3 Y
3 8/11/2021 4 Y
3 8/12/2021 0 Y
3 8/13/2021 0 Y
3 8/14/2021 0 Y

Thank you.


Solution

  • Using window count() function you can count 0's in the frame [current row, 2 following] (ordered by date) - three consecutive rows frame calculated for each row:

    count(case when value=0 then 1 else null end) over(partition by id order by date_ rows between current row and 2 following ) cnt.

    If count happens to equal 3 then it means 3 consecutive 0's found, case expression produces Y for each row with cnt=3 : case when cnt=3 then 'Y' else 'N' end.

    To propagate 'Y' flag to the whole id group use max(...) over (partition by id)

    Demo with your data example (tested on Hive):

    with mydata as (--Data example, dates converted to sortable format yyyy-MM-dd
    select 1 id,'2021-08-10' date_, 1 value union all
    select 1,'2021-08-11',0 union all
    select 1,'2021-08-12',0 union all
    select 1,'2021-08-13',0 union all
    select 1,'2021-08-14',5 union all
    select 2,'2021-08-10',2 union all
    select 2,'2021-08-11',3 union all
    select 2,'2021-08-12',0 union all
    select 2,'2021-08-13',0 union all
    select 2,'2021-08-14',6 union all
    select 3,'2021-08-10',3 union all
    select 3,'2021-08-11',4 union all
    select 3,'2021-08-12',0 union all
    select 3,'2021-08-13',0 union all
    select 3,'2021-08-14',0
    ) --End of data example, use your table instead of this CTE
    
    select id, date_, value, 
           max(case when cnt=3 then 'Y' else 'N' end) over (partition by id) flag
    from
    (
    select id, date_, value, 
     count(case when value=0 then 1 else null end) over(partition by id order by date_ rows between current row and 2 following ) cnt
    from mydata
    )s
      order by id, date_  --remove ordering if not necessary
                          --added it to get result in the same order
    

    Result:

    id  date_       value   flag    
    1   2021-08-10  1       Y
    1   2021-08-11  0       Y
    1   2021-08-12  0       Y
    1   2021-08-13  0       Y
    1   2021-08-14  5       Y
    2   2021-08-10  2       N
    2   2021-08-11  3       N
    2   2021-08-12  0       N
    2   2021-08-13  0       N
    2   2021-08-14  6       N
    3   2021-08-10  3       Y
    3   2021-08-11  4       Y
    3   2021-08-12  0       Y
    3   2021-08-13  0       Y
    3   2021-08-14  0       Y