I have the data set below with id, date, and value columns. I want to flag every id that has a value of 0 on three consecutive days.
id | date | value |
---|---|---|
1 | 8/10/2021 | 1 |
1 | 8/11/2021 | 0 |
1 | 8/12/2021 | 0 |
1 | 8/13/2021 | 0 |
1 | 8/14/2021 | 5 |
2 | 8/10/2021 | 2 |
2 | 8/11/2021 | 3 |
2 | 8/12/2021 | 0 |
2 | 8/13/2021 | 0 |
2 | 8/14/2021 | 6 |
3 | 8/10/2021 | 3 |
3 | 8/11/2021 | 4 |
3 | 8/12/2021 | 0 |
3 | 8/13/2021 | 0 |
3 | 8/14/2021 | 0 |
Expected output:
id | date | value | Flag |
---|---|---|---|
1 | 8/10/2021 | 1 | Y |
1 | 8/11/2021 | 0 | Y |
1 | 8/12/2021 | 0 | Y |
1 | 8/13/2021 | 0 | Y |
1 | 8/14/2021 | 5 | Y |
2 | 8/10/2021 | 2 | N |
2 | 8/11/2021 | 3 | N |
2 | 8/12/2021 | 0 | N |
2 | 8/13/2021 | 0 | N |
2 | 8/14/2021 | 6 | N |
3 | 8/10/2021 | 3 | Y |
3 | 8/11/2021 | 4 | Y |
3 | 8/12/2021 | 0 | Y |
3 | 8/13/2021 | 0 | Y |
3 | 8/14/2021 | 0 | Y |
Thank you.
Using the window count() function, you can count the 0's in the frame [current row, 2 following] (ordered by date). This three-row frame is calculated for each row:
count(case when value=0 then 1 else null end) over(partition by id order by date_ rows between current row and 2 following) cnt
If the count equals 3, three consecutive 0's were found, and the case expression produces 'Y' for each row with cnt=3:
case when cnt=3 then 'Y' else 'N' end
To propagate the 'Y' flag to the whole id group, use max(...) over (partition by id). This works because 'Y' sorts after 'N', so max returns 'Y' if any row in the group has it. Note that the frame counts consecutive rows, not calendar days, so it assumes one row per id per day with no missing dates.
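
The window is ordered by date, so the date values have to sort chronologically. Strings in M/d/yyyy format generally do not (for example, '8/10/2021' sorts before '8/9/2021'). As a minimal sketch, assuming the dates are stored as M/d/yyyy strings, they could be converted in Hive like this; raw_data is a hypothetical stand-in for the real table, and the query has not been run here:

```sql
-- Sketch (Hive): convert M/d/yyyy strings to sortable yyyy-MM-dd strings.
-- raw_data is a hypothetical stand-in; replace it with your own table.
with raw_data as (
  select 1 id, '8/10/2021' date_, 1 value union all
  select 1, '8/9/2021', 0
)
select id,
       -- parse with the source pattern, then format as yyyy-MM-dd
       from_unixtime(unix_timestamp(date_, 'M/d/yyyy'), 'yyyy-MM-dd') date_,
       value
from raw_data;
```

After the conversion, ordering by date_ gives chronological order, which is what the rows frame relies on.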
Demo with your data example (tested on Hive):
with mydata as (--Data example, dates converted to sortable format yyyy-MM-dd
select 1 id,'2021-08-10' date_, 1 value union all
select 1,'2021-08-11',0 union all
select 1,'2021-08-12',0 union all
select 1,'2021-08-13',0 union all
select 1,'2021-08-14',5 union all
select 2,'2021-08-10',2 union all
select 2,'2021-08-11',3 union all
select 2,'2021-08-12',0 union all
select 2,'2021-08-13',0 union all
select 2,'2021-08-14',6 union all
select 3,'2021-08-10',3 union all
select 3,'2021-08-11',4 union all
select 3,'2021-08-12',0 union all
select 3,'2021-08-13',0 union all
select 3,'2021-08-14',0
) --End of data example, use your table instead of this CTE
select id, date_, value,
       max(case when cnt=3 then 'Y' else 'N' end) over (partition by id) flag
from
(
  select id, date_, value,
         count(case when value=0 then 1 else null end) over(partition by id order by date_ rows between current row and 2 following) cnt
  from mydata
) s
order by id, date_ --ordering added only to get the result in the same order; remove it if not necessary
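Result:
id | date_ | value | flag |
---|---|---|---|
1 | 2021-08-10 | 1 | Y |
1 | 2021-08-11 | 0 | Y |
1 | 2021-08-12 | 0 | Y |
1 | 2021-08-13 | 0 | Y |
1 | 2021-08-14 | 5 | Y |
2 | 2021-08-10 | 2 | N |
2 | 2021-08-11 | 3 | N |
2 | 2021-08-12 | 0 | N |
2 | 2021-08-13 | 0 | N |
2 | 2021-08-14 | 6 | N |
3 | 2021-08-10 | 3 | Y |
3 | 2021-08-11 | 4 | Y |
3 | 2021-08-12 | 0 | Y |
3 | 2021-08-13 | 0 | Y |
3 | 2021-08-14 | 0 | Y |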
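
The query above counts consecutive rows, so it treats three 0's as consecutive even if a day is missing between them. If the data can have gaps, one possible variation is to also check that the third row in the frame is exactly two days after the current row. The sketch below assumes at most one row per id per date and yyyy-MM-dd date strings; the id 4 rows are made up to show a missing day, and the query has not been run here:

```sql
-- Sketch (Hive): require three 0's in the frame AND a span of exactly 2 days.
-- Assumes at most one row per id per date, with dates as yyyy-MM-dd strings.
with mydata as ( -- small stand-in data; id 4 is a made-up case with 2021-08-12 missing
  select 1 id, '2021-08-11' date_, 0 value union all
  select 1, '2021-08-12', 0 union all
  select 1, '2021-08-13', 0 union all
  select 4, '2021-08-10', 0 union all
  select 4, '2021-08-11', 0 union all
  select 4, '2021-08-13', 0
)
select id, date_, value,
       max(case when cnt = 3 and datediff(third_date, date_) = 2 then 'Y' else 'N' end)
         over (partition by id) flag
from
(
  select id, date_, value,
         -- number of 0's in the current row and the next two rows
         count(case when value = 0 then 1 end)
           over (partition by id order by date_ rows between current row and 2 following) cnt,
         -- date of the third row in the frame (null near the end of the group)
         lead(date_, 2) over (partition by id order by date_) third_date
  from mydata
) s
order by id, date_;
```

With this data, id 1 should be flagged Y and id 4 should be flagged N, because its three 0's span 2021-08-10 to 2021-08-13 rather than three consecutive days.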