I have a data.frame with dates and fails. And now I would like to count the consecutive fails of each period together with the date of the start and end date.
For instance,
data <- data.frame(date = seq.Date(as.Date("2019-01-01"), by = "days",length.out = 14),
fail = c(1,1,0,0,0,1,1,1,1,0,1,0,0,0))
date fail
1 2019-01-01 1
2 2019-01-02 1
3 2019-01-03 0
4 2019-01-04 0
5 2019-01-05 0
6 2019-01-06 1
7 2019-01-07 1
8 2019-01-08 1
9 2019-01-09 1
10 2019-01-10 0
11 2019-01-11 1
12 2019-01-12 0
13 2019-01-13 0
and the result should be
duration start end
2 2019-01-01 2019-01-02
4 2019-01-06 2019-01-09
1 2019-01-11 2019-01-11
I tried to get the desired results using the difference of index of fails. However, I'm struggling to get the result. About any help would be much appreciated.
One dplyr
option could be:
data %>%
group_by(rleid = with(rle(fail), rep(seq_along(lengths), lengths))) %>%
filter(fail == 1) %>%
summarise(n = n(),
start = min(date),
end = max(date)) %>%
ungroup() %>%
select(-rleid)
n start end
<int> <date> <date>
1 2 2019-01-01 2019-01-02
2 4 2019-01-06 2019-01-09
3 1 2019-01-11 2019-01-11