Search code examples
rdataframeaggregatesequencecounting

count the consecutive fails in a row together with its start and end date


I have a data.frame with dates and fails. And now I would like to count the consecutive fails of each period together with the date of the start and end date.

For instance,

data <- data.frame(date = seq.Date(as.Date("2019-01-01"), by = "days",length.out = 14), 
                     fail = c(1,1,0,0,0,1,1,1,1,0,1,0,0,0))

         date fail
1  2019-01-01    1
2  2019-01-02    1
3  2019-01-03    0
4  2019-01-04    0
5  2019-01-05    0
6  2019-01-06    1
7  2019-01-07    1
8  2019-01-08    1
9  2019-01-09    1
10 2019-01-10    0
11 2019-01-11    1
12 2019-01-12    0
13 2019-01-13    0

and the result should be

  duration      start        end
        2 2019-01-01 2019-01-02
        4 2019-01-06 2019-01-09
        1 2019-01-11 2019-01-11

I tried to get the desired results using the difference of index of fails. However, I'm struggling to get the result. About any help would be much appreciated.


Solution

  • One dplyr option could be:

    data %>%
     group_by(rleid = with(rle(fail), rep(seq_along(lengths), lengths))) %>%
     filter(fail == 1) %>%
     summarise(n = n(),
               start = min(date),
               end = max(date)) %>%
     ungroup() %>%
     select(-rleid)
    
          n start      end       
      <int> <date>     <date>    
    1     2 2019-01-01 2019-01-02
    2     4 2019-01-06 2019-01-09
    3     1 2019-01-11 2019-01-11