Tags: r, subset, sequence

Identifying/describing sequences of consecutive days with certain value within a vector


I have a large dataset of daily values indicating whether each day of the year was especially hot (1) or not (0). I want to identify runs of 3 or more consecutive hot days and create a new dataset containing the start date, end date, and length of each run.

I'm a bit stuck on how to go about this.

An example of my dataset:

hotday <- c(0,1,0,1,1,1,0,0,1,1,1,1,0)
dates <- seq.Date(from=as.Date("1990-06-01"), by="day",length.out = length(hotday))
df <- data.frame(dates,hotday)
df
        dates hotday
1  1990-06-01      0
2  1990-06-02      1
3  1990-06-03      0
4  1990-06-04      1
5  1990-06-05      1
6  1990-06-06      1
7  1990-06-07      0
8  1990-06-08      0
9  1990-06-09      1
10 1990-06-10      1
11 1990-06-11      1
12 1990-06-12      1
13 1990-06-13      0

The output I would like to achieve should look as follows:

   startdate    enddate length
1 1990-06-04 1990-06-06      3
2 1990-06-09 1990-06-12      4

Thank you for the help, I am willing to take any approach or suggestion.


Solution

  • If you prefer tidyverse syntax, you could assign a run ID that increments whenever hotday changes, then summarize each run of hot days:

    library(dplyr)
    
    df %>% 
      # run ID increments each time hotday switches between 0 and 1
      mutate(run = cumsum(c(1, abs(diff(hotday))))) %>%
      # keep only the hot days
      filter(hotday == 1) %>%
      group_by(run) %>%
      summarize(startdate = first(dates), enddate = last(dates), length = n()) %>%
      ungroup() %>%
      select(-run) %>%
      # keep runs of at least 3 days
      filter(length >= 3)
    #> # A tibble: 2 x 3
    #>   startdate  enddate    length
    #>   <date>     <date>      <int>
    #> 1 1990-06-04 1990-06-06      3
    #> 2 1990-06-09 1990-06-12      4
    

    Created on 2022-09-30 with reprex v2.0.2
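
  • If you would rather avoid extra packages, the same idea can probably be expressed in base R with rle(), which encodes a vector as runs of equal values. This is only a sketch: it assumes df has one row per day, sorted by date with no missing days, and min_len is a hypothetical name for the threshold, not something from the question.

```r
# Example data from the question
hotday <- c(0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0)
dates <- seq.Date(from = as.Date("1990-06-01"), by = "day", length.out = length(hotday))
df <- data.frame(dates, hotday)

# rle() compresses the vector into runs: one value and one length per run
runs <- rle(df$hotday)

# Row index at which each run ends and starts
run_end <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

# Keep only runs of hot days (value 1) lasting at least min_len days
min_len <- 3  # hypothetical threshold name, adjust as needed
keep <- runs$values == 1 & runs$lengths >= min_len

heatwaves <- data.frame(
  startdate = df$dates[run_start[keep]],
  enddate   = df$dates[run_end[keep]],
  length    = runs$lengths[keep]
)
heatwaves
```

    On the example data this should return the two runs requested in the question (1990-06-04 to 1990-06-06 and 1990-06-09 to 1990-06-12). If the real data has gaps in the dates, a run of 1s could span non-consecutive days, so the dates would need checking first.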