I have a large dataset of daily values indicating whether each day of the year was especially hot (1) or not (0). I want to identify sequences of 3 or more consecutive especially hot days and create a new dataset that contains the start date, end date, and length of each sequence.
I'm a bit stuck on how to go about this.
An example of my dataset:
hotday <- c(0,1,0,1,1,1,0,0,1,1,1,1,0)
dates <- seq.Date(from=as.Date("1990-06-01"), by="day",length.out = length(hotday))
df <- data.frame(dates,hotday)
df
        dates hotday
1  1990-06-01      0
2  1990-06-02      1
3  1990-06-03      0
4  1990-06-04      1
5  1990-06-05      1
6  1990-06-06      1
7  1990-06-07      0
8  1990-06-08      0
9  1990-06-09      1
10 1990-06-10      1
11 1990-06-11      1
12 1990-06-12      1
13 1990-06-13      0
The output I would like to achieve should look as follows:
   startdate    enddate length
1 1990-06-04 1990-06-06      3
2 1990-06-09 1990-06-12      4
Thank you for the help. I am open to any approach or suggestion.
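If you prefer tidyverse syntax, you could do the following. The run column gets a new id every time hotday changes value, so each stretch of consecutive hot days ends up in its own group: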
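library(dplyr)

df %>%
  # Start a new run id whenever hotday changes (relies on hotday being 0/1)
  mutate(run = cumsum(c(1, abs(diff(hotday))))) %>%
  # Keep only the hot days, then summarise each run
  filter(hotday == 1) %>%
  group_by(run) %>%
  summarize(startdate = first(dates), enddate = last(dates), length = n()) %>%
  ungroup() %>%
  select(-run) %>%
  # Keep only runs of 3 or more days
  filter(length >= 3)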
#> # A tibble: 2 x 3
#>   startdate  enddate    length
#>   <date>     <date>      <int>
#> 1 1990-06-04 1990-06-06      3
#> 2 1990-06-09 1990-06-12      4
Created on 2022-09-30 with reprex v2.0.2
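If you would rather avoid extra packages, the same idea can be sketched in base R with rle(), which run-length encodes a vector. This is only a minimal sketch: it assumes the rows are sorted by date with no missing days and no NA values in hotday, and the names min_length and heatwaves are made up for illustration.

```r
# Example data from the question
hotday <- c(0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0)
dates <- seq.Date(from = as.Date("1990-06-01"), by = "day",
                  length.out = length(hotday))
df <- data.frame(dates, hotday)

# Run-length encode the indicator: one entry per run of identical values
runs <- rle(df$hotday)
ends <- cumsum(runs$lengths)         # row index where each run ends
starts <- ends - runs$lengths + 1    # row index where each run starts

# Keep only runs of hot days (value 1) lasting at least min_length days
min_length <- 3                      # illustrative name for the threshold
keep <- runs$values == 1 & runs$lengths >= min_length

heatwaves <- data.frame(
  startdate = df$dates[starts[keep]],
  enddate   = df$dates[ends[keep]],
  length    = runs$lengths[keep]
)
heatwaves
```

Because rle() only looks at neighbouring rows, any gap in the dates would have to be filled (or treated as a break) before running this.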