r dplyr purrr lubridate

Filter grouped data on dependent conditions


I have a data frame of dates (date) and temperatures (degrees). I want to keep only the years in which the temperature drops to 5 or less and later reaches 15 or higher. The time spans vary but are a subset of months or weeks across several years. I am using dplyr, purrr and lubridate, but am open to other packages.

Reproducible example:

date <- as.Date(c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04",
                  "2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04",
                  "2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04",
                  "2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"))

degrees <- c(10, 5, 10, 15, 20, 6, 10, 15, 0, 0, 10, 20, 10, 8, 20, 15)

df <- data.frame(date, degrees)

The output I am expecting is the following.

date degrees
2020-01-01 10
2020-01-02 5
2020-01-03 10
2020-01-04 15
2022-01-01 0
2022-01-02 0
2022-01-03 10
2022-01-04 20

Solution

  • Steps:

    1. Create a year column
    2. Within each year, flag rows where degrees is at least 15 and some earlier row was 5 or less (tracked with cumsum), then use any() to check whether at least one row in that year is flagged
    3. Filter to years where this is true
    4. Remove the year and m (mask) columns
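    library(tidyverse)  # year() is from lubridate, attached by tidyverse 2.0.0+
    
    df |> 
      # Step 1: grouping column
      mutate(year = year(date)) |>
      # Step 2: TRUE for every row of a year that has a reading <= 5
      # followed by a reading >= 15 (.by needs dplyr 1.1.0+)
      mutate(m = any(cumsum(degrees <= 5) > 0 & degrees >= 15), .by = year) |> 
      # Steps 3 and 4: keep qualifying years, drop the helper columns
      filter(m) |>
      select(date, degrees)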
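
To see why the cumsum() condition captures "cold first, warm later", it can help to keep the intermediate columns instead of collapsing them straight into any(). The following is only a sketch of the same idea. The column names seen_cold, warm_after_cold and keep are made up for illustration, and it assumes dplyr 1.1.0 or later (for .by) with lubridate attached. Because cumsum() depends on row order, it also sorts by date first, which the sample data already satisfies.

```r
library(dplyr)
library(lubridate)

# Sample data from the question
df <- data.frame(
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04",
                   "2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04",
                   "2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04",
                   "2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04")),
  degrees = c(10, 5, 10, 15, 20, 6, 10, 15, 0, 0, 10, 20, 10, 8, 20, 15)
)

steps <- df |>
  arrange(date) |>                      # cumsum() is order-dependent
  mutate(year = year(date)) |>
  mutate(
    # TRUE from the first reading of 5 or less onward, within the year
    seen_cold = cumsum(degrees <= 5) > 0,
    # a reading of 15 or more that comes after a cold reading
    warm_after_cold = seen_cold & degrees >= 15,
    # the same flag for every row of the year
    keep = any(warm_after_cold),
    .by = year
  )

# Hypothetical helper columns are dropped again at the end
result <- steps |>
  filter(keep) |>
  select(date, degrees)

print(steps)
print(result)
```

Printing steps shows that 2021 and 2023 never set seen_cold, so no row in those years can be flagged, while 2020 and 2022 each have a cold reading followed by a warm one.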