Search code examples
rdplyr

Maximum value over bi-monthly sliding window


I have a two-year daily time series that I would like to extract the maximum values from between two-month wide sliding windows (i.e., Jan-Feb, Feb-Mar,..., Nov-Dec). I can get close to an answer by using the packages tidyverse and slider however, the answer is not exact because not all months have the same number of days. Is there a way to extract the maximum value from a two-month wide sliding window that does not rely on setting a value for the number of days to look before and after?

library(tidyverse)
library(slider)

# Example Daily Timeseries
set.seed(333)

df <- data.frame(
  date = seq(as.Date('2020-01-01'),
                 as.Date('2021-12-31'),
                 '1 day'),
  value = round(rnorm(731,100,33))
)

# Calculate maximum in 2-month (60-day) wide sliding window
# Must supply number of days before and after but months have different number of days
df %>%
  mutate(two_month_max = slider::slide_index_dbl(value, date, max, .before = 30, .after = 30)) %>%
  filter(day(date) == 1) %>%
  select(!value)
#>          date two_month_max
#> 1  2020-01-01           164
#> 2  2020-02-01           164
#> 3  2020-03-01           160
#> 4  2020-04-01           170
#> 5  2020-05-01           181
#> 6  2020-06-01           181
#> 7  2020-07-01           150
#> 8  2020-08-01           193
#> 9  2020-09-01           193
#> 10 2020-10-01           212
#> 11 2020-11-01           212
#> 12 2020-12-01           187
#> 13 2021-01-01           179
#> 14 2021-02-01           179
#> 15 2021-03-01           167
#> 16 2021-04-01           159
#> 17 2021-05-01           169
#> 18 2021-06-01           184
#> 19 2021-07-01           184
#> 20 2021-08-01           194
#> 21 2021-09-01           194
#> 22 2021-10-01           163
#> 23 2021-11-01           161
#> 24 2021-12-01           168

Created on 2024-05-08 with reprex v2.1.0

EXTRA INFO: I have formatted the output above to resemble that of the El Niño/Southern Oscillation (ENSO) - Multivariate ENSO Index Version 2 (MEI) which these maximum values will be appended to at another time.

library(tidyverse)
library(rsoi)

# Import Multivariate ENSO Index Version 2 (MEI) ----
mei <- download_mei()

mei %>%
  arrange(-desc(Date))
#> # A tibble: 552 × 5
#>     Year Month Date         MEI Phase        
#>    <int> <fct> <date>     <dbl> <ord>        
#>  1  1979 DJ    1979-01-01  0.47 Neutral Phase
#>  2  1979 JF    1979-02-01  0.29 Neutral Phase
#>  3  1979 FM    1979-03-01 -0.05 Neutral Phase
#>  4  1979 MA    1979-04-01  0.21 Neutral Phase
#>  5  1979 AM    1979-05-01  0.27 Neutral Phase
#>  6  1979 MJ    1979-06-01 -0.11 Neutral Phase
#>  7  1979 JJ    1979-07-01 -0.11 Neutral Phase
#>  8  1979 JA    1979-08-01  0.47 Neutral Phase
#>  9  1979 AS    1979-09-01  0.38 Neutral Phase
#> 10  1979 SO    1979-10-01  0.23 Neutral Phase
#> # ℹ 542 more rows

Created on 2024-05-08 with reprex v2.1.0


Solution

  • Here's an all tidyverse take (no zoo or slider).

    library(tidyverse) # dplyr || lubridate || purrr
     
    # `reframe` returns an arbitrary number of rows per group [1] -------------
    new_df <- df %>% 
      reframe(
        start_date = unique(floor_date(date, "months")),
        interval   = as.interval(months(2) - days(1), start_date),
        max_value  = map_dbl(interval, \(x) max(keep(value, date %within% x))))
    

    Output:

    > new_df
       start_date                       interval max_value
    1  2020-01-01 2020-01-01 UTC--2020-02-29 UTC       164
    2  2020-02-01 2020-02-01 UTC--2020-03-31 UTC       160
    3  2020-03-01 2020-03-01 UTC--2020-04-30 UTC       170
    4  2020-04-01 2020-04-01 UTC--2020-05-31 UTC       181
    5  2020-05-01 2020-05-01 UTC--2020-06-30 UTC       181
    6  2020-06-01 2020-06-01 UTC--2020-07-31 UTC       150
    7  2020-07-01 2020-07-01 UTC--2020-08-31 UTC       193
    8  2020-08-01 2020-08-01 UTC--2020-09-30 UTC       193
    9  2020-09-01 2020-09-01 UTC--2020-10-31 UTC       212
    10 2020-10-01 2020-10-01 UTC--2020-11-30 UTC       212
    11 2020-11-01 2020-11-01 UTC--2020-12-31 UTC       187
    12 2020-12-01 2020-12-01 UTC--2021-01-31 UTC       179
    13 2021-01-01 2021-01-01 UTC--2021-02-28 UTC       179
    14 2021-02-01 2021-02-01 UTC--2021-03-31 UTC       167
    15 2021-03-01 2021-03-01 UTC--2021-04-30 UTC       159
    16 2021-04-01 2021-04-01 UTC--2021-05-31 UTC       169
    17 2021-05-01 2021-05-01 UTC--2021-06-30 UTC       184
    18 2021-06-01 2021-06-01 UTC--2021-07-31 UTC       184
    19 2021-07-01 2021-07-01 UTC--2021-08-31 UTC       194
    20 2021-08-01 2021-08-01 UTC--2021-09-30 UTC       194
    21 2021-09-01 2021-09-01 UTC--2021-10-31 UTC       163
    22 2021-10-01 2021-10-01 UTC--2021-11-30 UTC       161
    23 2021-11-01 2021-11-01 UTC--2021-12-31 UTC       168
    24 2021-12-01 2021-12-01 UTC--2022-01-31 UTC       168
    

    [1] reframereference here.

    Toy data:

    # Toy data ----------------------------------------------------------------
    set.seed(333)
    
    df <- data.frame(
      date = seq(as.Date('2020-01-01'), as.Date('2021-12-31'), '1 day'),
      value = round(rnorm(731,100,33)))
    

    Created on 2024-05-10 with reprex v2.1.0