I have a two-year daily time series that I would like to extract the maximum values from between two-month wide sliding windows (i.e., Jan-Feb, Feb-Mar,..., Nov-Dec). I can get close to an answer by using the packages tidyverse
and slider
however, the answer is not exact because not all months have the same number of days. Is there a way to extract the maximum value from a two-month wide sliding window that does not rely on setting a value for the number of days to look before and after?
library(tidyverse)
library(slider)
# Example Daily Timeseries
set.seed(333)
df <- data.frame(
date = seq(as.Date('2020-01-01'),
as.Date('2021-12-31'),
'1 day'),
value = round(rnorm(731,100,33))
)
# Calculate maximum in 2-month (60-day) wide sliding window
# Must supply number of days before and after but months have different number of days
df %>%
mutate(two_month_max = slider::slide_index_dbl(value, date, max, .before = 30, .after = 30)) %>%
filter(day(date) == 1) %>%
select(!value)
#> date two_month_max
#> 1 2020-01-01 164
#> 2 2020-02-01 164
#> 3 2020-03-01 160
#> 4 2020-04-01 170
#> 5 2020-05-01 181
#> 6 2020-06-01 181
#> 7 2020-07-01 150
#> 8 2020-08-01 193
#> 9 2020-09-01 193
#> 10 2020-10-01 212
#> 11 2020-11-01 212
#> 12 2020-12-01 187
#> 13 2021-01-01 179
#> 14 2021-02-01 179
#> 15 2021-03-01 167
#> 16 2021-04-01 159
#> 17 2021-05-01 169
#> 18 2021-06-01 184
#> 19 2021-07-01 184
#> 20 2021-08-01 194
#> 21 2021-09-01 194
#> 22 2021-10-01 163
#> 23 2021-11-01 161
#> 24 2021-12-01 168
Created on 2024-05-08 with reprex v2.1.0
EXTRA INFO: I have formatted the output above to resemble that of the El Niño/Southern Oscillation (ENSO) - Multivariate ENSO Index Version 2 (MEI) which these maximum values will be appended to at another time.
library(tidyverse)
library(rsoi)
# Import Multivariate ENSO Index Version 2 (MEI) ----
mei <- download_mei()
mei %>%
arrange(-desc(Date))
#> # A tibble: 552 × 5
#> Year Month Date MEI Phase
#> <int> <fct> <date> <dbl> <ord>
#> 1 1979 DJ 1979-01-01 0.47 Neutral Phase
#> 2 1979 JF 1979-02-01 0.29 Neutral Phase
#> 3 1979 FM 1979-03-01 -0.05 Neutral Phase
#> 4 1979 MA 1979-04-01 0.21 Neutral Phase
#> 5 1979 AM 1979-05-01 0.27 Neutral Phase
#> 6 1979 MJ 1979-06-01 -0.11 Neutral Phase
#> 7 1979 JJ 1979-07-01 -0.11 Neutral Phase
#> 8 1979 JA 1979-08-01 0.47 Neutral Phase
#> 9 1979 AS 1979-09-01 0.38 Neutral Phase
#> 10 1979 SO 1979-10-01 0.23 Neutral Phase
#> # ℹ 542 more rows
Created on 2024-05-08 with reprex v2.1.0
Here's an all tidyverse
take (no zoo
or slider
).
library(tidyverse) # dplyr || lubridate || purrr
# `reframe` returns an arbitrary number of rows per group [1] -------------
new_df <- df %>%
reframe(
start_date = unique(floor_date(date, "months")),
interval = as.interval(months(2) - days(1), start_date),
max_value = map_dbl(interval, \(x) max(keep(value, date %within% x))))
Output:
> new_df
start_date interval max_value
1 2020-01-01 2020-01-01 UTC--2020-02-29 UTC 164
2 2020-02-01 2020-02-01 UTC--2020-03-31 UTC 160
3 2020-03-01 2020-03-01 UTC--2020-04-30 UTC 170
4 2020-04-01 2020-04-01 UTC--2020-05-31 UTC 181
5 2020-05-01 2020-05-01 UTC--2020-06-30 UTC 181
6 2020-06-01 2020-06-01 UTC--2020-07-31 UTC 150
7 2020-07-01 2020-07-01 UTC--2020-08-31 UTC 193
8 2020-08-01 2020-08-01 UTC--2020-09-30 UTC 193
9 2020-09-01 2020-09-01 UTC--2020-10-31 UTC 212
10 2020-10-01 2020-10-01 UTC--2020-11-30 UTC 212
11 2020-11-01 2020-11-01 UTC--2020-12-31 UTC 187
12 2020-12-01 2020-12-01 UTC--2021-01-31 UTC 179
13 2021-01-01 2021-01-01 UTC--2021-02-28 UTC 179
14 2021-02-01 2021-02-01 UTC--2021-03-31 UTC 167
15 2021-03-01 2021-03-01 UTC--2021-04-30 UTC 159
16 2021-04-01 2021-04-01 UTC--2021-05-31 UTC 169
17 2021-05-01 2021-05-01 UTC--2021-06-30 UTC 184
18 2021-06-01 2021-06-01 UTC--2021-07-31 UTC 184
19 2021-07-01 2021-07-01 UTC--2021-08-31 UTC 194
20 2021-08-01 2021-08-01 UTC--2021-09-30 UTC 194
21 2021-09-01 2021-09-01 UTC--2021-10-31 UTC 163
22 2021-10-01 2021-10-01 UTC--2021-11-30 UTC 161
23 2021-11-01 2021-11-01 UTC--2021-12-31 UTC 168
24 2021-12-01 2021-12-01 UTC--2022-01-31 UTC 168
[1] reframe
reference here.
Toy data:
# Toy data ----------------------------------------------------------------
set.seed(333)
df <- data.frame(
date = seq(as.Date('2020-01-01'), as.Date('2021-12-31'), '1 day'),
value = round(rnorm(731,100,33)))
Created on 2024-05-10 with reprex v2.1.0