I am trying to calculate averages of a value in a time series between many start and end dates.
A simple set up:
time_series <- tibble(dates=seq.Date(from=ymd('20240101'), to=ymd('20241231'), 'day'),
value = runif(366,max=10))
target_dates <-tibble(id=c('A','B','C'),
start=ymd(c('20240104','20240504','20240704')),
end =ymd(c('20240201','20240524','20240924'))) %>%
mutate(ndays=as.integer(end-start))
I'd like to calculate the mean of time_series$value between each target_dates$start and target_dates$end.
I'm unable to figure how to do this from two separate tibbles.
My first thought was this (and variations on this attempt).
target_dates %>%
group_by(start) %>%
summarise(avg = mean(time_series %>% slice(first(which(dates==start)):slice(last(which(dates==end)))) %>% select(values)))
But, that didn't work.
This also came to mind but does not work b/c because the between function does not work on vectors. I could write a for loop, but that's not elegant.
time_series %>%
summarise(avg = mean(value[between(target_dates$start,target_dates$end)]))
The column ndays is calculated because I tried using slide_idx_double from the slider library but could not figure out how to add the time series data to a statement that started with
time_series %>%
summarise(avg = slide_index_dbl(WHAT GOES HERE??,start,.after=ndays,mean))
What I'm expecting is a tibble/data frame to be created in which the average of the time series time_series tibble value column is calculated between the target_dates start and end dates.
Help is appreciated.
See if joining works with your data
library(dplyr)
right_join(time_series, target_dates, join_by(between(dates, start, end))) %>%
summarize(Mean = mean(value), .by = c(id, start, end))
# A tibble: 3 × 4
id start end Mean
<chr> <date> <date> <dbl>
1 A 2024-01-04 2024-02-01 5.23
2 B 2024-05-04 2024-05-24 5.09
3 C 2024-07-04 2024-09-24 4.96
Note, using set.seed
to get a consistent data set
set.seed(1)
time_series <- tibble(dates=seq.Date(from=ymd('20240101'),
to=ymd('20241231'), 'day'),
value = runif(366,max=10))
target_dates <-tibble(id=c('A','B','C'),
start=ymd(c('20240104','20240504','20240704')),
end =ymd(c('20240201','20240524','20240924'))) %>%
mutate(ndays=as.integer(end-start))