Search code examples
rgroupingrolling-computation

rolling average across grouped dates


I have a time series of half-hourly observations. I want a running average of the range of each day's measurements. I group by date and get the correct daily ranges, but then everything I can think of to get the running average only works within the daily groups instead of across them -- row-by-row instead of day-by-day -- and since each day only has one range the value I get is always that range again. Here's one example:

library(tidyverse)
library(zoo)
set.seed(3)
dts <- sort(sample(seq(as_datetime("2024-08-15 09:00:00 EDT"), as_datetime("2024-08-22 09:00:00 EDT"), by="hour"), 24))
df <- tibble(dts = dts, temp = sample(10:20, 24, replace=TRUE))
df <- df %>%
  mutate(date = as.Date(dts)) %>%
  group_by(date) %>%
  mutate(tMax = max(temp, na.rm = TRUE), tMin = min(temp, na.rm = TRUE)) %>%
  mutate(range = tMax - tMin) %>%
  mutate(rollRange = rollapply(range, 3, mean, fill=NA))

In the real data there are always many more observations each day than days in the rolling window so there are only NAs for the top of each day, and thereafter identical to range. An added complication is that for other reasons there are a random number of rows per day so I can't just make my window obs/day * days. Do I have to summarize() this out to a separate data frame, only to merge it back in?

With apologies, here's the result of the reprex plus the column desired showing something like what I'm looking for:

   dts                  temp date        tMax  tMin range rollRange desired
   <dttm>              <int> <date>     <int> <int> <int>     <dbl>   <dbl>
 1 2024-08-15 13:00:00    17 2024-08-15    19    17     2        NA   NA   
 2 2024-08-15 20:00:00    19 2024-08-15    19    17     2        NA   NA   
 3 2024-08-16 02:00:00    20 2024-08-16    20    12     8        NA    4   
 4 2024-08-16 04:00:00    16 2024-08-16    20    12     8         8    4   
 5 2024-08-16 06:00:00    12 2024-08-16    20    12     8         8    4   
 6 2024-08-16 20:00:00    14 2024-08-16    20    12     8         8    4   
 7 2024-08-16 21:00:00    16 2024-08-16    20    12     8        NA    4   
 8 2024-08-17 00:00:00    15 2024-08-17    17    15     2        NA    5.33
 9 2024-08-17 08:00:00    17 2024-08-17    17    15     2        NA    5.33
10 2024-08-18 06:00:00    19 2024-08-18    19    13     6        NA    4.33
11 2024-08-18 10:00:00    13 2024-08-18    19    13     6        NA    4.33
12 2024-08-19 16:00:00    10 2024-08-19    15    10     5        NA    6.66
13 2024-08-19 19:00:00    12 2024-08-19    15    10     5         5    6.66
14 2024-08-19 20:00:00    15 2024-08-19    15    10     5        NA    6.66
15 2024-08-20 06:00:00    13 2024-08-20    18    13     5        NA    6.66
16 2024-08-20 08:00:00    18 2024-08-20    18    13     5        NA    6.66
17 2024-08-21 00:00:00    19 2024-08-21    19    10     9        NA    6   
18 2024-08-21 01:00:00    16 2024-08-21    19    10     9         9    6   
19 2024-08-21 04:00:00    19 2024-08-21    19    10     9         9    6   
20 2024-08-21 08:00:00    10 2024-08-21    19    10     9         9    6   
21 2024-08-21 19:00:00    18 2024-08-21    19    10     9         9    6   
22 2024-08-21 21:00:00    10 2024-08-21    19    10     9        NA    6   
23 2024-08-22 05:00:00    18 2024-08-22    18    14     4        NA   NA   
24 2024-08-22 09:00:00    14 2024-08-22    18    14     4        NA   NA  

Solution

  • Yes, using rollmean on the summary and joining back seems easiest.

    roll <- df %>%   # df has already been grouped
      summarize(rollRange2 = first(range)) %>%
      mutate(rollRange2 = rollmean(rollRange2, 3, fill = NA))
    
    df %>%
      left_join(roll, join_by(date))