r, dplyr, zoo

Creating a trailing rolling average without NAs at the beginning of the output


I'm working with a nested dataframe using dplyr. I need to mutate a second nested column with a copy of the data where all numeric columns are replaced with a trailing rolling average of the last 5 rows.

How do I either tell zoo not to produce the leading NAs, or write a custom function that gives the desired output?

I've tried both rollmean with the partial = TRUE argument and rollapply with a custom function with na.rm = TRUE from the zoo package, but the first 4 rows are turned into NAs, which I don't want.

library(tidyverse)
library(zoo)

example <- 
  tibble("index" = c(rep(1, 5), rep(2, 5)), "data_a" = c(1:3, 1:2, 1:3, 1:2), "data_b" = c(2:4, 2:3, 2:4, 2:3)) %>%
  group_by(index) %>%
  nest()

example_ra <- example %>%
  mutate(roll_mean = map(data, ~ mutate(.x, across(
    where(is.numeric),
    ~rollmean(
      .,
      k = 5,
      fill = NA,
      partial = TRUE,
      align = "right"
    )
  ))))


My desired output (as a second list-column named roll_mean) is:

Input A  Input B
1        2
2        3
3        4
1        2
2        3

Output A  Output B
1         2
1.5       2.5
2         3
1.75      2.75
1.8       2.8

I get:

Output A  Output B
NA        NA
NA        NA
NA        NA
NA        NA
1.8       2.8

Thanks :)


Solution

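  • There are two problems:

    • rollmean does not have a partial= argument. Use rollapplyr with FUN = mean instead. Note the r on the end, which right-aligns the window so the align = "right" argument is not needed.
    • It seems that nested anonymous functions specified by formulas (a ~ inside another ~) are not supported. Write the inner one with function(x) instead.

    With these changes: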

    example_ra <- example %>%
      mutate(roll_mean = map(data, ~ mutate(.x, across(
        .cols = where(is.numeric),
        .fns = function(x) rollapplyr(x, width = 5, FUN = mean, partial = TRUE)
      ))))
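
To see what partial = TRUE does on its own, here is a minimal sketch on one plain vector (the data_a values from a single group). It also answers the "custom function" half of the question with a base R equivalent. The helper name trailing_mean is hypothetical and not part of zoo or dplyr, and the sketch assumes the input has no NA values.

```r
library(zoo)

x <- c(1, 2, 3, 1, 2)

# Right-aligned window of up to 5 values; partial = TRUE averages
# whatever is available at the start instead of returning NA.
zoo_result <- rollapplyr(x, width = 5, FUN = mean, partial = TRUE)
zoo_result
# 1.00 1.50 2.00 1.75 1.80

# Hypothetical base R helper (an illustration, not a zoo function):
# trailing mean over at most `width` values, computed from cumulative sums.
# Assumes x contains no NAs.
trailing_mean <- function(x, width = 5) {
  idx   <- seq_along(x)
  start <- pmax(idx - width + 1, 1)   # first position of each window
  cs    <- cumsum(c(0, x))            # cs[i + 1] is the sum of x[1..i]
  (cs[idx + 1] - cs[start]) / (idx - start + 1)
}

# Both approaches agree on this input.
stopifnot(isTRUE(all.equal(as.numeric(zoo_result), trailing_mean(x))))
```

Under the same assumptions, the helper could be dropped into the across() call as .fns = function(x) trailing_mean(x, width = 5) if a zoo-free version is preferred.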