Tags: r, dplyr, mean, zoo, rollapply

Calculate the mean with rollapply only if a certain percentage of the data is available


I have a column of hourly data and want to use rollapply to calculate the 24-hour rolling average for every hour. My data contains NAs, and I only want to calculate the rolling average if at least 75% of the data in a 24-hour window is available; otherwise the 24-hour rolling average should be NA.

  library(dplyr)
  library(zoo)
  df %>%
        mutate(rolling_avg = rollapply(hourly_data, 24, FUN = mean, align = "right", fill = NA))

How can I modify the above code to accomplish this?
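The question has no sample data, so the sketch below builds a small hypothetical `df` (48 hours of made-up readings with a 3-hour gap; none of these values come from the question) to reproduce the issue. With `FUN = mean`, a single `NA` anywhere in a window makes that window's average `NA`, so one short gap wipes out every window that overlaps it.

```r
library(dplyr)
library(zoo)

# Hypothetical example data: 48 hourly readings, hours 30-32 missing
df <- data.frame(hour = 1:48, hourly_data = 10 + (1:48) %% 5)
df$hourly_data[30:32] <- NA

out <- df %>%
  mutate(rolling_avg = rollapply(hourly_data, 24, FUN = mean,
                                 align = "right", fill = NA))

# Hours 1-23 are padding from fill = NA. Every window ending at hours 30-48
# overlaps the gap, so plain mean() returns NA for all of them.
sum(is.na(out$rolling_avg))  # 42
```

Only the windows ending at hours 24 to 29 get a value, even though each of the later windows is missing at most 3 of its 24 hours.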


Solution

  • Define a function to do exactly what you stated:

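    f <- function(v) {
      # More than 25% of the window missing means fewer than 75% of
      # the values are available, so the average is reported as NA
      if (sum(is.na(v)) > length(v) * 0.25) return(NA)
      # Otherwise average only the values that are present
      mean(v, na.rm = TRUE)
    }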
    

    Then use it in place of mean:

    df %>% mutate(rolling_avg = rollapply(hourly_data, 24, FUN = f,
                                         align = "right", fill = NA))
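A quick sanity check of where the cutoff falls: for a 24-hour window, `0.25 * 24 = 6`, so up to 6 missing hours are tolerated and a 7th makes the result `NA`. The sketch repeats `f` so it runs on its own and applies it to hypothetical data (48 made-up hourly readings with an 8-hour gap; not from the question).

```r
library(dplyr)
library(zoo)

# The answer's function, repeated so this block is self-contained
f <- function(v) {
  if (sum(is.na(v)) > length(v) * 0.25) return(NA)
  mean(v, na.rm = TRUE)
}

# Boundary check for a 24-hour window
w6 <- c(rep(NA, 6), rep(2, 18))  # 18/24 present: exactly 75%
w7 <- c(rep(NA, 7), rep(2, 17))  # 17/24 present: below 75%
f(w6)  # 2
f(w7)  # NA

# Hypothetical data: 48 hourly readings, hours 30-37 missing
df <- data.frame(hour = 1:48, hourly_data = 10 + (1:48) %% 5)
df$hourly_data[30:37] <- NA

out <- df %>%
  mutate(rolling_avg = rollapply(hourly_data, 24, FUN = f,
                                 align = "right", fill = NA))

# Hours 1-23: padding from fill = NA. Windows ending at hours 24-35 contain
# at most 6 NAs and are averaged; windows ending at 36-48 contain 7 or 8 NAs.
sum(is.na(out$rolling_avg))  # 36
```

The `0.25` in `f` sets the threshold: replacing it with `0.10`, for example, would require at least 90% of each window to be present.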