Search code examples
rdataframedplyrlubridate

Output mean as NA if total counts of missing values exceeds a limit in R


I am trying to take mean of values belonging to particular group. There can be any number of missing values corresponding to that particular group. What I want is that output mean should be NA if number of missing values exceed some limit (e.g., 3) otherwise output the mean of the values ignoring those NA values.

Below is the code I tried with a sample data:

df <- structure(list(Year = c(2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017), Week = c(44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45), value = c(11.1, 11.6, 26.8, 35.4, 41.5, 9.8, 59.8, 62.9, NaN, 13, 8.7, NaN, NaN, 1.7, NaN, NaN, 12, 18.5, 28.2, 27.3, 42.5, 29.8, 33.1, 35.2, 23.2, 7.2, 2.1, 2.3, 7.8, 3.4)), row.names = c(NA, 30L), class = "data.frame")

out1 <- df %>% group_by(Year, Week) %>% summarise_each(funs(mean(.))) # or
out2 <- df %>% group_by(Year, Week) %>% summarise_each(funs(mean(., na.rm=T)))

Solution

  • This will help in taking mean of all columns except those in groups.

    df <- structure(list(Year = c(2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017), Week = c(44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45), value = c(11.1, 11.6, 26.8, 35.4, 41.5, 9.8, 59.8, 62.9, NaN, 13, 8.7, NaN, NaN, 1.7, NaN, NaN, 12, 18.5, 28.2, 27.3, 42.5, 29.8, 33.1, 35.2, 23.2, 7.2, 2.1, 2.3, 7.8, 3.4)), row.names = c(NA, 30L), class = "data.frame")
    
    library(tidyverse)
    
    n <- 3
    df %>% group_by(Year, Week) %>% summarise(across(everything(), .fns = list(Mean = ~mean(.x, na.rm =T),
                                                                               na_vals = ~sum(is.na(.x))),
                                                     .names = "{.col}.{.fn}"
                                                     ), .groups = 'drop') %>%
      mutate(across(ends_with('.Mean'), ~ifelse(get(str_replace(cur_column(), '.Mean', '.na_vals'))>= n,
                                                NA, .))) %>%
      select(!ends_with('.na_vals')) %>%
      rename_with(~str_remove(., '.Mean'), ends_with('.Mean'))
    
    # A tibble: 2 x 3
       Year  Week value
      <dbl> <dbl> <dbl>
    1  2017    44  NA  
    2  2017    45  19.5
    

    Created on 2021-05-10 by the reprex package (v2.0.0)