I am trying to take mean of values belonging to particular group. There can be any number of missing values corresponding to that particular group. What I want is that output mean should be NA if number of missing values exceed some limit (e.g., 3) otherwise output the mean of the values ignoring those NA values.
Below is the code I tried with a sample data:
df <- structure(list(Year = c(2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017), Week = c(44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45), value = c(11.1, 11.6, 26.8, 35.4, 41.5, 9.8, 59.8, 62.9, NaN, 13, 8.7, NaN, NaN, 1.7, NaN, NaN, 12, 18.5, 28.2, 27.3, 42.5, 29.8, 33.1, 35.2, 23.2, 7.2, 2.1, 2.3, 7.8, 3.4)), row.names = c(NA, 30L), class = "data.frame")
out1 <- df %>% group_by(Year, Week) %>% summarise_each(funs(mean(.))) # or
out2 <- df %>% group_by(Year, Week) %>% summarise_each(funs(mean(., na.rm=T)))
This will help in taking mean of all columns except those in groups.
df <- structure(list(Year = c(2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017), Week = c(44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45), value = c(11.1, 11.6, 26.8, 35.4, 41.5, 9.8, 59.8, 62.9, NaN, 13, 8.7, NaN, NaN, 1.7, NaN, NaN, 12, 18.5, 28.2, 27.3, 42.5, 29.8, 33.1, 35.2, 23.2, 7.2, 2.1, 2.3, 7.8, 3.4)), row.names = c(NA, 30L), class = "data.frame")
library(tidyverse)
n <- 3
df %>% group_by(Year, Week) %>% summarise(across(everything(), .fns = list(Mean = ~mean(.x, na.rm =T),
na_vals = ~sum(is.na(.x))),
.names = "{.col}.{.fn}"
), .groups = 'drop') %>%
mutate(across(ends_with('.Mean'), ~ifelse(get(str_replace(cur_column(), '.Mean', '.na_vals'))>= n,
NA, .))) %>%
select(!ends_with('.na_vals')) %>%
rename_with(~str_remove(., '.Mean'), ends_with('.Mean'))
# A tibble: 2 x 3
Year Week value
<dbl> <dbl> <dbl>
1 2017 44 NA
2 2017 45 19.5
Created on 2021-05-10 by the reprex package (v2.0.0)