Search code examples
rdplyracrosstidyselect

Calculate the rowwise mean when a maximum number of NA values is given for a set of columns using dplyr


Example dataset...

> tribble(
+   ~colA, ~colB, ~colC, ~colD, ~colE,
+   1, 2, 3, 4, 5,
+   2, 3, NA, 4, 5,
+   3, NA, NA, NA, 4,
+   4, NA, NA, 5, 6
+ )
# A tibble: 4 × 5
   colA  colB  colC  colD  colE
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     2     3     4     5
2     2     3    NA     4     5
3     3    NA    NA    NA     4
4     4    NA    NA     5     6

How can I create a new column giving the mean of columns B, C, D and E if only two (at most) NAs are present? In this case, the third row mean should be NA as it has 3 NAs. I have put colA because I want to be able to use tidyselect to choose which variables are included.

So far I have this...

dat %>% 
  rowwise() %>% 
  mutate(
    mean = if_else(
      c_across(colB, colC, colD, colE), 
      condition = sum(is.na(.)) <= 2, 
      true = mean(., na.rm = T), 
      false = NA
      )
    )

But I get this error message...

Error in `mutate()`:
! Problem while computing `mean = if_else(...)`.
ℹ The error occurred in row 1.
Caused by error in `if_else()`:
! `false` must be a double vector, not a logical vector.
Run `rlang::last_error()` to see where the error occurred.
Warning message:
Problem while computing `mean = if_else(...)`.
ℹ argument is not numeric or logical: returning NA
ℹ The warning occurred in row 1. 

In an ideal world, I would have a function for taking the rowwise mean for a set of columns and a given number of allowed NAs that I could repurpose.


Solution

  • We can use across to select column of interest.

    library(dplyr)
    
    dat %>% 
      mutate(mean = ifelse(rowSums(is.na(across(-colA))) > 2, 
                           NA, 
                           rowMeans(across(-colA), na.rm = T)))
    
    # A tibble: 4 × 6
       colA  colB  colC  colD  colE  mean
      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    1     1     2     3     4     5   3.5
    2     2     3    NA     4     5   4  
    3     3    NA    NA    NA     4  NA  
    4     4    NA    NA     5     6   5.5