Search code examples
rmeancase-when

R | Mutate with condition for multiple columns


I want to calculate the mean in a row if at least three out of six observations in the row are != NA. If four or more NA´s are present, the mean should show NA.

Example which gives me the mean, ignoring the NA´s:

require(dplyr)

a <- 1:10
b <- a+10
c <- a+20
d <- a+30
e <- a+40
f <- a+50

df <- data.frame(a,b,c,d,e,f)

df[2,c(1,3,4,6)] <- NA
df[5,c(1,4,6)] <- NA
df[8,c(1,2,5,6)] <- NA


df <- df %>% mutate(mean = rowMeans(df[,1:6], na.rm=TRUE))

I thought about the use of

case_when

but i´m not sure how to use it correctly:

df <- df %>% mutate(mean = case_when( ~ rowMeans(df[,1:6], na.rm=TRUE), TRUE ~ NA))

Solution

  • You can try a base R solution saving the number of non NA values in a new variable and then use ifelse() for the mean:

    #Data
    a <- 1:10
    b <- a+10
    c <- a+20
    d <- a+30
    e <- a+40
    f <- a+50
    
    df <- data.frame(a,b,c,d,e,f)
    
    df[2,c(1,3,4,6)] <- NA
    df[5,c(1,4,6)] <- NA
    df[8,c(1,2,5,6)] <- NA
    #Code
    #Count number of non NA
    df$count <- rowSums( !is.na( df [,1:6]))
    #Compute mean
    df$Mean <- ifelse(df$count>=3,rowMeans(df [,1:6],na.rm=T),NA)
    

    Output:

        a  b  c  d  e  f count     Mean
    1   1 11 21 31 41 51     6 26.00000
    2  NA 12 NA NA 42 NA     2       NA
    3   3 13 23 33 43 53     6 28.00000
    4   4 14 24 34 44 54     6 29.00000
    5  NA 15 25 NA 45 NA     3 28.33333
    6   6 16 26 36 46 56     6 31.00000
    7   7 17 27 37 47 57     6 32.00000
    8  NA NA 28 38 NA NA     2       NA
    9   9 19 29 39 49 59     6 34.00000
    10 10 20 30 40 50 60     6 35.00000