Search code examples
rif-statementdplyrmutated

Ifelse statement order with is.na using R Dplyr Mutate


I created some sample data below to help illustrate my question.

 library(dplyr)
 col1 = paste(rep('var',5),seq(1:5), sep = "")
 Value = c(1,1,0,NA,NA)
 p1 <- data.frame(col1,Value)

> p1
 col1 Value
 var1     1
 var2     1
 var3     0
 var4    NA
 var5    NA

When is.na(Value) is placed first in the ifelse statement, mutate works as expected.

> p1 %>% mutate(NewCol = ifelse(is.na(Value), "TestYes",
                         ifelse(Value == 1, "Test1Yes",
                         ifelse(Value == 0, "Test0Yes","No"))))
 col1 Value   NewCol
 var1     1 Test1Yes
 var2     1 Test1Yes
 var3     0 Test0Yes
 var4    NA  TestYes
 var5    NA  TestYes

When I place is.na(Value) as the second ifelse statement, it doesnt work. But the third ifelse statement still works checking for Value == 0. The second ifelse statement with is.na(Value) is skipped over.

> p1 %>% mutate(NewCol = ifelse(Value == 1, "Test1Yes",
                         ifelse(is.na(Value), "TestYes",
                         ifelse(Value == 0, "Test0Yes","No"))))
  col1 Value   NewCol
  var1     1 Test1Yes
  var2     1 Test1Yes
  var3     0 Test0Yes
  var4    NA     <NA>
  var5    NA     <NA>

Am I missing something in the code or is there a reason why is.na needs to be placed first in the ifelse statements?


Solution

  • When comparing with == NA values return NA. When the first statement returns an NA value it doesn't go and check the next ifelse statement. To go to the next ifelse statement it needs a FALSE value.

    p1$Value == 1
    #[1]  TRUE  TRUE FALSE    NA    NA
    

    A workaround would be to use %in% instead of == which returns FALSE for NA values.

    p1$Value %in% 1
    #[1]  TRUE  TRUE FALSE FALSE FALSE
    
    library(dplyr)
    
    p1 %>% mutate(NewCol = ifelse(Value %in% 1, "Test1Yes",
                                  ifelse(is.na(Value), "TestYes",
                                         ifelse(Value %in% 0, "Test0Yes","No"))))
    
    #  col1 Value   NewCol
    #1 var1     1 Test1Yes
    #2 var2     1 Test1Yes
    #3 var3     0 Test0Yes
    #4 var4    NA  TestYes
    #5 var5    NA  TestYes
    

    You can also get the desired behaviour using case_when statement instead of nested ifelse.

    p1 %>% 
      mutate(NewCol = case_when(Value == 1 ~ "Test1Yes",
                                is.na(Value) ~ "TestYes",
                                Value == 0 ~ "Test0Yes",
                                TRUE ~ "No"))
    
    #  col1 Value   NewCol
    #1 var1     1 Test1Yes
    #2 var2     1 Test1Yes
    #3 var3     0 Test0Yes
    #4 var4    NA  TestYes
    #5 var5    NA  TestYes