I created some sample data below to help illustrate my question.
library(dplyr)
col1 <- paste0("var", 1:5)
Value <- c(1, 1, 0, NA, NA)
p1 <- data.frame(col1, Value)
> p1
col1 Value
var1 1
var2 1
var3 0
var4 NA
var5 NA
When is.na(Value) is placed first in the ifelse statement, mutate works as expected.
> p1 %>% mutate(NewCol = ifelse(is.na(Value), "TestYes",
ifelse(Value == 1, "Test1Yes",
ifelse(Value == 0, "Test0Yes","No"))))
col1 Value NewCol
var1 1 Test1Yes
var2 1 Test1Yes
var3 0 Test0Yes
var4 NA TestYes
var5 NA TestYes
When I place is.na(Value) as the second ifelse statement, it doesn't work. The third ifelse statement, which checks Value == 0, still works, but the second ifelse statement with is.na(Value) appears to be skipped.
> p1 %>% mutate(NewCol = ifelse(Value == 1, "Test1Yes",
ifelse(is.na(Value), "TestYes",
ifelse(Value == 0, "Test0Yes","No"))))
col1 Value NewCol
var1 1 Test1Yes
var2 1 Test1Yes
var3 0 Test0Yes
var4 NA <NA>
var5 NA <NA>
Am I missing something in the code or is there a reason why is.na needs to be placed first in the ifelse statements?
Comparing an NA value with == returns NA. When the test in the first ifelse statement is NA, ifelse returns NA for that element instead of falling through to the next ifelse statement. To go to the next ifelse statement, the test needs to be FALSE.
p1$Value == 1
#[1] TRUE TRUE FALSE NA NA
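To see the NA propagation in isolation, here is a minimal sketch using only the Value vector from the question. The names test and res, and the "fallthrough" label, are just illustrative and not part of the original code.

```r
# Value column from the question's sample data
Value <- c(1, 1, 0, NA, NA)

# The comparison is NA wherever Value is NA
test <- Value == 1

# ifelse() uses "yes" where test is TRUE and "no" where it is FALSE;
# where test is NA, the result is NA and "no" is not used
res <- ifelse(test, "Test1Yes", "fallthrough")
print(res)
#[1] "Test1Yes"    "Test1Yes"    "fallthrough" NA            NA
```

Only the third element reaches the "no" branch, which is why a nested is.na() check placed there never sees the NA rows.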
A workaround would be to use %in% instead of ==, which returns FALSE for NA values.
p1$Value %in% 1
#[1] TRUE TRUE FALSE FALSE FALSE
library(dplyr)
p1 %>% mutate(NewCol = ifelse(Value %in% 1, "Test1Yes",
ifelse(is.na(Value), "TestYes",
ifelse(Value %in% 0, "Test0Yes","No"))))
# col1 Value NewCol
#1 var1 1 Test1Yes
#2 var2 1 Test1Yes
#3 var3 0 Test0Yes
#4 var4 NA TestYes
#5 var5 NA TestYes
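If you would rather keep ==, another option (not from the code above, just a sketch in base R) is to guard each comparison with !is.na(), since FALSE & NA evaluates to FALSE. The helper names is_one and is_zero are made up for illustration.

```r
# Value column from the question's sample data
Value <- c(1, 1, 0, NA, NA)

# Guard each comparison so NA rows give FALSE instead of NA
is_one  <- !is.na(Value) & Value == 1
is_zero <- !is.na(Value) & Value == 0

# Same nesting order as the failing example, now with NA-free tests
NewCol <- ifelse(is_one, "Test1Yes",
          ifelse(is.na(Value), "TestYes",
          ifelse(is_zero, "Test0Yes", "No")))
print(NewCol)
#[1] "Test1Yes" "Test1Yes" "Test0Yes" "TestYes"  "TestYes"
```

This is more verbose than %in%, but it makes the NA handling explicit.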
You can also get the desired behaviour using a case_when statement instead of nested ifelse.
p1 %>%
mutate(NewCol = case_when(Value == 1 ~ "Test1Yes",
is.na(Value) ~ "TestYes",
Value == 0 ~ "Test0Yes",
TRUE ~ "No"))
# col1 Value NewCol
#1 var1 1 Test1Yes
#2 var2 1 Test1Yes
#3 var3 0 Test0Yes
#4 var4 NA TestYes
#5 var5 NA TestYes
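The reason case_when is more forgiving is that a condition evaluating to NA is simply treated as not matching, so evaluation moves on to the next condition. A small sketch of that, assuming dplyr is installed; it deliberately leaves out the is.na() line to show where the NA rows end up (the name res is illustrative):

```r
library(dplyr)

# Value column from the question's sample data
Value <- c(1, 1, 0, NA, NA)

# No is.na() condition here: NA rows match neither comparison
# and fall through to the TRUE catch-all
res <- case_when(Value == 1 ~ "Test1Yes",
                 Value == 0 ~ "Test0Yes",
                 TRUE ~ "No")
print(res)
#[1] "Test1Yes" "Test1Yes" "Test0Yes" "No"       "No"
```

So with case_when the is.na(Value) condition can go anywhere before the TRUE catch-all, but it still has to be present if the NA rows should get their own label.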