I am creating dummy variables where missing values are 1 and non-missing values are 0. The missing values are NA, e.g.:
NA
NA
Positive
NA
Negative
My code for one variable at a time successfully created the dummy variable:
library(dplyr)
#create new dummy variable
df <- mutate(df, newvar = ifelse(is.na(var1), 1,0))
#check
sum(df$newvar == 1)
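Here is a self-contained version of this check, built from the sample values above. It assumes the column is called var1 and swaps ifelse() for as.integer(is.na(...)), which gives the same 1/0 values as an integer vector:

```r
# Sample column built from the values listed in the question
df <- data.frame(var1 = c(NA, NA, "Positive", NA, "Negative"),
                 stringsAsFactors = FALSE)

# is.na() returns TRUE/FALSE; as.integer() turns that into 1/0
df$newvar <- as.integer(is.na(df$var1))

df$newvar             # 1 1 0 1 0
sum(df$newvar == 1)   # 3
```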
I have 4 string variables and want to create a new dummy variable that is 1 if any of these variables is missing in that row, and 0 otherwise. I tried reusing the above code in a loop:
mylist <- c("var1", "var2", "var3", "var4")
for(i in mylist){
  df <- mutate(df, newvar = ifelse(is.na(i), 1,0))
}
I know that I am using the for loop incorrectly, but is this the correct approach, or should I be doing something different?
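The loop does not work for two reasons. Inside the loop, i is the character string "var1" (then "var2", and so on), not the column itself, so is.na(i) is always FALSE and newvar ends up 0 in every row. And even with the right column, each pass overwrites newvar instead of combining it with the earlier passes. A minimal base R sketch of a working loop follows; the values in var2 to var4 are made up for illustration, and df[[v]] is used to look each column up by name:

```r
# Hypothetical data: var1 uses the sample values from the question,
# var2 to var4 are invented for illustration
df <- data.frame(
  var1 = c(NA, NA, "Positive", NA, "Negative"),
  var2 = c("Positive", NA, "Negative", "Positive", NA),
  var3 = c("Negative", "Positive", "Positive", NA, "Positive"),
  var4 = c("Positive", "Negative", "Negative", "Positive", "Negative"),
  stringsAsFactors = FALSE
)
mylist <- c("var1", "var2", "var3", "var4")

# Start with 0 for every row, then set a row to 1 whenever the
# current column is NA; earlier 1s are never reset to 0
df$newvar <- 0
for (v in mylist) {
  df$newvar[is.na(df[[v]])] <- 1
}

df$newvar   # 1 1 0 1 1
```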
We can use mutate with across:
library(dplyr) # version >= 1.0.0
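# one 0/1 indicator column per variable, named e.g. var1_newvar;
# +(...) turns TRUE/FALSE into 1/0
df <- df %>%
  mutate(across(all_of(mylist), ~ +(is.na(.)), .names = '{col}_newvar'))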
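To see what the across() version produces, here is a sketch on a small hypothetical data frame (var1 uses the sample values from the question; var2 to var4 are invented). The unary + converts the logical result of is.na() to integer 1/0, and .names = '{col}_newvar' makes across() add new columns instead of overwriting the originals:

```r
library(dplyr)  # across() needs dplyr >= 1.0.0

# Hypothetical data for illustration
df <- data.frame(
  var1 = c(NA, NA, "Positive", NA, "Negative"),
  var2 = c("Positive", NA, "Negative", "Positive", NA),
  var3 = c("Negative", "Positive", "Positive", NA, "Positive"),
  var4 = c("Positive", "Negative", "Negative", "Positive", "Negative"),
  stringsAsFactors = FALSE
)
mylist <- c("var1", "var2", "var3", "var4")

# One integer 0/1 indicator per column, appended after the originals
df <- df %>%
  mutate(across(all_of(mylist), ~ +(is.na(.)), .names = "{col}_newvar"))

names(df)
# "var1" "var2" "var3" "var4"
# "var1_newvar" "var2_newvar" "var3_newvar" "var4_newvar"
df$var1_newvar   # 1 1 0 1 0
df$var2_newvar   # 0 1 0 0 1
```

Note that this gives one flag per variable; the row-wise "any missing" flag the question asks for is a separate step.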
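If we have an earlier version of dplyr (< 1.0.0), use mutate_at. Note that this overwrites the columns in mylist with the 0/1 indicators rather than adding new columns:
df %>%
  mutate_at(vars(mylist), ~ +(is.na(.)))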
If we instead need a single new column that flags whether any of the columns in mylist is missing in a given row (which is what the question asks for):
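# is.na() on the selected columns gives a logical matrix; rowSums() counts
# the NAs per row, and +(... > 0) gives 1 if there is at least one, else 0
df1 <- df %>%
  mutate(newvar = +(rowSums(is.na(select(., all_of(mylist)))) > 0))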
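For comparison, here is a base R sketch of the same row-wise flag, which needs no dplyr, on a small hypothetical data frame (var2 to var4 are invented). complete.cases() is shown as an alternative to rowSums(is.na(...)) > 0; both give 1 when at least one selected column is NA in that row:

```r
# Hypothetical data for illustration
df <- data.frame(
  var1 = c(NA, NA, "Positive", NA, "Negative"),
  var2 = c("Positive", NA, "Negative", "Positive", NA),
  var3 = c("Negative", "Positive", "Positive", NA, "Positive"),
  var4 = c("Positive", "Negative", "Negative", "Positive", "Negative"),
  stringsAsFactors = FALSE
)
mylist <- c("var1", "var2", "var3", "var4")

# Count NAs per row across the selected columns, then flag rows with any
df$newvar <- as.integer(rowSums(is.na(df[mylist])) > 0)

# Equivalent: complete.cases() is TRUE only for rows with no NA
df$newvar2 <- as.integer(!complete.cases(df[mylist]))

df$newvar    # 1 1 0 1 1
df$newvar2   # 1 1 0 1 1
```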