r string for-loop missing-data dummy-variable

Create variable that captures if there are missing fields in 4 string variables

I am creating dummy variables where missing values are 1 and non-missing values are 0. The missing values are NA, i.e.:

NA
NA
Positive
NA
Negative

My code for one variable at a time successfully created the dummy variable:

library(dplyr)

#create new dummy variable
df <- mutate(df, newvar = ifelse(is.na(var1), 1,0))

#check
sum(df$newvar == 1)

I have 4 string variables and want to create a new dummy variable where missing values in any of the variables are 1, and non-missing values are 0. I tried reusing the above code:

mylist <- c("var1", "var2", "var3", "var4")

for(i in mylist){
  df <- mutate(df, newvar = ifelse(is.na(i), 1,0))
}

I know that I am incorrectly using the for loop, but is this the correct approach, or should I be doing something different?

Solution

We can use mutate with across to create one indicator column per variable in 'mylist':

library(dplyr) # version >= 1.0.0  
df <- df %>%
          mutate(across(all_of(mylist), ~ +(is.na(.)), .names = '{col}_newvar'))
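To see what the across call produces, here is a small sketch on made-up data. The toy data frame and its values are assumptions for illustration only; the column names follow the question.

```r
library(dplyr) # assumes version >= 1.0.0 for across()

# Hypothetical toy data, not the asker's real data
df <- data.frame(
  var1 = c(NA, "Positive", "Negative"),
  var2 = c("Positive", NA, "Negative"),
  var3 = c("Positive", "Positive", "Negative"),
  var4 = c(NA, NA, "Negative"),
  stringsAsFactors = FALSE
)
mylist <- c("var1", "var2", "var3", "var4")

# One 0/1 indicator per column; the unary + turns TRUE/FALSE into 1/0
out <- df %>%
  mutate(across(all_of(mylist), ~ +(is.na(.)), .names = "{col}_newvar"))

names(out)      # original columns plus var1_newvar ... var4_newvar
out$var1_newvar # 1 0 0
```

The original columns are kept because .names gives the indicators their own names.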

If we have an earlier version, use mutate_at. Note that this overwrites the columns in 'mylist' with the indicators instead of adding new columns:

df %>%
   mutate_at(mylist, ~ +(is.na(.)))

If we need a single new column that flags rows with a missing value in any of the columns in 'mylist', count the NAs per row and check whether the count is above 0:

df1 <- df %>%
    mutate(newvar = +(rowSums(is.na(select(., all_of(mylist)))) > 0))
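As for the loop in the question: is.na(i) tests the column name (a string such as "var1", which is never NA) rather than the column, and each iteration overwrites newvar. If a loop is still preferred, a minimal base R sketch could look like the following. The toy data is made up for illustration, and df[[i]] is one way to pull a column by name.

```r
# Hypothetical toy data, not the asker's real data
df <- data.frame(
  var1 = c(NA, "Positive", "Negative"),
  var2 = c("Positive", NA, "Negative"),
  var3 = c("Positive", "Positive", "Negative"),
  var4 = c(NA, NA, "Negative"),
  stringsAsFactors = FALSE
)
mylist <- c("var1", "var2", "var3", "var4")

# Start at 0, then switch a row to 1 as soon as any listed column is NA;
# rows already flagged keep their 1 because the "no" branch reuses newvar
df$newvar <- 0L
for (i in mylist) {
  df$newvar <- ifelse(is.na(df[[i]]), 1L, df$newvar)
}

# Loop-free equivalent: count NAs per row and test whether there are any
flag <- +(rowSums(is.na(df[mylist])) > 0)

df$newvar # 1 1 0
```

Both versions should agree; the rowSums form is usually shorter and avoids growing the flag one column at a time.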