Tags: r, string, for-loop, missing-data, dummy-variable

Create variable that captures if there are missing fields in 4 string variables


I am creating dummy variables where missing values are 1 and non-missing values are 0. The missing values are NA, for example:

NA
NA
Positive
NA
Negative

My code for one variable at a time successfully created the dummy variable:

library(dplyr)

#create new dummy variable
df <- mutate(df, newvar = ifelse(is.na(var1), 1,0))

#check
sum(df$newvar == 1)

I have 4 string variables and want to create a new dummy variable where missing values in any of the variables are 1, and non-missing values are 0. I tried reusing the above code:

mylist <- c("var1", "var2", "var3", "var4")

for(i in mylist){
  df <- mutate(df, newvar = ifelse(is.na(i), 1,0))
}

I know that I am incorrectly using the for loop, but is this the correct approach, or should I be doing something different?


Solution

  • We can use mutate with across to create one 0/1 indicator column per variable:

    library(dplyr) # version >= 1.0.0  
    df <- df %>%
              mutate(across(all_of(mylist), ~ +(is.na(.)), .names = '{col}_newvar'))
    

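    If we have an earlier version of dplyr, use mutate_at. Note that, as written, this replaces the original columns with the 0/1 indicators and does not assign the result: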

    df %>%
       mutate_at(vars(mylist), ~  +(is.na(.)))
    

    If we need a single new column that flags whether any of the columns in 'mylist' has a missing value:

    df1 <- df %>%
        mutate(newvar = +(rowSums(is.na(select(., all_of(mylist)))) > 0))
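
As for why the loop in the question did not work: inside the loop, i is the column name as a string (e.g. "var1"), so is.na(i) tests that string rather than the column, and newvar is overwritten on every pass. Here is a minimal sketch on made-up data (the df below is hypothetical, not the asker's) showing a loop that does work, plus two loop-free equivalents. The if_any() version assumes dplyr >= 1.0.4.

```r
library(dplyr)  # if_any() is assumed to be available (dplyr >= 1.0.4)

# Hypothetical toy data standing in for the asker's df
df <- data.frame(
  var1 = c(NA, "Positive", "Negative", NA),
  var2 = c("Positive", "Positive", NA, NA),
  var3 = c("Negative", "Positive", "Positive", NA),
  var4 = c("Negative", "Negative", "Positive", "Positive"),
  stringsAsFactors = FALSE
)
mylist <- c("var1", "var2", "var3", "var4")

# The original loop tested the name, not the column: this is always FALSE
is.na("var1")

# A working loop: look the column up by name and accumulate with OR
flag <- rep(FALSE, nrow(df))
for (i in mylist) {
  flag <- flag | is.na(df[[i]])
}
df$newvar_loop <- as.integer(flag)

# Loop-free, dplyr: TRUE if any of the selected columns is NA in that row
df <- df %>%
  mutate(newvar = as.integer(if_any(all_of(mylist), is.na)))

# Loop-free, base R: count NAs per row and compare with 0
df$newvar_base <- as.integer(rowSums(is.na(df[mylist])) > 0)

df$newvar
# 1 0 1 1
```

All three columns should agree; which one to use is mostly a matter of taste, although the loop-free versions are closer to how dplyr code is usually written.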