r dplyr tidyeval

Passing a character string to mutate


I have already looked on SO for an answer to this question, but didn't manage to find a solution to my problem.

I have a dataframe with several columns, each of which has at least one NA. Names of these columns are stored in character vector vars_na. For each of those, I would like to create a dummy variable taking value 0 if the value for that observation is missing, and 1 otherwise.

Below is a reproducible toy example and the code I have used so far:

library(dplyr)

# creation of toy dataset
iris[1:5, 1] <- rep(NA, 5)
iris[1:10, 4] <- rep(NA, 10)
vars_na <- c("Sepal.Length", "Petal.Width")

for(var in vars_na){
  iris <- iris %>% 
    mutate(dummy = ifelse(is.na(!!var), 0, 1)) %>% 
    rename_at(c("dummy"), list(~paste0("dummyna_", var)))
# 'rename_at' is just to differentiate between the several dummies created, 
# and it works correctly
}

The problem is that the newly created dummies come out as vectors full of 1s, so missing values are not detected:

head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species dummyna_Sepal.Length dummyna_Petal.Width
1           NA         3.5          1.4          NA  setosa                    1                   1
2           NA         3.0          1.4          NA  setosa                    1                   1
3           NA         3.2          1.3          NA  setosa                    1                   1
4           NA         3.1          1.5          NA  setosa                    1                   1
5           NA         3.6          1.4          NA  setosa                    1                   1
6          5.4         3.9          1.7          NA  setosa                    1                   1

but I would like to obtain

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species dummyna_Sepal.Length dummyna_Petal.Width
1           NA         3.5          1.4          NA  setosa                    0                   0
2           NA         3.0          1.4          NA  setosa                    0                   0
3           NA         3.2          1.3          NA  setosa                    0                   0
4           NA         3.1          1.5          NA  setosa                    0                   0
5           NA         3.6          1.4          NA  setosa                    0                   0
6          5.4         3.9          1.7          NA  setosa                    1                   0

The code is simple and I believed it should work. What am I doing wrong? Thanks in advance.


Solution

  • The problem is that var is a character string, so is.na(!!var) ends up as is.na("Sepal.Length"). A string literal is never NA, so the result is always FALSE and every dummy gets the value 1.

    You can use rlang::sym* to turn a character string into a symbol that mutate can evaluate as a column, for example:

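    for (var in vars_na) {
      # turn the column name (a string) into a symbol
      var_sym <- rlang::sym(var)
      # name of the new dummy column, following the naming used in the question
      new_name <- rlang::sym(paste0("dummyna_", var))

      iris <- iris %>%
        # 0 if the value is missing, 1 otherwise
        mutate(!!new_name := as.integer(!is.na(!!var_sym)))
    }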
    

    *The rlang package serves as the basis for most of the non-standard evaluation that dplyr supports; see tidy evaluation.
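
For comparison, here is a minimal sketch of two other ways to get the same dummies. It assumes dplyr 1.0.0 or later (for across()) and an rlang version that supports glue-style names on the left of :=. The names df, df_loop, df_across and col_name are illustrative only and do not come from the original answer. The sketch first checks the diagnosis above, then uses the .data pronoun inside the loop, and finally drops the loop in favour of across().

```r
library(dplyr)

# Work on a fresh copy so the built-in dataset is left untouched
df <- datasets::iris
df[1:5, "Sepal.Length"] <- NA
df[1:10, "Petal.Width"] <- NA
vars_na <- c("Sepal.Length", "Petal.Width")

# The diagnosis: is.na() on a string literal tests the string, not the column
string_check <- is.na("Sepal.Length")

# Option 1: keep the loop, but look the column up by name with .data[[ ]]
df_loop <- df
for (col_name in vars_na) {
  df_loop <- df_loop %>%
    mutate("dummyna_{col_name}" := as.integer(!is.na(.data[[col_name]])))
}

# Option 2: no loop; across() applies the function to every listed column
df_across <- df %>%
  mutate(across(all_of(vars_na),
                ~ as.integer(!is.na(.x)),
                .names = "dummyna_{.col}"))
```

Both options should produce the columns dummyna_Sepal.Length and dummyna_Petal.Width, with 0 where the original value is missing and 1 otherwise. The across() version avoids reassigning the data frame on every iteration, which may be preferable when there are many columns.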