
dplyr 0.7 tidy eval: convert character variables to factors


I have a dataset with many variables, some of which are character variables that I would like to convert to factors. Since there are many variables to convert, I would like to do this using the new tidy eval functionality from dplyr 0.7. Here is a minimal example of my data:

data <- data.frame(factor1 = c("K", "V"), 
                   factor2 = c("E", "K"), 
                   other_var = 1:2, 
                   stringsAsFactors = FALSE)

I have a named list containing a data.frame for each variable which I want to convert. These data.frames in the list all have the same structure which can be seen in this example:

codelist_list <- list(factor1 = data.frame(Code = c("K", "V"), 
                                           Bezeichnung = c("Kauf", "Verkauf"), 
                                           stringsAsFactors = FALSE),
                      factor2 = data.frame(Code = c("E", "K"), 
                                           Bezeichnung = c("Eigengeschaeft", "Kundengeschaeft"), 
                                           stringsAsFactors = FALSE))

What I do not want to do is to define the factors like this for each variable:

mutate(data, factor1 = factor(factor1, 
                              levels = codelist_list[["factor1"]][["Code"]],
                              labels = codelist_list[["factor1"]][["Bezeichnung"]]))

What I have tried so far is the following:

convert_factors <- function(variable, df) {
  factor_variable <- enquo(variable)
  df %>% 
    mutate(!!quo_name(factor_variable) := factor(!!quo_name(factor_variable), 
                                                 levels = codelist_list[[variable]][["Code"]],
                                                 labels = codelist_list[[variable]][["Bezeichnung"]]))
}

In a first step, I want to check if my function convert_factors() works properly by calling convert_factors("factor1", data) which returns

  factor1 factor2 other_var
1    <NA>       E         1
2    <NA>       K         2

The variable does not show the value labels, but is replaced by NA instead.

The ultimate goal would be to map over all variables which I want to convert. Here, I tried map(c("factor1", "factor2"), convert_factors, df = data), which returned

Error in (function (x, strict = TRUE) : the argument has already been evaluated

I tried to follow the instructions from http://dplyr.tidyverse.org/articles/programming.html, but this is all I came up with.

Does anyone know where the problem is, and could you explain my error to me?


Solution

  • You could approach this with mutate_at, using the . placeholder within funs to apply a function to multiple columns at once.

    This approach still uses tidy eval to pull the correct data.frame from codelist_list: inside funs, quo_name(quo(.)) yields the name of the column that . stands for, and that name is then used to index codelist_list.

    mutate_at(data, c("factor1", "factor2"), 
              funs( factor(., levels = codelist_list[[quo_name(quo(.))]][["Code"]],
                          labels = codelist_list[[quo_name(quo(.))]][["Bezeichnung"]]) ) )
    
      factor1         factor2 other_var
    1    Kauf  Eigengeschaeft         1
    2 Verkauf Kundengeschaeft         2
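
    On newer dplyr releases, funs() is deprecated (since dplyr 0.8.0), so the same idea may be easier to express with across() and cur_column(). The following is a minimal sketch rather than part of the original answer; it assumes dplyr 1.0.0 or later and rebuilds data and codelist_list from the question. The name converted is only illustrative.

    ```r
    library(dplyr)

    # Example data and code lists, copied from the question.
    data <- data.frame(factor1 = c("K", "V"),
                       factor2 = c("E", "K"),
                       other_var = 1:2,
                       stringsAsFactors = FALSE)

    codelist_list <- list(
      factor1 = data.frame(Code = c("K", "V"),
                           Bezeichnung = c("Kauf", "Verkauf"),
                           stringsAsFactors = FALSE),
      factor2 = data.frame(Code = c("E", "K"),
                           Bezeichnung = c("Eigengeschaeft", "Kundengeschaeft"),
                           stringsAsFactors = FALSE))

    # Assumption: dplyr >= 1.0.0. cur_column() returns the name of the column
    # currently being processed by across(), which is used to pick the
    # matching code list.
    converted <- data %>%
      mutate(across(all_of(names(codelist_list)),
                    ~ factor(.x,
                             levels = codelist_list[[cur_column()]][["Code"]],
                             labels = codelist_list[[cur_column()]][["Bezeichnung"]])))
    ```

    Selecting the columns with names(codelist_list) keeps the column list and the code lists in sync, so other_var is left untouched.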
    

    If you wanted to make a function to pass to mutate_at, you can do so, with a few slight changes.

    convert_factors = function(variable) {
         var2 = enquo(variable)
         factor(variable, levels = codelist_list[[quo_name(var2)]][["Code"]],
                labels = codelist_list[[quo_name(var2)]][["Bezeichnung"]]) 
    }
    
    mutate_at(data, c("factor1", "factor2"), convert_factors)
    
      factor1         factor2 other_var
    1    Kauf  Eigengeschaeft         1
    2 Verkauf Kundengeschaeft         2
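
As for the NA values in the original attempt, here is a hedged sketch of what probably goes wrong. quo_name() returns a string, so !!quo_name(factor_variable) inside factor() unquotes the literal string "factor1" rather than the column. factor() then receives a value that matches none of the levels and returns NA, which mutate() recycles down the column. Because the function already receives the column name as a string, enquo() is not needed. Converting the string to a symbol with rlang::sym() is one way to refer to the column. The helper name convert_factor and its (df, variable) argument order are assumptions, chosen so that base R's Reduce() can thread the data frame through each variable.

```r
library(dplyr)

# Example data and code lists, copied from the question.
data <- data.frame(factor1 = c("K", "V"),
                   factor2 = c("E", "K"),
                   other_var = 1:2,
                   stringsAsFactors = FALSE)

codelist_list <- list(
  factor1 = data.frame(Code = c("K", "V"),
                       Bezeichnung = c("Kauf", "Verkauf"),
                       stringsAsFactors = FALSE),
  factor2 = data.frame(Code = c("E", "K"),
                       Bezeichnung = c("Eigengeschaeft", "Kundengeschaeft"),
                       stringsAsFactors = FALSE))

# What the original mutate() call effectively evaluated: the string
# "factor1" is not one of the levels, so the result is NA.
na_demo <- factor("factor1",
                  levels = c("K", "V"),
                  labels = c("Kauf", "Verkauf"))

# Hypothetical helper: takes the column name as a string, turns it into a
# symbol for the right-hand side, and uses the string as the new column name.
convert_factor <- function(df, variable) {
  column <- rlang::sym(variable)
  df %>%
    mutate(!!variable := factor(!!column,
                                levels = codelist_list[[variable]][["Code"]],
                                labels = codelist_list[[variable]][["Bezeichnung"]]))
}

# Apply the helper to each variable in turn, starting from data.
converted <- Reduce(convert_factor, c("factor1", "factor2"), data)
```

Reduce() is used here instead of map() because each step has to build on the data frame returned by the previous one; mapping over the names would return one separate data frame per variable.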