I have a dataset with many variables. Some of them are character variables that I would like to convert to factors. Since there are many variables to convert, I would like to do this using the new tidy eval functionality from dplyr 0.7. Here is a minimal example from my data:
    data <- data.frame(factor1 = c("K", "V"),
                       factor2 = c("E", "K"),
                       other_var = 1:2,
                       stringsAsFactors = FALSE)
I have a named list containing one data.frame for each variable that I want to convert. The data.frames in the list all have the same structure, as this example shows:
    codelist_list <- list(factor1 = data.frame(Code = c("K", "V"),
                                               Bezeichnung = c("Kauf", "Verkauf"),
                                               stringsAsFactors = FALSE),
                          factor2 = data.frame(Code = c("E", "K"),
                                               Bezeichnung = c("Eigengeschaeft", "Kundengeschaeft"),
                                               stringsAsFactors = FALSE))
What I do not want to do is to define the factors like this for each variable:
    mutate(data, factor1 = factor(factor1,
                                  levels = codelist_list[["factor1"]][["Code"]],
                                  labels = codelist_list[["factor1"]][["Bezeichnung"]]))
What I have tried so far is the following:
    convert_factors <- function(variable, df) {
      factor_variable <- enquo(variable)
      df %>%
        mutate(!!quo_name(factor_variable) := factor(!!quo_name(factor_variable),
                                                     levels = codelist_list[[variable]][["Code"]],
                                                     labels = codelist_list[[variable]][["Bezeichnung"]]))
    }
As a first step, I want to check whether my function convert_factors() works properly by calling convert_factors("factor1", data), which returns
      factor1 factor2 other_var
    1    <NA>       E         1
    2    <NA>       K         2
The variable does not show the value labels; its values are replaced by NA instead.
The ultimate goal is to map over all variables that I want to convert. Here, I tried map(c("factor1", "factor2"), convert_factors, df = data), which returned

    Error in (function (x, strict = TRUE) : the argument has already been evaluated
I tried to follow the instructions from http://dplyr.tidyverse.org/articles/programming.html, but this is all I came up with.
Does anyone know where the problem is and can hopefully explain my error to me?
You could approach this with mutate_at, using the dot (.) placeholder within funs to apply a function to multiple columns at once. This approach still uses tidy eval to pull the correct entry from codelist_list while referring to the column through the placeholder.
    mutate_at(data, c("factor1", "factor2"),
              funs(factor(., levels = codelist_list[[quo_name(quo(.))]][["Code"]],
                          labels = codelist_list[[quo_name(quo(.))]][["Bezeichnung"]])))

      factor1         factor2 other_var
    1    Kauf  Eigengeschaeft         1
    2 Verkauf Kundengeschaeft         2
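On newer dplyr versions, where funs() is deprecated, the same per-column lookup can be sketched with across() and cur_column(). This is an alternative to the call above rather than part of the original approach. It assumes dplyr 1.0.0 or later, and it repeats the example data from the question so that it runs on its own.

```r
library(dplyr)

# Example data and code lists, repeated from the question
data <- data.frame(factor1 = c("K", "V"),
                   factor2 = c("E", "K"),
                   other_var = 1:2,
                   stringsAsFactors = FALSE)

codelist_list <- list(
  factor1 = data.frame(Code = c("K", "V"),
                       Bezeichnung = c("Kauf", "Verkauf"),
                       stringsAsFactors = FALSE),
  factor2 = data.frame(Code = c("E", "K"),
                       Bezeichnung = c("Eigengeschaeft", "Kundengeschaeft"),
                       stringsAsFactors = FALSE)
)

# cur_column() gives the name of the column that across() is currently
# processing, so it can be used to look up the matching code list.
result <- data %>%
  mutate(across(all_of(names(codelist_list)),
                ~ factor(.x,
                         levels = codelist_list[[cur_column()]][["Code"]],
                         labels = codelist_list[[cur_column()]][["Bezeichnung"]])))

result
```

Selecting columns with all_of(names(codelist_list)) means every variable that has a code list is converted, without listing the names by hand.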
If you want to write a function to pass to mutate_at, you can do so with a few slight changes.
    convert_factors = function(variable) {
      var2 = enquo(variable)
      factor(variable, levels = codelist_list[[quo_name(var2)]][["Code"]],
             labels = codelist_list[[quo_name(var2)]][["Bezeichnung"]])
    }
    mutate_at(data, c("factor1", "factor2"), convert_factors)

      factor1         factor2 other_var
    1    Kauf  Eigengeschaeft         1
    2 Verkauf Kundengeschaeft         2
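As for why the attempt in the question returned NA: the function is called with a string, so quo_name(factor_variable) is just "factor1", and !! splices that string into the factor() call. factor() then receives the literal value "factor1" instead of the column, and because "factor1" is not one of the levels, the result is NA, which mutate recycles to every row. The map() error appears to have the same root, since enquo() is meant to capture an unevaluated argument and here it is handed a value that has already been evaluated. The sketch below reproduces the NA in base R and shows one possible string-based rewrite using the .data pronoun. The function name convert_factor_by_name is made up for illustration, and the example data is repeated so that the block runs on its own.

```r
library(dplyr)

# Example data and code lists, repeated from the question
data <- data.frame(factor1 = c("K", "V"),
                   factor2 = c("E", "K"),
                   other_var = 1:2,
                   stringsAsFactors = FALSE)

codelist_list <- list(
  factor1 = data.frame(Code = c("K", "V"),
                       Bezeichnung = c("Kauf", "Verkauf"),
                       stringsAsFactors = FALSE),
  factor2 = data.frame(Code = c("E", "K"),
                       Bezeichnung = c("Eigengeschaeft", "Kundengeschaeft"),
                       stringsAsFactors = FALSE)
)

# What the original mutate() call effectively evaluated: the literal string
# "factor1" is not among the levels, so the result is NA.
na_demo <- factor("factor1",
                  levels = codelist_list[["factor1"]][["Code"]],
                  labels = codelist_list[["factor1"]][["Bezeichnung"]])
is.na(na_demo)

# Hypothetical string-based variant: enquo() is not needed because the
# column name already arrives as a string. .data[[variable]] fetches the
# column, and !!variable := sets the name of the output column.
convert_factor_by_name <- function(df, variable) {
  df %>%
    mutate(!!variable := factor(.data[[variable]],
                                levels = codelist_list[[variable]][["Code"]],
                                labels = codelist_list[[variable]][["Bezeichnung"]]))
}

# Apply the conversion to every column that has a code list
converted <- data
for (v in names(codelist_list)) {
  converted <- convert_factor_by_name(converted, v)
}

converted
```

A plain loop is used here instead of map() because each step has to build on the data.frame returned by the previous one; map() would instead return one separate data.frame per variable.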