r dplyr tidyverse forcats

Rename all levels of a factor variable: is there a tidyverse way to do it?


I occasionally need to rename all the levels of a factor variable. I know how to do this in base R, e.g. levels(factor_variable) <- levels(new_variable), but I would really like a way to do it with the tidyverse. I looked in dplyr and forcats but did not find anything that solves it. I would like to do what I achieve in Example 1, but using the %>% operator.

Example 1, with base R (which works)

my_levels <- letters
sample_data <- data.frame(factor_data = factor(sample(my_levels, size = 500, replace = TRUE),
                                               levels = my_levels),
                          Any_other_data = rnorm(500))

my_new_levels <- rnorm(length(letters))

levels(sample_data$factor_data) <- levels(factor(my_new_levels))

Example 2, one thing I tried with the tidyverse that does not work

library(tidyverse)

my_levels <- letters
sample_data <- tibble(factor_data = factor(sample(my_levels, size = 500, replace = TRUE),
                                           levels = my_levels),
                      Any_other_data = rnorm(500))

my_new_levels <- rnorm(length(letters))

# This throws an error
sample_data <- sample_data %>%
  mutate(levels(factor_data) = levels(factor(my_new_levels)))
# This also throws an error
sample_data <- sample_data %>%
  mutate(factor_data = recode(factor_data, levels(factor_data) = levels(factor(my_new_levels))))

I also tried recode, but not only is it manual (one value at a time), it also does not work with the %>% operator. These are some of the things I tried, to see what happened:

sample_data <- sample_data %>%
  recode(factor_data, a = '-2.5')

sample_data <- sample_data %>%
  recode_factor(factor_data, a = '-2.5')

recode(sample_data$factor_data, levels(sample_data$factor_data) = levels(factor(my_new_levels)))

recode(sample_data$factor_data, a = '-2.5')
recode_factor(sample_data$factor_data, a = '-2.5')

Solution

  • You can do that with a named vector and forcats::fct_recode(), where the names are the new levels and the values are the existing ones:

    library(tidyverse)
    set.seed(42)
    
    my_levels <- letters
    sample_data <- data.frame(factor_data = factor(sample(my_levels, size = 500, replace = TRUE),
                                                   levels = my_levels),
                              Any_other_data = rnorm(500))
    
    my_new_levels <- rnorm(length(letters))
    
    # create a named vector with the new levels
    named_level_vector <- levels(sample_data$factor_data)
    names(named_level_vector) <- my_new_levels
    
    # use mutate and fct_recode with that vector
    
    sample_data <- sample_data %>% 
      mutate(new_factor_data = forcats::fct_recode(factor_data, !!!named_level_vector))
    
    head(sample_data)
    #>   factor_data Any_other_data    new_factor_data
    #> 1           q     0.48236947  0.223521215874458
    #> 2           e     0.99294364  -1.12828853519737
    #> 3           a    -1.24639550  -2.55382485095083
    #> 4           y    -0.03348752   1.67099730539817
    #> 5           j    -0.07096218 -0.318990710826149
    #> 6           d    -0.75892065  -1.17990419995829
    

    Created on 2020-06-11 by the reprex package (v0.3.0)
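
If the goal is a literal pipe-friendly equivalent of levels(x) <- new_levels, forcats::lvls_revalue() may be a closer fit than building a named vector. The sketch below is not part of the answer above; it assumes the new labels are supplied as a character vector with one entry per existing level, matched by position. The names new_labels and renamed are only illustrative. It passes levels(factor(my_new_levels)) so that it mirrors Example 1 from the question, which means the labels are assigned in sorted order rather than in the order of my_new_levels.

```r
library(dplyr)
library(forcats)

set.seed(42)

my_levels <- letters
sample_data <- tibble(
  factor_data = factor(sample(my_levels, size = 500, replace = TRUE),
                       levels = my_levels),
  Any_other_data = rnorm(500)
)

my_new_levels <- rnorm(length(my_levels))

# Same replacement labels as in Example 1: a character vector,
# sorted by factor(), one label per existing level
new_labels <- levels(factor(my_new_levels))

# lvls_revalue() swaps the labels by position and leaves the
# underlying integer codes untouched
renamed <- sample_data %>%
  mutate(factor_data = lvls_revalue(factor_data, new_labels))

head(levels(renamed$factor_data))
```

To assign the labels in the original order of my_new_levels instead (as the fct_recode() answer does), as.character(my_new_levels) could be passed in place of new_labels.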