Tags: r, dplyr, purrr, forcats

Iterating over reference levels using relevel() or fct_relevel()


I have several categorical variables in a dataset whose reference levels I must change manually. I'd like to iterate over the variables and their corresponding reference levels so that I don't have to copy and paste the same line dozens of times.

I've tried combining mutate_at with relevel()/fct_relevel(), passing in a vector of my desired reference levels, but this didn't work. I haven't found any other questions that address the iteration part of my problem specifically. Please refer to the toy example below.

library(tidyverse)

mtcars <- as_tibble(mtcars)

#this step is for the autofill functionality within `select`
#handy for larger collections of variables
cars_factors <- mtcars %>% select(cyl, gear, carb) %>% names()

factor_lvls <- mtcars %>% 
  mutate_at(cars_factors, factor) %>% 
  select(cars_factors) 

#Before releveling, levels are taken in ascending numerical order
factor_lvls %>% map(unique)
#> $cyl
#> [1] 6 4 8
#> Levels: 4 6 8
#> 
#> $gear
#> [1] 4 3 5
#> Levels: 3 4 5
#> 
#> $carb
#> [1] 4 1 2 3 6 8
#> Levels: 1 2 3 4 6 8

#changing reference levels
factor_lvls$cyl <- relevel(factor_lvls$cyl, ref = "8")
factor_lvls$gear <- relevel(factor_lvls$gear, ref = "5")
factor_lvls$carb <- relevel(factor_lvls$carb, ref = "3")

#note, reference level order has changed. the first level now reflects ref levels above
factor_lvls %>% map(unique)
#> $cyl
#> [1] 6 4 8
#> Levels: 8 4 6
#> 
#> $gear
#> [1] 4 3 5
#> Levels: 5 3 4
#> 
#> $carb
#> [1] 4 1 2 3 6 8
#> Levels: 3 1 2 4 6 8

#my attempt
factor_lvls %>% mutate_at(cars_factors, fct_relevel(., c("8", "5", "3")))
#> Error: `f` must be a factor (or character vector or numeric vector).

Created on 2019-07-02 by the reprex package (v0.2.1)

My intent is to perform the desired action of changing reference levels without explicitly copying and pasting the code for each step. This example's levels are numbers, but my actual problem contains a mix of numbers and strings, so I can't rely on ascending order to get the reference levels right.


Solution

  • Since you essentially want to apply the same function to each column with a different argument per column, I think using map2 makes more sense here. How about:

    map2_df(factor_lvls %>% select(all_of(cars_factors)), c("8", "5", "3"), ~fct_relevel(.x, .y))
    

    This moves each supplied reference level to the front of its column's levels. Note that it creates a new data.frame rather than updating the existing one, so if you want to merge the result into some other table, you can bind_cols() the data together.
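
If you would rather update the columns in place than build a separate data frame, one possible alternative is to keep the reference levels in a named vector and look each one up by column name. This is only a sketch, assuming dplyr >= 1.0.0 (for across() and cur_column()); the names ref_levels and releveled are made up for illustration and do not come from the question.

```r
library(dplyr)
library(forcats)

# Hypothetical lookup table: column name -> desired reference level
ref_levels <- c(cyl = "8", gear = "5", carb = "3")

releveled <- as_tibble(mtcars) %>%
  # convert only the columns named in ref_levels to factors
  mutate(across(all_of(names(ref_levels)), factor)) %>%
  # move each column's own reference level to the front;
  # cur_column() returns the name of the column being processed
  mutate(across(
    all_of(names(ref_levels)),
    ~ fct_relevel(.x, ref_levels[[cur_column()]])
  ))

levels(releveled$cyl)
#> [1] "8" "4" "6"
```

Because the lookup is by name rather than by position, the order of ref_levels does not need to match the column order in the data, and the other columns of the table are left untouched.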