Converting multiple columns to factors and releveling with mutate(across)


dat <- data.frame(Comp1Letter = c("A", "B", "D", "F", "U", "A*", "B", "C"),
                   Comp2Letter = c("B", "C", "E", "U", "A", "C", "A*", "E"),
                   Comp3Letter = c("D", "A", "C", "D", "F", "D", "C", "A"))  

GradeLevels <- c("A*", "A", "B", "C", "D", "E", "F", "G", "U")

I have a dataframe that looks something like the above (but with many other columns I don't want to change).

The columns I am interested in changing contain letter grades, but they are currently character vectors and the grades are not in the right order.

I need to convert each of these columns into factors with the correct order. I've been able to get this to work using the code below:

factordat <-
    dat %>%
      mutate(Comp1Letter = factor(Comp1Letter, levels = GradeLevels)) %>%
      mutate(Comp2Letter = factor(Comp2Letter, levels = GradeLevels)) %>%
      mutate(Comp3Letter = factor(Comp3Letter, levels = GradeLevels)) 

However, this is super verbose and takes up a lot of space.

Looking at some other questions, I've tried to use a combination of mutate() and across(), as seen below:

factordat <-
  dat %>%
    mutate(across(c(Comp1Letter, Comp2Letter, Comp3Letter) , factor(levels = GradeLetters))) 

However, when I do this, the columns remain character vectors.

Could someone please tell me what I'm doing wrong or offer another option?


Solution

  • You can pass an anonymous function to across(), like this:

    library(magrittr)  # provides the %>% pipe used below

    dat <- data.frame(Comp1Letter = c("A", "B", "D", "F", "U", "A*", "B", "C"),
                       Comp2Letter = c("B", "C", "E", "U", "A", "C", "A*", "E"),
                       Comp3Letter = c("D", "A", "C", "D", "F", "D", "C", "A"))  
    
    GradeLevels <- c("A*", "A", "B", "C", "D", "E", "F", "G", "U")
    
    dat %>%
      tibble::as_tibble() %>%
      dplyr::mutate(dplyr::across(
        c(Comp1Letter, Comp2Letter, Comp3Letter),
        # parse_factor() is exported by readr, not forcats
        ~ readr::parse_factor(., levels = GradeLevels)
      ))
    
    # # A tibble: 8 × 3
    #   Comp1Letter Comp2Letter Comp3Letter
    #   <fct>       <fct>       <fct>      
    # 1 A           B           D          
    # 2 B           C           A          
    # 3 D           E           C          
    # 4 F           U           D          
    # 5 U           A           F          
    # 6 A*          C           D          
    # 7 B           A*          C          
    # 8 C           E           A     
    

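    You were close. across() expects a function (or a formula) as its second argument, whereas factor(levels = ...) is a call that R evaluates immediately, without any column data. All that was left to do was wrap the factor() call in an anonymous function, either with ~ and . in the tidyverse or with function(x) and x in base R. Note also that your attempt refers to GradeLetters, while the vector is named GradeLevels.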
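
    For comparison, here is a minimal sketch of the same fix written with base factor() instead of a parsing helper, in both anonymous-function styles. The object names factordat_formula, factordat_function and factordat_helper are made up for illustration, and the ends_with("Letter") variant assumes that every column you want to convert (and no other column) ends in "Letter", which is true for the sample data but may not hold for your real data frame.

```r
library(dplyr)

dat <- data.frame(
  Comp1Letter = c("A", "B", "D", "F", "U", "A*", "B", "C"),
  Comp2Letter = c("B", "C", "E", "U", "A", "C", "A*", "E"),
  Comp3Letter = c("D", "A", "C", "D", "F", "D", "C", "A")
)

GradeLevels <- c("A*", "A", "B", "C", "D", "E", "F", "G", "U")

# Tidyverse formula shorthand: .x stands for the current column
factordat_formula <- dat %>%
  mutate(across(c(Comp1Letter, Comp2Letter, Comp3Letter),
                ~ factor(.x, levels = GradeLevels)))

# Base R anonymous function: x stands for the current column
factordat_function <- dat %>%
  mutate(across(c(Comp1Letter, Comp2Letter, Comp3Letter),
                function(x) factor(x, levels = GradeLevels)))

# Assumption: the target columns can be selected by their shared suffix
factordat_helper <- dat %>%
  mutate(across(ends_with("Letter"),
                ~ factor(.x, levels = GradeLevels)))

# Inspect the result: column classes and the level order of one column
sapply(factordat_formula, class)
levels(factordat_formula$Comp1Letter)
```

    All three calls should give the same result. Selecting by suffix saves typing when there are many grade columns, but listing the columns explicitly is safer if other columns might match the pattern.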