Tags: r, dplyr, data-analysis, tidyverse, forcats

R: convert to factor with the order of levels the same as in case_when


When doing data analysis, I sometimes need to recode values as factors in order to carry out group analysis. I want the order of the factor levels to match the order of the conditions in case_when. In this case, the order should be "Excellent", "Good", "Fail". How can I achieve this without tediously repeating the labels, as in levels=c('Excellent', 'Good', 'Fail')?

Thank you very much.


library(dplyr, warn.conflicts = FALSE)             
                                                   
set.seed(1234)                                     
score <- runif(100, min = 0, max = 100)     
   
Performance <- function(x) {                       
  case_when(                                         
    is.na(x) ~ NA_character_,                          
    x > 80   ~ 'Excellent',                            
    x > 50   ~ 'Good',                                 
    TRUE     ~ 'Fail'                                  
  ) %>% factor(levels=c('Excellent', 'Good', 'Fail'))
}                                                  
                                                   
performance <- Performance(score)                  
levels(performance)                                
#> [1] "Excellent" "Good"      "Fail"
table(performance)                                 
#> performance
#> Excellent      Good      Fail 
#>        15        30        55
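For context, factor() without a levels argument sorts the distinct values alphabetically, which is why the case_when order is lost unless the levels are repeated. A quick illustration (the labels vector is made up for the example):

```r
# Without `levels =`, factor() sorts the unique values, ignoring case_when order
labels <- c("Good", "Excellent", "Fail", "Good")
levels(factor(labels))
#> [1] "Excellent" "Fail"      "Good"
```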

Solution

  • My Solution

    Finally, I came up with a solution. For those who are interested, here it is. I wrote a function fct_case_when, named in the style of the forcats functions. It is just a wrapper around case_when that returns a factor, with the levels in the same order as the arguments.


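    fct_case_when <- function(...) {
      # Capture the unevaluated call so each `condition ~ label` formula can be inspected
      args <- as.list(match.call())
      # args[[1]] is the function name; for each formula, f[[3]] is its right-hand side (the label)
      levels <- sapply(args[-1], function(f) f[[3]])
      # Drop the NA_character_ label and de-duplicate (factor() rejects duplicated levels)
      levels <- unique(levels[!is.na(levels)])
      factor(dplyr::case_when(...), levels = levels)
    }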
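    To see why `f[[3]]` picks out the label: an unevaluated two-sided formula is a call with three elements, the `~` operator, the condition and the label. `NA_character_` is a reserved word that the parser turns into a constant rather than a symbol, so it arrives as an NA string, which the `!is.na()` step then drops. A small check with a made-up formula:

```r
    # A quoted two-sided formula is a call: `~`(condition, label)
    f <- quote(x > 80 ~ 'Excellent')
    as.character(f[[1]])          # the operator: "~"
    f[[2]]                        # the condition: x > 80
    f[[3]]                        # the label: "Excellent"
    is.na(quote(NA_character_))   # TRUE: parsed as a constant, not a symbol
```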
    

    Now I can use fct_case_when in place of case_when, and the result is the same as with the previous implementation, without listing the levels a second time.


    Performance <- function(x) {                       
      fct_case_when(                                         
        is.na(x) ~ NA_character_,                          
        x > 80   ~ 'Excellent',                            
        x > 50   ~ 'Good',                                 
        TRUE     ~ 'Fail'                                  
      )
    }      
    performance <- Performance(score)                  
    levels(performance)                       
    #> [1] "Excellent" "Good"      "Fail"
    table(performance)                
    #> performance
    #> Excellent      Good      Fail 
    #>        15        30        55
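    One limitation of the match.call() approach is that it only works when every label is written as a literal string: if a right-hand side is a variable such as `good_label`, `f[[3]]` is a symbol rather than a string. A possible variant, sketched below under the hypothetical name `fct_case_when2`, evaluates each right-hand side in the environment where its formula was written. It assumes every argument is a `condition ~ label` formula whose label is a single string or `NA`, so it does not handle vector-valued labels or the `.default` argument of recent dplyr versions.

```r
    library(dplyr, warn.conflicts = FALSE)

    # Hypothetical variant: labels come from the evaluated right-hand sides,
    # so variables work as labels too. Assumes each RHS is a single string or NA.
    fct_case_when2 <- function(...) {
      formulas <- list(...)  # forces the arguments: each becomes a formula object
      labels <- vapply(formulas, function(f) {
        # f[[3]] is the RHS expression; evaluate it where the formula was created
        as.character(eval(f[[3]], environment(f)))
      }, character(1))
      levels <- unique(labels[!is.na(labels)])  # drop NA and duplicate labels
      factor(dplyr::case_when(...), levels = levels)
    }

    good_label <- "Good"
    x <- c(90, 60, 10, NA)
    res <- fct_case_when2(
      is.na(x) ~ NA_character_,
      x > 80   ~ "Excellent",
      x > 50   ~ good_label,
      TRUE     ~ "Fail"
    )
    levels(res)
    #> [1] "Excellent" "Good"      "Fail"
```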