When doing data analysis, I sometimes need to recode values into factors in order to carry out group analyses. I want the order of the factor levels to match the order of the conditions specified in case_when. In this case, the order should be "Excellent", "Good", "Fail". How can I achieve this without tediously repeating the levels, as in levels=c('Excellent', 'Good', 'Fail')?
Thank you very much.
library(dplyr, warn.conflicts = FALSE)
set.seed(1234)
score <- runif(100, min = 0, max = 100)
Performance <- function(x) {
  case_when(
    is.na(x) ~ NA_character_,
    x > 80 ~ 'Excellent',
    x > 50 ~ 'Good',
    TRUE ~ 'Fail'
  ) %>% factor(levels = c('Excellent', 'Good', 'Fail'))
}
performance <- Performance(score)
levels(performance)
#> [1] "Excellent" "Good"      "Fail"
table(performance)
#> performance
#> Excellent      Good      Fail
#>        15        30        55
Finally, I came up with a solution. For those who are interested, here it is. I wrote a function, fct_case_when (pretending to be a function in forcats). It is just a wrapper around case_when that returns a factor. The order of the levels is the same as the order of the arguments.
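fct_case_when <- function(...) {
  # Capture the call unevaluated; element 1 is the function name, the rest are the formulas
  args <- as.list(match.call())
  levels <- sapply(args[-1], function(f) f[[3]])  # extract RHS of formula
  levels <- levels[!is.na(levels)]                # an NA result is not a level
  factor(dplyr::case_when(...), levels = levels)
}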
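To see where the level order comes from, here is a small sketch of what match.call() captures. rhs_of is a hypothetical helper introduced only for this illustration (it is not part of dplyr or forcats). Each formula arrives unevaluated as a call to ~, so its third element is the right-hand side exactly as written.

```r
# rhs_of() is a hypothetical helper, used only to illustrate the extraction step.
rhs_of <- function(...) {
  args <- as.list(match.call())[-1]   # drop the function name, keep the formulas
  lapply(args, function(f) f[[3]])    # f is the call `~`(lhs, rhs); [[3]] is the rhs
}

# x does not need to exist here, because nothing is evaluated.
literal <- rhs_of(is.na(x) ~ NA_character_, x > 80 ~ 'Excellent')
is.na(literal[[1]])         # NA_character_ is parsed as a constant
is.character(literal[[2]])  # string literals are captured as-is

# A variable on the right-hand side is captured as a symbol, not as its value.
symbolic <- rhs_of(x > 80 ~ top_label)
is.symbol(symbolic[[1]])
```

This suggests that fct_case_when is safest when every right-hand side is a string literal or NA. A right-hand side that is a variable or a function call would not yield a usable level.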
Now I can use fct_case_when in place of case_when, and the result is the same as with the previous implementation (but less tedious).
Performance <- function(x) {
  fct_case_when(
    is.na(x) ~ NA_character_,
    x > 80 ~ 'Excellent',
    x > 50 ~ 'Good',
    TRUE ~ 'Fail'
  )
}
performance <- Performance(score)
levels(performance)
#> [1] "Excellent" "Good"      "Fail"
table(performance)
#> performance
#> Excellent      Good      Fail
#>        15        30        55
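If the right-hand sides are not all string literals, one possible variant evaluates each right-hand side in the environment of its formula before collecting the levels. This is only a sketch: fct_case_when_eval and the labels vector are names made up for this example. It assumes every argument passed through ... is a two-sided formula, and it evaluates each right-hand side twice (once here, once inside case_when).

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical variant of fct_case_when; assumes every argument is a two-sided formula.
fct_case_when_eval <- function(...) {
  formulas <- list(...)
  # Evaluate each right-hand side where its formula was created
  rhs <- lapply(formulas, function(f) eval(f[[3]], environment(f)))
  levels <- unique(unlist(rhs))      # keep the order of first appearance
  levels <- levels[!is.na(levels)]   # an NA result is not a level
  factor(dplyr::case_when(...), levels = levels)
}

# Illustrative data: the labels are stored in a variable instead of written as literals.
labels <- c(top = "Excellent", mid = "Good", low = "Fail")
x <- c(95, 60, 10, NA)

res <- fct_case_when_eval(
  is.na(x) ~ NA_character_,
  x > 80 ~ labels[["top"]],
  x > 50 ~ labels[["mid"]],
  TRUE ~ labels[["low"]]
)
levels(res)
```

For the plain string-literal case in the question, the original fct_case_when is simpler and does the same job.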