
Reordering factor levels by a per-group summary of another variable using the dplyr and forcats packages


I am trying to reorder the levels of a factor based on values computed from another variable. Here is an example table:

library(tidyverse)

set.seed(1)
df = tibble(
  group = factor(rep(c("a", "b", "c", "d"), each = 5)),
  x = c(rnorm(5, 0, 1), rnorm(5, 0, 2), rnorm(5, 0, 1.5), rnorm(5, 0, 3))
)

I would like to reorder the levels of the group factor by decreasing standard deviation of the variable x within each group.

I managed to do it like this:

lev = df %>% group_by(group) %>% 
  summarise(sd = sd(x)) %>% 
  arrange(desc(sd))

df = df %>% mutate(group = fct_relevel(group, as.character(lev$group)))

However, I don't like this solution because it requires creating an auxiliary lev table, which I would like to avoid. Does anyone know how to achieve this in a simpler, more transparent way that is idiomatic for dplyr?


Solution

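  • What you are looking for is forcats::fct_reorder(), which reorders the levels of a factor by a summary function (here sd) applied to another variable within each level:

    df = df %>% mutate(group = fct_reorder(group, x, sd, .desc = TRUE))
    # Check: groups are listed in level order, so sd should decrease down the table
    df %>% group_by(group) %>% summarise(sd = sd(x))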
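
As a quick sanity check, the following sketch rebuilds the example data from the question and confirms that the new level order matches decreasing standard deviation. The helper name sds is only illustrative, and the check via tapply() is one possible way to verify the result, not part of the original answer.

```r
library(tidyverse)

# Same example data as in the question
set.seed(1)
df <- tibble(
  group = factor(rep(c("a", "b", "c", "d"), each = 5)),
  x = c(rnorm(5, 0, 1), rnorm(5, 0, 2), rnorm(5, 0, 1.5), rnorm(5, 0, 3))
)

# Reorder the levels of group by the standard deviation of x, largest first
df <- df %>% mutate(group = fct_reorder(group, x, .fun = sd, .desc = TRUE))

# Per-group standard deviations; tapply() returns them in factor level order
sds <- tapply(df$x, df$group, sd)

# Passes silently when the standard deviations never increase across the levels
stopifnot(all(diff(as.numeric(sds)) <= 0))

# Inspect the new level order
levels(df$group)
```

Because fct_reorder() is called inside mutate(), no auxiliary table is needed, and only the level order changes while the rows of df stay where they are.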