r dplyr tidyverse forcats

Reorder factors by a group statistic


I know this should be straightforward, but it always bites me.
Suppose I have a factor:

library(dplyr)
library(forcats)
fruits <- as.factor(c("apples", "oranges", "oranges", "pears", "pears", "pears"))
df <- as.data.frame(fruits)

I want to reorder the factor levels according to their frequency (or some other statistic) so that pears > oranges > apples. How do I do that without explicitly calling df %>% group_by(fruits) %>% summarise(freq = n()) %>% fct_reorder(fruits, freq, .desc = TRUE)?


Solution

  • fct_reorder works on a vector, not a data frame, so it has to be called inside mutate.

    library(dplyr)
    library(forcats)
    out <- df %>% 
       group_by(fruits) %>% 
       summarise(freq = n(), .groups = 'drop') %>% 
       mutate(fruits = fct_reorder(fruits, freq, .desc = TRUE))
    

    Checking the order of levels:

    levels(out$fruits)
    [1] "pears"   "oranges" "apples" 
    levels(df$fruits)
    [1] "apples"  "oranges" "pears"  
    

    To do this on the original dataset, use add_count instead of summarise to create a frequency column (named n by default), apply fct_reorder, and then drop the helper column:

    df <- df %>% 
        add_count(fruits) %>% 
        mutate(fruits = fct_reorder(fruits, n, .desc = TRUE)) %>% 
        select(-n)
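
    If only frequency matters, forcats also provides fct_infreq, which orders levels by how often they occur (most frequent first), so no helper count column is needed. A minimal sketch, assuming the same fruits data as in the question; df2 is just an illustrative name:

    ```r
    library(dplyr)
    library(forcats)

    # Rebuild the example data from the question
    fruits <- as.factor(c("apples", "oranges", "oranges", "pears", "pears", "pears"))
    df2 <- data.frame(fruits = fruits)

    # fct_infreq() counts the observations per level and sorts the levels
    # by decreasing frequency
    df2 <- df2 %>%
        mutate(fruits = fct_infreq(fruits))

    levels(df2$fruits)
    ```

    This covers only the frequency case; for any other statistic, fct_reorder is still the tool to use.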
    

    NOTE: group_by in dplyr 1.0.6 has no .desc argument; .desc is an argument of fct_reorder.


    In base R, we can do this with table and order:

    out1 <- table(fruits)
    factor(fruits, levels = names(out1[order(-out1)]))
    [1] apples  oranges oranges pears   pears   pears  
    Levels: pears oranges apples
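
For the "some other statistic" case mentioned in the question, fct_reorder can compute the per-level summary itself through its .fun argument (median by default), so a separate group_by/summarise step is not needed. The sketch below assumes a numeric column to summarise; the weight column and its values are invented purely for illustration, and df3 is a hypothetical name:

```r
library(dplyr)
library(forcats)

# Hypothetical data: 'weight' is an invented column for illustration only
df3 <- data.frame(
    fruits = factor(c("apples", "oranges", "oranges", "pears", "pears", "pears")),
    weight = c(150, 200, 220, 180, 170, 160)
)

# Order the levels by the mean weight within each level, largest first
df3 <- df3 %>%
    mutate(fruits = fct_reorder(fruits, weight, .fun = mean, .desc = TRUE))

levels(df3$fruits)
```

Any function that returns a single value per group can be passed as .fun.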