Tags: r, refactoring, tidyverse, forcats

fct_reorder by function for only one group


I have a data frame of public and private schools within counties, each with an assigned value. I want to use forcats::fct_reorder to reorder the counties by median value, but using only the private schools. By default, forcats::fct_reorder orders by the median across both school types, which is less useful for what I'm doing.

Reprex here:

# load packages (dplyr, ggplot2, forcats) and make df
library(tidyverse)
set.seed(1)
df <-
  data.frame(
    county = rep(c("Bexar","Travis","Tarrant","Aransas"), each=20),
    type = rep(c("public","private"), each=10)
  ) %>%
  mutate(value = case_when(type == "public" ~ runif(80,0,1),
                           type == "private" ~ runif(80, 0, 10))) 
# private values are way higher than public


# relevel by median value
df %>%
  mutate(county = forcats::fct_reorder(county, value, .fun=median)) %>% 
  # this rearranges counties by total median, but I only want to arrange by median of the private schools
  
  # plot
  ggplot(aes(x=county, y = value, color = type)) +
  geom_point(position = position_dodge(
    width=.75
  )) +
  geom_boxplot(alpha=.5)

Desired output would order them by increasing median of private schools only: Aransas, Travis, Tarrant, Bexar.

thanks!


Solution

    library(tidyverse)
    
    set.seed(1)
    
    df <-
      data.frame(
        county = rep(c("Bexar","Travis","Tarrant","Aransas"), each=20),
        type = rep(c("public","private"), each=10)
      ) %>%
      mutate(value = case_when(type == "public" ~ runif(80,0,1),
                               type == "private" ~ runif(80, 0, 10))) 
    
    private_medians <-
      df %>%
      filter(type == "private") %>%
      group_by(county) %>%
      summarise(median = median(value)) %>%
      arrange(median)
    private_medians
    #> # A tibble: 4 x 2
    #>   county  median
    #>   <chr>    <dbl>
    #> 1 Aransas   3.91
    #> 2 Travis    4.39
    #> 3 Tarrant   5.68
    #> 4 Bexar     6.24
    
    # add other counties at the end in case they do not appear in the private subset
    levels <- private_medians$county %>% union(df$county %>% unique())
    
    df %>%
      mutate(county = county %>% factor(levels = levels)) %>%
      ggplot(aes(x=county, y = value, color = type)) +
      geom_point(position = position_dodge(
        width=.75
      )) +
      geom_boxplot(alpha=.5)
    

    Created on 2021-10-18 by the reprex package (v2.0.1)
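
If you would rather keep the reordering inside forcats, one possible alternative is fct_reorder2(), which hands two vectors to the summary function for each level. The sketch below uses a small made-up data set (not the random data from the question) so the result is easy to check by hand, and private_median is a hypothetical helper name introduced only for this illustration.

```r
library(forcats)

# Toy data with made-up values (not the question's random data).
schools <- data.frame(
  county = rep(c("A", "B", "C"), each = 4),
  type   = rep(c("public", "private"), times = 6),
  value  = c(9, 1, 9, 2,   # A: private median 1.5, overall median 5.5
             1, 5, 1, 6,   # B: private median 5.5, overall median 3
             5, 3, 5, 4)   # C: private median 3.5, overall median 4.5
)

# Hypothetical helper: median of the values whose type is "private".
private_median <- function(value, type) median(value[type == "private"])

# Default behaviour: order counties by the median over both school types.
by_overall <- fct_reorder(schools$county, schools$value, .fun = median)

# fct_reorder2() passes both value and type to .fun for each county;
# .desc = FALSE gives increasing order (its default is decreasing).
by_private <- fct_reorder2(schools$county, schools$value, schools$type,
                           .fun = private_median, .desc = FALSE)

levels(by_overall)
#> [1] "B" "C" "A"
levels(by_private)
#> [1] "A" "C" "B"
```

On the question's df the same call should slot into the pipeline as mutate(county = fct_reorder2(county, value, type, .fun = private_median, .desc = FALSE)). A county with no private schools would get a missing summary value, so where it lands in the ordering may depend on the forcats version; the explicit union() of levels above avoids that ambiguity.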