
ggplot2: How to reorder stacked bar charts by proportions of fill variable


I'm working with the "NYC Property Sales" dataset which is available on kaggle: https://www.kaggle.com/new-york-city/nyc-property-sales?select=nyc-rolling-sales.csv

After cleaning the dataset, I produced the following barplot with this code:

nyc_clean %>%
  filter(year == 2017,
         borough == "Manhatten") %>% 
  add_count(neighborhood) %>% 
  mutate(neighborhood = fct_reorder(neighborhood, n) %>% fct_rev()) %>% 
  filter(as.numeric(neighborhood) <= 13) %>% 
  distinct(borough, block, lot, .keep_all = TRUE) %>%
  pivot_longer(c("residential_units", "commercial_units"),
               names_to = "type",
               values_to = "count") %>%
  mutate(neighborhood = fct_reorder(neighborhood, as.numeric(as.factor(type)), 
                                    mean, na.rm = TRUE)) %>% 
  ggplot(aes(neighborhood, count, fill = type)) + 
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  coord_flip() +
  theme_light()

[Image: barplot1]

I want to reorder the barplot so that the proportion of residential units is in descending order (from top to bottom). In the code above, I tried to reorder the neighborhoods with fct_reorder, but it has no effect on the plot.

As a reproducible example, consider this dataset:

df <- tibble(neighborhood = c(rep("Chelsea", 4), rep("Tribeca", 4),
                                 rep("Flatiron", 4)),
                type = c("residential_unit", "commercial_unit", "residential_unit",
                         "commercial_unit", "residential_unit", "commercial_unit",
                         "residential_unit", "commercial_unit", "residential_unit",
                         "commercial_unit", "residential_unit", "commercial_unit"),
                count = c(8, 3, 9, 1, 5, 4, 6, 3, 12, 2, 10, 1))

When I try to reorder this plot, the bars come out just as messily ordered as in my output above:

df %>% 
  mutate(neighborhood = fct_reorder(neighborhood, as.numeric(as.factor(type)), 
                                    mean, na.rm = TRUE)) %>% 
  ggplot(aes(neighborhood, count, fill = type)) + 
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  coord_flip() +
  theme_light()

[Image: barplot2]

Any ideas on what I'm missing here?


Solution

  • Hopefully this makes up in clarity for what it lacks in concision:

    df %>% 
      left_join(   # Add res_share: one residential share per neighborhood
        df %>% 
          summarise(res_share = sum(count[type == "residential_unit"]) / sum(count),
                    .by = neighborhood),
        by = "neighborhood"
      ) %>%
      # Ascending level order; coord_flip() draws the last level at the top,
      # so the largest residential share ends up as the top bar
      mutate(neighborhood = fct_reorder(neighborhood, res_share)) %>% 
      ggplot(aes(neighborhood, count, fill = type)) + 
      geom_col(position = "fill") +
      scale_y_continuous(labels = scales::percent) +
      coord_flip() +
      theme_light()
    

    (Edited in 2024 to use the dplyr 1.1.0+ .by syntax, which is cleaner than the group_by(neighborhood) %>% ... %>% ungroup() syntax I had used originally.)
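
As for why the original fct_reorder call had no visible effect, here is a small check on the example data. It is only a sketch: the object names (type_code_mean, res_share, ordered_levels) are made up for illustration, and it assumes dplyr 1.1.0 or later for the .by argument. The value the question sorts on, mean(as.numeric(as.factor(type))), depends only on how many rows of each type a neighborhood has, not on count. In the example every neighborhood has two rows of each type, so the value is 1.5 for all of them and there is nothing to sort on. The same would presumably hold after the pivot_longer step in the real data, since that step produces exactly one row of each type per property.

```r
library(dplyr)
library(tibble)
library(forcats)

# Same example data as in the question, written more compactly
df <- tibble(
  neighborhood = rep(c("Chelsea", "Tribeca", "Flatiron"), each = 4),
  type  = rep(c("residential_unit", "commercial_unit"), times = 6),
  count = c(8, 3, 9, 1, 5, 4, 6, 3, 12, 2, 10, 1)
)

# The summary value used in the question: the mean of the integer codes of
# `type` (commercial_unit = 1, residential_unit = 2). It ignores `count`.
type_code_mean <- df %>%
  summarise(code_mean = mean(as.numeric(as.factor(type))), .by = neighborhood)

# One residential share per neighborhood, which does differ between bars
res_share <- df %>%
  summarise(res_share = sum(count[type == "residential_unit"]) / sum(count),
            .by = neighborhood)

# Level order after reordering by that share (ascending)
ordered_levels <- res_share %>%
  mutate(neighborhood = fct_reorder(neighborhood, res_share)) %>%
  pull(neighborhood) %>%
  levels()

print(type_code_mean)   # code_mean is 1.5 in every row
print(res_share)
print(ordered_levels)   # "Tribeca" "Chelsea" "Flatiron"
```

Because coord_flip() draws the last factor level at the top, this ascending order puts Flatiron (the largest residential share) at the top and Tribeca at the bottom, which is the descending top-to-bottom order asked for.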