Search code examples
rggplot2tidytext

In ggplot, how to order bars from high-to-low, within-facet, and still relocate one specific bar to position of choice?


I want to create a bar plot with bars ordered from high to low, and preserve such ordering also when wrapping into facets. Fortunately, the package {tidytext} has a function reorder_within() that does exactly this. However, I couldn't find a way to apply such within-facet high-to-low reordering and manually reposition specific bars.

Example

Below I'm adopting code from this blog.
Let's say that we want to count baby names per decade.

Step 1 -- the data

We're going to visualize top_names, which is a subset of the babynames dataset.

library(babynames)
library(dplyr, warn.conflicts = FALSE)

top_names <- 
  babynames %>%
  filter(between(year, 1950, 1990)) %>%
  mutate(decade = (year %/% 10) * 10) %>%
  group_by(decade) %>%
  count(name, wt = n, sort = TRUE) %>%
  ungroup()

top_names
#> # A tibble: 123,205 x 3
#>    decade name         n
#>     <dbl> <chr>    <int>
#>  1   1950 James   846042
#>  2   1950 Michael 839459
#>  3   1960 Michael 836934
#>  4   1950 Robert  832336
#>  5   1950 John    799658
#>  6   1950 David   771242
#>  7   1960 David   736583
#>  8   1960 John    716284
#>  9   1970 Michael 712722
#> 10   1960 James   687905
#> # ... with 123,195 more rows

Created on 2021-08-11 by the reprex package (v2.0.0)

Step 2 -- prepare data for plot

library(tidytext)
library(ggplot2)

data_for_plot <- 
  top_names %>%
  group_by(decade) %>%
  top_n(15) %>%
  ungroup() %>%
  mutate(decade = as.factor(decade),
         name = reorder_within(name, n, decade))
#> Selecting by n

data_for_plot
#> # A tibble: 75 x 3
#>    decade name                n
#>    <fct>  <fct>           <int>
#>  1 1950   James___1950   846042
#>  2 1950   Michael___1950 839459
#>  3 1960   Michael___1960 836934
#>  4 1950   Robert___1950  832336
#>  5 1950   John___1950    799658
#>  6 1950   David___1950   771242
#>  7 1960   David___1960   736583
#>  8 1960   John___1960    716284
#>  9 1970   Michael___1970 712722
#> 10 1960   James___1960   687905
#> # ... with 65 more rows

Step 3 -- Visualize

p_so_far_so_good <- data_for_plot %>%
  ggplot(aes(name, n, fill = decade)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~decade, scales = "free_y") +
  coord_flip() +
  scale_x_reordered() +
  scale_y_continuous(expand = c(0,0))

p_so_far_so_good

plotted_nicely


So far, so good!

Now, in each facet (i.e., "decade"), I want to lump together all bars except for the top 15. While the top 15 should remain as-is, the rest should be grouped as "other". Furthermore, I want to visualize this new categorization in a way that preserves the high-to-low order of bars, but nevertheless tacks the "other" bar at the bottom. I've already posted such a question, but the solution given there isn't working for the within-facet situation.

My unsuccessful attempt #1

Some wrangling to create the "other" category per group.

data_for_plot_with_other <-
  top_names %>%
  group_by(decade) %>%
  arrange(decade, desc(n)) %>%
  mutate(name = ifelse(row_number() < 5, name, "other")) %>%
  group_by(decade, name) %>%
  summarise(across(n, sum)) %>%
  arrange(name == "other", -n, .by_group = TRUE) %>%
  ungroup() %>%
  mutate(decade = as.factor(decade),
         name = reorder_within(name, n, decade))

> data_for_plot_with_other %>% print(n = 20)
## # A tibble: 75 x 3
##    decade name                   n
##    <fct>  <fct>              <int>
##  1 1950   James___1950      846042
##  2 1950   Michael___1950    839459
##  3 1950   Robert___1950     832336
##  4 1950   John___1950       799658
##  5 1950   David___1950      771242
##  6 1950   Mary___1950       627098
##  7 1950   William___1950    592423
##  8 1950   Linda___1950      565481
##  9 1950   Richard___1950    536393
## 10 1950   Patricia___1950   460643
## 11 1950   Thomas___1950     455154
## 12 1950   Susan___1950      438419
## 13 1950   Deborah___1950    431302
## 14 1950   Mark___1950       383076
## 15 1950   other___1950    30863329 ## see how "other" closes the decade = 1950 group
## 16 1960   Michael___1960    836934
## 17 1960   David___1960      736583
## 18 1960   John___1960       716284
## 19 1960   James___1960      687905
## 20 1960   Robert___1960     653556
# ... with 55 more rows

visualize

data_for_plot_with_other %>%
  ggplot(aes(name, n, fill = factor(decade))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~decade, scales = "free_y") +
  coord_flip() +
  scale_x_reordered() +
  scale_y_continuous(expand = c(0,0))

not_what_i_want

This is not what I want.enter image description here

What I do want is something like the following: demo

My unsuccessful attempt #2

In the spirit of this answer, I tried also releveling the factor name. I built a helper function that detects the substring other_ and moves it to the end (of group).

move_to_end <- function(x, match_to_pattern = "other_") {
  which_idx <- grep(pattern = match_to_pattern, x = x)
  c(x[-which_idx], x[which_idx])
}

data_for_plot_with_other %>%
  group_by(decade) %>%
  mutate(across(name, ~fct_relevel(.x, move_to_end))) %>%
  ## then the visualization as before
  ggplot(aes(name, n, fill = factor(decade))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~decade, scales = "free_y") +
  coord_flip() +
  scale_x_reordered() +
  scale_y_continuous(expand = c(0,0))

same

Still, the same output.

In sum

How can I get the bars ordered high-to-low within facets and have the "other" bar always at the very bottom?


Solution

  • I think I would do this using:

    • forcats::fct_lump_n() to create the "Other" category within each decade, using the w argument to weight by the frequencies we have counted up
    • forcats::fct_relevel() to move the newly created "Other___" levels to the beginning of the big overall factor
    library(tidyverse)
    library(babynames)
    
    top_names <- 
        babynames %>%
        filter(between(year, 1950, 1990)) %>%
        mutate(decade = (year %/% 10) * 10) %>%
        group_by(decade) %>%
        count(name, wt = n, sort = TRUE, name = "total") %>%
        ungroup()
    
    
    library(tidytext)
    
    data_for_plot <- 
        top_names %>%
        group_by(decade) %>%
        mutate(name = fct_lump_n(name, n = 15, w = total)) %>%
        group_by(decade, name) %>%
        mutate(total = sum(unique(total))) %>%
        ungroup() %>%
        distinct(decade, name, total) %>%
        mutate(decade = as.factor(decade),
               name = reorder_within(name, total, decade),
               name = fct_relevel(name, paste0("Other___", unique(decade))))
    
    data_for_plot %>%
        ggplot(aes(total, name, fill = decade)) +
        geom_col(show.legend = FALSE) +
        facet_wrap(~decade, scales = "free_y") +
        scale_y_reordered() +
        scale_x_continuous(expand = c(0,0))
    

    Created on 2021-08-12 by the reprex package (v2.0.1)