I want to create a bar plot with bars ordered from high to low, and preserve such ordering also when wrapping into facets. Fortunately, the package {tidytext}
has a function reorder_within()
that does exactly this. However, I couldn't find a way to apply such within-facet high-to-low reordering and manually reposition specific bars.
Below I'm adopting code from this blog.
Let's say that we want to count baby names per decade.
We're going to visualize top_names
, which is a subset of the babynames
dataset.
library(babynames)
library(dplyr, warn.conflicts = FALSE)
top_names <-
babynames %>%
filter(between(year, 1950, 1990)) %>%
mutate(decade = (year %/% 10) * 10) %>%
group_by(decade) %>%
count(name, wt = n, sort = TRUE) %>%
ungroup()
top_names
#> # A tibble: 123,205 x 3
#> decade name n
#> <dbl> <chr> <int>
#> 1 1950 James 846042
#> 2 1950 Michael 839459
#> 3 1960 Michael 836934
#> 4 1950 Robert 832336
#> 5 1950 John 799658
#> 6 1950 David 771242
#> 7 1960 David 736583
#> 8 1960 John 716284
#> 9 1970 Michael 712722
#> 10 1960 James 687905
#> # ... with 123,195 more rows
Created on 2021-08-11 by the reprex package (v2.0.0)
library(tidytext)
library(ggplot2)
data_for_plot <-
top_names %>%
group_by(decade) %>%
top_n(15) %>%
ungroup() %>%
mutate(decade = as.factor(decade),
name = reorder_within(name, n, decade))
#> Selecting by n
data_for_plot
#> # A tibble: 75 x 3
#> decade name n
#> <fct> <fct> <int>
#> 1 1950 James___1950 846042
#> 2 1950 Michael___1950 839459
#> 3 1960 Michael___1960 836934
#> 4 1950 Robert___1950 832336
#> 5 1950 John___1950 799658
#> 6 1950 David___1950 771242
#> 7 1960 David___1960 736583
#> 8 1960 John___1960 716284
#> 9 1970 Michael___1970 712722
#> 10 1960 James___1960 687905
#> # ... with 65 more rows
p_so_far_so_good <- data_for_plot %>%
ggplot(aes(name, n, fill = decade)) +
geom_col(show.legend = FALSE) +
facet_wrap(~decade, scales = "free_y") +
coord_flip() +
scale_x_reordered() +
scale_y_continuous(expand = c(0,0))
p_so_far_so_good
Now, in each facet (i.e., "decade"), I want to lump together all bars except for the top 15. While the top 15 should remain as-is, the rest should be grouped as "other". Furthermore, I want to visualize this new categorization in a way that preserves the high-to-low order of bars, but nevertheless tacks the "other" bar at the bottom. I've already posted such a question, but the solution given there isn't working for the within-facet situation.
Some wrangling to create the "other" category per group.
data_for_plot_with_other <-
top_names %>%
group_by(decade) %>%
arrange(decade, desc(n)) %>%
mutate(name = ifelse(row_number() < 5, name, "other")) %>%
group_by(decade, name) %>%
summarise(across(n, sum)) %>%
arrange(name == "other", -n, .by_group = TRUE) %>%
ungroup() %>%
mutate(decade = as.factor(decade),
name = reorder_within(name, n, decade))
> data_for_plot_with_other %>% print(n = 20)
## # A tibble: 75 x 3
## decade name n
## <fct> <fct> <int>
## 1 1950 James___1950 846042
## 2 1950 Michael___1950 839459
## 3 1950 Robert___1950 832336
## 4 1950 John___1950 799658
## 5 1950 David___1950 771242
## 6 1950 Mary___1950 627098
## 7 1950 William___1950 592423
## 8 1950 Linda___1950 565481
## 9 1950 Richard___1950 536393
## 10 1950 Patricia___1950 460643
## 11 1950 Thomas___1950 455154
## 12 1950 Susan___1950 438419
## 13 1950 Deborah___1950 431302
## 14 1950 Mark___1950 383076
## 15 1950 other___1950 30863329 ## see how "other" closes the decade = 1950 group
## 16 1960 Michael___1960 836934
## 17 1960 David___1960 736583
## 18 1960 John___1960 716284
## 19 1960 James___1960 687905
## 20 1960 Robert___1960 653556
# ... with 55 more rows
data_for_plot_with_other %>%
ggplot(aes(name, n, fill = factor(decade))) +
geom_col(show.legend = FALSE) +
facet_wrap(~decade, scales = "free_y") +
coord_flip() +
scale_x_reordered() +
scale_y_continuous(expand = c(0,0))
This is not what I want.
What I do want is something like the following:
In the spirit of this answer, I tried also releveling the factor name
.
I built a helper function that detects the substring other_
and moves it to the end (of group).
move_to_end <- function(x, match_to_pattern = "other_") {
which_idx <- grep(pattern = match_to_pattern, x = x)
c(x[-which_idx], x[which_idx])
}
data_for_plot_with_other %>%
group_by(decade) %>%
mutate(across(name, ~fct_relevel(.x, move_to_end))) %>%
## then the visualization as before
ggplot(aes(name, n, fill = factor(decade))) +
geom_col(show.legend = FALSE) +
facet_wrap(~decade, scales = "free_y") +
coord_flip() +
scale_x_reordered() +
scale_y_continuous(expand = c(0,0))
Still, the same output.
How can I get the bars ordered high-to-low within facets and have the "other" bar always at the very bottom?
I think I would do this using:
forcats::fct_lump_n()
to create the "Other" category within each decade, using the w
argument to weight by the frequencies we have counted upforcats::fct_relevel()
to move the newly created "Other___" levels to the beginning of the big overall factorlibrary(tidyverse)
library(babynames)
top_names <-
babynames %>%
filter(between(year, 1950, 1990)) %>%
mutate(decade = (year %/% 10) * 10) %>%
group_by(decade) %>%
count(name, wt = n, sort = TRUE, name = "total") %>%
ungroup()
library(tidytext)
data_for_plot <-
top_names %>%
group_by(decade) %>%
mutate(name = fct_lump_n(name, n = 15, w = total)) %>%
group_by(decade, name) %>%
mutate(total = sum(unique(total))) %>%
ungroup() %>%
distinct(decade, name, total) %>%
mutate(decade = as.factor(decade),
name = reorder_within(name, total, decade),
name = fct_relevel(name, paste0("Other___", unique(decade))))
data_for_plot %>%
ggplot(aes(total, name, fill = decade)) +
geom_col(show.legend = FALSE) +
facet_wrap(~decade, scales = "free_y") +
scale_y_reordered() +
scale_x_continuous(expand = c(0,0))
Created on 2021-08-12 by the reprex package (v2.0.1)