On Wikipedia, there is a fantastic population pyramid that shows gender surplus. How could I recreate this in R using ggplot2, &/or plotly?
It's essentially a dual-stacked bar plot, which has been oriented by 90 degrees.
# Here is some population data
library(wpp2019)
# Male
data(popM)
# Female
data(popF)
In the code below most of the work is in the data shaping, while the ggplot code is relatively straightforward.
library(wpp2019)
library(tidyverse)
data(popM)
data(popF)
list(Male=popM, Female=popF) %>%
imap(~.x %>%
filter(name=="World") %>%
select(age, !!.y:=`2020`)) %>%
reduce(full_join) %>%
mutate(age = factor(age, levels=unique(age)),
`Female surplus` = pmax(Female - Male, 0),
`Male surplus` = pmax(Male - Female, 0),
Male = Male - `Male surplus`,
Female = Female - `Female surplus`) %>%
pivot_longer(-age) %>%
mutate(value = case_when(grepl("Male", name) ~ -value,
TRUE ~ value),
name = factor(name, levels=c("Female surplus", "Female",
"Male surplus", "Male"))) %>%
ggplot(aes(value, age, fill=name)) +
geom_col() +
geom_vline(xintercept=0, colour="white") +
scale_x_continuous(label=function(x) ifelse(x < 0, -x, x),
breaks=scales::pretty_breaks(6)) +
labs(x=NULL, y=NULL, fill=NULL) +
scale_fill_discrete(type=RColorBrewer::brewer.pal(name="RdBu", n=4)[c(1,2,4,3)],
breaks=c("Male surplus", "Male", "Female","Female surplus")) +
theme_bw() +
theme(legend.position="bottom")
As another option, you can place the vertical axis labels between the bars. This version also uses faceting, so we can easily label the facets by gender. Then in the legend we only need to label the surplus portions of the bars.
library(ggpol)
library(ggthemes)
list(Male=popM, Female=popF) %>%
imap(~.x %>%
filter(name=="World") %>%
select(age, !!.y:=`2020`)) %>%
reduce(full_join) %>%
mutate(age = factor(age, levels=unique(age)),
`Female surplus` = pmax(Female - Male, 0),
`Male surplus` = pmax(Male - Female, 0),
Male = Male - `Male surplus`,
Female = Female - `Female surplus`) %>%
pivot_longer(-age) %>%
mutate(facet = factor(ifelse(grepl("Female", name), "Female", "Male"),
c("Male","Female")),
value = case_when(grepl("Male", name) ~ -value,
TRUE ~ value),
name = factor(name, levels=c("Female surplus", "Female",
"Male surplus", "Male"))) %>%
ggplot(aes(value, age, fill=name)) +
geom_col() +
geom_vline(xintercept=0, colour="white") +
scale_x_continuous(label=function(x) ifelse(x < 0, -x, x),
breaks=scales::pretty_breaks(3),
expand=c(0,0)) +
labs(x=NULL, y=NULL, fill=NULL) +
facet_share(vars(facet), scales="free_x") +
scale_fill_discrete(type=RColorBrewer::brewer.pal(name="RdBu", n=4)[c(1,2,4,3)],
breaks=c("Male surplus", "Female surplus")) +
theme_clean() +
theme(legend.position="bottom",
legend.background=element_blank(),
legend.key.height=unit(4,"mm"),
legend.margin=margin(t=0),
plot.background=element_blank(),
strip.text=element_text(face="bold", size=rel(0.9)))