Search code examples
rggplot2plotplotlypopulation

Population Pyramid with Gender surplus in R


On Wikipedia, there is a fantastic population pyramid that shows gender surplus. How could I recreate this in R using ggplot2, &/or plotly?

It's essentially a dual-stacked bar plot, which has been oriented by 90 degrees.

# Here is some population data

library(wpp2019)
# Male
data(popM)
# Female
data(popF)

Population Pyramid with Gender surplus

Wikipedia: Demographics of the United States


Solution

  • In the code below most of the work is in the data shaping, while the ggplot code is relatively straightforward.

    library(wpp2019)
    library(tidyverse)
    data(popM)
    data(popF)
    
    list(Male=popM, Female=popF) %>% 
      imap(~.x %>% 
             filter(name=="World") %>% 
             select(age, !!.y:=`2020`)) %>% 
      reduce(full_join) %>% 
      mutate(age = factor(age, levels=unique(age)),
             `Female surplus` = pmax(Female - Male, 0),
             `Male surplus` = pmax(Male - Female, 0),
             Male = Male - `Male surplus`,
             Female = Female - `Female surplus`) %>% 
      pivot_longer(-age) %>%
      mutate(value = case_when(grepl("Male", name) ~ -value, 
                               TRUE ~ value),
             name = factor(name, levels=c("Female surplus", "Female", 
                                          "Male surplus", "Male"))) %>% 
      ggplot(aes(value, age, fill=name)) +
        geom_col() + 
        geom_vline(xintercept=0, colour="white") + 
        scale_x_continuous(label=function(x) ifelse(x < 0, -x, x),
                           breaks=scales::pretty_breaks(6)) +
        labs(x=NULL, y=NULL, fill=NULL) +
        scale_fill_discrete(type=RColorBrewer::brewer.pal(name="RdBu", n=4)[c(1,2,4,3)],
                            breaks=c("Male surplus", "Male", "Female","Female surplus")) +
        theme_bw() +
        theme(legend.position="bottom")
    

    As another option, you can place the vertical axis labels between the bars. This version also uses faceting, so we can easily label the facets by gender. Then in the legend we only need to label the surplus portions of the bars.

    library(ggpol)
    library(ggthemes)
    
    list(Male=popM, Female=popF) %>% 
      imap(~.x %>% 
             filter(name=="World") %>% 
             select(age, !!.y:=`2020`)) %>% 
      reduce(full_join) %>% 
      mutate(age = factor(age, levels=unique(age)),
             `Female surplus` = pmax(Female - Male, 0),
             `Male surplus` = pmax(Male - Female, 0),
             Male = Male - `Male surplus`,
             Female = Female - `Female surplus`) %>% 
      pivot_longer(-age) %>%
      mutate(facet = factor(ifelse(grepl("Female", name), "Female", "Male"),
                            c("Male","Female")),
             value = case_when(grepl("Male", name) ~ -value, 
                               TRUE ~ value),
             name = factor(name, levels=c("Female surplus", "Female", 
                                          "Male surplus", "Male"))) %>% 
      ggplot(aes(value, age, fill=name)) +
        geom_col() + 
        geom_vline(xintercept=0, colour="white") + 
        scale_x_continuous(label=function(x) ifelse(x < 0, -x, x),
                           breaks=scales::pretty_breaks(3),
                           expand=c(0,0)) +
        labs(x=NULL, y=NULL, fill=NULL) +
        facet_share(vars(facet), scales="free_x") +
        scale_fill_discrete(type=RColorBrewer::brewer.pal(name="RdBu", n=4)[c(1,2,4,3)],
                            breaks=c("Male surplus", "Female surplus")) +
        theme_clean() +
        theme(legend.position="bottom",
              legend.background=element_blank(),
              legend.key.height=unit(4,"mm"),
              legend.margin=margin(t=0), 
              plot.background=element_blank(),
              strip.text=element_text(face="bold", size=rel(0.9)))