
Adding data labels to geom_bar when using proportions


I had previously asked about using geom_bar to plot the proportions of a binary variable.

This solution worked well. Now I need to add data labels to the plot. However, geom_text requires both a y and a label aesthetic, and I am not sure how to supply them with the following syntax:

df %>%
  mutate(Year = as.character(Year),
         Remission = as.factor(Remission)) %>%
  ggplot(aes(x=Year, fill = Remission)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels=scales::percent) +
  labs(y = "Proportion")

Is it possible to add data labels to this kind of stacked bar chart?

Secondary question: given that it is a proportion, the top and bottom labels provide the same information. Is it possible to only label the "lower" bar?


Solution

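  • Although ggplot is good at performing common summary operations, people sometimes tie themselves in knots trying to get ggplot to do data wrangling that is straightforward to do en route to ggplot. Simply compute the proportions and labels as columns in the data you pass in, then draw the bars with geom_col, which uses the precomputed y values. For the secondary question, leave the label empty for every row except Remission == 1, so that only one segment of each bar is labelled.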

    library(tidyverse)
    
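    df %>%
      mutate(Year = as.character(Year),
             Remission = as.factor(Remission)) %>%
      # Count the rows for each Year / Remission combination
      group_by(Year, Remission) %>%
      count() %>%
      # Convert the counts to within-year proportions
      group_by(Year) %>%
      mutate(Proportion = n/sum(n), 
             # Label only the Remission == 1 segment; other labels stay empty
             label = ifelse(Remission == 1, scales::percent(Proportion), "")) %>%
      ggplot(aes(x = Year, y = Proportion, fill = Remission)) +
      # geom_col() plots the precomputed proportions as they are
      geom_col() +
      # position_fill(vjust = 0.5) centres each label within its segment
      geom_text(position = position_fill(vjust = 0.5), aes(label = label),
                size = 7) +
      scale_y_continuous(labels=scales::percent) +
      labs(y = "Proportion") +
      scale_fill_brewer(palette = "Pastel1") +
      theme_minimal(base_size = 20)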
    



    Data from previous question in reproducible format:

    df <- structure(list(Client_id = c(2L, 4L, 7L, 8L, 12L), Year = c(2016L, 
    2017L, 2017L, 2016L, 2016L), Remission = c(0L, 1L, 0L, 1L, 1L
    )), class = "data.frame", row.names = c(NA, -5L))
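
To see what the wrangling step hands to ggplot, it can help to stop the pipeline before the ggplot() call and inspect the summary table. The following is only a sketch of that check on the sample data. It assumes dplyr and scales are installed; count(Year, Remission) is used as shorthand for the group_by() plus count() pair, accuracy = 1 is my addition to pin the labels to whole percentages, and the names plot_data and check are purely illustrative.

```r
library(dplyr)

# Sample data from the question, rebuilt so the block runs on its own
df <- data.frame(
  Client_id = c(2L, 4L, 7L, 8L, 12L),
  Year      = c(2016L, 2017L, 2017L, 2016L, 2016L),
  Remission = c(0L, 1L, 0L, 1L, 1L)
)

# Same wrangling as in the answer, stopped before the ggplot() call
plot_data <- df %>%
  mutate(Year = as.character(Year),
         Remission = as.factor(Remission)) %>%
  count(Year, Remission) %>%            # one row per Year / Remission pair
  group_by(Year) %>%
  mutate(Proportion = n / sum(n),       # within-year proportion
         # accuracy = 1 is an assumption made here to get whole percentages
         label = ifelse(Remission == 1,
                        scales::percent(Proportion, accuracy = 1),
                        "")) %>%
  ungroup()

print(plot_data)

# Cross-check the proportions with base R (each row sums to 1)
check <- prop.table(table(df$Year, df$Remission), margin = 1)
print(check)
```

Each year should have one row per Remission level, the proportions within a year should sum to 1, and only the Remission == 1 rows should carry a non-empty label. If the table looks right, the plot is just a matter of mapping Proportion to y and label to the label aesthetic.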