r ggplot2 dplyr

Add percentage labels to geom_col()


I'm working with the Pima Indians data set (MASS::Pima.te). The goal is to plot diabetes status (Yes or No) against each of the features, then show the total and the percentage on each bar.

Here is the head of the data:

> head(MASS::Pima.te, n = 10)
   npreg glu bp skin  bmi   ped age type
1      6 148 72   35 33.6 0.627  50  Yes
2      1  85 66   29 26.6 0.351  31   No
3      1  89 66   23 28.1 0.167  21   No
4      3  78 50   32 31.0 0.248  26  Yes
5      2 197 70   45 30.5 0.158  53  Yes
6      5 166 72   19 25.8 0.587  51  Yes
7      0 118 84   47 45.8 0.551  31  Yes
8      1 103 30   38 43.3 0.183  33   No
9      3 126 88   41 39.3 0.704  27   No
10     9 119 80   35 29.0 0.263  29  Yes

The data has 332 rows and 8 columns.

Everything works up to the percentage labels, where I have two problems:

  1. The math to calculate the percentage is incorrect.
  2. The color of the percentage label is incorrect; I'd like the text to be white.

MASS::Pima.te |>
  dplyr::mutate(dplyr::across(-type, as.numeric)) |>
  tidyr::pivot_longer(-type, names_to = "var", values_to = "value") |>
  dplyr::summarise(value = sum(value), percentage = round(sum(value) / nrow(MASS::Pima.te), 2), .by = c(type, var)) |>
  ggplot2::ggplot(ggplot2::aes(x = type, y = value)) +
  ggplot2::geom_col() +
  ggplot2::geom_text(
    ggplot2::aes(label = value), vjust = -.2
  ) +
  ggplot2::geom_text(
    ggplot2::aes(label = paste0(percentage,"%"), vjust = 3, color = "white"
  )) +
  ggplot2::scale_y_continuous(expand = c(0, 0, .2, 0)) +
  ggplot2::facet_wrap(~var, scales = "free") +
  ggplot2::labs(title = "Numerical values against y")

This is what it looks like up to this point: [image]


Solution

  • color should not be inside aes(); pass it as a fixed argument to geom_text() instead. Also, summarise the sum for each Yes/No group first, then calculate the percentage within each variable.

    library(tidyverse)
    
    MASS::Pima.te %>% 
      pivot_longer(!type) %>% 
      summarise(across(value, sum), .by = c(type, name)) %>% 
      mutate(perc = proportions(value), .by = c(name)) %>% 
      ggplot(aes(x = type, y = value)) + 
      geom_col() + 
      geom_text(aes(label = value), 
                vjust = -.5) + 
      geom_text(aes(label = scales::percent(perc),
                    vjust = 1.5),
                color = "white") + 
      facet_wrap(~ name, scales = "free") +
      scale_y_continuous(expand = expansion(mult = c(0.1, 0.25)))
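
To see why the percentage in the question came out wrong, it may help to remember that inside a single summarise() call, later expressions see the columns created earlier in that same call. Below is a minimal sketch on a made-up data frame (`toy` and its numbers are hypothetical, not taken from Pima.te). It assumes dplyr >= 1.1.0 for `.by` and R >= 4.1 for `|>` and `proportions()`.

```r
library(dplyr)

# Hypothetical toy data: two groups, one variable (not taken from Pima.te)
toy <- tibble(
  type  = c("Yes", "Yes", "No", "No"),
  value = c(10, 30, 20, 40)
)

# Approach from the question: `value` is overwritten first, so the second
# sum(value) re-uses the group total and divides it by the row count,
# which is not a share of anything.
wrong <- toy |>
  summarise(value = sum(value), percentage = sum(value) / nrow(toy), .by = type)

# Two-step approach: total per group first, then each group's share of the
# overall total.
right <- toy |>
  summarise(value = sum(value), .by = type) |>
  mutate(perc = proportions(value))

wrong$percentage  # 10 and 15: group totals divided by the 4 rows
right$perc        # 0.4 and 0.6: shares that sum to 1
```

In the faceted plot the same idea is applied once per variable, which is what `.by = c(name)` in the mutate() step does.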
    

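
The color point can be checked without rendering anything, by asking ggplot2 which colour it resolved for the text layer. This is only a sketch on hypothetical toy data. It uses `layer_data()` to pull the computed data for the second layer, which is the geom_text() layer.

```r
library(ggplot2)

# Hypothetical toy data for illustration only
toy <- data.frame(type = c("No", "Yes"), value = c(60, 40))

# Inside aes(): "white" is treated as a data value and passed through the
# colour scale, so the text gets a default scale colour and a legend.
p_mapped <- ggplot(toy, aes(type, value)) +
  geom_col() +
  geom_text(aes(label = value, color = "white"), vjust = 1.5)

# Outside aes(): "white" is a fixed aesthetic and is used literally.
p_fixed <- ggplot(toy, aes(type, value)) +
  geom_col() +
  geom_text(aes(label = value), color = "white", vjust = 1.5)

# Compare the colours ggplot2 actually resolved for the text layer (layer 2)
mapped_colour <- unique(layer_data(p_mapped, 2)$colour)
fixed_colour  <- unique(layer_data(p_fixed, 2)$colour)

mapped_colour  # a default scale colour, not white
fixed_colour   # "white"
```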