Tags: r, ggplot2, graph, tidyverse

Little hack for ggplot: an easy way to add text labels with the actual means and standard deviations when using lines or bars


I want to add text labels showing the actual means and standard deviations to my plots, both when working with one outcome and with multiple outcomes. See the pictures below for reference; my code follows. If an updated package handles this, please let me know. Note that I want the actual numeric values of the mean and SD on the plot. Thanks.

[image: reference plot]

[image: reference plot; the labels should show the actual values instead of the placeholder text "mean(sd)", please]

library(tidyverse)
df = data.frame(
  year = c("2019","2020", "2021","2022"),
  math = rnorm(100,10,2),
  science = rnorm(100,5,1)
)
df %>%
  ggplot(., aes(x = year, y = math, 
                group = 1)) +
  stat_summary(geom = "line",
               fun = "mean",
               width=0.2,
               size=1.2) +
  stat_summary(
    geom='errorbar',
    fun = "mean",
    width=0.2,
    size=1.2
  ) +
  theme_bw()
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> Warning in stat_summary(geom = "line", fun = "mean", width = 0.2, size = 1.2):
#> Ignoring unknown parameters: `width`


df %>%
  pivot_longer(-year) %>% 
  ggplot(., aes(x = year, y = value, color = name, group = name)) +
  stat_summary(geom = "line",
               fun = "mean",
               width=0.2,
               size=1.2) +
  stat_summary(
    geom='errorbar',
    fun = "mean",
    width=0.2,
    size=1.2
  ) +
  theme_bw()
#> Warning in stat_summary(geom = "line", fun = "mean", width = 0.2, size = 1.2):
#> Ignoring unknown parameters: `width`

Created on 2024-07-06 with reprex v2.1.0


Solution

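  • The code below doesn't rely on ggplot to compute the aggregated statistics; it uses dplyr's summarise instead (the .by argument requires dplyr 1.1.0 or later).
    The layers are then added with geom_* functions rather than stat_summary.

    Both plots get a geom_errorbar layer spanning the mean ± one standard deviation, and a geom_label layer that prints the values as "mean (sd)".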

    library(tidyverse)
    
    set.seed(2024)
    df <- data.frame(
      year = c("2019","2020", "2021","2022"),
      math = rnorm(100,10,2),
      science = rnorm(100,5,1)
    )
    
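    # Summarise first: one row per year with the mean, the SD and the error-bar limits
    df %>%
      summarise(
        y = mean(math),
        sd_y = sd(math),  # standard deviation, not the standard error
        ymin = y - sd_y,
        ymax = y + sd_y,
        .by = year
      ) %>%
      mutate(year = as.integer(as.character(year))) %>%  # numeric x axis
      ggplot(aes(year, y)) +
      geom_line(linewidth = 1.2) +
      geom_errorbar(aes(ymin = ymin, ymax = ymax), linewidth = 1.2) +
      # label each point with "mean (sd)", rounded to one decimal
      geom_label(aes(label = paste0(round(y, 1), " (", round(sd_y, 1), ")")), 
                 fill = "yellow", vjust = -0.5) +
      theme_bw()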
    

    
    
    # Multiple outcomes: reshape to long format, then summarise per year and outcome
    df %>%
      pivot_longer(-year) %>% 
      summarise(
        y = mean(value),
        sd_y = sd(value),  # standard deviation, not the standard error
        ymin = y - sd_y,
        ymax = y + sd_y,
        .by = c(year, name)
      ) %>%
      mutate(year = as.integer(as.character(year))) %>%  # numeric x axis
      ggplot(aes(year, y)) +
      geom_line(aes(color = name), linewidth = 1.2) +
      geom_errorbar(aes(ymin = ymin, ymax = ymax, color = name), linewidth = 1.2) +
      # label each point with "mean (sd)", rounded to one decimal
      geom_label(aes(label = paste0(round(y, 1), " (", round(sd_y, 1), ")")), 
                 fill = "yellow", vjust = -0.5) +
      theme_bw()
    

    Created on 2024-07-07 with reprex v2.1.0
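
For readers who would rather stay with stat_summary, as in the question, here is a minimal sketch of an alternative. It assumes that a function passed to fun.data may return a label column next to y, ymin and ymax, and that stat_summary hands that column to the label geom. The helper mean_sd is a hypothetical name introduced here for illustration; it is not part of ggplot2.

```r
library(ggplot2)

set.seed(2024)
df <- data.frame(
  year = c("2019", "2020", "2021", "2022"),
  math = rnorm(100, 10, 2),
  science = rnorm(100, 5, 1)
)

# Hypothetical helper (not a ggplot2 function): returns the mean,
# the mean -/+ one SD, and a "mean (sd)" label for one group of values.
mean_sd <- function(x) {
  m <- mean(x)
  s <- sd(x)
  data.frame(
    y = m,
    ymin = m - s,
    ymax = m + s,
    label = sprintf("%.1f (%.1f)", m, s)
  )
}

# Each layer reuses the same summary function and picks the columns it needs.
p <- ggplot(df, aes(x = year, y = math, group = 1)) +
  stat_summary(fun.data = mean_sd, geom = "line", linewidth = 1.2) +
  stat_summary(fun.data = mean_sd, geom = "errorbar",
               width = 0.2, linewidth = 1.2) +
  stat_summary(fun.data = mean_sd, geom = "label",
               fill = "yellow", vjust = -0.5) +
  theme_bw()

print(p)
```

The trade-off is that the summary is recomputed once per layer, whereas the summarise approach above computes it once and leaves an inspectable data frame. For several outcomes, the same three layers should carry over after pivot_longer(-year) with aes(y = value, color = name, group = name), though that variant is not shown here.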