r ggplot2 comparison boxplot

Display the statistical test + the number of individuals (n = ) with stat_compare_means()


I currently use stat_compare_means() from ggpubr to display the result of a comparison test on a ggplot2 boxplot. I would like it to also show the number of individuals (n = ) in the samples, placed between the name of the test and the p-value. Is there a way to add that information to stat_compare_means()? I could display it with geom_text() or annotate(), but that is inconvenient because I want it written right before the p-value.

library(ggplot2)
library(ggpubr)  # provides stat_compare_means()

data(iris)
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  geom_jitter(width = 0.1, shape = 21, colour = "black", fill = "grey95", stroke = 0.5, size = 2) +
  stat_compare_means(label.y.npc = 0.93, method = "kruskal.test", size = 5, family = "serif") +
  theme_classic()



Solution

  • You can show the total n by overriding the label aesthetic of stat_compare_means():

    library(ggplot2)
    library(ggpubr)

    ggplot(iris, aes(x = Species, y = Sepal.Length)) +
      geom_boxplot() +
      geom_jitter(width = 0.1, shape = 21, colour = "black", fill = "grey95", 
                  stroke = 0.5, size = 2) +
      stat_compare_means(label.y.npc = 0.93, method = "kruskal.test", 
                         size = 5, family = "serif",
                         # separators added so the label reads
                         # "<method>, n = <count>, p = <p-value>"
                         aes(label = paste0(after_stat(method), 
                                            ', n = ', nrow(iris),
                                            ', p = ', after_stat(p.format)))) +
      theme_classic()
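
    The label above uses nrow(iris), which counts every row, including any with missing values. A minimal sketch of one way to count only the rows where both plotted columns are present; n_label() is a hypothetical helper written for this example, not part of ggpubr, and the "p = " prefix assumes p.format holds only the formatted value:

    ```r
    library(ggplot2)
    library(ggpubr)

    # Hypothetical helper (not part of ggpubr): counts the rows where both
    # plotted columns are non-missing and formats them as "n = <count>".
    n_label <- function(data, x, y) {
      n <- sum(complete.cases(data[, c(x, y)]))
      paste0("n = ", n)
    }

    # Build the text once, outside the plot call.
    n_txt <- n_label(iris, "Species", "Sepal.Length")

    p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
      geom_boxplot() +
      stat_compare_means(
        label.y.npc = 0.93, method = "kruskal.test",
        size = 5, family = "serif",
        # method and p.format are computed by the stat; n_txt comes from
        # the calling environment.
        aes(label = paste(after_stat(method), n_txt,
                          paste0("p = ", after_stat(p.format)),
                          sep = ", "))
      ) +
      theme_classic()

    p
    ```

    For iris the result is the same as nrow(iris), since the data set has no missing values; the helper only matters for data with NAs.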
    


    Personally, I think it makes more sense to label the x axis with the number of observations in each group:

    library(tidyverse)
    library(ggpubr)  # provides stat_compare_means()
    
    iris %>%
      group_by(Species) %>%
      mutate(Species = paste0(Species, '\n(n = ', n(), ')')) %>%
      ggplot(aes(x = Species, y = Sepal.Length)) +
      geom_boxplot() +               
      geom_jitter(width = 0.1, shape = 21, colour = "black", fill = "grey95", 
                  stroke = 0.5, size = 2) +
      stat_compare_means(label.y.npc = 0.93, method = "kruskal.test", 
                         size = 5, family = "serif") +
      theme_classic(base_size = 16, base_family = 'serif')
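
    If you would rather keep the original axis labels, the per-group counts can also be drawn inside the panel. A minimal sketch using ggplot2's stat_summary() with a text geom; n_fun() is a hypothetical helper name, and placing the text 0.2 units below each group's minimum is an arbitrary choice that may need adjusting for other data:

    ```r
    library(ggplot2)
    library(ggpubr)

    # Hypothetical helper: given one group's y values, return the height at
    # which to draw the text and the "n = <count>" label itself.
    n_fun <- function(y) {
      data.frame(y = min(y) - 0.2, label = paste0("n = ", length(y)))
    }

    p_counts <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
      geom_boxplot() +
      # One text label per x group, computed by n_fun().
      stat_summary(fun.data = n_fun, geom = "text", family = "serif") +
      stat_compare_means(label.y.npc = 0.93, method = "kruskal.test",
                         size = 5, family = "serif") +
      theme_classic(base_size = 16, base_family = "serif")

    p_counts
    ```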
    
