r sas boxplot tukey

Adding Comparison Annotations to Boxplots


Boxplot with Tukey Comparison Bars

I ran an ANOVA test and found that at least one of the means among my groups was statistically significantly different. As a follow-up analysis, I ran a Tukey test to find which groups differed in means, and I'm hoping to illustrate this in a boxplot like the one presented in this paper, Effects of intermittent Pringle's manoeuvre on cirrhotic compared with normal liver: https://academic.oup.com/bjs/article/97/7/1062/6150536?login=true

I can generate the boxplot, but I want to add bar(s) that illustrate which groups differ significantly in mean, marked with an asterisk, as highlighted in the image. Does anyone know how I could approach this, ideally in SAS or R?

In SAS, I've used PROC SGPLOT to generate the boxplot, and in R, I know I can use geom_boxplot, but as for any additional annotations, I'm not sure what options are available to accomplish this.


Solution

  • In base R you can draw these lines manually if you want more granular control, or, for a quick and largely automated approach, use ggstatsplot::ggbetweenstats (among other approaches, I'm sure):

    Data

    df <- data.frame(DAST = 1:300,
                     Category = rep(c("Normal", "Chronic Hepatitis", "Liver Cirrhosis"), each = 100))
    

    ggbetweenstats approach

    See ?ggstatsplot::ggbetweenstats for a wide range of customization options.

    library(ggplot2)
    library(ggstatsplot)
    
    ggstatsplot::ggbetweenstats(df, x = Category, y = DAST)
    


    Base R approach

    Colored lines for clarity

    # vertical spacing between bars
    v_spacing <- c(max(df$DAST) + seq(20, 50, length.out = 3))
    
    plot(x = as.factor(df$Category), y = df$DAST,
         xlab = NA, ylab = "D-AST", frame = FALSE)
    # horizontal lines - factor levels sort alphabetically, so position
    # 1 = Chronic Hepatitis, 2 = Liver Cirrhosis, 3 = Normal
    # bars map between positions 1-2, 1-3, 2-3
    segments(x0 = c(1,1,2), 
             x1 = c(2,3,3),
             y0 = v_spacing,
             xpd = TRUE,
             col = c("red", "green", "blue"))
    
    # vertical lines
    segments(x0 = c(1, 2, 1, 3, 2, 3), 
             y0 = rep(v_spacing, each = 2),
             y1 = rep(v_spacing, each = 2) - 5,
             xpd = TRUE,
             col = rep(c("red", "green", "blue"), each = 2))
    
    # Denote significance
    text(x = c(mean(1:2), mean(c(1,3)), mean(2:3)),
         y = v_spacing + 5,
         labels = "*",
         xpd = TRUE,
         col = c("red", "green", "blue"))
    

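The base R approach puts an asterisk over every pair, whether or not the Tukey test flagged it. Below is a minimal sketch of drawing bars only for the pairs that stats::TukeyHSD reports as significant, using the same toy data. The 0.05 cutoff and the names alpha, pairs, and bars are illustrative assumptions, not part of the original answer.

```r
# Toy data from the answer (Category stored as a factor so level order is explicit)
df <- data.frame(
  DAST = 1:300,
  Category = factor(rep(c("Normal", "Chronic Hepatitis", "Liver Cirrhosis"), each = 100))
)

# One-way ANOVA followed by Tukey's HSD (both from the stats package)
fit   <- aov(DAST ~ Category, data = df)
tukey <- TukeyHSD(fit)$Category   # matrix with columns diff, lwr, upr, "p adj"

# All pairs of box positions: 1-2, 1-3, 2-3 (levels are sorted alphabetically)
lev   <- levels(df$Category)
pairs <- combn(seq_along(lev), 2)
# Check that the pairs line up with the TukeyHSD row names ("B-A" style)
stopifnot(identical(rownames(tukey),
                    paste(lev[pairs[2, ]], lev[pairs[1, ]], sep = "-")))

alpha <- 0.05                     # assumed significance threshold
bars  <- data.frame(x0 = pairs[1, ], x1 = pairs[2, ],
                    p = unname(tukey[, "p adj"]))
bars  <- bars[bars$p < alpha, , drop = FALSE]
# Stack the remaining bars above the highest data point
bars$y <- max(df$DAST) + seq(20, by = 15, length.out = nrow(bars))

plot(x = df$Category, y = df$DAST, xlab = NA, ylab = "D-AST", frame = FALSE)
if (nrow(bars) > 0) {
  # horizontal bars (y1 defaults to y0)
  segments(x0 = bars$x0, y0 = bars$y, x1 = bars$x1, xpd = TRUE)
  # short vertical ticks at both ends of each bar (x1 defaults to x0)
  segments(x0 = c(bars$x0, bars$x1), y0 = rep(bars$y, 2),
           y1 = rep(bars$y, 2) - 5, xpd = TRUE)
  # asterisk centered above each bar
  text(x = (bars$x0 + bars$x1) / 2, y = bars$y + 5, labels = "*", xpd = TRUE)
}
```

With real data, pairs whose adjusted p-value is at or above the cutoff simply get no bar, so the plot stays in step with the test output.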