
Adding compact letter display to a geom_boxplot


I would like to add a compact letter display (CLD) to a geom_boxplot. The CLD is generated from a post-hoc Dunn test. I'd like to place the letters above the whiskers, the same way I would with error bars on a geom_bar (e.g.,

`geom_errorbar(aes(ymin = mean_result - se_result, ymax = mean_result + se_result), position = position_dodge(.9), width = .1, linewidth = .5, col = 'black')`).

I've had a look at this, but it doesn't work with my dataset.

Here's some dummy code:

    library(FSA)        # dunnTest()
    library(rcompanion) # cldList()
    library(ggplot2)
    library(dplyr)

    # SOME DUMMY DATA
    df <- data.frame(
      Treatment = as.factor(rep(c('A', 'B', 'C', 'D'), each = 48)),
      result = rnorm(192, mean = c(0.8, 0.3, 0.65, 1), sd = c(0.02, 0.07, 0.9, 0.3))
    )

    # PERFORM POST-HOC
    kw_test <- dunnTest(result ~ Treatment, data = df, method = "bonferroni", kw = TRUE)$res

    # ASSIGN LETTERS
    kw_let <- cldList(P.adj ~ Comparison, data = kw_test, threshold = 0.05)
    letters <- data.frame(Treatment = unique(df$Treatment), Letter = kw_let[2])

    p <- ggplot(df, aes(x = Treatment, y = result)) +
      geom_boxplot() +
      geom_jitter() +
      coord_cartesian(ylim = c(-1, 3))

Here's where I get stuck. I need to work out the whisker height for each box and then add the CLD letters above each one, at the same distance above the top of the whisker. The following code was supposed to compute the whisker height for each box, but it returns the same value for every treatment. I also get an error when I try to add the geom_text:

    top_whiskers <- data.frame(df %>%
      group_by(Treatment) %>%
      summarise(top_whisker = quantile(df$result, 0.75) + 1.5 * IQR(df$result)))

    letters$whisker <- top_whiskers

    p + geom_text(data = kw_let, aes(x = Treatment, y = whisker.top_whisker, label = Letter), vjust = -.5)

Any help is greatly appreciated.


Solution

  • One option would be to use boxplot.stats to get the extreme value of the whisker:

    library(ggplot2)
    
    names(kw_let)[1] <- "Treatment"
    
    letters <- kw_let |>
      merge(
        aggregate(
          result ~ Treatment,
          data = df, FUN = \(x) boxplot.stats(x)$stats[5]
        )
      )
    
    ggplot(df, aes(x = Treatment, y = result)) +
      geom_boxplot() +
      geom_jitter() +
      geom_text(
        data = letters,
        aes(label = Letter),
        vjust = 0,
        nudge_y = .1,
        color = "red"
      ) +
      coord_cartesian(ylim = c(-1, 3))
    

    Or using dplyr (the `.by` argument of summarise requires dplyr >= 1.1.0):

    letters <- kw_let |>
      dplyr::left_join(
        dplyr::summarise(
          df,
          result = boxplot.stats(result)$stats[5], .by = Treatment
        )
      )
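
    Both variants join the whisker values onto the letters by Treatment. The attempt in the question instead assigned a whole data frame to a single column (`letters$whisker <- top_whiskers`), which is probably one reason the geom_text call failed: the printed header shows `whisker.top_whisker`, but no column of that name exists. Separately, `df$result` inside summarise() refers to the full column rather than the current group, which is why every treatment got the same value. The sketch below reproduces the nesting issue in base R only; `letters_df` and `top` are hypothetical stand-ins with made-up values, not the real test output.

    ```r
    # Hypothetical stand-ins for the letters table and the per-group whisker table
    letters_df <- data.frame(Treatment = c("A", "B"), Letter = c("a", "b"))
    top <- data.frame(Treatment = c("A", "B"), top_whisker = c(1, 2))

    # Assigning a data frame to one column nests it inside that column
    nested <- letters_df
    nested$whisker <- top
    names(nested)                    # the nested frame counts as ONE column, "whisker"
    is.data.frame(nested$whisker)    # the column itself is a data frame
    is.null(nested$whisker.top_whisker)  # no such top-level column to map in aes()

    # Joining on the shared key yields flat columns that aes() can reference
    merged <- merge(letters_df, top, by = "Treatment")
    names(merged)
    ```

    After the merge, `top_whisker` is an ordinary column, so it can be mapped directly to y in geom_text.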
    

    DATA

    # SOME DUMMY DATA
    set.seed(123)
    
    df <- data.frame(
      Treatment = as.factor(rep(c("A", "B", "C", "D"), each = 48)),
      result = rnorm(192, mean = c(0.8, 0.3, 0.65, 1), sd = c(0.02, 0.07, 0.9, 0.3))
    )
    
    # PERFORM POST-HOC
    kw_test <- FSA::dunnTest(result ~ Treatment,
      data = df,
      method = "bonferroni", kw = TRUE
    )$res
    
    # ASSIGN LETTERS
    kw_let <- rcompanion::cldList(P.adj ~ Comparison,
      data = kw_test, threshold = 0.05
    )
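
    A note on the whisker value itself: the ggplot2 documentation describes the upper whisker as extending to the largest observation no further than 1.5 * IQR above the upper hinge, with quantile-based hinges, and notes that this can differ slightly from base R's boxplot(), on which boxplot.stats is based. The following is a minimal sketch of that definition, assuming it matches your ggplot2 version; `upper_whisker` is a hypothetical helper name, not part of any package. It also shows why `quantile(x, 0.75) + 1.5 * IQR(x)` alone is not the whisker end: that expression is only the fence, and the whisker stops at the last data point inside it.

    ```r
    # Hypothetical helper: largest observation within 1.5 * IQR above the third quartile
    upper_whisker <- function(x) {
      q3 <- unname(quantile(x, 0.75))
      fence <- q3 + 1.5 * IQR(x)   # upper limit, usually not an observed value
      max(x[x <= fence])           # whisker ends at the last observation inside it
    }

    # Same dummy data as above
    set.seed(123)
    df <- data.frame(
      Treatment = as.factor(rep(c("A", "B", "C", "D"), each = 48)),
      result = rnorm(192, mean = c(0.8, 0.3, 0.65, 1), sd = c(0.02, 0.07, 0.9, 0.3))
    )

    # One whisker value per Treatment, computed within each group
    whiskers <- aggregate(result ~ Treatment, data = df, FUN = upper_whisker)

    # Side-by-side with the boxplot.stats approach, to see whether they differ
    stats5 <- aggregate(result ~ Treatment, data = df,
                        FUN = function(x) boxplot.stats(x)$stats[5])
    comparison <- data.frame(
      Treatment = whiskers$Treatment,
      quantile_based = whiskers$result,
      boxplot_stats = stats5$result
    )
    print(comparison)
    ```

    If the two columns agree for your data, either approach positions the letters equally well; `whiskers` can be merged with `kw_let` in the same way as above.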