Positioning counts of boxplots in ggplot2


Current Code

give.n <- function(x){
  return(c(y = median(x)*1.05, label = length(x))) 
}

box_z_nl <- ggplot(nl_data, aes(x = unique_word, y = z_score, fill = subreddit, color = type)) +
  geom_boxplot() +
  scale_y_continuous(trans = scales::pseudo_log_trans(base = 10)) +
  ggtitle("English-Speaking Subreddits") +
  xlab("Hedge/Booster") +
  ylab("Z Score (log10)") +
  coord_flip() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  scale_color_manual(values = custom_palette) +
  stat_summary(fun.data = give.n, geom = "text",
               position = position_dodge(width = 0.75))

Output

[Image: current output]

I would like the counts to be displayed somewhere readable, for example to the right of the furthest outlier. How can I achieve this?


Solution

  • "Somewhere readable" is completely under your control. Your give.n currently places the text at 1.05 times the median, which lands in the middle of the boxplot.

    I'll mimic this with mtcars.

    ggplot(mtcars, aes(x = factor(cyl), y = disp, fill = factor(gear), color = factor(am))) +
      geom_boxplot() +
      coord_flip() +
      theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
      # fun.y is deprecated (use fun) and is ignored when fun.data is supplied, so it is dropped
      stat_summary(fun.data = give.n, geom = "text", position = position_dodge(width = 0.75))
    

    [Image: original plot]

    Option 1: use a constant offset instead of a scale factor:

    give.n2 <- function(z) c(y=max(z) + 30, label=length(z))
    ... +
      stat_summary(fun.data = give.n2, geom = "text", position = position_dodge(width = 0.75))
    

    [Image: plot with offset counts]

    How much to offset depends on your data. I realize you're using 1.05 in an attempt to do this programmatically, but multiplying yields a different shift for each group, whereas a constant shift looks more consistent. Shifting from the middle of the data is also problematic, since the label usually overlaps the box; shifting past one end of the data should not overlap.

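    To make the difference concrete, here is a small base-R sketch that compares the two shift rules on mtcars. Grouping by cyl alone is a simplification of the plot's grouping, and the names scale_shift and offset_shift are just illustrative. Note also that scaling a negative median (as can happen with z-scores) would move the label in the opposite direction.

    ```r
    # Scaling the median by 1.05 (as in give.n) shifts the label by 5% of the
    # median, so the shift grows with each group's median.
    scale_shift <- tapply(mtcars$disp, mtcars$cyl,
                          function(z) median(z) * 1.05 - median(z))

    # Adding a constant to the maximum (as in give.n2) shifts the label by the
    # same amount in every group.
    offset_shift <- tapply(mtcars$disp, mtcars$cyl,
                           function(z) (max(z) + 30) - max(z))

    print(scale_shift)   # one value per cyl group; they differ
    print(offset_shift)  # the same constant for every cyl group
    ```
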
    Option 2: place the counts at the far right (or far left with -Inf), and add hjust= to keep the text inside the panel:

    give.n3 <- function(z) c(y=Inf, label=length(z))
    ... +
      stat_summary(fun.data = give.n3, geom = "text", position = position_dodge(width = 0.75),
                   hjust = 1.1)
    

    [Image: updated plot, counts on the right side]

    In extreme cases (here, the red boxplot for cyl == 8), a wide count may overlap part of a boxplot that reaches the edge of the panel; widening the y limits may mitigate this.
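
    One possible way to adjust the y limits is to pad the upper end of the y scale so the counts at Inf get their own strip of empty space. This is a sketch rather than a tuned solution: expansion() is available in ggplot2 3.3.0 and later, and the 0.15 multiplier is an arbitrary value I picked for mtcars, so you would likely need a different amount for your data.

    ```r
    library(ggplot2)

    # Same helper as in Option 2: pin the label to the upper edge of the panel.
    give.n3 <- function(z) c(y = Inf, label = length(z))

    p <- ggplot(mtcars, aes(x = factor(cyl), y = disp,
                            fill = factor(gear), color = factor(am))) +
      geom_boxplot() +
      coord_flip() +
      # Pad 5% below and 15% above the data range (assumed values) so the
      # counts on the right have room beyond the longest boxplot.
      scale_y_continuous(expand = expansion(mult = c(0.05, 0.15))) +
      stat_summary(fun.data = give.n3, geom = "text",
                   position = position_dodge(width = 0.75), hjust = 1.1)

    p
    ```

    If your plot already has a scale_y_continuous() call (for example one that sets trans =), the expand = argument would go into that same call rather than a second one.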