R: Displaying mean and median labels on boxplot ggplot


I've just started working with R and am trying to find out how to add mean and median labels to a box plot using ggplot.
I have a dataset with the columns Unit, Quarter, and Days (number of days), plus a column Z:

dset <- read.table(text='Unit     Quarter  Days   Z  
HH       1Q      25  Y      
PA       1Q      28  N     
PA       1Q      10  Y     
HH       1Q      53  Y
HH       1Q      12  Y
HH       1Q      20  Y
HH       1Q      43  N
PA       1Q      11  Y
PA       1Q      66  Y
PA       1Q      54  Y      
PA       2Q      19  N
PA       2Q      46  Y
PA       2Q      37  Y
HH       2Q      22  Y      
HH       2Q      67  Y      
PA       2Q      45  Y
HH       2Q      48  Y
HH       2Q      15  N
PA       3Q      12  Y               
PA       3Q      53  Y      
HH       3Q      58  Y
HH       3Q      41  N
HH       3Q      18  Y
PA       3Q      26  Y
PA       3Q      12  Y
HH       3Q      63  Y
                   ', header=TRUE)

I need to show data by Unit and Quarter and create a boxplot displaying mean and median values.
My code for a boxplot:

ggplot(data = dset, aes(x = Quarter
                       ,y = Days, fill = Quarter))  +
  geom_boxplot(outlier.shape = NA) + 
  facet_grid(. ~ Unit) + # adding another dimension
  coord_cartesian(ylim = c(10, 60)) + #sets the y-axis limits
  stat_summary(fun.y=mean, geom="point", shape=20, size=3, color="red", fill="red") + #adds average dot
  geom_text(data = means, aes(label = round(Days, 1), y = Days + 1), size = 3) + #adds average labels
  geom_text(data = medians, aes(label = round(Days, 1), y = Days - 0.5), size = 3) + #adds median labels
  xlab(" ") +
  ylab("Days") +
  ggtitle("Days") +
  theme(legend.position = 'none')

I can use the geom_text function to add mean and median labels, but only for one dimension ("Quarter"), and it requires calculating the mean and median values beforehand:

means <- aggregate(Days ~  Quarter, dset, mean)
medians <- aggregate(Days ~  Quarter, dset, median)

That works pretty well, and I managed to calculate the mean and median values by both "Unit" and "Quarter":

means <- aggregate(dset[, 'Days'], list('Unit' = dset$Unit, 'Quarter' = dset$Quarter), mean)
medians <- aggregate(dset[, 'Days'], list('Unit' = dset$Unit, 'Quarter' = dset$Quarter), median)

but I do not know how to pass those variables to the geom_text function to display labels for the mean and median. Maybe I should calculate the mean and median in a different way, or maybe there are other ways to add those labels.
I would be grateful for any suggestions!


Solution

  • It looks like the problem is that when you calculate the mean and median values by both "Unit" and "Quarter" using the list form of aggregate, the variable that used to be called "Days" is now called "x". So simply update the aesthetics in your geom_text calls to refer to "x" instead of "Days":

    ggplot(data = dset, aes(x = Quarter, y = Days, fill = Quarter))  +
      geom_boxplot(outlier.shape = NA) + 
      facet_grid(. ~ Unit) + # adding another dimension
      coord_cartesian(ylim = c(10, 60)) + #sets the y-axis limits
      stat_summary(fun = mean, geom = "point", shape = 20, size = 3, color = "red", fill = "red") + #adds average dot (fun replaces fun.y, deprecated since ggplot2 3.3.0)
      geom_text(data = means, aes(label = round(x, 1), y = x + 1), size = 3) + #adds average labels
      geom_text(data = medians, aes(label = round(x, 1), y = x - 0.5), size = 3) + #adds median labels
      xlab(" ") +
      ylab("Days") +
      ggtitle("Days") +
      theme(legend.position = 'none')
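
As a possible alternative to renaming the aesthetics (a sketch, not part of the original answer): the formula interface of aggregate keeps the original column name, so the geom_text calls from the question could stay as they were. The small data frame d below is a made-up stand-in for dset with the same column names; its values are hypothetical.

```r
# Hypothetical stand-in for dset (same column names, invented values)
d <- data.frame(
  Unit    = c("HH", "HH", "PA", "PA"),
  Quarter = c("1Q", "1Q", "1Q", "1Q"),
  Days    = c(10, 20, 30, 50)
)

# List interface, as in the question: the aggregated column is named "x"
by_list <- aggregate(d[, "Days"], list(Unit = d$Unit, Quarter = d$Quarter), mean)
names(by_list)

# Formula interface: the aggregated column keeps the name "Days"
by_formula <- aggregate(Days ~ Unit + Quarter, d, mean)
names(by_formula)
```

With means and medians built this way from dset (using mean and median respectively), aes(label = round(Days, 1), y = Days + 1) should work per Unit and Quarter without referring to x.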