I've just started working with R and am trying to find out how to add mean and median labels to a box plot using ggplot.
I have a dataset with the columns Unit, Quarter, and Days (number of days):
dset <- read.table(text='Unit Quarter Days Z
HH 1Q 25 Y
PA 1Q 28 N
PA 1Q 10 Y
HH 1Q 53 Y
HH 1Q 12 Y
HH 1Q 20 Y
HH 1Q 43 N
PA 1Q 11 Y
PA 1Q 66 Y
PA 1Q 54 Y
PA 2Q 19 N
PA 2Q 46 Y
PA 2Q 37 Y
HH 2Q 22 Y
HH 2Q 67 Y
PA 2Q 45 Y
HH 2Q 48 Y
HH 2Q 15 N
PA 3Q 12 Y
PA 3Q 53 Y
HH 3Q 58 Y
HH 3Q 41 N
HH 3Q 18 Y
PA 3Q 26 Y
PA 3Q 12 Y
HH 3Q 63 Y
', header=TRUE)
I need to show data by Unit and Quarter and create a boxplot displaying mean and median values.
My code for a boxplot:
ggplot(data = dset, aes(x = Quarter, y = Days, fill = Quarter)) +
geom_boxplot(outlier.shape = NA) +
facet_grid(. ~ Unit) + # adding another dimension
coord_cartesian(ylim = c(10, 60)) + #sets the y-axis limits
stat_summary(fun = mean, geom = "point", shape = 20, size = 3, color = "red", fill = "red") + # adds average dot (fun.y is deprecated since ggplot2 3.3.0)
geom_text(data = means, aes(label = round(Days, 1), y = Days + 1), size = 3) + #adds average labels
geom_text(data = medians, aes(label = round(Days, 1), y = Days - 0.5), size = 3) + #adds median labels
xlab(" ") +
ylab("Days") +
ggtitle("Days") +
theme(legend.position = 'none')
I can use the geom_text function to add mean and median labels, but only for one dimension ("Quarter"), and it requires calculating the means and medians beforehand:
means <- aggregate(Days ~ Quarter, dset, mean)
medians <- aggregate(Days ~ Quarter, dset, median)
That works pretty well, and I managed to calculate the mean and median values by both "Unit" and "Quarter":
means <- aggregate(dset[, 'Days'], list('Unit' = dset$Unit, 'Quarter' = dset$Quarter), mean)
medians <- aggregate(dset[, 'Days'], list('Unit' = dset$Unit, 'Quarter' = dset$Quarter), median)
but I do not know how to pass those variables to the geom_text function to display labels for the mean and median. Maybe I should calculate the mean and median in a different way, or maybe there are other ways to add those labels.
Would be grateful for any suggestions!
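It looks like the problem is that when you calculate the mean and median values by both "Unit" and "Quarter", the column that used to be called "Days" is now called "x". This happens because aggregate() is given the bare vector dset[, 'Days'] rather than a formula, so the column name is lost. So simply update your geom_text calls to refer to x instead of Days.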
ggplot(data = dset, aes(x = Quarter, y = Days, fill = Quarter)) +
geom_boxplot(outlier.shape = NA) +
facet_grid(. ~ Unit) + # adding another dimension
coord_cartesian(ylim = c(10, 60)) + #sets the y-axis limits
stat_summary(fun = mean, geom = "point", shape = 20, size = 3, color = "red", fill = "red") + # adds average dot (fun.y is deprecated since ggplot2 3.3.0)
geom_text(data = means, aes(label = round(x, 1), y = x + 1), size = 3) + #adds average labels
geom_text(data = medians, aes(label = round(x, 1), y = x - 0.5), size = 3) + #adds median labels
xlab(" ") +
ylab("Days") +
ggtitle("Days") +
theme(legend.position = 'none')
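As a possible alternative to renaming things inside geom_text, the formula interface of aggregate() keeps the original column name. The sketch below uses only base R and the data from the question; means_x is a hypothetical name introduced here just to show the contrast with the non-formula call.

```r
# Data copied from the question so the example runs on its own.
dset <- read.table(text = 'Unit Quarter Days Z
HH 1Q 25 Y
PA 1Q 28 N
PA 1Q 10 Y
HH 1Q 53 Y
HH 1Q 12 Y
HH 1Q 20 Y
HH 1Q 43 N
PA 1Q 11 Y
PA 1Q 66 Y
PA 1Q 54 Y
PA 2Q 19 N
PA 2Q 46 Y
PA 2Q 37 Y
HH 2Q 22 Y
HH 2Q 67 Y
PA 2Q 45 Y
HH 2Q 48 Y
HH 2Q 15 N
PA 3Q 12 Y
PA 3Q 53 Y
HH 3Q 58 Y
HH 3Q 41 N
HH 3Q 18 Y
PA 3Q 26 Y
PA 3Q 12 Y
HH 3Q 63 Y
', header = TRUE)

# Non-formula call (as in the question): the summarised column is named "x".
means_x <- aggregate(dset[, 'Days'],
                     list('Unit' = dset$Unit, 'Quarter' = dset$Quarter), mean)
names(means_x)

# Formula call: one row per Unit/Quarter pair, and the column is still "Days".
means   <- aggregate(Days ~ Unit + Quarter, data = dset, FUN = mean)
medians <- aggregate(Days ~ Unit + Quarter, data = dset, FUN = median)
names(means)
```

With means and medians built this way, the geom_text layers from the question (label = round(Days, 1), y = Days + 1 and y = Days - 0.5) should work unchanged, and because both data frames contain Unit, the labels land in the matching facet.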