I am not very experienced in R, and I need to solve a problem that seems quite difficult to me but is probably easy for anyone who knows how to work with boxplots in R. I would be really grateful if you could help me with this:
I need to add horizontal lines or dots for the 10th and 90th percentiles to a grouped boxplot. Apart from that, the boxplot should keep the usual features: min, max, the box with the 25th percentile, median and 75th percentile, and outliers.
I tried to adapt several of the solutions posted here, but none of them works for my case. One promising attempt is the function-based solution shown below, but I need the median rather than the mean, and I need the 10th and 90th percentiles displayed in addition to the usual features, not instead of them. It is also important to group the boxes by the variable Col (see sample code below):
If you could give me some ideas how to solve this, I would be really grateful!
dataset_stack <- structure(list(Col = c("Blue", "Blue", "Blue", "Blue", "Blue",
"Blue", "Blue", "Blue", "Blue", "Blue", "Blue", "Blue", "Blue",
"Green", "Green", "Green", "Green", "Green", "Green", "Green",
"Green", "Green", "Green", "Green", "Green", "Green", "Green",
"Green", "Red", "Red", "Red", "Red", "Red", "Red", "Red", "Red",
"Red", "Red", "Red", "Red", "Red", "Red", "Red"), TTC = c(0.9,
0.7, 0, 0.1, 0.1, 0.4, 0.9, 0.8, 0.1, 0, 0.7, 0.2, 0.7, 0.2,
0, 0.8, 0.7, 0.8, 0.9, 0.3, 0.9, 0.8, 0.3, 1, 0.6, 0.4, 0.3,
0.3, 0.3, 0.2, 0.2, 0.7, 0.9, 0.9, 0.6, 0.4, 0.1, 0.4, 0.8, 0,
0.7, 0.4, 0.7)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-43L))
bp.vals <- function(x, probs = c(0.1, 0.25, 0.75, 0.9)) {
  r <- quantile(x, probs = probs, na.rm = TRUE)
  # the middle value here is the geometric mean, not the median
  r <- c(r[1:2], exp(mean(log(x))), r[3:4])
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}
# Sample usage of the function with the data above
ggplot(dataset_stack, aes(x = factor(Col), y = TTC)) +
  stat_summary(fun.data = bp.vals, geom = "boxplot")
You could use the stat_summary() function and pass a function to its fun argument to add specific quantile() values and the median as colored points. If your data contains outliers, they are shown, for example, in orange:
ggplot(dataset_stack, aes(x = factor(Col), y = TTC)) +
  geom_boxplot(outlier.color = "orange3", outlier.size = 4) +
  # fun replaces fun.y, which is deprecated since ggplot2 3.3.0
  stat_summary(fun = median, geom = "point", shape = 16, size = 4, color = "darkred") +
  # 10th percentile in red, 90th percentile in blue
  stat_summary(geom = "point", fun = \(x) quantile(x, 0.1, na.rm = TRUE), shape = 16, size = 4, color = "red") +
  stat_summary(geom = "point", fun = \(x) quantile(x, 0.9, na.rm = TRUE), shape = 16, size = 4, color = "blue") +
  theme_bw()
If you only want to show, for example, the mean in black, the median in dark red, and the min and max values in grey, you could again use stat_summary():
ggplot(dataset_stack, aes(x = factor(Col), y = TTC)) +
  geom_boxplot(outlier.color = "orange3", outlier.size = 4) +
  # fun replaces fun.y, which is deprecated since ggplot2 3.3.0
  stat_summary(fun = mean, geom = "point", shape = 16, size = 4, color = "black") +
  stat_summary(fun = median, geom = "point", shape = 16, size = 4, color = "darkred") +
  stat_summary(fun = min, geom = "point", shape = 16, size = 4, color = "grey") +
  stat_summary(fun = max, geom = "point", shape = 16, size = 4, color = "grey") +
  theme_bw()
Putting it all together:
ggplot(dataset_stack, aes(x = factor(Col), y = TTC)) +
  geom_boxplot(outlier.color = "orange3", outlier.size = 4) +
  stat_summary(fun = mean, geom = "point", shape = 16, size = 4, color = "black") +
  stat_summary(fun = median, geom = "point", shape = 16, size = 4, color = "darkred") +
  stat_summary(fun = min, geom = "point", shape = 16, size = 4, color = "grey") +
  stat_summary(fun = max, geom = "point", shape = 16, size = 4, color = "grey") +
  stat_summary(geom = "point", fun = \(x) quantile(x, 0.1, na.rm = TRUE), shape = 16, size = 4, color = "red") +
  stat_summary(geom = "point", fun = \(x) quantile(x, 0.9, na.rm = TRUE), shape = 16, size = 4, color = "blue") +
  theme_bw()
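The question also mentions horizontal lines as an alternative to dots. A minimal sketch of one way to get them, assuming a reasonably recent ggplot2 (3.3.0 or later): give stat_summary() the same quantile function for fun, fun.min and fun.max and draw the result with geom = "errorbar", so the bar collapses into a single horizontal line. The helper pctl() is a hypothetical name introduced only for this example, and the data frame is the question's dataset_stack rebuilt compactly so the snippet runs on its own.

```r
library(ggplot2)

# Same values as dataset_stack in the question, rebuilt compactly
dataset_stack <- data.frame(
  Col = rep(c("Blue", "Green", "Red"), times = c(13, 15, 15)),
  TTC = c(0.9, 0.7, 0, 0.1, 0.1, 0.4, 0.9, 0.8, 0.1, 0, 0.7, 0.2, 0.7,
          0.2, 0, 0.8, 0.7, 0.8, 0.9, 0.3, 0.9, 0.8, 0.3, 1, 0.6, 0.4,
          0.3, 0.3,
          0.3, 0.2, 0.2, 0.7, 0.9, 0.9, 0.6, 0.4, 0.1, 0.4, 0.8, 0, 0.7,
          0.4, 0.7)
)

# Hypothetical helper (not part of ggplot2): returns a function that
# computes the p-th quantile of its input
pctl <- function(p) {
  function(x) quantile(x, probs = p, na.rm = TRUE, names = FALSE)
}

p <- ggplot(dataset_stack, aes(x = factor(Col), y = TTC)) +
  geom_boxplot(outlier.color = "orange3", outlier.size = 4) +
  # ymin == ymax, so each error bar is drawn as one horizontal line
  stat_summary(fun = pctl(0.1), fun.min = pctl(0.1), fun.max = pctl(0.1),
               geom = "errorbar", width = 0.5, color = "red") +
  stat_summary(fun = pctl(0.9), fun.min = pctl(0.9), fun.max = pctl(0.9),
               geom = "errorbar", width = 0.5, color = "blue") +
  theme_bw()

print(p)
```

The regular boxplot layer is kept untouched, so the box, median, whiskers and outliers stay as they are and the percentile lines are simply drawn on top. The width value controls how far each line extends across its box.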