give.n <- function(x){
return(c(y = median(x)*1.05, label = length(x)))
}
box_z_nl <- ggplot(nl_data, aes(x = unique_word, y = z_score, fill = subreddit, color = type)) +
  geom_boxplot() +
  scale_y_continuous(trans = scales::pseudo_log_trans(base = 10)) +
  ggtitle("English-Speaking Subreddits") +
  xlab("Hedge/Booster") +
  ylab("Z Score (log10)") +
  coord_flip() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  scale_color_manual(values = custom_palette) +
  # fun.y is deprecated and has no effect when fun.data is supplied, so it is dropped
  stat_summary(fun.data = give.n, geom = "text",
               position = position_dodge(width = 0.75))
Current output. I would like the counts to be displayed somewhere readable, for example to the right of the furthest outlier. How could I achieve this?
"Somewhere readable" is completely under your control, and you are currently placing the text in the middle of the boxplot.
I'll mimic this with mtcars:
ggplot(mtcars, aes(x = factor(cyl), y = disp, fill = factor(gear), color = factor(am))) +
  geom_boxplot() +
  coord_flip() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  stat_summary(fun.data = give.n, geom = "text", position = position_dodge(width = 0.75))
One fix is to anchor each label a fixed distance past its group's maximum rather than at its median:
give.n2 <- function(z) c(y = max(z) + 30, label = length(z))
... +
  stat_summary(fun.data = give.n2, geom = "text", position = position_dodge(width = 0.75))
"How much" to shift depends on your data. I realize you're using 1.05 in an attempt to do this programmatically, but that yields a different shift amount per group, whereas it is aesthetically more consistent to shift every group by the same amount. Shifting from the "middle" of the data is also problematic, since the label usually overlaps the box; shifting from one end should never overlap.
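To make the per-group difference concrete, here is a small base-R sketch. It groups mtcars by cyl only (a simplification of the plot's grouping), and the variable names are illustrative rather than part of the original code.

```r
# Illustrative only: group disp by cyl, ignoring the gear/am split used in the plot.
disp_by_cyl <- split(mtcars$disp, mtcars$cyl)

# Placing the label at 1.05 * median moves it 5% of the median away from the
# median, so the distance differs from group to group.
shift_from_median <- sapply(disp_by_cyl, function(z) 1.05 * median(z) - median(z))

# Placing the label at max + 30 moves it the same distance past the largest
# value in every group.
shift_from_max <- sapply(disp_by_cyl, function(z) (max(z) + 30) - max(z))

print(shift_from_median)
print(shift_from_max)
```

The first vector varies with each group's median, while the second is constant, which is why the labels line up more evenly relative to the whiskers and outliers.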
Another option is to pin the labels to the edge of the panel using Inf (or -Inf), also adding hjust=:
give.n3 <- function(z) c(y = Inf, label = length(z))
... +
  stat_summary(fun.data = give.n3, geom = "text", position = position_dodge(width = 0.75),
               hjust = 1.1)
In the extreme (the red boxplot for cyl==8), larger numbers may overlap part of the boxplot, which can perhaps be mitigated by adjusting the y limits.
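One way to adjust the y limits is to pad the upper end of the scale, sketched below on mtcars. The helper name give.n4 and the 15% padding are assumptions chosen for illustration, and expansion() requires ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical variant of the edge-pinned helper, returning a data frame.
give.n4 <- function(z) data.frame(y = Inf, label = length(z))

p <- ggplot(mtcars, aes(x = factor(cyl), y = disp, fill = factor(gear), color = factor(am))) +
  geom_boxplot() +
  coord_flip() +
  stat_summary(fun.data = give.n4, geom = "text",
               position = position_dodge(width = 0.75), hjust = 1.1) +
  # Assumed values: 5% padding at the lower end, 15% at the upper end, which
  # leaves room between the largest values and the labels at the panel edge.
  scale_y_continuous(expand = expansion(mult = c(0.05, 0.15)))

print(p)
```

The padding is relative to the data range, so it may need tuning for wider labels or for a transformed scale such as the pseudo-log one in the question.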