I have a data frame with three columns: Site, Program and Result. Here is a minimal reproducible dataset:
> TP <- data.frame(Site = as.factor(c("Coal", "Coal", "Coal", "Coal", "STP", "STP", "STP", "STP")),
Program = as.factor(c("D", "D", "H", "H", "D", "D", "H", "H")),
Result = c(0.65, 0.58, 0.15, 0.10, 0.55, 0.53, 0.48, 0.49))
> TP
  Site Program Result
1 Coal       D   0.65
2 Coal       D   0.58
3 Coal       H   0.15
4 Coal       H   0.10
5  STP       D   0.55
6  STP       D   0.53
7  STP       H   0.48
8  STP       H   0.49
In reality there are 70000 rows, made up of 50 sites and two programs.
I have created a boxplot with geom_boxplot() where the x variable is Result and the y variable is Site. Each site has two boxplots, one for each program (D and H). The y-axis is currently sorted by each site's overall median, regardless of program:
> library(ggplot2)
> TP$Site <- reorder(TP$Site, TP$Result, FUN = median)
> ggplot(TP, aes(x = Result, y = Site)) + geom_boxplot(aes(fill = Program), outliers = FALSE)
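A quick base R check (no ggplot2 needed) of what that reorder() call does with the toy data: it scores each site by its overall median and sorts the factor levels in increasing order of that score. Because ggplot2 draws the first level of a discrete y axis at the bottom, the site with the highest overall median ends up at the top.

```r
# Toy data from the question
TP <- data.frame(
  Site    = as.factor(c("Coal", "Coal", "Coal", "Coal", "STP", "STP", "STP", "STP")),
  Program = as.factor(c("D", "D", "H", "H", "D", "D", "H", "H")),
  Result  = c(0.65, 0.58, 0.15, 0.10, 0.55, 0.53, 0.48, 0.49)
)

# Overall median per site, ignoring Program: Coal 0.365, STP 0.51
overall <- tapply(TP$Result, TP$Site, median)
print(overall)

# reorder() sorts the levels by these scores, lowest first
TP$Site <- reorder(TP$Site, TP$Result, FUN = median)
print(levels(TP$Site))  # "Coal" "STP", so STP is drawn at the top
```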
I am trying to change the plot so that the sites on the y-axis are sorted by their Program D median, with the highest at the top. For each site, the Program H boxplot should still sit immediately below the Program D boxplot; only the order of the sites should follow Program D. Some sites only have data from Program D, and I would like those ordered correctly on the y-axis too, even though they have no Program H data.
I have seen many solutions on Stack Overflow using order(), reorder() or dplyr's arrange(), and I have tried several of them with no luck.
I have successfully used reorder() (from stats) in my existing code to order the sites by the median result, but I cannot reproduce that when the ordering depends on more than one variable. I then tried order() (base), but could not work out how to sort by multiple keys. Finally, I tried dplyr with a combination of group_by(), mutate() and arrange(), and could not get that to work either.
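The group_by() + mutate() idea can work: the key is to compute one Program D median per site, attach it to every row of that site, and then pass that column to reorder(). Below is a minimal base R sketch of the same idea, with ave() standing in for group_by() + mutate() (a substitution to avoid a dplyr dependency); the D_median column name is made up for illustration.

```r
# Toy data from the question
TP <- data.frame(
  Site    = as.factor(c("Coal", "Coal", "Coal", "Coal", "STP", "STP", "STP", "STP")),
  Program = as.factor(c("D", "D", "H", "H", "D", "D", "H", "H")),
  Result  = c(0.65, 0.58, 0.15, 0.10, 0.55, 0.53, 0.48, 0.49)
)

# Keep Program D results only; Program H rows become NA
d_only <- ifelse(TP$Program == "D", TP$Result, NA)

# Like group_by(Site) + mutate(): one value per row, namely the Program D
# median of that row's site (hypothetical column name D_median)
TP$D_median <- ave(d_only, TP$Site, FUN = function(v) median(v, na.rm = TRUE))

# All rows of a site share the same score, so reorder()'s default FUN (mean)
# just returns that score; levels are sorted lowest first
TP$Site <- reorder(TP$Site, TP$D_median)
print(levels(TP$Site))  # "STP" "Coal": Coal (0.615) beats STP (0.54)
```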
In the supplied repro dataset, 'STP' has a higher overall median than 'Coal'. But 'Coal' has a higher median for Program D, so I would want 'Coal' to be at the top of the plot.
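The medians behind that example can be checked with tapply(), grouping by both Site and Program. With the toy data, Coal's Program D median is 0.615 against 0.54 for STP, so Coal should come first when sorting by Program D.

```r
# Toy data from the question
TP <- data.frame(
  Site    = as.factor(c("Coal", "Coal", "Coal", "Coal", "STP", "STP", "STP", "STP")),
  Program = as.factor(c("D", "D", "H", "H", "D", "D", "H", "H")),
  Result  = c(0.65, 0.58, 0.15, 0.10, 0.55, 0.53, 0.48, 0.49)
)

# Median Result for every Site x Program combination (rows = sites)
med <- tapply(TP$Result, list(TP$Site, TP$Program), median)
print(med)
# Expected values: Coal D 0.615, Coal H 0.125, STP D 0.54, STP H 0.485

# Sites sorted by their Program D median, highest first
print(names(sort(med[, "D"], decreasing = TRUE)))  # "Coal" "STP"
```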
Any help is much appreciated. Let me know if I can provide more info.
One option would be to use ifelse() to reorder Site using only the values for Program D:
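library(ggplot2)

ggplot(
  TP,
  aes(
    x = Result,
    # Order sites by the median of their Program D results only:
    # non-D rows are set to NA and dropped via na.rm = TRUE
    y = reorder(
      Site,
      ifelse(Program == "D", Result, NA),
      FUN = median,
      na.rm = TRUE
    )
  )
) +
  geom_boxplot(
    aes(fill = Program),
    outliers = FALSE  # hide outlier points (argument added in ggplot2 3.5.0)
  ) +
  labs(y = "Site")  # otherwise the axis title is the whole reorder(...) call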
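If the reorder(...) call inside aes() feels awkward, the same ordering can be stored on the Site column itself before plotting, which also keeps the axis title as plain "Site". The sketch below checks this on the toy data plus a hypothetical site, "Mill", that only has Program D results (its values are invented for illustration), to show that D-only sites are placed by their D median like any other site.

```r
# Toy data plus a hypothetical D-only site "Mill"
TP2 <- data.frame(
  Site    = factor(c("Coal", "Coal", "Coal", "Coal", "STP", "STP", "STP", "STP", "Mill", "Mill")),
  Program = factor(c("D", "D", "H", "H", "D", "D", "H", "H", "D", "D")),
  Result  = c(0.65, 0.58, 0.15, 0.10, 0.55, 0.53, 0.48, 0.49, 0.60, 0.62)
)

# Same trick as above, applied to the column: score each site by the median
# of its Program D results (non-D rows become NA and are dropped)
TP2$Site <- reorder(
  TP2$Site,
  ifelse(TP2$Program == "D", TP2$Result, NA),
  FUN = median,
  na.rm = TRUE
)

# Increasing D median: STP (0.54), Mill (0.61), Coal (0.615)
print(levels(TP2$Site))
```

After that, ggplot(TP2, aes(x = Result, y = Site)) + geom_boxplot(aes(fill = Program), outliers = FALSE) should show Coal at the top, followed by Mill and STP.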