I'm trying to make a bar plot with my data. My data looks like this:
df <- data.frame("sampleID" = c(1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
                 "Type" = c("A","B","E","A","B","C","D","A","C","D","F","B","C","E","F"),
                 "Frequency" = c(10,2,1,5,7,1,6,8,4,3,1,6,5,2,6))
I've plotted this using:
ggplot(data=df, aes(x=sampleID, y=Frequency, fill=Type)) +
  geom_bar(stat="identity")
Now I'm trying to figure out how to keep only the two most frequent Types, plus Type E, as stacks in each sampleID bar, and relabel the rest as Other.
(For instance, for sampleID == 1, A and B are the two most frequent Types and E is the remaining one, so all three stacks will show up in my final plot. But for sampleID == 2, B and D are the two most frequent Types and there's no E, so the B and D stacks will show up in my final plot, while A and C will be converted to Other. Or for sampleID == 4, the B, E and F stacks will stay in my final plot, while C will be converted to Other.)
I've found other examples of using slice to keep the top frequencies, but I couldn't figure out how to apply that to each sampleID rather than across the whole df, and I couldn't find any examples of explicitly forcing geom_bar to show a specific stack. Can anyone provide any suggestions?
If you define this little helper function:
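top2_or_e <- function(x) {
  # Rows to keep: the two highest frequencies, plus any row of Type "E"
  keep <- unique(c(head(order(-x$Frequency), 2), which(x$Type == "E")))
  # Relabel every other row (Type must be character, not factor)
  x$Type[-keep] <- "other"
  x
}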
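To see what the helper does to a single bar, here is a small check on the question's data for sampleID == 2. It is only a sketch: the name sample2 is introduced here for illustration, and stringsAsFactors = FALSE is set explicitly on the assumption that Type should be a character column (the default since R 4.0).

```r
# Question's data; Type kept as character so it can be relabelled
df <- data.frame(sampleID  = c(1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
                 Type      = c("A","B","E","A","B","C","D","A","C","D","F","B","C","E","F"),
                 Frequency = c(10,2,1,5,7,1,6,8,4,3,1,6,5,2,6),
                 stringsAsFactors = FALSE)

# Helper as defined in the answer
top2_or_e <- function(x) {
  x$Type[-unique(c(order(-x$Frequency)[1:2], which(x$Type == "E")))] <- "other"
  x
}

# Apply it to the rows of one sample only
sample2 <- top2_or_e(df[df$sampleID == 2, ])
print(sample2$Type)
# A (5) and C (1) become "other"; B (7) and D (6) are kept:
# [1] "other" "B"     "other" "D"
```

Within each sample, order() breaks ties by row position, so when two Types share a frequency the one listed first counts as more frequent.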
Then you can do:
library(ggplot2)
library(RColorBrewer)
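# Apply the helper to each sampleID separately, then stack the pieces back together
ggplot(data = do.call(rbind, lapply(split(df, df$sampleID), top2_or_e)),
       aes(x = sampleID, y = Frequency, fill = Type)) +
  geom_col(color = "gray50") +
  # Six pastel colours for A to F, plus gray for "other"
  scale_fill_manual(values = c(brewer.pal(6, "Pastel1"), "gray50")) +
  theme_bw()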
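If you would rather not split and re-bind the data frame, the same relabelling can probably be done in one pass with base R's ave(), which applies a function within each group and returns the results in the original row order. This is a sketch of an alternative, not the approach above: the names rank_in_sample, keep and df_other are made up for illustration, and ties are again broken by row order.

```r
# Question's data; Type kept as character so it can be relabelled
df <- data.frame(sampleID  = c(1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
                 Type      = c("A","B","E","A","B","C","D","A","C","D","F","B","C","E","F"),
                 Frequency = c(10,2,1,5,7,1,6,8,4,3,1,6,5,2,6),
                 stringsAsFactors = FALSE)

# Rank of each row within its sampleID (1 = highest Frequency);
# ties.method = "first" gives tied rows distinct ranks in row order
rank_in_sample <- ave(-df$Frequency, df$sampleID,
                      FUN = function(v) rank(v, ties.method = "first"))

# Keep the top two per sample, plus every Type "E" row
keep <- rank_in_sample <= 2 | df$Type == "E"

# Relabel everything else on a copy of the data
df_other <- df
df_other$Type[!keep] <- "other"

print(table(df_other$sampleID, df_other$Type))
```

df_other can then be passed to ggplot() in place of the do.call(rbind, ...) expression.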