geom_bar: show only specific stacks of interest


I'm trying to make a bar plot with my data. My data looks like this:

df <- data.frame("sampleID" = c(1,1,1,2,2,2,2,3,3,3,3,4,4,4,4), 
                 "Type" = c("A","B","E","A","B","C","D","A","C","D","F","B","C","E","F"), 
                 "Frequency" = c(10,2,1,5,7,1,6,8,4,3,1,6,5,2,6))

I've plotted this using:

library(ggplot2)
ggplot(data=df, aes(x=sampleID, y=Frequency, fill=Type)) +
   geom_bar(stat="identity")

which gives me a plot where every Type shows up as its own stack in each sampleID bar.

Now I'm trying to figure out how to keep only the two most frequent Types, plus Type E, as separate stacks in each sampleID bar, and lump the rest into Other.

(For instance, for sampleID == 1, A and B are the two most frequent Types and E is the remaining one, so all three stacks will show up in my final plot. But for sampleID == 2, B and D are the two most frequent Types and there's no E, so B and D stacks will show up in my final plot, while A and C will be converted to Other. Or for sampleID == 4, B, E and F stacks will stay in my final plot, while C will be converted to Other.)

I've found other examples of using slice to keep the top frequencies but I couldn't figure out how to apply that to each sampleID, not across the whole df, and I couldn't find any examples of explicitly forcing geom_bar to show a specific stack. Can anyone provide any suggestions?


Solution

  • If you define this little helper function:

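    top2_or_e <- function(x) {
       # rows to keep: the two largest Frequency values plus any Type "E" row
       keep <- unique(c(head(order(-x$Frequency), 2), which(x$Type == "E")))
       # relabel everything else (assumes Type is character, not factor)
       x$Type[setdiff(seq_len(nrow(x)), keep)] <- "other"
       x
    }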
    

    Then you can do:

    library(ggplot2)
    library(RColorBrewer)
    
    ggplot(data = do.call(rbind, lapply(split(df, df$sampleID), top2_or_e)),
           aes(x = sampleID, y = Frequency, fill = Type)) +
      geom_col(color = "gray50") +
      # six pastel fills for Types A-F, then gray for "other" (assuming it sorts last)
      scale_fill_manual(values = c(brewer.pal(6, "Pastel1"), "gray50")) +
      theme_bw()
    

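If you would rather skip the split/rbind step, the same relabelling can probably be done in one pass with base R's `ave()`. This is only a sketch under a few assumptions: `Type` is a character column (hence the explicit `stringsAsFactors = FALSE`), ties in `Frequency` are broken by row order, and the names `rank_in_sample`, `keep` and `df_other` are just made up for illustration.

```r
# Same data as in the question, with Type kept as character
df <- data.frame(
  sampleID  = c(1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
  Type      = c("A","B","E","A","B","C","D","A","C","D","F","B","C","E","F"),
  Frequency = c(10,2,1,5,7,1,6,8,4,3,1,6,5,2,6),
  stringsAsFactors = FALSE
)

# Rank each row within its sampleID, largest Frequency first;
# ties.method = "first" breaks ties by row order
rank_in_sample <- ave(-df$Frequency, df$sampleID,
                      FUN = function(v) rank(v, ties.method = "first"))

# Keep the top two rows per sampleID and every Type "E" row
keep <- rank_in_sample <= 2 | df$Type == "E"

# Everything else is relabelled as "other"
df_other <- df
df_other$Type <- ifelse(keep, df$Type, "other")
```

`df_other` can then go straight into the `ggplot()` call in place of the `do.call(rbind, ...)` expression. For sampleID 2, for example, A and C end up as "other" while B and D keep their own labels.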