Tags: r, ggplot2, bar-chart, point, errorbar

Overlay points (and error bars) over bar plot with position_dodge


I have been looking for an answer to my particular problem without success, so I have made an MWE to post here.

I tried the answers here with no success.

The task I want to do seems easy enough, but I cannot figure it out, and the results I get are making me have some fundamental questions...

I just want to overlay points and error bars on a bar plot, using ggplot2.

I have a long format data frame that looks like the following:

mydf <- data.frame(cell=paste0("cell", rep(1:3, each=12)),
   scientist=paste0("scientist", rep(rep(rep(1:2, each=3), 2), 3)),
   timepoint=paste0("time", rep(rep(1:2, each=6), 3)),
   rep=paste0("rep", rep(1:3, 12)),
   value=runif(36)*100)

I have attempted to get the plot I want the following way:

library(ggplot2)
library(RColorBrewer)  # provides brewer.pal()
myPal <- brewer.pal(3, "Set2")[1:2]
myPal2 <- brewer.pal(3, "Set1")
outfile <- "test.pdf"
pdf(file=outfile, height=10, width=10)
print( # or ggsave()
  ggplot(mydf, aes(cell, value, fill=scientist )) +
  geom_bar(stat="identity", position=position_dodge(.9)) +
  geom_point(aes(cell, color=rep), position=position_dodge(.9), size=5) +
  facet_grid(timepoint~., scales="free_x", space="free_x") +
  scale_y_continuous("% of total cells") +
  scale_fill_manual(values=myPal) +
  scale_color_manual(values=myPal2)
)
dev.off()

But I obtain this:

[image: example]

The problem is, there should be 3 "rep" values per "scientist" bar, but the values are ordered by "rep" instead (they should be 1,2,3,1,2,3, instead of 1,1,2,2,3,3).

Besides, I would like to add error bars with geom_errorbar but I didn't manage to get a working example...

Furthermore, overlaying the actual data points on the bars makes me wonder what is actually being plotted here: whether the values are taken properly for each bar, and why the max value (or so it seems) is plotted by default.

The way I think this should be properly plotted is with the median (or mean), adding the error bars like the whiskers in a boxplot (min and max value).

Any idea how to...

  • ... have the "rep" value points appear in proper order?
  • ... change the value shown by the bars from max to median?
  • ... add error bars with max and min values?

Solution

  • I restructured your plotting code a little to make things easier. The secret is to use proper grouping (which is otherwise inferred from fill and color). Also, since you're dodging on multiple levels, position_dodge2 has to be used.

    When you are unsure about "what is plotted where" in bar/column charts, it always helps to add the option color="black", which reveals that things are still stacked on top of each other because of your use of dodge instead of dodge2.

    p = ggplot(mydf, aes(x=cell, y=value, group=paste(scientist,rep))) +
      geom_col(aes(fill=scientist), position=position_dodge2(.9)) +
      geom_point(aes(cell, color=rep), position=position_dodge2(.9), size=5) +
      facet_grid(timepoint~., scales="free_x", space="free_x") +
      scale_y_continuous("% of total cells") +
      scale_fill_brewer(palette = "Set2")+
      scale_color_brewer(palette = "Set1")
    
    ggsave(filename = outfile, plot=p, height = 10, width = 10)
    

    gives: [image]
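
    As to why the original bars seem to show the maximum: with stat="identity" every row becomes its own bar, and when several rows share one dodged position the bars are drawn over each other, so only the tallest remains visible. A minimal base R sketch of that check, assuming the mydf from the question (set.seed is added here only for reproducibility; n_per_bar and visible are names introduced for illustration):

    ```r
    set.seed(1)  # assumption: seed added only to make the sketch reproducible
    mydf <- data.frame(cell=paste0("cell", rep(1:3, each=12)),
       scientist=paste0("scientist", rep(rep(rep(1:2, each=3), 2), 3)),
       timepoint=paste0("time", rep(rep(1:2, each=6), 3)),
       rep=paste0("rep", rep(1:3, 12)),
       value=runif(36)*100)

    # Number of rows (i.e. bars) that share one cell/scientist/timepoint position
    n_per_bar <- aggregate(value ~ cell + scientist + timepoint, data = mydf, FUN = length)

    # Height of the tallest of those overlapping bars, which is the one you see
    visible <- aggregate(value ~ cell + scientist + timepoint, data = mydf, FUN = max)

    unique(n_per_bar$value)  # every position holds the same number of replicates
    ```

    Each position holds three replicate rows, which is why grouping by paste(scientist, rep) is needed to give every replicate its own bar.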

    Regarding error bars

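    Since there are only three replicates, I would show the original data points and maybe a violin plot. For completeness' sake I also added a geom_errorbar; with stat="summary" and no summary function supplied, it falls back to mean_se(), i.e. mean ± standard error.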

    ggplot(mydf, aes(x=cell, y=value,group=paste(cell,scientist))) +
      geom_violin(aes(fill=scientist),position=position_dodge(),color="black") +
      geom_point(aes(cell, color=rep), position=position_dodge(0.9), size=5) +
      geom_errorbar(stat="summary",position=position_dodge())+
      facet_grid(timepoint~., scales="free_x", space="free_x") +
      scale_y_continuous("% of total cells") +
      scale_fill_brewer(palette = "Set2")+
      scale_color_brewer(palette = "Set1")
    

    gives: [image]

    Update after comment

    As I mentioned in my comment below, the stacking of the percentages leads to an undesirable outcome.

    ggplot(mydf, aes(x=paste(cell, scientist), y=value)) +
      geom_bar(aes(fill=rep),stat="identity", position=position_stack(),color="black") +
      geom_point(aes(color=rep), position=position_dodge(.9), size=3) +
      facet_grid(timepoint~., scales="free_x", space="free_x") +
      scale_y_continuous("% of total cells") +
      scale_fill_brewer(palette = "Set2")+
      scale_color_brewer(palette = "Set1")
    

    [image]
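
    For the median bars with min/max whiskers asked about in the question, one option is to summarise the data first and then plot the summary. This is a minimal sketch rather than a definitive solution, assuming the mydf from the question; the summary data frame summ and its columns med, lo and hi are names introduced here for illustration, and set.seed is added only for reproducibility:

    ```r
    library(ggplot2)

    set.seed(1)  # assumption: seed added only to make the sketch reproducible
    mydf <- data.frame(cell=paste0("cell", rep(1:3, each=12)),
       scientist=paste0("scientist", rep(rep(rep(1:2, each=3), 2), 3)),
       timepoint=paste0("time", rep(rep(1:2, each=6), 3)),
       rep=paste0("rep", rep(1:3, 12)),
       value=runif(36)*100)

    # One row per bar: median as bar height, min and max as whisker ends
    summ <- aggregate(value ~ cell + scientist + timepoint, data = mydf, FUN = median)
    names(summ)[names(summ) == "value"] <- "med"
    summ$lo <- aggregate(value ~ cell + scientist + timepoint, data = mydf, FUN = min)$value
    summ$hi <- aggregate(value ~ cell + scientist + timepoint, data = mydf, FUN = max)$value

    # Dodge every layer by scientist only, so points and whiskers sit on their bar
    p <- ggplot(summ, aes(x = cell, group = scientist)) +
      geom_col(aes(y = med, fill = scientist), position = position_dodge(0.9)) +
      geom_errorbar(aes(ymin = lo, ymax = hi), width = 0.3,
                    position = position_dodge(0.9)) +
      geom_point(data = mydf, aes(y = value, color = rep),
                 position = position_dodge(0.9), size = 3) +
      facet_grid(timepoint ~ .) +
      scale_y_continuous("% of total cells") +
      scale_fill_brewer(palette = "Set2") +
      scale_color_brewer(palette = "Set1")
    ```

    Setting group = scientist in the top-level aes() keeps rep out of the dodging, so the three replicate points line up vertically over their own bar and are distinguished by color only.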