
Graphing individual points on a grouped violin plot


After a good amount of looking, I can't seem to find the exact answer to my question, so I figured I'd ask.

I want to make a grouped violin plot of a timecourse with two conditions ("Control RNAi" and "mex-6 RNAi") using ggplot2. The data points come from 3 different replicates (the "Worm" factor in the dataframe), so the dataframe looks like this (with "mean_mex6" being the plotted Y value):

mean_mex6  RNAi          Time  Worm
2.4102356  Control RNAi  2hr   worm1
0.8332575  Control RNAi  2hr   worm1
2.5093177  Control RNAi  2hr   worm1
0.8792359  Control RNAi  2hr   worm1
1.2570116  Control RNAi  2hr   worm1
1.0671826  Control RNAi  2hr   worm1

There are many more lines in the dataframe, but the data I showed you above are just some datapoints that came from "worm1" on "Control RNAi" at the "2hr" timepoint.
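For anyone who wants to reproduce the structure, the six rows shown above can be rebuilt as a small data frame. This is only a sketch: the name `excerpt` is made up for illustration, and the real `compiled_allhours` contains many more rows, conditions, time points and worms.

```r
# Rebuild the six example rows; single values are recycled to all rows
excerpt <- data.frame(
  mean_mex6 = c(2.4102356, 0.8332575, 2.5093177,
                0.8792359, 1.2570116, 1.0671826),
  RNAi = "Control RNAi",
  Time = "2hr",
  Worm = "worm1"
)

# Inspect column types and dimensions
str(excerpt)
```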

I want all the individual points plotted in each RNAi group on the violin plot, but I want them plotted so that every datapoint from each "Worm" sample is a different color from the other worms. I have been able to create a grouped violin plot where all the individual points are plotted, but not color coded for each individual worm sample:

library(ggplot2)
library(ggbeeswarm) ##Provides geom_quasirandom

ggplot(compiled_allhours, aes(x=Time, y=mean_mex6, colour=RNAi)) + 
  geom_violin(trim=FALSE) +
  scale_x_discrete(limits=c("2hr", "4hr","6hr", "8hr","24hr")) + ##This chooses which data to plot and orders them
  geom_quasirandom(aes(x=Time, y=mean_mex6, colour = RNAi), dodge.width = 0.9, varwidth = TRUE) +
  ggtitle(expression(paste(italic("mex-6"), " nuclear signal", " - WT"))) +
  theme(plot.title = element_text(hjust = 0.5)) + ##Centers the title of the plot
  xlab("Time") +
  ylab(expression(paste("normalized ", italic("mex-6"), " nuclear signal (A.U.)")))

I included an image of the plot that this makes. [Plot 1] https://i.sstatic.net/yvfYW.png

If I try to color the individual points by worm, this is what happens:

library(ggplot2)
library(ggbeeswarm) ##Provides geom_quasirandom

ggplot(compiled_allhours, aes(x=Time, y=mean_mex6, colour=RNAi)) + 
  geom_violin(trim=FALSE) +
  scale_x_discrete(limits=c("2hr", "4hr","6hr", "8hr","24hr")) + ##This chooses which data to plot and orders them
  geom_quasirandom(aes(x=Time, y=mean_mex6, colour = Worm), dodge.width = 0.9, varwidth = TRUE) +
  ggtitle(expression(paste(italic("mex-6"), " nuclear signal", " - WT"))) +
  theme(plot.title = element_text(hjust = 0.5)) + ##Centers the title of the plot
  xlab("Time") +
  ylab(expression(paste("normalized ", italic("mex-6"), " nuclear signal (A.U.)")))

[Plot 2] https://i.sstatic.net/aVJwG.png

So basically, I want the second plot, but with all those points merged into the violin plot as is shown in the first graph. Thanks for your help!


Solution

  • The issue is that different groupings are applied for the geom_violin and the geom_quasirandom in your second plot. Therefore the points and the violins are "dodged" differently and do not align with each other.

    To get your desired result you have to make use of the group aesthetic, i.e. group by both Time and RNAi, e.g. using interaction().

    Additionally, instead of mapping Worm to color I mapped it to fill (that made more sense to me, as Worm and RNAi are different variables) and used filled points (shape = 21), which gives separate legends. But feel free to change that by switching back to color.

    Using a more realistic random dataset that includes "all" options for the different variables, try this:

    set.seed(42)
    compiled_allhours <- data.frame(
      mean_mex6 = runif(200),
      RNAi = sample(c("Control RNAi", "mex-6 RNAi"), 200, replace = TRUE),
      Time = sample(c("2hr", "4hr", "6hr", "8hr", "24hr"), 200, replace = TRUE),
      Worm = sample(c("worm1", "worm2", "worm3"), 200, replace = TRUE)
    )
    library(ggplot2)
    library(ggbeeswarm)
    
    ggplot(compiled_allhours, aes(x=Time, y=mean_mex6, group=interaction(RNAi, Time))) + 
      geom_violin(aes(color = RNAi), trim=FALSE) +
      scale_x_discrete(limits=c("2hr", "4hr","6hr", "8hr","24hr")) + ##This chooses which data to plot and orders them
      geom_quasirandom(aes(x=Time, y=mean_mex6, fill = Worm), shape = 21, color = "transparent", dodge.width = 0.9, varwidth = TRUE) +
      ggtitle(expression(paste(italic("mex-6"), " nuclear signal", " - WT"))) +
      theme(plot.title = element_text(hjust = 0.5)) + ##Centers the title of the plot
      xlab("Time") +
      ylab(expression(paste("normalized ", italic("mex-6"), " nuclear signal (A.U.)")))
    

    Created on 2021-01-04 by the reprex package (v0.3.0)
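
To see why the two layers were dodged differently in the second plot: when no group aesthetic is given, ggplot2 groups each layer by the interaction of all discrete variables mapped in it. The following base-R sketch counts the groups each layer would get. It uses a hypothetical balanced stand-in data frame `d` (one row per combination, not the real data), and the names `violin_groups` and `point_groups` are made up for illustration.

```r
# Hypothetical stand-in: one row per RNAi x Time x Worm combination
d <- expand.grid(
  RNAi = c("Control RNAi", "mex-6 RNAi"),
  Time = c("2hr", "4hr", "6hr", "8hr", "24hr"),
  Worm = c("worm1", "worm2", "worm3")
)

# Violin layer in the second plot: discrete variables are Time (x) and RNAi (colour)
violin_groups <- interaction(d$RNAi, d$Time)

# Point layer in the second plot: discrete variables are Time (x) and Worm (colour)
point_groups <- interaction(d$Worm, d$Time)

nlevels(violin_groups)  # 10 groups, i.e. 2 per time point
nlevels(point_groups)   # 15 groups, i.e. 3 per time point
```

Per time point that is 2 dodge positions for the violins but 3 for the points, so they cannot line up. Setting group = interaction(RNAi, Time) once in the main aes() gives both layers the same 10 groups.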