r ggplot2 t-test violin-plot

Combine statistical tests with rstatix and color fill per category


I use this code to make violin plots with ggplot2:

Cells1 %>%
  ggplot(aes(x=IR_time, y=AreaShape_Area, fill=IR_time)) +
  geom_violin(col=NA) +
  guides(fill=FALSE) + 
  stat_summary(fun.data=data_summary, col = "black") +
  theme_gray() + 
  ggtitle("Cell area after irradiation (3Gy)") +
  ylab("\nArea (pixels)") + 
  xlab("\nDays after exposure to 3Gy\n") +
  scale_fill_manual(values=wes_palette(n=5, name="Moonrise3")) +
  theme(
    plot.title = element_text(size=16),
    axis.title = element_text(size=12, face="bold"),
    axis.text = element_text(size=12))

That produces this plot:

[Image: violin plot]

Now I want to perform a statistical test (here a pairwise t-test) and add the results to the plot. I use the rstatix package. Here's the new code:

stat.test <- Cells1 %>%
  pairwise_t_test(AreaShape_Area ~ IR_time, pool.sd=FALSE, p.adjust.method="bonferroni", ref.group="0") %>%
  add_y_position()

Cells1 %>%
  ggplot(aes(x=IR_time, y=AreaShape_Area, fill=IR_time)) +
  geom_violin(col=NA) +
  guides(fill=FALSE) +
  stat_summary(fun.data=data_summary, col = "black") +
  theme_gray() +
  ggtitle("Cell area after irradiation (3Gy)") +
  ylab("\nArea (pixels)") +
  xlab("\nDays after exposure to 3Gy\n") +
  scale_fill_manual(values=wes_palette(n=5, name="Moonrise3")) +
  theme(
    plot.title = element_text(size=16),
    axis.title = element_text(size=12, face="bold"),
    axis.text = element_text(size=12)) +
  stat_pvalue_manual(stat.test)

But it leads to this error:

Error: Aesthetics must be either length 1 or the same as the data (4): fill

Apparently, this comes from the fill=IR_time argument in aes(). If I replace it with fill="blue", the error goes away. But I'd like to have both the fill colours depending on IR_time and the statistics on the plot.

Do you have an idea about how I can fix it?


Solution

  • I've created a roughly similar data set with the same names:

    set.seed(69)
    
    Cells1 <- data.frame(
                IR_time = factor(rep(c(0, 2, 4, 8, 12), each = 1000)),
                AreaShape_Area = rgamma(5e3, rep((2:6)^1.8, each = 1e3)) * 3e3)
    

    Your code required a function called data_summary, which you didn't include and doesn't seem to be in any common R package that I could find on Google. I'm guessing it's something like this:

    data_summary <- function(x) {
      data.frame(y = mean(x), ymin = mean(x) - sd(x), ymax = mean(x) + sd(x))
    }
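
    As a quick check of that guess: stat_summary() calls a fun.data function once per group with that group's y values and expects back a data frame with y, ymin and ymax columns. A minimal sketch on a toy vector (the input values are made up for illustration and are not from your data):

```r
# Guessed helper from above: mean plus/minus one standard deviation
data_summary <- function(x) {
  data.frame(y = mean(x), ymin = mean(x) - sd(x), ymax = mean(x) + sd(x))
}

# Toy input, not from the real data: its mean is 2 and its sd is 1
res <- data_summary(c(1, 2, 3))
print(res)  # y = 2, ymin = 1, ymax = 3
```

    If your own data_summary returns something different (for example a standard error instead of sd), only the black point ranges on the plot change; the fix for the error is unaffected.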
    

    You didn't include the packages you are using, so it took a bit of detective work to figure out that we need:

    library(ggplot2)
    library(wesanderson)
    library(rstatix)
    library(ggpubr)
    

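    Now we can run your code. As far as I can tell, the fix is simple: change the line stat_pvalue_manual(stat.test) to stat_pvalue_manual(data = stat.test, inherit.aes = FALSE). By default a layer inherits the plot-level mapping, here including fill = IR_time, and that mapping presumably does not fit stat.test, which has only four rows (one per comparison against the reference group). Setting inherit.aes = FALSE stops the p-value layer from inheriting it: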

    stat.test <- Cells1 %>%  
      pairwise_t_test(AreaShape_Area ~ IR_time, 
                      pool.sd = FALSE, 
                      p.adjust.method = "bonferroni", 
                      ref.group = "0") %>% 
      add_y_position()
    
    Cells1 %>%
      ggplot(aes(x = IR_time, y = AreaShape_Area, fill = IR_time)) +
      geom_violin(col = NA) +
      guides(fill = "none") + 
      stat_summary(fun.data = data_summary, col = "black") +
      theme_gray() + 
      ggtitle("Cell area after irradiation (3Gy)") +
      ylab("\nArea (pixels)") + 
      xlab("\nDays after exposure to 3Gy\n") +
      scale_fill_manual(values = wes_palette(n = 5, name = "Moonrise3")) +
      theme(
        plot.title = element_text(size = 16),
        axis.title = element_text(size = 12, face = "bold"),
        axis.text  = element_text(size = 12)) +
      stat_pvalue_manual(data = stat.test, inherit.aes = FALSE) 
    

    Created on 2020-09-30 by the reprex package (v0.3.0)
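
    An alternative sketch, which I have not run against your real data: since the problem seems to come from the p-value layer inheriting the plot-level fill, mapping fill only inside geom_violin() should avoid it without inherit.aes = FALSE. This reuses the simulated Cells1 from above and leaves out the palette and theme for brevity:

```r
library(ggplot2)
library(rstatix)
library(ggpubr)

# Same simulated stand-in for the real data as above
set.seed(69)
Cells1 <- data.frame(
  IR_time = factor(rep(c(0, 2, 4, 8, 12), each = 1000)),
  AreaShape_Area = rgamma(5e3, rep((2:6)^1.8, each = 1e3)) * 3e3)

stat.test <- Cells1 %>%
  pairwise_t_test(AreaShape_Area ~ IR_time,
                  pool.sd = FALSE,
                  p.adjust.method = "bonferroni",
                  ref.group = "0") %>%
  add_y_position()

# One row per comparison against the reference group "0"
nrow(stat.test)

# fill is mapped in the violin layer only, so the plot-level mapping
# holds just x and y and there is no fill for other layers to inherit
p <- ggplot(Cells1, aes(x = IR_time, y = AreaShape_Area)) +
  geom_violin(aes(fill = IR_time), col = NA, show.legend = FALSE) +
  stat_pvalue_manual(stat.test)

print(p)
```

    Either approach keeps the per-category colours; which one to use is mostly a matter of whether other layers in your plot also need the fill mapping.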