Tags: r, ggplot2, p-value, ggpubr

Remove p-values for groups with a mean lower than specified value


I was wondering if there is a way to specify which p-values are shown on a ggboxplot from the ggpubr package in R. So far I have been unsuccessful.

The only option I have found is to filter the data beforehand, but I do not want to exclude those groups from the graph. That is, I want to show p-values only for groups with a mean greater than that of my reference group, while still showing all of the data.

In my example here, I would like the "lo" group to hide its significance level while still remaining on the graph, since its mean is lower than that of the reference group.

library(ggplot2)
library(ggpubr)
library(dplyr)

Plant<-c("ref","ref","ref","ref","ref","ref","ref","ref","ref",
         "hi","hi","hi","hi","hi","hi","hi","hi","hi",
         "lo","lo","lo","lo","lo","lo","lo","lo","lo")
Delta.CCI<-c(11.05,11.45,9.65,10.65,10.15,8.95,8.95,12.45,8.95,
             20.56,20.66,19.76,20.36,20.26,20.06,19.16,19.16,19.06,
             2.18,1.58,2.98,1.11,1.91,0.21,1.68,2.11,0.51)
df<-data.frame(Plant,Delta.CCI)

#generate a list of comparisons for the data frame
cdf<-compare_means(Delta.CCI ~ Plant, 
                   data = df, 
                   ref.group = "ref")
#make a new list
my_list <- vector()
#for every other comparison put a 1 in the list
#this is so I can stagger the p-values later since there are so many in my actual data set
for(i in 1:nrow(cdf)+1){
  if(i %% 2 == 0){
    my_list[i] <- 1
  } else {
    my_list[i] <- 0
  }
}

#Make the box plot
p<-ggboxplot(df,
             title = "A",
             x="Plant",
             y="Delta.CCI")+
  #rotate the text so it is legible
  rotate_x_text(angle = 90)+
  #draw a dashed line at the mean of the reference group ("ref" here, LBA/GFP in my actual data)
  geom_hline(yintercept = mean(df$Delta.CCI[df$Plant=="ref"]), linetype = 2)+
  #add our p-values to the plot as significance labels
  stat_compare_means(label="p.signif",label.y=my_list+28,ref.group="ref")
p
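
A side note on the staggering loop above, offered as a sketch rather than a correction: in R the `:` operator binds more tightly than `+`, so `1:nrow(cdf)+1` is read as `(1:nrow(cdf)) + 1`. The snippet below illustrates this and shows a loop-free way to build alternating offsets. The names `n`, `idx_as_written`, `idx_explicit`, `stagger` and `y_pos` are made up for illustration, and `n` stands in for the number of comparisons.

```r
n <- 4  # hypothetical number of comparisons, standing in for nrow(cdf)

# `:` has higher precedence than `+`, so this is (1:n) + 1
idx_as_written <- 1:n + 1     # 2 3 4 5

# parentheses are needed to get a sequence from 1 to n + 1
idx_explicit <- 1:(n + 1)     # 1 2 3 4 5

# vectorised alternative to the loop: 0 for odd positions, 1 for even ones
stagger <- as.numeric(seq_len(n) %% 2 == 0)

# label heights alternating between 28 and 29
y_pos <- 28 + stagger
```

Whether the extra element produced by the `+1` is wanted depends on how many label positions the plotting function expects, which this sketch does not try to settle.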

Solution

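  • One option would be to use stat_pvalue_manual. To this end I use the lower-level functions from the rstatix package to create the data.frame of test statistics. To this data frame you can easily add the y positions without needing to create a separate vector. Additionally, with the argument detailed = TRUE, wilcox_test adds an estimate column holding the estimated location shift between the two groups (reference group minus comparison group, so a negative value means the comparison group lies above the reference). Note that this is not literally the difference in means. The estimate can be used to conditionally replace unwanted labels with an empty string: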

    library(ggplot2)
    library(ggpubr)
    library(dplyr, warn.conflicts = FALSE)
    library(rstatix)
    
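    # Wilcoxon test of every group against "ref"; detailed = TRUE adds
    # the `estimate` column (location shift, ref minus comparison group)
    cdf <- df |>
      wilcox_test(Delta.CCI ~ Plant, ref.group = "ref",
                  detailed = TRUE) |>
      add_significance("p") |>
      mutate(
        # stagger the labels: every second comparison sits one unit higher
        y.position = 28 + (row_number() %% 2 == 0),
        # keep the label only where the comparison group lies above "ref"
        p.signif = if_else(estimate < 0, p.signif, "")
      )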
    
    p <- ggboxplot(df,
      title = "A",
      x = "Plant",
      y = "Delta.CCI"
    ) +
      rotate_x_text(angle = 90) +
      geom_hline(yintercept = mean(df$Delta.CCI[df$Plant == "ref"]), linetype = 2) +
      stat_pvalue_manual(
        cdf,
        label = "p.signif",
        x = "group2"
      )
    p
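
If the criterion should be the group mean itself, as asked in the question, rather than the estimate from the Wilcoxon test, the set of groups to label could be derived from the means directly. The following is only a minimal sketch in base R under that assumption. `blank_below_ref` is a hypothetical helper, not part of ggpubr or rstatix, and the `"****"` labels are placeholders rather than real test results.

```r
# Same toy data as in the question
df <- data.frame(
  Plant = rep(c("ref", "hi", "lo"), each = 9),
  Delta.CCI = c(
    11.05, 11.45, 9.65, 10.65, 10.15, 8.95, 8.95, 12.45, 8.95,
    20.56, 20.66, 19.76, 20.36, 20.26, 20.06, 19.16, 19.16, 19.06,
    2.18, 1.58, 2.98, 1.11, 1.91, 0.21, 1.68, 2.11, 0.51
  )
)

# Mean of every group, as a named numeric vector
group_means <- tapply(df$Delta.CCI, df$Plant, mean)

# Names of the groups whose mean exceeds the mean of the reference group
above_ref <- names(group_means)[group_means > group_means[["ref"]]]

# Hypothetical helper: keep a label only if its group is listed in `keep`
blank_below_ref <- function(labels, groups, keep) {
  ifelse(groups %in% keep, labels, "")
}

# Placeholder labels, one per comparison against "ref"
shown <- blank_below_ref(c("****", "****"), c("hi", "lo"), above_ref)
```

Inside the pipeline above, the same idea would presumably read `mutate(p.signif = blank_below_ref(p.signif, group2, above_ref))`, assuming `group2` holds the comparison group as in the `x = "group2"` argument.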
    
