Search code examples
rplotboxplotp-valueemmeans

R: Add p-values on boxplot/GGPLOT (p-values of emmeans contrasts from "emmeans" function)


I fitted a linear mixed model (with lmer function). "Proportion" as my response variable and "Stage" as a fixed factor (with 5 levels) and Subject as a random factor.

enter image description here

LMM.fit<-lmer(Proportion~Stage+(1|Subject),data=tab)

       Chisq Df Pr(>Chisq)    
Stage 290.07  4  < 2.2e-16 ***

Post hoc pairwise comparisons were carried with the emmeans function to determine where the differences occurred across the Stages:

emmeans(LMM.fit, pairwise ~ Stage)$contrasts

 contrast        estimate     SE     df t.ratio p.value
 Stage1 - Stage2    0.178 0.0505 105087   3.528  0.0038
 Stage1 - Stage3    0.601 0.0363 565787  16.529  <.0001
 Stage1 - Stage4    0.438 0.0496 224258   8.833  <.0001
 Stage1 - Stage5    0.272 0.0497 295762   5.472  <.0001
 Stage2 - Stage3    0.422 0.0514 103631   8.216  <.0001
 Stage2 - Stage4    0.260 0.0615  95267   4.224  0.0002
 Stage2 - Stage5    0.094 0.0620 107696   1.517  0.5516
 Stage3 - Stage4   -0.163 0.0496 229001  -3.279  0.0092
 Stage3 - Stage5   -0.328 0.0491 314661  -6.683  <.0001
 Stage4 - Stage5   -0.166 0.0599 190576  -2.769  0.0446

Degrees-of-freedom method: kenward-roger 
P value adjustment: tukey method for comparing a family of 5 estimates

I basically want to add the p-values shown in the emmeans results ON the boxplot shown above (between all the groups two by two in the same figure). I know there is the function stat_pvalue_manual() but I stuggled to know how to use it with emmeans contrasts output


Solution

  • This is what I managed to do (not the most optimized code but it works...)

    source("http://news.mrdwab.com/install_github.R")
    install_github("mrdwab/overflow-mrdwab") 
    install_github("mrdwab/SOfun")
    library(SOfun)
    
    LMM.fit<-lmer(Proportion~Stage+(1|Subject),data=tab)
    
    p.val.test<-pwpm(emmeans(LMM.fit,  ~ Stage),means = FALSE, flip = TRUE,reverse = TRUE)
    p.val.test<-sub("[<>]", "", p.val.test)
    p.matx<-matrix(as.numeric((p.val.test)),nrow = length(p.val.test[,1]),ncol = length(p.val.test[,1])) #if your factor has 5 levels ncol and nrow=5
    rownames(p.matx) <- colnames(p.matx) <-colnames(p.val.test)
    p.matx[upper.tri(p.matx, diag=FALSE)] <- NA
    stat.test<-subset(melt(p.matx),!is.na(value))
    names(stat.test)<-c("group1","group2","p.adj")
    stat.test[stat.test$p.adj<=0.001,"p.adj.signif"]<-"***"
    stat.test[stat.test$p.adj>0.001 & stat.test$p.adj<=0.01,"p.adj.signif"]<-"**"
    stat.test[stat.test$p.adj>0.01 & stat.test$p.adj<=0.05,"p.adj.signif"]<-"*"
    stat.test[ stat.test$p.adj>0.05,"p.adj.signif"]<-"ns"
    stat.test<-mc_tribble(stat.test) # copy & paste the result of this line before the line of ggboxplot!!
    
    stat.test <- tribble(
            ~group1, ~group2, ~p.adj, ~p.adj.signif,
            "Stage2","Stage1",0.0038,"**",
            "Stage3","Stage1",1e-04,"***",
            "Stage4","Stage1",1e-04,"***",
            "Stage5","Stage1",1e-04,"***",
            "Stage3","Stage2",1e-04,"***",
            "Stage4","Stage2",2e-04,"***",
            "Stage5","Stage2",0.5516,"ns",
            "Stage4","Stage3",0.0092,"**",
            "Stage5","Stage3",1e-04,"***",
            "Stage5","Stage4",0.0446,"*")
    
    bxp<-ggboxplot(tab, x = "Stage", y = "Proportion",color = "Stage", palette = "jco",add = "jitter",bxp.errorbar=T) 
    bxp + stat_pvalue_manual(stat.test,
                             y.position = max(tab$Proportion)+sd(tab$Proportion),
                             color = "midnightblue")
    

    enter image description here

    PS: this website helped ("GGPUBR: HOW TO ADD P-VALUES GENERATED ELSEWHERE TO A GGPLOT").