r, boxplot, t-test

Wrong results in calculating boxplot significance levels in R


I am creating a basic boxplot with significance level bars (as shown in "How to draw the boxplot with significant level?").

The data I use is as follows:

title1 title2 value
1 A 8.88
2 A 5.84
3 A 13.28
4 A 16.89
1 B 21.39
2 B 20.77
3 B 28.03
4 B 19.78
1 C 28.89
2 C 35.41
3 C 37.47
4 C 50.11
1 D 50.84
2 D 53.21
3 D 46.47
4 D 45.03

With the following code, creating the boxplot works fine. For the significance bars, I want to use paired t-tests such as title2=A vs. title2=B, where the two rows with title1=1 form a pair, and so on.

In R, I entered the following command, but it yields different p-values than I expect. For instance, the p-value for A vs. D should be 0.003, but R yields 2.8e-05. What is the correct syntax for a paired t-test?


library(ggplot2)
library(ggsignif)

ggplot(bxp, aes(y=value,x=title2)) +
xlab("Behandlung") + 
scale_x_discrete(labels=c("Kontrolle","Stretch","Hyperoxie","Stretch & Hyperoxie")) + 
ylab("Zelluläre Seneszenz (%)") + theme_classic() + 
geom_boxplot(coef = Inf) + 
geom_signif(comparisons=list(c("A","B"),c("A","C"),c("A","D")), test=t.test, map_signif_level=FALSE, step_increase=0.08)

Thanks!


Solution

  • ggsignif is computing the unpaired t-test (t.test is unpaired unless told otherwise), and I think you want the paired test. Luckily, geom_signif has a test.args argument, which lets you pass paired = TRUE through to the test:

    ggplot(bxp, aes(y=value,x=title2)) +
      xlab("Behandlung") + 
      scale_x_discrete(labels=c("Kontrolle","Stretch","Hyperoxie","Stretch & Hyperoxie")) + 
      ylab("Zelluläre Seneszenz (%)") + theme_classic() + 
      geom_boxplot(coef = Inf) + 
      geom_signif(comparisons=list(c("A","B"),c("A","C"),c("A","D")), test=t.test,
                  test.args = list(paired = TRUE),  # TRUE rather than T, which can be reassigned
                  map_signif_level=FALSE, step_increase=0.08)
    

    Data:

    bxp <- structure(list(
      title1 = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
      title2 = c("A", "A", "A", "A", "B", "B", "B", "B",
                 "C", "C", "C", "C", "D", "D", "D", "D"),
      value = c(8.88, 5.84, 13.28, 16.89, 21.39, 20.77, 28.03, 19.78,
                28.89, 35.41, 37.47, 50.11, 50.84, 53.21, 46.47, 45.03)),
      row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))
    

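As a quick check outside the plot, the same comparisons can be run directly with t.test. This is only a minimal sketch: it assumes the data from the question, rebuilt here as a plain data.frame, and the names a, d, p_unpaired, p_paired and paired_p are made up for illustration. It sorts the rows first because a paired t.test matches values by position, so each title2 group needs to be in the same title1 order for the pairs to line up.

```r
# Same values as in the question, rebuilt as a plain data.frame
bxp <- data.frame(
  title1 = rep(1:4, times = 4),
  title2 = rep(c("A", "B", "C", "D"), each = 4),
  value  = c(8.88, 5.84, 13.28, 16.89,
             21.39, 20.77, 28.03, 19.78,
             28.89, 35.41, 37.47, 50.11,
             50.84, 53.21, 46.47, 45.03)
)

# Sort so that position i in every title2 group refers to the same title1
bxp <- bxp[order(bxp$title2, bxp$title1), ]

a <- bxp$value[bxp$title2 == "A"]
d <- bxp$value[bxp$title2 == "D"]

# Default call: unpaired (Welch) two-sample test
p_unpaired <- t.test(a, d)$p.value
# Paired test: a[i] is compared with d[i]
p_paired <- t.test(a, d, paired = TRUE)$p.value

# Paired p-values for the three comparisons used in the plot (A vs. B, C, D)
paired_p <- sapply(c("B", "C", "D"), function(g) {
  t.test(a, bxp$value[bxp$title2 == g], paired = TRUE)$p.value
})

print(c(unpaired = p_unpaired, paired = p_paired))
print(paired_p)
```

For A vs. D, the unpaired and paired values should land close to the 2.8e-05 and 0.003 quoted in the question, which is an easy way to confirm which test the plot annotation is actually showing.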