
Median statistical difference in ggplot


I have a ggplot boxplot like this one:

library(ggplot2)
data(iris)
ggplot(iris, aes(x = "", y = Sepal.Width)) +
    geom_boxplot()

As you can see, the median is 3. Say the true value is 3.8. I would like to know whether there is a statistically significant difference between the true value of 3.8 and the observed median of 3. Which statistical test should I use, and can I implement it in R? Also, is it possible to show the true value of 3.8 on the plot?

Thx!

PS: I'm using the iris dataset as an easily reproducible stand-in for my real data.


Solution

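  • You are looking for a one-sample Wilcoxon signed rank test, which tests whether the data are centered on a hypothesised value (mu) without assuming normality. It does assume the distribution is roughly symmetric around its center: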

    wilcox.test(iris$Sepal.Width, mu = 3.8)
    #> 
    #>  Wilcoxon signed rank test with continuity correction
    #> 
    #> data:  iris$Sepal.Width
    #> V = 113, p-value < 2.2e-16
    #> alternative hypothesis: true location is not equal to 3.8
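    If you also want an estimate and a confidence interval, wilcox.test() returns them with conf.int = TRUE. Note that this estimate is the pseudo-median (Hodges-Lehmann estimator), not the sample median. If you do not want to assume symmetry, a sign test is a common alternative that targets the median directly. Base R has no dedicated sign-test function, so the sketch below builds one from binom.test(); the helper name sign_test_median is hypothetical and not part of the original answer.

    ```r
    # One-sample location tests against a hypothesised value (sketch)
    x  <- iris$Sepal.Width
    mu <- 3.8  # hypothesised "true" value from the question

    # Wilcoxon signed rank test with an estimate and a 95% CI.
    # Sepal.Width has ties, so R warns that exact p-values/CIs cannot be
    # computed and uses a normal approximation; suppressWarnings() hides that.
    wt <- suppressWarnings(wilcox.test(x, mu = mu, conf.int = TRUE))
    wt$p.value   # p-value
    wt$estimate  # pseudo-median (Hodges-Lehmann estimate), not the sample median
    wt$conf.int  # confidence interval for the pseudo-median

    # Hypothetical helper: exact sign test for H0: median == mu.
    # Observations equal to mu carry no sign information and are dropped.
    sign_test_median <- function(x, mu) {
      x <- x[x != mu]
      # Under H0, each remaining value is above mu with probability 0.5
      binom.test(sum(x > mu), length(x), p = 0.5)
    }

    st <- sign_test_median(x, mu)
    st$p.value
    ```

    The sign test only uses whether each value lies above or below 3.8, so it is less powerful than the Wilcoxon test when the symmetry assumption does hold.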
    

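    You can add a horizontal line to the boxplot with geom_hline() and a text label with annotate(). Pass the constant values as plain arguments rather than inside aes(); otherwise they are repeated for every row of iris, and the line and the label are each drawn 150 times on top of each other.

    ggplot(iris, aes(x = "", y = Sepal.Width)) +
      geom_boxplot() +
      # Constants outside aes(): the line and the label are each drawn once
      geom_hline(yintercept = 3.8, linetype = 2) +
      # hjust = 0 left-aligns the text at x = 0.5 so it stays inside the panel
      annotate("text", label = "True median", x = 0.5, y = 3.9, hjust = 0)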
    

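    As a possible extension (not part of the original answer), you can run the test first and put its p-value into the annotation, so the plot and the test always use the same hypothesised value. This sketch assumes the value 3.8 from the question; format.pval() is base R, and the label wording is arbitrary.

    ```r
    library(ggplot2)

    mu <- 3.8  # hypothesised "true" median from the question

    # Ties in Sepal.Width trigger a warning and a normal approximation;
    # suppressWarnings() only hides that message.
    wt <- suppressWarnings(wilcox.test(iris$Sepal.Width, mu = mu))

    # Two-line label with the reference value and the formatted p-value
    lab <- sprintf("True median = %.1f\nWilcoxon p-value: %s",
                   mu, format.pval(wt$p.value, digits = 2))

    p <- ggplot(iris, aes(x = "", y = Sepal.Width)) +
      geom_boxplot() +
      geom_hline(yintercept = mu, linetype = 2) +  # reference line, drawn once
      annotate("text", x = 0.5, y = mu + 0.05, label = lab,
               hjust = 0, vjust = 0)               # label sits just above the line
    print(p)
    ```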