Tags: r, visualization, p-value, ggplot2

How to add p-values (at a y position less than 1) to a ggplot with a logarithmic axis


I am having trouble adding p-values to a ggplot when the axis is logarithmic and the values to be plotted are all well below 1. It seems that no matter where I tell the function to put the p-value, it always puts it at or above 1, which often ruins my scale.

An MRE:

library(ggplot2)
library(ggpubr)

df <- data.frame("group" = rep(c("A", "B", "C", "D", "E"), each = 5), 
                 "value" = exp(seq(-10,-9, length.out = 25)))

stat_df <- ggpubr::compare_means(formula = value ~ group, data = df, method = "wilcox.test")[1:3,]

p <- ggplot(data = df, aes(x = group, y = value)) +
    geom_boxplot() +
    ggpubr::stat_pvalue_manual(data = stat_df, y.position = 1e-4, step.increase = 0) +
    scale_y_continuous(trans = "log10")

plot(p)

which produces:

[Figure: boxplot of values showing the p-value placed at 1, far above the data]

As you can see, even though I have told ggpubr to put the p-value at 1e-4, it put it at 1 (1e0) instead. For values above 1, you can just give it the log10 of the value you want to plot it at (e.g. y.position = 11 plots it at 1e11), but if you try to input a value of 0 or a negative value for y.position, it will not show up; specifically, you get the following:

Warning messages:
1: In self$trans$transform(x) : NaNs produced
2: Transformation introduced infinite values in continuous y-axis 
3: Removed 3 rows containing non-finite values (stat_bracket).
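
For reference, the first two warnings look like what base R's log10 produces for non-positive input, and the placement at 1 fits y.position being read as an exponent. A minimal base R sketch, assuming (not confirmed from the ggpubr source) that this is what happens to the supplied value:

```r
# Assumption: the supplied y.position ends up on the log10 scale one way or another.
# log10 of a negative number is NaN and log10(0) is -Inf, matching warnings 1 and 2.
neg_pos  <- suppressWarnings(log10(-4))  # NaN
zero_pos <- log10(0)                     # -Inf

# If 1e-4 is read as an exponent (as y.position = 11 landing at 1e11 suggests),
# it maps to a value barely above 1.
as_exponent <- 10^1e-4

print(c(neg_pos, zero_pos, as_exponent))
```

Under that assumption, no positive y.position can land below 1 on the transformed axis, which matches the behaviour described above.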

I'm open to using other packages to plot p-values; ggpubr::stat_pvalue_manual has simply been the most flexible and useful for my purposes so far. The only workaround I have found is a hacky one that uses scales::pseudo_log_trans and some trial and error, but it is far from ideal because it produces different axes than a regular log10 transformation.


Solution

  • I have two solutions for you:

    Solution 1

    Experiment with the vjust and bracket.nudge.y arguments of stat_pvalue_manual to find values that place the label where you want it. This solution still transforms the axis with scale_y_continuous.

    library(ggplot2)
    
    # vjust and bracket.nudge.y below were found by trial and error for this data
    ggplot(data = df, aes(x = group, y = value)) +
      geom_boxplot() +
      ggpubr::stat_pvalue_manual(data = stat_df, y.position = 1, 
                                 step.increase = 0, vjust = 0.1, 
                                 bracket.nudge.y = -4.9, tip.length = 0.001) +
      scale_y_continuous(trans = "log10")
    

    Solution 2

    This solution abandons scale_y_continuous as the means of log-transforming the axis and instead applies log10 to the values themselves. scale_y_continuous is then used only to format the y-axis labels in the desired format.

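    # Plot log10(value) directly, so y.position is given in log10 units
    ggplot(data = df, aes(x = group, y = log10(value))) +
      geom_boxplot() +
      ggpubr::stat_pvalue_manual(data = stat_df, y.position = log10(1e-04)) +
      # Convert tick labels back to the original scale (\(x) needs R >= 4.1)
      scale_y_continuous(labels = \(x) formatC(10^x, format = "e", digits = 1))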
    

    Created on 2023-01-14 with reprex v2.0.2
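
As a quick check on Solution 2, the bracket position and the axis labels are two sides of the same conversion: positions go in as log10 values, and the labels function turns tick positions back into the original units. A minimal base R sketch (the name label_fun is hypothetical and used only for illustration):

```r
# Hypothetical helper name; same body as the labels function used in Solution 2.
label_fun <- function(x) formatC(10^x, format = "e", digits = 1)

# Position of the bracket in log10 units, as passed to y.position.
y_pos <- log10(1e-04)

print(y_pos)
print(label_fun(c(-5, -4, -3)))
```

Because the data are already in log10 units here, negative positions are ordinary values on the axis rather than invalid input to a log transform.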