Search code examples
rstatisticsnormal-distributiongenetics

How to compute p-values from z-scores in R when the Z score is large (pvalue much below zero)?


In genetics very small p-values are common (for example 10^-400), and I am looking for a way to get very small p-values (two-tailed) when the z-score is large in R, for example:

z=40
pvalue = 2*pnorm(abs(z), lower.tail = F)

This gives me a zero instead of a very small value which is very significant.


Solution

  • The inability to handle p-values less than about 10^(-308) (.Machine$double.xmin) is not really R's fault, but is rather a generic limitation of any computational system that uses double precision (64-bit) floats to store numeric information.

    It's not hard to solve the problem by computing on the log scale, but you can't store the result as a numeric value in R; instead, you need to store (or print) the result as a mantissa plus exponent.

    pvalue.extreme <- function(z) {
       log.pvalue <- log(2) + pnorm(abs(z), lower.tail = FALSE, log.p = TRUE)
       log10.pvalue <- log.pvalue/log(10) ## from natural log to log10
       mantissa <- 10^(log10.pvalue %% 1)
       exponent <- log10.pvalue %/% 1
       ## or return(c(mantissa,exponent))
       return(sprintf("p value is %1.2f times 10^(%d)",mantissa,exponent))
    }
    

    Test with a not-too-extreme case:

    pvalue.extreme(5)
    ## [1] "p value is 5.73 times 10^(-7)"
    2*pnorm(5,lower.tail=FALSE)
    ## [1] 5.733031e-07
    

    More extreme:

    pvalue.extreme(40)
    ## [1] "p value is 7.31 times 10^(-350)"
    

    There are a variety of packages that handle extremely large/small numbers with extended precision in R (Brobdingnag, Rmpfr, ...) For example,

    2*Rmpfr::pnorm(mpfr(40, precBits=100), lower.tail=FALSE, log.p = FALSE)
    ## 1 'mpfr' number of precision  100   bits 
    ## [1] 7.3117870818300594074979715966414e-350
    

    However, you will pay a big cost in computational efficiency and convenience for working with an arbitrary-precision system.