Search code examples
rlm

fast method to calculate p-values from .lm.fit()


I am running simulations and fitting linear models with .lm.fit(). Although extremely fast, this function does not provide the predictors' p-values. Is there a fast way to compute them (maybe from the values returned by .lm.fit())? I am aware of this method to calculate approximate p-values, but I need the exact ones.

Update:
Dirk Eddelbuettel provided the fastest way to fit the lm and Ben Bolker a method to compute the p-values, by combining both answers we get:

set.seed(101)
X <- cbind(1,matrix(1:10))
y <- rnorm(10)

mdl <- RcppArmadillo::fastLmPure(X, y)

pval <- 2*pt(abs(mdl$coefficients/mdl$stderr), mdl$df.residual, lower.tail=FALSE)

Solution

  • Dirk's answer will be faster, but in case it's convenient here's the implementation in pure R (extracting the bits you need from summary.lm, and assuming there are no issues with non-full-rank model matrices etc.)

    Example:

    set.seed(101)
    X <- cbind(1,matrix(1:10))
    y <- rnorm(10)
    m <- .lm.fit(X,y)
    

    p-value calcs:

    rss <- sum(m$residuals^2)
    rdf <- length(y) - ncol(X)
    resvar <- rss/rdf
    R <- chol2inv(m$qr)
    se <- sqrt(diag(R) * resvar)
    2*pt(abs(m$coef/se),rdf,lower.tail=FALSE)
    

    Compare with:

    coef(summary(lm(y~X-1)))[,"Pr(>|t|)"]