I am running simulations and fitting linear models with .lm.fit()
. Although extremely fast, this function does not provide the predictors' p-values. Is there a fast way to compute them (maybe from the values returned by .lm.fit()
)? I am aware of this method to calculate approximate p-values, but I need the exact ones.
Update:
Dirk Eddelbuettel provided the fastest way to fit the lm and Ben Bolker a method to compute the p-values, by combining both answers we get:
set.seed(101)
X <- cbind(1,matrix(1:10))
y <- rnorm(10)
mdl <- RcppArmadillo::fastLmPure(X, y)
pval <- 2*pt(abs(mdl$coefficients/mdl$stderr), mdl$df.residual, lower.tail=FALSE)
Dirk's answer will be faster, but in case it's convenient here's the implementation in pure R (extracting the bits you need from summary.lm
, and assuming there are no issues with non-full-rank model matrices etc.)
Example:
set.seed(101)
X <- cbind(1,matrix(1:10))
y <- rnorm(10)
m <- .lm.fit(X,y)
p-value calcs:
rss <- sum(m$residuals^2)
rdf <- length(y) - ncol(X)
resvar <- rss/rdf
R <- chol2inv(m$qr)
se <- sqrt(diag(R) * resvar)
2*pt(abs(m$coef/se),rdf,lower.tail=FALSE)
Compare with:
coef(summary(lm(y~X-1)))[,"Pr(>|t|)"]