Tags: r, linear-regression, data-analysis

In R, how to extract just the significant variables after running a Multiple Regression with a large number of variables


After running a multiple regression in R, the regression summary marks the significant variables with stars. The dataset I am working on has nearly 2000 variables, and R flags more than 50 of them as significant. Is there some way to get a list of just the significant variables from the regression summary?


Solution

  • This is an example of why you should not be doing what you ask us to do:

    randf <- as.data.frame(matrix(rnorm(800*400), 800, 400))
    names(randf)[1] <- "Y"
    big.mod <- lm(Y ~ ., data=randf)
    sum( summary(big.mod)$coefficients[ ,4] < 0.05 )
    #[1] 22
    

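    So we get 22 coefficients with p < 0.05 (some of them "highly significant") just from regressing one random variable on 399 other random variables. That is roughly the 5% false-positive rate you would expect by chance at the 0.05 level, so the stars alone say little about which predictors actually matter.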
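
If you still want the list itself, here is a minimal sketch of one way to pull it out of the summary. It rebuilds the same kind of random data as above so that it runs on its own. The seed, the 0.05 cutoff, and the names `pvals`, `sig.vars`, and `sig.holm` are assumptions made for illustration. The Holm adjustment at the end is an addition, not part of the example above, shown only as one common way to account for testing many coefficients at once.

```r
# Same setup as the example above: Y plus 399 pure-noise predictors.
set.seed(1)  # assumption: any seed works, this only makes the run repeatable
randf <- as.data.frame(matrix(rnorm(800 * 400), 800, 400))
names(randf)[1] <- "Y"
big.mod <- lm(Y ~ ., data = randf)

# The coefficient table is a matrix; the p-values are in the "Pr(>|t|)" column.
coefs <- summary(big.mod)$coefficients
pvals <- coefs[, "Pr(>|t|)"]

# Drop the intercept so only predictors remain.
pvals <- pvals[names(pvals) != "(Intercept)"]

# Names of the predictors whose raw p-value is below the cutoff.
sig.vars <- names(pvals)[pvals < 0.05]

# Same selection after a Holm multiple-testing adjustment (stats::p.adjust).
sig.holm <- names(pvals)[p.adjust(pvals, method = "holm") < 0.05]

print(sig.vars)
print(length(sig.holm))
```

On data like this, where every predictor is noise, `sig.vars` will typically still contain a couple of dozen names, while the adjusted list will usually be empty or close to it. Comparing the two is a quick check on how much of the "significance" comes from the sheer number of tests.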