Selecting the statistically significant variables in an R glm model


I have an outcome variable, say Y, and a list of 100 candidate predictors that could affect Y (say X1...X100).

After fitting my glm and viewing the model summary, I can see which variables are statistically significant. I would like to select those variables, fit another model with only them, and compare performance. Is there a way to parse the model summary and select only the significant ones?


Solution

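  • You can access the p-values of a glm fit through the "summary" function. The last column of the coefficients matrix holds the p-value of each term in the model. It is named "Pr(>|t|)" when the dispersion is estimated (e.g. the default gaussian family) and "Pr(>|z|)" when it is fixed (e.g. binomial or poisson), so indexing the column by position is safer than indexing it by name.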

    Here's an example:

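    # x is a 10 x 3 matrix of random predictors
    x <- matrix(rnorm(3*10), ncol=3)
    y <- rnorm(10)
    res <- glm(y ~ x)
    # column 4 of the coefficient table holds the p-values; -1 drops the intercept row
    # the result is a named logical vector, TRUE where p < 0.05
    coef(summary(res))[-1, 4] < 0.05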
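
To go from the logical vector to a second model, one option is to take the names of the significant coefficients and rebuild the formula from them. The following is only a sketch under stated assumptions: the data frame `dat`, its columns, the seed, and the 0.05 cutoff are all made up for illustration, and the predictors are numeric so that coefficient names match column names (factor predictors produce names such as `X1b` and would need mapping back to their variable).

```r
set.seed(42)  # hypothetical seed, only to make the sketch reproducible

# Simulated data (an assumption): Y depends on X1 and X2 only
n <- 200
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n), X4 = rnorm(n))
dat$Y <- 2 * dat$X1 - 1.5 * dat$X2 + rnorm(n)

# Full model with every predictor
full <- glm(Y ~ ., data = dat)

# p-values are in column 4 of the coefficient table; drop the intercept row
pvals <- coef(summary(full))[-1, 4]
sig_vars <- names(pvals)[pvals < 0.05]

# Refit with only the significant predictors
# (reformulate() fails if sig_vars is empty, so check that first on real data)
reduced <- glm(reformulate(sig_vars, response = "Y"), data = dat)

# Compare the two fits, here by AIC
AIC(full, reduced)
```

`reformulate()` builds the formula `Y ~ X1 + ...` from a character vector, which avoids pasting strings by hand. AIC is just one way to compare the fits; `anova(reduced, full, test = "F")` is another, since the reduced model is nested in the full one.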