
Use glm with data.table and a parametric definition of the predictors and the response


I want to compute variance inflation factors (VIF) by running consecutive regressions on a dataset, each time using one variable as the response and the remaining variables as predictors.

To that end, I will put my code in a for loop that steps through the column indices, using the indexed column as the response and the remaining columns as predictors.

I am using the data.table package and the mtcars dataset that ships with R to create a reproducible example:

library(data.table)

data(mtcars)
setDT(mtcars)
# Let i (the index of the response column) be 1 for demonstration purposes
i <- 1
variables <- names(mtcars)
response <- names(mtcars)[i]
predictors <- setdiff(variables, response)
# This is the call that fails
model <- glm(mtcars[, get(response)] ~ mtcars[, predictors, with = FALSE], family = "gaussian")

However, this results in an error message:

Error in model.frame.default(formula = mtcars[, get(response)] ~ mtcars[, : invalid type (list) for variable 'mtcars[, predictors, with = FALSE]'

Could you explain the error and help me correct the code?

Your advice will be appreciated.

=============================================================================

Edit:

When I ran the suggested code, I got an error message:

> library(car)
> library(data.table)
> 
> data(mtcars)
> setDT(mtcars)
> model <- glm(formula = mpg ~ .,data=mtcars ,  family = "gaussian")
> vif(model)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘vif’ for signature ‘"glm"’

Update:

The code ran without problems when I specified the package explicitly, i.e.:

car::vif(model)
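
The "unable to find an inherited method" wording comes from S4 method dispatch, which suggests that a bare vif() call was resolving to a generic from some other attached package rather than to the one in car. Which package that was cannot be told from the post, so the following is only a sketch of how one might check, assuming car is installed:

```r
library(car)

model <- glm(mpg ~ ., data = mtcars, family = gaussian())

# Every attached package or environment that defines an object named `vif`,
# in search-path order; a bare vif() call resolves to the first entry.
vif_locations <- find("vif")
print(vif_locations)

# Calling through the namespace bypasses whatever comes first on the search path.
vif_from_car <- car::vif(model)
print(vif_from_car)
```

If find("vif") lists more than one entry, the namespace-qualified call is the safer choice.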

Edit 2

I had to amend Fredrik's code as follows to get the coefficients of all the variables:

rhs <- paste(predictors,  collapse ="+")
full_formula <- paste(response, "~", rhs)
full_formula <- as.formula(full_formula)
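
The original call failed because the formula interface expects every term to be a vector, while mtcars[, predictors, with = FALSE] is a data.table, i.e. a list. Building the formula from the column names avoids that. As a minimal sketch of the loop described in the question, the code below uses base R's reformulate() in place of paste() plus as.formula(), and works on a copy named dt (a name chosen here for illustration). It relies on the fact that, for a Gaussian model with an intercept, 1 / (1 - R^2) equals the null deviance divided by the residual deviance:

```r
library(data.table)

dt <- as.data.table(mtcars)   # a copy; the built-in mtcars is left untouched
variables <- names(dt)

# For each column, regress it on all the other columns and compute
# VIF = 1 / (1 - R^2) = null deviance / residual deviance.
vif_values <- sapply(variables, function(response) {
  predictors <- setdiff(variables, response)
  f <- reformulate(predictors, response = response)
  fit <- glm(f, data = dt, family = gaussian())
  fit$null.deviance / fit$deviance
})

print(round(vif_values, 2))
```

Note that this treats every column, mpg included, as a response in turn, as the question describes. To get the VIFs for a model of mpg, drop mpg from variables first.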

Solution

  • Another solution is based on the use of glm.fit:

    model <- glm.fit(x=mtcars[, ..predictors], y=mtcars[[response]], family = gaussian())
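
One caveat with glm.fit: it takes the design matrix as given and does not add an intercept column, so the call above fits a model through the origin. The sketch below, again using a copy named dt for illustration, binds a column of ones onto the predictors and checks that the coefficients match the formula interface:

```r
library(data.table)

dt <- as.data.table(mtcars)
response <- "mpg"
predictors <- setdiff(names(dt), response)

# glm.fit() does not add an intercept itself, so a column of ones
# is bound onto the predictor matrix explicitly.
x <- cbind("(Intercept)" = 1, as.matrix(dt[, ..predictors]))
y <- dt[[response]]

fit_matrix <- glm.fit(x = x, y = y, family = gaussian())

# The same model through the formula interface, for comparison.
fit_formula <- glm(reformulate(predictors, response = response),
                   data = dt, family = gaussian())

same_coefficients <- all.equal(unname(fit_matrix$coefficients),
                               unname(coef(fit_formula)))
print(same_coefficients)
```

glm.fit returns a plain list rather than an object of class "glm", so functions such as summary() or car::vif() that expect a fitted "glm" object will not work on it directly.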