r, logistic-regression

Using all variables of a data.frame in logistic regression


I am very new to machine learning in R and am simply trying to use all variables from X_train to predict y_train when training a model. I am running into a problem because the two are not in the same data.frame. My code is as follows:

logitmod <- glm(log_y_train ~ log_X_train, family = "binomial")

log_y_train is a factor of length 200386, and log_X_train is a data.frame with 174 variables and 200386 rows. With that many columns, I cannot simply type out all the column names.

However I get the following error:

invalid type (list) for variable 'log_X_train'

I thought this was a data.frame, but I nonetheless tried unlist(), which then told me that the lengths differed. Can anyone help me fix this so that I can use both objects in the logit model?

Thanks


Solution

  • Solution 1

    Bind log_y_train and log_X_train into a data.frame so that you can use " ~ ." in a formula to represent all variables in log_X_train.

    glm(log_y_train ~ ., family = binomial(), data = cbind(log_y_train, log_X_train))
    

  • Solution 2

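    Use reformulate() to create a formula with all variables in log_X_train as predictors and log_y_train as the response. This approach does not require binding log_y_train and log_X_train together, because glm() looks up log_y_train in the formula's environment when it is not found in data.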

    glm(reformulate(names(log_X_train), "log_y_train"), family = binomial(), data = log_X_train)
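
As a rough check, the two solutions can be compared on toy data. The sketch below uses small simulated stand-ins for log_X_train and log_y_train (the column names x1 to x3, the sample size, and the simulated relationship are assumptions made for illustration, not the asker's data). It also shows why the original call fails: a data.frame is a list of columns, so it cannot serve as a single term in a formula.

```r
# Toy stand-ins for the objects in the question (hypothetical data)
set.seed(42)
n <- 200
log_X_train <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
log_y_train <- factor(rbinom(n, 1, plogis(0.5 * log_X_train$x1 - log_X_train$x2)))

# A data.frame is a list of columns, hence "invalid type (list)" in the original call
df_is_list <- is.list(log_X_train)

# Solution 1: bind response and predictors, then let "." stand for all other columns
train <- cbind(log_y_train, log_X_train)
fit1 <- glm(log_y_train ~ ., family = binomial(), data = train)

# Solution 2: build the formula from the predictor column names
f <- reformulate(names(log_X_train), response = "log_y_train")
fit2 <- glm(f, family = binomial(), data = log_X_train)

# Compare the estimated coefficients of the two fits
same_coefs <- all.equal(coef(fit1), coef(fit2))
```

If same_coefs is TRUE, both approaches fit the same model. With real data, note that reformulate() parses each name as R code, so column names that are not syntactically valid (for example, names containing spaces) may need to be wrapped in backticks first.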