
Function for Logistic Regression Training Set


I am trying to create a function to test a logistic regression model developed on a training set.

For example, I split the data into training and test sets by fold:

train <- filter(y, folds != i)
test <- filter(y, folds == i)

I want to be able to reuse the formula for different data sets. For example, if I take y to be a response variable such as "low" in the birthwt data set and x to be the explanatory variables, e.g. "age" and "race", how would I pass these arguments into the glm.train formula without having to retype the function for each data set?

glm.train <- glm(y ~ x, family = binomial, data = train)

Solution

  • You can use reformulate to create a formula based on strings:

    x <- c("age", "race")
    y <- "low"
    
    form <- reformulate(x, response = y)
    # low ~ age + race
    

    Use this formula for glm:

    glm.train <- glm(form, family = binomial, data = train)
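
To reuse this across data sets, the reformulate step can be wrapped in a function that takes the response and predictor names as strings. The following is only a minimal sketch: the helper name fit_glm_fold, its arguments, and the 5-fold assignment are assumptions made for illustration, and base R subsetting stands in for the filter() calls in the question so that the example needs nothing beyond the MASS package (for birthwt).

```r
# Hypothetical helper (name and arguments are illustrative, not from the question):
# fit a logistic regression on every fold except fold i, then score fold i.
fit_glm_fold <- function(data, response, predictors, folds, i) {
  # Build the formula from strings, e.g. low ~ age + race
  form <- reformulate(predictors, response = response)

  # Base R equivalent of filter(data, folds != i) and filter(data, folds == i)
  train <- data[folds != i, , drop = FALSE]
  test  <- data[folds == i, , drop = FALSE]

  # Fit on the training rows only
  fit <- glm(form, family = binomial, data = train)

  # Predicted probabilities for the held-out rows
  prob <- predict(fit, newdata = test, type = "response")

  list(formula = form, model = fit, test = test, prob = prob)
}

# Example with birthwt; the fold assignment here is an assumed random 5-fold split
data(birthwt, package = "MASS")
set.seed(1)
folds <- sample(rep(1:5, length.out = nrow(birthwt)))

res <- fit_glm_fold(birthwt, "low", c("age", "race"), folds, i = 1)
head(res$prob)
```

The same call should work for another data set by changing only the data, response, and predictors arguments, provided the response is binary.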