
Logistic Regression in Caret - No Intercept?


I'm performing logistic regression in R using the caret package and trying to force a zero intercept, so that the probability at x = 0 is 0.5. For some other regression methods it seems you can turn the intercept off through tuneGrid, but that option does nothing for logistic regression (method = "glm"). Any ideas?

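library(caret)
train.control <- trainControl(method = "cv", number = 3)  # assumed; the original control settings were not shown
model <- train(y ~ 0 + x, data = data, method = "glm", family = binomial(link = "probit"),
               trControl = train.control)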

And yes, I "know" that the probability at x = 0 should be 0.5, which is why I'm trying to force it.
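For reference, a minimal sketch of why a zero intercept gives 0.5, and of the fact that base glm() (outside caret) honors "0 +" directly. Under a probit link P(y = 1 | x) = pnorm(b0 + b1 * x), so at x = 0 the probability is pnorm(b0), which is 0.5 only when b0 = 0; plogis() plays the same role for the logit link. The simulated data below is purely illustrative and not from the question.

```r
# At x = 0 the linear predictor reduces to the intercept b0;
# both inverse links map 0 to exactly 0.5.
print(pnorm(0))   # probit inverse link
print(plogis(0))  # logit inverse link

# Illustrative simulated data (assumption, not the asker's data)
set.seed(1)
d <- data.frame(x = rnorm(100))
d$y <- factor(runif(100) < pnorm(0.8 * d$x))

# Base glm drops the intercept when the formula says "0 +"
fit <- glm(y ~ 0 + x, family = binomial(link = "probit"), data = d)
print(coef(fit))  # only an "x" coefficient, no "(Intercept)"

# With no intercept, the fitted probability at x = 0 is pnorm(0)
p0 <- predict(fit, newdata = data.frame(x = 0), type = "response")
print(p0)
```

Since plain glm() behaves as expected here, the intercept must be coming back somewhere inside train().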


Solution

  • There's a vignette on how to set up a custom model for caret. Building one from caret's built-in glm definition, as below, also shows why the intercept persists:

    library(caret)
    # start from caret's built-in definition of the "glm" model
    glm_wo_intercept <- getModelInfo("glm", regex = FALSE)[[1]]

    If you look at its fit function, there's a line that does:

    glm_wo_intercept$fit
    # printed output, abridged:
    ...
    modelArgs <- c(list(formula = as.formula(".outcome ~ ."), data = dat), theDots)
    ...
    
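A small sketch for checking this programmatically rather than by reading the printed function. It assumes the installed caret version still builds the literal formula ".outcome ~ ."; if a future version changes that, the check simply returns FALSE. The variable names glm_info and builds_full_formula are illustrative only.

```r
library(caret)

# caret's built-in model definition for "glm" (a list of metadata and functions)
glm_info <- getModelInfo("glm", regex = FALSE)[[1]]

# glm exposes no real tuning parameter here (in current caret versions a
# single placeholder row), so tuneGrid has no intercept switch to offer
print(glm_info$parameters)

# Deparse the fit function to text and look for the formula it builds itself
fit_src <- deparse(glm_info$fit)
builds_full_formula <- any(grepl('".outcome ~ ."', fit_src, fixed = TRUE))
print(builds_full_formula)
```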

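    The formula ".outcome ~ ." always includes an intercept, and because fit() builds its own formula, the "0 +" in the formula you pass to train() never reaches glm(). You can change this line and run caret on the modified model: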

    glm_wo_intercept$fit <- function(x, y, wts, param, lev, last, classProbs, ...) {
      dat <- if (is.data.frame(x)) x else as.data.frame(x)
      dat$.outcome <- y
      if (length(levels(y)) > 2) stop("glm models can only use 2-class outcomes")

      theDots <- list(...)
      # default family if none was passed through train()
      if (!any(names(theDots) == "family")) {
        theDots$family <- if (is.factor(y)) binomial() else gaussian()
      }
      # pass in any case weights
      if (!is.null(wts)) theDots$weights <- wts

      # the only change from caret's glm: "0 +" drops the intercept
      modelArgs <- c(list(formula = as.formula(".outcome ~ 0 + ."), data = dat), theDots)

      out <- do.call("glm", modelArgs)
      out$call <- NULL
      out
    }
    
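To see what the one-line change does, compare the design matrices the two formulas produce. This is a minimal base-R sketch on a tiny made-up data frame (hypothetical values, named dat only to mirror the fit function): ".outcome ~ ." adds an "(Intercept)" column of ones, while ".outcome ~ 0 + ." does not.

```r
# Tiny illustrative data frame shaped like the one fit() builds internally
set.seed(42)
dat <- data.frame(x = rnorm(5))
dat$.outcome <- factor(c(TRUE, FALSE, TRUE, TRUE, FALSE))

# Design matrix for caret's default formula versus the modified one
with_int    <- model.matrix(as.formula(".outcome ~ ."), data = dat)
without_int <- model.matrix(as.formula(".outcome ~ 0 + ."), data = dat)

print(colnames(with_int))     # "(Intercept)" "x"
print(colnames(without_int))  # "x"
```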

    We fit the model:

    data <- data.frame(y = factor(runif(100) > 0.5), x = rnorm(100))
    model <- train(y ~ 0 + x, data = data, method = glm_wo_intercept,
                   family = binomial(), trControl = trainControl(method = "cv", number = 3))
    
    predict(model,data.frame(x=0),type="prob")
      FALSE TRUE
    1   0.5  0.5
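An alternative sketch, not part of the original answer: instead of retyping fit(), patch the formula string in caret's own function, then use the probit link from the question and cross-check the coefficients against plain glm(). This assumes the installed caret version still contains the literal ".outcome ~ ."; the stopifnot() guards that assumption. Re-parsing deparsed code is a convenience trick, so the hand-written fit() above remains the more transparent option.

```r
library(caret)

glm_wo_intercept <- getModelInfo("glm", regex = FALSE)[[1]]

# Rewrite the formula literal inside the deparsed fit function
fit_src <- deparse(glm_wo_intercept$fit)
fit_src <- sub('".outcome ~ ."', '".outcome ~ 0 + ."', fit_src, fixed = TRUE)
stopifnot(any(grepl('".outcome ~ 0 + ."', fit_src, fixed = TRUE)))
glm_wo_intercept$fit <- eval(parse(text = fit_src))

# Simulated data, as in the answer (seeded for reproducibility)
set.seed(1)
dat <- data.frame(y = factor(runif(100) > 0.5), x = rnorm(100))

# Probit link as in the question; fit() keeps a family passed via train()
model <- train(y ~ 0 + x, data = dat, method = glm_wo_intercept,
               family = binomial(link = "probit"),
               trControl = trainControl(method = "cv", number = 3))

p0 <- predict(model, data.frame(x = 0), type = "prob")
print(p0)

# The final caret model should match a plain no-intercept glm on the same data
base_fit <- glm(y ~ 0 + x, family = binomial(link = "probit"), data = dat)
print(coef(model$finalModel))
print(coef(base_fit))
```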