R: GLM: factor level not present in data set, but I still want coefficient


When I train a model and then predict on test data, the test data sometimes contains a factor level that was not present in the training data. The prediction then fails with an error, because that level was not available when the model was fitted.

Working example:

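# treat gear as a factor (levels 3, 4, 5)
mtcars2 <- mtcars
mtcars2$gear <- as.factor(mtcars2$gear)
# the first 10 rows contain only gear 3 and 4; gear 5 appears only in the test rows
mtcars_train <- mtcars2[1:10, ]
mtcars_test <- mtcars2[11:nrow(mtcars2), ]
model <- glm(formula = cyl ~ gear, data = mtcars_train, family = poisson(link = "log"))
predict(object = model, newdata = mtcars_test)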


Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
  factor gear has new levels 5
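
The mismatch behind this error can be checked before calling predict. The following is only a minimal sketch that rebuilds the objects from the example above; the names seen_levels and unseen_levels are made up for illustration.

```r
# Rebuild the objects from the example above
mtcars2 <- mtcars
mtcars2$gear <- as.factor(mtcars2$gear)
mtcars_train <- mtcars2[1:10, ]
mtcars_test <- mtcars2[11:nrow(mtcars2), ]
model <- glm(cyl ~ gear, data = mtcars_train, family = poisson(link = "log"))

# Levels the model was actually fitted with (stored on the glm object)
seen_levels <- model$xlevels$gear

# Levels that occur in the test rows but were never seen during fitting
unseen_levels <- setdiff(unique(as.character(mtcars_test$gear)), seen_levels)
print(unseen_levels)
```

For this split, the check should report level 5, the same level named in the error message.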

I want to get coefficients for all levels of the factor I specify in my GLM, and if some level is not in the data, I want the factor for that level to be set to 1 in my GLM object. How can I do this?


Solution

  • Warning: This is not a good way of dealing with unseen levels, and the resulting predictions will be off. I do not recommend it.

    Having said that, you could do the following: add the missing level to the levels stored in the glm model.

    model$xlevels$gear
    [1] "3" "4"
    

    As you can see, the missing level is 5.

    # adding level 5
    model$xlevels$gear[3] <- "5"
    
    exp(predict(object = model, newdata = mtcars_test))
              Merc 280C          Merc 450SE          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental   Chrysler Imperial 
                      5                   7                   7                   7                   7                   7                   7 
               Fiat 128         Honda Civic      Toyota Corolla       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
                      5                   5                   5                   7                   7                   7                   7 
       Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa      Ford Pantera L        Ferrari Dino       Maserati Bora 
                      7                   5                   7                   7                   7                   7                   7 
             Volvo 142E 
                      5
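
In the output above, the gear 5 cars (for example Porsche 914-2) receive 7, the same prediction as the gear 3 cars, which is the reference level of the fit. The same result can probably be reached without modifying the model object, by recoding unseen levels in the test data to the reference level before predicting. This is a sketch of that alternative, not the method used above, and the same warning applies: the predictions for unseen levels are not meaningful estimates. The names known_levels, test_fixed and pred are hypothetical.

```r
# Rebuild the objects from the question
mtcars2 <- mtcars
mtcars2$gear <- as.factor(mtcars2$gear)
mtcars_train <- mtcars2[1:10, ]
mtcars_test <- mtcars2[11:nrow(mtcars2), ]
model <- glm(cyl ~ gear, data = mtcars_train, family = poisson(link = "log"))

# Levels known to the model; the first one is the reference level
known_levels <- model$xlevels$gear

# Copy the test data and map every unseen level to the reference level
test_fixed <- mtcars_test
gear_chr <- as.character(test_fixed$gear)
gear_chr[!gear_chr %in% known_levels] <- known_levels[1]
test_fixed$gear <- factor(gear_chr, levels = known_levels)

# Predict on the response scale; the fitted model itself is left unchanged
pred <- exp(predict(model, newdata = test_fixed))
print(pred)
```

Recoding a copy of the test data keeps the assumption visible in the code: every gear 5 row is explicitly treated as if it were gear 3, which corresponds to a coefficient of 0, that is, a multiplicative factor of 1 on the response scale.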