
How to use the Predict Function in R after manually altering a GLM's coefficients


I am fitting a GLM with a number of variables. After fitting it, I use the model to predict new values.

I have noticed that after manually changing the coefficient for one level of a categorical variable, I still get the same predicted values, even though I know some of my rows have that level. Some code might help explain my process:

##data frame
df <-data.frame(Account =c("A","B","C","D","E","F","G","H"), 
       Exposure = c(1,50,67,85,250,25,22,89),
       JudicialOrientation=c("Neutral","Neutral","Plaintiff","Defense","Plaintiff","Neutral","Plaintiff","Defense"),
       Freq= c(.008,.5,.05,.34,.7,0,.04,.12),
       Losses = c(100000,100,2500,100000,25000,0,7500,5200),
       LossPerUnit = c(100000,100,2500,100000,25000,0,7500,5200)/c(1,50,67,85,250,25,22,89))


##Variables for modeling
ModelingVars <- as.formula(df$LossPerUnit~df$JudicialOrientation+df$Freq)

##Tweedie GLM (tweedie() comes from the statmod package)
library(statmod)
Model <- glm(ModelingVars, family=tweedie(var.power=1.5, link.power = 0),
             weights = Exposure, data = df)
summary(Model)

##Predict Losses with Model coefficients
df$PredictedLossPerUnit <- predict(Model,df, type="response")


##Manually edit a coefficient for one of my categorical variable's levels
Model$coefficients["df$JudicialOrientationNeutral"] <-log(50)

##Predict Losses again to compare
df$PredictedLossPerUnit2 <- predict(Model, df, type ="response")


sum(df$PredictedLossPerUnit)
sum(df$PredictedLossPerUnit2)
View(head(df))
summary(Model)

This code works fine: the two PredictedLossPerUnit columns differ for every row whose JudicialOrientation is "Neutral". But when I do something similar on my main data set, which has more variables of the same kinds (some continuous, some categorical with multiple levels), predict keeps returning the same values even after I change a coefficient.

Is there anything that would cause predict to keep returning the original results even after I have manually changed a coefficient in my GLM?

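EDIT: I found the answer. On my other data set I was calling: df$PredictedLossPerUnit <- predict(Model, data=df, type="response")

data isn't actually an argument of predict for a glm; it should have been newdata. Because the misnamed argument is silently absorbed by predict's dots (...) argument, predict behaves as if newdata were missing and returns the fitted values stored in the model object, and those are not recomputed when you edit the coefficients. A silly mistake but a good lesson. Thanks to all who helped.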
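To see why the misnamed argument hides the edit, here is a minimal sketch on hypothetical toy data. It assumes a base-R Poisson GLM with a log link in place of the statmod tweedie model (so it runs without extra packages), and the names d, fit, g, and x are made up for illustration. It compares predict(..., data = ) with predict(..., newdata = ) after overwriting one factor-level coefficient, and checks the recomputed predictions against exp(X beta) built with model.matrix().

```r
# Hypothetical toy data (not the asker's data): one factor and one numeric predictor
d <- data.frame(
  g = factor(c("a", "a", "b", "b", "c", "c")),
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 3, 5, 4, 8, 9)
)

# Base-R Poisson GLM with a log link, standing in for the tweedie/log-link model
fit <- glm(y ~ g + x, family = poisson(link = "log"), data = d)
before <- predict(fit, newdata = d, type = "response")
old_gb <- unname(coef(fit)["gb"])

# Overwrite the coefficient for factor level "b" by hand
fit$coefficients["gb"] <- log(50)

# Misnamed argument: 'data' is swallowed by '...', newdata counts as missing,
# so predict() returns the fitted values stored when the model was fitted
stale <- predict(fit, data = d, type = "response")

# Correct argument: predictions are recomputed from the edited coefficients
fresh <- predict(fit, newdata = d, type = "response")

# With newdata and a log link, the response prediction is exp(X %*% beta)
X <- model.matrix(~ g + x, data = d)
by_hand <- drop(exp(X %*% coef(fit)))

is_b <- d$g == "b"
print(all.equal(unname(stale), unname(before)))   # stale predictions did not move
print(all.equal(unname(fresh), unname(by_hand)))  # predict() agrees with exp(X beta)
# Rows at level "b" are scaled by exp(log(50) - old_gb); other rows are unchanged
print(unname(fresh[is_b] / before[is_b]))
print(exp(log(50) - old_gb))
```

Because the model uses a log link, editing one level's coefficient multiplies the predictions for that level's rows by exp(new - old) and leaves every other row untouched, which is the same pattern the "Neutral" rows show in the answer below.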


Solution

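  • You are using the formula in a way that detaches the variables from the data frame you pass to predict. Because the formula refers to df$LossPerUnit, df$JudicialOrientation, and df$Freq, predict() looks those variables up in the global df object rather than in newdata, which can easily confuse the logic of predict.lm. If you instead create the formula the way it was intended to be used, without referring to the data object's name (using only column names), you get the desired effect: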

    ModelingVars <- as.formula(LossPerUnit ~ JudicialOrientation + Freq)
    
    #----------
    
    > df$PredictedLossPerUnit <- predict(Model,df, type="response")
    > 
    > 
    > ##Manually edit a coefficient for one of my categorical variable's levels
    > Model$coefficients["JudicialOrientationNeutral"] <-log(50)
    > 
    > ##Predict Losses again to compare
    > df$PredictedLossPerUnit2 <- predict(Model, df, type ="response")
    > 
    > df
      Account Exposure JudicialOrientation  Freq Losses  LossPerUnit PredictedLossPerUnit PredictedLossPerUnit2
    1       A        1             Neutral 0.008 100000 100000.00000           1549.56677           40213.38196
    2       B       50             Neutral 0.500    100      2.00000            919.41825           23860.16405
    3       C       67           Plaintiff 0.050   2500     37.31343            169.99221             169.99221
    4       D       85             Defense 0.340 100000   1176.47059            565.49150             565.49150
    5       E      250           Plaintiff 0.700  25000    100.00000             85.29641              85.29641
    6       F       25             Neutral 0.000      0      0.00000           1562.77490           40556.15105
    7       G       22           Plaintiff 0.040   7500    340.90909            171.80535             171.80535
    8       H       89             Defense 0.120   5200     58.42697            714.15870             714.15870
    

    The output is wide, so you may need to scroll right to see the last two columns, but the two prediction columns now differ for the "Neutral" rows (A, B, and F) and agree everywhere else.

    Edit: I kept the formula as a separate object because that was the smallest change to your code, but a better strategy would have been to write the formula directly in the glm() call, without the as.formula() wrapper, which is redundant here because its argument is already a formula. First run: Model <- glm(LossPerUnit ~ JudicialOrientation + Freq, family = tweedie(var.power = 1.5, link.power = 0), weights = Exposure, data = df) and then do your coefficient violence.
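A minimal sketch of the formula pitfall described in this answer, on hypothetical toy data and with a base-R gaussian glm instead of the tweedie model (an assumption made so the example needs no extra packages; the names d, fit_dollar, fit_cols, and new_pts are made up). When the formula spells out d$x, predict() evaluates d$x in the formula's environment rather than in newdata, so new data points are ignored; the column-name formula uses newdata as intended.

```r
# Hypothetical toy data (not the asker's data)
d <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2.1, 3.9, 6.2, 7.8, 10.1))

# Pattern from the question: the formula names the data object itself
fit_dollar <- glm(d$y ~ d$x, family = gaussian)

# Pattern from the answer: column names only, data passed separately
fit_cols <- glm(y ~ x, family = gaussian, data = d)

new_pts <- data.frame(x = c(10, 20))

# Two predictions, evaluated at x = 10 and x = 20
p_cols <- predict(fit_cols, newdata = new_pts)

# 'd$x' is looked up in the global d, not in new_pts, so this returns the
# 5 in-sample predictions instead (R only emits a row-count warning)
p_dollar <- suppressWarnings(predict(fit_dollar, newdata = new_pts))

print(length(p_cols))    # 2
print(length(p_dollar))  # 5
```

In the question's toy example, newdata was the same global df that the formula referred to, so both styles happen to give the same numbers there; the difference only shows up when newdata is a different data frame.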