I am using a Poisson GLM on some dummy data to predict ClaimCount based on two variables, Frequency and JudicialOrientation.
Dummy Data Frame:
data5 <-data.frame(Year=c("2006","2006","2006","2007","2007","2007","2008","2009","2010","2010","2009","2009"),
JudicialOrientation=c("Defense","Plaintiff","Plaintiff","Neutral","Defense","Plaintiff","Defense","Plaintiff","Neutral","Neutral","Plaintiff","Defense"),
Frequency=c(0.0,0.06,.07,.04,.03,.02,0,.1,.09,.08,.11,0),
ClaimCount=c(0,5,10,3,4,0,7,8,15,16,17,12),
Loss = c(100000,100,2500,100000,25000,0,7500,5200, 900,100,0,50),
Exposure=c(10,20,30,1,2,4,3,2,1,54,12,13)
)
Model GLM:
ClaimModel <- glm(ClaimCount~JudicialOrientation+Frequency
,family = poisson(link="log"), offset=log(Exposure), data = data5, na.action=na.pass)
Output of summary(ClaimModel):
Call:
glm(formula = ClaimCount ~ JudicialOrientation + Frequency, family = poisson(link = "log"),
data = data5, na.action = na.pass, offset = log(Exposure))
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.7555  -0.7277  -0.1196   2.6895   7.4768
Coefficients:
                             Estimate Std. Error z value Pr(>|z|)
(Intercept)                   -0.3493     0.2125  -1.644      0.1
JudicialOrientationNeutral    -3.3343     0.5664  -5.887 3.94e-09 ***
JudicialOrientationPlaintiff  -3.4512     0.6337  -5.446 5.15e-08 ***
Frequency                     39.8765     6.7255   5.929 3.04e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 149.72 on 11 degrees of freedom
Residual deviance: 111.59 on 8 degrees of freedom
AIC: 159.43
Number of Fisher Scoring iterations: 6
I am also using log(Exposure) as an offset.
I then want to use this GLM to predict claim counts for the same observations:
data5$ExpClaimCount <- predict(ClaimModel, newdata=data5, type="response")
If I understand correctly, the Poisson GLM equation should then be:
ClaimCount = exp(-0.3493 - 3.3343*JudicialOrientationNeutral - 3.4512*JudicialOrientationPlaintiff + 39.8765*Frequency + log(Exposure))
However, I tried this manually for some of the observations (in Excel, for example, =EXP(-0.3493+0+0+LOG(10)) for observation 1)
but did not get the correct answer.
Is my understanding of the GLM equation incorrect?
Your understanding of how predict() works
for a Poisson GLM is correct. This can be verified in R:
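co <- coef(ClaimModel)
# factor() keeps this working when JudicialOrientation is stored as character
# (the default in R >= 4.0); levels sort as Defense = 1, Neutral = 2, Plaintiff = 3.
jo <- as.integer(factor(data5$JudicialOrientation))
p1 <- with(data5,
exp(log(Exposure) + # offset
co[1] + # intercept
ifelse(jo > 1, co[jo], 0) + # factor term; Defense is the baseline level
Frequency * co[4])) # linear term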
all.equal(p1, predict(ClaimModel, type="response"), check.names=FALSE)
[1] TRUE
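The same check can also be written with the model's design matrix, which avoids indexing the coefficients by hand. This is only a sketch: the data frame is rebuilt here so the snippet runs on its own, the Year and Loss columns and na.action are left out because the model formula does not use them and there are no missing values, and the names X, eta and p2 are introduced purely for illustration.

```r
# Rebuild the question's data (only the columns the model uses) and refit the model.
data5 <- data.frame(
  JudicialOrientation = c("Defense", "Plaintiff", "Plaintiff", "Neutral",
                          "Defense", "Plaintiff", "Defense", "Plaintiff",
                          "Neutral", "Neutral", "Plaintiff", "Defense"),
  Frequency  = c(0, 0.06, 0.07, 0.04, 0.03, 0.02, 0, 0.1, 0.09, 0.08, 0.11, 0),
  ClaimCount = c(0, 5, 10, 3, 4, 0, 7, 8, 15, 16, 17, 12),
  Exposure   = c(10, 20, 30, 1, 2, 4, 3, 2, 1, 54, 12, 13)
)
ClaimModel <- glm(ClaimCount ~ JudicialOrientation + Frequency,
                  family = poisson(link = "log"),
                  offset = log(Exposure), data = data5)

# Design matrix: intercept, the two dummy columns, and Frequency.
X <- model.matrix(ClaimModel)

# Linear predictor plus the offset (natural log of Exposure).
eta <- drop(X %*% coef(ClaimModel)) + log(data5$Exposure)

# Inverse of the log link gives the expected claim count.
p2 <- exp(eta)

all.equal(p2, predict(ClaimModel, type = "response"), check.attributes = FALSE)
```

Because the dummy columns come straight from model.matrix(), this version does not depend on how JudicialOrientation happens to be stored.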
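As indicated in the comments, you probably get the wrong results in Excel because of the different base of the logarithm: Excel's LOG() defaults to base 10, whereas R's log() is the natural logarithm (base e). Using LN() in Excel instead should reproduce R's result.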
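To see how much the base matters, here is a small sketch for observation 1 (Defense, Frequency 0, Exposure 10) using the rounded intercept from the summary output. The variable names are made up for illustration. With the natural log the result is about 7.05, while a base-10 log gives about 1.92. The corresponding Excel formula would be =EXP(-0.3493+LN(10)).

```r
b0       <- -0.3493  # rounded intercept from the summary output
exposure <- 10       # Exposure for observation 1

# Natural log, which is what log(Exposure) means in the R offset.
pred_natural <- exp(b0 + log(exposure))

# Base-10 log, which is what Excel's LOG(10) evaluates to (i.e. 1).
pred_base10 <- exp(b0 + log10(exposure))

# With the natural log, the offset simply scales the rate by the exposure.
all.equal(pred_natural, exposure * exp(b0))

c(natural = pred_natural, base10 = pred_base10)
```

The small remaining difference from predict() comes only from using coefficients rounded to four decimals.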