I'm fitting negative binomial models using mgcv::gam and I've noticed that the null deviance changes from fit to fit. If I use the `negbin` family function instead of `nb`, the issue goes away.
The following reproduces the issue.
```r
library(mgcv)
set.seed(3)
n <- 400
dat <- gamSim(1, n=n)
g <- exp(dat$f / 5)
## negative binomial data...
dat$y <- rnbinom(g, size=3, mu=g)
# Now fit 3 different models, each with theta known (= 3)
preds <- c("x1", "x2", "x3")
for (i in seq_along(preds)) {
  fo <- formula(paste("y ~ x0 +", preds[i]))
  m1 <- gam(fo, data=dat, family=nb(theta=3))   # nb
  m2 <- gam(fo, data=dat, family=negbin(3))     # negbin
  print(paste(m1$null.deviance, ", ", m2$null.deviance))
}
```
If I run that, I get the following.
```
[1] "820.724580736807 , 820.708788014928"
[1] "820.747020281717 , 820.708788014928"
[1] "820.708788454065 , 820.708788014928"
```
The `null.deviance` from using `nb` varies from 820.71 to 820.75.
In this case, the null deviance only changes slightly, but in another example I have, it changes quite a lot.
What am I missing?
Thanks, Harry
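For 'extended families' such as `nb`, mgcv uses an approximation to the null deviance. It is easy to compute, even for families such as ordered categorical, but it depends on the location parameter for the response according to the fitted model: the location parameter for the null model is taken to be the average of the observation-specific location parameters under the fitted model. Since that average changes whenever the covariates change, so does the reported null deviance. `negbin` is not an extended family, so its null deviance is computed in the usual GLM way and does not depend on which covariates are in the model.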
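A minimal sketch to see the effect numerically, assuming that the averaged 'location parameter' is the fitted mean on the response scale (mgcv's internal calculation may differ in detail, so the `approx` row need not match the `mgcv` row exactly). `nb_dev()` is a hypothetical helper written for this illustration using the standard negative binomial deviance for known theta. For comparison, the sketch also minimises the deviance over one constant mean with `optimize()`, which does not depend on the model at all.

```r
library(mgcv)

## Hypothetical helper: NB deviance for known theta, summed over observations.
## Same expression as the dev.resids of MASS::negative.binomial(); pmax(1, y)
## makes the y * log(y / mu) term vanish when y == 0.
nb_dev <- function(y, mu, theta) {
  2 * sum(y * log(pmax(1, y) / mu) -
          (y + theta) * log((y + theta) / (mu + theta)))
}

## Simulated data, as in the question
set.seed(3)
n <- 400
dat <- gamSim(1, n = n)
g <- exp(dat$f / 5)
dat$y <- rnbinom(n, size = 3, mu = g)

## Null deviance from minimising over a single constant mean (model-free)
exact <- optimize(function(mu) nb_dev(dat$y, mu, theta = 3),
                  interval = c(1e-3, max(dat$y)), tol = 1e-10)$objective

## Assumed approximation: deviance evaluated at each model's average fitted mean
res <- sapply(c("x1", "x2", "x3"), function(p) {
  m <- gam(as.formula(paste("y ~ x0 +", p)), data = dat,
           family = nb(theta = 3))
  c(mgcv = m$null.deviance,
    approx = nb_dev(dat$y, mean(fitted(m)), theta = 3))
})
print(res)
print(exact)
```

Every entry of the `approx` row should be at least `exact`, because `exact` is the smallest deviance that any single constant mean can give. By the same argument, `exact` is expected to sit close to the constant `negbin` value in the question's output.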
In the next release (1.8-23), this approximation will be replaced by finding the null deviance through direct minimization of the deviance with respect to a single location parameter.
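Why a single location parameter gives a model-independent answer can be seen with a short derivation, sketched here assuming known $\theta$, unit prior weights and no offset (an illustration, not mgcv's code). With a common mean $\mu$ for all $n$ observations, the negative binomial deviance (taking $y_i \log y_i = 0$ when $y_i = 0$) and its derivative are

$$D(\mu) = 2\sum_{i=1}^{n}\left[ y_i \log\frac{y_i}{\mu} - (y_i+\theta)\log\frac{y_i+\theta}{\mu+\theta} \right],$$

$$\frac{dD}{d\mu} = 2\sum_{i=1}^{n}\left[ -\frac{y_i}{\mu} + \frac{y_i+\theta}{\mu+\theta} \right] = \frac{2\theta}{\mu(\mu+\theta)}\sum_{i=1}^{n}(\mu - y_i) = \frac{2n\theta\,(\mu - \bar{y})}{\mu(\mu+\theta)}.$$

The derivative is negative for $\mu < \bar{y}$ and positive for $\mu > \bar{y}$, so $D$ is minimized at $\hat{\mu} = \bar{y}$, which depends only on the response. Any other single value, such as an average of fitted means, gives $D(\mu) \ge D(\bar{y})$, consistent with every `nb` value in the question's output being slightly above the constant `negbin` value.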
Simon Wood (mgcv maintainer)