I'm trying to check that I understand how R calculates the statistic AIC, AICc (corrected AIC) and BIC for a glm()
model object (so that I can perform the same calculations on revoScaleR::rxGlm()
objects - particularly the AICc, which isn't available by default)
I had understood that these were defined as follows:
let p
= number of model parameters
let n
= number of data points
AIC = deviance + 2p
AICc = AIC + (2p^2 + 2p)/(n-p-1)
BIC = deviance + 2p.log(n)
So I tried to replicate these numbers and compare them to the corresponding R function calls. It didn't work:
library(AICcmodavg) # for the AICc() function
data(mtcars)
glm_a1 <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
data = mtcars,
family = gaussian(link = "identity"),
trace = TRUE)
summary(glm_a1)
n <- nrow(glm_a1$data) # 32
p <- glm_a1$rank # 11
dev <- glm_a1$deviance# 147.49
my_AIC <- dev + 2 * p
my_AICc <- my_AIC + (2 * p^2 + 2 * p)/(n - p - 1)
my_BIC <- dev + 2 * p * log(n)
AIC(glm_a1) # 163.71
my_AIC # 169.49
AICc(glm_a1) # 180.13 (from AICcmodavg package)
my_AICc # 182.69
BIC(glm_a1) # 181.30
my_BIC # 223.74
By using debug(AIC)
I can see that the calculation is different. It's based on 12 parameters (one extra for the estimated dispersion/scale parameter?). Also the log likelihood is obtained using logLik()
which brings back a number -69.85
, which suggests to me that the model deviance would be -2*-69.85 = 139.71
(which it isn't).
Does anyone know what I've done wrong please? Thank you.
in the extractAIC
manual page
Where :
Thus
glm_a1$ranks
returns the number of fitted parameter without accounting for the fitted variance used in gaussian families.
?glm
states
deviance: up to a constant, minus twice the maximized log-likelihood. Where sensible, the constant is chosen so that a saturated model has deviance zero.
that's why -2*logLik(glm_a1) - deviance(glm_a1) = 7.78 > 0
summary(glm_a1)
returns the following line Dispersion parameter for gaussian family taken to be 7.023544
approximately the difference between -2 log likelihood and the deviance.
library(AICcmodavg)
#> Warning: package 'AICcmodavg' was built under R version 3.6.2
#> Warning: no function found corresponding to methods exports from 'raster' for:
#> 'wkt'
data(mtcars)
glm_a1 <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
data = mtcars,
family = gaussian(link = "identity"),
trace = TRUE)
#> Deviance = 147.4944 Iterations - 1
#> Deviance = 147.4944 Iterations - 2
(loglik <- logLik(glm_a1))
#> 'log Lik.' -69.85491 (df=12)
# thus the degrees of freedom r uses are 12 instead of 11
n <- attributes(loglik)$nobs # following user20650 recommendation
p <- attributes(loglik)$df # following user20650 recommendation
dev <- -2*as.numeric(loglik)
my_AIC <- dev + 2 * p
my_AICc <- my_AIC + (2 * p^2 + 2 * p)/(n - p - 1)
my_BIC <- dev + p * log(n)
BIC(glm_a1)
#> [1] 181.2986
my_BIC
#> [1] 181.2986
AIC(glm_a1)
#> [1] 163.7098
my_AIC
#> [1] 163.7098
AICc(glm_a1)
#> [1] 180.1309
my_AICc
#> [1] 180.1309