Search code examples
rlme4mixed-modelsinformation-theoryglmmtmb

R packages lme4 and glmmTMB produce different AIC for the same model and data


I fitted the same model to the same data using lmer(){lme4} and glmmTMB(){glmmTMB}. The model's AIC differed depending on the function used, and I would like to know why.

I generated the following random-intercept data:

rm(list=ls())
library(lme4)
library(glmmTMB)
library(ggplot2)

# "regression" with an 8-level RE with variable slopes:
set.seed(1)
ii=8 # parameter set size; no. groups
reps <- sample(x=8:12, size=ii, replace=T)
slope <- 3
intercept <- runif(ii, min = 70, max = 140)
group.sd <- runif(ii, min = 5, max = 15)
group.ID <- LETTERS[1:ii]

xx <- 1:15

out.list <- list() # empty list to store simulated data

for (i in 1:ii){
        df00 <- data.frame(Group=rep(group.ID[i], reps[i]*length(xx)),
                                    x=rep(xx, reps[i]))
        df00$y = intercept[i] + df00$x * slope + rnorm(length(df00[,1]), 0, group.sd[i])
        out.list[[i]] = df00
        }

# to turn out.list into a data.frame:
df <- do.call(rbind.data.frame, out.list)

# data visualisation
gg20 <- ggplot(df, aes(x = x, y = y, colour = Group)) 
gg20 + geom_point() + geom_smooth(method="lm") + theme_classic()

enter image description here

If I fit the same random-intercept model with lme4 and glmmTMB the associated AIC differs:

glm4_ri <- lmer(y~x + (1|Group), data=df)
glmmTMB_ri <- glmmTMB(y~x + (1|Group), data=df)

AIC(glm4_ri, glmmTMB_ri)
           df      AIC
glm4_ri     4 8588.399
glmmTMB_ri  4 8590.300

I realise that the discrepancy is small, but why aren't the AIC exactly the same?


Solution

  • tl;dr lmer is using REML by default, glmmTMB is using ML by default. This is surprisingly, and unfortunately, subtle - lmer does make some efforts to disable REML when it makes sense (e.g. in the anova() method), but sometimes it doesn't 'notice'.

    > logLik(glm4_ri)
    'log Lik.' -4290.2 (df=4)
    > logLik(update(glm4_ri, REML = FALSE))
    'log Lik.' -4291.15 (df=4)
    > logLik(glmmTMB_ri)
    'log Lik.' -4291.15 (df=4)
    > logLik(update(glmmTMB_ri, REML = TRUE))
    'log Lik.' -4290.2 (df=4)
    

    If you're going to be comparing models with different fixed-effect components, you must use ML rather than REML.