Search code examples
rgammgcv

How to check for overdispersion in a GAM with negative binomial distribution?


I fit a Generalized Additive Model in the Negative Binomial family using gam from the mgcv package. I have a data frame containing my dependent variable y, an independent variable x, a factor fac and a random variable ran. I fit the following model

gam1 <- gam(y ~ fac + s(x) + s(ran, bs = 're'), data = dt, family = "nb"

I have read in Negative Binomial Regression book that it is still possible for the model to be overdisperesed. I have found code to check for overdispersion in glm but I am failing to find it for a gam. I have also encountered suggestions to just check the QQ plot and standardised residuals vs. predicted residuals, but I can not decide from my plots if the data is still overdisperesed. Therefore, I am looking for an equation that would solve my problem.


Solution

  • A good way to check how well the model compares with the observed data (and hence check for overdispersion in the data relative to the conditional distribution implied by the model) is via a rootogram.

    I have a blog post showing how to do this for glm() models using the countreg package, but this works for GAMs too.

    The salient parts of the post applied to a GAM version of the model are:

    library("coenocliner")
    library('mgcv')
    
    ## parameters for simulating
    set.seed(1)
    locs <- runif(100, min = 1, max = 10)     # environmental locations
    A0 <- 90                                  # maximal abundance
    mu <- 3                                   # position on gradient of optima
    alpha <- 1.5                              # parameter of beta response
    gamma <- 4                                # parameter of beta response
    r <- 6                                    # range on gradient species is present
    pars <- list(m = mu, r = r, alpha = alpha, gamma = gamma, A0 = A0)
    nb.alpha <- 1.5                           # overdispersion parameter 1/theta
    zprobs <- 0.3                             # prob(y == 0) in binomial model
    
    ## simulate some negative binomial data from this response model
    nb   <- coenocline(locs, responseModel = "beta", params = pars,
                       countModel = "negbin",
                       countParams = list(alpha = nb.alpha))
    df <- setNames(cbind.data.frame(locs, nb), c("x", "yNegBin"))
    

    OK, so we have a sample of data drawn from a negative binomial sampling distribution and we will now fit two models to these data:

    1. A Poisson GAM
    m_pois <- gam(yNegBin ~ s(x), data = df, family = poisson())
    
    1. A negative binomial GAM
    m_nb   <- gam(yNegBin ~ s(x), data = df, family = nb())
    

    The countreg package is not yet on CRAN but it can be installed from R-Forge:

    install.packages("countreg", repos="http://R-Forge.R-project.org")
    

    Then load the packages and plot the rootograms:

    library("countreg")
    library("ggplot2")
    
    root_pois <- rootogram(m_pois, style = "hanging", plot = FALSE)
    root_nb   <- rootogram(m_nb, style = "hanging", plot = FALSE)
    

    Now plot the rootograms for each model:

    autoplot(root_pois)
    autoplot(root_nb)
    

    This is what we get (after plotting both using cowplot::plot_grid() to arrange the two rootograms on the same plot)

    enter image description here

    We can see that the negative binomial model does a bit better here than the Poisson GAM for these data — the bottom of the bars are closer to zero throughout the range of the observed counts.

    The countreg package has details on how you can add an uncertain band around the zero line as a form of goodness of fit test.

    You can also compute the Pearson estimate for the dispersion parameter using the Pearson residuals of each model:

    r$>  sum(residuals(m_pois, type = "pearson")^2) / df.residual(m_pois)
    [1] 28.61546
    r$>  sum(residuals(m_nb, type = "pearson")^2) / df.residual(m_nb)
    [1] 0.5918471
    

    In both cases, these should be 1; we see substantial overdispersion in the Poisson GAM, and some under-dispersion in the Negative Binomial GAM.