I am building a model using the mgcv package in r. The data has serial measures (data collected during scans 15 minutes apart in time, but discontinuously, e.g. there might be 5 consecutive scans on one day, and then none until the next day, etc.). The model has a binomial response, a random effect of day, a fixed effect, and three smooth effects. My understanding is that REML is the best fitting method for binomial models, but that this method cannot be specified using the gamm function for a binomial model. Thus, I am using the gam function, to allow for the use of REML fitting. When I fit the model, I am left with residual autocorrelation at a lag of 2 (i.e. at 30 minutes), assessed using ACF and PACF plots.
So, we wanted to include an autocorrelation structure in the model, but my understanding is that only the gamm function and not the gam function allows for the inclusion of such structures. I am wondering if there is anything I am missing and/or if there is a way to deal with autocorrelation with a binomial response variable in a GAMM built in mgcv.
My current model structure looks like:
gam(Response ~
s(Day, bs = "re") +
s(SmoothVar1, bs = "cs") +
s(SmoothVar2, bs = "cs") +
s(SmoothVar3, bs = "cs") +
as.factor(FixedVar),
family=binomial(link="logit"), method = "REML",
data = dat)
I tried thinning my data (using only every 3rd data point from consecutive scans), but found this overly restrictive to allow effects to be detected due to my relatively small sample size (only 42 data points left after thinning). I also tried using the prior value of the binomial response variable as a factor in the model to account for the autocorrelation. This did appear to resolve the residual autocorrelation (based on the updated ACF/PACF plots), but it doesn't feel like the most elegant way to do so and I worry this added variable might be adjusting for more than just the autocorrelation (though it was not collinear with the other explanatory variables; VIF < 2).
I would use bam()
for this. You don't need to have big data to fit a with bam()
, you just loose some of the guarantees about convergence that you get with gam()
. bam()
will fit a GEE-like model with an AR(1) working correlation matrix, but you need to specify the AR parameter via rho
. This only works for non-Gaussian families if you also set discrete = TRUE
when fitting the model.
You could use gamm()
with family = binomial()
but this uses PQL to estimate the GLMM version of the GAMM and if your binomial counts are low this method isn't very good.