Search code examples
rgamautocorrelation

How to correct error in gamm for modGI hierarchical model (and thus how to visualise and correct for autocorrelation more than AR1 in time series)?


I have blood biochemistry markers measured in 28 subjects, where samples were collected every month.

I can run a simple modG with gamm and get the acf and pacf plots.

  modG <- gamm(Met1 ~ s(Timepoint, k=7), data=df, method="REML")

I run into problems when trying to run a gamm for a mod GI (gam works fine). I would like to check acf and pacf plots for this and correct for potentially AR or MA greater than order one (bam only allows AR1 correction).

 modGI <- gamm(Met1 ~ s(Timepoint, k=7) + s(Timepoint, by=subject, k=7),
         random(list(subject=~1)), data=df, method="REML")

Error in MEestimate(lmeSt, grps) :
Singularity in backsolve at level 0, block 1

What does this error mean, and how can I overcome it? A similar issue was observed here https://stats.stackexchange.com/questions/197726/different-interactions-specifications-with-gamm, but its not clear how to resolve it.

Edit: Output from draw(modGI) using the bs="sz". The different colours correspond to the different subjects (ID_new).

enter image description here

enter image description here


Solution

  • A singular matrix is one with a determinant of 0 and is not invertible. The operation that raised the error is likely involved with doing that inversion in a numerically accurate way (via a Cholesky decomposition).

    In practical terms, what this likely means is that you have a model matrix (or some statistical transformation downstream of that matrix created during model fitting) that is rank deficient or numerically close to being rank deficient that as far as the computer is concerned it is.

    This likely stems from that fact that you have multiple smooths of Timepoint:

    1. the s(Timepoint, k=7), and
    2. the set of smooths s(Timepoint, by=subject, k=7)

    As there is nothing to stop the smooth from (1) looking like some or all of the smooths from (2), you are likely including concurved terms in the model. Concurvity is that non-linear equivalent of colinearity - effectively you are including the same "effect" in the model multiple times. We call this an "identifiability" issue, but it can also show up as problems elsewhere as rank deficiency.

    What we suggest in the HGAM paper from where you are taking the model names of G and GI is to make the model identifiable by changing the focus of the penalty on the factor by smooths to penalise any deviation from a flat, constant function rather than from a linear function. We do that with the thin plate splines by setting the penalty to be order 1 with m = 1:

    modGI <- gamm(Met1 ~ s(Timepoint, k=7) +
        s(Timepoint, by=subject, k=7, m = 1),
      random(list(subject=~1)), data=df, method="REML")
    

    As 28 subjects isn't that large, if this the model you want to fit, I would actually suggest you moving back to the gam() function and make use of a new basis in {mgcv} that wasn't available at the time we wrote the HGAM paper, namely the "sz" basis:

    modGI <- gam(Met1 ~ s(Timepoint, k = 7) +
        s(Timepoint, subject, bs = "sz", k = 7),
      data = df, method = "REML")
    

    where now the factor-smooth interaction is, by it's construction, orthogonal (uncorrelated) with the main effect s(Timepoint).