Search code examples
interactiongammgcv

Three Way Interaction in mgcv


I am interested in implementing a three-way interaction in mgcv but while there has been some discussion both here and on Cross Validated, I have had trouble finding an answer to how exactly one should code a three-way interaction between two continuous variables and one categorical variable. In my study, I have only four variables (socioeconomic class (Socio), sex, year of death (YOD), and age) for each individual and I am curious how these variables explain the likelihood of someone being buried with burial goods (N=c.12,000).

I have read Pedersen et al. 2019 and have elected to not include global smooths in my model. However, I am not certain if this means I should also not include the lower order interaction terms in my three-way interaction model. For example, should my code be:

mgcv::gam(Goods ~ Socio + te(Age,YOD,by=Socio,k=5), family=binomial(link='logit'), 
        mydata, method='ML')

should I still include the lower order terms within the three-way interaction:

mgcv::gam(Goods ~ Socio + s(Age,by=Socio,k=5) + s(YOD,by=Socio,k=5) + te(Age,YOD,k=5) + 
        ti(Age,YOD,by=Socio, k=5), family=binomial(link='logit'),mydata,method='ML')

or is there a different means of coding this?


Solution

  • Unless you want to perform a test on the interaction, then you are better off not decomposing the te() into main and interaction effects. This is because the models

    y ~ te(x1, x2) # main + interaction in one smooth model
    
    y ~ s(x1) + s(x2) + ti(x1, x2) # decomposed model
    

    aren't exactly equivalent:

    • you have to be very picky about setting k on all the terms such that these models are using the same number of basis functions, and
    • the decomposed model uses more smoothness parameters than the te() version, so is slightly more complex a model even if you get all the bases are roughly of comparable size

    Note you are including a global smooth in the second model: this term te(Age,YOD,k=5) will be an global smooth of Age and YOD over the whole data set.

    The decomposed version of your first model would be:

    Goods ~ Socio +
      s(Age, by = Socio) +
      s(YOD, by = Socio) +
      ti(Age, YOD, by = Socio)
    

    Setting things up to test if you need the factor by terms or not would require more work but I think you'd be better off do that post-hoc on the te() model, where you can compare the fitted surfaces by differencing them.