I'm trying to use hierarchical general additive models in mgcv using the Gaussian location-scale model family. However, this family of functions throws a cryptic error when trying to fit.
Followed the guidance on HGAMs from a recent paper (https://peerj.com/articles/6876/) along with the guidance from the notes in the lmvar package write-up (https://cran.r-project.org/web/packages/lmvar/vignettes/Intro.html).
library(mgcv); library(datasets)
# Using the CO2 dataset for illustration
data <- datasets::CO2
# Changing the 'Plant' factor from ordered to unordered
data$Plant <- factor(data$Plant, ordered = FALSE)
# This model fits with no errors
CO2_modG_1 <- gam(log(uptake) ~ s(log(conc), k = 5, bs = 'tp') + s(Plant, k = 12, bs = 're'), data = data, method = 'REML', family = 'gaussian')
# This model fails with error
CO2_modG_2 <- gam(log(uptake) ~ s(log(conc), k = 5, bs = 'tp') + s(Plant, k = 12, bs = 're'), data = data, method = 'REML', family = gaulss())
This returns the error message:
Error in while (sqrt(mean(dlb/(dlb + lami * dS * rm)) * mean(dlb)/mean(dlb + :
missing value where TRUE/FALSE needed
You have to pass a list of two formula objects to gam()
when fitting a Gaussian location-scale model with the gaulss()
family.
You don't say how you want to model the scale component of the model, so here's an example which should be equivalent to the gaussian()
family where we have a constant term for the scale and a the linear predictor you provided for the mean.
CO2_modG_2 <- gam(list(log(uptake) ~ s(log(conc), k = 5, bs = 'tp') +
s(Plant, k = 12, bs = 're'),
~ 1),
data = data, method = 'REML', family = gaulss())
If you want to allow each plant to have it's own variance, say, then we can add terms to the second formula:
CO2_modG_3 <- gam(list(log(uptake) ~ s(log(conc), k = 5, bs = 'tp') +
s(Plant, k = 12, bs = 're'),
~ s(Plant, k = 12, bs = 're')),
data = data, method = 'REML', family = gaulss())
Importantly, you have to provide a list of two formula objects to this family and the second formula should only have a tilde plus the right-hand-side of the formula specifying the linear predictor for the scale parameter.
list(response ~ x1 + s(x2), # location linear predictor, left & right hand sided
~ x1 + s(x3) # scale linear predictor, right-hand side only
)
hence, as per the first example I showed above, if you want a constant term only in the linear predictor for the scale you need to use R's notation for the intercept or constant term; ~ 1
list(response ~ x1 + s(x2), # location linear predictor, left & right hand sided
~ 1 # scale linear predictor, constant
)