I have been fitting different hierarchical GAMs (hereafter: HGAM) using mgcv
in R. I can extract and plot their predictions for their random effects without problems. Conversely, extracting and plotting their predictions for their fixed effects only works for some models, and I don't know why.
Here is a practical example, which refers to the color spectra of flowers from two species (Taxon
) sampled at various localities (also discussed here):
rm(list=ls()) # wipe R's memory clean
library(pacman) # load packages, installing them from CRAN if needed
p_load(RCurl) # allows accessing data from URL
ss <- read.delim(text=getURL("https://raw.githubusercontent.com/marcoplebani85/datasets/master/flower_color_spectra.txt"))
head(ss)
ss$density <- ifelse(ss$density<0, 0, ss$density) # set spurious negative reflectance values to zero
ss$clr <- ifelse(ss$Taxon=="SpeciesB", "red", "black")
ss <- with(ss, ss[order(Locality, wl), ])
These are the mean color spectra at the population level for the two species (rolling means were used):
Each color refers to a different species. Each line refers to a different locality.
The following model is a HGAM of type G according to Pedersen et al.'s classification (2019) and it does not give any issues:
gam_G1 <- bam(density ~ Taxon # main effect
+ s(wl, by = Taxon, k = 20) # interaction
+ s(Locality, bs="re"), # "re" is short for "random effect"
data = ss, method = 'REML',
family="quasipoisson"
)
# gam.check(gam_G1)
# k.check(gam_G1)
# MuMIn::AICc(gam_G1)
# gratia::draw(gam_G1)
# plot(gam_G1, pages=1)
# use gam_G1 to predict wl by Locality
# dataset of predictor values to estimate response values for:
nn <- unique(ss[, c("wl", "Taxon", "Locality", "clr")])
# predict:
pred <- predict(object= gam_G1, newdata=nn, type="response", se.fit=T)
nn$fit <- pred$fit
nn$se <- pred$se.fit
# use gam_G1 to predict wl by Taxon
# dataset of predictor values to estimate response values for:
nn <- unique(ss[, c("wl",
"Taxon",
"Locality",
"clr")])
nn$Locality=0 # turns random effect off
# after https://stats.stackexchange.com/q/131106/214127
# predict:
pred <- predict(object = gam_G1,
type="response",
newdata=nn,
se.fit=T)
nn$fit <- pred$fit
nn$se <- pred$se.fit
R warns me that factor levels 0 not in original fit
, but it executes the task without issues:
Left panel: gam_G1
predictions at the Locality
level. Right panel: gam_G1
predictions for the fixed effects.
Troublesome models
The following model is a HGAM of type "GI" sensu Pedersen et al. (2019). It produces more accurate predictions at the Locality
level, but I can only get NA
as predictions at the level of fixed effects:
# GI: models with a global smoother for all observations,
# plus group-level smoothers, the wiggliness of which is estimated individually
start_time <- Sys.time()
gam_GI1 <- bam(density ~ Taxon # main effect
+ s(wl, by = Taxon, k = 20) # interaction
+ s(wl, by = Locality, bs="tp", m=1)
# "tp" is short for "thin plate [regression spline]"
+ s(Locality, bs="re"),
family="quasipoisson",
data = ss, method = 'REML'
)
end_time <- Sys.time()
end_time - start_time # it took ~2.2 minutes on my computer
# gam.check(gam_GI1)
# k.check(gam_GI1)
# MuMIn::AICc(gam_GI1)
Attempt at drawing predictions for the fixed effects (Taxon
and wl
) according to gam_GI1
:
# dataset of predictor values to estimate response values for:
nn <- unique(ss[, c("wl",
"Taxon",
"Locality",
"clr")])
nn$Locality=0 # turns random effect off
# after https://stats.stackexchange.com/q/131106/214127
# predict:
pred <- predict(object = gam_GI1,
type="response",
# exclude="c(Locality)",
# # this should turn random effect off
# # (doesn't work for me)
newdata=nn,
se.fit=T)
nn$fit <- pred$fit
nn$se <- pred$se.fit
head(nn)
# wl Taxon Locality clr fit se
# 1 298.34 SpeciesB 0 red NA NA
# 2 305.82 SpeciesB 0 red NA NA
# 3 313.27 SpeciesB 0 red NA NA
# 4 320.72 SpeciesB 0 red NA NA
# 5 328.15 SpeciesB 0 red NA NA
# 6 335.57 SpeciesB 0 red NA NA
Left panel: gam_GI1
predictions at the Locality
level. Right panel (blank): gam_GI1
predictions for the fixed effects.
The following model, which includes a global smoother for all observations, plus group-level smoothers, all with the same "wiggliness", doesn't provide fixed-effect predictions either:
gam_GS1 <- bam(density ~ Taxon # main effect
+ s(wl, by = Taxon, k = 20) # interaction
+ s(wl, by = Locality, bs="fs", m=1),
# "fs" is short for "factor-smoother [interaction]"
family="quasipoisson",
data = ss, method = 'REML'
)
Why don't gam_GI1
and gam_GS1
produce predictions for their fixed effects, and how can I obtain them?
The models can take a few minutes to run. To save time, their output can be downloaded from here as an RData file. My R scripts (which include the code for plotting the figures) are available here.
I think you are conflating several things here; The by
trick to turn off random effects only works for bs = "re"
smooths. Locality
is a factor (otherwise your random effect isn't a random intercept) and setting it to 0
is creating a new level (although it could be creating an NA as 0
isn't among the original levels.
If what you want to do is turn off anything to do with Locality
, you should use exclude
; however you have the invocation wrong. The reason why it's not working is because you are creating a character vector with a single element "c(Locality)"
. This fails for obvious reasons once you realize that c(Locality)
doesn't related to anything in your model. What you need to provide here is a vector of smooth names as printed by summary()
. For example, to exclude the smooth s(Locality, bs = "re")
, {mgcv} knows this as s(Locality)
, so you would use exclude = "s(Locality)"
.
In your case, it is tedious to type out all the "s(wl):LocalityLevelX"
labels for each smooth. As you have only two taxa, it would be easier to use the complimentary argument terms
, where you list smooth labels that you want to include in the model. So you could do terms = c("s(wl):TaxonSpeciesB", "s(wl):TaxonSpeciesC")
or whatever summary()
displays for these smooths.
You also need to include the Taxon
term in terms
, which I think needs to be:
terms = c("TaxonSpeciesB", TaxonSpeciesC",
"s(wl):TaxonSpeciesB", "s(wl):TaxonSpeciesC")
If you install and load my {gratia} package, you can use smooths(gam_GI1)
to list all the smooth labels as far as {mgcv} knows them.
The by
trick works like this:
gam(y ~ x + s(z) + s(id, bs = "re", by = dummy)
where dummy
is set to a numeric value 1
when fitting and to 0
when you are predicting. As this is a numeric by variable you are multiplying the smooth by dummy
and hence why setting it to 0
excludes the term. The reason why your code isn't working is because you really want separate smooths for wl
for each Locality
; Locality
is an actual variable of interest in your data/model, not a dummy variable we create to achieve the aim of excluding a term from the model.
Hopefully now you can see why exclude
and terms
are much better solutions than this dummy
trick.
FYI, in bs = "tp"
, the "tp"
doesn't mean tensor product smooth. It mean thin plate regression spline (TPRS). You only get tensor product smooths through te()
, t2()
, or ti()
terms.