I need to model a smooth term over more than one factor. The by
argument allows me to model one smooth per factor level, but I cannot find how to do that over multiple factors.
I tried solutions akin to the following, but with no success:
data <- iris
data$factor2 <- rep(c("A", "B"), 75)
mgcv::gam(Sepal.Length ~ s(Petal.Length, by = c(Species, factor2)), data = data)
#> Error in model.frame.default(formula = Sepal.Length ~ 1 + Petal.Length + : variable lengths differ (found for 'c(Species, factor2)')
Created on 2021-08-05 by the reprex package (v2.0.0)
Any help is welcomed!
One of the issue created by interaction()
is that it changes the model's matrix, meaning that some variables contained in the model's data are changed:
m <- mgcv::gam(body_mass_g ~ s(flipper_length_mm, by = interaction(species, sex)), data = palmerpenguins::penguins)
head(insight::get_data(m))
#> body_mass_g flipper_length_mm species sex
#> 1 3750 181 Adelie.male male
#> 2 3800 186 Adelie.female female
#> 3 3250 195 Adelie.female female
#> 5 3450 193 Adelie.female female
#> 6 3650 190 Adelie.male male
#> 7 3625 181 Adelie.female female
Created on 2021-08-06 by the reprex package (v2.0.1)
This can lead to some issues when using postprocessing functions, for instance for visualisation.
However, following Gavin's and IRTFM's answers, this can be easily addressed by adding the variables as fixed effects in the model.
Here is a demonstration, also illustrating the differences between two separate smooths and the interaction:
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.0.5
set.seed(1)
# Create data
data <- data.frame(x = rep(seq(-10, 10, length.out = 500), 2),
fac1 = as.factor(rep(c("A", "B", "C"), length.out = 1000)),
fac2 = as.factor(rep(c("X", "Y"), each = 500)))
data$y <- data$x^2 + rnorm(nrow(data), sd = 5)
data$y[data$fac1 == "A"] <- sign(data$x[data$fac1 == "A"]) * data$y[data$fac1 == "A"] + 50
data$y[data$fac1 == "B"] <- datawizard::change_scale(data$y[data$fac1 == "B"]^3, c(-50, 100))
data$y[data$fac2 == "X" & data$fac1 == "C"] <- data$y[data$fac2 == "X" & data$fac1 == "C"] - 100
data$y[data$fac2 == "X" & data$fac1 == "B"] <- datawizard::change_scale(data$y[data$fac2 == "X" & data$fac1 == "B"] ^ 2, c(-50, 100))
data$y[data$fac2 == "X" & data$fac1 == "A"] <- datawizard::change_scale(data$y[data$fac2 == "X" & data$fac1 == "A"] * -3, c(0, 100))
# Real trends
ggplot(data, aes(x = x, y = y, color = fac1, shape = fac2)) +
geom_point()
# Two smooths
m <- mgcv::gam(y ~ fac1 * fac2 + s(x, by = fac1) + s(x, by = fac2), data = data)
plot(modelbased::estimate_relation(m, length = 100, preserve_range = F))
# Interaction
m <- mgcv::gam(y ~ fac1 * fac2 + s(x, by = interaction(fac1, fac2)), data = data)
plot(modelbased::estimate_relation(m, length = 100, preserve_range = F))
Created on 2021-08-06 by the reprex package (v2.0.1)
The last model manages to recover the trends for each of the factors' combination.