There is a question with the same title, however, I hope this question is also of interest [and personally I'd like to know the answer].
First, I will specify a dataset with one continuous and one factor covariate:
set.seed(1)
n <- 50
u1 <- sample(c(1,2), n, replace = TRUE)
u1 <- factor(u1)
u2 <- runif(n)
data <- data.frame(u1, u2)
I don't want to run a gam
model, but only to create the design matrix. I have considered 2 ways.
First,
a<-mgcv::s(u2,k=5,bs="ps",by=u1)
b<- mgcv::smoothCon(a,data=data,absorb.cons=TRUE)
However,
b[[1]]$X[u1==2,]
consists of only 0's.
Second,
a<- mgcv::s(u1,u2,k=5,bs="ad")
b<- mgcv::smoothCon(a,data=data,absorb.cons=TRUE)
However, it gives me an error message. Error in Summary.factor(c(1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, :
‘min’ not meaningful for factors
How can this problem be resolved? I'd like to have the design matrix that models a smooth term separately for each level of factor u1
[binary, as here, or otherwise].
I don't want to run a
gam
model, but only to create the design matrix.I'd like to have the design matrix that models a smooth term separately for each level of factor
u1
.
library(mgcv)
set.seed(1)
n <- 50
u1 <- sample(c(1,2), n, replace = TRUE)
u1 <- factor(u1)
u2 <- runif(n)
data <- data.frame(u1, u2)
a <- s(u2, k = 5, bs = "ps", by = u1)
b <- smoothCon(a, data = data, absorb.cons = TRUE)
smoothCon
produces a list of length nlevels(u1)
for this factor "by" smooth. A Dummy matrix is created for u1
(with two columns in this example), and each column is multiplied to the design matrix of s(u2, l = 5, bs = "ps")
. So, b[[1]]$X
is the design matrix for the first level, and b[[2]]$X
is the one for the second level. You shouldn't be surprised that b[[1]]$X[u1 == 2]
gives all zeros, as that is the design matrix for u1 == 1
. You want to cbind
those separate matrices together:
cbind(b[[1]]$X, b[[2]]$X)
Note that, because you have set absorb.cons = TRUE
, the spline function is centered. I.e., mean of each level is constrained (not necessarily at 0 but is fixed at some value). A practical effect is that you then need to put u1
in the model otherwise you can not model the group mean. In mgcv
, you have to specify
s(u2, k = 5, bs = 'ps', by = u1) + u1
This is also mentioned in the Q & A you referenced: mgcv: how to specify interaction between smooth and factor?. In your application outside mgcv
you need be aware of this as well. So a complete model matrix should be
cbind(b[[1]]$X, b[[2]]$X, model.matrix(~ u1))