Search code examples
rmgcv

mgcv: how to specify interaction between smooth and factor? part II


There is a question with the same title, however, I hope this question is also of interest [and personally I'd like to know the answer].

First, I will specify a dataset with one continuous and one factor covariate:

set.seed(1)
n <- 50
u1 <- sample(c(1,2), n, replace = TRUE) 
u1 <- factor(u1)
u2 <- runif(n)
data <- data.frame(u1, u2)

I don't want to run a gam model, but only to create the design matrix. I have considered 2 ways.

First,

a<-mgcv::s(u2,k=5,bs="ps",by=u1)
b<- mgcv::smoothCon(a,data=data,absorb.cons=TRUE) 

However,

b[[1]]$X[u1==2,]

consists of only 0's.

Second,

a<- mgcv::s(u1,u2,k=5,bs="ad")
b<- mgcv::smoothCon(a,data=data,absorb.cons=TRUE) 

However, it gives me an error message. Error in Summary.factor(c(1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, : ‘min’ not meaningful for factors

How can this problem be resolved? I'd like to have the design matrix that models a smooth term separately for each level of factor u1 [binary, as here, or otherwise].


Solution

  • I don't want to run a gam model, but only to create the design matrix.

    I'd like to have the design matrix that models a smooth term separately for each level of factor u1.

    library(mgcv)
    
    set.seed(1)
    n <- 50
    u1 <- sample(c(1,2), n, replace = TRUE) 
    u1 <- factor(u1)
    u2 <- runif(n)
    data <- data.frame(u1, u2)
    

    a <- s(u2, k = 5, bs = "ps", by = u1)
    b <- smoothCon(a, data = data, absorb.cons = TRUE)
    

    smoothCon produces a list of length nlevels(u1) for this factor "by" smooth. A Dummy matrix is created for u1 (with two columns in this example), and each column is multiplied to the design matrix of s(u2, l = 5, bs = "ps"). So, b[[1]]$X is the design matrix for the first level, and b[[2]]$X is the one for the second level. You shouldn't be surprised that b[[1]]$X[u1 == 2] gives all zeros, as that is the design matrix for u1 == 1. You want to cbind those separate matrices together:

    cbind(b[[1]]$X, b[[2]]$X)
    

    Note that, because you have set absorb.cons = TRUE, the spline function is centered. I.e., mean of each level is constrained (not necessarily at 0 but is fixed at some value). A practical effect is that you then need to put u1 in the model otherwise you can not model the group mean. In mgcv, you have to specify

    s(u2, k = 5, bs = 'ps', by = u1) + u1
    

    This is also mentioned in the Q & A you referenced: mgcv: how to specify interaction between smooth and factor?. In your application outside mgcv you need be aware of this as well. So a complete model matrix should be

    cbind(b[[1]]$X, b[[2]]$X, model.matrix(~ u1))