Say I have n subjects and each is repeated t times. A selection matrix for this case can be created as follows:
n = 5
t = 3
Select_M = diag(n) %x% matrix(1, t)
Select_M
      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    0    0    0    0
 [2,]    1    0    0    0    0
 [3,]    1    0    0    0    0
 [4,]    0    1    0    0    0
 [5,]    0    1    0    0    0
 [6,]    0    1    0    0    0
 [7,]    0    0    1    0    0
 [8,]    0    0    1    0    0
 [9,]    0    0    1    0    0
[10,]    0    0    0    1    0
[11,]    0    0    0    1    0
[12,]    0    0    0    1    0
[13,]    0    0    0    0    1
[14,]    0    0    0    0    1
[15,]    0    0    0    0    1
My interest is to have a different number of repeats for each subject. For example, the first subject is repeated 7 times, the second subject 11 times, and so on.
How can I efficiently create a selection matrix for these specific repeats?
You can define a factor column that identifies the subject of each row, and then use model.matrix() to create a design matrix from it without the intercept term.
For example, if the first subject is repeated 2 times, the second subject 3 times, and the third subject 4 times, you can try the following:
t = c(2, 3, 4)
# the `_` placeholder with the native pipe requires R >= 4.2
data.frame(sub = factor(rep(seq_along(t), t))) |>
  model.matrix(~ sub - 1, data = _)
#   sub1 sub2 sub3
# 1    1    0    0
# 2    1    0    0
# 3    0    1    0
# 4    0    1    0
# 5    0    1    0
# 6    0    0    1
# 7    0    0    1
# 8    0    0    1
# 9    0    0    1
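If a plain numeric matrix is preferred (closer to what the Kronecker-product version in the question returns, without the column names and attributes that model.matrix() attaches), one possible alternative is to index the rows of an identity matrix. This is only a sketch of a different technique, not part of the approach above; the name `reps` is an illustrative stand-in for the vector of repeat counts.

```r
# Assumed repeat counts (illustrative): subject 1 twice, subject 2 three times,
# subject 3 four times
reps <- c(2, 3, 4)

# Row i of the identity matrix is the indicator vector for subject i.
# Repeating the row index i reps[i] times stacks that indicator reps[i] times.
idx <- rep(seq_along(reps), times = reps)
Select_M <- diag(length(reps))[idx, , drop = FALSE]

dim(Select_M)      # sum(reps) rows, length(reps) columns
colSums(Select_M)  # recovers the repeat counts

# With equal repeats this reduces to the Kronecker-product construction
n <- 5
k <- 3
all(diag(n)[rep(seq_len(n), each = k), ] == diag(n) %x% matrix(1, k))
```

For the counts in the question, `reps <- c(7, 11, ...)` would be used in the same way. The `drop = FALSE` keeps the result a matrix even when there is only one subject.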