Tags: r, matrix, sparse-matrix

Create a specific selection matrix


Say I have n subjects, each repeated t times. With equal repeats, I can create a selection matrix as follows:

n = 5
t = 3
Select_M = diag(n) %x% matrix(1, t)

Select_M
      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    0    0    0    0
 [2,]    1    0    0    0    0
 [3,]    1    0    0    0    0
 [4,]    0    1    0    0    0
 [5,]    0    1    0    0    0
 [6,]    0    1    0    0    0
 [7,]    0    0    1    0    0
 [8,]    0    0    1    0    0
 [9,]    0    0    1    0    0
[10,]    0    0    0    1    0
[11,]    0    0    0    1    0
[12,]    0    0    0    1    0
[13,]    0    0    0    0    1
[14,]    0    0    0    0    1
[15,]    0    0    0    0    1

I would like each subject to have a different number of time periods. In other words, the first subject is repeated 7 times, the second subject is repeated 11 times, and so on.

How can I efficiently create a selection matrix for these specific repeats?


Solution

  • You can define a factor column that identifies the subject for each row, and then use model.matrix() to create a design matrix from it without the intercept term.

    For example, if the first subject is repeated 2 times, the second 3 times, and the third 4 times, you can try the following (the `_` placeholder requires R 4.2 or later):

    t = c(2, 3, 4)
    
    data.frame(sub = factor(rep(seq_along(t), t))) |>
      model.matrix(~ sub - 1, data = _)
    
    #   sub1 sub2 sub3
    # 1    1    0    0
    # 2    1    0    0
    # 3    0    1    0
    # 4    0    1    0
    # 5    0    1    0
    # 6    0    0    1
    # 7    0    0    1
    # 8    0    0    1
    # 9    0    0    1
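
Since the question is tagged sparse-matrix, a sparse variant may also be worth sketching. This is a minimal example, assuming the Matrix package (one of the recommended packages normally installed with R) is available. The names `reps` and `Select_S` are illustrative and not from the original post; `reps` is used instead of `t` only to avoid masking base R's `t()`.

```r
library(Matrix)

# Hypothetical repeat counts: subject 1 twice, subject 2 three times,
# subject 3 four times (same as the example above)
reps <- c(2, 3, 4)

# Row k belongs to subject rep(seq_along(reps), reps)[k],
# so place a single 1 in that column of row k
Select_S <- sparseMatrix(
  i = seq_len(sum(reps)),            # one entry per row
  j = rep(seq_along(reps), reps),    # subject index for each row
  x = 1,
  dims = c(sum(reps), length(reps))
)

dim(Select_S)       # rows = total observations, columns = subjects
colSums(Select_S)   # should reproduce the repeat counts
```

Only the non-zero entries (one per row) are stored, so this should scale better than a dense matrix when there are many subjects. If a dense result is needed, `as.matrix(Select_S)` converts it.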