python numpy patsy

Build a design matrix in Python


Suppose I have an R × C contingency table, i.e. one with R rows and C columns. I want a matrix X of dimension RC × (R + C − 2) that contains the R − 1 “main effects” for the rows and the C − 1 “main effects” for the columns. For example, if R = C = 2 (row levels [0, 1], column levels [0, 1]) and only main effects are included, there are various ways to parameterize the design matrix X; one of them is:

1 0
0 1
1 0
0 0

Note that this is 4 × 2 = RC × (R + C − 2): one level of the row factor and one level of the column factor are omitted.

How can I do this in Python for any values of R and C, e.g. R = 3 and C = 4 (levels [0 1 2] and [0 1 2 3])? I only have the values of R and C, but I can use them to construct arrays with np.arange(R) and np.arange(C).


Solution

  • The following should work:

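    import numpy as np

    R = 3
    C = 2

    # Indicator for row level 0 over the flattened (row-major) R x C table.
    ir = np.zeros((R, C))
    ir[0, :] = 1
    ir = ir.ravel()

    mat = []
    for i in range(R):
        mat.append(ir)
        # Shifting by C positions moves the indicator to the next row level.
        ir = np.roll(ir, C)

    # Indicator for column level 0 over the same flattened table.
    ic = np.zeros((R, C))
    ic[:, 0] = 1
    ic = ic.ravel()

    for i in range(C):
        mat.append(ic)
        # Shifting by 1 position moves the indicator to the next column level
        # (a shift of R only works when R and C share no common factor).
        ic = np.roll(ic, 1)

    # Each collected indicator becomes one column of the matrix.
    mat = np.asarray(mat).T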
    

    and the result is:

    array([[ 1.,  0.,  0.,  1.,  0.],
           [ 1.,  0.,  0.,  0.,  1.],
           [ 0.,  1.,  0.,  1.,  0.],
           [ 0.,  1.,  0.,  0.,  1.],
           [ 0.,  0.,  1.,  1.,  0.],
           [ 0.,  0.,  1.,  0.,  1.]])
    

    Thanks everyone for your help!