Search code examples
pythonarraysnumpytheano

Build block diagonal matrix from many small matrices in numpy / theano


I want to build a block diagonal matrix from a list of square matrices of the same shape.

This answer explains how to do it for a single matrix, but I have a whole bunch of different matrices that need to be made into a block matrix.

I want to perform this operation in Theano on GPU, so performance is a necessity (and the function has to exist in Theano).

Details : The reason for doing this is to accelerate the calculation of eigenvalues/vectors on GPU when there are many small matrices (e.g. around 10000 matrices 7x7). Instead of taking the eigenvalues of each small matrix separately (very slow on GPU), I want to perform a big EVD over the block diagonal matrix (same eigenvalues than the small matrices). Hopefully this will be faster and the sparsity of the matrix will not create much overhead (or maybe EIGH will take advantage of that).


Solution

  • For future readers : I've found a way to do it purely in NumPy, which is slightly faster than the SciPy function :

    def block_diagonal(x):
    " x should be a tensor-3 (#num matrices, n,n) "
    
    shape = x.shape
    n = shape[-1]
    
    indx = np.repeat(np.arange(n),n)
    indy = np.tile(np.arange(n),n)
    
    indx = np.concatenate([indx + k * n for k in range(shape[0])])
    indy = np.concatenate([indy + k * n for k in range(shape[0])])
    
    block = np.zeros((shape[0]*n,shape[0]*n))
    block[(indx,indy)] = x.flatten()
    
    return block
    

    This implementation simply builds up the indices where the blocks will be situated, then fills it !

    Timings :

    matrix_list = [np.arange(i,i+49).reshape((7,7)) for i in range(1000)]
    
    matlist = np.array(matrix_list)
    
    %timeit block_diagonal(matlist)
    25.6 ms ± 145 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
    %timeit scipy.linalg.block_diag(*matrix_list)
    28.6 ms ± 227 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
    matrix_list = [np.arange(i,i+49).reshape((7,7)) for i in range(5000)]
    matlist = np.array(matrix_list)
    
    %timeit block_diagonal(matlist)
    141 ms ± 75.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
    %timeit scipy.linalg.block_diag(*matrix_list)
    157 ms ± 201 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    

    Note : The same function in Theano is way slower than its NumPy counterpart, probably because of the need to use scan for the concatenation/shift of the indices. Any idea about how to tackle this problem is welcome !