python, numpy, matrix, sparse-matrix, matrix-multiplication

How to handle large matrices and matrix multiplication in Python


I'm trying to execute a matrix multiplication which has the following scheme:

C = np.dot(np.dot(sparse.csr_matrix(np.double(A).transpose()),sparse.spdiags(B,0,Ngrid,Ngrid)), sparse.csr_matrix(np.double(A)))

Thus, I want to transpose matrix A, which leads to an N x M matrix with M >> N, and multiply it with the diagonal matrix, which is an M x M matrix whose main diagonal is B. The resulting matrix (N x M) should then be multiplied with matrix A (M x N), giving the N x N matrix C.
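
For concreteness, here is a small-scale sketch of what I am doing (illustrative shapes only; in the real problem N = 2000 and Ngrid = 921600, and I write the sparse products with @ here instead of np.dot):

    import numpy as np
    from scipy import sparse

    N, Ngrid = 3, 8                          # illustrative; really N = 2000, Ngrid = 921600
    A = np.random.rand(Ngrid, N)             # A is M x N with M >> N
    B = np.random.rand(Ngrid)                # values on the main diagonal

    At = sparse.csr_matrix(A).transpose()    # N x M
    D = sparse.spdiags(B, 0, Ngrid, Ngrid)   # M x M diagonal matrix
    C = ((At @ D) @ sparse.csr_matrix(A)).toarray()   # N x N result as a numpy array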

The error that appears is the following:

<2000x921600 sparse matrix of type '<class 'numpy.float64'>'
    with 1843066024 stored elements in Compressed Sparse Row format>

As the final matrix is N x N, I want to have it as a numpy array. As you can see, I make the matrix in the middle a sparse diagonal matrix, which works well. But I can't understand why Python needs this insanely large matrix with 1843066024 stored elements to carry out the multiplication.

Do you have some ideas and/or explanation why this problem appears?


Solution

  • If B is diagonal, you don't need to use sparse to save memory. You can do the calculation with a 1d array, just the diagonal values.

    I'll demonstrate with small dimensions, where making a full B doesn't cause problems. Others can test this with large dimensions.

    In [5]: A = np.arange(24).reshape(3,8)
    In [7]: B = np.diag(np.arange(8))
    In [8]: B.shape
    Out[8]: (8, 8)
    

    The double matmul:

    In [10]: A@B@A.T
    Out[10]: 
    array([[  784,  1904,  3024],
           [ 1904,  4816,  7728],
           [ 3024,  7728, 12432]])
    

    The equivalent einsum:

    In [12]: np.einsum('ij,jk,lk',A,B,A)
    Out[12]: 
    array([[  784,  1904,  3024],
           [ 1904,  4816,  7728],
           [ 3024,  7728, 12432]])
    

    The 1d diagonal of B:

    In [15]: b = np.diag(B)
    

    A broadcasted multiplication does the same thing as the matmul:

    In [17]: np.allclose(A*b,A@B)
    Out[17]: True
    In [18]: np.allclose(A*b@A.T,A@B@A.T)
    Out[18]: True
    
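    Right-multiplying by a diagonal matrix just scales column j of A by b[j], and broadcasting b across the rows of A does the same scaling, which is why the 2d B isn't needed at all. A quick check of that claim (a sketch; it should print True):

    In [24]: np.allclose(A*b, np.column_stack([A[:,j]*b[j] for j in range(8)]))
    Out[24]: True
    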

    Expressing that with einsum:

    In [19]: np.einsum('ij,j,lj',A,b,A)
    Out[19]: 
    array([[  784,  1904,  3024],
           [ 1904,  4816,  7728],
           [ 3024,  7728, 12432]])
    

    Some comparative times:

    In [20]: timeit np.einsum('ij,j,lj',A,b,A)
    10.6 µs ± 22.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
    In [21]: timeit np.einsum('ij,jk,lk',A,B,A)
    15.2 µs ± 13.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
    In [22]: timeit A@B@A.T
    7.26 µs ± 20.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
    In [23]: timeit A*b@A.T
    8.17 µs ± 12.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
    

    For einsum, the 'condensed' version is faster. With matmul, using the full diagonal matrix is marginally faster.

    But with large arrays, where creating a full B might be a problem, using b might be faster. It's also been observed in other SO questions that iterating over smaller arrays can be faster, due to better memory handling, for example:

    np.array([A[i,:]*b@A.T for i in range(3)])
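
    Putting that back into the question's orientation, where A is the (M, N) array and b the length-M diagonal, the whole triple product reduces to A.T @ (b[:, None] * A). A rough sketch of a chunked version (At_diag_A is a made-up helper name, and the chunk size is arbitrary and untested at full scale):

    import numpy as np

    def At_diag_A(A, b, chunk=100_000):
        # Computes A.T @ np.diag(b) @ A without ever forming the M x M diagonal,
        # accumulating over row-chunks so no temporary larger than (chunk, N) is made.
        M, N = A.shape
        C = np.zeros((N, N))
        for start in range(0, M, chunk):
            Ai = A[start:start + chunk]          # (chunk, N) block of rows
            bi = b[start:start + chunk, None]    # matching diagonal values, as a column
            C += Ai.T @ (bi * Ai)                # partial sum of A.T @ diag(b) @ A
        return C

    With chunk >= M this is just A.T @ (b[:, None] * A) in one shot; smaller chunks trade a little speed for a bounded memory footprint, in the same spirit as the row iteration above.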