Search code examples
pythonscipysparse-matrix

Why is adding instances of scipy.sparse.dia_matrix so slow?


I'm writing a numerical code, where I'm using scipy.sparse.dia_matrix. My matrices are quite large (up to about 1000000 x 1000000), but very sparse. Sometimes tridiagonal, sometimes with a some more diagonals.

For various reasons, it is extremely convenient and clear from a coding point of view to just add together several of these matrices (of the same size, of course). However, I have found that adding these sparse matrices is very slow. The following example illustrates what I mean:

import numpy as np
from scipy.sparse import diags, dia_matrix

N = 100000

M1 = diags(diagonals = [np.random.random(N-1), np.random.random(N), np.random.random(N-1)], offsets = [-1, 0, 1])
M2 = diags(diagonals = [np.random.random(N-1), np.random.random(N), np.random.random(N-1)], offsets = [-1, 0, 1])
M3 = diags(diagonals = [np.random.random(N-1), np.random.random(N), np.random.random(N-1)], offsets = [-1, 0, 1])

def simple_add():
    M = M1 + M2 + M3
    
def complicated_add():
    M_ = dia_matrix((N, N))
    for d in [-1, 0, 1]:
        M_.setdiag(M1.diagonal(d) + M2.diagonal(d) + M3.diagonal(d), d)

%timeit simple_add()

%timeit complicated_add()

The output of the timing is:

16.9 ms ± 730 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
959 µs ± 39.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

I don't understand why adding the matrices together with the + operator is 17 times slower than creating an empty digagonal matrix and explicitly setting the diagonals. Is there anything I can do to speed this up? I would much prefer to keep the simpler expression with the + operator, as it's far more readable, but not at the expense of an order of magnitude increase in computational time.

Update:

I proposed a change in Scipy that would make addition of two instances of dia_matrix faster, and after a bit of discussion I submitted a pull request to Scipy, which has now been merged. So in the future, adding two instances of dia_matrix will no longer convert to csr_matrix.

https://github.com/scipy/scipy/pull/14004


Solution

  • The implementation of _add_sparse(self, other) is return self.tocsr()._add_sparse(other). The extra time is to turn it into a CSR matrix (which has a C extension for addition).

    Could you create a sparse matrix that does what you want? Probably.

    from scipy.sparse import dia_matrix, isspmatrix_dia
    
    class dia_matrix_adder(dia_matrix):
    
        def __add__(self, other):
    
            if not isspmatrix_dia(other):
                return super(dia_matrix_adder, self).__add__(other)
    
            M_ = dia_matrix((self.shape[0], self.shape[1]))
    
            for d in [-1, 0, 1]:
                M_.setdiag(self.diagonal(d) + other.diagonal(d), d)
    
            return M_
    

    I would probably not do that and just write yourself a function:

    def add_dia_matrix(*mats):
    
        if len(mats) == 1:
            return mats[0]
    
        M_ = dia_matrix((mats[0].shape[0], mats[0].shape[1]))
    
        for d in [-1, 0, 1]:
            M_diag = mats[0].diagonal(d).copy()
    
            for i in range(1, len(mats)):
                M_diag += mats[i].diagonal(d)
    
            M_.setdiag(M_diag, d)
    
        return M_
    

    This should be as readable as a bunch of + without having to deal with a new class.

    %timeit simple_add()
    30.3 ms ± 218 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
    %timeit complicated_add()
    1.28 ms ± 2.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    %timeit add_dia_matrix(M1, M2, M3)
    1.22 ms ± 4.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)