Why is adding instances of scipy.sparse.dia_matrix so slow?

I'm writing a numerical code, where I'm using scipy.sparse.dia_matrix. My matrices are quite large (up to about 1000000 x 1000000), but very sparse. Sometimes tridiagonal, sometimes with a some more diagonals.

For various reasons, it is extremely convenient and clear from a coding point of view to just add together several of these matrices (of the same size, of course). However, I have found that adding these sparse matrices is very slow. The following example illustrates what I mean:

import numpy as np
from scipy.sparse import diags, dia_matrix

N = 100000

M1 = diags(diagonals = [np.random.random(N-1), np.random.random(N), np.random.random(N-1)], offsets = [-1, 0, 1])
M2 = diags(diagonals = [np.random.random(N-1), np.random.random(N), np.random.random(N-1)], offsets = [-1, 0, 1])
M3 = diags(diagonals = [np.random.random(N-1), np.random.random(N), np.random.random(N-1)], offsets = [-1, 0, 1])

def simple_add():
    M = M1 + M2 + M3
    
def complicated_add():
    M_ = dia_matrix((N, N))
    for d in [-1, 0, 1]:
        M_.setdiag(M1.diagonal(d) + M2.diagonal(d) + M3.diagonal(d), d)

%timeit simple_add()

%timeit complicated_add()

The output of the timing is:

16.9 ms ± 730 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
959 µs ± 39.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

I don't understand why adding the matrices together with the + operator is 17 times slower than creating an empty digagonal matrix and explicitly setting the diagonals. Is there anything I can do to speed this up? I would much prefer to keep the simpler expression with the + operator, as it's far more readable, but not at the expense of an order of magnitude increase in computational time.

Update:

I proposed a change in Scipy that would make addition of two instances of dia_matrix faster, and after a bit of discussion I submitted a pull request to Scipy, which has now been merged. So in the future, adding two instances of dia_matrix will no longer convert to csr_matrix.

https://github.com/scipy/scipy/pull/14004

Solution

The implementation of _add_sparse(self, other) is return self.tocsr()._add_sparse(other). The extra time is to turn it into a CSR matrix (which has a C extension for addition).

Could you create a sparse matrix that does what you want? Probably.

from scipy.sparse import dia_matrix, isspmatrix_dia

class dia_matrix_adder(dia_matrix):

    def __add__(self, other):

        if not isspmatrix_dia(other):
            return super(dia_matrix_adder, self).__add__(other)

        M_ = dia_matrix((self.shape[0], self.shape[1]))

        for d in [-1, 0, 1]:
            M_.setdiag(self.diagonal(d) + other.diagonal(d), d)

        return M_

I would probably not do that and just write yourself a function:

def add_dia_matrix(*mats):

    if len(mats) == 1:
        return mats[0]

    M_ = dia_matrix((mats[0].shape[0], mats[0].shape[1]))

    for d in [-1, 0, 1]:
        M_diag = mats[0].diagonal(d).copy()

        for i in range(1, len(mats)):
            M_diag += mats[i].diagonal(d)

        M_.setdiag(M_diag, d)

    return M_

This should be as readable as a bunch of + without having to deal with a new class.

%timeit simple_add()
30.3 ms ± 218 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit complicated_add()
1.28 ms ± 2.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit add_dia_matrix(M1, M2, M3)
1.22 ms ± 4.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)