Tags: python, scipy, sparse-matrix

Scipy sparse matrix exponentiation: a**16 is slower than a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a?


I'm computing a simple sparse matrix power, a**16, using scipy-0.17. (Note: this is matrix exponentiation, not element-wise multiplication.) However, on my machines (running Debian stable and Ubuntu LTS) it is ten times slower than using a for loop, or doing something silly like a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a. That doesn't make sense, so I assume I'm doing something wrong, but what?

import scipy.sparse
from time import time

a = scipy.sparse.rand(2049, 2049, 0.002)

print("Trying exponentiation (a**16)")
t = time()
x = a**16
print(repr(x))
print("Exponentiation took %f seconds\n" % (time() - t))

print("Trying expansion (a*a*a*...*a*a)")
t = time()
y = a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a
print(repr(y))
print("Expansion took %f seconds\n" % (time() - t))

print("Trying a for loop (z = z*a)")
t = time()
z = scipy.sparse.eye(2049)
for i in range(16):
    z = z*a
print(repr(z))
print("Looping took %f seconds\n" % (time() - t))

# Sanity check: all three methods should give approximately the same answer.
assert (abs(x - z) >= 1e-9).nnz == 0
assert (abs(x - y) >= 1e-9).nnz == 0

Solution

  • @hpaulj's comment about the number of nonzeros is important. As you compute higher powers of a, the number of nonzero elements increases. For sparse matrices, the time to compute a matrix product increases with the number of nonzero elements.

    The algorithm used to compute a**16 is, in effect:

    a2 = a*a
    a4 = a2*a2
    a8 = a4*a4
    a16 = a8*a8
    
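    That is, exponentiation by squaring, which needs only four products instead of fifteen. For reference, a minimal sketch of the general square-and-multiply strategy for an arbitrary exponent might look like the following (matrix_power_by_squaring is a hypothetical helper, not scipy's actual implementation):

    import scipy.sparse

    def matrix_power_by_squaring(a, n):
        # Sketch only: compute a**n using O(log n) matrix products.
        result = scipy.sparse.eye(a.shape[0], format="csr")
        base = a.tocsr()
        while n > 0:
            if n & 1:               # low bit set: fold this power of a in
                result = result @ base
            n >>= 1
            if n:                   # skip one unnecessary final squaring
                base = base @ base
        return result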

    Now look at the number of nonzero elements in those matrices for a = sparse.rand(2049, 2049, 0.002):

    matrix      nnz    fraction nnz
      a        8396       0.0020
      a2      34325       0.0082
      a4     521593       0.1240
      a8    4029741       0.9598
    
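    These counts can be reproduced with a few lines (a rough sketch; the exact numbers vary from run to run, since sparse.rand draws a random matrix):

    import scipy.sparse

    a = scipy.sparse.rand(2049, 2049, 0.002)
    powers = {"a": a}
    powers["a2"] = powers["a"] @ powers["a"]
    powers["a4"] = powers["a2"] @ powers["a2"]
    powers["a8"] = powers["a4"] @ powers["a4"]

    for name, m in powers.items():
        # nnz is the number of stored nonzero entries.
        print("%-3s %8d   %.4f" % (name, m.nnz, m.nnz / (2049 * 2049)))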

    In the last product, a16 = a8*a8, the factors are 96% nonzero. Computing that product using sparse matrix multiplication is slow. That last step takes up 97% of the time to compute a**16.

    On the other hand, when you compute a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a, a sparse matrix multiplication is performed 15 times, but one of the factors in each product always has a small fraction (0.002) of nonzero values, so each product can be performed reasonably efficiently.

    This suggests that there is probably an optimal strategy for computing the product, balancing the number of multiplications against the sparsity of the factors. For example, computing a2 = a*a; a16 = a2*a2*a2*a2*a2*a2*a2*a2 is faster than a16 = a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a:

    In [232]: %timeit a2 = a*a; a4 = a2*a2; a8 = a4*a4; a16 = a8*a8
    14.4 s ± 199 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    In [233]: %timeit a16 = a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a
    1.77 s ± 4.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    In [234]: %timeit a2 = a*a; a16 = a2*a2*a2*a2*a2*a2*a2*a2
    1.42 s ± 3.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    

    Or, since you know the final result will be dense, switch to standard numpy arrays, either from the beginning or at the intermediate step where dense matrix multiplication becomes more efficient than sparse multiplication, as in the sketch below.
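
    A sketch of that hand-off (the 0.1 density threshold here is an arbitrary assumption, not a tuned value):

    import numpy as np
    import scipy.sparse

    a = scipy.sparse.rand(2049, 2049, 0.002, format="csr")

    m = a
    remaining_squarings = 4              # a**16 is a squared four times
    while remaining_squarings:
        density = m.nnz / (m.shape[0] * m.shape[1])
        if density > 0.1:                # assumed threshold: go dense here
            dense = m.toarray()
            m = np.linalg.matrix_power(dense, 2 ** remaining_squarings)
            break
        m = m @ m
        remaining_squarings -= 1
    # m now holds a**16: dense if the switch happened, sparse otherwise.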