There are some posts on SO discussing sparse matrix multiplication performance, but they don't seem to answer my question here.
Here is the benchmark code,
# First, construct a sparse matrix
In [1]: import numpy as np
In [2]: from scipy.sparse import dok_matrix
In [3]: M = dok_matrix((100, (1<<32)-1), dtype=np.float32)
In [4]: rows, cols = M.shape
In [5]: js = np.random.randint(0, (1<<32)-1, size=100)
In [6]: for i in range(rows):
   ...:     for j in js:
   ...:         M[i,j] = 1.0
   ...:
# Check out sparsity
In [7]: M.shape
Out[7]: (100, 4294967295)
In [8]: M.count_nonzero()
Out[8]: 10000
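For reference, the density here is tiny (a quick check of my own):

```python
# Density of M: 10000 stored values spread over 100 * (2**32 - 1) slots.
nnz = 10_000
total = 100 * ((1 << 32) - 1)
print(f"{nnz / total:.2e}")  # ≈ 2.33e-08
```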
# Test csr dot performance: 36.3 s CPU, 1min 46s wall
In [9]: csr = M.tocsr()
In [10]: %time csr.dot(csr.T)
CPU times: user 36.3 s, sys: 1min 1s, total: 1min 37s
Wall time: 1min 46s
Out[10]:
<100x100 sparse matrix of type '<class 'numpy.float32'>'
with 10000 stored elements in Compressed Sparse Row format>
The above `csr.dot` costs 36.3 s of user CPU time (1min 46s wall), which is quite long IMHO.
In order to speed up, I coded up a naive for-loop dot function as follows,
from bisect import bisect_left

def bs(a, x):
    # Binary search: index of x in the sorted list a, or -1 if absent.
    pos = bisect_left(a, x)
    return pos if pos < len(a) and a[pos] == x else -1

def lil_matmul_transposeB(A, B):
    # Computes A @ B.T for two scipy.sparse lil_matrix operands.
    rows_a, cols_a = A.shape
    rows_b, cols_b = B.shape
    assert cols_a == cols_b
    C = np.zeros((rows_a, rows_b))
    for ra in range(rows_a):
        cols_a = A.rows[ra]      # sorted column indices of row ra
        data_a = A.data[ra]
        for i, ca in enumerate(cols_a):
            xa = data_a[i]
            for rb in range(rows_b):
                cols_b = B.rows[rb]
                data_b = B.data[rb]
                pos = bs(cols_b, ca)  # look for column ca in row rb of B
                if pos != -1:
                    C[ra, rb] += data_b[pos] * xa
    return C
# Test dot performance in LiL format,
In [25]: lil = M.tolil()
In [26]: %time A = F.lil_matmul_transposeB(lil, lil)
CPU times: user 1.26 s, sys: 2.07 ms, total: 1.26 s
Wall time: 1.26 s
The above function only costs 1.26 s, much faster than the built-in `csr.dot`.
So am I making some mistake here in how I do the sparse matrix multiplication?
That very large 2nd dimension is giving problems, even though the number of nonzeros is quite small.
In [12]: Mr = M.tocsr()
In [20]: Mr
Out[20]:
<100x4294967295 sparse matrix of type '<class 'numpy.float32'>'
with 10000 stored elements in Compressed Sparse Row format>
Transpose just turns the `csr` into `csc`, without changing the arrays. The `indptr` for both is just shape (101,).
In [21]: Mr.T
Out[21]:
<4294967295x100 sparse matrix of type '<class 'numpy.float32'>'
with 10000 stored elements in Compressed Sparse Column format>
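A toy check of the same point, using a `scipy.sparse.random` matrix of my own (not the question's `M`):

```python
import numpy as np
from scipy.sparse import random as sprandom

# A wide CSR matrix; its transpose is reported as CSC without copying data.
A = sprandom(100, 10_000, density=0.001, format='csr', dtype=np.float32)
print(A.indptr.shape)    # (101,) -- one entry per row, plus one
print(A.T.indptr.shape)  # (101,) -- the CSC transpose keeps the same layout
```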
But when I do `Mr @ Mr.T`, I get an error when it tries to convert that `Mr.T` to `csr`. That is, the multiplication requires both operands in the same format:
In [22]: Mr.T.tocsr()
Traceback (most recent call last):
File "<ipython-input-22-a376906f557e>", line 1, in <module>
Mr.T.tocsr()
File "/usr/local/lib/python3.8/dist-packages/scipy/sparse/csc.py", line 138, in tocsr
indptr = np.empty(M + 1, dtype=idx_dtype)
MemoryError: Unable to allocate 32.0 GiB for an array with shape (4294967296,) and data type int64
It's trying to make a matrix with an `indptr` that's (4294967296,) long. On my limited-RAM machine that produces an error. On yours it must be hitting some sort of memory-management/swap activity that's slowing it way down.
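The 32 GiB figure is just that `indptr` length times 8 bytes per int64:

```python
# CSR needs an indptr of length rows + 1; here rows = 2**32 - 1.
rows = (1 << 32) - 1
gib = (rows + 1) * 8 / 2**30
print(gib)  # → 32.0, matching the MemoryError message
```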
So it's the extreme dimension that's making this slow, even though the `nnz` is small.
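One workaround (a sketch of my own, not something from the question): `M @ M.T` is the Gram matrix of the rows, so relabeling the occupied columns to a compact `0..k-1` range doesn't change the product, and it shrinks the dimension that causes the trouble. Shown here on a narrower analogue of `M` so it runs anywhere:

```python
import numpy as np
from scipy.sparse import dok_matrix, csr_matrix

# Build a small analogue of the question's M (same construction, narrower).
rng = np.random.default_rng(0)
width = 1 << 20
M = dok_matrix((100, width), dtype=np.float32)
js = rng.integers(0, width, size=100)
for i in range(M.shape[0]):
    for j in js:
        M[i, j] = 1.0

# Relabel the occupied columns to 0..k-1; row-row dot products are
# unchanged by this relabeling, so the product M @ M.T is identical.
coo = M.tocoo()
used, compact = np.unique(coo.col, return_inverse=True)
Mc = csr_matrix((coo.data, (coo.row, compact)),
                shape=(coo.shape[0], len(used)))
G = Mc @ Mc.T  # fast: the compacted matrix is only 100 x ~100
```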