python, numpy, scipy

How can I divide these two sparse matrices together?


I am trying to move dense matrix operations to sparse ones. When the data was dense, I used numpy broadcasting to divide an array of shape (591, 432) by an array of shape (432,). How can I do this with sparse matrices?

<591x432 sparse matrix of type '<class 'numpy.int64'>'
    with 3876 stored elements in Compressed Sparse Column format>


<1x432 sparse matrix of type '<class 'numpy.int64'>'
    with 432 stored elements in COOrdinate format>

When I try with this dummy data below...

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

matrix = CountVectorizer().fit_transform(raw_documents=["test sentence.", "test sent 2."]).T
max_w = np.max(matrix, axis=0)
matrix / max_w

I get ValueError: inconsistent shapes. How can I divide these?


Solution

  • If you really want to, you can divide by multiplying by the reciprocal.

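    import numpy as np
    from scipy.sparse import csc_matrix
    A = csc_matrix([[3, 4], [5, 6]])
    B = A.max(axis=0)  # 1 x n sparse row of column maxima
    # power(-1.) takes the reciprocal of B's stored entries; multiply is element-wise
    res = A.multiply(B.power(-1.))
    # reference result: divide by the dense row vector
    ref = A/B.todense()
    np.allclose(res.todense(), ref)  # True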
    

    But in your case, there may be no speed advantage over dividing by B.todense(). With a matrix shaped like yours, the reciprocal approach was roughly four times slower in the timings below.

    import numpy as np
    from scipy.sparse import csc_matrix
    rng = np.random.default_rng(452349345693456)
    
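    # generate arrays like yours
    shape = (591, 432)
    nnz = 3876
    A = rng.random(size=shape)
    # b is the value at sorted position nnz, so exactly nnz entries are smaller than b
    b = np.partition(A.ravel(), nnz)[nnz]
    A[A >= b] = 0  # zero everything else, leaving nnz nonzero entries
    A = csc_matrix(A)
    assert A.nnz == nnz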
    B = A.max(axis=0)
    
    # compare solutions
    res = A.multiply(B.power(-1.))
    ref = A/B.todense()
    np.allclose(res.todense(), ref)  # True
    
    %timeit A.multiply(B.power(-1.))
    # 1.3 ms ± 734 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    %timeit A/B.todense()
    # 306 µs ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
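
A different way to express the same column-wise division, not the approach used in the answer above, is to right-multiply by a diagonal matrix of reciprocals. The sketch below is only an illustration under stated assumptions: the helper name divide_columns_by_max is hypothetical, and columns whose maximum is 0 are left as zeros rather than divided. It has not been timed against either method above.

```python
import numpy as np
from scipy.sparse import csc_matrix, diags, issparse


def divide_columns_by_max(A):
    """Divide each column of the sparse matrix A by that column's maximum.

    Hypothetical helper for illustration only.
    """
    # A.max(axis=0) is a 1 x n sparse row; flatten it to a dense 1-D float array.
    col_max = A.max(axis=0).toarray().ravel().astype(float)
    # Reciprocal of each column maximum. Assumption: a column whose maximum
    # is 0 gets a scale of 0, which avoids dividing by zero.
    scale = np.zeros_like(col_max)
    nonzero = col_max != 0
    scale[nonzero] = 1.0 / col_max[nonzero]
    # Right-multiplying by a diagonal matrix scales column j by scale[j].
    return A @ diags(scale)


A = csc_matrix([[3, 4, 0], [5, 6, 0]])
res = divide_columns_by_max(A)
print(issparse(res))
print(res.toarray())
```

Only the column maxima are densified here, so the large matrix and the result both stay sparse. Whether that matters at this size is something to measure rather than assume.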