Search code examples
pythonmatrixscipysparse-matrix

Division of sparse matrix


I have a scipy.sparse matrix with 45671x45671 elements. In this matrix, some rows contain only '0' value.

My question is, how to divide each row values by the row sum. Obviously, with for loop it's work, but I look for an efficient method...

I already tried :

  • matrix / matrix.sum(1) but I have MemoryError issue.
  • matrix / scs.csc_matrix((matrix.sum(axis=1))) but ValueError: inconsistent shapes
  • Other wacky things...

Moreover, I want to skip rows with only '0' values.

So, if you have any solution...

Thank you in advance !


Solution

  • I have an M hanging around:

    In [241]: M
    Out[241]: 
    <6x3 sparse matrix of type '<class 'numpy.uint8'>'
        with 6 stored elements in Compressed Sparse Row format>
    In [242]: M.A
    Out[242]: 
    array([[1, 0, 0],
           [0, 1, 0],
           [0, 0, 1],
           [0, 1, 0],
           [0, 0, 1],
           [1, 0, 0]], dtype=uint8)
    In [243]: M.sum(1)            # dense matrix
    Out[243]: 
    matrix([[1],
            [1],
            [1],
            [1],
            [1],
            [1]], dtype=uint32)
    In [244]: M/M.sum(1)      # dense matrix - full size of M
    Out[244]: 
    matrix([[ 1.,  0.,  0.],
            [ 0.,  1.,  0.],
            [ 0.,  0.,  1.],
            [ 0.,  1.,  0.],
            [ 0.,  0.,  1.],
            [ 1.,  0.,  0.]])
    

    That will explain the memory error - if M is so large that M.A produces a memory error.


    In [262]: S = sparse.csr_matrix(M.sum(1))
    In [263]: S.shape
    Out[263]: (6, 1)
    In [264]: M.shape
    Out[264]: (6, 3)
    In [265]: M/S
    ....
    ValueError: inconsistent shapes
    

    I'm not entirely sure what is going on here.

    Element wise multiplication works

    In [266]: M.multiply(S)
    Out[266]: 
    <6x3 sparse matrix of type '<class 'numpy.uint32'>'
        with 6 stored elements in Compressed Sparse Row format>
    

    So it should work if I construct S as S = sparse.csr_matrix(1/M.sum(1))

    If some of the rows sum to zero, you have a division by zero problem.


    If I modify M to have 0 row

    In [283]: M.A
    Out[283]: 
    array([[1, 0, 0],
           [0, 1, 0],
           [0, 0, 0],
           [0, 1, 0],
           [0, 0, 1],
           [1, 0, 0]], dtype=uint8)
    In [284]: S = sparse.csr_matrix(1/M.sum(1))
    /usr/local/bin/ipython3:1: RuntimeWarning: divide by zero encountered in true_divide
      #!/usr/bin/python3
    In [285]: S.A
    Out[285]: 
    array([[  1.],
           [  1.],
           [ inf],
           [  1.],
           [  1.],
           [  1.]])
    In [286]: M.multiply(S)
    Out[286]: 
    <6x3 sparse matrix of type '<class 'numpy.float64'>'
        with 5 stored elements in Compressed Sparse Row format>
    In [287]: _.A
    Out[287]: 
    array([[ 1.,  0.,  0.],
           [ 0.,  1.,  0.],
           [ 0.,  0.,  0.],
           [ 0.,  1.,  0.],
           [ 0.,  0.,  1.],
           [ 1.,  0.,  0.]])
    

    This isn't the best M to demonstrate this on, but it suggests a useful approach. The row sum will be dense, so you can clean up its inverse using the usual dense array approaches.