Search code examples
pythonnumpymatrixscipysparse-matrix

Scipy sparse matrix assignment using only stored elements


I have a large sparse matrix globalGrid (lil_matrix) and a smaller matrix localGrid (coo_matrix). The localGrid represents a subset of the globalGrid and I want to update the globalGrid with the localGrid. For this I use the following code (in Python Scipy):

globalGrid[xLocalgrid:xLocalgrid + localGrid.shape[0], yLocalgrid: yLocalgrid + localGrid.shape[1]] = localGrid

where xLocalGrid and yLocalGrid are the offset of the localGrid origin with respect to the globalGrid.

The problem is that the localGrid is sparse, but also the zero elements are assigned to the globalGrid. Is there a way I can only assign the stored elements and not the 0-elements?

I have found about masked arrays in numpy, however that does not seem to apply to sparse scipy matrices.

edit: In response to the comments below, here is a example to illustrate what I mean:

First setup the matrices:

M=sparse.lil_matrix(2*np.ones([5,5]))
m = sparse.eye(3)

M.todense()
matrix([[ 2.,  2.,  2.,  2.,  2.],
    [ 2.,  2.,  2.,  2.,  2.],
    [ 2.,  2.,  2.,  2.,  2.],
    [ 2.,  2.,  2.,  2.,  2.],
    [ 2.,  2.,  2.,  2.,  2.]])

m.todense()
matrix([[ 1.,  0.,  0.],
    [ 0.,  1.,  0.],
    [ 0.,  0.,  1.]])

Then assign:

M[1:4, 1:4] = m

Now the result is:

M.todense()
matrix([[ 2.,  2.,  2.,  2.,  2.],
    [ 2.,  1.,  0.,  0.,  2.],
    [ 2.,  0.,  1.,  0.,  2.],
    [ 2.,  0.,  0.,  1.,  2.],
    [ 2.,  2.,  2.,  2.,  2.]])

Whereas I need the result to be:

matrix([[ 2.,  2.,  2.,  2.,  2.],
    [ 2.,  1.,  2.,  2.,  2.],
    [ 2.,  2.,  1.,  2.,  2.],
    [ 2.,  2.,  2.,  1.,  2.],
    [ 2.,  2.,  2.,  2.,  2.]])

Solution

  • Should this line

    The problem is that the localGrid is sparse, but also the non-zero elements are assigned to the globalGrid. Is there a way I can only assign the stored elements and not the 0-elements?

    be changed to?

    The problem is that the localGrid is sparse, but also the zero elements are assigned to the globalGrid. Is there a way I can only assign the stored elements and not the 0-elements?

    Your question isn't quite clear, but I'm guessing that because the globalGrid[a:b, c:d] indexing spans values that should be 0 in both arrays, that you are worried that 0's are being copied.

    Let's try this with real matrices.

    In [13]: M=sparse.lil_matrix((10,10))
    In [14]: m=sparse.eye(3)
    In [15]: M[4:7,5:8]=m
    In [16]: m
    Out[16]: 
    <3x3 sparse matrix of type '<class 'numpy.float64'>'
        with 3 stored elements (1 diagonals) in DIAgonal format>
    In [17]: M
    Out[17]: 
    <10x10 sparse matrix of type '<class 'numpy.float64'>'
        with 3 stored elements in LInked List format>
    In [18]: M.data
    Out[18]: array([[], [], [], [], [1.0], [1.0], [1.0], [], [], []], dtype=object)
    In [19]: M.rows
    Out[19]: array([[], [], [], [], [5], [6], [7], [], [], []], dtype=object)
    

    M does not have any unnecessary 0's.

    If there are unnecessary 0's in a sparse matrix, a round trip to csr format should take care of them

    M.tocsr().tolil()
    

    csr format also has an inplace .eliminate_zeros() method.


    So your concern is with over writing the nonzeros of the target array.

    With dense arrays, the use of nonzero (or where) takes care of this:

    In [87]: X=np.ones((10,10),int)*2
    In [88]: y=np.eye(3)
    In [89]: I,J=np.nonzero(y)
    In [90]: X[I+3,J+2]=y[I,J]
    In [91]: X
    Out[91]: 
    array([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
           [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
           [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
           [2, 2, 1, 2, 2, 2, 2, 2, 2, 2],
           [2, 2, 2, 1, 2, 2, 2, 2, 2, 2],
           [2, 2, 2, 2, 1, 2, 2, 2, 2, 2],
           [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
           [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
           [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
           [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]])
    

    Trying the sparse equivalent:

    In [92]: M=sparse.lil_matrix(X)
    In [93]: M
    Out[93]: 
    <10x10 sparse matrix of type '<class 'numpy.int32'>'
        with 100 stored elements in LInked List format>
    In [94]: m=sparse.coo_matrix(y)
    In [95]: m
    Out[95]: 
    <3x3 sparse matrix of type '<class 'numpy.float64'>'
        with 3 stored elements in COOrdinate format>
    In [96]: I,J=np.nonzero(m)
    In [97]: I
    Out[97]: array([0, 1, 2], dtype=int32)
    In [98]: J
    Out[98]: array([0, 1, 2], dtype=int32)
    In [99]: M[I+3,J+2]=m[I,J]
    ...
    TypeError: 'coo_matrix' object is not subscriptable
    

    I could have used the sparse matrix own nonzero.

    In [106]: I,J=m.nonzero()
    

    For coo format, this is the same as

    In [109]: I,J=m.row, m.col 
    

    In which case I can also use the data attribute:

    In [100]: M[I+3,J+2]=m.data
    In [101]: M.A
    Out[101]: 
    array([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
           [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
           [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
           [2, 2, 1, 2, 2, 2, 2, 2, 2, 2],
           [2, 2, 2, 1, 2, 2, 2, 2, 2, 2],
           [2, 2, 2, 2, 1, 2, 2, 2, 2, 2],
           [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
           [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
           [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
           [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]], dtype=int32)
    

    The code for m.nonzero may be instructive

        A = self.tocoo()
        nz_mask = A.data != 0
        return (A.row[nz_mask],A.col[nz_mask])
    

    So you need to be careful to make sure that index and data attributes of the sparse matrix match.

    And also pay attention as to which sparse formats allow indexing. lil is good for changing values. csr allows element by element indexing, but raises an efficiency warning if you try to change zero values to nonzero (or v.v.). coo has this nice pairing of indices and data, but doesn't allow indexing.

    Another subtle point: in constructing a coo you may repeat coordinates. When converted to csr format those values are summed. But the assignment that I'm suggesting will only use the last value, not the sum. So make sure you understand how your local matrix was constructed, and know whether it is a 'clean' representation of the data.