I have a large sparse matrix globalGrid (lil_matrix) and a smaller matrix localGrid (coo_matrix). The localGrid represents a subset of the globalGrid and I want to update the globalGrid with the localGrid. For this I use the following code (in Python Scipy):
globalGrid[xLocalgrid:xLocalgrid + localGrid.shape[0], yLocalgrid: yLocalgrid + localGrid.shape[1]] = localGrid
where xLocalGrid and yLocalGrid are the offset of the localGrid origin with respect to the globalGrid.
The problem is that the localGrid is sparse, but also the zero elements are assigned to the globalGrid. Is there a way I can only assign the stored elements and not the 0-elements?
I have found about masked arrays in numpy, however that does not seem to apply to sparse scipy matrices.
edit: In response to the comments below, here is a example to illustrate what I mean:
First setup the matrices:
M=sparse.lil_matrix(2*np.ones([5,5]))
m = sparse.eye(3)
M.todense()
matrix([[ 2., 2., 2., 2., 2.],
[ 2., 2., 2., 2., 2.],
[ 2., 2., 2., 2., 2.],
[ 2., 2., 2., 2., 2.],
[ 2., 2., 2., 2., 2.]])
m.todense()
matrix([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
Then assign:
M[1:4, 1:4] = m
Now the result is:
M.todense()
matrix([[ 2., 2., 2., 2., 2.],
[ 2., 1., 0., 0., 2.],
[ 2., 0., 1., 0., 2.],
[ 2., 0., 0., 1., 2.],
[ 2., 2., 2., 2., 2.]])
Whereas I need the result to be:
matrix([[ 2., 2., 2., 2., 2.],
[ 2., 1., 2., 2., 2.],
[ 2., 2., 1., 2., 2.],
[ 2., 2., 2., 1., 2.],
[ 2., 2., 2., 2., 2.]])
Should this line
The problem is that the localGrid is sparse, but also the non-zero elements are assigned to the globalGrid. Is there a way I can only assign the stored elements and not the 0-elements?
be changed to?
The problem is that the localGrid is sparse, but also the zero elements are assigned to the globalGrid. Is there a way I can only assign the stored elements and not the 0-elements?
Your question isn't quite clear, but I'm guessing that because the globalGrid[a:b, c:d]
indexing spans values that should be 0 in both arrays, that you are worried that 0's are being copied.
Let's try this with real matrices.
In [13]: M=sparse.lil_matrix((10,10))
In [14]: m=sparse.eye(3)
In [15]: M[4:7,5:8]=m
In [16]: m
Out[16]:
<3x3 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements (1 diagonals) in DIAgonal format>
In [17]: M
Out[17]:
<10x10 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in LInked List format>
In [18]: M.data
Out[18]: array([[], [], [], [], [1.0], [1.0], [1.0], [], [], []], dtype=object)
In [19]: M.rows
Out[19]: array([[], [], [], [], [5], [6], [7], [], [], []], dtype=object)
M
does not have any unnecessary 0's.
If there are unnecessary 0's in a sparse matrix, a round trip to csr
format should take care of them
M.tocsr().tolil()
csr
format also has an inplace .eliminate_zeros()
method.
So your concern is with over writing the nonzeros of the target array.
With dense arrays, the use of nonzero
(or where
) takes care of this:
In [87]: X=np.ones((10,10),int)*2
In [88]: y=np.eye(3)
In [89]: I,J=np.nonzero(y)
In [90]: X[I+3,J+2]=y[I,J]
In [91]: X
Out[91]:
array([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 1, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 1, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 1, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]])
Trying the sparse equivalent:
In [92]: M=sparse.lil_matrix(X)
In [93]: M
Out[93]:
<10x10 sparse matrix of type '<class 'numpy.int32'>'
with 100 stored elements in LInked List format>
In [94]: m=sparse.coo_matrix(y)
In [95]: m
Out[95]:
<3x3 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in COOrdinate format>
In [96]: I,J=np.nonzero(m)
In [97]: I
Out[97]: array([0, 1, 2], dtype=int32)
In [98]: J
Out[98]: array([0, 1, 2], dtype=int32)
In [99]: M[I+3,J+2]=m[I,J]
...
TypeError: 'coo_matrix' object is not subscriptable
I could have used the sparse matrix own nonzero
.
In [106]: I,J=m.nonzero()
For coo
format, this is the same as
In [109]: I,J=m.row, m.col
In which case I can also use the data
attribute:
In [100]: M[I+3,J+2]=m.data
In [101]: M.A
Out[101]:
array([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 1, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 1, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 1, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]], dtype=int32)
The code for m.nonzero
may be instructive
A = self.tocoo()
nz_mask = A.data != 0
return (A.row[nz_mask],A.col[nz_mask])
So you need to be careful to make sure that index and data attributes of the sparse matrix match.
And also pay attention as to which sparse formats allow indexing. lil
is good for changing values. csr
allows element by element indexing, but raises an efficiency warning if you try to change zero values to nonzero (or v.v.). coo
has this nice pairing of indices and data, but doesn't allow indexing.
Another subtle point: in constructing a coo
you may repeat coordinates. When converted to csr
format those values are summed. But the assignment that I'm suggesting will only use the last value, not the sum. So make sure you understand how your local
matrix was constructed, and know whether it is a 'clean' representation of the data.