I'm trying to fill parts of a sparse lil matrix from another one. I set it up with this code:
adj_mat = sp.dok_matrix((self.n_users + self.m_items, self.n_users + self.m_items), dtype=np.float32)
adj_mat = adj_mat.tolil()
R = self.UserItemNet.tolil()
When I try to fill it like this:
adj_mat[:self.n_users, self.n_users:] = R
adj_mat[self.n_users:, :self.n_users] = R.T
my process is killed for exceeding available RAM (240 GiB). My dataset is big:
adj_mat:
<1374194x1374194 sparse matrix of type '<class 'numpy.float32'>'
	with 0 stored elements in List of Lists format>
R:
<940696x433498 sparse matrix of type '<class 'numpy.float64'>'
	with 24053124 stored elements in List of Lists format>
self.n_users = 940696
Is there some more efficient way to fill a list-of-lists matrix like that?
Best regards
Here's a bmat approach to constructing your composite matrix (assuming I've deduced the correct layout).
Make a matrix. bmat will combine coo attributes, so let's start with that:
In [389]: R = sparse.coo_matrix([[0,1],[2,0],[0,0],[3,4]])
In [390]: R
Out[390]:
<4x2 sparse matrix of type '<class 'numpy.int64'>'
with 4 stored elements in COOrdinate format>
In [391]: R.A
Out[391]:
array([[0, 1],
[2, 0],
[0, 0],
[3, 4]])
And define the 'blank' filler matrices:
In [392]: Z1 = sparse.coo_matrix((4,4),dtype=int)
In [393]: Z2 = sparse.coo_matrix((2,2),dtype=int)
Now join them:
In [394]: M = sparse.bmat([[Z1,R],[R.T,Z2]])
In [395]: M
Out[395]:
<6x6 sparse matrix of type '<class 'numpy.int64'>'
with 8 stored elements in COOrdinate format>
In [396]: M.A
Out[396]:
array([[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 2, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 3, 4],
[0, 2, 0, 3, 0, 0],
[1, 0, 0, 4, 0, 0]])
This will avoid the densification that the default assignment apparently does.
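Applied to the question's layout, the same idea reads as follows (a sketch with small stand-in dimensions; note that bmat treats None blocks as all-zero, so you don't even need to build the Z1/Z2 fillers explicitly):

```python
import numpy as np
from scipy import sparse

# Small stand-ins for the OP's n_users/m_items (hypothetical values).
n_users, m_items = 4, 2
R = sparse.random(n_users, m_items, density=0.5, format='coo', dtype=np.float32)

# None blocks are treated as all-zero; format='csr' converts once at the end.
adj_mat = sparse.bmat([[None, R], [R.T, None]], format='csr')
```

The result has R in the upper-right block and R.T in the lower-left, so the whole matrix comes out symmetric.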
block_diag uses the other diagonal:
In [398]: sparse.block_diag([R,R.T])
Out[398]:
<6x6 sparse matrix of type '<class 'numpy.int64'>'
with 8 stored elements in COOrdinate format>
In [399]: _.A
Out[399]:
array([[0, 1, 0, 0, 0, 0],
[2, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[3, 4, 0, 0, 0, 0],
[0, 0, 0, 2, 0, 3],
[0, 0, 1, 0, 0, 4]])
This block_diag code would be a good model if you want to write your own version. The v1.6 release notes claim it is much more efficient than the previous version (which I believe worked through bmat).
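If you do roll your own, the coo-attribute approach can be sketched like this (hypothetical small dimensions; the offsets place R in the upper-right block and its transpose in the lower-left):

```python
import numpy as np
from scipy import sparse

n_users, m_items = 4, 2
R = sparse.random(n_users, m_items, density=0.5, format='coo', dtype=np.float32)
n = n_users + m_items

# Upper-right block: R's own coordinates, columns shifted past the user rows.
# Lower-left block: row/col swapped (the transpose), with rows shifted instead.
rows = np.concatenate([R.row, R.col + n_users])
cols = np.concatenate([R.col + n_users, R.row])
data = np.concatenate([R.data, R.data])

adj_mat = sparse.coo_matrix((data, (rows, cols)), shape=(n, n))
```

This never materializes anything dense; it just concatenates three pairs of 1-d arrays.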
In response to @CJR's comments about lil memory inefficiency, I looked at some alternatives.
Make a large coo matrix:
In [10]: M=sparse.random(10000,10000, .2, 'coo')
Conversion to lil is slower than conversion to csr:
In [11]: timeit M.tocsr()
1.43 s ± 1.11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [12]: timeit M.tolil()
3.69 s ± 10.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
So how does the block assignment compare? (using a much smaller block than the OP):
In [13]: Ml=M.tolil(); Mr=M.tocsr()
In [14]: timeit Ml[:100,:100]=np.eye(100)
1.07 ms ± 341 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [15]: timeit Mr[:100,:100]=np.eye(100)
/usr/local/lib/python3.8/dist-packages/scipy/sparse/_index.py:125: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
self._set_arrayXarray(i, j, x)
14.1 ms ± 144 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
csr assignment is quite a bit slower, while coo assignment doesn't even work:
In [16]: timeit M[:100,:100]=np.eye(100)
Traceback (most recent call last):
....
TypeError: 'coo_matrix' object does not support item assignment
So if you must do block assignment, lil isn't a bad choice, provided the blocks aren't too big. But constructing the matrix directly from blocks via bmat is better. As the lil docs say, use coo if you are constructing large matrices.
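Putting that advice together for the question's case (a sketch, with a small random R standing in for self.UserItemNet): build once from coo blocks and convert once to csr, skipping the dok/lil round trip and the block assignment entirely:

```python
from scipy import sparse

R = sparse.random(4, 2, density=0.5, format='coo')

# One construction step, one conversion; no sparsity-structure mutation.
adj_mat = sparse.bmat([[None, R], [R.T, None]], format='csr')
```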