I want to assign a large scale scipy.sparse.identity
to a slice of scipy.sparse.csr_matrix
but am failing to do so. In this case, m = 25000000
and p=3
. Tc_temp
is the csr_matrix
of size 25000000 x 75000000
.
Tc_temp = csr_matrix((m, p * m))
Tc_temp[0: m, np.arange(j, p * m + j, p)] = identity(m, format='csr')
The error traceback I get is:
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.2\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "<input>", line 1, in <module>
File "C:\Users\kusari\Miniconda3\envs\cvxpy_env\lib\site-packages\scipy\sparse\_index.py", line 116, in __setitem__
self._set_arrayXarray_sparse(i, j, x)
File "C:\Users\kusari\Miniconda3\envs\cvxpy_env\lib\site-packages\scipy\sparse\compressed.py", line 816, in _set_arrayXarray_sparse
self._zero_many(*self._swap((row, col)))
File "C:\Users\kusari\Miniconda3\envs\cvxpy_env\lib\site-packages\scipy\sparse\compressed.py", line 932, in _zero_many
i, j, M, N = self._prepare_indices(i, j)
File "C:\Users\kusari\Miniconda3\envs\cvxpy_env\lib\site-packages\scipy\sparse\compressed.py", line 882, in _prepare_indices
i = np.array(i, dtype=self.indices.dtype, copy=False, ndmin=1).ravel()
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 233. GiB for an array with shape (62500000000,) and data type int32
The sparse.identity
is somehow getting converted to dense matrix.
Let's examine the action for a smaller matrix:
The identity - in coo format:
In [67]: I = sparse.identity(10,format='coo')
In [68]: I.row
Out[68]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
In [69]: I.col
Out[69]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
The "blank" csr:
In [70]: M = sparse.csr_matrix((10,30))
In [71]: M.indptr
Out[71]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
In [72]: M.indices
Out[72]: array([], dtype=int32)
The assignment. I'm using slice notation here rather than your arange
, but the effect is the same (even in timings):
In [73]: M[0:10, 0:30:3] = I
/usr/local/lib/python3.8/dist-packages/scipy/sparse/_index.py:116: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
self._set_arrayXarray_sparse(i, j, x)
The resulting matrix:
In [74]: M.indptr
Out[74]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=int32)
In [75]: M.indices
Out[75]: array([ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27], dtype=int32)
And look at the coresponding coo
attributes:
In [76]: M.tocoo().row
Out[76]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
In [77]: M.tocoo().col
Out[77]: array([ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27], dtype=int32)
The row
is the same as for I
, while the col
is just your arange
indexing:
In [78]: np.arange(0,30,3)
Out[78]: array([ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27])
So you could create the same matrix with:
M1 = sparse.csr_matrix((np.ones(10),(np.arange(10), np.arange(0,30,3))),(10,30))