Search code examples
pythonscipysparse-matrix

Assigning maxtrix from scipy.sparse.identity to csr_matrix


I want to assign a large scale scipy.sparse.identity to a slice of scipy.sparse.csr_matrix but am failing to do so. In this case, m = 25000000 and p=3. Tc_temp is the csr_matrix of size 25000000 x 75000000.

Tc_temp = csr_matrix((m, p * m))
Tc_temp[0: m, np.arange(j, p * m + j, p)] = identity(m, format='csr')

The error traceback I get is:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.2\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
  File "C:\Users\kusari\Miniconda3\envs\cvxpy_env\lib\site-packages\scipy\sparse\_index.py", line 116, in __setitem__
    self._set_arrayXarray_sparse(i, j, x)
  File "C:\Users\kusari\Miniconda3\envs\cvxpy_env\lib\site-packages\scipy\sparse\compressed.py", line 816, in _set_arrayXarray_sparse
    self._zero_many(*self._swap((row, col)))
  File "C:\Users\kusari\Miniconda3\envs\cvxpy_env\lib\site-packages\scipy\sparse\compressed.py", line 932, in _zero_many
    i, j, M, N = self._prepare_indices(i, j)
  File "C:\Users\kusari\Miniconda3\envs\cvxpy_env\lib\site-packages\scipy\sparse\compressed.py", line 882, in _prepare_indices
    i = np.array(i, dtype=self.indices.dtype, copy=False, ndmin=1).ravel()
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 233. GiB for an array with shape (62500000000,) and data type int32

The sparse.identity is somehow getting converted to dense matrix.


Solution

  • Let's examine the action for a smaller matrix:

    The identity - in coo format:

    In [67]: I = sparse.identity(10,format='coo')
    In [68]: I.row
    Out[68]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
    In [69]: I.col
    Out[69]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
    

    The "blank" csr:

    In [70]: M = sparse.csr_matrix((10,30))
    In [71]: M.indptr
    Out[71]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
    In [72]: M.indices
    Out[72]: array([], dtype=int32)
    

    The assignment. I'm using slice notation here rather than your arange, but the effect is the same (even in timings):

    In [73]: M[0:10, 0:30:3] = I
    /usr/local/lib/python3.8/dist-packages/scipy/sparse/_index.py:116: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
      self._set_arrayXarray_sparse(i, j, x)
    

    The resulting matrix:

    In [74]: M.indptr
    Out[74]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10], dtype=int32)
    In [75]: M.indices
    Out[75]: array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27], dtype=int32)
    

    And look at the coresponding coo attributes:

    In [76]: M.tocoo().row
    Out[76]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
    In [77]: M.tocoo().col
    Out[77]: array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27], dtype=int32)
    

    The row is the same as for I, while the col is just your arange indexing:

    In [78]: np.arange(0,30,3)
    Out[78]: array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27])
    

    So you could create the same matrix with:

    M1 = sparse.csr_matrix((np.ones(10),(np.arange(10), np.arange(0,30,3))),(10,30))