numpy, scipy, sparse-matrix

What is the default indexing type of scipy.sparse.csr_matrix?


scipy.sparse.csr_matrix has data, indices, and indptr attributes.

What is the default dtype of indices and indptr?

NumPy's default indexing type is numpy.intp, but that does not match the dtype of the indices array of a scipy.sparse.csr_matrix.

Documentation of scipy.sparse.csr_matrix

On my laptop:

    import numpy as np
    import scipy.sparse as ss
    a = ss.csr_matrix(np.arange(12).reshape(3, 4), dtype=float)
    print(a.indices.dtype)
    print(np.intp)

Result:

    int32
    <class 'numpy.int64'>

Solution

  • When the constructor is given only a shape (M, N), the __init__ of sparse.compressed._cs_matrix (the base class of csr_matrix) runs:

        idx_dtype = get_index_dtype(maxval=max(M, N))
        self.data = np.zeros(0, getdtype(dtype, default=float))
        self.indices = np.zeros(0, idx_dtype)
        self.indptr = np.zeros(self._swap((M, N))[0] + 1, dtype=idx_dtype)

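    get_index_dtype, as used in sparse.compressed, chooses between np.int32 and np.int64. It is called here with maxval=max(M, N), so the choice depends on the shape of the matrix: if the larger dimension is too big for a 32-bit index, it uses int64; otherwise it uses int32. Check that function for the details.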


    In [789]:  np.iinfo(np.int32).max
    Out[789]: 2147483647
    In [790]: a=sparse.csr_matrix((1,2147483646))
    In [791]: a
    Out[791]: 
    <1x2147483646 sparse matrix of type '<class 'numpy.float64'>'
        with 0 stored elements in Compressed Sparse Row format>
    In [792]: a.indices.dtype
    Out[792]: dtype('int32')
    In [793]: a=sparse.csr_matrix((1,2147483648))
    In [794]: a.indices.dtype
    Out[794]: dtype('int64')
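
The session above can also be written as a small script, which is handy for checking the behaviour on your own machine and SciPy version. This is only a minimal sketch: index_dtypes is a helper name made up for this example, not a SciPy function, and the shapes are the same kind of boundary cases used above. The row count is kept at 1 on purpose, because indptr holds one entry per row plus one, so a matrix with about 2**31 rows would allocate a very large array.

```python
import numpy as np
import scipy.sparse as ss

# Largest value a 32-bit signed index can hold.
INT32_MAX = np.iinfo(np.int32).max


def index_dtypes(shape):
    """Hypothetical helper: build an empty CSR matrix of the given shape
    and return the dtypes of its indices and indptr arrays."""
    m = ss.csr_matrix(shape)
    return m.indices.dtype, m.indptr.dtype


# One row only, so indptr stays tiny even for a huge column count.
small = index_dtypes((1, INT32_MAX - 1))  # largest dimension fits in int32
large = index_dtypes((1, INT32_MAX + 1))  # largest dimension does not fit

print("below the int32 limit:", small)
print("above the int32 limit:", large)
print("np.intp on this machine:", np.dtype(np.intp))
```

If some other code needs indices in NumPy's own index type, a.indices.astype(np.intp) gives a converted copy without touching the matrix itself.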