Tags: python, memory, scipy, sparse-matrix

Why is scipy sparse matrix memory usage independent of the number of elements in the matrix?


I have two scipy sparse matrices 'a' and 'b' containing boolean values. 'a' is far bigger than 'b': 765565 stored elements versus just 3.

In [211]: a
Out[211]: <388839x8455 sparse matrix of type '<class 'numpy.bool_'>'
           with 765565 stored elements in Compressed Sparse Row format>
In [212]: b
Out[212]: <5x3 sparse matrix of type '<class 'numpy.bool_'>'
           with 3 stored elements in Compressed Sparse Row format>

But when I check their memory usage, I see that both are just 56 bytes:

In [213]: from sys import getsizeof
          'Size of a: {}. Size of b: {}'.format(getsizeof(a), getsizeof(b))
Out[213]: 'Size of a: 56. Size of b: 56'

How can these matrices report the same size, when matrix 'a' has to store over 200 thousand times as many values as matrix 'b'?


Solution

  • sys.getsizeof only reports the footprint of the Python wrapper object itself; it does not follow references, so the numpy arrays that actually hold the stored elements (data, indices and indptr for a CSR matrix) are not counted. To measure the real memory usage, sum the nbytes of those arrays. Here is a small demo:

    import numpy as np
    from scipy import sparse

    def sparse_memory_usage(mat):
        # Sum the sizes of the three arrays backing a CSR (or CSC) matrix.
        try:
            return mat.data.nbytes + mat.indptr.nbytes + mat.indices.nbytes
        except AttributeError:
            # Not a CSR/CSC matrix (e.g. a dense ndarray): no such attributes.
            return -1
    

    In [140]: sparse_memory_usage(np.random.rand(100, 100))
    Out[140]: -1
    
    In [141]: M = sparse.random(10**4, 10**3, .001, 'csr')
    
    In [142]: sparse_memory_usage(M)
    Out[142]: 160004
    
    In [144]: M
    Out[144]:
    <10000x1000 sparse matrix of type '<class 'numpy.float64'>'
            with 10000 stored elements in Compressed Sparse Row format>
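
  • For comparison, sys.getsizeof on this demo matrix still reports only the
    tiny, constant-sized wrapper object, which is exactly why both matrices in
    the question looked identical. A small self-contained sketch (the exact
    wrapper size can vary between Python versions):

    from sys import getsizeof
    from scipy import sparse

    M = sparse.random(10**4, 10**3, .001, 'csr')

    # getsizeof sees only the csr_matrix wrapper, not the arrays it references.
    print(getsizeof(M))                                          # e.g. 56
    # The referenced arrays hold the actual stored elements.
    print(M.data.nbytes + M.indptr.nbytes + M.indices.nbytes)    # 160004, as in Out[142]

    Applied to the matrices in the question (assuming bool data at 1 byte per
    stored element and 32-bit indices), 'a' needs roughly 765565 bytes of data
    plus about 3 MB of indices and 1.5 MB of indptr, around 5 MB in total,
    while 'b' needs only a few dozen bytes.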